Detecting Objects in Images from Any Text Prompt (Not Fixed Classes)

Published: January 15, 2026 at 02:40 PM EST
Source: Dev.to

Background

Most object detection systems assume a fixed label set: you train a model on COCO, Open Images, or a custom dataset, and it can only detect the classes it was trained on.

Prompt‑Based Object Detection

I’ve been exploring a different approach: prompt‑based object detection, where the inputs are

  1. an image
  2. a free‑form natural language prompt

and the output is a set of localized detections that match the prompt, even when the concept isn’t a single predefined object class.

The tool I built supports complex, compositional prompts, not just simple object names. These prompts can combine attributes, relations, text, and world knowledge, none of which map cleanly to standard detector classes.
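
To make the interface described above concrete, here is a minimal sketch of what "image plus prompt in, localized detections out" could look like in Python. The post does not document the tool's API, so every name here (`Detection`, `Detector`, `detect`, `fake_detector`), the box layout, the score range, and the example prompt are assumptions made for illustration only.

```python
from dataclasses import dataclass
from typing import Callable, List, Tuple

# All names below are hypothetical; the post does not describe the tool's API.
Box = Tuple[float, float, float, float]  # assumed (x_min, y_min, x_max, y_max) in pixels


@dataclass(frozen=True)
class Detection:
    box: Box      # where the match sits in the image
    score: float  # assumed confidence in [0, 1]
    prompt: str   # the free-form prompt this region matched


# A detector is anything that maps (image bytes, prompt) to a list of detections.
Detector = Callable[[bytes, str], List[Detection]]


def detect(detector: Detector, image: bytes, prompt: str,
           min_score: float = 0.5) -> List[Detection]:
    """Run a prompt-based detector and keep confident matches, best first."""
    results = detector(image, prompt)
    kept = [d for d in results if d.score >= min_score]
    return sorted(kept, key=lambda d: d.score, reverse=True)


def fake_detector(image: bytes, prompt: str) -> List[Detection]:
    """Stand-in for a real model so the sketch runs end to end."""
    return [
        Detection((10.0, 20.0, 110.0, 220.0), 0.91, prompt),
        Detection((300.0, 40.0, 340.0, 90.0), 0.32, prompt),
    ]


if __name__ == "__main__":
    # Illustrative compositional prompt, not taken from the tool's docs.
    for d in detect(fake_detector, b"", "the person not wearing a helmet"):
        print(d.box, d.score)
```

The point of the sketch is that the prompt replaces the class label: the output carries no class ID, only regions tied back to the text that produced them.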

What It’s Not Designed For

  • Very small objects
  • Obscured or barely visible objects
  • Dense real‑time detection out of the box

It performs better on concepts that require reasoning and world knowledge rather than pixel‑level precision on tiny targets.

Motivation

The main motivation so far has been creating training data for highly specific detectors. Instead of manually labeling images and training a new detector for every niche concept, this approach can be used to:

  • Bootstrap datasets
  • Explore whether a concept is learnable
  • Validate prompts before committing to full training pipelines
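
As one way to picture the dataset-bootstrapping step, the sketch below turns prompt-based detections into COCO-style annotations that a conventional detector could train on. The input shapes, the `to_coco` helper, and the score threshold are assumptions for illustration; only the COCO field names and the `[x, y, width, height]` box convention come from the COCO format itself.

```python
import json
from typing import Dict, List, Tuple

# Assumed box layout for the detections: (x_min, y_min, x_max, y_max) in pixels.
Box = Tuple[float, float, float, float]


def to_coco(
    images: List[Dict],                              # each: id, file_name, width, height
    detections: Dict[int, List[Tuple[Box, float]]],  # image id -> [(box, score), ...]
    category_name: str,
    min_score: float = 0.6,                          # hypothetical cutoff; tune per prompt
) -> Dict:
    """Build a single-category COCO-style dict from prompt-based detections."""
    annotations = []
    for image_id, dets in detections.items():
        for (x0, y0, x1, y1), score in dets:
            if score < min_score:
                continue  # drop low-confidence boxes before they become labels
            w, h = x1 - x0, y1 - y0
            annotations.append({
                "id": len(annotations) + 1,
                "image_id": image_id,
                "category_id": 1,
                "bbox": [x0, y0, w, h],  # COCO stores [x, y, width, height]
                "area": w * h,
                "iscrowd": 0,
            })
    return {
        "images": images,
        "annotations": annotations,
        "categories": [{"id": 1, "name": category_name}],
    }


if __name__ == "__main__":
    images = [{"id": 7, "file_name": "a.jpg", "width": 640, "height": 480}]
    dets = {7: [((10.0, 20.0, 110.0, 220.0), 0.9), ((0.0, 0.0, 5.0, 5.0), 0.2)]}
    print(json.dumps(to_coco(images, dets, "cracked tile"), indent=2))
```

Labels produced this way are only as good as the prompt, so spot-checking a sample by hand before training is a sensible precaution.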

Demo

I’ve made the tool publicly available as a demo:

Detect Anything – Free AI Object Detection Online

  • No login required.
  • Images are processed transiently and not stored.
  • (Please don’t abuse it; inference is relatively expensive.)

Open Questions

I’m especially interested in:

  • Good real‑world use cases people see for this
  • Stress‑testing and failure modes
  • Situations where this approach breaks down compared to task‑specific detectors

If you’ve worked with grounding, referring‑expression comprehension, or prompt‑based vision models, I’d love to hear your thoughts.
