Detecting Objects in Images from Any Text Prompt (Not Fixed Classes)

Published: January 15, 2026 at 02:40 PM EST
Source: Dev.to

Background

Most object detection systems assume a fixed label set: you train a model on COCO, Open Images, or a custom dataset, and it can only detect the classes it was trained on.

Prompt‑Based Object Detection

I’ve been exploring a different approach: prompt‑based object detection, where the inputs are

  1. an image
  2. a free‑form natural language prompt

and the output is a set of localized detections that match the prompt, even when the concept isn’t a single predefined object class.

The tool I built supports complex, compositional prompts, not just simple object names. These prompts can combine attributes, relations, text, and world knowledge—things that don’t map cleanly to standard detector classes.
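
To make the input/output contract above concrete, here is a minimal sketch in Python. It is not the tool's actual API: the names `DetectionRequest`, `Detection`, and `keep_confident` are hypothetical, and both the box format (pixel `x_min, y_min, x_max, y_max`) and the presence of a confidence score are assumptions made for illustration.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass(frozen=True)
class DetectionRequest:
    """Hypothetical input: one image plus one free-form prompt."""
    image_path: str
    prompt: str


@dataclass(frozen=True)
class Detection:
    """Hypothetical output item: one localized match for the prompt."""
    box: Tuple[float, float, float, float]  # assumed (x_min, y_min, x_max, y_max) in pixels
    score: float                            # assumed confidence in [0, 1]


def keep_confident(detections: List[Detection], min_score: float = 0.5) -> List[Detection]:
    """Return detections with score >= min_score, highest score first."""
    kept = [d for d in detections if d.score >= min_score]
    return sorted(kept, key=lambda d: d.score, reverse=True)
```

Note that a detection carries no class label in this sketch: the prompt itself defines what counts as a match, so the result is just a set of boxes for that one request.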

What It’s Not Designed For

  • Very small objects
  • Obscured or barely visible objects
  • Dense real‑time detection out of the box

It performs better on concepts that require reasoning and world knowledge rather than pixel‑level precision on tiny targets.

Motivation

The main motivation so far has been creating training data for highly specific detectors. Instead of manually labeling data or training a new detector for every niche concept, you can use this approach to:

  • Bootstrap datasets
  • Explore whether a concept is learnable
  • Validate prompts before committing to full training pipelines
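
As a rough illustration of the dataset-bootstrapping use, the sketch below turns prompt-based detections for one image into a COCO-style annotation dictionary that a conventional detector could then be trained on. The function name `boxes_to_coco`, the input tuple layout `(x_min, y_min, x_max, y_max, score)`, and the `min_score` threshold are all assumptions for illustration; the tool's real output format may differ. Only the COCO fields themselves (`bbox` as `[x, y, width, height]`, `area`, `iscrowd`, and so on) follow the standard layout.

```python
import json


def boxes_to_coco(image_id, file_name, width, height, boxes, category_name, min_score=0.5):
    """Build a single-image COCO-style dict from (x_min, y_min, x_max, y_max, score) tuples."""
    annotations = []
    for x_min, y_min, x_max, y_max, score in boxes:
        if score < min_score:
            continue  # drop low-confidence pseudo-labels
        w, h = x_max - x_min, y_max - y_min
        if w <= 0 or h <= 0:
            continue  # skip degenerate boxes
        annotations.append({
            "id": len(annotations) + 1,
            "image_id": image_id,
            "category_id": 1,                # single category: the prompt's concept
            "bbox": [x_min, y_min, w, h],    # COCO uses [x, y, width, height]
            "area": w * h,
            "iscrowd": 0,
        })
    return {
        "images": [{"id": image_id, "file_name": file_name, "width": width, "height": height}],
        "categories": [{"id": 1, "name": category_name}],
        "annotations": annotations,
    }


if __name__ == "__main__":
    # Hypothetical detections for one image; values are made up for the example.
    coco = boxes_to_coco(1, "example.jpg", 640, 480,
                         [(10, 20, 110, 70, 0.9), (0, 0, 5, 5, 0.2)],
                         "example concept")
    print(json.dumps(coco, indent=2))
```

Pseudo-labels produced this way are best treated as a starting point: spot-checking a sample by hand before training is a cheap way to see whether the prompt captures the concept you intended.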

Demo

I’ve made the tool publicly available as a demo:

Detect Anything – Free AI Object Detection Online

  • No login required.
  • Images are processed transiently and not stored.
  • (Please don’t abuse it; inference is relatively expensive.)

Open Questions

I’m especially interested in:

  • Good real‑world use cases people see for this
  • Stress‑testing and failure modes
  • Situations where this approach breaks down compared to task‑specific detectors

If you’ve worked with grounding, referring‑expression comprehension, or prompt‑based vision models, I’d love to hear your thoughts.
