Detecting Objects in Images from Any Text Prompt (Not Fixed Classes)
Source: Dev.to
Background
Most object detection systems assume a fixed label set: you train a model on COCO, Open Images, or a custom dataset, and you’re limited to the classes you trained for.
Prompt‑Based Object Detection
I’ve been exploring a different approach: prompt‑based object detection, where the inputs are:
- an image
- a free‑form natural language prompt
and the output is a set of localized detections that match the prompt, even when the concept isn’t a single predefined object class.
The tool I built supports complex, compositional prompts, not just simple object names. These prompts can combine attributes, relations, text, and world knowledge—things that don’t map cleanly to standard detector classes.
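To make the input/output contract concrete, here is a minimal sketch of the data shapes involved. The `Detection` class and example values are hypothetical illustrations, not the actual tool's API; the score-threshold post-processing step is the standard way such raw detections are filtered:

```python
from dataclasses import dataclass

@dataclass
class Detection:
    # Axis-aligned box in pixel coordinates: (x_min, y_min, x_max, y_max).
    box: tuple[float, float, float, float]
    score: float   # model confidence in [0, 1]
    phrase: str    # the part of the prompt this box grounds

def filter_detections(dets: list[Detection], min_score: float = 0.3) -> list[Detection]:
    """Keep only detections above a confidence threshold, highest score first."""
    kept = [d for d in dets if d.score >= min_score]
    return sorted(kept, key=lambda d: d.score, reverse=True)

# Hypothetical output for the prompt "the red mug left of the laptop":
dets = [
    Detection(box=(40, 120, 180, 260), score=0.82, phrase="red mug"),
    Detection(box=(300, 90, 640, 400), score=0.15, phrase="laptop"),
]
print([d.phrase for d in filter_detections(dets)])  # only the confident match survives
```

The key difference from a fixed-class detector is the `phrase` field: instead of a class index into a closed label set, each box is tied back to the free-form text it grounds.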
What It’s Not Designed For
- Very small objects
- Obscure, barely visible objects
- Dense real‑time detection out of the box
It performs better on concepts that require reasoning and world knowledge rather than pixel‑level precision on tiny targets.
Motivation
The main motivation so far has been creating training data for highly specific detectors. Instead of manually labeling or training a new detector for every niche concept, this approach can be used to:
- Bootstrap datasets
- Explore whether a concept is learnable
- Validate prompts before committing to full training pipelines
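As a sketch of the bootstrapping step, prompt-run detections can be exported as COCO-format annotations for training a conventional detector downstream. The schema below follows the standard COCO layout; the file name and box values are made-up examples, and the `(x1, y1, x2, y2, label)` input tuples are an assumed intermediate format:

```python
import json

def to_coco(image_id: int, file_name: str, size: tuple[int, int],
            detections: list[tuple[float, float, float, float, str]]) -> dict:
    """Convert (x_min, y_min, x_max, y_max, label) detections into a
    minimal COCO-format dataset dict."""
    labels = sorted({label for *_, label in detections})
    cat_ids = {label: i + 1 for i, label in enumerate(labels)}
    return {
        "images": [{"id": image_id, "file_name": file_name,
                    "width": size[0], "height": size[1]}],
        "categories": [{"id": cid, "name": name} for name, cid in cat_ids.items()],
        "annotations": [
            {"id": i + 1, "image_id": image_id,
             "category_id": cat_ids[label],
             # COCO boxes are [x, y, width, height], not corner pairs.
             "bbox": [x1, y1, x2 - x1, y2 - y1],
             "area": (x2 - x1) * (y2 - y1), "iscrowd": 0}
            for i, (x1, y1, x2, y2, label) in enumerate(detections)
        ],
    }

coco = to_coco(1, "kitchen.jpg", (640, 480),
               [(40, 120, 180, 260, "red mug")])
print(json.dumps(coco, indent=2))
```

Once detections for a niche concept are serialized this way, any off-the-shelf training pipeline that reads COCO annotations can consume them without custom loaders.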
Demo
I’ve made the tool publicly available as a demo:
Detect Anything – Free AI Object Detection Online
- No login required.
- Images are processed transiently and not stored.
- (Please don’t abuse it; inference is relatively expensive.)
Open Questions
I’m especially interested in:
- Good real‑world use cases people see for this
- Stress‑testing and failure modes
- Situations where this approach breaks down compared to task‑specific detectors
If you’ve worked with grounding, referring‑expression comprehension, or prompt‑based vision models, I’d love to hear your thoughts.