The Bounded Task Principle: Why Constrained AI Agents Outperform Open-Ended Ones
Source: Dev.to
Introduction
Claude discovered 22 vulnerabilities in Firefox over two weeks — including 14 high‑severity ones.
People often focus on model capability, but the real lesson is the bounded task.
The success came from a defined scope, defined output format, and clear success criteria.
The agent wasn’t asked to “improve Firefox security”; it was given specific parameters, a specific surface area, and a precise definition of what a finding looks like.
That principle applies to every AI agent you build.
Why Most Agents Fail
Most agents fail not because of model quality, but because the task specification is under‑defined.
When a task can mean several things, the agent picks one interpretation—often the wrong one—and optimizes for it, while the user wanted something else entirely.
Symptoms of a Vague Task Specification
- Agent loops longer than expected
- Output looks “reasonable” but misses the point
- Different runs produce wildly different results
- Agent asks clarifying questions at the wrong time (mid‑task, not before)
The Four Essential Fields
Add these four fields to every task definition—whether it’s in a prompt, a current-task.json, or a SOUL.md instruction:
```json
{
  "scope": "What is in bounds. Be explicit about what is NOT in scope.",
  "output_format": "Exactly what the result should look like.",
  "done_when": "Observable condition that defines completion.",
  "success_criteria": "How a human will evaluate whether the output is correct."
}
```
- Scope prevents the agent from doing too much. It answers: “What are the edges?”
- Output format prevents interpretation drift. It answers: “What does ‘done’ look like structurally?”
- Done when prevents infinite loops. It answers: “When should the agent stop?”
- Success criteria closes the feedback loop. It answers: “How will we know it worked?”
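As a worked example, here is how the template might be filled in for a hypothetical log-triage task (every value below is illustrative, not taken from any real deployment):

```json
{
  "scope": "Application error logs from the last 24 hours. Access logs and metrics are NOT in scope.",
  "output_format": "A Markdown table with columns: error signature, count, first seen, suspected cause.",
  "done_when": "Every distinct error signature in the window appears in the table, or 30 minutes elapse.",
  "success_criteria": "An on-call engineer can pick the top row and start debugging without re-reading the logs."
}
```

Note how each field rules something out: the scope excludes two log types, the done condition has a hard time cap, and the success criteria name the human who judges the result.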
Mapping the Anthropic/Firefox Audit
| Field | Specification |
|---|---|
| Scope | Firefox source code — specific modules, not the whole web |
| Output format | Structured vulnerability reports with severity, description, reproduction steps |
| Done when | All in‑scope modules reviewed, or X hours elapsed |
| Success criteria | Human security engineer can reproduce and confirm each finding |
Notice what isn’t there: “make Firefox more secure.” That’s an aspiration, not a task spec. The bounded version produced 22 actionable findings.
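Rendered in the same four-field template, the audit's spec (paraphrased from the table above, not Anthropic's actual configuration) might look like:

```json
{
  "scope": "Firefox source code: the listed modules only, not the whole web.",
  "output_format": "Structured vulnerability reports with severity, description, and reproduction steps.",
  "done_when": "All in-scope modules reviewed, or the time budget elapses.",
  "success_criteria": "A human security engineer can reproduce and confirm each finding."
}
```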
Benefits of a Well‑Constrained Agent
- Spends less time reasoning about what to do
- Produces more consistent outputs
- Fails faster when something is wrong (instead of drifting)
- Is dramatically cheaper to run
The agents that perform best in production aren’t the ones with the most tools; they’re the ones with the clearest task definitions.
Quick Checklist for Any Agent
- Scope – Can you state the scope in one sentence? (If not, the scope is too vague)
- Output format – Can you describe what the output looks like before the agent runs? (If not, the format isn’t defined)
- Done condition – Can you tell—without ambiguity—when the agent should stop? (If not, you lack a done condition)
- Success criteria – Would you know if the output was wrong? (If not, you lack success criteria)
If you can’t answer all four, tighten the task spec before adding any new tools or capabilities.
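The checklist can also run as a pre-flight gate in code, rejecting a task spec before any agent tokens are spent. A minimal sketch in Python (the function and field names are illustrative, not from any specific agent framework):

```python
# The four essential fields every task spec must carry.
REQUIRED_FIELDS = ("scope", "output_format", "done_when", "success_criteria")

def validate_task_spec(spec: dict) -> list:
    """Return a list of problems; an empty list means the spec passes the checklist."""
    problems = []
    for field in REQUIRED_FIELDS:
        value = spec.get(field, "")
        # A field that is absent, non-string, or blank fails the check.
        if not isinstance(value, str) or not value.strip():
            problems.append("missing or empty field: " + field)
    return problems

# A draft that omits success criteria fails fast, before the agent ever runs.
draft = {
    "scope": "Review the payment module only; UI code is out of scope.",
    "output_format": "Markdown list of findings with file and line references.",
    "done_when": "Every file in the payment module has been reviewed once.",
}
print(validate_task_spec(draft))  # ['missing or empty field: success_criteria']
```

Wiring a check like this in front of the agent turns a vague spec into a loud, immediate failure instead of a long, drifting run.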
Resources
The battle‑tested configs for bounded‑task patterns, including current-task.json templates and SOUL.md escalation rules, are available in the Ask Patrick Library.