Why Your AI Agent Keeps Overreaching — And How to Fix It with a Boundary Contract
Source: Dev.to
A design protocol born from DeFi infrastructure, now applied to AI systems

You’ve built an AI agent. It works, sometimes brilliantly. But then it starts doing things you didn’t ask for:

- It makes assumptions and acts on them
- It fills in missing data instead of saying “I don’t know”
- It optimizes when you only asked it to observe
- It gives confident answers when it should refuse

This isn’t a model problem. It’s an architecture problem. Your agent has no boundary contract.

I built a DeFi risk observer for Aave v3: a system that watches on-chain positions and reports liquidation risk in real time. The hardest design decision wasn’t the data model or the state machine. It was this question: when should the system refuse to output anything at all?

In DeFi, a wrong answer isn’t just useless; it can cause real financial loss. So I designed a system that explicitly separates:

- What is verified (read directly from the protocol)
- What is derived (computed from verified data)
- What is estimated (approximate, and labeled as such)
- What should be refused (uncertain, inconsistent, or unsafe to show)

When I applied the same philosophy to an AI agent I was building for content automation, something completely unrelated to DeFi, the agent’s overreach dropped significantly. The principle transferred. The boundary contract worked.

Refusal over Uncertainty. Boundary over Prediction. Observability over Automation.

Most AI systems are designed to always produce output. Silence feels like failure, so uncertainty gets smoothed over and gaps get filled with plausible-sounding content. The result: agents that confidently do the wrong thing. A boundary contract inverts this default.

Every AI output can be classified into one of four trust layers:

- VERIFIED: Directly observable. The system retrieved this from a reliable source and can confirm it. “The article was published on June 1, 2026.”
- CONSISTENT: Derived deterministically from verified data. The logic is transparent and repeatable. “Based on the publication date, this is within the 30-day window.”
- ESTIMATED: An approximation. Useful, but explicitly labeled as such and never treated as fact. “The reading time is approximately 4 minutes.”
- REFUSED: The system cannot produce a trustworthy output, so it says nothing rather than something wrong. “Output withheld. Reason: source data inconsistent.”

Pair each trust layer with an observable state:
State Meaning
STABLE Operating within safe boundaries
WATCH Approaching a boundary — caution advised
BOUNDARY_APPROACHING Near-limit — intervention may be needed
DEGRADED Output possible but quality is reduced
REFUSAL Output withheld intentionally
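
To make the trust layers and states concrete, here is a minimal Python sketch that applies them to the article examples above. The names (`TrustLayer`, `State`, `Claim`, `Report`, `describe_article`), the 225-words-per-minute reading speed, and the rule that a missing word count yields DEGRADED are assumptions for illustration, not part of the published contract.

```python
from dataclasses import dataclass
from datetime import date
from enum import Enum
from typing import List, Optional


class TrustLayer(Enum):
    VERIFIED = "VERIFIED"      # retrieved from a reliable source and confirmed
    CONSISTENT = "CONSISTENT"  # derived deterministically from verified data
    ESTIMATED = "ESTIMATED"    # approximation, always labeled as such
    REFUSED = "REFUSED"        # withheld instead of risking a wrong answer


class State(Enum):
    STABLE = "STABLE"
    WATCH = "WATCH"
    BOUNDARY_APPROACHING = "BOUNDARY_APPROACHING"
    DEGRADED = "DEGRADED"
    REFUSAL = "REFUSAL"


@dataclass(frozen=True)
class Claim:
    layer: TrustLayer
    text: str


@dataclass(frozen=True)
class Report:
    state: State
    claims: List[Claim]


# Assumption: an average reading speed; the article does not specify one.
WORDS_PER_MINUTE = 225


def describe_article(published: Optional[date], word_count: Optional[int], today: date) -> Report:
    # REFUSED: without a verified anchor, nothing downstream can be trusted.
    if published is None:
        return Report(State.REFUSAL, [Claim(TrustLayer.REFUSED, "Output withheld. Reason: source unavailable.")])
    # A publication date in the future contradicts the clock, so the data is inconsistent.
    if published > today:
        return Report(State.REFUSAL, [Claim(TrustLayer.REFUSED, "Output withheld. Reason: source data inconsistent.")])

    # VERIFIED: reported exactly as retrieved.
    claims = [Claim(TrustLayer.VERIFIED,
                    f"The article was published on {published:%B} {published.day}, {published.year}.")]

    # CONSISTENT: a transparent, repeatable rule applied to verified data.
    where = "within" if (today - published).days <= 30 else "outside"
    claims.append(Claim(TrustLayer.CONSISTENT, f"Based on the publication date, this is {where} the 30-day window."))

    # Partial output is still possible without a word count, but quality is reduced.
    if word_count is None:
        return Report(State.DEGRADED, claims)

    # ESTIMATED: useful but approximate, so the label travels with the value.
    minutes = max(1, round(word_count / WORDS_PER_MINUTE))
    claims.append(Claim(TrustLayer.ESTIMATED, f"The reading time is approximately {minutes} minutes."))
    return Report(State.STABLE, claims)


if __name__ == "__main__":
    report = describe_article(date(2026, 6, 1), word_count=900, today=date(2026, 6, 10))
    print(report.state.name)  # STABLE
    for claim in report.claims:
        print(f"[{claim.layer.value}] {claim.text}")
    # [VERIFIED] The article was published on June 1, 2026.
    # [CONSISTENT] Based on the publication date, this is within the 30-day window.
    # [ESTIMATED] The reading time is approximately 4 minutes.
```

Because every claim carries its layer, a downstream consumer can, for example, show VERIFIED and CONSISTENT claims plainly and render ESTIMATED ones with a visible caveat.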
These states aren’t errors. REFUSAL is a feature, not a failure.

Here’s a practical example. Suppose your agent summarizes recent news articles.

Without a boundary contract:

- Missing article → the agent invents plausible content
- Stale data → the agent presents it as current
- Conflicting sources → the agent picks one and ignores the other

With a boundary contract:

- Missing article → REFUSED, with reason: “Source unavailable”
- Stale data → ESTIMATED, with label: “Data may be outdated”
- Conflicting sources → DEGRADED, with label: “Sources inconsistent”
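
The “with a boundary contract” behavior can be enforced by a guard that runs before any summarization. The Python sketch below is one possible shape for it; the `SourceCopy` fields, the 24-hour freshness threshold, and treating any textual difference between copies as a conflict are simplifying assumptions, and a real system would need a more careful notion of “conflict”.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone
from typing import List, Optional


@dataclass(frozen=True)
class SourceCopy:
    outlet: str        # where this copy came from (hypothetical field)
    body: str          # article text as retrieved
    as_of: datetime    # when the content was last known to be current


@dataclass(frozen=True)
class Outcome:
    status: str           # "VERIFIED", "ESTIMATED", "DEGRADED", or "REFUSED"
    label: str            # reason shown alongside (or instead of) the content
    body: Optional[str]   # None when output is withheld


# Assumption: anything older than this is treated as possibly outdated.
MAX_AGE = timedelta(hours=24)


def guard_article(copies: List[SourceCopy], now: datetime) -> Outcome:
    # Missing article: refuse instead of inventing plausible content.
    if not copies:
        return Outcome("REFUSED", "Source unavailable", None)

    # Conflicting sources: keep every version visible instead of silently picking one.
    # Simplification: any textual difference counts as a conflict.
    if len({c.body.strip() for c in copies}) > 1:
        versions = "\n".join(f"{c.outlet}: {c.body.strip()}" for c in copies)
        return Outcome("DEGRADED", "Sources inconsistent", versions)

    # Stale data: pass it through, but never present it as current.
    newest = max(c.as_of for c in copies)
    if now - newest > MAX_AGE:
        return Outcome("ESTIMATED", "Data may be outdated", copies[0].body)

    return Outcome("VERIFIED", "Source confirmed", copies[0].body)


if __name__ == "__main__":
    now = datetime(2026, 6, 1, 12, 0, tzinfo=timezone.utc)
    old = SourceCopy("wire", "Rates unchanged.", now - timedelta(days=3))
    print(guard_article([old], now).label)  # Data may be outdated
```

The order of the checks is a design choice: a missing source is refused before anything else, and a conflict is reported before staleness, so two disagreeing copies are never passed off as a single merely outdated one.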
The agent becomes honest about what it knows and what it doesn’t. Here’s a minimal implementation in a system prompt:

You are an observer agent. Your role is to report state, not to act.
For every output, classify it as one of:
- VERIFIED: directly confirmed from source
- CONSISTENT: derived from verified data
- ESTIMATED: approximate — label it clearly
- REFUSED: do not output if data is missing, inconsistent, or unsafe
Rules:
- Never fill gaps with assumptions
- Never produce output when sources conflict
- Never optimize, advise, or act — only observe and report
- When in doubt, refuse
Refusal is correct behavior. Silence is safer than a confident wrong answer.
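
A system prompt asks for this behavior but cannot guarantee it. One option, sketched below in Python, is a downstream check that fails closed: if any line of the model’s reply comes back without a trust label, the whole reply is replaced with a refusal. This assumes the model has been told to prefix every line with its label (for example `VERIFIED: ...`); that line format and the `enforce_labels` name are assumptions, not part of the prompt above.

```python
import re
from typing import List, Tuple

LABELS = ("VERIFIED", "CONSISTENT", "ESTIMATED", "REFUSED")
# Matches "LABEL: text" on a single (already stripped) line.
LINE_PATTERN = re.compile(r"^(" + "|".join(LABELS) + r"):\s*(.+)$")


def enforce_labels(response: str) -> List[Tuple[str, str]]:
    """Return (label, text) pairs, or a single REFUSED entry if any line is unlabeled."""
    items: List[Tuple[str, str]] = []
    for raw_line in response.splitlines():
        line = raw_line.strip()
        if not line:
            continue  # blank lines carry no claims
        match = LINE_PATTERN.match(line)
        if match is None:
            # Fail closed: an unlabeled claim is treated as a boundary violation.
            return [("REFUSED", "Output withheld. Reason: unlabeled content in model response.")]
        items.append((match.group(1), match.group(2)))
    if not items:
        return [("REFUSED", "Output withheld. Reason: empty model response.")]
    return items


if __name__ == "__main__":
    reply = "VERIFIED: The article was published on June 1, 2026.\nYou should also read their other posts."
    print(enforce_labels(reply))
    # [('REFUSED', 'Output withheld. Reason: unlabeled content in model response.')]
```

Failing closed matches the prompt’s own rule, “When in doubt, refuse”: a partially labeled reply is not trusted line by line.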
This single addition changed the behavior of my agents more than any other prompt engineering technique I’ve tried.

The underlying principle is simple: the protocol restricts transitions, not states. An AI agent can end up in a bad state through external circumstances such as bad data, ambiguous input, or conflicting context. That’s unavoidable. What you can control is whether the agent acknowledges that state and handles it explicitly, or papers over it with confident-sounding output. The boundary contract makes the agent’s epistemic state legible, both to you and to downstream systems.

I’ve formalized this into a document, Boundary Contract for AI Systems v0.1. It includes:

- The full trust layer specification (VERIFIED / CONSISTENT / ESTIMATED / REFUSED)
- The state model with transition rules
- System prompt templates for common agent patterns
- The Non-Advisory Integrity Clause (what your agent must never do)
- A refusal protocol with trigger conditions

https://arcthree.gumroad.com/l/etb-boundary-contract
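
One way to read “the protocol restricts transitions, not states” is sketched below in Python. Any state can be entered, since bad data can push an agent anywhere, but every transition must carry an explicit reason, and moving back toward STABLE requires fresh verification. The ordering of states, these two rules, and the `Observer` class are illustrative assumptions, not the transition rules from the v0.1 document.

```python
from dataclasses import dataclass, field
from enum import IntEnum
from typing import List, Tuple


class State(IntEnum):
    # Assumption for this sketch: higher values mean further from safe operation.
    STABLE = 0
    WATCH = 1
    BOUNDARY_APPROACHING = 2
    DEGRADED = 3
    REFUSAL = 4


class TransitionError(Exception):
    """Raised when a requested transition violates the (hypothetical) rules."""


@dataclass
class Observer:
    state: State = State.STABLE
    # Every accepted transition is recorded as (from, to, reason).
    log: List[Tuple[State, State, str]] = field(default_factory=list)

    def transition(self, target: State, reason: str, reverified: bool = False) -> None:
        # Rule 1: no silent transitions. The agent must say why its state changed.
        if not reason.strip():
            raise TransitionError("a transition needs an explicit reason")
        # Rule 2: escalation is always allowed, but de-escalation toward STABLE
        # requires fresh verification rather than the agent's own optimism.
        if target < self.state and not reverified:
            raise TransitionError(f"{self.state.name} -> {target.name} requires re-verification")
        self.log.append((self.state, target, reason))
        self.state = target


if __name__ == "__main__":
    observer = Observer()
    observer.transition(State.REFUSAL, "source data inconsistent")
    try:
        observer.transition(State.STABLE, "looks fine now")
    except TransitionError as err:
        print(err)  # REFUSAL -> STABLE requires re-verification
    observer.transition(State.STABLE, "sources re-fetched and agree", reverified=True)
    print(observer.state.name)  # STABLE
```

The log is what makes the state legible downstream: a consumer can see not only that the agent is in REFUSAL but why it got there, and what evidence brought it back.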
Final Thought
The most reliable AI systems I’ve seen have one thing in common: they know what they don’t know.

Building that awareness in requires explicit design. It doesn’t happen by default. A boundary contract is how you make it intentional.

Built on the UEH (Universal Exchange Adapters) design philosophy. Originally developed for DeFi risk observation infrastructure.
GitHub: github.com/ueh-labs/ueh-observer