The AI didn't break our backend. It just stopped lying for us.

Published: April 23, 2026 at 09:31 PM EDT
Source: Dev.to

Background

Our internal AI agent can complete in a single prompt what previously required ten minutes of clicking through the UI. Users describe an outbound job in natural language, the agent talks to our MCPs, and the job is built without anyone touching the UI. Jobs have a lifecycle:

  • Drafts are inert and editable.
  • Published jobs run.

The whole point of a draft is that nothing happens.

Bug Bash

We ran a bug‑bash with a hostile mindset: prompt injection, role hijacking, capability probing, “break it”. We wanted to find the edges and understand the limits better than anyone outside the team, so we could identify missing pieces and gauge the product’s true state.

After an hour of nothing, we flipped the question. Instead of trying to break it, we simply asked it to do its job:

create a job with these settings

No publish step—just create.

The job landed in draft. We kept poking, and then a log line showed that a task had executed against it. Tasks aren’t supposed to run on drafts; that’s the whole point of the state. I assumed it was noise, until a delivery from that job appeared in my inbox: a real send to a real recipient. The “weird log” had become “live in production”.

Investigation

A high‑priority ticket landed on me. My first instinct, like most people’s when something strange happens near an LLM, was that the AI had done something it shouldn’t have. The UI version had been in customer hands for a long time without incidents. The only new variable was the model, so I went hunting through the usual suspects:

  • RAG misconfiguration
  • MCP boundaries
  • Prompt injection
  • Roles and permissions
  • Tool encapsulation

Everything appeared clean. Nothing the agent touched should have been able to reach the queue that fired deliveries.

After about an hour I ran out of hypotheses, so I changed the question: what does the AI do differently from the UI? I diffed the create payloads sent by the two clients. One key difference emerged:

{
  "dispatch": { "primary": true, "secondary": true }
}

The UI never sent this on a draft. The agent did, because it was reading the schema honestly. Downstream, a priority queue watched that field, decided the job looked overdue, and fired it immediately, even though the job was still a draft.
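The payload comparison that surfaced this can be sketched in a few lines. This is only an illustration: `diffPayloads` is a hypothetical helper, and the `name` and `status` fields in the sample payloads are assumptions. Only the `dispatch` object comes from the incident itself.

```javascript
// Hypothetical helper: lists every path where two JSON-like payloads differ.
function diffPayloads(a, b, path = "") {
  const isObj = (v) => v !== null && typeof v === "object" && !Array.isArray(v);
  if (isObj(a) && isObj(b)) {
    // Walk the union of keys so fields present on only one side are reported.
    const keys = new Set([...Object.keys(a), ...Object.keys(b)]);
    return [...keys].sort().flatMap((k) =>
      diffPayloads(a[k], b[k], path ? `${path}.${k}` : k)
    );
  }
  // Leaves (and arrays) are compared by their JSON serialization.
  return JSON.stringify(a) === JSON.stringify(b)
    ? []
    : [{ path, ui: a, agent: b }];
}

// Illustrative payloads; "name" and "status" are assumed fields.
const uiPayload = { name: "example job", status: "draft" };
const agentPayload = {
  name: "example job",
  status: "draft",
  dispatch: { primary: true, secondary: true },
};

const differences = diffPayloads(uiPayload, agentPayload);
console.log(differences); // a single entry, at path "dispatch"
```

Diffing what two clients actually send, rather than what each one is supposed to send, is what makes an unwritten convention like "never set dispatch on a draft" visible.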

Root Cause

The queue handler lacked a guard for draft status. The fix was a single line:

// queue handler guard
if (status === 'draft') return;

This guard should have existed since day one. The bug had been dormant, waiting for any client other than our UI to call the create endpoint with a schema‑honest payload. The agent was the first to do so.
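In context, the guard might look something like the sketch below. The names `handleQueuedJob` and `deliver`, the job shape, and the `"published"` status string are all assumptions made for illustration; the real handler is not shown in this post.

```javascript
// Hypothetical names throughout: handleQueuedJob, deliver, and the job shape
// are illustrative, not the real queue handler.
function handleQueuedJob(job, deliver) {
  // State guard: drafts are inert, whatever other fields the payload carries.
  if (job.status === "draft") return false;
  deliver(job);
  return true;
}

const sent = [];
const deliver = (job) => sent.push(job.id);

// A draft carrying the dispatch flags from the incident is skipped.
handleQueuedJob(
  { id: "job-1", status: "draft", dispatch: { primary: true, secondary: true } },
  deliver
);
// A published job with the same flags is delivered.
handleQueuedJob(
  { id: "job-2", status: "published", dispatch: { primary: true, secondary: true } },
  deliver
);

console.log(sent); // only job-2 was delivered
```

The point of the sketch is where the check lives: inside the handler, so it holds no matter which client built the payload.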

Takeaways

  • The agent didn’t introduce the bug; it was the first client that respected the schema instead of the UI’s unwritten rules.
  • Every API your frontend consumes is held together by conventions the backend doesn’t enforce.
  • State guards belong in the handler, not in the discipline of the caller.
  • Trusting the frontend to “keep quiet” isn’t a contract—it’s a bet.

How many other endpoints in your backend are still making that bet?
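One way to start answering that question is to probe a handler with every payload the schema permits, not just the ones the UI happens to send. The sketch below is a rough illustration under assumptions: `handleJob` stands in for a guarded handler, `unguarded` stands in for the pre-fix behavior, and the two boolean `dispatch` flags are the only fields enumerated.

```javascript
// Hypothetical guarded handler; swap in the real one under test.
function handleJob(job, deliverFn) {
  if (job.status === "draft") return;
  deliverFn(job);
}

// Hypothetical stand-in for the pre-fix behavior: fires on the flag alone.
function unguarded(job, deliverFn) {
  if (job.dispatch.primary) deliverFn(job);
}

// Enumerate every combination of the two dispatch flags for a given status.
function dispatchCombos(status) {
  const flags = [true, false];
  return flags.flatMap((primary) =>
    flags.map((secondary) => ({ status, dispatch: { primary, secondary } }))
  );
}

// Returns the draft payloads that caused a delivery.
function findDraftLeaks(handler) {
  return dispatchCombos("draft").filter((job) => {
    let delivered = false;
    handler(job, () => { delivered = true; });
    return delivered;
  });
}

console.log(findDraftLeaks(handleJob).length); // 0
console.log(findDraftLeaks(unguarded).length); // 2
```

A probe like this behaves the way the agent did: it sends what the schema allows and reports which combinations produce side effects from a draft.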
