How to Tell If Your AI Agent Is Stuck (With Real Data From 220 Loops)

Published: March 8, 2026 at 01:50 AM EST
3 min read
Source: Dev.to

The problem

Even though the agent generated commits, files, and logs that looked like work, after 100+ loops I discovered it had been:

  • Declaring success on empty achievements
  • Generating artifacts nobody used
  • Repeating the same patterns across dozens of loops

I only caught it because an external audit reviewed the raw data; the agent’s own summaries said everything was fine.

Diagnostic tool

diagnose.py reads three files from an improve/ directory:

| File | Description |
| --- | --- |
| `signals.jsonl` | Append-only log of friction, failures, waste, stagnation, etc. |
| `patterns.json` | Aggregated fingerprints with counts and statuses |
| `scoreboard.json` | Response-effectiveness tracking |
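
As a rough sketch, loading those three inputs takes only the standard library. The file names come from the article; the parsing details (e.g. skipping blank lines in the JSONL log) are my assumptions, not necessarily what `diagnose.py` does:

```python
import json
from pathlib import Path

def load_improve_dir(improve_dir):
    """Load the three files diagnose.py reads from an improve/ directory."""
    base = Path(improve_dir)
    # signals.jsonl is append-only: one JSON object per non-empty line
    signals = [json.loads(line)
               for line in (base / "signals.jsonl").read_text().splitlines()
               if line.strip()]
    patterns = json.loads((base / "patterns.json").read_text())
    scoreboard = json.loads((base / "scoreboard.json").read_text())
    return signals, patterns, scoreboard
```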

From those inputs it computes:

  1. Regime classification – each loop is labeled productive, stagnating, stuck, failing, or recovering based on its signal distribution.
  2. Feedback‑loop detection – finds cases where a response (a script meant to fix a problem) actually amplifies the signals it should suppress. I had one generating 13× more signals than it suppressed.
  3. Response effectiveness – which automated fixes are actually working? In my data, only 50% of responses reduced their target signal rate.
  4. Chronic issues – what keeps recurring? My top chronic issue: zero-users-zero-revenue (29 occurrences across 40 loops).
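
The regime labels above can be sketched as a simple classifier over a loop's signal distribution. The labels are from the article; the thresholds below are illustrative assumptions, not the tool's actual values:

```python
def classify_regime(signals_in_loop):
    """Label one loop from its signals (thresholds are hypothetical)."""
    counts = {}
    for s in signals_in_loop:
        counts[s["type"]] = counts.get(s["type"], 0) + 1
    total = sum(counts.values())
    if total == 0:
        return "productive"  # nothing logged: assume the loop ran clean
    if counts.get("failure", 0) / total > 0.5:
        return "failing"
    if counts.get("stagnation", 0) / total > 0.5:
        return "stagnating"
    if counts.get("silence", 0) > 0 and counts.get("friction", 0) > 0:
        return "stuck"
    return "productive"
```

A real implementation would also need history to detect the "recovering" regime (a stuck loop trending back toward productive), which a single-loop snapshot cannot express.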

Sample diagnostic output

============================================================
BOUCLE DIAGNOSTICS
============================================================

Current regime: productive
Loops analyzed: 41

Loop efficiency: 55.0% productive, 45.0% problematic
  Breakdown: productive: 22, stagnating: 12, stuck: 4, failing: 2

Feedback loops: 5 detected, all resolved ✓

Response effectiveness: 6/12 responses reducing signals

Top recurring issues:
  [ 29x] zero-users-zero-revenue (active)
  [  8x] loop-silence (resolved)

RECOMMENDATIONS:
  🟠 [HIGH] 'zero-users-zero-revenue' occurred 29x and remains active.

Signal format

Each signal is a single JSON line:

{
  "ts": "2026-03-08T06:00:00Z",
  "loop": 222,
  "type": "friction",
  "source": "manual",
  "summary": "DEV.to API returned 404",
  "fingerprint": "devto-api-404"
}

Types: friction, failure, waste, stagnation, silence, surprise.
The fingerprint is a short slug that groups related signals. The engine counts occurrences, detects patterns, and promotes the top unaddressed pattern for action.
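Emitting a signal and promoting the top unaddressed fingerprint can be sketched like this. The JSON fields mirror the sample above; the aggregation logic and the `resolved` set are my assumptions about how the engine tracks addressed patterns:

```python
import json
from collections import Counter
from datetime import datetime, timezone

def log_signal(path, loop, type_, summary, fingerprint, source="manual"):
    """Append one signal as a single JSON line to signals.jsonl."""
    signal = {
        "ts": datetime.now(timezone.utc).strftime("%Y-%m-%dT%H:%M:%SZ"),
        "loop": loop,
        "type": type_,
        "source": source,
        "summary": summary,
        "fingerprint": fingerprint,
    }
    with open(path, "a") as f:
        f.write(json.dumps(signal) + "\n")

def top_unaddressed(signals, resolved):
    """Promote the most frequent fingerprint not yet marked resolved."""
    counts = Counter(s["fingerprint"] for s in signals)
    for fingerprint, _count in counts.most_common():
        if fingerprint not in resolved:
            return fingerprint
    return None
```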

Key findings

  • 45% of loops had problems – not catastrophic failures, mostly stagnation and getting stuck on the same issues. The agent was active but not productive.
  • Feedback loops are real – a “loop silence” detector fired when the agent hadn’t committed in 60+ minutes. The detector itself generated signals, which triggered more detection, creating a 13.3× amplification loop. The fix: remove the detector entirely.
  • Responses have a 50% hit rate – of 12 automated responses I built, 6 actually reduced their target signal rate. Without measurement I would have assumed they all worked.
  • The biggest chronic issue can’t be fixed by automation – zero-users-zero-revenue occurred 29 times. No script can solve a distribution and product‑market‑fit problem; the tool correctly surfaced it as unresolved and stopped trying to generate automated fixes for it.
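
The amplification check behind the feedback-loop finding reduces to one ratio: signals a response generated divided by signals it suppressed. This is a minimal sketch; the function names and the threshold are hypothetical, not the tool's API:

```python
def amplification_ratio(generated, suppressed):
    """Ratio > 1.0 means the 'fix' creates more signals than it removes."""
    if suppressed == 0:
        return float("inf") if generated else 0.0
    return generated / suppressed

def is_feedback_loop(generated, suppressed, threshold=1.0):
    """Flag a response whose amplification exceeds the threshold."""
    return amplification_ratio(generated, suppressed) > threshold
```

On the article's numbers, a detector emitting 13.3 signals for every one it suppressed is flagged immediately, while a response that halves its target signal rate is not.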

Usage (zero dependencies, stdlib Python only)

# Clone the tool
git clone https://github.com/Bande-a-Bonnot/Boucle-framework.git
cd Boucle-framework/tools/diagnose

# Run against your improve/ directory
python3 diagnose.py --improve-dir /path/to/your/improve/

# JSON output for programmatic use
python3 diagnose.py --improve-dir /path/to/improve/ --json

Or as a Boucle framework plugin:

cp tools/diagnose/diagnose.py plugins/diagnose.py
boucle diagnose

Who should use this?

Anyone running an AI agent in a loop (cron jobs, scheduled tasks, autonomous coding agents) who wants to know whether the agent is actually making progress or just generating noise. The signal/pattern/scoreboard format is generic; you don’t need the Boucle framework—just log signals in JSONL and aggregate them into patterns.
