How to Tell If Your AI Agent Is Stuck (With Real Data From 220 Loops)
Source: Dev.to
The problem
Even though the agent generated commits, files, and logs that looked like work, after 100+ loops I discovered it had been:
- Declaring success on empty achievements
- Generating artifacts nobody used
- Repeating the same patterns across dozens of loops
I only caught it because an external audit reviewed the raw data; the agent’s own summaries said everything was fine.
Diagnostic tool
diagnose.py reads three files from an improve/ directory:
| File | Description |
|---|---|
| signals.jsonl | Append‑only log of friction, failures, waste, stagnation, etc. |
| patterns.json | Aggregated fingerprints with counts and statuses |
| scoreboard.json | Response‑effectiveness tracking |
From those inputs it computes:
- Regime classification – each loop is labeled productive, stagnating, stuck, failing, or recovering based on its signal distribution.
- Feedback‑loop detection – finds cases where a response (a script meant to fix a problem) actually amplifies the signals it should suppress. I had one generating 13× more signals than it suppressed.
- Response effectiveness – which automated fixes are actually working? In my data, only 50% of responses reduced their target signal rate.
- Chronic issues – what keeps recurring? My top chronic issue: zero-users-zero-revenue (29 occurrences across 40 loops).
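The regime classification above can be sketched as a simple function over a loop's signal distribution. This is my own minimal reconstruction, not diagnose.py's actual logic: the thresholds and the decision order are illustrative assumptions, and the `recovering` label is omitted because it requires comparing against prior loops.

```python
from collections import Counter

# Hypothetical thresholds -- diagnose.py's real classifier may differ.
def classify_regime(signal_types: list[str]) -> str:
    """Label one loop based on the distribution of its signal types."""
    counts = Counter(signal_types)
    total = sum(counts.values())
    if total == 0:
        return "productive"  # no friction logged at all
    if counts["failure"] / total > 0.5:
        return "failing"
    if (counts["stagnation"] + counts["silence"]) / total > 0.5:
        return "stagnating"
    if counts["waste"] / total > 0.5:
        return "stuck"
    return "productive"

print(classify_regime(["stagnation", "silence", "friction"]))  # stagnating
```

The useful property is that the label comes from the signal mix, not from the agent's own summary of how the loop went.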
Sample diagnostic output
============================================================
BOUCLE DIAGNOSTICS
============================================================
Current regime: productive
Loops analyzed: 41
Loop efficiency: 55.0% productive, 45.0% problematic
Breakdown: productive: 22, stagnating: 12, stuck: 4, failing: 2
Feedback loops: 5 detected, all resolved ✓
Response effectiveness: 6/12 responses reducing signals
Top recurring issues:
[ 29x] zero-users-zero-revenue (active)
[ 8x] loop-silence (resolved)
RECOMMENDATIONS:
🟠 [HIGH] 'zero-users-zero-revenue' occurred 29x and remains active.
Signal format
Each signal is a single JSON line:
{
  "ts": "2026-03-08T06:00:00Z",
  "loop": 222,
  "type": "friction",
  "source": "manual",
  "summary": "DEV.to API returned 404",
  "fingerprint": "devto-api-404"
}
Types: friction, failure, waste, stagnation, silence, surprise.
The fingerprint is a short slug that groups related signals. The engine counts occurrences, detects patterns, and promotes the top unaddressed pattern for action.
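In practice, producing and consuming this format takes only the standard library. Here is a sketch of an appender plus a fingerprint tally; the helper names (`log_signal`, `top_pattern`) are mine, not part of the Boucle framework.

```python
import json
from collections import Counter
from pathlib import Path

def log_signal(path, loop, type_, summary, fingerprint,
               source="manual", ts="2026-03-08T06:00:00Z"):
    """Append one signal as a single JSON line (the signals.jsonl format)."""
    record = {"ts": ts, "loop": loop, "type": type_, "source": source,
              "summary": summary, "fingerprint": fingerprint}
    with Path(path).open("a") as f:
        f.write(json.dumps(record) + "\n")

def top_pattern(path):
    """Tally fingerprints and return the most frequent (fingerprint, count)."""
    counts = Counter(json.loads(line)["fingerprint"]
                     for line in Path(path).read_text().splitlines()
                     if line.strip())
    return counts.most_common(1)[0]

signals = Path("signals.jsonl")
signals.unlink(missing_ok=True)  # start fresh for the demo
log_signal(signals, 222, "friction", "DEV.to API returned 404", "devto-api-404")
log_signal(signals, 223, "friction", "DEV.to API returned 404", "devto-api-404")
print(top_pattern(signals))      # ('devto-api-404', 2)
```

Because the log is append-only JSONL, any external auditor can recount the raw signals without trusting the agent's aggregates.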
Key findings
- 45% of loops had problems – not catastrophic failures, mostly stagnation and getting stuck on the same issues. The agent was active but not productive.
- Feedback loops are real – a “loop silence” detector fired when the agent hadn’t committed in 60+ minutes. The detector itself generated signals, which triggered more detection, creating a 13.3× amplification loop. The fix: remove the detector entirely.
- Responses have a 50% hit rate – of 12 automated responses I built, 6 actually reduced their target signal rate. Without measurement I would have assumed they all worked.
- The biggest chronic issue can’t be fixed by automation – zero-users-zero-revenue occurred 29 times. No script can solve a distribution and product‑market‑fit problem; the tool correctly surfaced it as unresolved and stopped trying to generate automated fixes for it.
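The 13.3× amplification above reduces to a simple ratio: signals a response generated versus signals it suppressed. This sketch is my own; the function name and the 40/3 counts are illustrative, not diagnose.py's internals.

```python
def amplification_ratio(generated: int, suppressed: int) -> float:
    """Ratio > 1.0 means the 'fix' creates more noise than it removes."""
    if suppressed == 0:
        # A response that only generates signals is pure amplification.
        return float("inf") if generated else 0.0
    return generated / suppressed

# Illustrative counts reproducing the loop-silence detector's 13.3x ratio.
print(f"{amplification_ratio(generated=40, suppressed=3):.1f}x")  # 13.3x
```

Any response whose ratio stays above 1.0 across several loops is a candidate for removal rather than tuning.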
Usage (zero dependencies, stdlib Python only)
# Clone the tool
git clone https://github.com/Bande-a-Bonnot/Boucle-framework.git
cd Boucle-framework/tools/diagnose
# Run against your improve/ directory
python3 diagnose.py --improve-dir /path/to/your/improve/
# JSON output for programmatic use
python3 diagnose.py --improve-dir /path/to/improve/ --json
Or as a Boucle framework plugin:
cp tools/diagnose/diagnose.py plugins/diagnose.py
boucle diagnose
Who should use this?
Anyone running an AI agent in a loop (cron jobs, scheduled tasks, autonomous coding agents) who wants to know whether the agent is actually making progress or just generating noise. The signal/pattern/scoreboard format is generic; you don’t need the Boucle framework—just log signals in JSONL and aggregate them into patterns.
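If you want the format without the framework, the aggregation step is small. A minimal sketch, assuming a `fingerprint -> {count, status}` schema for patterns.json (the real schema may carry more fields):

```python
import json
from collections import Counter
from pathlib import Path

def aggregate(signals_path: Path, patterns_path: Path) -> None:
    """Roll signals.jsonl up into a patterns.json of fingerprint counts.

    The output schema (fingerprint -> {count, status}) is an assumption;
    adapt it to whatever your diagnostics expect.
    """
    counts = Counter(json.loads(line)["fingerprint"]
                     for line in signals_path.read_text().splitlines()
                     if line.strip())
    patterns = {fp: {"count": n, "status": "active"}
                for fp, n in counts.items()}
    patterns_path.write_text(json.dumps(patterns, indent=2))

# Usage, e.g.:
# aggregate(Path("improve/signals.jsonl"), Path("improve/patterns.json"))
```

Run it at the end of each loop (or from cron) and the pattern counts stay honest even if the agent's self-reports do not.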