“I've failed 17 consecutive times with the exact same error”

Published: December 9, 2025 at 11:57 AM EST
5 min read
Source: Dev.to

The Problem

That was the verbatim response from our agent as it attempted to execute a coding task. It was a little more verbose in its self‑scolding:

I’ve failed 17 consecutive times with the exact same error. I keep calling replace_file with only the file_path parameter and never include the content parameter.

After 17 consecutive failures, I needed to break the pattern:

  • I wasn’t actually constructing the CSS content in the tool call.
  • I kept saying I would, but then didn’t.
  • I needed to actually write the complete CSS inside the content parameter.

In my next response, I must write the complete tool call with both parameters, including the actual CSS code in content. I cannot make this mistake an 18th time.

Spoiler alert: it would repeat the mistake an 18th time. Even though it knew exactly what it was doing wrong. The strangest part wasn’t the failure; it was the self‑awareness.

while (true) { /* … */ }

Context: Tonkotsu Multi‑Agent Environment

Tonkotsu is a multi‑agent management environment — the successor to IDEs. A core part of it is a coding agent that executes engineering tasks in parallel without micromanagement. The coding agent uses an LLM (mostly Claude Sonnet) and a set of coding tools focused on reading from and writing to a git repo. The LLM receives a task specification and then calls tools repeatedly (to read relevant parts of the repo, make code edits, and run validation tools) until the task is accomplished. This is a fairly standard coding‑agent architecture.
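
For reference, the loop looks roughly like the sketch below. It assumes the Anthropic Python SDK; the model id, message limit, and run_tool dispatcher are illustrative placeholders, not Tonkotsu's actual implementation.

import anthropic

client = anthropic.Anthropic()

def run_agent(task: str, tools: list, max_messages: int = 50) -> list:
    """Drive the LLM until it stops requesting tools or hits the message limit."""
    messages = [{"role": "user", "content": task}]
    while len(messages) < max_messages:
        response = client.messages.create(
            model="claude-sonnet-4-20250514",  # placeholder model id
            max_tokens=8192,
            tools=tools,
            messages=messages,
        )
        messages.append({"role": "assistant", "content": response.content})
        if response.stop_reason != "tool_use":
            return messages  # model produced a final answer instead of a tool call
        # Execute each requested tool and feed the results back as a user turn.
        results = [
            {"type": "tool_result", "tool_use_id": block.id,
             "content": run_tool(block.name, block.input)}  # run_tool: hypothetical dispatcher
            for block in response.content if block.type == "tool_use"
        ]
        messages.append({"role": "user", "content": results})
    raise RuntimeError("exceeded maximum number of messages")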

We track task failures in a daily review to ensure agent reliability and generated code quality meet high standards. Starting in September, we observed that a large percentage of failures were due to the LLM session exceeding a limit on the maximum number of messages. Inspection revealed that the LLM had fallen into an infinite loop of calling a tool unsuccessfully, then calling the same tool in the same erroneous way over and over (often 30–40 times) until the limit was hit.
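
In hindsight, a session-level guard would have failed these tasks much sooner than the message limit did. A minimal sketch (the threshold of 5 is arbitrary; the mitigations we actually tried follow below):

import json
from collections import Counter

class RepeatGuard:
    """Abort a session when an identical failing tool call keeps recurring."""

    def __init__(self, threshold: int = 5):
        self.failures = Counter()
        self.threshold = threshold

    def record_failure(self, tool_name: str, tool_input: dict) -> None:
        # Key on the exact (tool, arguments) pair so only true repeats count.
        key = (tool_name, json.dumps(tool_input, sort_keys=True))
        self.failures[key] += 1
        if self.failures[key] >= self.threshold:
            raise RuntimeError(
                f"{tool_name} failed {self.failures[key]} times with identical "
                "arguments; aborting instead of looping to the message limit"
            )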

The replace_file Tool

The replace_file tool allows the LLM to overwrite an existing file (or create a new one) at file_path with text provided in content. Both parameters are required.

{
  "name": "replace_file",
  "description": "Write a file to the local filesystem. Overwrites the existing file if there is one.",
  "input_schema": {
    "type": "object",
    "properties": {
      "file_path": {
        "type": "string",
        "description": "Path to the file to replace or create"
      },
      "content": {
        "type": "string",
        "description": "New content for the file"
      }
    },
    "required": ["file_path", "content"]
  }
}

In the failing tasks, the LLM repeatedly called replace_file with a valid file_path but no content at all. Once it made a bad call, it spiraled into an infinite loop, calling replace_file over and over in exactly the same way and never specifying content.
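
Catching the malformed call on our side is straightforward. A sketch of the validation, assuming the jsonschema package and the input_schema shown above:

from jsonschema import Draft202012Validator

# The input_schema from the tool definition above.
INPUT_SCHEMA = {
    "type": "object",
    "properties": {
        "file_path": {"type": "string"},
        "content": {"type": "string"},
    },
    "required": ["file_path", "content"],
}

def validate_tool_input(tool_input: dict) -> list[str]:
    """Return one human-readable message per schema violation."""
    validator = Draft202012Validator(INPUT_SCHEMA)
    return [err.message for err in validator.iter_errors(tool_input)]

# The failing pattern: a valid file_path, no content at all.
print(validate_tool_input({"file_path": "styles/styles.css"}))
# -> ["'content' is a required property"]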

break;

Initial Mitigation Attempts

Verbose Error Messages

When a bad tool call was received, we returned a more verbose error message explicitly naming the missing parameter and instructing the model to think about its value before trying again. This had no observable effect—our first hint that this wasn’t a run‑of‑the‑mill mistake.
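
The shape of that feedback is sketched below; the exact production wording differed. The tool_result block with is_error is standard Anthropic tool-use plumbing:

def verbose_error(tool_use_id: str, tool_name: str, missing: list[str]) -> dict:
    """Build a tool_result block that names each missing parameter explicitly."""
    message = (
        f"ERROR: your call to {tool_name} omitted the required parameter(s): "
        f"{', '.join(missing)}. Before retrying, think about what the value of "
        "each missing parameter should be, then include it in your next call."
    )
    return {
        "type": "tool_result",
        "tool_use_id": tool_use_id,  # id of the failed tool_use block
        "content": message,
        "is_error": True,            # flags the result as an error to the model
    }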

Disabling Tool Calls

We tried a stronger intervention: after a bad tool call, we disabled tool calling for the next LLM turn. We sent a user message stating that tool calling was disabled, that the function call was missing a parameter, and that the model should reflect on what the content of that parameter should be. The model responded with an assistant text message (no tool call) describing its thinking, after which we re‑enabled tool calls.

Even this more invasive approach didn’t work. The model could accurately describe the problem and the fix, yet on the next tool‑call‑enabled turn it immediately repeated the malformed call.
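
Mechanically, a reflection turn like this can be forced by withholding tool use for a single request. A sketch, assuming the same SDK as above, that the failed call has already been answered with an error tool_result, and that tool_choice supports the "none" option (which keeps the tool definitions visible while preventing calls):

REFLECT = (
    "Tool calling is disabled for this turn. Your last replace_file call was "
    "missing the required content parameter. Reflect, in plain text, on what "
    "the content of that parameter should be before you try again."
)

def reflection_turn(client, tools: list, messages: list) -> str:
    """One text-only turn: the model may reason about the fix but cannot act."""
    messages.append({"role": "user", "content": REFLECT})
    response = client.messages.create(
        model="claude-sonnet-4-20250514",   # placeholder model id
        max_tokens=2048,
        tools=tools,
        tool_choice={"type": "none"},       # tools stay in context but uncallable
        messages=messages,
    )
    messages.append({"role": "assistant", "content": response.content})
    return response.content[0].text         # the model's self-diagnosis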

Observations About the Model’s Internal Representation

At one point the model started describing its internal implementation as XML‑like. It said it kept producing only:

  <file_path>styles/styles.css</file_path>

But it must produce:

  <file_path>styles/styles.css</file_path>
  <content>THE ACTUAL CSS CODE HERE</content>

We realized the model had latched onto a failing tool‑call pattern and kept sampling the same sequence, falling into a “gravity well” that prevented it from correcting the call or devising an alternative strategy.

A New Intervention: JSON Template Prompt

Unsure how to proceed, we consulted the Anthropic team. They suggested providing the LLM with an exact JSON template for the function call during its reflection turn (when tool calls are disabled). We added the following static prompt to the reflection instruction:

Generate the following JSON object to represent the correct tool call with real parameter values for replace_file. Conform to exactly this JSON structure:

{
  "type": "tool_use",
  "name": "replace_file",
  "input": {
    "file_path": "<actual file path>",
    "content": "<actual file content>"
  }
}
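
Wiring this in is simple: the template is appended to the reflection instruction, and the model's answer is parsed back into an executable call. A sketch, assuming the model returns bare JSON with no surrounding prose:

import json

def recover_tool_call(reflection_text: str) -> dict:
    """Parse the templated reflection answer back into real tool arguments."""
    call = json.loads(reflection_text)  # assumes bare JSON, no markdown fences
    if call.get("name") != "replace_file" or "content" not in call.get("input", {}):
        raise ValueError("reflection did not produce a complete replace_file call")
    return call["input"]  # hand this straight to the real replace_file tool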

Result

Shockingly, this simple tweak yielded significant improvements. The model still occasionally generates incorrect tool calls, but it is now able to recover rather than spiral into an infinite loop—a much better outcome. The explicit JSON structure helped the model climb out of the gravity well of the tool‑call loop.

Recent Developments

Anthropic has released strict tool use (see the documentation), which guarantees that tool calls conform to the declared JSON schema. We are currently experimenting with this feature as well.

Takeaways

What we observed feels familiar to anyone who has managed engineering teams:

  • Repeating the same unproductive action despite increasingly explicit feedback.
  • Being generally reasonable but stubborn on a single issue.
  • Being able to verbalize a solution but unable to execute it.

Humans do this, and so do LLMs. Our bet is that the future isn’t about perfect coworkers (agent or human); it’s about effectively coordinating all of them to solve large problems in parallel.


If you’re interested in more write‑ups on building with multi‑agent LLM workflows, you can follow my posts here → blog.tonkotsu.ai.
