Harness bugs, not model bugs
Source: Dev.to

What actually broke
-
Default reasoning effort got dialled down.
A UX fix dropped Claude Code’s default from “high” to “medium” in early March. Users noticed it felt dumber. The change was reverted on April 7. -
A caching optimisation dropped prior reasoning every turn.
The optimisation was supposed to clear stale thinking once per idle session, but a bug caused it to fire on every turn. Claude kept executing without memory of why, surfacing as forgetfulness and repetition. Fixed on April 10. -
A verbosity system prompt hurt coding quality.
The prompt“Keep responses under 100 words.”caused a 3 % regression on code quality that internal evals missed. It was caught by broader ablations and reverted on April 20.
None of these incidents involved a model change; the weights didn’t move and the API was never in scope.
The lesson
-
The model is not the product.
What users experience ismodel + harness + system prompt + tool wiring + context management + caching. Each layer can have its own bugs. When someone says “Claude got worse,” the weights are usually the last thing that changed. -
API‑layer products were unaffected.
If you’re building directly against the Messages API, none of these bugs touched you. This is why “am I on Claude Code, or am I on the raw API?” matters. -
“Eval passed” ≠ “no regression.”
The verbosity prompt passed Anthropic’s initial evals, but a broader ablation—removing lines one at a time—caught the 3 % drop. Fixed eval suites can miss behavioural drift; ablations catch it.
What to actually do
-
On Claude Code?
Update tov2.1.116+. You’re already through it. Usage limits were reset as an apology. -
On the API directly?
Nothing to do. Stay on whatever model you were using. -
Shipping your own harness on top of a frontier model?
Read the postmortem twice, then audit your prompt + caching + context‑management pipeline for the same silent‑failure modes. The bugs Anthropic described are exactly the ones every harness reinvents.
The meta‑lesson is boring and important: most of the quality variance lives between “the model” and “the thing your user sees.” Ship good harnesses.
✏️ Drafted with KewBot (AI), edited and approved by Drew.