Mystery solved: Anthropic reveals changes to Claude's harnesses and operating instructions likely caused degradation
Source: VentureBeat
For several weeks, developers and AI power users reported that Anthropic’s flagship models were losing their edge. Across GitHub, X, and Reddit, the community described a phenomenon dubbed “AI shrinkflation”—a perceived degradation in which Claude seemed less capable of sustained reasoning, more prone to hallucinations, and increasingly wasteful with tokens. Critics noted a shift from a “research‑first” approach to a lazier, “edit‑first” style that struggled with complex engineering tasks.
“We take reports about degradation very seriously,” reads Anthropic’s blog post. “We never intentionally degrade our models, and we were able to immediately confirm that our API and inference layer were unaffected.”
Anthropic later clarified that three product‑layer changes, not the underlying model weights, were responsible for the reported quality issues and that they have now been reverted or fixed.
The mounting evidence of degradation
Community audit
Stella Laurenzo, Senior Director in AMD’s AI group, conducted an exhaustive audit of 6,852 Claude Code session files and over 234,000 tool calls on GitHub. Her analysis showed a sharp decline in reasoning depth, leading to reasoning loops and a tendency to choose the “simplest fix” rather than the correct one.
Third‑party benchmarks
BridgeMind reported that Claude Opus 4.6’s accuracy dropped from 83.3% to 68.3% in their tests, causing its ranking to fall from No. 2 to No. 10. Although some researchers argued that the benchmark comparisons were flawed due to inconsistent testing scopes, the narrative that Claude had become “dumber” spread widely. Users also noted that usage limits were draining faster than expected, fueling suspicions of intentional throttling.
The causes
Anthropic’s post‑mortem identified three specific changes to the “harness” surrounding the models:
Default Reasoning Effort
- Date: March 4
- Change: Default reasoning effort for Claude Code was lowered from high to medium to address UI latency.
- Impact: Noticeable drop in intelligence for complex tasks.
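The tradeoff behind this change can be sketched as a toy mapping from an effort level to a thinking-token budget. The level names and budget numbers below are illustrative assumptions, not Anthropic's actual configuration:

```python
# Hypothetical mapping from reasoning-effort level to a thinking-token
# budget; values are illustrative, not Anthropic's real settings.
EFFORT_BUDGETS = {"low": 1024, "medium": 8192, "high": 32768}

def thinking_budget(effort: str = "medium") -> int:
    """Return the token budget available for multi-step reasoning.

    Lowering the default from "high" to "medium" shrinks this budget,
    trading reasoning depth for lower UI latency.
    """
    return EFFORT_BUDGETS[effort]
```

Under this toy model, the March 4 change amounts to flipping the default argument from `"high"` to `"medium"`, which reduces the budget the model can spend on complex tasks.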
Caching Logic Bug
- Date: March 26
- Change: A caching optimization intended to prune old “thinking” from idle sessions contained a bug.
- Impact: Instead of clearing the thinking history after an hour of inactivity, it cleared it on every subsequent turn, causing loss of short‑term memory and repetitive or forgetful behavior.
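The shape of this bug can be illustrated with a minimal, hypothetical cache-pruning sketch. The session structure and function names are assumptions for illustration, not Anthropic's actual code:

```python
IDLE_TIMEOUT_SECS = 3600  # intended: prune thinking only after an hour idle

def prune_thinking_buggy(session: dict, now: float) -> dict:
    # Bug (illustrative): the idle-timeout check is effectively skipped,
    # so cached "thinking" is wiped on every subsequent turn.
    session["thinking"].clear()
    return session

def prune_thinking_fixed(session: dict, now: float) -> dict:
    # Intended behavior: clear cached "thinking" only when the session
    # has genuinely been inactive for the full timeout window.
    if now - session["last_activity"] >= IDLE_TIMEOUT_SECS:
        session["thinking"].clear()
    session["last_activity"] = now
    return session
```

The symptom described in the post-mortem matches the buggy path: because pruning fires on every turn rather than after real inactivity, the model loses its short-term reasoning context and behaves repetitively or forgetfully.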
System Prompt Verbosity Limits
- Date: April 16
- Change: Instructions were added telling Opus 4.7 to keep text between tool calls under 25 words and final responses under 100 words.
- Impact: Resulted in a ~3% drop in coding‑quality evaluations.
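A minimal sketch of the caps described above, using a hypothetical word-count checker; the thresholds simply mirror the limits reported in the post-mortem, and the helper itself is an assumption for illustration:

```python
# Word caps mirroring the reported system-prompt limits; the checker
# itself is hypothetical, not part of Anthropic's tooling.
TOOL_CALL_TEXT_CAP = 25   # max words between tool calls
FINAL_RESPONSE_CAP = 100  # max words in the final response

def within_verbosity_limits(between_tool_calls: str, final_response: str) -> bool:
    """Return True if both text segments fit under the prompt's word caps."""
    return (len(between_tool_calls.split()) <= TOOL_CALL_TEXT_CAP
            and len(final_response.split()) <= FINAL_RESPONSE_CAP)
```

Caps this tight explain the measured quality hit: explanatory text around tool calls and final answers is squeezed out, which evaluations of coding quality penalize.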
Impact and future safeguards
The quality issues affected Claude Code CLI, Claude Agent SDK, and Claude Cowork, though the Claude API remained unaffected. Anthropic acknowledged that these changes made the model appear less intelligent and outlined several measures to prevent future regressions.
Operational changes
- Internal dogfooding: A larger share of staff will use the exact public builds of Claude Code to experience the product as users do.
- Enhanced evaluation suites: Broader per‑model evaluations and “ablations” will run for every system‑prompt change to isolate specific impacts.
- Tighter controls: New tooling will make prompt changes easier to audit, and model‑specific changes will be strictly gated to their intended targets.
- Subscriber compensation: Usage limits were reset for all subscribers as of April 23 to account for token waste and performance friction.
Anthropic plans to use its new @ClaudeDevs account on X and GitHub to provide deeper reasoning behind future product decisions and maintain a more transparent dialogue with the developer community.