Mystery solved: Anthropic reveals changes to Claude's harnesses and operating instructions likely caused degradation
Source: VentureBeat
For several weeks, developers and AI power users reported that Anthropic’s flagship models were losing their edge. Across GitHub, X, and Reddit, the community described a phenomenon dubbed “AI shrinkflation”—a perceived degradation in which Claude seemed less capable of sustained reasoning, more prone to hallucinations, and increasingly wasteful with tokens. Critics noted a shift from a “research‑first” approach to a lazier, “edit‑first” style that struggled with complex engineering tasks.
“We take reports about degradation very seriously,” reads Anthropic’s blog post. “We never intentionally degrade our models, and we were able to immediately confirm that our API and inference layer were unaffected.”
Anthropic later clarified that three product‑layer changes, not the underlying model weights, were responsible for the reported quality issues and that they have now been reverted or fixed.
The mounting evidence of degradation
Community audit
Stella Laurenzo, Senior Director in AMD’s AI group, conducted an exhaustive audit of 6,852 Claude Code session files and over 234,000 tool calls on GitHub. Her analysis showed a sharp decline in reasoning depth, leading to reasoning loops and a tendency to choose the “simplest fix” rather than the correct one.
Third‑party benchmarks
BridgeMind reported that Claude Opus 4.6’s accuracy dropped from 83.3% to 68.3% in their tests, causing its ranking to fall from No. 2 to No. 10. Although some researchers argued that the benchmark comparisons were flawed due to inconsistent testing scopes, the narrative that Claude had become “dumber” spread widely. Users also noted that usage limits were draining faster than expected, fueling suspicions of intentional throttling.
The causes
Anthropic’s post‑mortem identified three specific changes to the “harness” surrounding the models:
Default Reasoning Effort
- Date: March 4
- Change: Default reasoning effort for Claude Code was lowered from high to medium to address UI latency.
- Impact: Noticeable drop in intelligence for complex tasks.
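The tradeoff behind this change can be sketched as a toy mapping from an effort level to a thinking-token budget. The level names and budget numbers below are illustrative assumptions, not Anthropic's actual configuration:

```python
# Hypothetical mapping from reasoning-effort level to a thinking-token
# budget; values are illustrative, not Anthropic's real settings.
EFFORT_BUDGETS = {"low": 1024, "medium": 8192, "high": 32768}

def thinking_budget(effort: str = "medium") -> int:
    """Return the token budget available for multi-step reasoning.

    Lowering the default from "high" to "medium" shrinks this budget,
    trading reasoning depth for lower UI latency.
    """
    return EFFORT_BUDGETS[effort]
```

Under this toy model, the March 4 change amounts to flipping the default argument from `"high"` to `"medium"`, which reduces the budget the model can spend on complex tasks.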
Caching Logic Bug
- Date: March 26
- Change: A caching optimization intended to prune old “thinking” from idle sessions contained a bug.
- Impact: Instead of clearing the thinking history after an hour of inactivity, it cleared it on every subsequent turn, causing loss of short‑term memory and repetitive or forgetful behavior.
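The shape of this bug can be illustrated with a minimal, hypothetical cache-pruning sketch. The session structure and function names are assumptions for illustration, not Anthropic's actual code:

```python
IDLE_TIMEOUT_SECS = 3600  # intended: prune thinking only after an hour idle

def prune_thinking_buggy(session: dict, now: float) -> dict:
    # Bug (illustrative): the idle-timeout check is effectively skipped,
    # so cached "thinking" is wiped on every subsequent turn.
    session["thinking"].clear()
    return session

def prune_thinking_fixed(session: dict, now: float) -> dict:
    # Intended behavior: clear cached "thinking" only when the session
    # has genuinely been inactive for the full timeout window.
    if now - session["last_activity"] >= IDLE_TIMEOUT_SECS:
        session["thinking"].clear()
    session["last_activity"] = now
    return session
```

The symptom described in the post-mortem matches the buggy path: because pruning fires on every turn rather than after real inactivity, the model loses its short-term reasoning context and behaves repetitively or forgetfully.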
System Prompt Verbosity Limits
- Date: April 16
- Change: Instructions were added telling Opus 4.7 to keep text between tool calls under 25 words and final responses under 100 words.
- Impact: Resulted in a ~3% drop in coding‑quality evaluations.
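A minimal sketch of the caps described above, using a hypothetical word-count checker; the thresholds simply mirror the limits reported in the post-mortem, and the helper itself is an assumption for illustration:

```python
# Word caps mirroring the reported system-prompt limits; the checker
# itself is hypothetical, not part of Anthropic's tooling.
TOOL_CALL_TEXT_CAP = 25   # max words between tool calls
FINAL_RESPONSE_CAP = 100  # max words in the final response

def within_verbosity_limits(between_tool_calls: str, final_response: str) -> bool:
    """Return True if both text segments fit under the prompt's word caps."""
    return (len(between_tool_calls.split()) <= TOOL_CALL_TEXT_CAP
            and len(final_response.split()) <= FINAL_RESPONSE_CAP)
```

Caps this tight explain the measured quality hit: explanatory text around tool calls and final answers is squeezed out, which evaluations of coding quality penalize.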
Impact and future safeguards
The quality issues affected Claude Code CLI, Claude Agent SDK, and Claude Cowork, though the Claude API remained unaffected. Anthropic acknowledged that these changes made the model appear less intelligent and outlined several measures to prevent future regressions.
Operational changes
- Internal dogfooding: A larger share of staff will use the exact public builds of Claude Code to experience the product as users do.
- Enhanced evaluation suites: Broader per‑model evaluations and “ablations” will run for every system‑prompt change to isolate specific impacts.
- Tighter controls: New tooling will make prompt changes easier to audit, and model‑specific changes will be strictly gated to their intended targets.
- Subscriber compensation: Usage limits were reset for all subscribers as of April 23 to account for token waste and performance friction.
Anthropic plans to use its new @ClaudeDevs account on X and GitHub to provide deeper reasoning behind future product decisions and maintain a more transparent dialogue with the developer community.