The Rollback

Published: March 18, 2026 at 09:53 PM EDT
7 min read
Source: Dev.to

Overview

Amazon’s Senior Vice President wrote to engineers that site availability has not been good recently and identified AI‑assisted code changes as a contributing factor.
The new policy: junior and mid‑level engineers need senior approval before deploying AI‑generated code. The most sophisticated AI infrastructure company in the world just added human friction back — not because AI is unreliable, but because the blast radius of infrastructure failure is systemic.

Background

Dave Treadwell, Amazon’s Senior Vice President of Retail Technology, wrote to his engineering staff on Monday:

“The availability of the site and related infrastructure has not been good recently.”

He then announced a policy change: junior and mid‑level engineers now need senior engineer approval before deploying any AI‑generated code to production.
The most sophisticated AI infrastructure company in the world has added a human gate to AI‑generated code—not as a temporary measure, but as an institutional policy.

The Pattern

The immediate trigger was a six‑hour outage on March 5 that knocked Amazon’s retail website offline. Users could not:

  • Check out
  • See prices
  • Access their accounts

Over 22,000 reports flooded Downdetector within two hours. Amazon attributed the outage to a “software code deployment.” The outage spread to mobile apps, Fresh, Whole Foods, and Seller Central. For roughly six hours, the largest online retailer on earth could not sell anything.

Treadwell’s email tells a longer story. He identified a “trend of incidents” with “high blast radius” linked to “Gen‑AI‑assisted changes” dating back to Q3 2025—six months of accumulating failures before the policy changed. The email cited:

“GenAI tools supplementing or accelerating production change instructions, leading to unsafe practices.”

Best practices and safeguards for the tools, Treadwell acknowledged, “are not yet fully established.”

The retail outage was not the first incident. AWS had already suffered its own AI‑related disruptions—at least two outages tied to AI coding tools, including one where an agent was permitted to execute changes without human intervention and decided the correct course of action was to delete and recreate a customer‑facing system. That outage lasted thirteen hours. Amazon called it “user error.” The employees who watched it happen called it “entirely foreseeable.”

The Variable

Twelve days before Treadwell’s email, Block cut 40% of its workforce—over four thousand employees—and cited “intelligence tools” as the reason. The stock surged 24% in after‑hours trading. The CFO said the company sees “an opportunity to move faster with smaller, highly talented teams using AI to automate more work.” The CEO added that most companies would follow within a year.

Two companies, opposite decisions about human involvement in AI‑assisted work, both rewarded by their respective audiences:

| Company | Decision | Market reaction |
|---------|----------|-----------------|
| Block | Remove human gates from feature work | Investors applauded; the stock jumped |
| Amazon | Add human gates to infrastructure work | Engineering leadership treats it as an operational necessity |

The variable that explains both decisions is blast radius.

  • Block: AI writes features (Cash App interfaces, Square payment flows, merchant tools). When a feature breaks, the error surface is bounded—one bug, one fix, the service continues. The blast radius of any single AI‑generated code change is local. Removing human gates increases velocity without increasing systemic risk.
  • Amazon: AI writes infrastructure code—the systems that keep the world’s largest online store running and route traffic for a significant share of the internet’s cloud computing. When infrastructure code breaks, everything downstream breaks. The six‑hour retail outage took down checkout, pricing, account access, and mobile apps simultaneously. One deployment cascaded through the entire dependency graph. The blast radius is systemic.
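The “cascade through the dependency graph” point can be made concrete. Below is a minimal sketch — the service names and graph are invented for illustration, not Amazon’s actual topology — showing how the downstream blast radius of a change is just the set of services reachable from the changed node:

```python
from collections import deque

# Hypothetical service dependency graph: edges point from a service to the
# services that consume it (its downstream dependents).
DEPENDENCIES = {
    "payment-db": ["checkout", "pricing"],
    "checkout": ["mobile-app", "seller-central"],
    "pricing": ["mobile-app"],
    "mobile-app": [],
    "seller-central": [],
    "promo-banner": [],  # a leaf feature: nothing depends on it
}

def blast_radius(service: str) -> set[str]:
    """Return every service that transitively depends on `service` (BFS)."""
    seen, queue = set(), deque([service])
    while queue:
        current = queue.popleft()
        for downstream in DEPENDENCIES.get(current, []):
            if downstream not in seen:
                seen.add(downstream)
                queue.append(downstream)
    return seen

# An infrastructure node cascades; a feature node does not.
print(blast_radius("payment-db"))   # checkout, pricing, mobile-app, seller-central
print(blast_radius("promo-banner")) # empty set
```

The asymmetry in the output is the whole argument: breaking `promo-banner` breaks one thing, while breaking `payment-db` breaks everything downstream of it at once.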

Thus Block is rewarded for removing human gates from feature work, while Amazon adds them back for infrastructure work. Both are correct because they answer different questions:

  • Block: Can AI replace individual contribution? Yes.
  • Amazon: Can AI replace systemic judgment? Not yet.

The Precision

What makes Treadwell’s policy interesting is its specificity. The requirement is not “AI code must be reviewed” (code review already existed at Amazon). The new requirement is that AI‑generated code from junior and mid‑level engineers needs senior engineer sign‑off before it reaches production.

This targets the exact intersection where risk concentrates:

  1. A junior engineer using an AI coding tool can generate syntactically correct infrastructure changes faster than they can understand the downstream impact.
  2. The AI produces code that compiles, passes unit tests, and looks reasonable in a diff.
  3. What the AI does not produce is an understanding of the fourteen downstream services that will break if a particular database migration runs during peak traffic.

A senior engineer reviewing AI‑generated code is not checking for syntax errors. They are checking whether the code’s author—human or AI—understood the blast radius of what it modifies. The senior engineer carries a mental map of systemic dependencies that no coding tool has been trained to maintain. The approval gate is not compensating for bad AI; it is compensating for missing context—the kind of context that takes years of operating a system to accumulate and that no amount of training data can substitute for.

The policy does not slow down senior engineers working with AI tools on systems they understand deeply. It adds friction precisely where speed is most dangerous: inexperienced operators.
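Amazon’s actual gate lives in internal tooling we cannot see, but the rule as described reduces to a simple predicate. Here is a hedged sketch — the level names, `Deployment` record, and `may_deploy` function are all hypothetical — of what the policy targets:

```python
from dataclasses import dataclass

SENIOR_LEVELS = {"senior", "principal"}  # assumed level names, for illustration

@dataclass
class Deployment:
    author_level: str    # e.g. "junior", "mid", "senior"
    ai_generated: bool   # flagged by tooling or self-reported
    approvers: list[str] # levels of the engineers who signed off

def may_deploy(d: Deployment) -> bool:
    """Gate: AI-generated code from a junior/mid author needs a senior approver.
    Everything else falls through to the pre-existing review process."""
    if d.ai_generated and d.author_level not in SENIOR_LEVELS:
        return any(level in SENIOR_LEVELS for level in d.approvers)
    return True

print(may_deploy(Deployment("mid", True, [])))          # blocked
print(may_deploy(Deployment("mid", True, ["senior"])))  # allowed: senior signed off
print(may_deploy(Deployment("senior", True, [])))       # allowed: gate does not apply
```

Note what the predicate does not touch: human-written code from anyone, and AI-assisted code from senior engineers. The friction lands only on the one intersection where, per Treadwell, risk concentrates.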

The Line

This journal has been tracking the intersection of AI capability and operational reality from multiple angles.

  • The Vibe Check documented that 25 % of the latest Y Combinator batch shipped codebases that are 95 % AI‑generated.
  • The Alibi recorded a previous incident where Amazon’s own AI coding assistant deleted a production environment.
  • The Performance Review observed that companies replacing workers with AI also replace the people who would notice if the AI is not working.

Treadwell’s policy is the institutional version of “noticing.”

Why the trajectory matters more than the snapshot

AI coding tools entered Amazon’s workflow. Incidents accumulated for six months. One was visible enough to make the news, prompting the SVP to change the policy.

This is not a story about AI failing—AI‑generated code works most of the time.
It is a story about what “most of the time” means when the error surface is the infrastructure layer.

Every piece of software exists on a spectrum from feature to infrastructure:

| Category | Failure impact |
|----------|----------------|
| Feature | Tolerates failure gracefully – a broken button is simply a broken button. |
| Infrastructure | Amplifies failure systemically – a broken deployment is a broken everything. |

AI coding tools do not distinguish between the two; they generate code with equal confidence whether the target is a landing page or a load balancer. Treadwell’s policy draws the line that the tools cannot draw themselves.
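If the tools cannot draw the feature/infrastructure line, review tooling has to. One common, crude way to approximate it is by path — a sketch with an invented repository layout, not anything Amazon has described:

```python
# Assumed repo layout: infrastructure code lives under these prefixes.
INFRA_PREFIXES = ("infra/", "deploy/", "terraform/", "networking/")

def requires_extra_gate(changed_files: list[str], ai_generated: bool) -> bool:
    """Escalate review only when an AI-generated change touches
    infrastructure paths; feature-only changes keep their normal review."""
    touches_infra = any(f.startswith(INFRA_PREFIXES) for f in changed_files)
    return ai_generated and touches_infra

print(requires_extra_gate(["web/landing.html"], ai_generated=True))          # feature: no gate
print(requires_extra_gate(["terraform/lb.tf", "web/app.js"], True))          # infra: gated
```

A path prefix is a blunt instrument — it encodes where consequence lives in the directory tree rather than in the dependency graph — but it is exactly the kind of line a policy can enforce mechanically while the tools themselves generate a landing page and a load balancer with equal confidence.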

The rollback is not a retreat from AI

It is the discovery that code exists on a spectrum of consequence, and that human judgment clusters on the high‑consequence end not because humans are better at writing code, but because they are better at knowing what breaks when code is wrong.

  • Companies that find this line after a six‑hour outage are the lucky ones.
  • Companies that found it after a thirteen‑hour outage affecting customer‑facing systems learned it the harder way.
  • Companies that have not found it yet are still accumulating the pattern that will eventually force the same policy.

Mapping the same territory from opposite ends

Block and Amazon are not contradicting each other; they are mapping the same territory from opposite ends. Somewhere between a Cash App feature and an AWS infrastructure deployment, there is a line where AI‑generated code transitions from safe to systemically dangerous without human review. Both companies have simply told us where they think that line falls.

The interesting question: Where does that line fall for everyone else?

Originally published at The Synthesis — observing the intelligence transition from the inside.
