I Broke 50 PRs With One Config Change. Here's How I Built a Time Machine to Prevent It.

Published: March 8, 2026 at 11:10 PM EDT

Source: Dev.to

We’ve all been there

You decide it’s time to improve code quality. “No more console.log in production code,” you declare. You add a simple ESLint rule, push the config, and merge.

Ten minutes later, your Slack blows up.

“Why is the build failing on my PR?”
“I can’t deploy the hotfix!”
“Who turned on the fun police?”

You just broke 50 open pull requests because you didn’t know how widespread the violation was. You revert the change, apologize, and the codebase remains messy.

This fear of “Policy Shock”—the disruption caused by enforcing new rules—is why many teams are afraid to tighten their governance.

But what if you could time‑travel? What if you could test your new rule against the last 100 PRs in your repo before you merged it?

That’s exactly what we built. Below is the technical deep‑dive into how we created a Policy Impact Simulator for GitHub.

The Problem: Governance is a Guessing Game

Most CI/CD pipelines are binary: pass or fail. When you introduce a new check, it applies to everything immediately. There is no “try before you buy.”

We needed a system that could:

Draft a policy (e.g., “Max PR size: 20 files”).
Fetch historical data (snapshots of past PRs).
Replay the draft policy against that history.
Visualize the Blast Radius—how many legit PRs would have been blocked?

Architecture

We built this using a Node.js backend (Express) and a React frontend. The core logic lives in a PolicySimulationService that acts as our time machine.

1. The Snapshot Engine

The first challenge is getting data. We don’t want to clone repos and run npm install 100 times—that’s too slow. Instead, we fetch lightweight metadata snapshots via the GitHub API.

We treat a PR as a collection of facts:

files_count
extensions used (.ts, .js, .py)
test_coverage ratios
Diff stats (additions / deletions)

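// backend/src/services/policySimulation.service.js
const path = require('path');

// `github` is our own API client wrapper; its import is omitted here.
async function collectSnapshots(repo, daysBack) {
  // 1. Fetch merged PRs from the last N days.
  //    This relies on each PR object carrying its file list as `pr.files`.
  const prs = await github.fetchHistoricalPRs(repo, daysBack);

  // 2. Extract lightweight "Fact Snapshots"
  return prs.map(pr => ({
    id: pr.number,
    files_count: pr.changed_files,
    // Treat any file named like "foo.test.ts" as a test file
    has_tests: pr.files.some(f => f.filename.includes('.test.')),
    // Unique file extensions touched by the PR, e.g. ['.ts', '.py']
    extensions: [...new Set(pr.files.map(f => path.extname(f.filename)))],
    // ... other metadata
  }));
}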

By abstracting the code into metadata facts, we can run thousands of simulations in seconds without touching the filesystem.
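To make the "facts" idea concrete, here is a minimal sketch of turning one PR's file list into a snapshot, including the diff stats from the list above. It assumes the file objects look like entries from GitHub's "list pull request files" REST endpoint (`filename`, `additions`, `deletions`) and that pagination has already been handled. `toSnapshot` is a hypothetical helper, not part of the service shown above.

```javascript
const path = require('node:path');

// Hypothetical helper: build one fact snapshot from a PR number and its files.
// `files` is assumed to be shaped like [{ filename, additions, deletions }].
function toSnapshot(prNumber, files) {
  const extensions = [...new Set(files.map(f => path.extname(f.filename)))]
    .filter(Boolean) // drop files with no extension, e.g. "Makefile"
    .sort();

  return {
    id: prNumber,
    files_count: files.length,
    has_tests: files.some(f => f.filename.includes('.test.')),
    extensions,
    // Diff stats summed over every file in the PR
    additions: files.reduce((sum, f) => sum + f.additions, 0),
    deletions: files.reduce((sum, f) => sum + f.deletions, 0),
  };
}

// Made-up sample input, for illustration only
const snapshot = toSnapshot(101, [
  { filename: 'src/cart.ts', additions: 40, deletions: 5 },
  { filename: 'src/cart.test.ts', additions: 25, deletions: 0 },
  { filename: 'Makefile', additions: 1, deletions: 1 },
]);
console.log(snapshot);
```

Because the function is pure, snapshots can be cached as plain JSON and replayed later without another API call.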

2. The Simulation Loop (The “Judge”)

Once we have the snapshots, we feed them into our evaluation engine. This is where the magic happens. We call this component The Judge.

The Judge takes a Draft Policy (JSON logic) and a Snapshot, and returns a verdict: PASS or BLOCK.
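The article does not show The Judge's internals, so the following is only a sketch of what an `evaluate` function could look like. It assumes a draft policy shaped like `{ type, value }` (the two fields the simulation loop reads) and two hypothetical rule types, `max_files` and `require_tests`.

```javascript
// Hypothetical rule checks keyed by policy type.
// Each check returns true when the snapshot complies with the rule.
const CHECKS = new Map([
  ['max_files', (snapshot, value) => snapshot.files_count <= value],
  ['require_tests', (snapshot, value) => !value || snapshot.has_tests],
]);

// Sketch of a Judge: one draft rule plus one snapshot in, a verdict out.
function evaluate(draftRules, snapshot) {
  const check = CHECKS.get(draftRules.type);
  if (!check) {
    // Fail loudly on unknown types rather than silently passing every PR
    throw new Error(`Unknown policy type: ${draftRules.type}`);
  }
  return check(snapshot, draftRules.value) ? 'PASS' : 'BLOCK';
}

// Made-up snapshot, for illustration only
const bigPr = { id: 7, files_count: 21, has_tests: false };
console.log(evaluate({ type: 'max_files', value: 20 }, bigPr));
console.log(evaluate({ type: 'max_files', value: 50 }, bigPr));
```

Keeping each check a pure function of the snapshot is what makes replaying the same history deterministic.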

// The core simulation loop
async function executeSimulation(draftRules, snapshots) {
  const results = {
    blocked: 0,
    passed: 0,
    impacted_prs: []
  };

  for (const snapshot of snapshots) {
    // The Judge evaluates the rule
    const verdict = evaluate(draftRules, snapshot);

    if (verdict === 'BLOCK') {
      results.blocked++;
      results.impacted_prs.push({
        pr: snapshot.id,
        reason: `Violated rule: ${draftRules.type} (Limit: ${draftRules.value})`
      });
    } else {
      results.passed++;
    }
  }

  return results;
}

This deterministic loop lets us tweak a threshold—say, changing max file count from 20 to 50—and see the impact graph update instantly.
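A threshold sweep like that can be sketched in a few lines. The snapshots below are made-up sample data, and `countBlocked` is a hypothetical helper that hard-codes a "max files" rule instead of going through the full Judge.

```javascript
// Made-up sample snapshots, for illustration only
const snapshots = [
  { id: 1, files_count: 3 },
  { id: 2, files_count: 18 },
  { id: 3, files_count: 25 },
  { id: 4, files_count: 47 },
  { id: 5, files_count: 120 },
];

// Count how many PRs a "max files" limit would have blocked.
function countBlocked(maxFiles, prSnapshots) {
  return prSnapshots.filter(s => s.files_count > maxFiles).length;
}

// Replay the same history against several candidate thresholds.
const sweep = [10, 20, 50, 100].map(maxFiles => ({
  maxFiles,
  blocked: countBlocked(maxFiles, snapshots),
}));
console.table(sweep);
```

On this sample, raising the limit from 20 to 50 drops the blocked count from 3 to 1, which is the kind of comparison the impact graph is meant to surface.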

3. Front‑end Visualization

On the front‑end we use React to make the data actionable. The PolicySimulation component lets users:

Select a target repo.
Configure a draft policy (e.g., “Require 2 reviewers”).
Hit Simulate.

Results are rendered with Recharts to show the Blast Radius.

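// frontend/src/components/governance/PolicySimulation.tsx
import { useState } from 'react';

// Shape inferred from the fields this component reads
type SimulationResult = {
  blast_radius: number; // presumably the percentage of scanned PRs blocked
  total_blocked: number;
  total_scanned: number;
};

export const PolicySimulation = () => {
  const [result, setResult] = useState<SimulationResult | null>(null);

  // ...setup logic...
  // handleSimulate (used below) is assumed to be defined in the setup logic.

  return (
    <div>
      <h2>Simulation Configuration</h2>
      <label>
        Max PR Size
        <input type="number" />
      </label>
      <label>
        Test Coverage
        <input type="number" />
      </label>

      <button onClick={handleSimulate}>Simulate Impact</button>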

      {result && (
        <Alert type={result.blast_radius > 50 ? "destructive" : "default"}>
          Blast Radius Alert
          <p>
            This policy would have blocked {result.total_blocked} out of {result.total_scanned} PRs.
            {result.blast_radius > 50
              ? " This is too disruptive!"
              : " Safe to merge."}
          </p>
        </Alert>
      )}
      {/* Charts go here */}
    </div>
  );
};

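We also calculate a “Friction Index”: if a draft policy would have blocked more than 20% of historical PRs, we flag it as High Friction. That is a stricter bar than the alert in the component above, which only switches to its destructive style when blast_radius exceeds 50, so a rule gets flagged well before it would block most PRs. This simple heuristic has repeatedly saved us from merging overly aggressive rules.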
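As a rough sketch of how the loop's counters could be turned into the numbers the UI displays: the field names `total_blocked`, `total_scanned`, and `blast_radius` come from the component above, while the `summarize` function, the `high_friction` flag, and treating `blast_radius` as a percentage are assumptions made for illustration.

```javascript
const HIGH_FRICTION_THRESHOLD = 20; // percent of historical PRs blocked

// Hypothetical helper: convert { blocked, passed } counters into a summary.
function summarize(results) {
  const total_scanned = results.blocked + results.passed;
  // Multiply before dividing to keep whole-number percentages exact;
  // an empty history counts as zero impact.
  const blast_radius =
    total_scanned === 0 ? 0 : (results.blocked * 100) / total_scanned;

  return {
    total_blocked: results.blocked,
    total_scanned,
    blast_radius,
    high_friction: blast_radius > HIGH_FRICTION_THRESHOLD,
  };
}

// Made-up counters, for illustration only
console.log(summarize({ blocked: 30, passed: 70 }));
```

Note that a policy blocking exactly 20% is not flagged here, since the rule is "more than 20%".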

Lessons Learned

Building this tool taught us three key lessons about developer experience (DX):

Metadata > Source Code – You rarely need the full AST to make high‑level governance decisions. Metadata (file types, sizes, authors) covers ~80 % of use cases and is ~100× faster to process.
Feedback Loops Matter – When you can see the impact of a rule instantly, you write better rules. Governance becomes a collaborative conversation rather than a punitive gate.
Safety‑First Defaults – By default we simulate against a wide historical window and surface a “high friction” warning, encouraging teams to iterate on policies before they go live.

TL;DR

Policy Shock doesn’t have to cripple your team. By snapshotting historical PRs, replay‑testing draft policies, and visualizing the blast radius, you can ship governance changes with confidence. The Policy Impact Simulator gives you a risk‑free sandbox to tighten standards without breaking the day‑to‑day flow of development.

One more lesson: JSON Schema is Powerful – Defining policies as JSON (rather than hard‑coded functions) allows us to version them, diff them, and, crucially, simulate them without deploying code.
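To illustrate why plain JSON policies are easy to diff, here is a small sketch that compares two versions of a policy. The `{ type, max_files }` shape is borrowed from the translation example later in the article; `diffPolicy` is a hypothetical helper that only handles flat objects.

```javascript
// Hypothetical helper: list the keys whose values differ between two
// flat policy objects (nested rules would need a recursive comparison).
function diffPolicy(before, after) {
  const keys = new Set([...Object.keys(before), ...Object.keys(after)]);
  const changes = [];
  for (const key of [...keys].sort()) {
    if (before[key] !== after[key]) {
      changes.push({ key, from: before[key], to: after[key] });
    }
  }
  return changes;
}

const v1 = { type: 'pr_size', max_files: 20 };
const v2 = { type: 'pr_size', max_files: 50 };
console.log(diffPolicy(v1, v2));
```

A diff like this can sit next to the simulation result, so reviewers see both what changed in the policy and how many PRs the change would affect.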

Future Work: AI Analysis

Our next step is integrating LLMs to explain why a policy failed. Instead of just saying “Blocked,” we want the system to look at the PR description and say, “Blocked because this PR touches the payment gateway but lacks a ‘Security’ label.”

We have a prototype running behind a translate-natural-language endpoint that converts plain English (e.g., “Block PRs with no tests”) into our JSON schema.

// Transforming English to Policy Config
const result = await api.post('/v1/policies/translate-natural-language', {
  description: "Block huge PRs"
});
// Output: { type: "pr_size", max_files: 50 }

Try It Yourself

This simulator is part of our broader initiative to make governance invisible and helpful, rather than painful.

If you’re tired of guessing whether your new lint rule will cause a revolt, I highly recommend building a simple “dry‑run” script for your CI. Even a basic script that greps through your last 50 PRs can save you a headache.
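A dry-run script along those lines might look like the sketch below. It assumes you have already collected the unified diff text of each recent PR (for example with the GitHub CLI's `gh pr diff <number>`); the `dryRun` and `diffAddsPattern` names and the sample diffs are made up for illustration. It only inspects added lines, so it approximates a lint rule rather than reproducing it.

```javascript
// True if a unified diff adds at least one line matching `pattern`.
// Pass a regex without the "g" flag, since .test() is stateful with it.
function diffAddsPattern(diffText, pattern) {
  return diffText
    .split('\n')
    // Added lines start with "+"; "+++ " is the file header, not content
    .filter(line => line.startsWith('+') && !line.startsWith('+++ '))
    .some(line => pattern.test(line));
}

// Scan a { prNumber: diffText } map and report which PRs would be flagged.
function dryRun(diffsByPr, pattern) {
  const offenders = Object.entries(diffsByPr)
    .filter(([, diff]) => diffAddsPattern(diff, pattern))
    .map(([pr]) => Number(pr));
  return { scanned: Object.keys(diffsByPr).length, offenders };
}

// Made-up sample diffs: PR 41 adds a console.log, PR 42 removes one
const sampleDiffs = {
  41: '--- a/app.js\n+++ b/app.js\n@@ -1,2 +1,3 @@\n const x = 1;\n+console.log(x);\n',
  42: '--- a/app.js\n+++ b/app.js\n@@ -1,2 +1,1 @@\n-console.log("old");\n const y = 2;\n',
};
console.log(dryRun(sampleDiffs, /console\.log\(/));
```

Even this crude count tells you whether a new rule would touch two open PRs or fifty before you merge the config change.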

What tools do you use to test your dev processes? Let me know in the comments—I’d love to see how others are solving the “Policy Shock” problem.

Thanks for reading! If you found this technical breakdown useful, drop a star or comment below.
