I Fixed 110 Failing E2E Tests in 2 Hours Without Writing a Single Line of Test Code
Source: Dev.to
110 failing Playwright tests. Login flows, multi‑step form wizards, search filters, file uploads, complex user workflows. Some failures came from missing UI steps, some from dirty state left by previous runs, and some from stale selectors. I fixed all of them in 2 hours without writing a single line of test code.
I built playwright‑autopilot, a plugin that does exactly that.
How the debugging workflow actually works
When you run a test through the plugin, a lightweight capture hook is injected into Playwright’s worker process. It monkey‑patches BrowserContext._initialize to add an instrumentation listener—no modifications to Playwright’s source code, so it works with any existing installation.
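The patching pattern itself is simple. Here is a minimal sketch of it in TypeScript; FakeBrowserContext stands in for Playwright’s BrowserContext, and the real plugin patches the private, async `_initialize` the same way (the names and shapes below are illustrative, not the plugin’s actual code):

```typescript
type ActionEvent = { action: string; timestamp: number };

// Stand-in for Playwright's BrowserContext, just for illustration.
class FakeBrowserContext {
  listeners: Array<(e: ActionEvent) => void> = [];
  _initialize(): void {
    // Playwright's own context setup would run here (async in reality).
  }
  emit(e: ActionEvent): void {
    for (const l of this.listeners) l(e);
  }
}

const recorded: ActionEvent[] = [];

// The patch wraps the original method: run it, then attach the
// instrumentation listener to the freshly initialized context.
const original = FakeBrowserContext.prototype._initialize;
FakeBrowserContext.prototype._initialize = function (this: FakeBrowserContext): void {
  original.call(this);
  this.listeners.push((e) => recorded.push(e));
};

// Any context created after the patch is instrumented automatically.
const ctx = new FakeBrowserContext();
ctx._initialize();
ctx.emit({ action: "click", timestamp: Date.now() });
```

Because the wrapper calls the original method first, Playwright’s own behavior is untouched; the listener is purely additive.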
From that point, every browser action is recorded:
- DOM snapshots – full ARIA tree of the page captured before and after each click, fill, select, and navigation. When a test fails you see exactly what the page looked like at the moment of failure and one step before.
- Network requests – URL, method, status code, timing, request body, response body. You can filter by status (e.g., 400+), by URL pattern, or by method.
- Console output – errors, warnings, and logs tied to the specific action that produced them. Scoped to the relevant step rather than a wall of text.
- Screenshots – captured at the point of failure.
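Conceptually, the four data sources above combine into one record per action. A hypothetical TypeScript shape (field names are my guesses, not the plugin’s actual schema) also makes the status filtering from the network bullet concrete:

```typescript
interface NetworkEntry {
  url: string;
  method: string;
  status: number;
  durationMs: number;
  requestBody?: string;
  responseBody?: string;
}

interface CapturedStep {
  action: "click" | "fill" | "select" | "navigate";
  selector: string;
  ariaBefore: string; // ARIA tree snapshot before the action
  ariaAfter: string;  // ...and after
  network: NetworkEntry[];
  consoleMessages: { level: "log" | "warn" | "error"; text: string }[];
  screenshotPath?: string; // only present at the point of failure
}

// Filter a step's network traffic by status, e.g. 400+:
function failedRequests(step: CapturedStep): NetworkEntry[] {
  return step.network.filter((n) => n.status >= 400);
}
```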
The AI does not dump all of this data at once. Built on MCP (Model Context Protocol), it pulls data on demand: first the action timeline, then the failing step, then the DOM snapshot, network response, and console logs. With 32 tools returning only what’s needed, it stays token‑efficient by design.
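The on-demand access pattern can be sketched as a set of narrow tool functions, each returning one slice of the capture. The tool names below are illustrative, not the plugin’s real 32-tool surface:

```typescript
type Capture = {
  timeline: string[];        // one entry per browser action
  failingStepIndex: number;
  domSnapshots: string[];    // ARIA tree per step
  consoleLogs: string[][];   // console output scoped per step
};

// Each "tool" answers one narrow question, keeping responses small.
const tools = {
  get_timeline: (c: Capture) => c.timeline,
  get_failing_step: (c: Capture) => c.timeline[c.failingStepIndex],
  get_dom_snapshot: (c: Capture, step: number) => c.domSnapshots[step],
  get_console: (c: Capture, step: number) => c.consoleLogs[step],
};

// The agent drills down rather than reading everything at once:
const capture: Capture = {
  timeline: ["goto /checkout", "fill #email", "click Submit"],
  failingStepIndex: 2,
  domSnapshots: ["...", "...", "<button disabled>Submit</button>"],
  consoleLogs: [[], [], ["TypeError: country is undefined"]],
};

const failing = tools.get_failing_step(capture); // "click Submit"
const logs = tools.get_console(capture, 2);      // ["TypeError: country is undefined"]
```

Only the slices the agent actually asks for ever enter the context window, which is where the token efficiency comes from.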
It thinks in user flows, not selectors
Before touching code, the agent maps the intended user journey (e.g., “a user logs in, fills out a multi‑step form, uploads a file, submits”). It walks through the steps a real user would perform and compares that against what the test actually did.
When a step is missing—a dropdown never selected, a required field never filled, a radio button never clicked—the agent finds the existing page‑object method in your codebase and adds the call. No new abstractions, minimal diff.
It follows your architecture
- Works with Page Object Model, business/service layers, or any pattern your team uses.
- Uses getByRole(), getByTestId(), and web‑first assertions.
- Avoids page.evaluate() hacks, waitForTimeout, and try/catch around Playwright actions.
If the application itself is broken (e.g., 500 errors, unhandled exceptions), the plugin reports that instead of trying to work around it.
It learns and remembers
After a test passes, the plugin automatically saves the verified user flow—the exact sequence of interactions that constitute the happy path. The next time that test breaks, the agent already knows the intended journey and jumps straight to identifying what changed.
Running e2e_build_flows once across your suite captures every test’s journey; the agent gets faster over time.
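The flow-memory idea can be sketched as: persist the verified step sequence after a green run, then diff a failing run against it. The storage format below is assumed, not the plugin’s actual one:

```typescript
type Flow = string[]; // ordered user actions, e.g. "select country"

// In-memory stand-in for the plugin's persisted flow store.
const flowStore = new Map<string, Flow>();

function saveVerifiedFlow(testName: string, steps: Flow): void {
  flowStore.set(testName, steps);
}

// Steps in the remembered happy path that the failing run skipped.
function missingSteps(testName: string, observed: Flow): Flow {
  const known = flowStore.get(testName) ?? [];
  return known.filter((s) => !observed.includes(s));
}

// After a passing run, the happy path is saved...
saveVerifiedFlow("checkout", ["login", "select country", "fill address", "submit"]);

// ...so when the test later fails, the gap is found by comparison,
// not by re-deriving the whole journey from scratch.
const gap = missingSteps("checkout", ["login", "fill address", "submit"]);
// gap === ["select country"]
```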
A real example
A checkout test was failing with “locator resolved to hidden element.” The usual debugging path:
- Open trace viewer.
- Find the failing step.
- Read the DOM.
- Realize a country dropdown was never selected, so the shipping section never rendered.
That can take ~20 minutes for an experienced developer.
The plugin found the same root cause in a single run. It pulled the DOM snapshot at the failing step, saw the unselected dropdown in the ARIA tree, searched the page objects for selectCountry(), added the call in the service layer, re‑ran the test, and it passed. One fix, ~12 seconds of AI processing.
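To make the failure mode and the fix concrete, here is a toy page-object sketch. selectCountry() is the method named above; the rest of the class is invented for illustration and mimics the real symptom, where the shipping section only renders after a country is chosen:

```typescript
class CheckoutPage {
  private country?: string;
  private shippingVisible = false;

  // Choosing a country is what makes the shipping section render.
  selectCountry(code: string): void {
    this.country = code;
    this.shippingVisible = true;
  }

  fillShippingAddress(addr: string): string {
    if (!this.shippingVisible) {
      // The original failure mode from the trace.
      throw new Error("locator resolved to hidden element");
    }
    return `shipping to ${addr} (${this.country})`;
  }
}

// The broken test jumped straight to fillShippingAddress() and hit the
// hidden-element error; the fix is the single selectCountry() call:
const page = new CheckoutPage();
page.selectCountry("US");
const result = page.fillShippingAddress("1 Main St");
```

The diff is one line in the test (or service layer), reusing a page-object method that already existed, which is exactly the “no new abstractions, minimal diff” behavior described earlier.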
Get started
# Add the marketplace entry
/plugin marketplace add kaizen-yutani/playwright-autopilot
# Install the plugin
/plugin install kaizen-yutani/playwright-autopilot
# Then prompt the AI
Fix all failing e2e tests
Try it out: https://github.com/kaizen-yutani/playwright-autopilot – star the repo, run it on your flakiest test, and let us know what breaks.