I Stopped Using Playwright. Here's What Replaced It.

Published: April 24, 2026 at 04:13 AM EDT
4 min read
Source: Dev.to

What Playwright gets wrong

Playwright was designed for single‑user, deterministic UI flows. You write selectors, set up auth fixtures, mock state, and run scripts that click through a fixed sequence. For a simple login‑and‑checkout flow, it’s fine.

Most real apps have multiple roles interacting with shared state. A customer submits a request, an operator reviews it and assigns a specialist, the specialist does work, the customer pays, the operator ships. Each step depends on the previous one, and each actor is a different authenticated user.

In Playwright this means:

  • Multiple auth‑state files (one per role)
  • Fixtures that seed the database before each test
  • Selectors that break every time the UI changes
  • Hundreds of lines of boilerplate before you’ve tested a single real interaction

You end up maintaining a parallel codebase just to describe what users already do naturally.

What agent‑browser does instead

agent-browser is a CLI that lets AI agents control a browser via the accessibility tree. Instead of writing

await page.locator('[data-testid="submit"]').click()

you describe what you want in plain language. The AI agent driving the CLI reads the accessibility tree and works out which element to act on.

  • No selectors.
  • No brittle CSS paths. If the button exists and has a label, the agent finds it.
  • If the UI changes, the test doesn’t break — the agent adapts.
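Here is a concrete version of "the agent finds it". The agent works from a text snapshot of the accessibility tree, not from CSS. The sketch below assumes snapshot lines shaped like `- button "Submit" [ref=e2]`. That shape is modeled on agent-browser's snapshot output, but treat it as an assumption. The code picks an element by role and accessible name. `find_ref` is a hypothetical helper, not part of agent-browser.

```python
import re
from typing import Optional

# Assumed snapshot line shape: '- <role> "<accessible name>" [ref=<id>] ...'
# (illustrative; check the actual output of your agent-browser version).
LINE = re.compile(r'^\s*-\s+(?P<role>\w+)\s+"(?P<name>[^"]*)".*?\[ref=(?P<ref>\w+)\]')


def find_ref(snapshot: str, role: str, name: str) -> Optional[str]:
    """Return the ref of the first element whose role and accessible name match.

    Name matching is case-insensitive, so "Approve" and "approve" both match.
    """
    for line in snapshot.splitlines():
        m = LINE.match(line)
        if m and m["role"] == role and m["name"].lower() == name.lower():
            return m["ref"]
    return None


snapshot = """
- heading "Review request" [ref=e1] [level=1]
- textbox "Notes" [ref=e2]
- button "Approve" [ref=e3]
- button "Reject" [ref=e4]
"""

print(find_ref(snapshot, "button", "approve"))  # e3
```

The lookup keys on role and visible label. A restyle or DOM reshuffle that keeps the button labeled "Approve" therefore leaves it unaffected. The agent would then pass the ref to a command along the lines of `agent-browser click @e3`.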

Role isolation

Use Chrome profile directories, one per role, logged in once:

# One profile directory per role
mkdir -p ~/.config/google-chrome/app-customer
mkdir -p ~/.config/google-chrome/app-operator
mkdir -p ~/.config/google-chrome/app-specialist

# Log in once per role in a visible browser (operator shown; repeat for each role)
npx agent-browser \
  --profile ~/.config/google-chrome/app-operator \
  --headed \
  open https://yourapp.com/sign-in

The session persists in the profile directory. Every later run that passes the same --profile path starts out already logged in.
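When an orchestrator drives several roles, build every command from a single role-to-profile table. That way no agent accidentally runs in another role's session. This is a minimal sketch: `PROFILE_ROOT`, `ROLES`, `profile_dir` and `browser_argv` are hypothetical names. Only the `--profile`, `--headed` and `open` arguments from the command above are used.

```python
from pathlib import Path
from typing import List

# Hypothetical layout matching the mkdir commands above.
PROFILE_ROOT = Path.home() / ".config" / "google-chrome"
ROLES = ("customer", "operator", "specialist")


def profile_dir(role: str) -> Path:
    """Map a role name to its dedicated Chrome profile directory."""
    if role not in ROLES:
        raise ValueError(f"unknown role: {role!r}")
    return PROFILE_ROOT / f"app-{role}"


def browser_argv(role: str, *command: str, headed: bool = False) -> List[str]:
    """Build an agent-browser argv list that always pins the role's profile."""
    argv = ["npx", "agent-browser", "--profile", str(profile_dir(role))]
    if headed:
        argv.append("--headed")
    return argv + list(command)


print(browser_argv("operator", "open", "https://yourapp.com/sign-in", headed=True))
```

Passing the resulting list to `subprocess.run(argv, check=True)` avoids shell quoting entirely. An unknown role fails loudly instead of silently reusing some other profile.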

Use yopmail for test accounts: disposable inboxes, no registration, and magic links work out of the box. If you hit email rate limits, skip the email entirely. Generate the magic-link URL with the Supabase Auth admin API and navigate to it directly:

# Prints a one-time login URL; no email is sent
curl -X POST https://.supabase.co/auth/v1/admin/generate_link \
  -H "apikey: $SERVICE_ROLE_KEY" \
  -H "Authorization: Bearer $SERVICE_ROLE_KEY" \
  -H "Content-Type: application/json" \
  -d '{"type":"magiclink","email":"test@example.com"}' \
  | python3 -c "import json,sys; print(json.load(sys.stdin)['action_link'])"
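The same call can be made from Python, for an orchestrator that needs a fresh login URL before spawning a role's agent. This sketch uses only the standard library and makes three assumptions. `SUPABASE_URL` and `SERVICE_ROLE_KEY` are environment variables. The response carries `action_link` at the top level, as the curl pipeline expects. Requests through Supabase's API gateway generally also need the key in an `apikey` header.

```python
import json
import os
import urllib.request


def magic_link_request(base_url: str, service_key: str, email: str) -> urllib.request.Request:
    """Build the POST to the Supabase Auth admin generate_link endpoint."""
    body = json.dumps({"type": "magiclink", "email": email}).encode()
    return urllib.request.Request(
        f"{base_url.rstrip('/')}/auth/v1/admin/generate_link",
        data=body,
        method="POST",
        headers={
            "apikey": service_key,
            "Authorization": f"Bearer {service_key}",
            "Content-Type": "application/json",
        },
    )


def extract_action_link(response_body: bytes) -> str:
    """Pull the one-time login URL out of the JSON response."""
    return json.loads(response_body)["action_link"]


def generate_magic_link(email: str) -> str:
    """Fetch a magic link without sending email (needs the service role key)."""
    req = magic_link_request(os.environ["SUPABASE_URL"], os.environ["SERVICE_ROLE_KEY"], email)
    with urllib.request.urlopen(req) as resp:
        return extract_action_link(resp.read())
```

The orchestrator can hand the returned URL to the role's agent. The agent then opens it with that role's `--profile`, so the session lands in the right profile.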

The orchestrator pattern

For multi‑role golden‑path testing, one Claude session acts as an orchestrator. It spawns one sub‑agent at a time, each operating as a specific role. State flows forward through a shared JSON file.

Orchestrator (main Claude session)
  ├── spawn Agent(customer)    → submits request  → writes request_id
  ├── spawn Agent(operator)    → assigns handler  → writes handler_id
  ├── spawn Agent(specialist)  → does work
  ├── spawn Agent(operator)    → reviews + approves
  ├── spawn Agent(customer)    → pays or confirms
  └── spawn Agent(customer)    → leaves feedback

State file

{
  "run_id": "run-2026-04-24",
  "current_request_id": null,
  "handler_id": null,
  "confirmation_token": null,
  "steps_completed": []
}

Each sub-agent receives a self-contained prompt with the current state injected. It reports back any new values, such as IDs or tokens visible in URLs. The orchestrator writes them to the state file before spawning the next agent.
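The hand-off described above fits in a few lines. Render a prompt with the current state embedded, merge whatever the sub-agent reports, and persist the result before the next spawn. This is a hypothetical sketch: the prompt wording, function names and sample values are illustrative, and actually spawning the Claude sub-agent is left out.

```python
import json
import os
import tempfile
from pathlib import Path


def render_prompt(role: str, task: str, state: dict) -> str:
    """Build a self-contained prompt: the role, its task, and the full current state."""
    return (
        f"You are acting as the {role}.\n"
        f"Task: {task}\n"
        f"Current state (JSON):\n{json.dumps(state, indent=2)}\n"
        "When done, report any new IDs or tokens as a JSON object."
    )


def merge_report(state: dict, role: str, task: str, report: dict) -> dict:
    """Return a new state with the reported values applied and the step recorded."""
    new_state = {**state, **report}
    new_state["steps_completed"] = state["steps_completed"] + [f"{role}: {task}"]
    return new_state


def save_state(path: Path, state: dict) -> None:
    """Write via a temp file and rename, so a crash never leaves half-written JSON."""
    fd, tmp = tempfile.mkstemp(dir=path.parent, suffix=".tmp")
    with os.fdopen(fd, "w", encoding="utf-8") as f:
        json.dump(state, f, indent=2)
    os.replace(tmp, path)


state = {"run_id": "run-2026-04-24", "current_request_id": None,
         "confirmation_token": None, "steps_completed": []}
prompt = render_prompt("customer", "Submit a new request", state)
# ...spawn the customer sub-agent with `prompt`; suppose it reports a new ID:
report = {"current_request_id": "req_123"}  # hypothetical value
state = merge_report(state, "customer", "Submit a new request", report)
```

`merge_report` returns a new dict rather than mutating in place. That keeps the pre-step state around, which helps when logging what a failed step saw.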

Results are appended to a log file. The run is log-and-continue: a failed step is recorded and the orchestrator moves on rather than stopping:

## [operator] Assign Specialist — PASS
## [specialist] Submit Work — PASS
## [operator] Approve Work — FAIL: approve button not found
## [customer] Confirm Receipt — PASS
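The log-and-continue loop can be sketched as follows, assuming each step is a callable that raises on failure. `Step`, `run_all` and `fake_run` are hypothetical. `fake_run` stands in for spawning a real sub-agent and reproduces the failure from the sample log.

```python
import tempfile
from dataclasses import dataclass
from pathlib import Path
from typing import Callable, List


@dataclass
class Step:
    role: str
    name: str


def run_all(steps: List[Step], run_step: Callable[[Step], None], log_path: Path) -> int:
    """Run every step, append one log line per step, and return the failure count."""
    failures = 0
    with log_path.open("a", encoding="utf-8") as log:
        for step in steps:
            try:
                run_step(step)
                result = "PASS"
            except Exception as exc:  # deliberately broad: log and keep going
                failures += 1
                result = f"FAIL: {exc}"
            # \u2014 is an em dash, matching the log lines in the post.
            log.write(f"## [{step.role}] {step.name} \u2014 {result}\n")
    return failures


def fake_run(step: Step) -> None:
    """Stand-in for spawning a sub-agent; simulates one failing step."""
    if step.name == "Approve Work":
        raise RuntimeError("approve button not found")


STEPS = [
    Step("operator", "Assign Specialist"),
    Step("specialist", "Submit Work"),
    Step("operator", "Approve Work"),
    Step("customer", "Confirm Receipt"),
]

if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as d:
        log_file = Path(d) / "run.log"
        print(run_all(STEPS, fake_run, log_file), "failure(s)")
        print(log_file.read_text(encoding="utf-8"), end="")
```

Returning the failure count lets a wrapper script exit non-zero at the end without cutting the run short.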

The cost argument is dead

The main counterargument to AI-based testing has always been cost. Claude API calls aren't free; Playwright is.

If you're running Claude Code on a subscription, that argument largely disappears. Sub-agents draw on your subscription rather than per-token billing. A full 10-step golden-path run costs nothing extra beyond counting toward your plan's usage limits.

The only remaining case for Playwright is raw speed: seconds per test vs. minutes for an agent run. That matters only if you need tests on every commit in a tight CI loop. For pre-deploy checks, QA runs, or anything outside a fast per-commit CI pipeline, there's no practical reason to choose Playwright.

What this means in practice

I haven't written a Playwright test in months. The agent tests cover more ground, break less often, and took a fraction of the time to set up. The only thing I gave up is running them on every commit. For integration tests covering a 6-step, multi-role flow, that was never realistic anyway.

If you’re still writing Playwright tests for multi‑role integration flows, try this setup once. You probably won’t go back.
