I Let an AI Agent Use My Browser Tool Unsupervised. It Found 3 Bugs in 20 Minutes.

Published: March 5, 2026 at 09:22 AM EST

Source: Dev.to

The Setup

App under test: a locally‑running code‑review tool called Crit.
Feature: comment template chips that appear when you click a line gutter to open a comment form.
Scenarios to verify: light mode, dark mode, chip insertion, cursor positioning.

I added Charlotte to the project’s MCP config and told the agent to test the template feature. That’s it—no instructions about which tools to use or how.

What Worked

Tool	Observation
`charlotte:navigate` (with `detail: "summary"`)	Gave the agent a clean structural overview of the page (headings, interactive elements, content blocks). Sufficient to confirm the page loaded and orient itself.
Screenshots	Used twice (light mode, dark mode). Served as the definitive “does it look right?” check—fast, clear, exactly what a visual verification workflow needs.
Tool‑profile switching	The agent started on the default `browse` profile, realized it needed JavaScript evaluation, ran `charlotte:tools enable evaluate`, and kept going. No friction.

Bug 1 – `evaluate` Silently Ate Multi‑Statement Code

The agent needed to query the DOM for gutter elements and wrote reasonable JavaScript:

var blocks = document.querySelectorAll('[data-line]');
var gutters = document.querySelectorAll('.gutter');
'dataLine=' + blocks.length + ' gutter=' + gutters.length;

Charlotte returned {value: null, type: "undefined"}—no error. After several attempts the agent discovered that wrapping the code in an IIFE worked:

(() => {
  var blocks = document.querySelectorAll('[data-line]');
  var gutters = document.querySelectorAll('.gutter');
  return 'dataLine=' + blocks.length + ' gutter=' + gutters.length;
})()

Root cause
charlotte:evaluate was implemented as new Function('return ' + expr). When a line break directly follows return, JavaScript’s Automatic Semicolon Insertion (ASI) ends the statement right there, so the multi-line input was effectively parsed as:

return;               // ASI inserts a semicolon here
var blocks = …;       // dead code, never reached

Thus the function silently returned undefined. A later single‑line attempt (return var g = …) produced a syntax error, giving a different failure mode.
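Both failure modes can be reproduced outside Charlotte in plain JavaScript. The sketch below is a stand-in, not Charlotte's actual code: naiveEvaluate is a hypothetical name that wraps its input the way the root cause describes, and the multi-statement input is assumed to begin with a line break, which is what lets ASI end the return statement early.

```javascript
// Hypothetical stand-in for the old wrapper: prefix the input with "return ".
function naiveEvaluate(expr) {
  return new Function('return ' + expr)();
}

// A single expression behaves as expected.
const single = naiveEvaluate('1 + 2');

// Input that starts with a line break: ASI turns "return\n..." into "return;",
// so everything after it is dead code and the call yields undefined.
const multi = naiveEvaluate("\nvar a = 1;\nvar b = 2;\n'a=' + a + ' b=' + b;");

// A statement on the same line as "return" is rejected by the parser instead.
let syntaxError = null;
try {
  naiveEvaluate('var g = 1; g');
} catch (err) {
  syntaxError = err;
}

console.log(single, multi, syntaxError && syntaxError.name);
```

The same input therefore fails silently or loudly depending only on where its first line break falls.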

Fix
Replace new Function('return ' + expr) with Chrome DevTools Protocol’s Runtime.evaluate, which evaluates the code as a program and returns the completion value of the last expression‑statement. Charlotte already maintains CDP sessions, so this is a clean swap with no new dependencies.
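A rough sketch of what the swap might look like, assuming a CDP session object with a send(method, params) method (Puppeteer's page.createCDPSession() returns one). The name evaluateViaCdp and its return shape are illustrative, not Charlotte's real implementation; Runtime.evaluate and its expression, returnByValue, and awaitPromise parameters are part of the protocol.

```javascript
// Sketch only: `session` is any object exposing send(method, params),
// for example a Puppeteer CDPSession.
async function evaluateViaCdp(session, expression) {
  const { result, exceptionDetails } = await session.send('Runtime.evaluate', {
    expression,
    returnByValue: true, // serialize the value instead of returning a remote handle
    awaitPromise: true,  // wait for a promise result to settle before returning
  });

  if (exceptionDetails) {
    // Surface the failure instead of handing back a silent null.
    const message =
      exceptionDetails.exception?.description ?? exceptionDetails.text;
    throw new Error(message);
  }

  return { value: result.value, type: result.type };
}
```

With a real session, a call such as evaluateViaCdp(session, "var a = 1; a + 1") would be expected to resolve with the value of the last statement, and a script error would surface as a rejected promise.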

Bug 2 – No Way to Click Without an Element ID

The agent could see gutter line numbers in the screenshot and tried a coordinate click:

charlotte:click({ x: 38, y: 215 })

Result: Error: element_id is required.

The gutter elements have no ARIA role, so they don’t appear in the accessibility tree and charlotte:find can’t locate them. The agent resorted to a hack:

Enable evaluate.
Write inline JavaScript to query the DOM.
Manually dispatch mouse events.

Because the app starts a drag selection with mousedown on the gutter and finalizes it with a mouseup listener on document, the agent had to mimic that exact sequence:

// Wrong – mouseup on the wrong target
gutter.dispatchEvent(new MouseEvent('mousedown', { bubbles: true }));
gutter.dispatchEvent(new MouseEvent('mouseup',   { bubbles: true }));

// Correct – matches the app’s listener pattern
gutter.dispatchEvent(new MouseEvent('mousedown', { bubbles: true }));
document.dispatchEvent(new MouseEvent('mouseup', { bubbles: true }));

What should have been a single tool call became twelve steps of trial and error.

Fix
Add a new tool charlotte:click_at({ x, y }). Charlotte already has plumbing that converts element IDs to pixel coordinates and calls Puppeteer’s page.mouse.click. The new tool simply skips the element‑resolution step and dispatches a CDP‑level mouse click directly, producing real input events that bubble naturally through the DOM.
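To show how small the coordinate path can be, here is a hedged sketch built on Puppeteer's page.mouse.click(x, y, options). The function name clickAt, its argument shape, and its return value are assumptions made for this example; the real charlotte:click_at handler may look different.

```javascript
// Sketch only: `page` is expected to be a Puppeteer Page (or anything with
// a compatible mouse.click method).
async function clickAt(page, { x, y, button = 'left' }) {
  if (!Number.isFinite(x) || !Number.isFinite(y)) {
    throw new TypeError('x and y must be finite numbers');
  }

  // No element lookup: the click goes straight to the given viewport point,
  // so the page sees ordinary mousedown/mouseup/click input events.
  await page.mouse.click(x, y, { button });

  return { clicked: { x, y, button } };
}
```

Validating the coordinates up front keeps a malformed call from turning into a click at an unintended position.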

Bug 3 – `find` Can’t See Non‑Semantic Elements

charlotte:find({ text: "Do the thing" })          // → []
charlotte:find({ type: "button", text: "1" })    // → []

The rendered content and line numbers exist in the DOM but aren’t exposed through the accessibility tree. Charlotte’s find tool filters the accessibility representation, which is appropriate for semantic UI elements but fails for custom, non‑semantic widgets like the gutter.

Fix (not fully implemented in the demo)

Extend find to optionally search the raw DOM tree when a semantic: false flag is supplied.
Or expose a new tool charlotte:query_selector({ selector }) that returns element IDs for any CSS selector, allowing the agent to locate and interact with arbitrary elements.

Takeaways

Charlotte’s structural view is great for navigation and high‑level verification.
Screenshots remain the simplest visual check.
Tool ergonomics matter: silent failures (Bug 1) and overly strict APIs (Bug 2, Bug 3) waste valuable tool calls and slow down the agent.
Iterating on the toolset—adding click_at and improving evaluate—turns a promising prototype into a practical, low‑friction automation layer for AI agents.

The experiment proved that an AI agent, given just a solid browser‑automation primitive, can discover real bugs quickly. With a few refinements to Charlotte’s API, the same approach could scale to larger, more complex web applications.

The Fix

Problem: Charlotte’s default browsing mode relies on the accessibility tree. For custom UIs, any element without a semantic role is invisible.

Solution: Add a selector parameter to charlotte:find that queries the DOM directly via a CSS selector.

find({ selector: ".line-comment-gutter" })

Charlotte runs DOM.querySelectorAll, extracts basic info (tag, text, bounds), and registers each matched element with its ID system.
Returned IDs work with click, hover, drag, and every other interaction tool.
IDs use a dom- prefix so agents can tell they came from a DOM query rather than the accessibility tree.
The semantic observation model stays unchanged; selector mode is a parallel path that produces compatible element IDs.
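The registration step can be modeled without a browser. In the sketch below, ElementRegistry, registerDomMatches, and centerOf are hypothetical names, and the match objects are made-up sample data standing in for whatever Charlotte extracts after its DOM query. Only the dom- prefix and the tag, text, and bounds fields come from the description above.

```javascript
// Sketch only: a minimal ID registry for elements found via a CSS selector.
class ElementRegistry {
  constructor() {
    this.nextId = 1;
    this.elements = new Map();
  }

  // Assign a dom- prefixed ID to each match and return agent-facing records.
  registerDomMatches(matches) {
    return matches.map(({ tag, text, bounds }) => {
      const id = `dom-${this.nextId++}`;
      this.elements.set(id, { tag, text, bounds });
      return { id, tag, text, bounds };
    });
  }

  // Resolve an ID to the point an interaction tool such as click would target.
  centerOf(id) {
    const element = this.elements.get(id);
    if (!element) throw new Error(`Unknown element id: ${id}`);
    const { x, y, width, height } = element.bounds;
    return { x: x + width / 2, y: y + height / 2 };
  }
}

// Sample data: two gutter cells as a selector query might report them.
const registry = new ElementRegistry();
const found = registry.registerDomMatches([
  { tag: 'div', text: '1', bounds: { x: 30, y: 200, width: 16, height: 30 } },
  { tag: 'div', text: '2', bounds: { x: 30, y: 230, width: 16, height: 30 } },
]);
const clickPoint = registry.centerOf(found[0].id);
console.log(found.map((el) => el.id), clickPoint);
```

Because the registry only stores bounds, any tool that already turns an element ID into pixel coordinates could treat a dom- ID like any other.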

The Fixes: Charlotte v0.4.1

All three shipped the same day:

Feature	Change
`charlotte:evaluate`	Uses CDP `Runtime.evaluate` directly. Multi‑statement code, `var` declarations, IIFEs, and single expressions all work naturally. No more silent `null`s.
`charlotte:click_at`	Takes x/y coordinates and dispatches CDP‑level mouse events. Supports left/right/double click and modifier keys.
CSS selector mode for `charlotte:find`	Accepts a `selector` parameter that queries the DOM directly, returning elements with Charlotte IDs usable by all interaction tools.

What I Learned

Dogfooding AI tools requires AI dogfooding

I’d used Charlotte dozens of times and read every line of its codebase. I never would have found the evaluate ASI bug by hand because I instinctively write IIFEs. The agent doesn’t have those instincts; it writes the code a reasonable developer would write, hits the wall, and shows you exactly where the wall is.

Twelve steps to one

The click_at gap turned a single interaction into a twelve‑step workaround involving DOM queries, source‑code reading, and manual event dispatch. Watching an agent burn tokens on a workaround you can eliminate is a very effective way to prioritize your backlog.

The accessibility tree is necessary but not sufficient for testing

Charlotte’s structured, semantic observation model is the right foundation for browsing and auditing. But testing custom UIs means interacting with elements that aren’t semantically exposed. The selector parameter on find bridges that gap without compromising the default experience.

Browse‑profile users get the clean semantic world.
Users who need raw DOM access can reach it.

Watch the raw session, not just the results

The agent’s write‑up at the end was useful, but the real signal was in the session transcript: three silent nulls in a row before an error, twelve steps of increasingly creative workarounds for a missing feature, the exact sequence where it went from “I’ll click this” to “I need to dispatch synthetic mouse events on two different DOM targets.” That’s where the bugs live.

Try It

npx @ticktockbent/charlotte@latest

Charlotte is open‑source, MIT‑licensed, and works with any MCP client: Claude Desktop, Claude Code, Cursor, Windsurf, Cline.

GitHub | npm | Benchmarks vs Playwright MCP
