PageAgent: The GUI Agent Living in Your Web Page

Published: February 27, 2026 at 02:23 PM EST
4 min read
Source: Dev.to

Most AI agent frameworks need a server, a headless browser, and a whole automation stack just to click a button on a web page. The page itself has no say in the process.

PageAgent takes a different approach. It’s a JavaScript library that runs directly in your page. Add it, and users can give natural‑language commands — the AI reads the live DOM, understands the UI, and acts. No server, no external process, no automation stack.

This means your web app isn’t being automated — it’s doing the automating. You control what the AI sees, how it behaves, and which LLM powers it. The intelligence lives in your page, not on someone else’s server.

Star PageAgent on GitHub — MIT licensed, open source, 600+ ⭐.

Zero Infrastructure

For npm projects, the programmatic API takes only a few lines:

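import { PageAgent } from 'page-agent'

// YOUR_KEY is a placeholder; supply your own LLM API key.
const agent = new PageAgent({
  model: 'gpt-5.1',
  baseURL: 'https://api.openai.com/v1',
  apiKey: YOUR_KEY,
})

// Top-level await requires an ES module (or wrap the call in an async function).
await agent.execute('Fill the expense report for last Friday')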

No screenshots, no OCR, no vision models. PageAgent works from a text‑based representation of the DOM, which keeps it fast and lightweight. See the integration docs for all setup options.
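
To make the text-based approach concrete, here is a minimal sketch of how a page could be flattened into an indexed text listing that an LLM can read and refer back to. This is not PageAgent's actual serializer: the node shape (`{ tag, attrs, text, children }`), the `INTERACTIVE` set, and the function name `serializeForLLM` are assumptions made for illustration, and the code works on plain objects rather than real DOM nodes so it can run anywhere.

```javascript
// Hypothetical sketch: flatten a simplified DOM-like tree into indexed text.
// Not PageAgent's real serializer; the node shape and names are illustrative.
const INTERACTIVE = new Set(['a', 'button', 'input', 'select', 'textarea'])

function serializeForLLM(root) {
  const lines = []
  const targets = [] // index -> node, so a chosen index maps back to an element

  function visit(node, depth) {
    const indent = '  '.repeat(depth)
    const text = (node.text || '').trim()
    if (INTERACTIVE.has(node.tag)) {
      // Interactive elements get a numeric handle the model can cite.
      const index = targets.push(node) - 1
      const attrs = Object.entries(node.attrs || {})
        .map(([key, value]) => `${key}="${value}"`)
        .join(' ')
      const open = attrs ? `<${node.tag} ${attrs}>` : `<${node.tag}>`
      lines.push(`${indent}[${index}] ${open}${text}`)
    } else if (text) {
      // Non-interactive nodes contribute only their visible text.
      lines.push(`${indent}${text}`)
    }
    for (const child of node.children || []) visit(child, depth + 1)
  }

  visit(root, 0)
  return { text: lines.join('\n'), targets }
}

const page = {
  tag: 'form',
  children: [
    { tag: 'label', text: 'Amount' },
    { tag: 'input', attrs: { name: 'amount', type: 'number' } },
    { tag: 'button', text: 'Submit' },
  ],
}

const snapshot = serializeForLLM(page)
console.log(snapshot.text)
```

The model only ever sees the short text listing, and an answer such as "click [1]" can be resolved through `targets` to the element to act on. A real implementation would also need to handle visibility, scrolling, and dynamic updates, which this sketch ignores.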

Human‑in‑the‑Loop

Most AI agents are fire‑and‑forget. PageAgent is collaborative.

A built‑in panel shows the agent’s thinking in real time. It asks users for clarification on ambiguous steps. Users can stop, correct, or redirect at any point. This is what separates a demo from something you’d actually ship.
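
The collaborative pattern described here can be sketched as a step runner that reports its reasoning, pauses on ambiguity, and checks for a stop request before each action. All names below (`runWithOversight`, `askUser`, `shouldStop`, `onThought`, and the step shape) are hypothetical and do not come from PageAgent's API; the callbacks are stand-ins for a real panel.

```javascript
// Hypothetical sketch of a human-in-the-loop step runner.
// None of these names come from PageAgent; they only illustrate the pattern.
async function runWithOversight(steps, { askUser, shouldStop, onThought }) {
  const done = []
  for (const step of steps) {
    if (shouldStop()) {
      onThought(`Stopped by user before "${step.label}"`)
      break
    }
    let target = step.target
    if (step.ambiguous) {
      // Pause and let the user choose instead of guessing.
      target = await askUser(`Which one did you mean for "${step.label}"?`, step.options)
    }
    onThought(`${step.label} -> ${target}`)
    done.push({ label: step.label, target })
  }
  return done
}

const thoughts = []
const finished = runWithOversight(
  [
    { label: 'Open report', target: '#new-report' },
    { label: 'Pick date', ambiguous: true, options: ['Feb 20', 'Feb 27'] },
  ],
  {
    askUser: async (question, options) => options[0], // stand-in for a real prompt
    shouldStop: () => false, // stand-in for a Stop button
    onThought: (line) => thoughts.push(line), // stand-in for the panel
  }
)

finished.then((done) => console.log(thoughts.join('\n'), done.length))
```

The design point is that the user's input sits inside the loop rather than around it: every step passes through the stop check, and ambiguous steps cannot proceed without an answer.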

Already have a chatbot? Plug PageAgent behind it. Instead of your bot telling users “click the Submit button in the top right corner”, it actually clicks it — right in front of them. Your assistant stops advising and starts acting.

Bring Your Own LLM

PageAgent works with OpenAI, Claude, DeepSeek, Qwen, Gemini, and Grok, or fully offline via Ollama. It has no backend of its own and routes nothing through an intermediary service: data flows directly from the page to whichever LLM you configure. The library is MIT‑licensed and fully auditable on GitHub.
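
As a rough illustration of switching providers, the sketch below builds option objects using the same three fields as the constructor example earlier in this post (`model`, `baseURL`, `apiKey`). The helper `llmOptions`, the `PRESETS` table, the `qwen3` model name, and the dummy key for local use are assumptions for illustration; check the PageAgent docs for what each provider actually needs. The Ollama URL is its default local OpenAI-compatible endpoint.

```javascript
// Sketch: option presets for OpenAI-compatible endpoints.
// PRESETS and llmOptions are hypothetical helpers, not part of PageAgent.
const PRESETS = {
  openai: { baseURL: 'https://api.openai.com/v1', model: 'gpt-5.1', needsKey: true },
  // Ollama serves an OpenAI-compatible API on localhost:11434 by default.
  // 'qwen3' is a placeholder; use whichever model you have pulled locally.
  ollama: { baseURL: 'http://localhost:11434/v1', model: 'qwen3', needsKey: false },
}

function llmOptions(provider, apiKey) {
  const preset = PRESETS[provider]
  if (!preset) throw new Error(`Unknown provider: ${provider}`)
  if (preset.needsKey && !apiKey) throw new Error(`${provider} requires an API key`)
  return {
    model: preset.model,
    baseURL: preset.baseURL,
    // Assumption: a dummy key is acceptable when the local server ignores it.
    apiKey: apiKey || 'not-needed',
  }
}

const offline = llmOptions('ollama')
console.log(offline)
// The result would then be passed to the constructor: new PageAgent(offline)
```

Keeping the provider choice in one small table means the rest of the page code does not change when you move from a hosted model to a local one.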

Going Cross‑Page

PageAgent runs inside your web page — ideal for SPAs where the agent has full context of the app state.

Some tasks span multiple pages. An optional browser extension adds multi‑tab awareness for those cases. It’s a power‑up, not a dependency.

What’s different here: your page drives the browser, not the other way around.

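// YOUR_KEY and updateUI are placeholders for your own API key and UI handler.
const result = await window.PAGE_AGENT_EXT.execute(
  'Compare the top 3 results for "wireless keyboard" on Amazon',
  {
    baseURL: 'https://api.openai.com/v1',
    apiKey: YOUR_KEY,
    model: 'gpt-5.1',
    // Called as the task progresses, so the page can reflect the current state.
    onStatusChange: (status) => updateUI(status),
  }
)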

Your page initiates tasks, controls the LLM, and receives real‑time callbacks. Access requires explicit user authorization via token.

Because PageAgent runs in the user’s real browser, it operates within their authenticated sessions. No credential sharing, no cookie management, no server‑side login flows. The user is already logged in — the agent just acts.

This unlocks scenarios that server‑side agents can’t touch:

  • A procurement tool that reorders supplies from the company’s supplier portal — the user is logged in, the agent navigates the ordering flow directly.
  • Travel booking that works through the user’s corporate booking system — operating the actual booking flow, not crawling public fares.
  • A project tracker that creates tasks in the team’s board — no API integration needed; the agent uses the same UI the user does.

Who Is This For?

  • SaaS developers — ship an AI copilot without rewriting the backend.
  • Enterprise teams — let users describe what they want in plain language instead of navigating 20‑click workflows in ERP, CRM, and admin systems.
  • AI builders — use @page-agent/core as a tool inside your existing agent, or plug it behind a customer‑service bot so it operates the UI instead of just giving instructions.

Ready to bring AI directly into your web UI? Try PageAgent today!

Modular and Extensible

Use the full package for a turnkey solution, import the headless core for a custom UI, or use individual packages (DOM controller, LLM client, UI panel) à la carte. Custom tools, lifecycle hooks, prompt customization, and data masking are all built in.
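
To show what data masking can look like in practice, here is a minimal sketch of a masking pass applied to page text before it is sent to the LLM. It does not use PageAgent's own masking API; `MASK_RULES` and `maskSensitive` are hypothetical names, and the two patterns are deliberately simple examples rather than production-grade detectors.

```javascript
// Hypothetical masking pass applied to page text before it reaches the LLM.
// The rule list and function name are illustrative, not PageAgent's API.
const MASK_RULES = [
  { name: 'EMAIL', pattern: /[\w.+-]+@[\w-]+(?:\.[\w-]+)+/g },
  { name: 'CARD', pattern: /\b\d{4}[ -]?\d{4}[ -]?\d{4}[ -]?\d{4}\b/g },
]

function maskSensitive(text, rules = MASK_RULES) {
  // Apply each rule in turn, replacing every match with a labeled token.
  return rules.reduce(
    (masked, rule) => masked.replace(rule.pattern, `[${rule.name}]`),
    text
  )
}

console.log(maskSensitive('Contact ana@example.com, card 4111 1111 1111 1111'))
// prints "Contact [EMAIL], card [CARD]"
```

Labeled tokens such as `[EMAIL]` keep the structure of the page readable for the model while the underlying values stay in the browser.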

Get Started


⭐ Star on GitHub — and help us grow.

Try the live demo — no sign‑up needed. Or drag the bookmarklet to try it on any site.

Read the docs — CDN, npm, and programmatic setup guides.

Install the extension — for multi‑page tasks.

PageAgent is open source under the MIT license. The free testing API on the demo site is for evaluation only — for production use, bring your own LLM API key. Terms of Use
