How to add browser automation to any MCP server using PageBolt
Source: Dev.to
Introduction
You’re building an AI agent with Claude. Your agent needs to interact with the web — take screenshots, generate PDFs, record demo videos, inspect page structure.
You could:
- Write Python scripts to call Puppeteer (fragile, maintenance burden)
- Manage your own headless browser pool (infrastructure overhead)
- Use PageBolt’s MCP server (tool call, done)
The third option takes about 5 minutes.
PageBolt provides an open MCP server that gives Claude, Cursor, and Windsurf native access to browser‑automation tools. Your agent calls take_screenshot() directly. No Python. No subprocess management. No infrastructure.
What is PageBolt MCP?
MCP (Model Context Protocol) is a standard for AI agents to call external tools. PageBolt implements the MCP spec so AI agents can:
- Take screenshots of any URL
- Generate PDFs from HTML or web pages
- Record browser interactions as narrated videos
- Inspect page structure (CSS selectors, element text)
- Run multi‑step browser sequences
All of these are direct tool calls from Claude Code, Cursor, or Windsurf. PageBolt MCP translates each call into a request to PageBolt's hosted API, so there is no infrastructure on your end.
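To make the "tool call" idea concrete: under MCP, a client invokes a tool by sending a JSON-RPC 2.0 request with the method tools/call, carrying the tool name and an arguments object. The Python sketch below builds such a message for take_screenshot. The argument names (url, device, fullPage) are borrowed from the examples later in this post and are assumptions about PageBolt's actual tool schema; build_tool_call is a hypothetical helper, not part of any SDK.

```python
import json
from typing import Any


def build_tool_call(request_id: int, tool: str, arguments: dict[str, Any]) -> dict[str, Any]:
    """Build an MCP `tools/call` request as a JSON-RPC 2.0 message."""
    return {
        "jsonrpc": "2.0",
        "id": request_id,
        "method": "tools/call",
        "params": {"name": tool, "arguments": arguments},
    }


# Argument names are assumptions taken from this article's examples,
# not a verified PageBolt schema.
message = build_tool_call(
    1,
    "take_screenshot",
    {"url": "https://example.com", "device": "iphone_14_pro", "fullPage": True},
)
print(json.dumps(message, indent=2))
```

In practice you never write this by hand: the MCP client (Claude Desktop, Cursor, and so on) builds and sends the message when the model decides to use a tool.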
Installation (2 minutes)
Step 1 – Install PageBolt MCP globally
npm install -g pagebolt-mcp
Step 2 – Get your API key
Sign up for a PageBolt account (free tier: 100 requests/month) and copy your API key.
Step 3 – Configure Claude Desktop
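Edit claude_desktop_config.json (macOS: ~/Library/Application Support/Claude/claude_desktop_config.json; Windows: %APPDATA%\Claude\claude_desktop_config.json):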
{
  "mcpServers": {
    "pagebolt": {
      "command": "pagebolt-mcp",
      "env": {
        "PAGEBOLT_API_KEY": "YOUR_API_KEY_HERE"
      }
    }
  }
}
Restart Claude Desktop. Done.
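If your config file already lists other MCP servers, pasting the snippet above over it would drop them. Here is a minimal Python sketch that merges the pagebolt entry into whatever is already there. The helper names add_pagebolt_server and update_config_file are hypothetical, and the path you pass in is whichever config file your client actually reads.

```python
import json
from pathlib import Path
from typing import Any


def add_pagebolt_server(config: dict[str, Any], api_key: str) -> dict[str, Any]:
    """Return a copy of `config` with a `pagebolt` entry under `mcpServers`.

    Other servers are preserved; an existing `pagebolt` entry is replaced.
    """
    servers = dict(config.get("mcpServers", {}))
    servers["pagebolt"] = {
        "command": "pagebolt-mcp",
        "env": {"PAGEBOLT_API_KEY": api_key},
    }
    return {**config, "mcpServers": servers}


def update_config_file(path: Path, api_key: str) -> None:
    """Read the JSON config at `path` (if it exists), add the entry, write it back."""
    config = json.loads(path.read_text()) if path.exists() else {}
    merged = add_pagebolt_server(config, api_key)
    path.write_text(json.dumps(merged, indent=2) + "\n")
```

Calling update_config_file(Path("claude_desktop_config.json"), "YOUR_API_KEY_HERE") would leave every other key in the file untouched.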
Step 4 – Configure Cursor (optional)
Edit .cursor/mcp.json in your project root (or the global Cursor settings):
{
  "mcpServers": {
    "pagebolt": {
      "command": "pagebolt-mcp",
      "env": {
        "PAGEBOLT_API_KEY": "YOUR_API_KEY_HERE"
      }
    }
  }
}
Restart Cursor. Your agent can now call PageBolt tools.
Tools available
Once installed, your agent can call the following functions.
take_screenshot(url, options)
Capture a screenshot of any URL.
Agent: "Take a screenshot of example.com on mobile"
Tool: take_screenshot("https://example.com", {
  "device": "iphone_14_pro",
  "fullPage": true
})
Result: base64 PNG image
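If you handle the result yourself rather than letting the agent look at it, you will want to turn the base64 string back into image bytes. A minimal Python sketch, assuming the result really is a plain base64 string as described above (the exact response envelope is not shown in this post, and decode_png is a hypothetical helper):

```python
import base64

PNG_SIGNATURE = b"\x89PNG\r\n\x1a\n"  # the first 8 bytes of every PNG file


def decode_png(b64_data: str) -> bytes:
    """Decode a base64 string and check that it starts with the PNG signature."""
    raw = base64.b64decode(b64_data, validate=True)
    if not raw.startswith(PNG_SIGNATURE):
        raise ValueError("decoded data is not a PNG image")
    return raw
```

Saving to disk is then one line: Path("shot.png").write_bytes(decode_png(result)), with Path imported from pathlib.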
generate_pdf(url, options)
Generate a PDF from a URL or HTML.
Agent: "Generate a PDF of the checkout page"
Tool: generate_pdf("https://mystore.com/checkout", {
  "format": "A4",
  "margin": "1in"
})
Result: base64 PDF
record_video(steps, narration)
Record a browser interaction as an MP4 with narration.
Agent: "Record a demo of adding a product to cart and checking out"
Tool: record_video([
  {"action": "navigate", "url": "https://mystore.com"},
  {"action": "click", "selector": "button.add-to-cart"},
  {"action": "click", "selector": "a.checkout"}
], {
  "narration": true,
  "audioGuide": "Adding product to cart. Proceeding to checkout."
})
Result: base64 MP4 video
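Keeping steps and narration in sync by hand gets tedious once a flow grows. One way to sketch it in Python is to write each step next to its narration line and split them into the two arguments afterwards. The option keys narration and audioGuide come from the example above; build_video_request is a hypothetical helper.

```python
from typing import Any

Step = dict[str, Any]


def build_video_request(annotated: list[tuple[Step, str]]) -> tuple[list[Step], dict[str, Any]]:
    """Split (step, narration line) pairs into record_video's two arguments."""
    steps = [step for step, _ in annotated]
    # Skip empty lines so silent steps do not add stray spaces to the script.
    lines = [line for _, line in annotated if line]
    options = {"narration": True, "audioGuide": " ".join(lines)}
    return steps, options


steps, options = build_video_request([
    ({"action": "navigate", "url": "https://mystore.com"}, ""),
    ({"action": "click", "selector": "button.add-to-cart"}, "Adding product to cart."),
    ({"action": "click", "selector": "a.checkout"}, "Proceeding to checkout."),
])
```

This reproduces the steps and the audioGuide string from the example above.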
inspect_page(url)
Get a structured map of page elements—buttons, inputs, links with CSS selectors.
Agent: "Inspect the login page and find the submit button"
Tool: inspect_page("https://myapp.com/login")
Result: {
  "forms": [{
    "selector": "form.login-form",
    "inputs": [
      {"selector": "#email", "type": "text", "placeholder": "Email"},
      {"selector": "#password", "type": "password", "placeholder": "Password"}
    ],
    "buttons": [{"selector": "button[type=submit]", "text": "Sign In"}]
  }]
}
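The agent normally reads this map itself, but if you post-process it in your own code, looking up a selector is a short walk over the structure. A Python sketch, assuming the result has exactly the shape shown above (real responses may carry more fields; find_button_selector is a hypothetical helper):

```python
from typing import Any, Optional


def find_button_selector(inspection: dict[str, Any], text: str) -> Optional[str]:
    """Return the selector of the first form button whose text matches, ignoring case."""
    wanted = text.strip().lower()
    for form in inspection.get("forms", []):
        for button in form.get("buttons", []):
            if button.get("text", "").strip().lower() == wanted:
                return button.get("selector")
    return None


# Sample data copied from the inspect_page result above.
login_page = {
    "forms": [{
        "selector": "form.login-form",
        "inputs": [
            {"selector": "#email", "type": "text", "placeholder": "Email"},
            {"selector": "#password", "type": "password", "placeholder": "Password"},
        ],
        "buttons": [{"selector": "button[type=submit]", "text": "Sign In"}],
    }]
}
submit_selector = find_button_selector(login_page, "sign in")
```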
run_sequence(steps, options)
Execute a multi‑step browser automation sequence.
Agent: "Run through the checkout flow and tell me if it succeeds"
Tool: run_sequence([
  {"action": "navigate", "url": "https://mystore.com/checkout"},
  {"action": "fill", "selector": "input[name=email]", "value": "test@example.com"},
  {"action": "click", "selector": "button[type=submit]"},
  {"action": "wait_for", "selector": ".order-confirmation", "timeout": 5000}
], {
  "captureScreenshots": true
})
Result: success/failure + screenshot evidence
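With a free tier of 100 requests/month, it can be worth catching malformed steps before spending a request on them. The Python sketch below checks a step list against the fields used in this post's examples. The required-field table is inferred from those examples only, not from PageBolt's documentation, so treat it as an assumption; validate_steps is a hypothetical helper.

```python
from typing import Any

# Required keys per action, inferred from the examples in this article only.
REQUIRED_KEYS = {
    "navigate": {"url"},
    "click": {"selector"},
    "fill": {"selector", "value"},
    "wait_for": {"selector"},
}


def validate_steps(steps: list[dict[str, Any]]) -> list[str]:
    """Return human-readable problems; an empty list means none were found."""
    problems = []
    for index, step in enumerate(steps):
        action = step.get("action")
        if action not in REQUIRED_KEYS:
            problems.append(f"step {index}: unknown action {action!r}")
            continue
        missing = sorted(REQUIRED_KEYS[action] - set(step))
        if missing:
            problems.append(f"step {index}: {action} is missing {', '.join(missing)}")
    return problems
```

The checkout sequence above passes with no problems, while a fill step without a value is reported by its index.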
Practical examples
Example 1 – Screenshot‑taking agent
User: "Take a screenshot of each competitor's pricing page"
Agent: Calls take_screenshot() for each URL
Result: PNG images for comparison analysis
Your agent can now autonomously capture and analyze screenshots without writing any Puppeteer code.
Example 2 – Demo video generator
User: "Record a narrated demo of our checkout flow"
Agent: Calls record_video() with checkout steps + narration script
Result: MP4 demo video auto‑generated, saved to S3
Instead of manually recording a 5‑minute screencast, your agent produces it in about a minute.
Example 3 – PDF report automation
User: "Generate a PDF report of all product pages"
Agent:
1. Inspect each product page with inspect_page()
2. Collect relevant data (titles, prices, images)
3. Build an HTML template
4. Call generate_pdf() on the template
Result: Consolidated PDF report
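Step 3 is the only part that is not a tool call. A minimal Python sketch of it, assuming step 2 produced a list of dicts with title and price keys (hypothetical field names; build_report_html is a hypothetical helper as well):

```python
from html import escape


def build_report_html(products: list[dict[str, str]]) -> str:
    """Render product titles and prices as a minimal HTML table."""
    rows = "\n".join(
        f"<tr><td>{escape(p['title'])}</td><td>{escape(p['price'])}</td></tr>"
        for p in products
    )
    return (
        "<!DOCTYPE html>\n<html><body>\n<h1>Product report</h1>\n"
        "<table>\n<tr><th>Title</th><th>Price</th></tr>\n"
        f"{rows}\n</table>\n</body></html>"
    )


html_doc = build_report_html([
    {"title": "Mug & Saucer", "price": "$12.00"},
    {"title": "Teapot", "price": "$30.00"},
])
```

Escaping matters here because product titles scraped from a page can contain characters like & or <, which would otherwise break the markup handed to generate_pdf().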
Example 4 – Automated web testing
User: "Test if our checkout flow works on mobile and desktop"
Agent:
- Run checkout sequence on iPhone preset
- Run checkout sequence on desktop preset
- Compare screenshots for visual regression
Result: Pass/fail with evidence (screenshots)
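The "compare screenshots" step needs something to compare against. One simple approach, sketched in Python below, is to store a fingerprint of a known-good screenshot per preset and flag any preset whose new screenshot no longer matches. This is an exact byte comparison, so it will also flag harmless changes such as a timestamp on the page; dedicated visual regression tools use tolerance thresholds instead. The preset names and both helpers are hypothetical.

```python
import base64
import hashlib


def fingerprint(b64_image: str) -> str:
    """SHA-256 hex digest of the decoded image bytes."""
    return hashlib.sha256(base64.b64decode(b64_image)).hexdigest()


def changed_presets(baseline: dict[str, str], current: dict[str, str]) -> list[str]:
    """Return presets whose current screenshot differs from, or lacks, a baseline.

    `baseline` maps preset name to a stored fingerprint;
    `current` maps preset name to a fresh base64 screenshot.
    """
    return sorted(
        preset
        for preset, image in current.items()
        if preset not in baseline or fingerprint(image) != baseline[preset]
    )
```

An empty list means every preset still matches its baseline.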
Real‑World Use Case: AI Agent That Demos Products
User: "Record a demo of the new dashboard feature"
Agent:
1. Calls inspect_page() on staging dashboard
2. Gets CSS selectors for "Add Widget" button, filter controls, etc.
3. Calls record_video() with step‑by‑step interactions
   - Navigate to dashboard
   - Click "Add Widget"
   - Select "Sales Chart"
   - Configure chart options
   - Click "Save"
4. Adds narration: "Adding a sales chart widget to your dashboard..."
5. Returns MP4 file
Result: Professional demo video, auto‑generated, ready to ship
No manual screencast. No waiting for a video editor. Done in ~30 seconds.
Time Comparison: Manual vs. Agent‑Driven
| Task | Manual | AI Agent (PageBolt MCP) |
|---|---|---|
| Screenshot 10 URLs | 2 min | 10 s |
| Record 5‑minute demo | 20 min | 1 min |
| Generate PDF report | 15 min | 2 min |
| Test checkout flow | 10 min | 30 s |
Estimated time saved per week: ≈ 10 hours, depending on how often you run these tasks
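To see what the weekly figure implies, here is the arithmetic from the table as a short Python snippet. Doing each of the four tasks once saves roughly 43 minutes, so saving 10 hours means running the whole set about 14 times a week. The per-task numbers are copied from the table; how often you actually run them is up to your workload.

```python
# (manual seconds, agent seconds) per task, copied from the table above.
TASKS = {
    "Screenshot 10 URLs": (2 * 60, 10),
    "Record 5-minute demo": (20 * 60, 60),
    "Generate PDF report": (15 * 60, 2 * 60),
    "Test checkout flow": (10 * 60, 30),
}

# Seconds saved by doing each of the four tasks once.
saved_per_round = sum(manual - agent for manual, agent in TASKS.values())
# How many such rounds add up to 10 hours.
rounds_for_ten_hours = (10 * 3600) / saved_per_round

print(f"Saved per round of all four tasks: {saved_per_round / 60:.1f} min")
print(f"Rounds per week needed to save 10 h: {rounds_for_ten_hours:.1f}")
```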
Getting Started
- Install – npm install -g pagebolt-mcp
- Get API key – Sign up for a PageBolt account
- Configure – Add the key to your Claude Desktop / Cursor config
- Use – Your agent now has browser tools natively
No infrastructure. No setup. Your AI agent is now a power user of the web. 🚀
Limits and Caveats
- Authentication: Pass cookies/headers if needed
- JavaScript rendering: Full Chromium, waits for network idle
- Localhost: Not accessible (we’re a hosted service)
- Rate limits: Free tier = 100 requests/month; paid tiers scale
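Because of the localhost limit, a request for a local dev URL will burn quota and fail. A small Python pre-check can catch that before the tool call. Treating private and link-local IP ranges the same way as localhost is my assumption (a hosted service generally cannot reach them either), and is_local_target is a hypothetical helper.

```python
import ipaddress
from urllib.parse import urlparse


def is_local_target(url: str) -> bool:
    """True if the URL's host is localhost or a loopback/private/link-local IP literal."""
    host = urlparse(url).hostname
    if host is None:
        return True  # no host at all: nothing a remote service could fetch
    if host == "localhost" or host.endswith(".localhost"):
        return True
    try:
        address = ipaddress.ip_address(host)
    except ValueError:
        return False  # a regular hostname; DNS resolution is not checked here
    return address.is_loopback or address.is_private or address.is_link_local
```

For pages that only exist on your machine, you would need to expose them at a publicly reachable URL first.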
Next Steps
Your agent can now:
- ✅ Screenshot any website
- ✅ Generate PDFs on demand
- ✅ Record and narrate videos
- ✅ Inspect page structure
- ✅ Run complex browser sequences
Use these tools to automate tasks that previously required manual work or fragile Python scripts.
Start free — 100 requests/month, no credit card. Add browser automation to your MCP server in 5 minutes.