How to scrape Google AI Mode: Detailed Guide in 2025
Introduction
Google AI Mode has emerged as one of the fastest and most comprehensive AI search experiences available. Unlike standalone chatbots like ChatGPT and Claude that rely on their training data, AI Mode uses live Google Search results and a “query fan‑out” technique to simultaneously search multiple data sources in real‑time. Because both the Gemini AI model and the search infrastructure are developed by Google, the system seamlessly integrates capabilities from Google Search, Lens, and Image Search for exceptionally fast performance.
For SEO professionals and businesses, AI Mode represents a critical shift in how users discover content. This emerging field, known as GEO (Generative Engine Optimization), focuses on appearing in AI‑generated responses rather than traditional search results. Unlike the classic top‑10 rankings, AI Mode draws from a much broader pool of sources, creating opportunities for brands to get featured even if they don’t rank on page one. When your brand appears in these AI responses, it can:
- Drive traffic
- Generate qualified leads
- Influence purchase decisions at the exact moment users are researching solutions
Tracking AI Mode visibility is quickly becoming as important as monitoring traditional search rankings.
In this article we’ll explore methods for scraping Google AI Mode results. We’ll start with a custom scraper that uses Playwright and proxy servers, then look at a more scalable, production‑ready solution that works reliably at scale without constant maintenance.
What Google AI Mode Contains
Let’s begin by understanding the information that Google AI Mode provides. It contains the following data points:
- Prompt – the user’s query
- Answer – the AI‑generated response
- Links – URLs referenced in the answer
- Citations – links to the source pages
Most importantly, AI Mode responses vary by region. The same query will return different results depending on whether you’re searching from the United States, France, or any other location. As mentioned previously, all these data points and the ability to localize responses are essential for GEO and AI Search tracking.
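To make these data points concrete, here is an illustrative sketch of the structure a scraped result might take (the field names mirror the list above; the values are invented for the example):

example_result = {
    "prompt": "most comfortable sneakers for running",
    "answer": "Runners frequently recommend well-cushioned models such as...",
    "links": ["https://example.com/best-running-shoes"],
    "citations": ["https://example.com/sneaker-reviews"],
}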
In this article we’ll use Python as our primary coding language. The techniques shown can be adapted to other languages as needed. With this background in mind, let’s start with the first method: writing custom code.
Challenges of Web Scraping Google AI Mode
A simple implementation won’t work for scraping AI Mode. There are several reasons for this:
Challenge 1 – Google’s Anti‑Scraping Detection
- Requests without proxies are almost immediately blocked by a CAPTCHA.
- Using a premium proxy service (e.g., Residential Proxies) solves most blocking issues, but you should still expect occasional CAPTCHAs and slow page loads.
Challenge 2 – Layout Changes Break Everything
Google frequently updates its page layouts and HTML selectors. Your selectors will inevitably break, causing scraping failures. For occasional scraping this might be manageable, but for production use (hundreds of queries daily) constantly updating selectors becomes a significant maintenance burden.
Challenge 3 – Geo and Language Mismatches
AI Mode responses are heavily region‑dependent, so selecting proxies with the correct geolocation is critical for accurate results. Some proxy providers let you specify the proxy’s geolocation, making them ideal for this use case. Additionally, you’ll need to set the Accept-Language header in your requests to match your target locale.
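In Playwright, both settings can be configured on the browser context. Here's a minimal sketch, assuming a provider that encodes the target country in the proxy username; the endpoint and credential format below are placeholders, not any specific provider's API:

from playwright.sync_api import sync_playwright

with sync_playwright() as p:
    browser = p.chromium.launch(
        # Placeholder proxy endpoint; many providers encode geolocation
        # in the username, e.g. "USERNAME-cc-FR" (exact format varies).
        proxy={
            "server": "http://proxy.example.com:8080",
            "username": "USERNAME-cc-FR",
            "password": "PASSWORD",
        }
    )
    # Match the browser locale and headers to the proxy's location.
    context = browser.new_context(locale="fr-FR")
    context.set_extra_http_headers({"Accept-Language": "fr-FR,fr;q=0.9"})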
Challenge 4 – Longer, High‑Maintenance Code
These challenges result in complex code that requires constant upkeep: high‑quality proxies, selector updates, performance monitoring, and resource‑intensive browsers (Playwright/Selenium) that consume significant CPU and memory. The maintenance overhead quickly exceeds initial expectations, making custom scrapers impractical for production environments.
Custom AI Mode Web Scraper
To create a Google AI Mode scraper, you have three popular headless‑browser options: Selenium, Playwright, and Puppeteer. We’ll focus on Playwright because it’s popular, easy to use, and well suited to modern web scraping.
Install the Stealth Version of Playwright
pip install playwright playwright-stealth
playwright install chromium
Note: The stealth plugin helps bypass some of Google’s bot‑detection mechanisms.
The code below works today, but expect it to break over time due to selector changes, blocking issues, and other factors discussed earlier.
import json

from playwright.sync_api import sync_playwright
from playwright_stealth import Stealth

query = "most comfortable sneakers for running"

with sync_playwright() as p:
    # Launch a headless Chromium browser.
    browser = p.chromium.launch(headless=True)

    # If you need a proxy, pass it when creating the context, e.g.:
    # context = browser.new_context(proxy={"server": "http://my-proxy:3128"})
    context = browser.new_context(locale="en-US")
    context.set_default_navigation_timeout(60000)
    context.set_extra_http_headers({
        "Accept-Language": "en-US,en;q=0.9"
    })

    page = context.new_page()
    # Apply stealth patches to the page. Note that the API differs between
    # playwright-stealth versions; apply_stealth_sync is the 2.x form.
    Stealth().apply_stealth_sync(page)

    # Navigate to Google and trigger AI Mode.
    page.goto("https://www.google.com")
    page.wait_for_load_state("networkidle")

    # Accept cookies / dismiss dialogs if they appear.
    try:
        page.click("text=I agree", timeout=5000)
    except Exception:
        pass

    # Type the query and press Enter. Google's search box selector changes
    # over time (textarea[name='q'] currently; input[name='q'] is legacy).
    page.fill("textarea[name='q']", query)
    page.keyboard.press("Enter")
    page.wait_for_load_state("networkidle")

    # Click the "AI Mode" button (selector may change).
    try:
        page.click("text=AI Mode")
        page.wait_for_load_state("networkidle")
    except Exception as e:
        print("AI Mode button not found:", e)

    # Extract the answer, links, and citations.
    result = {
        "prompt": query,
        "answer": None,
        "links": [],
        "citations": []
    }

    # The selectors below are examples – inspect the page to get the current ones.
    try:
        result["answer"] = page.inner_text("css=div[data-tts='answer']")
    except Exception:
        pass

    # Extract links inside the answer.
    for el in page.query_selector_all("css=div[data-tts='answer'] a"):
        href = el.get_attribute("href")
        if href:
            result["links"].append(href)

    # Extract citation URLs (usually at the bottom of the AI response).
    for el in page.query_selector_all("css=div[data-tts='citation'] a"):
        href = el.get_attribute("href")
        if href:
            result["citations"].append(href)

    print(json.dumps(result, indent=2))

    # Clean up.
    context.close()
    browser.close()
Key points to remember
- Proxy & Locale – Use residential proxies with the correct geolocation and set the Accept-Language header.
- Stealth – The playwright-stealth package helps reduce the chance of being flagged as a bot.
- Selector Maintenance – Regularly verify the CSS/XPath selectors; Google changes them frequently.
- Error Handling – Wrap interactions in try/except blocks to gracefully handle CAPTCHAs or missing elements (see the sketch below).
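As a concrete illustration of that last point, here's a minimal defensive-handling sketch around the AI Mode click. The CAPTCHA check is a simple heuristic assumption, not an official detection API:

# Heuristic error handling around a fragile interaction (the selector and
# the "recaptcha" marker are assumptions; inspect the live page to confirm).
try:
    page.click("text=AI Mode", timeout=10000)
    page.wait_for_load_state("networkidle")
except Exception:
    if "recaptcha" in page.content().lower():
        print("CAPTCHA encountered; rotate the proxy and retry the query")
    else:
        print("AI Mode button not found; selectors may have changed")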
Next Steps
- Scale up – Move the scraper to a queue‑based architecture (e.g., RabbitMQ + worker pool) to handle many queries concurrently.
- CAPTCHA Solving – Integrate a third‑party CAPTCHA solving service for the occasional challenge, and retry failed queries automatically (a minimal retry sketch follows below).
- Monitoring – Set up alerts for selector failures, increased latency, or proxy bans.
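For the retry part, a simple exponential backoff wrapper is often enough. Below is a minimal sketch; scrape_ai_mode is a hypothetical function wrapping the Playwright logic above, and the attempt limits and delays are arbitrary choices:

import random
import time

def scrape_with_retries(query, max_attempts=3):
    """Retry a scrape with exponential backoff and jitter."""
    for attempt in range(1, max_attempts + 1):
        try:
            # Hypothetical wrapper around the Playwright code above;
            # it should raise an exception on failure.
            return scrape_ai_mode(query)
        except Exception as exc:
            if attempt == max_attempts:
                raise
            delay = 2 ** attempt + random.random()  # 2s, 4s, 8s... plus jitter
            print(f"Attempt {attempt} failed ({exc}); retrying in {delay:.1f}s")
            time.sleep(delay)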
By following the approach above and staying vigilant about Google’s frequent UI changes, you can build a functional pipeline for extracting Google AI Mode results, though keeping it production‑ready demands ongoing maintenance.
Google AI‑Mode Scraper (Playwright)
import json
import urllib.parse

from playwright.sync_api import sync_playwright
from playwright_stealth import Stealth

query = "your search query here"

with sync_playwright() as p:
    browser = p.chromium.launch(
        headless=False,
        args=[
            "--disable-blink-features=AutomationControlled",
            "--disable-dev-shm-usage",
            "--no-sandbox",
        ],
        # Uncomment this to use proxies.
        # proxy={
        #     "server": "http://pr.oxylabs.io:7777",
        #     "username": "customer-USERNAME",
        #     "password": "PASSWORD",
        # },
    )
    context = browser.new_context(
        user_agent=(
            "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) "
            "AppleWebKit/537.36 (KHTML, like Gecko) "
            "Chrome/143.0.0.0 Safari/537.36"
        )
    )
    page = context.new_page()
    # Apply stealth patches (playwright-stealth 2.x API).
    Stealth().apply_stealth_sync(page)

    # udm=50 opens AI Mode directly; hl/gl set the language and country.
    page.goto(
        "https://www.google.com/search?q="
        f"{urllib.parse.quote_plus(query)}&udm=50&hl=en&gl=US"
    )
    page.wait_for_load_state("networkidle")

    container = None
    text_content = ""

    # Look for a suitable result container.
    candidates = page.locator("#search div, #rso > div, div[role='main'] div").all()
    for candidate in candidates[:30]:
        if not candidate.is_visible():
            continue
        text = candidate.inner_text()
        if len(text) > 200 and "http" not in text[:100]:
            container = candidate
            text_content = text
            break

    # Fallback: try to locate the query text directly.
    if not container:
        match = page.get_by_text(query).first
        if match.is_visible():
            container = match.locator("xpath=./ancestor::div[3]")
            text_content = container.inner_text()

    # Final fallback: use the whole page body.
    if not container or len(text_content) < 100:
        container = page.locator("body")
        text_content = page.inner_text("body")

    # Extract links from the chosen container.
    links = []
    if container:
        for link in container.locator("a").all():
            href = link.get_attribute("href")
            title = link.inner_text()
            if href and href.startswith("http"):
                links.append({"title": title.strip(), "url": href})

    output_data = {
        "content": text_content.strip(),
        # Deduplicate links by URL, keeping the last occurrence.
        "links": list({l["url"]: l for l in links}.values()),
    }

    print(json.dumps(output_data, indent=2))
    with open("ai_mode_data.json", "w") as f:
        json.dump(output_data, f, indent=2)

    browser.close()

print("Done!")
Running the code should save a JSON file containing the scraped AI Mode response and the links it references.
Note: A CAPTCHA or other blocks may hinder execution.
The Best Solution: AI‑Mode Scraper API
Custom code can be overly complex, lengthy, and unreliable. A much simpler approach is to use a dedicated service like Oxylabs Web Scraper API, which includes built‑in support for Google AI‑Mode scraping. The API handles proxies, browser rendering, CAPTCHAs, and selector changes for you.
Install the requests library
pip install requests
Minimal API example
import json

import requests

# API parameters.
payload = {
    "source": "google_ai_mode",
    "query": "most comfortable sneakers for running",
    "render": "html",
    "parse": True,
    "geo_location": "United States",
}

response = requests.post(
    "https://realtime.oxylabs.io/v1/queries",
    # Free trial available at dashboard.oxylabs.io
    auth=("USERNAME", "PASSWORD"),
    json=payload,
)
response.raise_for_status()
print(response.text)

with open("AI_Mode_scraper_data.json", "w") as f:
    json.dump(response.json(), f, indent=2)

print("Done!")
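To work with the result programmatically rather than just saving it, you can pull the parsed payload out of the response JSON. Here's a minimal sketch assuming the standard Oxylabs layout of a "results" list with a "content" field; the inner keys vary by source and version, so inspect your saved file for the exact names:

data = response.json()

# Realtime responses wrap each job's output in a "results" list; the parsed
# AI Mode data sits under "content" (exact inner keys are an assumption here).
content = data["results"][0]["content"]
print(json.dumps(content, indent=2)[:500])  # preview the structure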
After execution, the saved JSON file will contain a structured result with the AI‑generated answer, links, and citations.
Understanding the payload
payload = {
    "source": "google_ai_mode",
    "query": "most comfortable sneakers for running",
    "render": "html",
    "parse": True,
    "geo_location": "United States",
}
- source – selects the scraper to use; google_ai_mode fetches AI Mode results.
- render – returns the fully rendered HTML, ensuring all dynamic content is loaded.
- parse – enables automatic data parsing, so you don’t need custom parsers.
- geo_location – localizes results. You can specify any country, state, city, or precise coordinates, e.g.:
"geo_location": "New York,New York,United States"
With a single subscription you also gain access to many other pre‑built sources (Google Search, Amazon, ChatGPT, etc.) and can scale to hundreds or thousands of requests without worrying about blocks, interruptions, or maintenance.
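Because blocking and retries are handled server‑side, scaling is mostly a matter of sending more requests. Here's a sketch that fans a list of queries out over a small thread pool; the pool size and timeout are arbitrary choices, and the credentials are the same placeholders as above:

import concurrent.futures

import requests

QUERIES = [
    "most comfortable sneakers for running",
    "best trail running shoes",
]

def fetch(query):
    payload = {
        "source": "google_ai_mode",
        "query": query,
        "render": "html",
        "parse": True,
        "geo_location": "United States",
    }
    response = requests.post(
        "https://realtime.oxylabs.io/v1/queries",
        auth=("USERNAME", "PASSWORD"),
        json=payload,
        timeout=180,  # rendered AI Mode responses can take a while
    )
    response.raise_for_status()
    return response.json()

# Fetch all queries concurrently with a modest worker count.
with concurrent.futures.ThreadPoolExecutor(max_workers=5) as pool:
    results = list(pool.map(fetch, QUERIES))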
For more details, see the AI Mode scraper documentation.
Advantages of using a web scraping API
The Google AI Mode scraper API makes AI‑response extraction effortless, with no custom code required. Here’s why:
- No infrastructure to maintain – No browsers to manage, no retry logic to look after, no IP rotation to code yourself. Just send an API request and get your results.
- Premium proxies under the hood – The API’s built‑in proxy pool is managed by a smart ML‑driven engine that handles rotation and CAPTCHAs for you.
- Resilience to Google layout changes – When Google updates its UI, Oxylabs updates its backend. Your code stays untouched.
Final Thoughts
Scraping Google AI Mode can be straightforward or challenging, depending on the approach you choose.
- Writing your own code gives you full control, but maintenance becomes a burden over time.
- A custom solution requires:
- Smart browser‑environment management
- Logic to bypass strict anti‑scraping systems
- Integration of premium proxy servers
- Custom data parsing
- Continuous maintenance, among many other considerations.
The Oxylabs Web Scraper API handles all of these hurdles for you. Just send a request and receive parsed data in seconds. The API also includes pre‑built scrapers and parsers for popular sites like Google Search, Amazon, and ChatGPT, so you don’t have to build and maintain separate solutions for each website.