How to scrape Google AI Mode: Detailed Guide in 2025
Introduction
Google AI Mode has emerged as one of the fastest and most comprehensive AI search experiences available. Unlike standalone chatbots like ChatGPT and Claude that rely on their training data, AI Mode uses live Google Search results and a “query fan‑out” technique to simultaneously search multiple data sources in real‑time. Because both the Gemini AI model and the search infrastructure are developed by Google, the system seamlessly integrates capabilities from Google Search, Lens, and Image Search for exceptionally fast performance.
For SEO professionals and businesses, AI Mode represents a critical shift in how users discover content. This emerging field, known as GEO (Generative Engine Optimization), focuses on appearing in AI‑generated responses rather than traditional search results. Unlike the classic top‑10 rankings, AI Mode draws from a much broader pool of sources, creating opportunities for brands to get featured even if they don’t rank on page one. When your brand appears in these AI responses, it can:
- Drive traffic
- Generate qualified leads
- Influence purchase decisions at the exact moment users are researching solutions
Tracking AI Mode visibility is quickly becoming as important as monitoring traditional search rankings.
In this article we’ll explore methods for scraping Google AI Mode results. We’ll start with a custom scraper that uses Playwright and proxy servers, then look at a more scalable, production‑ready solution that works reliably at scale without constant maintenance.
What Google AI Mode Contains
Let’s begin by understanding the information that Google AI Mode provides. It contains the following data points:
- Prompt – the user’s query
- Answer – the AI‑generated response
- Links – URLs referenced in the answer
- Citations – links to the source pages
Most importantly, AI Mode responses vary by region. The same query will return different results depending on whether you’re searching from the United States, France, or any other location. As mentioned previously, all these data points and the ability to localize responses are essential for GEO and AI Search tracking.
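To make these data points concrete, here is an illustrative sketch of the structure a scraped result might take (the field names mirror the list above; the values are invented for the example):

example_result = {
    "prompt": "most comfortable sneakers for running",
    "answer": "Runners frequently recommend well-cushioned models such as...",
    "links": ["https://example.com/best-running-shoes"],
    "citations": ["https://example.com/sneaker-reviews"],
}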
In this article we’ll use Python as our primary coding language. The techniques shown can be adapted to other languages as needed. With this background in mind, let’s start with the first method: writing custom code.
Challenges of Web Scraping Google AI Mode
A simple implementation won’t work for scraping AI Mode. There are several reasons for this:
Challenge 1 – Google’s Anti‑Scraping Detection
- Requests without proxies are almost immediately blocked by a CAPTCHA.
- Using a premium proxy service (e.g., Residential Proxies) solves most blocking issues, but you should still expect occasional CAPTCHAs and slow page loads.
Challenge 2 – Layout Changes Break Everything
Google frequently updates its page layouts and HTML selectors. Your selectors will inevitably break, causing scraping failures. For occasional scraping this might be manageable, but for production use (hundreds of queries daily) constantly updating selectors becomes a significant maintenance burden.
Challenge 3 – Geo and Language Mismatches
AI Mode responses are heavily region‑dependent, so selecting proxies with the correct geolocation is critical for accurate results. Some proxy providers let you specify the proxy’s geolocation, making them ideal for this use case. Additionally, you’ll need to set the Accept-Language header in your requests to match your target locale.
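In Playwright, both settings can be configured on the browser context. Here's a minimal sketch, assuming a provider that encodes the target country in the proxy username; the endpoint and credential format below are placeholders, not any specific provider's API:

from playwright.sync_api import sync_playwright

with sync_playwright() as p:
    browser = p.chromium.launch(
        # Placeholder proxy endpoint; many providers encode geolocation
        # in the username, e.g. "USERNAME-cc-FR" (exact format varies).
        proxy={
            "server": "http://proxy.example.com:8080",
            "username": "USERNAME-cc-FR",
            "password": "PASSWORD",
        }
    )
    # Match the browser locale and headers to the proxy's location.
    context = browser.new_context(locale="fr-FR")
    context.set_extra_http_headers({"Accept-Language": "fr-FR,fr;q=0.9"})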
Challenge 4 – Longer, High‑Maintenance Code
These challenges result in complex code that requires constant upkeep: high‑quality proxies, selector updates, performance monitoring, and resource‑intensive browsers (Playwright/Selenium) that consume significant CPU and memory. The maintenance overhead quickly exceeds initial expectations, making custom scrapers impractical for production environments.
Custom AI Mode Web Scraper
To create a Google AI Mode scraper, you have three popular headless‑browser options: Selenium, Playwright, and Puppeteer. We’ll focus on Playwright because it’s popular, easy to use, and well suited to modern web scraping.
Install the Stealth Version of Playwright
pip install playwright playwright-stealth
playwright install chromium
Note: The stealth plugin helps bypass some of Google’s bot‑detection mechanisms.
The code below works today, but expect it to break over time due to selector changes, blocking issues, and other factors discussed earlier.
import json

from playwright.sync_api import sync_playwright
from playwright_stealth import Stealth

query = "most comfortable sneakers for running"

with sync_playwright() as p:
    # Launch a headless Chromium browser.
    browser = p.chromium.launch(headless=True)

    # If you need a proxy, pass it when creating the context, e.g.:
    # context = browser.new_context(proxy={"server": "http://my-proxy:3128"})
    context = browser.new_context(locale="en-US")
    context.set_default_navigation_timeout(60000)
    context.set_extra_http_headers({
        "Accept-Language": "en-US,en;q=0.9"
    })

    page = context.new_page()
    # Apply stealth patches to the page. Note that the API differs between
    # playwright-stealth versions; apply_stealth_sync is the 2.x form.
    Stealth().apply_stealth_sync(page)

    # Navigate to Google and trigger AI Mode.
    page.goto("https://www.google.com")
    page.wait_for_load_state("networkidle")

    # Accept cookies / dismiss dialogs if they appear.
    try:
        page.click("text=I agree", timeout=5000)
    except Exception:
        pass

    # Type the query and press Enter. Google's search box selector changes
    # over time (textarea[name='q'] currently; input[name='q'] is legacy).
    page.fill("textarea[name='q']", query)
    page.keyboard.press("Enter")
    page.wait_for_load_state("networkidle")

    # Click the "AI Mode" button (selector may change).
    try:
        page.click("text=AI Mode")
        page.wait_for_load_state("networkidle")
    except Exception as e:
        print("AI Mode button not found:", e)

    # Extract the answer, links, and citations.
    result = {
        "prompt": query,
        "answer": None,
        "links": [],
        "citations": []
    }

    # The selectors below are examples – inspect the page to get the current ones.
    try:
        result["answer"] = page.inner_text("css=div[data-tts='answer']")
    except Exception:
        pass

    # Extract links inside the answer.
    for el in page.query_selector_all("css=div[data-tts='answer'] a"):
        href = el.get_attribute("href")
        if href:
            result["links"].append(href)

    # Extract citation URLs (usually at the bottom of the AI response).
    for el in page.query_selector_all("css=div[data-tts='citation'] a"):
        href = el.get_attribute("href")
        if href:
            result["citations"].append(href)

    print(json.dumps(result, indent=2))

    # Clean up.
    context.close()
    browser.close()
Key points to remember
- Proxy & Locale – Use residential proxies with the correct geolocation and set the Accept-Language header.
- Stealth – The playwright-stealth package helps reduce the chance of being flagged as a bot.
- Selector Maintenance – Regularly verify the CSS/XPath selectors; Google changes them frequently.
- Error Handling – Wrap interactions in try/except blocks to gracefully handle CAPTCHAs or missing elements (see the sketch below).
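As a concrete illustration of that last point, here's a minimal defensive-handling sketch around the AI Mode click. The CAPTCHA check is a simple heuristic assumption, not an official detection API:

# Heuristic error handling around a fragile interaction (the selector and
# the "recaptcha" marker are assumptions; inspect the live page to confirm).
try:
    page.click("text=AI Mode", timeout=10000)
    page.wait_for_load_state("networkidle")
except Exception:
    if "recaptcha" in page.content().lower():
        print("CAPTCHA encountered; rotate the proxy and retry the query")
    else:
        print("AI Mode button not found; selectors may have changed")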
Next Steps
- Scale up – Move the scraper to a queue‑based architecture (e.g., RabbitMQ + worker pool) to handle many queries concurrently.
- CAPTCHA Solving – Integrate a third‑party CAPTCHA solving service for the occasional challenge, and retry failed queries automatically (a minimal retry sketch follows below).
- Monitoring – Set up alerts for selector failures, increased latency, or proxy bans.
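For the retry part, a simple exponential backoff wrapper is often enough. Below is a minimal sketch; scrape_ai_mode is a hypothetical function wrapping the Playwright logic above, and the attempt limits and delays are arbitrary choices:

import random
import time

def scrape_with_retries(query, max_attempts=3):
    """Retry a scrape with exponential backoff and jitter."""
    for attempt in range(1, max_attempts + 1):
        try:
            # Hypothetical wrapper around the Playwright code above;
            # it should raise an exception on failure.
            return scrape_ai_mode(query)
        except Exception as exc:
            if attempt == max_attempts:
                raise
            delay = 2 ** attempt + random.random()  # 2s, 4s, 8s... plus jitter
            print(f"Attempt {attempt} failed ({exc}); retrying in {delay:.1f}s")
            time.sleep(delay)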
By following the approach above and staying vigilant about Google’s frequent UI changes, you can build a functional pipeline for extracting Google AI Mode results, though keeping it production‑ready demands ongoing maintenance.
Google AI‑Mode Scraper (Playwright)
import json
import urllib.parse

from playwright.sync_api import sync_playwright
from playwright_stealth import Stealth

query = "your search query here"

with sync_playwright() as p:
    browser = p.chromium.launch(
        headless=False,
        args=[
            "--disable-blink-features=AutomationControlled",
            "--disable-dev-shm-usage",
            "--no-sandbox",
        ],
        # Uncomment this to use proxies.
        # proxy={
        #     "server": "http://pr.oxylabs.io:7777",
        #     "username": "customer-USERNAME",
        #     "password": "PASSWORD",
        # },
    )
    context = browser.new_context(
        user_agent=(
            "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) "
            "AppleWebKit/537.36 (KHTML, like Gecko) "
            "Chrome/143.0.0.0 Safari/537.36"
        )
    )
    page = context.new_page()
    # Apply stealth patches (playwright-stealth 2.x API).
    Stealth().apply_stealth_sync(page)

    # udm=50 opens AI Mode directly; hl/gl set the language and country.
    page.goto(
        "https://www.google.com/search?q="
        f"{urllib.parse.quote_plus(query)}&udm=50&hl=en&gl=US"
    )
    page.wait_for_load_state("networkidle")

    container = None
    text_content = ""

    # Look for a suitable result container.
    candidates = page.locator("#search div, #rso > div, div[role='main'] div").all()
    for candidate in candidates[:30]:
        if not candidate.is_visible():
            continue
        text = candidate.inner_text()
        if len(text) > 200 and "http" not in text[:100]:
            container = candidate
            text_content = text
            break

    # Fallback: try to locate the query text directly.
    if not container:
        match = page.get_by_text(query).first
        if match.is_visible():
            container = match.locator("xpath=./ancestor::div[3]")
            text_content = container.inner_text()

    # Final fallback: use the whole page body.
    if not container or len(text_content) < 100:
        container = page.locator("body")
        text_content = page.inner_text("body")

    # Extract links from the chosen container.
    links = []
    if container:
        for link in container.locator("a").all():
            href = link.get_attribute("href")
            title = link.inner_text()
            if href and href.startswith("http"):
                links.append({"title": title.strip(), "url": href})

    output_data = {
        "content": text_content.strip(),
        # Deduplicate links by URL, keeping the last occurrence.
        "links": list({l["url"]: l for l in links}.values()),
    }

    print(json.dumps(output_data, indent=2))
    with open("ai_mode_data.json", "w") as f:
        json.dump(output_data, f, indent=2)

    browser.close()

print("Done!")
Running the code should save a JSON file containing the scraped AI Mode response and the links it references.
Note: A CAPTCHA or other blocks may hinder execution.
The Best Solution: AI‑Mode Scraper API
Custom code can be overly complex, lengthy, and unreliable. A much simpler approach is to use a dedicated service like Oxylabs Web Scraper API, which includes built‑in support for Google AI‑Mode scraping. The API handles proxies, browser rendering, CAPTCHAs, and selector changes for you.
Install the requests library
pip install requests
Minimal API example
import json

import requests

# API parameters.
payload = {
    "source": "google_ai_mode",
    "query": "most comfortable sneakers for running",
    "render": "html",
    "parse": True,
    "geo_location": "United States",
}

response = requests.post(
    "https://realtime.oxylabs.io/v1/queries",
    # Free trial available at dashboard.oxylabs.io
    auth=("USERNAME", "PASSWORD"),
    json=payload,
)
response.raise_for_status()
print(response.text)

with open("AI_Mode_scraper_data.json", "w") as f:
    json.dump(response.json(), f, indent=2)

print("Done!")
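To work with the result programmatically rather than just saving it, you can pull the parsed payload out of the response JSON. Here's a minimal sketch assuming the standard Oxylabs layout of a "results" list with a "content" field; the inner keys vary by source and version, so inspect your saved file for the exact names:

data = response.json()

# Realtime responses wrap each job's output in a "results" list; the parsed
# AI Mode data sits under "content" (exact inner keys are an assumption here).
content = data["results"][0]["content"]
print(json.dumps(content, indent=2)[:500])  # preview the structure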
After execution, the saved JSON file will contain a structured result with the AI‑generated answer, links, and citations.
Understanding the payload
payload = {
    "source": "google_ai_mode",
    "query": "most comfortable sneakers for running",
    "render": "html",
    "parse": True,
    "geo_location": "United States",
}
- source – selects the scraper to use; google_ai_mode fetches AI Mode results.
- render – returns the fully rendered HTML, ensuring all dynamic content is loaded.
- parse – enables automatic data parsing, so you don’t need custom parsers.
- geo_location – localizes results. You can specify any country, state, city, or precise coordinates, e.g.:
"geo_location": "New York,New York,United States"
With a single subscription you also gain access to many other pre‑built sources (Google Search, Amazon, ChatGPT, etc.) and can scale to hundreds or thousands of requests without worrying about blocks, interruptions, or maintenance.
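Because blocking and retries are handled server‑side, scaling is mostly a matter of sending more requests. Here's a sketch that fans a list of queries out over a small thread pool; the pool size and timeout are arbitrary choices, and the credentials are the same placeholders as above:

import concurrent.futures

import requests

QUERIES = [
    "most comfortable sneakers for running",
    "best trail running shoes",
]

def fetch(query):
    payload = {
        "source": "google_ai_mode",
        "query": query,
        "render": "html",
        "parse": True,
        "geo_location": "United States",
    }
    response = requests.post(
        "https://realtime.oxylabs.io/v1/queries",
        auth=("USERNAME", "PASSWORD"),
        json=payload,
        timeout=180,  # rendered AI Mode responses can take a while
    )
    response.raise_for_status()
    return response.json()

# Fetch all queries concurrently with a modest worker count.
with concurrent.futures.ThreadPoolExecutor(max_workers=5) as pool:
    results = list(pool.map(fetch, QUERIES))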
For more details, see the AI Mode scraper documentation.
Advantages of using a web scraping API
The Google AI Mode scraper API makes AI‑response extraction effortless, with no custom code required. Here’s why:
- No infrastructure to maintain – No browsers to manage, no retry logic to look after, no IP rotation to code yourself. Just send an API request and get your results.
- Premium proxies under the hood – The API’s built‑in proxy pool is managed by a smart ML‑driven engine that handles rotation and CAPTCHAs for you.
- Resilience to Google layout changes – When Google updates its UI, Oxylabs updates its backend. Your code stays untouched.
Final Thoughts
Scraping Google AI Mode can be straightforward or challenging, depending on the approach you choose.
- Writing your own code gives you full control, but maintenance becomes a burden over time.
- A custom solution requires:
- Smart browser‑environment management
- Logic to bypass strict anti‑scraping systems
- Integration of premium proxy servers
- Custom data parsing
- Continuous maintenance, among many other considerations.
The Oxylabs Web Scraper API handles all of these hurdles for you. Just send a request and receive parsed data in seconds. The API also includes pre‑built scrapers and parsers for popular sites like Google Search, Amazon, and ChatGPT, so you don’t have to build and maintain separate solutions for each website.