How to Bypass reCAPTCHA and Turnstile in Crawlee with CapSolver

Published: (December 24, 2025 at 03:47 AM EST)
8 min read
Source: Dev.to

TL;DR

Modern web scraping with Crawlee is often halted by aggressive CAPTCHA challenges. By integrating CapSolver, you can programmatically bypass reCAPTCHA, Turnstile, and other anti‑bot mechanisms, keeping your scraping workflows stable and fully automated.

When you build robust web crawlers with libraries like Crawlee, running into a CAPTCHA (Completely Automated Public Turing test to tell Computers and Humans Apart) is practically inevitable. Bot‑protection services such as Google’s reCAPTCHA and Cloudflare’s Turnstile are designed to block automated access, and they often bring even sophisticated Playwright or Puppeteer crawlers to a standstill.

This guide provides a practical, code‑focused approach to integrating CapSolver with Crawlee to automatically detect and bypass these common CAPTCHA types. We’ll inject the solution tokens directly into the page context, allowing your crawler to proceed as if a human had completed the challenge.

Crawlee is a powerful, open‑source web‑scraping and browser‑automation library for Node.js. It’s built to create reliable, production‑ready crawlers that can mimic human behavior and evade basic bot detection.

Features

| Feature | Description |
| --- | --- |
| Unified API | A single interface for both fast HTTP‑based crawling (Cheerio) and full browser automation (Playwright/Puppeteer). |
| Anti‑Bot Stealth | Built‑in features for automatic browser fingerprint generation and session management to appear human‑like. |
| Smart Queue | Persistent request‑queue management for breadth‑first or depth‑first crawling. |
| Proxy Rotation | Seamless integration with proxy providers for IP rotation and avoiding blocks. |

Crawlee’s strength lies in its ability to handle complex navigation, but when a hard CAPTCHA barrier is hit, an external service is required.

CapSolver is a leading CAPTCHA‑bypass service that uses AI to solve various challenges quickly and accurately. It provides a simple REST API that makes it ideal for integration into automated workflows like Crawlee.

CapSolver supports a wide array of challenges:

  • reCAPTCHA v2 (Checkbox and Invisible)
  • reCAPTCHA v3 (Score‑based)
  • Cloudflare Turnstile
  • AWS WAF
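
Each challenge in the list corresponds to a task `type` string in the request sent to CapSolver. As a minimal sketch, the builder below maps the three challenge kinds covered in this guide to the task types that appear in its service class. `CaptchaKind`, `CapSolverTask`, and `buildTask` are illustrative names, not part of the CapSolver API, and AWS WAF is left out because this guide does not show its task type.

```typescript
// Illustrative helper (hypothetical names): builds the `task` object of a createTask request.
type CaptchaKind = 'recaptcha-v2' | 'recaptcha-v3' | 'turnstile';

interface CapSolverTask {
    type: string;
    websiteURL: string;
    websiteKey: string;
    pageAction?: string;
}

// Task type strings as used by the service class in this guide.
const TASK_TYPES: Record<CaptchaKind, string> = {
    'recaptcha-v2': 'ReCaptchaV2TaskProxyLess',
    'recaptcha-v3': 'ReCaptchaV3TaskProxyLess',
    'turnstile': 'AntiTurnstileTaskProxyLess',
};

function buildTask(
    kind: CaptchaKind,
    websiteURL: string,
    websiteKey: string,
    pageAction?: string,
): CapSolverTask {
    const task: CapSolverTask = { type: TASK_TYPES[kind], websiteURL, websiteKey };
    // pageAction is only attached for score-based reCAPTCHA v3 tasks.
    if (kind === 'recaptcha-v3' && pageAction) {
        task.pageAction = pageAction;
    }
    return task;
}
```

Keeping the mapping in one table means a new challenge type only needs one extra entry rather than a new method.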

3️⃣ Core Integration: Setting up the CapSolver Service

1️⃣ Install the required packages

npm install crawlee playwright axios
# or
yarn add crawlee playwright axios

2️⃣ Create the service class (capsolver-service.ts)

// capsolver-service.ts
import axios from 'axios';

const CAPSOLVER_API_KEY = 'YOUR_CAPSOLVER_API_KEY';

interface TaskResult {
    status: string;
    solution?: {
        gRecaptchaResponse?: string;
        token?: string;
    };
    errorDescription?: string;
}

class CapSolverService {
    private apiKey: string;
    private baseUrl = 'https://api.capsolver.com';

    constructor(apiKey: string = CAPSOLVER_API_KEY) {
        this.apiKey = apiKey;
    }

    /** 1️⃣ Create a new CAPTCHA task and return the task ID */
    async createTask(taskData: object): Promise<string> {
        const response = await axios.post(`${this.baseUrl}/createTask`, {
            clientKey: this.apiKey,
            task: taskData,
        });

        if (response.data.errorId !== 0) {
            throw new Error(`CapSolver error: ${response.data.errorDescription}`);
        }

        return response.data.taskId;
    }

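    /** 2️⃣ Poll the API until the task is ready or fails */
    async getTaskResult(taskId: string, maxAttempts = 60): Promise<TaskResult> {
        for (let i = 0; i < maxAttempts; i++) {
            // Wait between polls (the 2-second interval is an assumed value)
            await this.delay(2000);

            const response = await axios.post(`${this.baseUrl}/getTaskResult`, {
                clientKey: this.apiKey,
                taskId,
            });
            const result: TaskResult = response.data;

            if (result.status === 'ready') return result;
            if (result.status === 'failed') {
                throw new Error(`CapSolver task failed: ${result.errorDescription}`);
            }
        }

        throw new Error(`CapSolver task ${taskId} timed out after ${maxAttempts} attempts`);
    }

    /** Resolve after the given number of milliseconds */
    private delay(ms: number): Promise<void> {
        return new Promise((resolve) => setTimeout(resolve, ms));
    }

    /** 3️⃣ Bypass reCAPTCHA v2 */
    async bypassReCaptchaV2(websiteUrl: string, websiteKey: string): Promise<string> {
        const taskId = await this.createTask({
            type: 'ReCaptchaV2TaskProxyLess',
            websiteURL: websiteUrl,
            websiteKey,
        });

        const result = await this.getTaskResult(taskId);
        return result.solution?.gRecaptchaResponse ?? '';
    }

    /** 4️⃣ Bypass Cloudflare Turnstile */
    async bypassTurnstile(websiteUrl: string, websiteKey: string): Promise<string> {
        const taskId = await this.createTask({
            type: 'AntiTurnstileTaskProxyLess',
            websiteURL: websiteUrl,
            websiteKey,
        });

        const result = await this.getTaskResult(taskId);
        return result.solution?.token ?? '';
    }

    /** 5️⃣ Bypass reCAPTCHA v3 */
    async bypassReCaptchaV3(
        websiteUrl: string,
        websiteKey: string,
        pageAction = 'submit'
    ): Promise<string> {
        const taskId = await this.createTask({
            type: 'ReCaptchaV3TaskProxyLess',
            websiteURL: websiteUrl,
            websiteKey,
            pageAction,
        });

        const result = await this.getTaskResult(taskId);
        return result.solution?.gRecaptchaResponse ?? '';
    }
}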

export const capSolver = new CapSolverService();
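
The polling step can also be separated from the HTTP client, which makes it easy to exercise without network access. The sketch below assumes a status field that takes the values `processing`, `ready`, or `failed`. `PollResult`, `pollUntilReady`, `fetchStatus`, and `sleep` are hypothetical names introduced for illustration, and the two-second default interval is an arbitrary choice.

```typescript
// Hypothetical shape of one polling response.
interface PollResult<T> {
    status: 'processing' | 'ready' | 'failed';
    solution?: T;
    errorDescription?: string;
}

// Calls fetchStatus until the task is ready, fails, or maxAttempts is exhausted.
async function pollUntilReady<T>(
    fetchStatus: () => Promise<PollResult<T>>,
    maxAttempts = 60,
    intervalMs = 2000,
    sleep: (ms: number) => Promise<void> = (ms) =>
        new Promise<void>((resolve) => setTimeout(resolve, ms)),
): Promise<T> {
    for (let attempt = 0; attempt < maxAttempts; attempt++) {
        const result = await fetchStatus();

        if (result.status === 'ready') {
            if (result.solution === undefined) {
                throw new Error('Task reported ready but returned no solution');
            }
            return result.solution;
        }
        if (result.status === 'failed') {
            throw new Error(`Task failed: ${result.errorDescription ?? 'unknown error'}`);
        }

        // Still processing: wait before asking again.
        await sleep(intervalMs);
    }
    throw new Error(`Task not ready after ${maxAttempts} attempts`);
}
```

Passing `sleep` as a parameter lets a test replace the real delay with a no-op, so the loop logic can be checked in milliseconds.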

Core Logic Overview

  1. Detect the CAPTCHA element on the page.
  2. Extract the data-sitekey and the page URL.
  3. Call the appropriate capSolver.bypass… method to obtain a token.
  4. Inject the returned token into the hidden form field.
  5. Submit the form to continue the scraping process.

Note:

  • reCAPTCHA v2 is typically visible as a checkbox. The token must be injected into the hidden textarea whose id is g-recaptcha-response.
  • Cloudflare Turnstile uses a different hidden input field (cf-turnstile-response).
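
Because the response field differs per challenge type, a small lookup keeps the injection code uniform. The selectors in this sketch are the ones used in this guide's examples, and real pages may name or place these fields differently. `ChallengeType`, `RESPONSE_FIELD_SELECTORS`, and `planInjection` are illustrative names. The empty-token check is there because the service methods in this guide return an empty string when no solution is present.

```typescript
type ChallengeType = 'recaptcha-v2' | 'recaptcha-v3' | 'turnstile';

// Selector of the hidden field that receives the token, per challenge type.
const RESPONSE_FIELD_SELECTORS: Record<ChallengeType, string> = {
    'recaptcha-v2': '#g-recaptcha-response',
    'recaptcha-v3': 'input[name="g-recaptcha-response"]',
    'turnstile': 'input[name="cf-turnstile-response"]',
};

interface InjectionPlan {
    selector: string;
    value: string;
}

// Decides where a token should go, and refuses to inject an empty token.
function planInjection(type: ChallengeType, token: string): InjectionPlan {
    if (token.trim() === '') {
        throw new Error(`Empty token for ${type}; refusing to inject`);
    }
    return { selector: RESPONSE_FIELD_SELECTORS[type], value: token };
}
```

Failing early on an empty token avoids submitting a form that is certain to be rejected.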

reCAPTCHA v2 Example

import { PlaywrightCrawler, Dataset } from 'crawlee';
import { capSolver } from './capsolver-service';

const crawler = new PlaywrightCrawler({
    async requestHandler({ page, request, log }) {
        log.info(`Processing ${request.url}`);

        // 1️⃣ Detect reCAPTCHA v2 element
        const hasRecaptcha = await page.$('.g-recaptcha');

        if (hasRecaptcha) {
            log.info('reCAPTCHA v2 detected, initiating bypass...');

            // 2️⃣ Extract the site key
            const siteKey = await page.$eval(
                '.g-recaptcha',
                el => el.getAttribute('data-sitekey')
            );

            if (siteKey) {
                // 3️⃣ Get the bypass token from CapSolver
                const token = await capSolver.bypassReCaptchaV2(request.url, siteKey);

                // 4️⃣ Inject the token into the hidden textarea
                await page.$eval(
                    '#g-recaptcha-response',
                    (el: HTMLTextAreaElement, t: string) => {
                        // Optional: make it visible for debugging
                        el.style.display = 'block';
                        el.value = t;
                    },
                    token
                );

                // 5️⃣ Submit the form
                await page.click('button[type="submit"]');
                await page.waitForLoadState('networkidle');

                log.info('reCAPTCHA v2 successfully bypassed!');
            }
        }

        // Continue with data extraction…
        const title = await page.title();
        await Dataset.pushData({ title, url: request.url });
    },
});

await crawler.run(['https://example.com/protected-page']);

Cloudflare Turnstile Example

import { PlaywrightCrawler, Dataset } from 'crawlee';
import { capSolver } from './capsolver-service';

const crawler = new PlaywrightCrawler({
    async requestHandler({ page, request, log }) {
        log.info(`Processing ${request.url}`);

        // 1️⃣ Detect Turnstile widget
        const hasTurnstile = await page.$('.cf-turnstile');

        if (hasTurnstile) {
            log.info('Cloudflare Turnstile detected, initiating bypass...');

            // 2️⃣ Extract the site key
            const siteKey = await page.$eval(
                '.cf-turnstile',
                el => el.getAttribute('data-sitekey')
            );

            if (siteKey) {
                // 3️⃣ Get the bypass token
                const token = await capSolver.bypassTurnstile(request.url, siteKey);

                // 4️⃣ Inject token into the hidden input
                await page.$eval(
                    'input[name="cf-turnstile-response"]',
                    (el: HTMLInputElement, t: string) => {
                        el.value = t;
                    },
                    token
                );

                // 5️⃣ Submit the form
                await page.click('button[type="submit"]');
                await page.waitForLoadState('networkidle');

                log.info('Turnstile successfully bypassed!');
            }
        }

        // Continue with data extraction…
        const title = await page.title();
        await Dataset.pushData({ title, url: request.url });
    },
});

await crawler.run(['https://example.com/turnstile-protected']);

Dynamic CAPTCHA Detection & Bypass (Production‑Grade)

Instead of writing separate handlers for each CAPTCHA type, you can create a utility that automatically detects the challenge and calls the correct bypass method.

// ------------------------------------------------
// Types & Helper
// ------------------------------------------------
interface CaptchaInfo {
    type: 'recaptcha-v2' | 'recaptcha-v3' | 'turnstile' | 'none';
    siteKey: string | null;
}

// ------------------------------------------------
// Detection
// ------------------------------------------------
async function detectCaptcha(page: any): Promise<CaptchaInfo> {
    // reCAPTCHA v2
    const recaptchaV2 = await page.$('.g-recaptcha');
    if (recaptchaV2) {
        const siteKey = await page.$eval(
            '.g-recaptcha',
            (el: Element) => el.getAttribute('data-sitekey')
        );
        return { type: 'recaptcha-v2', siteKey };
    }

    // Turnstile
    const turnstile = await page.$('.cf-turnstile');
    if (turnstile) {
        const siteKey = await page.$eval(
            '.cf-turnstile',
            (el: Element) => el.getAttribute('data-sitekey')
        );
        return { type: 'turnstile', siteKey };
    }

    // reCAPTCHA v3 (identified by script tag)
    const recaptchaV3Script = await page.$('script[src*="recaptcha/api.js?render="]');
    if (recaptchaV3Script) {
        const scriptSrc = await recaptchaV3Script.getAttribute('src') ?? '';
        const match = scriptSrc.match(/render=([^&]+)/);
        const siteKey = match ? match[1] : null;
        return { type: 'recaptcha-v3', siteKey };
    }

    // No known CAPTCHA found
    return { type: 'none', siteKey: null };
}

// ------------------------------------------------
// Bypass & Injection
// ------------------------------------------------
async function bypassAndInject(
    page: any,
    url: string,
    captchaInfo: CaptchaInfo
): Promise<void> {
    if (!captchaInfo.siteKey || captchaInfo.type === 'none') return;

    let token: string;

    switch (captchaInfo.type) {
        case 'recaptcha-v2':
            token = await capSolver.bypassReCaptchaV2(url, captchaInfo.siteKey);
            await page.$eval(
                '#g-recaptcha-response',
                (el: HTMLTextAreaElement, t: string) => {
                    el.style.display = 'block';
                    el.value = t;
                },
                token
            );
            break;

        case 'recaptcha-v3':
            token = await capSolver.bypassReCaptchaV3(url, captchaInfo.siteKey);
            await page.$eval(
                'input[name="g-recaptcha-response"]',
                (el: HTMLInputElement, t: string) => {
                    el.value = t;
                },
                token
            );
            break;

        case 'turnstile':
            token = await capSolver.bypassTurnstile(url, captchaInfo.siteKey);
            await page.$eval(
                'input[name="cf-turnstile-response"]',
                (el: HTMLInputElement, t: string) => {
                    el.value = t;
                },
                token
            );
            break;
    }

    // Submit the form (generic selector – adjust if needed)
    await page.click('button[type="submit"]');
    await page.waitForLoadState('networkidle');
}
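
One caveat with reading the site key from the `render` query parameter: reCAPTCHA v2 pages also load `api.js` with `render=explicit` or `render=onload`, and neither value is a site key. The sketch below is a stricter extractor that could replace the regular expression. `extractV3SiteKey` is a hypothetical helper name, and it assumes the script URL has already been matched as a reCAPTCHA script.

```typescript
// Returns the reCAPTCHA v3 site key from a script src, or null if none is present.
function extractV3SiteKey(scriptSrc: string): string | null {
    let url: URL;
    try {
        // A base is supplied so protocol-relative src values also parse.
        url = new URL(scriptSrc, 'https://www.google.com');
    } catch {
        return null;
    }

    const render = url.searchParams.get('render');
    // 'explicit' and 'onload' are v2 rendering modes, not site keys.
    if (!render || render === 'explicit' || render === 'onload') {
        return null;
    }
    return render;
}
```

Using `URL` and `searchParams` also handles parameter order and URL decoding, which a hand-written pattern can miss.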

Using the Helpers in a Crawler

import { PlaywrightCrawler, Dataset } from 'crawlee';
import { capSolver } from './capsolver-service';

const crawler = new PlaywrightCrawler({
    async requestHandler({ page, request, log }) {
        log.info(`Processing ${request.url}`);

        // Detect any supported CAPTCHA
        const captchaInfo = await detectCaptcha(page);
        if (captchaInfo.type !== 'none') {
            log.info(`${captchaInfo.type} detected, bypassing...`);
            await bypassAndInject(page, request.url, captchaInfo);
            log.info(`${captchaInfo.type} successfully bypassed!`);
        }

        // Continue with normal data extraction
        const title = await page.title();
        await Dataset.pushData({ title, url: request.url });
    },
});

await crawler.run(['https://example.com/mixed-protected']);

Takeaways for Production‑Ready Crawlers

  • Error handling – wrap bypass calls in try/catch and implement retries.
  • Session management – reuse authenticated sessions to avoid repeated challenges.
  • Dynamic detection – the detectCaptcha helper lets you support new CAPTCHA types with minimal code changes.
  • Logging & monitoring – keep detailed logs (type detected, success/failure) for easier debugging.
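
The error-handling point can be sketched as a generic retry wrapper with exponential backoff. `withRetries` is a hypothetical helper rather than part of Crawlee or CapSolver, and the attempt count and delays are assumed defaults to tune for your workload.

```typescript
// Runs an async operation, retrying on failure with exponential backoff.
async function withRetries<T>(
    operation: () => Promise<T>,
    maxAttempts = 3,
    baseDelayMs = 1000,
    sleep: (ms: number) => Promise<void> = (ms) =>
        new Promise<void>((resolve) => setTimeout(resolve, ms)),
): Promise<T> {
    let lastError: unknown;

    for (let attempt = 1; attempt <= maxAttempts; attempt++) {
        try {
            return await operation();
        } catch (error) {
            lastError = error;
            // Backoff doubles each time: baseDelayMs, 2x, 4x. No wait after the last attempt.
            if (attempt < maxAttempts) {
                await sleep(baseDelayMs * 2 ** (attempt - 1));
            }
        }
    }

    throw lastError;
}
```

A bypass call could then be wrapped as `withRetries(() => capSolver.bypassTurnstile(url, siteKey))`, so a single failed task does not fail the whole request.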

With these patterns you can robustly handle reCAPTCHA v2, reCAPTCHA v3, Cloudflare Turnstile, and extend to additional challenges as they appear.
