I got rate-limited scraping 100 pages. Here's what actually worked

Published: March 30, 2026 at 04:13 PM EDT

Source: Dev.to

Background

I needed product data from an e-commerce site – just the name, price, and availability. Their API required an enterprise plan ($500/month), so I decided to scrape the public pages instead.

On my first run I was impatient: I sent requests as fast as possible, got rate-limited on page 47, lost all the data, and had to start over.

First attempt

import requests
from bs4 import BeautifulSoup

# Naive version: back-to-back requests, default headers, nothing written to disk
for page in range(1, 101):
    response = requests.get(f'https://example.com/products?page={page}')
    soup = BeautifulSoup(response.text, 'html.parser')
    # Extract data...

Result: banned at page 47, zero data collected.

What actually worked

1. Add random delays

import time
import random

time.sleep(random.uniform(2, 5))  # 2–5 second delays
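In the scraping loop, that pause goes between requests. A minimal sketch, assuming a hypothetical `fetch(page)` function that downloads and parses one page; the `sleep` parameter exists only so the pause can be swapped out when testing:

```python
import random
import time
from typing import Callable, Iterable, List


def crawl_with_delays(
    pages: Iterable[int],
    fetch: Callable[[int], str],
    min_delay: float = 2.0,
    max_delay: float = 5.0,
    sleep: Callable[[float], None] = time.sleep,
) -> List[str]:
    """Fetch each page in order, pausing a random interval between requests."""
    results: List[str] = []
    for i, page in enumerate(pages):
        if i > 0:
            # Random jitter avoids the fixed, machine-like rhythm of a tight loop
            sleep(random.uniform(min_delay, max_delay))
        results.append(fetch(page))  # fetch() is a hypothetical page downloader
    return results
```

With a real downloader you would call `crawl_with_delays(range(1, 101), fetch)` and leave `sleep` at its default of `time.sleep`.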

2. Rotate user agents

import random
import requests

user_agents = [
    'Mozilla/5.0 (Windows NT 10.0; Win64; x64)...',
    'Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7)...',
    # Add 3–4 more
]

url = 'https://example.com/products?page=1'
headers = {'User-Agent': random.choice(user_agents)}  # pick a fresh UA for each request
response = requests.get(url, headers=headers)
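The important detail is calling `random.choice` once per request, not once per run, so consecutive requests can carry different user agents. Here is a sketch of that using the standard library's `urllib.request` in place of `requests` (swapped in only so the example has no third-party dependency); the `build_request` helper and the placeholder UA strings are hypothetical:

```python
import random
import urllib.request
from typing import Sequence

# Placeholder UA strings for illustration; use full, current browser strings in practice
USER_AGENTS = [
    "Mozilla/5.0 (Windows NT 10.0; Win64; x64) ExampleBrowser/1.0",
    "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) ExampleBrowser/1.0",
    "Mozilla/5.0 (X11; Linux x86_64) ExampleBrowser/1.0",
]


def build_request(url: str, user_agents: Sequence[str],
                  rng: random.Random) -> urllib.request.Request:
    """Build (but do not send) a GET request with a freshly chosen User-Agent."""
    ua = rng.choice(user_agents)  # new pick for every request
    return urllib.request.Request(url, headers={"User-Agent": ua})


rng = random.Random(42)  # seeded only so runs are reproducible
requests_to_send = [
    build_request(f"https://example.com/products?page={page}", USER_AGENTS, rng)
    for page in range(1, 4)
]
# Sending is a separate step, e.g. urllib.request.urlopen(req, timeout=10)
```

A random pick can repeat the same UA on back-to-back requests; if you want a strict rotation instead, `itertools.cycle(user_agents)` with `next()` is a simple alternative.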

3. Save progress

import json

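# Inside the scraping loop: `page` is the current page number,
# `results` is the list of items scraped so far
with open('progress.json', 'w') as f:
    json.dump({'last_page': page, 'data': results}, f)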

If the scraper crashes, you can restart from the last saved page instead of starting from page 1.
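Writing the file is only half of it; the restart also has to read it back. A sketch of both halves, assuming a hypothetical `fetch(page)` that returns one page's items as a JSON-serializable value. It checkpoints every 10 pages and writes to a temporary file before swapping it in with `os.replace`, so a crash during the write should leave the previous checkpoint intact:

```python
import json
import os
import tempfile
from typing import Any, Callable, Dict, List


def load_progress(path: str) -> Dict[str, Any]:
    """Return the saved checkpoint, or a fresh one if none exists yet."""
    if not os.path.exists(path):
        return {"last_page": 0, "data": []}
    with open(path) as f:
        return json.load(f)


def save_progress(path: str, last_page: int, data: List[Any]) -> None:
    """Write the checkpoint to a temp file, then swap it into place."""
    directory = os.path.dirname(os.path.abspath(path))
    fd, tmp_path = tempfile.mkstemp(dir=directory, suffix=".tmp")
    with os.fdopen(fd, "w") as f:
        json.dump({"last_page": last_page, "data": data}, f)
    os.replace(tmp_path, path)  # old checkpoint survives if we crash before this


def scrape_with_checkpoints(path: str, fetch: Callable[[int], Any],
                            total_pages: int = 100, save_every: int = 10) -> List[Any]:
    """Scrape pages 1..total_pages, resuming from the last checkpoint if any."""
    state = load_progress(path)
    results = state["data"]
    # Resume right after the last page that made it into a checkpoint
    for page in range(state["last_page"] + 1, total_pages + 1):
        results.append(fetch(page))
        if page % save_every == 0 or page == total_pages:
            save_progress(path, page, results)
    return results
```

If a run dies on page 47, the checkpoint holds pages 1 to 40, and the next call to `scrape_with_checkpoints` picks up at page 41. In the article's setup, `fetch` would wrap the request, the random delay, and the BeautifulSoup parsing.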

Results

Slowing down with random delays, combined with rotating user agents, avoided bans.
User-agent rotation matters because many sites check this header.
Saving progress every 10–20 pages prevents total data loss.
The second run completed all 100 pages in about 15 minutes (instead of the 2 minutes the fast run attempted).
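As a rough sanity check on that 15-minute figure: `random.uniform(2, 5)` averages 3.5 seconds, so across 100 pages the sleeps alone account for about 6 minutes, and the remainder would presumably be network and parsing time. A quick way to estimate the delay budget for another page count or delay range (the helper name is just for illustration):

```python
def expected_sleep_seconds(pages: int, min_delay: float, max_delay: float) -> float:
    # The mean of a uniform(min, max) draw is (min + max) / 2
    return pages * (min_delay + max_delay) / 2


budget = expected_sleep_seconds(100, 2, 5)
print(f"{budget:.0f} s (~{budget / 60:.1f} min) of sleeping")  # 350 s (~5.8 min) of sleeping
```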

For larger jobs I now use tools like ParseForge that handle throttling and rotation automatically, but the above approach works well for smaller projects.
