I got rate-limited scraping 100 pages. Here's what actually worked

Published: (March 30, 2026 at 04:13 PM EDT)
Source: Dev.to

Background

I needed product data from an e-commerce site – just the name, price, and availability. Their API required an enterprise plan ($500 / month), so I decided to scrape the public pages instead.

My first run was impatient: I sent requests as fast as possible, got rate-limited on page 47, lost everything I had collected so far, and had to start over.


First attempt

import requests
from bs4 import BeautifulSoup

for page in range(1, 101):
    response = requests.get(f'https://example.com/products?page={page}')
    soup = BeautifulSoup(response.text, 'html.parser')
    # Extract data...

Result: banned at page 47, zero data collected.


What actually worked

1. Add random delays

import time
import random

time.sleep(random.uniform(2, 5))  # 2–5 second delays
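
To show where the delay sits in the loop, here is a minimal sketch. The helper names `polite_pause` and `crawl` and the `fetch` callback are hypothetical (not from my original script); `fetch` stands in for whatever downloads and parses one page.

```python
import random
import time


def polite_pause(min_s=2.0, max_s=5.0, sleep=time.sleep):
    """Sleep for a random duration between min_s and max_s and return it."""
    delay = random.uniform(min_s, max_s)
    sleep(delay)
    return delay


def crawl(pages, fetch, sleep=time.sleep):
    """Call fetch(page) for each page, pausing between requests.

    `fetch` is a hypothetical callback that downloads and parses one page.
    No pause is added after the final page.
    """
    pages = list(pages)
    results = []
    for i, page in enumerate(pages):
        results.append(fetch(page))
        if i < len(pages) - 1:
            polite_pause(sleep=sleep)
    return results
```

Passing `sleep` as a parameter is only there so the loop can be exercised without actually waiting; in a real run the default `time.sleep` is used.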

2. Rotate user agents

import random
import requests

user_agents = [
    'Mozilla/5.0 (Windows NT 10.0; Win64; x64)...',
    'Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7)...',
    # Add 3–4 more
]

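# Pick a new User-Agent for each request, not once per run
url = 'https://example.com/products?page=1'
headers = {'User-Agent': random.choice(user_agents)}
response = requests.get(url, headers=headers)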
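
Rotation only helps if the header is re-chosen on every request. One way to make that hard to forget is a small helper, sketched below. The name `pick_headers` is hypothetical, and the strings in `USER_AGENTS` are placeholders for illustration, not real browser values.

```python
import random

# Placeholder values for illustration only; substitute complete,
# real browser User-Agent strings.
USER_AGENTS = [
    "placeholder-agent-a",
    "placeholder-agent-b",
    "placeholder-agent-c",
]


def pick_headers(user_agents=USER_AGENTS, rng=random):
    """Return a new headers dict with a randomly chosen User-Agent."""
    if not user_agents:
        raise ValueError("user_agents must not be empty")
    return {"User-Agent": rng.choice(user_agents)}
```

Inside the scraping loop this would be used as `requests.get(url, headers=pick_headers())`, so each request draws its own value.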

3. Save progress

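import json

# Inside the scraping loop: `page` is the current page number,
# `results` is the list of rows collected so far.
with open('progress.json', 'w') as f:
    json.dump({'last_page': page, 'data': results}, f)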

If the scraper crashes, you can restart from the last saved page instead of starting from page 1.
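
The snippet above only writes the file; resuming also needs the read side. Below is a minimal sketch of both halves plus a loop that checkpoints every few pages. The function names, the `fetch` callback, and the `checkpoint_every` parameter are assumptions for illustration, and writing to a temporary file before `os.replace` is my choice here so that a crash mid-write does not corrupt the previous checkpoint.

```python
import json
import os

PROGRESS_FILE = "progress.json"


def save_progress(last_page, data, path=PROGRESS_FILE):
    """Write the checkpoint to a temp file, then swap it into place."""
    tmp_path = path + ".tmp"
    with open(tmp_path, "w", encoding="utf-8") as f:
        json.dump({"last_page": last_page, "data": data}, f)
    os.replace(tmp_path, path)


def load_progress(path=PROGRESS_FILE):
    """Return (last_page, data), or (0, []) if no checkpoint exists yet."""
    try:
        with open(path, encoding="utf-8") as f:
            state = json.load(f)
    except FileNotFoundError:
        return 0, []
    return state["last_page"], state["data"]


def scrape_with_resume(fetch, total_pages, checkpoint_every=10, path=PROGRESS_FILE):
    """Resume after the last saved page and checkpoint periodically.

    `fetch` is a hypothetical callback that returns the parsed,
    JSON-serializable data for one page.
    """
    last_page, data = load_progress(path)
    for page in range(last_page + 1, total_pages + 1):
        data.append(fetch(page))
        if page % checkpoint_every == 0 or page == total_pages:
            save_progress(page, data, path)
    return data
```

With this layout, pages fetched after the most recent checkpoint are fetched again on restart, so a smaller `checkpoint_every` trades a few extra disk writes for less repeated work.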


Results

  • Scraping slowly (with delays, rotating UA, and periodic saves) avoided bans.
  • User‑agent rotation matters because many sites check this header.
  • Saving progress every 10–20 pages prevents total data loss.
  • The second run completed all 100 pages in about 15 minutes, compared with the roughly 2 minutes the fast run was on pace for before it was blocked.
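
As a rough sanity check on that 15-minute figure, the sleep time alone can be estimated from the 2–5 second range. This is a back-of-the-envelope sketch that assumes one pause per page; the function name is hypothetical.

```python
def expected_delay_seconds(pages, min_s=2.0, max_s=5.0):
    """Mean total sleep time with one uniform(min_s, max_s) pause per page."""
    return pages * (min_s + max_s) / 2


total = expected_delay_seconds(100)
print(f"{total:.0f} s = {total / 60:.1f} min")  # prints "350 s = 5.8 min"
```

So the delays account for roughly 6 of the 15 minutes on average; the remainder would be time spent on the requests themselves and on parsing, which I did not measure separately.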

For larger jobs I now use tools like ParseForge that handle throttling and rotation automatically, but the above approach works well for smaller projects.
