Scraper worked on my laptop. Deployed to server and got instant 403s.

Published: March 31, 2026 at 10:46 AM EDT
Source: Dev.to

What broke

The target site was checking the User-Agent header. My laptop was sending a normal browser User-Agent because I was using Playwright for something else and had set it globally in my profile.

The server, a fresh Ubuntu install, used the default Python requests User-Agent:

python-requests/2.31.0

The site rejected that and returned 403 Forbidden for every request.
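To see exactly what a bare requests call sends, the library exposes its defaults through `requests.utils`. This quick check makes no network call and prints the User-Agent and the other default headers that go out when you don't override them:

```python
import requests

# The User-Agent requests sends when none is set, e.g. "python-requests/2.31.0"
default_ua = requests.utils.default_user_agent()
print(default_ua)

# The full set of default headers attached to every request
for name, value in requests.utils.default_headers().items():
    print(f"{name}: {value}")
```

Running this on both machines would have shown the difference immediately.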

Fixed it

Added a custom User-Agent to the request headers:

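import requests

headers = {
    # A desktop Chrome User-Agent instead of the default python-requests/x.y.z
    'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/120.0.0.0 Safari/537.36'
}

# timeout stops a slow or blocking server from hanging the scraper indefinitely
response = requests.get('https://example.com/products', headers=headers, timeout=10)

if response.status_code == 200:
    # This endpoint returns JSON, so parse it only after a successful response
    products = response.json()
else:
    print(f"Failed: {response.status_code}")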

With this change the site started returning 200 OK again.
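If the scraper makes more than one request, a `requests.Session` lets you set the header once instead of passing `headers=` on every call. A minimal sketch, assuming every request goes through the same session. `prepare_request` shows the merged headers without touching the network:

```python
import requests

BROWSER_UA = ('Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 '
              '(KHTML, like Gecko) Chrome/120.0.0.0 Safari/537.36')

# Headers set on the session are merged into every request it sends
session = requests.Session()
session.headers.update({'User-Agent': BROWSER_UA})

# Build (but don't send) a request to check which User-Agent would go out
prepared = session.prepare_request(
    requests.Request('GET', 'https://example.com/products')
)
print(prepared.headers['User-Agent'])
```

Real calls then become `session.get(url, timeout=10)`. The session also reuses connections, which helps when scraping many pages from the same host.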

Other things that sometimes matter

Besides User-Agent, some sites also check:

  • Referer header – some sites only allow the request if the Referer points at one of their own pages.

    headers = {
        'User-Agent': 'Mozilla/5.0...',
        'Referer': 'https://example.com/'
    }
  • Accept headers – real browsers send Accept, Accept-Language and Accept-Encoding on every page load, so a request without them stands out.

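    headers = {
        'User-Agent': 'Mozilla/5.0...',
        'Accept': 'text/html,application/xhtml+xml,application/xml;q=0.9,image/webp,*/*;q=0.8',
        'Accept-Language': 'en-US,en;q=0.5',
        # Only add 'br' if the brotli package is installed; otherwise requests
        # can't decode a Brotli-compressed body and you get raw bytes back
        'Accept-Encoding': 'gzip, deflate'
    }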
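The headers from both bullets can be combined into one dict. The sketch below uses a hypothetical helper, `build_browser_headers`, that is not from the original post. It assumes the site accepts its own homepage as the Referer and derives that origin from the target URL with `urllib.parse.urlsplit`:

```python
from urllib.parse import urlsplit

BROWSER_UA = ('Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 '
              '(KHTML, like Gecko) Chrome/120.0.0.0 Safari/537.36')


def build_browser_headers(target_url: str) -> dict:
    """Hypothetical helper: browser-like headers with a same-origin Referer."""
    parts = urlsplit(target_url)
    # e.g. https://example.com/products?page=2 -> https://example.com/
    origin = f"{parts.scheme}://{parts.netloc}/"
    return {
        'User-Agent': BROWSER_UA,
        'Referer': origin,
        'Accept': 'text/html,application/xhtml+xml,application/xml;q=0.9,image/webp,*/*;q=0.8',
        'Accept-Language': 'en-US,en;q=0.5',
        # 'br' omitted: requests only decodes Brotli when the brotli package is installed
        'Accept-Encoding': 'gzip, deflate',
    }
```

Usage would be `requests.get(url, headers=build_browser_headers(url), timeout=10)`. Whether a homepage Referer is enough depends on the site; some expect the specific page that links to the resource.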

Most of the time, setting a proper User-Agent is enough. When it isn’t, adding these additional headers usually resolves the issue.

Tip: Always check response.status_code before parsing the response. This prevents trying to parse an error page (e.g., a 403) as JSON and encountering confusing parsing errors.
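One way to apply that tip is to check both the status code and the `Content-Type` before calling `.json()`, since a block page is often served as HTML. A minimal sketch; `parse_json_or_none` is a hypothetical name, and it assumes the endpoint labels its JSON responses with a Content-Type containing "json":

```python
import requests


def parse_json_or_none(response: requests.Response):
    """Hypothetical helper: return parsed JSON, or None for error/non-JSON responses."""
    if response.status_code != 200:
        print(f"Failed: {response.status_code}")
        return None
    content_type = response.headers.get('Content-Type', '')
    if 'json' not in content_type:
        # A 200 can still be an HTML challenge or block page
        print(f"Expected JSON, got {content_type or 'no Content-Type'}")
        return None
    try:
        return response.json()
    except ValueError:  # requests' JSONDecodeError subclasses ValueError
        print("Body claimed to be JSON but did not parse")
        return None
```

In the scraper this would look like `products = parse_json_or_none(requests.get(url, headers=headers, timeout=10))`.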
