Scrapy Requests and Responses: The Complete Beginner's Guide (With Secrets the Docs Don't Tell You)
Source: Dev.to
1. The Basics
| Concept | What it means in Scrapy |
|---|---|
| Request | An object that says “I want to visit this URL”. |
| Response | An object that contains what the website sent back (HTML, JSON, etc.). |
Think of web scraping like a conversation:
Request: "Hey website, can you show me this page?"
Response: "Sure, here's the HTML!"
2. What Scrapy does behind the scenes
A minimal spider:
import scrapy

class MySpider(scrapy.Spider):
    name = 'myspider'
    start_urls = ['https://example.com']

    def parse(self, response):
        # Do something with the response
        pass
What really happens
def start_requests(self):
    for url in self.start_urls:
        yield scrapy.Request(url=url, callback=self.parse)
Scrapy automatically creates a Request for each URL in start_urls and hands each resulting Response to parse.
3. Creating Requests Manually
You can build requests yourself to control every detail:
import scrapy

class MySpider(scrapy.Spider):
    name = 'myspider'

    def start_requests(self):
        yield scrapy.Request(
            url='https://example.com',
            callback=self.parse,
            method='GET',
            headers={'User-Agent': 'My Custom Agent'},
            cookies={'session': 'abc123'},
            meta={'page_num': 1},
            dont_filter=False,
            priority=0,
        )

    def parse(self, response):
        # Process the response
        pass
3.1 Request Parameters (quick reference)
| Parameter | Description |
|---|---|
| `url` | Target URL (required). |
| `callback` | Function that will receive the Response. Defaults to `parse`. |
| `method` | HTTP method (GET, POST, PUT, DELETE, …). Default: GET. |
| `body` | Raw request body (useful for POST, PUT). |
| `headers` | Custom request headers. |
| `cookies` | Cookies to send with the request. |
| `meta` | Arbitrary dict passed to the Response (`response.meta`). Great for sharing data between callbacks. |
| `dont_filter` | If `True`, Scrapy will not filter this URL as a duplicate. |
| `priority` | Integer priority; higher values are processed first (default = 0). |
Examples
# 1️⃣ Simple URL
yield scrapy.Request(url='https://example.com/products')

# 2️⃣ Custom callback
yield scrapy.Request(
    url='https://example.com/products',
    callback=self.parse_products,
)

def parse_products(self, response):
    # Handle the response here
    pass

# 3️⃣ POST request with JSON body
yield scrapy.Request(
    url='https://example.com/api',
    method='POST',
    body='{"key": "value"}',
    headers={'Content-Type': 'application/json'},
)

# 4️⃣ Custom headers
yield scrapy.Request(
    url='https://example.com',
    headers={
        'User-Agent': 'Mozilla/5.0',
        'Accept': 'text/html',
        'Referer': 'https://google.com',
    },
)

# 5️⃣ Cookies
yield scrapy.Request(
    url='https://example.com',
    cookies={'session_id': '12345', 'user': 'john'},
)

# 6️⃣ Passing data via meta
yield scrapy.Request(
    url='https://example.com/details',
    meta={'product_name': 'Widget', 'price': 29.99},
    callback=self.parse_details,
)

def parse_details(self, response):
    name = response.meta['product_name']
    price = response.meta['price']
    # Do something with name & price

# 7️⃣ Bypass the duplicate filter
yield scrapy.Request(
    url='https://example.com',
    dont_filter=True,
)

# 8️⃣ Prioritise a request
yield scrapy.Request(
    url='https://example.com/important',
    priority=10,  # processed before priority-0 requests
)
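For JSON APIs like example 3️⃣, newer Scrapy versions also ship `scrapy.http.JsonRequest`, which serialises a dict into the body and sets the Content-Type header for you (the method defaults to POST when `data` is given). A minimal sketch:

from scrapy.http import JsonRequest

yield JsonRequest(
    url='https://example.com/api',
    data={'key': 'value'},  # dict is JSON-encoded into the request body
)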
4. The Response Object
When a request finishes, Scrapy passes a Response to the callback. Here’s what you typically get:
def parse(self, response):
    # Basic attributes
    url = response.url          # Final URL (after redirects)
    body = response.body        # Raw bytes
    text = response.text        # Decoded string (uses the encoding declared by the response)
    status = response.status    # HTTP status code (200, 404, …)
    headers = response.headers  # Response headers (case-insensitive dict)

    # Links back to the request
    request = response.request  # The original Request object
    meta = response.meta        # Meta dict passed from the request
4.1 Selecting data
# CSS selectors (most readable)
titles = response.css('h1.title::text').getall()
first_title = response.css('h1.title::text').get()
# XPath selectors (more powerful)
titles = response.xpath('//h1[@class="title"]/text()').getall()
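Selectors also support regex extraction via `.re()` and `.re_first()` (part of the underlying parsel API), handy when the extracted text needs cleanup. A small sketch, assuming a hypothetical `span.price` element:

# Regex on top of a selector: returns strings, not selectors
price = response.css('span.price::text').re_first(r'[\d.]+')   # first match, or None
all_prices = response.css('span.price::text').re(r'[\d.]+')    # list of all matches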
4.2 Following links
# Manual way (verbose)
next_page = response.css('a.next::attr(href)').get()
if next_page:
    full_url = response.urljoin(next_page)
    yield scrapy.Request(full_url, callback=self.parse)

# Preferred way – `response.follow`
next_page = response.css('a.next::attr(href)').get()
if next_page:
    yield response.follow(next_page, callback=self.parse)

# You can even pass a selector directly:
yield response.follow(
    response.css('a.next')[0],
    callback=self.parse,
)

# Or iterate over all <a> tags:
for link in response.css('a'):
    yield response.follow(link, callback=self.parse_page)
`response.follow()` automatically:
- Handles relative URLs (`urljoin` internally).
- Extracts the `href` attribute when you give it a selector.
- Creates the `Request` object for you (including the default `callback`).
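On Scrapy 2.0 and newer there is also `response.follow_all()`, which builds a Request for every matched link in one call; its `css=`/`xpath=` shortcuts turn the iteration example above into a one-liner:

# Follow every matching link at once (Scrapy 2.0+)
yield from response.follow_all(css='a.next', callback=self.parse)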
5. Debugging & Introspection
Sometimes you need to peek at the original request that produced a response (especially after redirects).
def parse(self, response):
    # Original request data
    original_url = response.request.url
    original_headers = response.request.headers
    original_meta = response.request.meta

    # Log useful info
    self.logger.info(f'Requested: {original_url}')
    self.logger.info(f'Got back: {response.url}')  # May differ after redirects
TL;DR
- Requests are fully configurable objects (`url`, `method`, `headers`, `cookies`, `meta`, `priority`, …).
- Responses give you everything you need to extract data (`url`, `body`, `text`, `status`, `headers`, plus a back-reference to the original request).
- Use `response.follow()` for clean, concise link-following logic.
- Leverage `meta` to pass data between callbacks, and `priority`/`dont_filter` to control crawl order and duplicate handling.
Armed with these details, you can move beyond the basics and write robust, efficient Scrapy spiders that do exactly what you need—no hidden surprises. Happy crawling!
Scrapy Quick‑Reference Cheat Sheet
Below is a collection of useful Scrapy patterns, organised for quick reference.
1. Working with Response Headers
def parse(self, response):
    # Get all headers
    all_headers = response.headers

    # Get a specific header
    content_type = response.headers.get('Content-Type')

    # Check cookies the server sent back
    cookies = response.headers.getlist('Set-Cookie')

    # Useful for debugging blocks
    server = response.headers.get('Server')
    self.logger.info(f'Server type: {server}')
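A practical use of the Content-Type header is deciding how to parse the body. A minimal sketch (note that header values come back as bytes, and `response.json()` requires Scrapy 2.2+):

def parse(self, response):
    content_type = response.headers.get('Content-Type', b'')
    if b'application/json' in content_type:
        yield response.json()  # parse the body as JSON (Scrapy 2.2+)
    else:
        yield {'title': response.css('title::text').get()}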
2. Preserving meta Across Redirects
def start_requests(self):
    yield scrapy.Request(
        'https://example.com/redirect',
        meta={'important': 'data'},  # custom meta data
        callback=self.parse
    )

def parse(self, response):
    # Even after a redirect, the meta dict is still there!
    data = response.meta['important']

    # The final URL may be different
    self.logger.info(f'Ended up at: {response.url}')
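If you also want to see the redirect chain itself, Scrapy's built-in RedirectMiddleware records it in meta under the `redirect_urls` key:

def parse(self, response):
    # Populated by RedirectMiddleware; empty if no redirect happened
    chain = response.meta.get('redirect_urls', [])
    if chain:
        self.logger.info(f'Redirected via {chain} -> {response.url}')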
3. Controlling Crawl Order with priority
def parse_listing(self, response):
    # High priority for product pages (process first)
    for product in response.css('.product'):
        url = product.css('a::attr(href)').get()
        yield response.follow(
            url,
            callback=self.parse_product,
            priority=10  # higher number → earlier processing
        )

    # Low priority for pagination (process later)
    next_page = response.css('.next::attr(href)').get()
    if next_page:
        yield response.follow(
            next_page,
            callback=self.parse_listing,
            priority=0  # default priority
        )
Tip: Use higher priorities for “must‑have” pages and lower ones for pagination or auxiliary content.
4. Submitting Forms – FormRequest
a) Simple POST request
import scrapy

class LoginSpider(scrapy.Spider):
    name = 'login'

    def start_requests(self):
        yield scrapy.FormRequest(
            url='https://example.com/login',
            formdata={
                'username': 'myuser',
                'password': 'mypass'
            },
            callback=self.after_login
        )

    def after_login(self, response):
        if 'Welcome' in response.text:
            self.logger.info('Login successful!')
        else:
            self.logger.error('Login failed!')
b) Auto‑fill a form from the page (from_response)
import scrapy

class LoginSpider(scrapy.Spider):
    name = 'login'
    start_urls = ['https://example.com/login']

    def parse(self, response):
        # Automatically locate the form, keep hidden fields (e.g., CSRF)
        # and submit the supplied data.
        yield scrapy.FormRequest.from_response(
            response,
            formdata={
                'username': 'myuser',
                'password': 'mypass'
            },
            callback=self.after_login
        )

    def after_login(self, response):
        # Now you're logged in – continue crawling.
        yield response.follow('/dashboard', callback=self.parse_dashboard)
What FormRequest.from_response() does for you
- Finds the first `<form>` element (or the one matching `formname`/`formid`).
- Extracts all form fields, preserving hidden inputs (e.g., CSRF tokens).
- Overwrites the fields you provide in `formdata`.
- Submits the request.
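When a page contains several forms, you can point `from_response()` at the right one with its `formname`, `formid`, or `formnumber` arguments. A short sketch, assuming a hypothetical form with id="login-form":

yield scrapy.FormRequest.from_response(
    response,
    formid='login-form',  # matches <form id="login-form"> (hypothetical id)
    formdata={'username': 'myuser', 'password': 'mypass'},
    callback=self.after_login
)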
5. Pagination with meta (keeping track of the page number)
import scrapy

class ProductSpider(scrapy.Spider):
    name = 'products'

    def start_requests(self):
        yield scrapy.Request(
            'https://example.com/products?page=1',
            meta={'page': 1},
            callback=self.parse
        )

    def parse(self, response):
        page = response.meta['page']
        self.logger.info(f'Scraping page {page}')

        # Scrape products on the current page
        for product in response.css('.product'):
            yield {
                'name': product.css('h2::text').get(),
                'price': product.css('.price::text').get(),
                'page': page
            }

        # Follow the next page, incrementing the page counter
        next_page = response.css('.next::attr(href)').get()
        if next_page:
            yield response.follow(
                next_page,
                meta={'page': page + 1},
                callback=self.parse
            )
6. Chaining Requests – From a listing to a detail page
import scrapy

class DetailSpider(scrapy.Spider):
    name = 'details'
    start_urls = ['https://example.com/products']

    def parse(self, response):
        """Scrape product listings and queue detail pages."""
        for product in response.css('.product'):
            item = {
                'name': product.css('h2::text').get(),
                'price': product.css('.price::text').get()
            }
            detail_url = product.css('a::attr(href)').get()
            yield response.follow(
                detail_url,
                callback=self.parse_detail,
                meta={'item': item}  # pass the partially-filled item forward
            )

    def parse_detail(self, response):
        """Enrich the item with data from the detail page."""
        item = response.meta['item']
        item['description'] = response.css('.description::text').get()
        item['rating'] = response.css('.rating::text').get()
        item['reviews'] = response.css('.reviews::text').get()
        yield item
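One closing note: since Scrapy 1.7, `cb_kwargs` is the recommended way to pass data between callbacks; the values arrive as named arguments instead of travelling through the shared `meta` dict. A minimal variant of the two callbacks above:

def parse(self, response):
    for product in response.css('.product'):
        item = {'name': product.css('h2::text').get()}
        yield response.follow(
            product.css('a::attr(href)').get(),
            callback=self.parse_detail,
            cb_kwargs={'item': item}  # delivered as a keyword argument
        )

def parse_detail(self, response, item):
    item['description'] = response.css('.description::text').get()
    yield item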