High-Frequency eBay Scraping: Sync Prices and Stock Without Getting Banned

Published: February 9, 2026 at 12:30 AM EST
12 min read
Source: Dev.to



Table of Contents

  1. Why Scrape eBay?
  2. Challenges & Anti‑Bot Measures
  3. Core Strategies
  4. Implementation Walk‑through
  5. Testing & Monitoring
  6. Best Practices & Tips
  7. Conclusion

Why Scrape eBay?

  • Price arbitrage – compare eBay listings with other marketplaces.
  • Inventory management – keep your own stock in sync with sellers.
  • Market research – monitor trends, popular items, and competitor activity.

Note: eBay provides official APIs, but they have strict rate limits and may not expose all the data you need. A well‑behaved scraper can fill those gaps when used responsibly.
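For instance, the official Browse API returns item summaries (price, availability) for a whole search in one call. A minimal sketch, assuming you already hold an OAuth application token from the eBay developer program:

```python
import requests

EBAY_BROWSE_SEARCH = "https://api.ebay.com/buy/browse/v1/item_summary/search"

def browse_params(query: str, limit: int = 50) -> dict:
    """Build the query parameters for an item-summary search."""
    return {"q": query, "limit": limit}

def search_official_api(query: str, token: str, limit: int = 50) -> list:
    """Query eBay's official Browse API with an OAuth application token."""
    resp = requests.get(
        EBAY_BROWSE_SEARCH,
        params=browse_params(query, limit),
        headers={"Authorization": f"Bearer {token}"},
        timeout=15,
    )
    resp.raise_for_status()
    # Each summary carries itemId, title, price, and availability hints
    return resp.json().get("itemSummaries", [])
```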


Challenges & Anti‑Bot Measures

| Mechanism | What It Does | How to Bypass (Responsibly) |
|---|---|---|
| IP throttling | Blocks IPs that exceed request thresholds. | Rotate residential or datacenter proxies; respect a safe request‑per‑minute ceiling. |
| CAPTCHA | Challenges suspicious traffic. | Use services like 2Captcha or implement a fallback to manual solving. |
| User‑Agent fingerprinting | Detects non‑browser clients. | Randomize a pool of real browser UA strings. |
| Cookie & session validation | Checks for missing or stale cookies. | Persist cookies per proxy session; refresh periodically. |
| JavaScript challenges | Requires a real browser to execute JS. | Use headless browsers (Playwright/Puppeteer) only when absolutely necessary. |

Core Strategies

Respect Rate Limits

# Example: 1 request per 2 seconds per proxy
import time

def safe_request(session, url, delay=2):
    response = session.get(url)
    time.sleep(delay)          # pause before next request
    return response

  • Rule of thumb: ≤ 30 requests/min per IP (adjust based on observed bans).
  • Use exponential back‑off when you receive HTTP 429 or 503.
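The back‑off bullet can be sketched as a small helper; the thresholds and jitter range here are illustrative choices, not eBay‑specified values:

```python
import random
import time

def backoff_delay(attempt: int, base_delay: float = 2.0) -> float:
    """Delay for the given retry attempt: base * 2**attempt plus jitter."""
    return base_delay * (2 ** attempt) + random.uniform(0, 1)

def backoff_request(session, url, max_retries=5, base_delay=2.0):
    """GET with exponential back-off (plus jitter) on HTTP 429/503."""
    for attempt in range(max_retries):
        resp = session.get(url, timeout=15)
        if resp.status_code not in (429, 503):
            return resp
        # 2s, 4s, 8s, ... with up to 1s of random jitter on top
        time.sleep(backoff_delay(attempt, base_delay))
    resp.raise_for_status()
    return resp
```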

Rotate Proxies & User‑Agents

import random
import requests
from itertools import cycle

PROXIES = [
    "http://user:pass@proxy1.example.com:3128",
    "http://user:pass@proxy2.example.com:3128",
    # … add more
]

USER_AGENTS = [
    "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 "
    "(KHTML, like Gecko) Chrome/112.0.0.0 Safari/537.36",
    "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) "
    "AppleWebKit/605.1.15 (KHTML, like Gecko) Version/15.6 Safari/605.1.15",
    # … add more
]

proxy_pool = cycle(PROXIES)
ua_pool     = cycle(USER_AGENTS)

def get_session():
    session = requests.Session()
    proxy = next(proxy_pool)          # use the same proxy for both schemes
    session.proxies.update({"http": proxy, "https": proxy})
    session.headers.update({"User-Agent": next(ua_pool)})
    return session

  • Tip: Keep a health‑check endpoint for each proxy; drop dead proxies automatically.
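One way to implement that health check is a quick GET through each proxy; the test URL and timeout below are arbitrary choices:

```python
import requests

def is_proxy_alive(proxy_url: str,
                   test_url: str = "https://www.ebay.com",
                   timeout: float = 5.0) -> bool:
    """Return True if the proxy can complete a simple GET within the timeout."""
    try:
        resp = requests.get(
            test_url,
            proxies={"http": proxy_url, "https": proxy_url},
            timeout=timeout,
        )
        return resp.ok
    except requests.RequestException:
        return False

def prune_proxies(proxies: list) -> list:
    """Keep only proxies that currently pass the health check."""
    return [p for p in proxies if is_proxy_alive(p)]
```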
Persist Cookies

import pickle

def init_session():
    s = get_session()
    # Load persisted cookies if they exist
    try:
        s.cookies.update(pickle.load(open("cookies.pkl", "rb")))
    except FileNotFoundError:
        pass
    return s

def persist_cookies(session):
    with open("cookies.pkl", "wb") as f:
        pickle.dump(session.cookies, f)

  • Store cookies per proxy to avoid cross‑contamination.
  • Refresh cookies after a configurable number of requests (e.g., every 100 calls).
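To keep jars separate per proxy, one option is to key the pickle file by a hash of the proxy URL (a sketch; the cookies/ directory name is an assumption):

```python
import hashlib
import pickle
from pathlib import Path

COOKIE_DIR = Path("cookies")

def cookie_path(proxy_url: str) -> Path:
    """One pickle file per proxy, named by a short hash of the proxy URL."""
    digest = hashlib.sha256(proxy_url.encode()).hexdigest()[:12]
    return COOKIE_DIR / f"{digest}.pkl"

def save_cookies(session, proxy_url: str) -> None:
    """Persist this proxy's cookie jar to its own file."""
    COOKIE_DIR.mkdir(exist_ok=True)
    with cookie_path(proxy_url).open("wb") as f:
        pickle.dump(session.cookies, f)

def load_cookies(session, proxy_url: str) -> None:
    """Restore this proxy's cookie jar, if one was saved earlier."""
    path = cookie_path(proxy_url)
    if path.exists():
        with path.open("rb") as f:
            session.cookies.update(pickle.load(f))
```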

Error Handling & Retries

from requests.adapters import HTTPAdapter
from urllib3.util.retry import Retry

def configure_retries(session):
    retry_strategy = Retry(
        total=5,
        backoff_factor=1,
        status_forcelist=[429, 500, 502, 503, 504],
        allowed_methods=["GET", "POST"]   # method_whitelist was removed in urllib3 2.x
    )
    adapter = HTTPAdapter(max_retries=retry_strategy)
    session.mount("https://", adapter)
    session.mount("http://", adapter)
    return session

  • Log every failure with proxy, UA, and URL for later analysis.
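A minimal failure logger using the stdlib logging module might look like this (the field names are illustrative):

```python
import logging

logging.basicConfig(
    format="%(asctime)s %(levelname)s %(message)s",
    level=logging.INFO,
)

def log_failure(url: str, proxy: str, user_agent: str, error: Exception) -> str:
    """Log a single request failure with its context; return the log line."""
    line = f"url={url} proxy={proxy} ua={user_agent!r} error={error!r}"
    logging.getLogger("scraper").error(line)
    return line
```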

Implementation Walk‑through

Project Structure

ebay-scraper/
├─ src/
│   ├─ __init__.py
│   ├─ scraper.py           # core scraping logic
│   ├─ parser.py            # HTML → data extraction
│   ├─ storage.py           # DB / CSV helpers
│   └─ utils.py             # proxy/UA helpers
├─ data/
│   └─ listings.csv
├─ requirements.txt
└─ README.md

Dependencies

requests>=2.28
beautifulsoup4>=4.11
lxml>=4.9
pandas>=2.0
urllib3>=1.26

(Add playwright or selenium only if you need JS rendering.)

Fetching Listings

# src/scraper.py
import urllib.parse

import requests
from .utils import get_session, configure_retries

BASE_URL = "https://www.ebay.com/sch/i.html"

def build_search_url(query, page=1):
    params = {
        "_nkw": query,
        "_pgn": page,
        "rt": "nc"          # turn off "newly listed" filter
    }
    return f"{BASE_URL}?{urllib.parse.urlencode(params)}"

def fetch_page(session, url):
    resp = session.get(url, timeout=15)
    resp.raise_for_status()
    return resp.text

  • Loop through pages until you hit the “no more results” sentinel.
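That loop could be sketched like so; the "No exact matches found" sentinel string is an assumption to verify against live markup, and `fetch` is a thin wrapper around fetch_page plus build_search_url:

```python
def fetch_all_pages(fetch, query, max_pages=50):
    """Yield raw HTML per results page until a 'no more results' sentinel.

    `fetch(query, page)` returns the HTML of one results page, e.g.
    lambda q, p: fetch_page(session, build_search_url(q, p)).
    """
    for page in range(1, max_pages + 1):
        html = fetch(query, page)
        # Heuristic sentinel: this notice appears past the last real page
        if not html or "No exact matches found" in html:
            break
        yield html
```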

Parsing the HTML

# src/parser.py
from bs4 import BeautifulSoup

def _text(node):
    """Return the stripped text of a tag, or None if the tag is missing."""
    return node.get_text(strip=True) if node else None

def parse_listing(html):
    soup = BeautifulSoup(html, "lxml")
    items = []
    for li in soup.select("li.s-item"):
        title = _text(li.select_one("h3.s-item__title"))
        price = _text(li.select_one("span.s-item__price"))
        link_tag = li.select_one("a.s-item__link")
        link = link_tag.get("href") if link_tag else None
        stock = "Available" if li.select_one("span.s-item__stock") else "Out of stock"
        items.append({
            "title": title,
            "price": price,
            "url": link,
            "stock": stock
        })
    return items

  • Tip: Use the lxml parser for speed.

Storing & Syncing Data

# src/storage.py
import pandas as pd
from pathlib import Path

DATA_FILE = Path(__file__).resolve().parent.parent / "data" / "listings.csv"

def load_existing():
    if DATA_FILE.exists():
        return pd.read_csv(DATA_FILE)
    return pd.DataFrame(columns=["title", "price", "url", "stock"])

def upsert(df_new):
    df_old = load_existing()
    df_combined = pd.concat([df_old, df_new]).drop_duplicates(subset=["url"], keep="last")
    df_combined.to_csv(DATA_FILE, index=False)

  • Run upsert after each batch of pages to keep the CSV up‑to‑date.
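A batch driver tying the pieces together might look like this sketch; `parse` and `upsert` stand in for parser.parse_listing and storage.upsert:

```python
import pandas as pd

def sync_batch(pages, parse, upsert):
    """Parse each HTML page and upsert the combined rows.

    `pages` yields HTML strings, `parse` maps HTML -> list of dicts
    (parser.parse_listing), and `upsert` persists a DataFrame
    (storage.upsert). Returns the number of rows synced.
    """
    rows = []
    for html in pages:
        rows.extend(parse(html))
    if rows:
        upsert(pd.DataFrame(rows))
    return len(rows)
```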

Testing & Monitoring

| Tool | Purpose |
|---|---|
| Prometheus + Grafana | Track request rate per proxy, error counts, latency. |
| Sentry | Capture unhandled exceptions and stack traces. |
| Logrotate | Keep log files manageable. |
| Unit tests (pytest) | Validate parsing logic against saved HTML fixtures. |

  • Health‑check endpoint: GET /health should return 200 OK if the scraper can successfully make a test request through the current proxy pool.

Best Practices & Tips

  1. Start slow – Begin with 1‑2 requests/min per IP, then gradually increase while monitoring bans.
  2. Diversify proxies – Mix residential, mobile, and datacenter IPs; avoid using the same proxy for > 500 requests.
  3. Randomize delays – Use a jitter range (e.g., delay = random.uniform(1.5, 3.0)).
  4. Respect robots.txt – eBay’s robots.txt disallows aggressive crawling of certain paths; stay within allowed sections.
  5. Cache static pages – If a listing hasn’t changed (same ETag/Last‑Modified), skip re‑parsing.
  6. Graceful shutdown – Persist cookies and the current page index on SIGINT/SIGTERM.
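Tip 5 can be made concrete with conditional requests; this is a sketch, and eBay may not serve an ETag on every response, in which case every fetch is a cache miss:

```python
# url -> (etag, cached body)
_etag_cache = {}

def fetch_cached(session, url):
    """GET with If-None-Match; reuse the cached body on HTTP 304."""
    headers = {}
    cached = _etag_cache.get(url)
    if cached:
        headers["If-None-Match"] = cached[0]
    resp = session.get(url, headers=headers, timeout=15)
    if resp.status_code == 304 and cached:
        return cached[1]                      # unchanged: skip re-parsing
    etag = resp.headers.get("ETag")
    if etag:
        _etag_cache[url] = (etag, resp.text)
    return resp.text
```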

Conclusion

Scraping eBay at high frequency is feasible as long as you mimic human browsing patterns and rotate your network identity responsibly. By combining:

  • Rate‑limit awareness
  • Proxy & User‑Agent rotation
  • Robust session handling
  • Structured error recovery

you can keep your price/stock database fresh without triggering eBay’s anti‑bot defenses.

Remember: Always stay within eBay’s Terms of Service and local legal regulations. When possible, prefer their official APIs for production‑grade integrations.


Happy scraping!

Overview

In e‑commerce and dropshipping, stale data kills profit margins.
If a customer buys an item from your store but the eBay price has jumped 20 % or the item is out of stock, you face a lose‑lose choice:

  • Cancel the order → damage your seller rating.
  • Fulfill it at a loss.

To avoid this, you need to sync your inventory frequently.
Brute‑force re‑scraping 10 000 individual product pages every hour will get your IP black‑listed and blow your proxy budget. Real‑time monitoring requires a smarter approach.

1. Use List View Instead of Detail View

ViewRequests per ItemFidelityEfficiency
Detail View1 request = 1 itemHigh (full data)Low
List View (search results or “Seller’s Other Items”)1 request ≈ 200 itemsSufficient for price & stockHigh

Switching to List View can cut request volume by up to 98 %, making a 15‑minute sync financially feasible.

2. Persist State with SQLite

A JSON file works for a handful of items but becomes slow and prone to corruption as the list grows. SQLite is:

  • Serverless
  • Fast
  • Built‑in to Python

The database stores item_id, price, is_in_stock, and a last_checked timestamp.

import sqlite3

def setup_database():
    """Create (if needed) and return a connection to the SQLite database."""
    conn = sqlite3.connect('ebay_sync.db')
    cursor = conn.cursor()

    # Create table to store the 'state' of our inventory
    cursor.execute('''
        CREATE TABLE IF NOT EXISTS products (
            item_id TEXT PRIMARY KEY,
            price REAL,
            is_in_stock INTEGER,
            last_checked TIMESTAMP DEFAULT CURRENT_TIMESTAMP
        )
    ''')
    conn.commit()
    return conn

# Initialize the DB
db_conn = setup_database()

3. Fetch a Batch of Items (List View)

The function below retrieves a search‑result page from eBay, parses the HTML with BeautifulSoup, and extracts the essential fields for each listing.

import requests
from bs4 import BeautifulSoup
from typing import List, Dict


def fetch_ebay_batch(url: str) -> List[Dict[str, object]]:
    """
    Pull a list page from eBay, parse it, and return a list of items.

    Each item dictionary contains:
        - ``item_id`` (str):   Unique identifier extracted from the listing URL.
        - ``price`` (float):   Numeric price (USD) of the item.
        - ``is_in_stock`` (int): ``1`` if the item is available, ``0`` otherwise.
    """
    # --------------------------------------------------------------
    # 1️⃣  Set request headers to mimic a real browser
    # --------------------------------------------------------------
    headers = {
        "User-Agent": (
            "Mozilla/5.0 (Windows NT 10.0; Win64; x64) "
            "AppleWebKit/537.36 (KHTML, like Gecko) "
            "Chrome/119.0.0.0 Safari/537.36"
        ),
        "Accept-Language": "en-US,en;q=0.9",
    }

    # --------------------------------------------------------------
    # 2️⃣  Perform the HTTP GET request
    # --------------------------------------------------------------
    response = requests.get(url, headers=headers, timeout=15)
    if response.status_code != 200:
        print(f"Failed to fetch: {response.status_code}")
        return []

    # --------------------------------------------------------------
    # 3️⃣  Parse the page with BeautifulSoup
    # --------------------------------------------------------------
    soup = BeautifulSoup(response.text, "html.parser")
    items: List[Dict[str, object]] = []

    # --------------------------------------------------------------
    # 4️⃣  Iterate over each listing container
    # --------------------------------------------------------------
    for wrapper in soup.select(".s-item__wrapper"):
        # ----- Extract Item ID -------------------------------------------------
        link_tag = wrapper.select_one(".s-item__link")
        if not link_tag:
            continue
        link = link_tag["href"]
        # e.g. https://www.ebay.com/itm/1234567890?... → "1234567890"
        item_id = link.split("?")[0].split("/")[-1]

        # ----- Extract and clean price -----------------------------------------
        price_tag = wrapper.select_one(".s-item__price")
        if not price_tag:
            continue
        try:
            # Handles ranges like "$25.99 to $30.00"
            price_text = price_tag.text.replace("$", "").replace(",", "").split(" to ")[0]
            price = float(price_text)
        except ValueError:
            continue

        # ----- Stock check ------------------------------------------------------
        status_tag = wrapper.select_one(".s-item__availability")
        is_in_stock = 0 if status_tag and "Out of stock" in status_tag.text else 1

        # ----- Append the cleaned record ----------------------------------------
        items.append(
            {
                "item_id": item_id,
                "price": price,
                "is_in_stock": is_in_stock,
            }
        )

    return items

4. Detect Changes & Sync to SQLite

Only when a change is detected do we update the DB (or trigger alerts).

def detect_and_sync_changes(scraped_items, conn):
    """
    Compare scraped items with the SQLite database and sync any changes.

    Parameters
    ----------
    scraped_items : list[dict]
        List of dictionaries containing the latest scraped data. Each dict must
        include the keys: 'item_id', 'price', and 'is_in_stock'.
    conn : sqlite3.Connection
        Active SQLite connection.

    Returns
    -------
    int
        Number of rows that were updated (i.e., changes detected).
    """
    cursor = conn.cursor()
    changes_detected = 0

    for item in scraped_items:
        # Look up the current record for this item
        cursor.execute(
            "SELECT price, is_in_stock FROM products WHERE item_id = ?",
            (item["item_id"],)
        )
        row = cursor.fetchone()

        if row is None:
            # New item discovered – insert it
            cursor.execute(
                """
                INSERT INTO products (item_id, price, is_in_stock)
                VALUES (?, ?, ?)
                """,
                (item["item_id"], item["price"], item["is_in_stock"])
            )
            print(f"New Item Tracked: {item['item_id']}")
        else:
            old_price, old_stock = row
            # Detect any change in price or stock status
            if old_price != item["price"] or old_stock != item["is_in_stock"]:
                print(
                    f"Change Found on {item['item_id']}: "
                    f"Price {old_price} → {item['price']}"
                )
                cursor.execute(
                    """
                    UPDATE products
                    SET price = ?, is_in_stock = ?, last_checked = CURRENT_TIMESTAMP
                    WHERE item_id = ?
                    """,
                    (item["price"], item["is_in_stock"], item["item_id"])
                )
                changes_detected += 1

    conn.commit()
    return changes_detected

5. Main Loop (Continuous Monitoring)

Wrap everything in a resilient loop. Adjust TARGET_URLS to the seller’s store pages or search results you want to monitor.

import time

TARGET_URLS = [
    "https://www.ebay.com/sch/i.html?_ssn=some_seller_id&_ipg=200",
    # Add more list‑view URLs as needed
]

def main():
    db_conn = setup_database()
    print("Starting eBay Monitor...")

    while True:
        try:
            for url in TARGET_URLS:
                print(f"Scanning {url}")
                scraped = fetch_ebay_batch(url)
                changes = detect_and_sync_changes(scraped, db_conn)
                if changes:
                    print(f"{changes} change(s) detected and synced.")
        except Exception as e:
            print(f"Error during monitoring cycle: {e}")
            time.sleep(60)        # brief pause before retrying after an error
            continue

        # Wait 15 minutes before the next cycle
        time.sleep(15 * 60)

if __name__ == "__main__":
    main()

What This Gives You

  • High‑frequency monitoring with ≤ 1 request per 200 items.
  • Persistent state via SQLite – fast lookups, no corruption risk.
  • Change‑driven actions – only act (e.g., Discord alerts, Shopify updates) when a price or stock status actually changes.

Deploy the script on a modest VPS or a serverless environment, and you’ll keep your eBay‑linked store in sync without blowing your proxy budget. Happy coding!


Scaling Your Inventory Monitoring

  • Rate‑limit handling – eBay throttles requests aggressively.
    • Use residential proxies or smart proxy rotators.
    • These appear as genuine shoppers, making them far less likely to be blocked than datacenter IPs.
  • Dealing with “See Price” items – when the list view shows “See Price” instead of a numeric value:
    • Trigger a targeted detail‑view scrape for that specific item only.
    • This keeps the overall process fast while preserving accuracy.
  • Parallel fetching for many sellers – a simple sequential loop becomes a bottleneck. Leverage ThreadPoolExecutor to request multiple pages at once:

from concurrent.futures import ThreadPoolExecutor

with ThreadPoolExecutor(max_workers=5) as executor:
    results = list(executor.map(fetch_ebay_batch, TARGET_URLS))

  • State‑driven diffing – focus on list views and compare each new scrape with the previous state. This strategy can cut proxy costs by ≈ 98 % and dramatically lowers the chance of IP bans.
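The "See Price" fallback might be sketched like this; the parsing rules mirror the earlier price cleaning, and `fetch_detail_price` is a hypothetical helper that scrapes one detail page:

```python
def parse_list_price(price_text: str):
    """Return a float price, or None when the list view hides it."""
    if "see price" in price_text.strip().lower():
        return None
    cleaned = price_text.replace("$", "").replace(",", "").split(" to ")[0]
    try:
        return float(cleaned)
    except ValueError:
        return None

def resolve_price(item, session, fetch_detail_price):
    """Fall back to one targeted detail-view request when the price is hidden."""
    if item["price"] is not None:
        return item["price"]                  # list view already had it
    return fetch_detail_price(session, item["url"])
```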

Next Steps

  • Integrate with external services – connect detect_and_sync_changes to:
    • Shopify (update product listings)
    • Discord webhook (real‑time alerts)
  • Advanced techniques – explore deeper guides on:
    • Rotating proxies with Python
    • Parsing HTML with BeautifulSoup
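As a starting point for the Discord integration, a webhook post is only a few lines; the webhook URL is a placeholder you generate in your server settings:

```python
import requests

WEBHOOK_URL = "https://discord.com/api/webhooks/<id>/<token>"  # placeholder

def format_alert(item_id: str, old_price: float, new_price: float) -> str:
    """Human-readable change message for an alert channel."""
    return f"Price change on {item_id}: {old_price} -> {new_price}"

def send_discord_alert(message: str, webhook_url: str = WEBHOOK_URL) -> bool:
    """POST the message to a Discord webhook; returns True on a 2xx response."""
    resp = requests.post(webhook_url, json={"content": message}, timeout=10)
    return resp.ok
```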

By following these recommendations, you’ll build a robust, cost‑effective eBay monitoring system that scales with your inventory while staying under the radar of eBay’s anti‑scraping measures.
