High-Frequency eBay Scraping: Sync Prices and Stock Without Getting Banned
Source: Dev.to
Table of Contents
- Why Scrape eBay?
- Challenges & Anti‑Bot Measures
- Core Strategies
- Implementation Walk‑through
- 4.1 Project Structure
- 4.2 Dependencies
- 4.3 Fetching Listings
- 4.4 Parsing the HTML
- 4.5 Storing & Syncing Data
- Testing & Monitoring
- Best Practices & Tips
- Conclusion
Why Scrape eBay?
- Price arbitrage – compare eBay listings with other marketplaces.
- Inventory management – keep your own stock in sync with sellers.
- Market research – monitor trends, popular items, and competitor activity.
Note: eBay provides official APIs, but they have strict rate limits and may not expose all the data you need. A well‑behaved scraper can fill those gaps when used responsibly.
Challenges & Anti‑Bot Measures
| Mechanism | What It Does | How to Bypass (Responsibly) |
|---|---|---|
| IP throttling | Blocks IPs that exceed request thresholds. | Rotate residential or datacenter proxies; respect a safe request‑per‑minute ceiling. |
| CAPTCHA | Challenges suspicious traffic. | Use services like 2Captcha or implement a fallback to manual solving. |
| User‑Agent fingerprinting | Detects non‑browser clients. | Randomize a pool of real browser UA strings. |
| Cookie & session validation | Checks for missing or stale cookies. | Persist cookies per proxy session; refresh periodically. |
| JavaScript challenges | Requires a real browser to execute JS. | Use headless browsers (Playwright/Puppeteer) only when absolutely necessary. |
Core Strategies
Respect Rate Limits
# Example: 1 request per 2 seconds per proxy
import time
def safe_request(session, url, delay=2):
response = session.get(url)
time.sleep(delay) # pause before next request
return response
- Rule of thumb: ≤ 30 requests/min per IP (adjust based on observed bans).
- Use exponential back‑off when you receive HTTP 429 or 503.
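The back‑off rule could be sketched like this (the retry count, base delay, and jitter range are illustrative choices, not values from the article):

```python
import random
import time

def get_with_backoff(session, url, max_retries=5, base_delay=2.0):
    """Retry on HTTP 429/503 with exponential back-off plus random jitter."""
    for attempt in range(max_retries):
        resp = session.get(url, timeout=15)
        if resp.status_code not in (429, 503):
            return resp
        # 2s, 4s, 8s, ... plus up to 0.5s of jitter so retries don't align
        time.sleep(base_delay * (2 ** attempt) + random.uniform(0, 0.5))
    raise RuntimeError(f"Gave up on {url} after {max_retries} attempts")
```

The jitter matters: if several workers back off in lockstep, their retries arrive together and trip the throttle again.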
Rotate Proxies & User‑Agents
import random
import requests
from itertools import cycle
PROXIES = [
"http://user:pass@proxy1.example.com:3128",
"http://user:pass@proxy2.example.com:3128",
# … add more
]
USER_AGENTS = [
"Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 "
"(KHTML, like Gecko) Chrome/112.0.0.0 Safari/537.36",
"Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) "
"AppleWebKit/605.1.15 (KHTML, like Gecko) Version/15.6 Safari/605.1.15",
# … add more
]
proxy_pool = cycle(PROXIES)
ua_pool = cycle(USER_AGENTS)
def get_session():
    proxy = next(proxy_pool)  # use the same proxy for http and https
    session = requests.Session()
    session.proxies.update({"http": proxy, "https": proxy})
    session.headers.update({"User-Agent": next(ua_pool)})
    return session
- Tip: Keep a health‑check endpoint for each proxy; drop dead proxies automatically.
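One way to implement that tip (the check URL and timeout here are assumptions; any cheap, stable endpoint works):

```python
import requests

def healthy_proxies(proxies, check_url="https://www.ebay.com", timeout=5):
    """Return only the proxies that can complete a test request."""
    alive = []
    for proxy in proxies:
        try:
            resp = requests.get(
                check_url,
                proxies={"http": proxy, "https": proxy},
                timeout=timeout,
            )
            if resp.ok:
                alive.append(proxy)
        except requests.RequestException:
            # Dead, banned, or misconfigured proxy - drop it from the pool
            pass
    return alive
```

Run it periodically and rebuild the `cycle()` pool from the survivors.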
Session & Cookie Management
import pickle

def init_session():
s = get_session()
# Load persisted cookies if they exist
try:
s.cookies.update(pickle.load(open("cookies.pkl", "rb")))
except FileNotFoundError:
pass
return s
def persist_cookies(session):
with open("cookies.pkl", "wb") as f:
pickle.dump(session.cookies, f)
- Store cookies per proxy to avoid cross‑contamination.
- Refresh cookies after a configurable number of requests (e.g., every 100 calls).
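To keep cookie jars separated per proxy, one option is a file per proxy keyed by a hash of its URL (the directory layout and naming scheme are assumptions):

```python
import hashlib
import pickle
from pathlib import Path

COOKIE_DIR = Path("cookies")
COOKIE_DIR.mkdir(exist_ok=True)

def cookie_file(proxy_url):
    """One pickle file per proxy, named by a short hash of the proxy URL."""
    digest = hashlib.sha256(proxy_url.encode()).hexdigest()[:16]
    return COOKIE_DIR / f"{digest}.pkl"

def save_cookies(session, proxy_url):
    with open(cookie_file(proxy_url), "wb") as f:
        pickle.dump(session.cookies, f)

def load_cookies(session, proxy_url):
    path = cookie_file(proxy_url)
    if path.exists():
        with open(path, "rb") as f:
            session.cookies.update(pickle.load(f))
```

This keeps each proxy's session state isolated, so one banned identity doesn't contaminate the rest.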
Error Handling & Retries
from requests.adapters import HTTPAdapter
from urllib3.util.retry import Retry
def configure_retries(session):
retry_strategy = Retry(
total=5,
backoff_factor=1,
status_forcelist=[429, 500, 502, 503, 504],
        allowed_methods=["GET", "POST"]  # named "method_whitelist" in urllib3 < 1.26
)
adapter = HTTPAdapter(max_retries=retry_strategy)
session.mount("https://", adapter)
session.mount("http://", adapter)
return session
- Log every failure with proxy, UA, and URL for later analysis.
Implementation Walk‑through
Project Structure
ebay-scraper/
│
├─ src/
│ ├─ __init__.py
│ ├─ scraper.py # core scraping logic
│ ├─ parser.py # HTML → data extraction
│ ├─ storage.py # DB / CSV helpers
│ └─ utils.py # proxy/UA helpers
│
├─ data/
│ └─ listings.csv
│
├─ requirements.txt
└─ README.md
Dependencies
requests>=2.28
beautifulsoup4>=4.11
lxml>=4.9
pandas>=2.0
urllib3>=1.26
(Add playwright or selenium only if you need JS rendering.)
Fetching Listings
# src/scraper.py
import requests
import urllib.parse
from .utils import get_session, configure_retries
BASE_URL = "https://www.ebay.com/sch/i.html"
def build_search_url(query, page=1):
params = {
"_nkw": query,
"_pgn": page,
"rt": "nc" # turn off "newly listed" filter
}
return f"{BASE_URL}?{urllib.parse.urlencode(params)}"
def fetch_page(session, url):
resp = session.get(url, timeout=15)
resp.raise_for_status()
return resp.text
- Loop through pages until you hit the “no more results” sentinel.
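The page loop might look like this; it takes the article's `fetch_page` and `build_search_url` as callables, and the sentinel phrase and page cap are assumptions (eBay's wording changes over time):

```python
def crawl_search(fetch, build_url, query, max_pages=50):
    """Walk result pages until the site signals there are no more matches."""
    pages = []
    for page in range(1, max_pages + 1):
        html = fetch(build_url(query, page))
        # Past the last page, eBay shows a "no matches" banner instead of results
        if "No exact matches found" in html:
            break
        pages.append(html)
    return pages
```

Usage would be something like `crawl_search(lambda u: fetch_page(session, u), build_search_url, "gpu")`.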
Parsing the HTML
# src/parser.py
from bs4 import BeautifulSoup
def _text(node):
    """Return a tag's stripped text, or None if the tag is missing.
    (Python has no "?." safe-navigation operator.)"""
    return node.get_text(strip=True) if node else None

def parse_listing(html):
    soup = BeautifulSoup(html, "lxml")
    items = []
    for li in soup.select("li.s-item"):
        title = _text(li.select_one("h3.s-item__title"))
        price = _text(li.select_one("span.s-item__price"))
        link_tag = li.select_one("a.s-item__link")
        link = link_tag.get("href") if link_tag else None
        stock = "Available" if li.select_one("span.s-item__stock") else "Out of stock"
        items.append({
            "title": title,
            "price": price,
            "url": link,
            "stock": stock
        })
    return items
- Tip: Use the lxml parser for speed.
Storing & Syncing Data
# src/storage.py
import pandas as pd
from pathlib import Path
DATA_FILE = Path("../data/listings.csv")
def load_existing():
if DATA_FILE.exists():
return pd.read_csv(DATA_FILE)
return pd.DataFrame(columns=["title", "price", "url", "stock"])
def upsert(df_new):
df_old = load_existing()
df_combined = pd.concat([df_old, df_new]).drop_duplicates(subset=["url"], keep="last")
df_combined.to_csv(DATA_FILE, index=False)
- Run upsert after each batch of pages to keep the CSV up‑to‑date.
Testing & Monitoring
| Tool | Purpose |
|---|---|
| Prometheus + Grafana | Track request rate per proxy, error counts, latency. |
| Sentry | Capture unhandled exceptions and stack traces. |
| Logrotate | Keep log files manageable. |
| Unit tests (pytest) | Validate parsing logic against saved HTML fixtures. |
- Health‑check endpoint: GET /health should return 200 OK if the scraper can successfully make a test request through the current proxy pool.
Best Practices & Tips
- Start slow – Begin with 1‑2 requests/min per IP, then gradually increase while monitoring bans.
- Diversify proxies – Mix residential, mobile, and datacenter IPs; avoid using the same proxy for > 500 requests.
- Randomize delays – Use a jitter range (e.g.,
delay = random.uniform(1.5, 3.0)). - Respect robots.txt – eBay’s
robots.txtdisallows aggressive crawling of certain paths; stay within allowed sections. - Cache static pages – If a listing hasn’t changed (same
ETag/Last‑Modified), skip re‑parsing. - Graceful shutdown – Persist cookies and the current page index on SIGINT/SIGTERM.
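The caching tip can be sketched as a conditional GET (the cache is an in-memory dict here; a real scraper would persist it, and not every eBay response carries an ETag):

```python
etag_cache = {}  # url -> (etag, cached_body)

def fetch_cached(session, url):
    """Send If-None-Match; on 304, reuse the cached body and skip re-parsing."""
    headers = {}
    if url in etag_cache:
        headers["If-None-Match"] = etag_cache[url][0]
    resp = session.get(url, headers=headers, timeout=15)
    if resp.status_code == 304:
        return etag_cache[url][1]  # unchanged - serve the cached HTML
    if "ETag" in resp.headers:
        etag_cache[url] = (resp.headers["ETag"], resp.text)
    return resp.text
```

A 304 response has no body, so every cache hit saves both bandwidth and a parsing pass.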
Conclusion
Scraping eBay at high frequency is feasible as long as you mimic human browsing patterns and rotate your network identity responsibly. By combining:
- Rate‑limit awareness
- Proxy & User‑Agent rotation
- Robust session handling
- Structured error recovery
you can keep your price/stock database fresh without triggering eBay’s anti‑bot defenses.
Remember: Always stay within eBay’s Terms of Service and local legal regulations. When possible, prefer their official APIs for production‑grade integrations.
Happy scraping!
Overview
In e‑commerce and dropshipping, stale data kills profit margins.
If a customer buys an item from your store but the eBay price has jumped 20 % or the item is out of stock, you face a lose‑lose choice:
- Cancel the order → damage your seller rating.
- Fulfill it at a loss.
To avoid this, you need to sync your inventory frequently.
Brute‑force re‑scraping 10 000 individual product pages every hour will get your IP black‑listed and blow your proxy budget. Real‑time monitoring requires a smarter approach.
1. Use List View Instead of Detail View
| View | Requests per Item | Fidelity | Efficiency |
|---|---|---|---|
| Detail View | 1 request = 1 item | High (full data) | Low |
| List View (search results or “Seller’s Other Items”) | 1 request ≈ 200 items | Sufficient for price & stock | High |
Switching to List View can cut request volume by up to 98 %, making a 15‑minute sync financially feasible.
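A list-view URL for one seller can be built like this (`_ssn`, `_ipg`, and `_pgn` are the query parameters the article's example URL uses; the seller ID is a placeholder):

```python
import urllib.parse

def seller_list_url(seller_id, page=1, per_page=200):
    """Build a search URL that returns up to ~200 listings per request."""
    params = {
        "_ssn": seller_id,   # restrict results to one seller's items
        "_ipg": per_page,    # items per page (200 matches the batch size above)
        "_pgn": page,        # page number
    }
    return "https://www.ebay.com/sch/i.html?" + urllib.parse.urlencode(params)
```

One request to this URL replaces up to 200 detail-view requests, which is where the ~98 % saving comes from.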
2. Persist State with SQLite
A JSON file works for a handful of items but becomes slow and prone to corruption as the list grows. SQLite is:
- Serverless
- Fast
- Built‑in to Python
The database stores item_id, price, is_in_stock, and a last_checked timestamp.
import sqlite3
def setup_database():
"""Create (if needed) and return a connection to the SQLite database."""
conn = sqlite3.connect('ebay_sync.db')
cursor = conn.cursor()
# Create table to store the 'state' of our inventory
cursor.execute('''
CREATE TABLE IF NOT EXISTS products (
item_id TEXT PRIMARY KEY,
price REAL,
is_in_stock INTEGER,
last_checked TIMESTAMP DEFAULT CURRENT_TIMESTAMP
)
''')
conn.commit()
return conn
# Initialize the DB
db_conn = setup_database()
3. Fetch a Batch of Items (List View)
The function below retrieves a search‑result page from eBay, parses the HTML with BeautifulSoup, and extracts the essential fields for each listing.
import requests
from bs4 import BeautifulSoup
from typing import List, Dict
def fetch_ebay_batch(url: str) -> List[Dict[str, object]]:
"""
Pull a list page from eBay, parse it, and return a list of items.
Each item dictionary contains:
- ``item_id`` (str): Unique identifier extracted from the listing URL.
- ``price`` (float): Numeric price (USD) of the item.
- ``is_in_stock`` (int): ``1`` if the item is available, ``0`` otherwise.
"""
# --------------------------------------------------------------
# 1️⃣ Set request headers to mimic a real browser
# --------------------------------------------------------------
headers = {
"User-Agent": (
"Mozilla/5.0 (Windows NT 10.0; Win64; x64) "
"AppleWebKit/537.36 (KHTML, like Gecko) "
"Chrome/119.0.0.0 Safari/537.36"
),
"Accept-Language": "en-US,en;q=0.9",
}
# --------------------------------------------------------------
# 2️⃣ Perform the HTTP GET request
# --------------------------------------------------------------
    response = requests.get(url, headers=headers, timeout=15)
if response.status_code != 200:
print(f"Failed to fetch: {response.status_code}")
return []
# --------------------------------------------------------------
# 3️⃣ Parse the page with BeautifulSoup
# --------------------------------------------------------------
soup = BeautifulSoup(response.text, "html.parser")
items: List[Dict[str, object]] = []
# --------------------------------------------------------------
# 4️⃣ Iterate over each listing container
# --------------------------------------------------------------
for wrapper in soup.select(".s-item__wrapper"):
# ----- Extract Item ID -------------------------------------------------
link_tag = wrapper.select_one(".s-item__link")
if not link_tag:
continue
link = link_tag["href"]
# e.g. https://www.ebay.com/itm/1234567890?... → "1234567890"
item_id = link.split("?")[0].split("/")[-1]
# ----- Extract and clean price -----------------------------------------
price_tag = wrapper.select_one(".s-item__price")
if not price_tag:
continue
try:
# Handles ranges like "$25.99 to $30.00"
price_text = price_tag.text.replace("$", "").replace(",", "").split(" to ")[0]
price = float(price_text)
except ValueError:
continue
# ----- Stock check ------------------------------------------------------
status_tag = wrapper.select_one(".s-item__availability")
is_in_stock = 0 if status_tag and "Out of stock" in status_tag.text else 1
# ----- Append the cleaned record ----------------------------------------
items.append(
{
"item_id": item_id,
"price": price,
"is_in_stock": is_in_stock,
}
)
return items
4. Detect Changes & Sync to SQLite
Only when a change is detected do we update the DB (or trigger alerts).
def detect_and_sync_changes(scraped_items, conn):
"""
Compare scraped items with the SQLite database and sync any changes.
Parameters
----------
scraped_items : list[dict]
List of dictionaries containing the latest scraped data. Each dict must
include the keys: 'item_id', 'price', and 'is_in_stock'.
conn : sqlite3.Connection
Active SQLite connection.
Returns
-------
int
Number of rows that were updated (i.e., changes detected).
"""
cursor = conn.cursor()
changes_detected = 0
for item in scraped_items:
# Look up the current record for this item
cursor.execute(
"SELECT price, is_in_stock FROM products WHERE item_id = ?",
(item["item_id"],)
)
row = cursor.fetchone()
if row is None:
# New item discovered – insert it
cursor.execute(
"""
INSERT INTO products (item_id, price, is_in_stock)
VALUES (?, ?, ?)
""",
(item["item_id"], item["price"], item["is_in_stock"])
)
print(f"New Item Tracked: {item['item_id']}")
else:
old_price, old_stock = row
# Detect any change in price or stock status
if old_price != item["price"] or old_stock != item["is_in_stock"]:
print(
f"Change Found on {item['item_id']}: "
f"Price {old_price} → {item['price']}"
)
cursor.execute(
"""
UPDATE products
SET price = ?, is_in_stock = ?, last_checked = CURRENT_TIMESTAMP
WHERE item_id = ?
""",
(item["price"], item["is_in_stock"], item["item_id"])
)
changes_detected += 1
conn.commit()
return changes_detected
5. Main Loop (Continuous Monitoring)
Wrap everything in a resilient loop. Adjust TARGET_URLS to the seller’s store pages or search results you want to monitor.
import time
TARGET_URLS = [
"https://www.ebay.com/sch/i.html?_ssn=some_seller_id&_ipg=200",
# Add more list‑view URLs as needed
]
def main():
db_conn = setup_database()
print("Starting eBay Monitor...")
while True:
try:
for url in TARGET_URLS:
print(f"Scanning {url}")
scraped = fetch_ebay_batch(url)
changes = detect_and_sync_changes(scraped, db_conn)
if changes:
print(f"{changes} change(s) detected and synced.")
except Exception as e:
print(f"Error during monitoring cycle: {e}")
# Wait 15 minutes before the next cycle
time.sleep(15 * 60)
if __name__ == "__main__":
main()
What This Gives You
- High‑frequency monitoring with ≤ 1 request per 200 items.
- Persistent state via SQLite – fast lookups, no corruption risk.
- Change‑driven actions – only act (e.g., Discord alerts, Shopify updates) when a price or stock status actually changes.
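As an example of a change-driven action, a Discord alert could be wired up like this (the webhook URL is a placeholder you generate in your server settings; Discord returns 204 No Content on a successful webhook post):

```python
import requests

WEBHOOK_URL = "https://discord.com/api/webhooks/<id>/<token>"  # placeholder

def build_alert_payload(item_id, old_price, new_price):
    """Assemble the JSON body for a simple webhook content message."""
    return {"content": f"Price change on {item_id}: {old_price} -> {new_price}"}

def send_discord_alert(item_id, old_price, new_price, webhook_url=WEBHOOK_URL):
    """POST the alert; returns True when Discord accepts it."""
    payload = build_alert_payload(item_id, old_price, new_price)
    resp = requests.post(webhook_url, json=payload, timeout=10)
    return resp.status_code == 204
```

Calling `send_discord_alert` from inside `detect_and_sync_changes` (where the change is detected) keeps alert traffic proportional to actual changes, not to scan volume.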
Deploy the script on a modest VPS or a serverless environment, and you’ll keep your eBay‑linked store in sync without blowing your proxy budget. Happy coding!
eBay Monitoring Script Overview
import time
import sqlite3
def main():
db_conn = sqlite3.connect("inventory.db")
while True:
try:
for url in TARGET_URLS:
print(f"Fetching {url}...")
scraped_data = fetch_ebay_batch(url)
changes = detect_and_sync_changes(scraped_data, db_conn)
print(f"Scan complete. Changes detected: {changes}")
print("Sleeping for 15 minutes...")
time.sleep(900) # 15 minutes
except Exception as e:
print(f"Error occurred: {e}")
time.sleep(60) # Wait a minute before retrying
if __name__ == "__main__":
main()
Scaling Your Inventory Monitoring
- Rate‑limit handling – eBay throttles requests aggressively.
  - Use residential proxies or smart proxy rotators.
  - These appear as genuine shoppers, making them far less likely to be blocked than data‑center IPs.
- Dealing with “See Price” items – When the list view shows “See Price” instead of a numeric value:
  - Trigger a targeted detail‑view scrape for that specific item only.
  - This keeps the overall process fast while preserving accuracy.
- Parallel fetching for many sellers – A simple sequential loop becomes a bottleneck.
  - Leverage ThreadPoolExecutor to request multiple pages at once.
from concurrent.futures import ThreadPoolExecutor
with ThreadPoolExecutor(max_workers=5) as executor:
results = list(executor.map(fetch_ebay_batch, TARGET_URLS))
- State‑driven diffing – Focus on list views and compare each new scrape with the previous state.
- This strategy can cut proxy costs by ≈ 98 % and dramatically lowers the chance of IP bans.
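The “See Price” fallback could be sketched as a targeted detail-view fetch; note the `.x-price-primary` selector is an assumption and must be verified against live markup before relying on it:

```python
import requests
from bs4 import BeautifulSoup

def parse_detail_price(html):
    """Pull a numeric price out of a detail page, or None if not found."""
    soup = BeautifulSoup(html, "html.parser")
    tag = soup.select_one(".x-price-primary")  # selector is a guess - verify it
    if tag is None:
        return None
    text = tag.get_text(strip=True).replace("US $", "").replace("$", "").replace(",", "")
    try:
        return float(text)
    except ValueError:
        return None

def fetch_detail_price(item_id, headers=None):
    """One extra request for a single 'See Price' listing only."""
    url = f"https://www.ebay.com/itm/{item_id}"
    resp = requests.get(url, headers=headers, timeout=15)
    resp.raise_for_status()
    return parse_detail_price(resp.text)
```

Because this fires only for the occasional “See Price” item, the batch pipeline stays almost entirely list-view driven.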
Next Steps
- Integrate with external services – Connect detect_and_sync_changes to:
  - Shopify (update product listings)
  - Discord webhook (real‑time alerts)
- Advanced techniques – Explore deeper guides on:
  - Rotating proxies with Python
  - Parsing HTML with BeautifulSoup
By following these recommendations, you’ll build a robust, cost‑effective eBay monitoring system that scales with your inventory while staying under the radar of eBay’s anti‑scraping measures.