Proxy Bandwidth Optimization: Cutting Costs Without Sacrificing Performance

Published: March 9, 2026, 00:22 GMT+8

4 min read

Source: Dev.to

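Residential and mobile proxy bandwidth is expensive: $5-50 per GB. Every wasted byte is wasted money. A typical web page weighs 2-5 MB; if you only need a price or a title, you waste about 99% of that bandwidth on images, CSS, JavaScript, and ads. Heavy resources, poor caching, and failed requests add up quickly.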

Techniques for Reducing Proxy Bandwidth

Block Images and Media in Headless Browsers

# playwright_sync_example.py
from playwright.sync_api import sync_playwright

def create_optimized_page(browser):
    page = browser.new_page()

    # Block images, media, fonts, stylesheets, analytics, tracking, ads
    page.route("**/*.{png,jpg,jpeg,gif,svg,webp}", lambda route: route.abort())
    page.route("**/*.{mp4,webm,mp3,ogg}", lambda route: route.abort())
    page.route("**/*.{woff,woff2,ttf,eot}", lambda route: route.abort())
    page.route("**/*.css", lambda route: route.abort())
    page.route("**/analytics*", lambda route: route.abort())
    page.route("**/tracking*", lambda route: route.abort())
    page.route("**/ads*", lambda route: route.abort())

    return page

Blocking these resources can cut bandwidth by 60-80%.
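File-extension patterns can miss images or media served from URLs without an extension. One possible alternative, sketched here, filters on the resource type that Playwright reports for each request. The names `BLOCKED_RESOURCE_TYPES`, `BLOCKED_PATH_PREFIXES`, `should_block`, and `handle_route` are illustrative and not part of the original article, and the choice of what to block is an assumption you should adapt to your target site.

```python
from urllib.parse import urlparse

# Assumption: these resource types carry nothing the scraper needs.
BLOCKED_RESOURCE_TYPES = {"image", "media", "font", "stylesheet"}
# Assumption: path segments starting with these prefixes are trackers or ads.
BLOCKED_PATH_PREFIXES = ("analytics", "tracking", "ads")


def should_block(resource_type, url):
    """Return True when a request is not worth paying proxy bandwidth for."""
    if resource_type in BLOCKED_RESOURCE_TYPES:
        return True
    segments = urlparse(url).path.lower().split("/")
    return any(seg.startswith(BLOCKED_PATH_PREFIXES) for seg in segments if seg)


def handle_route(route):
    """Playwright route handler: abort blocked requests, pass the rest through."""
    request = route.request
    if should_block(request.resource_type, request.url):
        route.abort()
    else:
        route.continue_()


# Usage with a Playwright page (sync API):
#   page.route("**/*", handle_route)
```

Keeping the decision in a plain function makes the blocking rules easy to test without launching a browser.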

Prefer Direct HTTP Requests over Headless Browsers

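import requests

# Headless browser: downloads 3-5 MB per page
# Direct HTTP: downloads 50-200 KB per page
# `url` and `proxy` (a requests-style proxies dict) are supplied by the caller.
response = requests.get(
    url,
    proxies=proxy,
    # Add "br" only if the brotli package is installed; otherwise
    # requests cannot decode a Brotli-compressed body.
    headers={"Accept-Encoding": "gzip, deflate"},
    timeout=15,
)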

Use a headless browser only when JavaScript rendering is required.
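One way to apply this rule is to try the plain HTTP path first and start a browser only when the data is missing from the raw HTML. The sketch below assumes three caller-supplied callables, `fetch_http`, `fetch_rendered`, and `has_data`; all three names are hypothetical.

```python
def fetch_cheapest(url, fetch_http, fetch_rendered, has_data):
    """Try the cheap HTTP path first; render with a browser only if needed.

    fetch_http(url)     -> HTML string from a direct HTTP request (or None)
    fetch_rendered(url) -> HTML string after JavaScript rendering
    has_data(html)      -> True if the fields we need are present
    Returns (html, used_browser).
    """
    html = fetch_http(url)
    if html is not None and has_data(html):
        return html, False
    # The raw HTML lacks the data, so it is probably injected by JavaScript.
    return fetch_rendered(url), True
```

When the fallback fires, you pay for both requests, so it is worth recording per site which path works and skipping the HTTP attempt for sites that always need rendering.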

Enable Compression

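headers = {
    # Ask the server for a compressed response; `requests` decompresses
    # gzip and deflate automatically. Add "br" only if the brotli package
    # is installed, since requests cannot decode Brotli without it.
    "Accept-Encoding": "gzip, deflate",
}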

Compression typically shrinks HTML payloads by 70-80%.
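To see why markup compresses so well, you can measure it locally with the standard library. The sample below is synthetic and highly repetitive, so it will compress better than most real pages; treat it as an illustration of the mechanism rather than a benchmark.

```python
import gzip

# Illustrative, repetitive markup; real pages will compress differently.
row = '<tr class="item"><td class="name">Widget</td><td class="price">9.99</td></tr>\n'
html = ("<html><body><table>\n" + row * 500 + "</table></body></html>").encode("utf-8")

compressed = gzip.compress(html)
saving = 1 - len(compressed) / len(html)

print(f"raw: {len(html)} bytes, gzip: {len(compressed)} bytes, saved: {saving:.0%}")
```

When measuring real traffic with `requests`, keep in mind that `len(response.content)` is the size after decompression. The `Content-Length` response header, when the server sends one, is closer to what actually crossed the proxy.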

Use Structured APIs Instead of HTML Scraping

# Scraping HTML: ~200 KB per product
html_resp = requests.get("https://site.com/product/123")

# Using API: ~2 KB per product (100× smaller)
api_resp = requests.get("https://api.site.com/products/123")

APIs return compact JSON, which cuts bandwidth significantly.

Implement Local Caching

import hashlib
import time

class ProxyCache:
    def __init__(self, cache_ttl=3600):
        self.cache = {}
        self.ttl = cache_ttl

    def get(self, url):
        key = hashlib.md5(url.encode()).hexdigest()
        entry = self.cache.get(key)
        if entry and time.time() - entry["timestamp"] < self.ttl:
            return entry["data"]          # Cache hit – zero bandwidth
        return None

    def set(self, url, data):
        key = hashlib.md5(url.encode()).hexdigest()
        self.cache[key] = {"data": data, "timestamp": time.time()}

A cache hit eliminates network traffic entirely.
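The cache only saves bandwidth if every fetch goes through it. A minimal sketch of such a wrapper follows; `cached_fetch` and `DictCache` are illustrative names, and `DictCache` is just a small stand-in with the same `get`/`set` interface as `ProxyCache` so that the block runs on its own.

```python
import time


class DictCache:
    """Tiny stand-in with the same get/set interface as ProxyCache."""

    def __init__(self, ttl=3600):
        self.ttl = ttl
        self.store = {}

    def get(self, url):
        entry = self.store.get(url)
        if entry and time.time() - entry[1] < self.ttl:
            return entry[0]
        return None

    def set(self, url, data):
        self.store[url] = (data, time.time())


def cached_fetch(url, cache, fetch):
    """Return (data, from_cache); call fetch(url) only on a cache miss."""
    data = cache.get(url)
    if data is not None:
        return data, True
    data = fetch(url)  # The only line that costs proxy bandwidth
    cache.set(url, data)
    return data, False
```

With the class from the article, a call might look like `cached_fetch(url, ProxyCache(), lambda u: requests.get(u, proxies=proxy, timeout=15).text)`.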

Use Conditional Requests

# First request
resp = requests.get(url, proxies=proxy)
etag = resp.headers.get("ETag")
last_modified = resp.headers.get("Last-Modified")

# Subsequent requests
headers = {}
if etag:
    headers["If-None-Match"] = etag
if last_modified:
    headers["If-Modified-Since"] = last_modified

resp = requests.get(url, proxies=proxy, headers=headers)
if resp.status_code == 304:
    # Content unchanged – minimal bandwidth used
    pass

Conditional GETs avoid re-downloading unchanged content, saving 95%+ of the payload.
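A 304 response has no body, so the scraper needs to keep the previous copy alongside the validators. The sketch below shows one way to do that; `ConditionalFetcher` is an illustrative name, and `session` is assumed to be any object with a requests-style `get(url, headers=..., **kwargs)` method, such as `requests.Session()`.

```python
class ConditionalFetcher:
    """Remember validators and bodies per URL; revalidate instead of re-downloading."""

    def __init__(self, session):
        # Assumption: `session` exposes a requests-style get() method.
        self.session = session
        self.entries = {}  # url -> {"etag", "last_modified", "body"}

    def fetch(self, url, **kwargs):
        entry = self.entries.get(url)
        headers = dict(kwargs.pop("headers", None) or {})
        if entry:
            if entry["etag"]:
                headers["If-None-Match"] = entry["etag"]
            if entry["last_modified"]:
                headers["If-Modified-Since"] = entry["last_modified"]

        resp = self.session.get(url, headers=headers, **kwargs)

        if resp.status_code == 304 and entry:
            return entry["body"]  # Unchanged: reuse the stored copy

        if resp.status_code == 200:
            self.entries[url] = {
                "etag": resp.headers.get("ETag"),
                "last_modified": resp.headers.get("Last-Modified"),
                "body": resp.content,
            }
        # Other statuses are returned as-is and never stored.
        return resp.content
```

A call might look like `ConditionalFetcher(requests.Session()).fetch(url, proxies=proxy, timeout=15)`. The saving only materializes when the server actually sends `ETag` or `Last-Modified` headers and honors them.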

Smart Retry Logic

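import requests

def smart_retry(url, proxy_manager, max_retries=3):
    for attempt in range(max_retries):
        proxy = proxy_manager.get_fresh_proxy()  # Different proxy each time
        try:
            response = requests.get(url, proxies=proxy, timeout=10)
        except requests.Timeout:
            proxy_manager.mark_slow(proxy)
            continue
        except requests.RequestException:
            # Connection or proxy errors: treat the proxy as failed
            proxy_manager.mark_failed(proxy)
            continue
        if response.status_code == 200:
            return response
        if response.status_code in (403, 429):
            proxy_manager.mark_failed(proxy)  # Blocked or rate-limited
        # Any other status: fall through and retry with a different proxy
    return None  # All attempts failed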

Avoid retrying immediately through the same proxy, so you do not pay for the same failed transfer twice.
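The article does not show the `proxy_manager` object that `smart_retry` relies on. The following is one possible shape for it, written as a sketch: the class name `ProxyManager`, the `max_strikes` parameter, and the strike-counting policy are all assumptions, chosen only to match the three methods the retry function calls.

```python
import random


class ProxyManager:
    """Hands out proxies and retires ones that keep failing (illustrative)."""

    def __init__(self, proxy_urls, max_strikes=2):
        self.strikes = {url: 0 for url in proxy_urls}
        self.max_strikes = max_strikes
        self.last_url = None

    def get_fresh_proxy(self):
        healthy = [u for u, s in self.strikes.items() if s < self.max_strikes]
        if not healthy:
            raise RuntimeError("no healthy proxies left")
        # Prefer a proxy other than the one handed out last time.
        candidates = [u for u in healthy if u != self.last_url] or healthy
        self.last_url = random.choice(candidates)
        # Same dict shape that requests expects for its `proxies` argument.
        return {"http": self.last_url, "https": self.last_url}

    def mark_failed(self, proxy):
        # A block (403/429) or connection error retires the proxy immediately.
        self.strikes[proxy["http"]] = self.max_strikes

    def mark_slow(self, proxy):
        # A timeout counts as one strike.
        self.strikes[proxy["http"]] += 1
```

A production pool would usually let retired proxies recover after a cooldown; that is left out here to keep the sketch short.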

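Bandwidth Reduction Summary

Technique	Bandwidth reduction
Block images/media	60-80%
HTTP instead of headless browser	90-95%
Enable compression	70-80%
API instead of scraping	95-99%
Caching	100% on a cache hit
Conditional requests	95%+ when content is unchanged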

Cost Comparison

Approach	Per-page size	Daily bandwidth	Daily cost ($10/GB)
Headless, unoptimized	3 MB	30 GB	$300
Headless, resources blocked	500 KB	5 GB	$50
Direct HTTP, compressed	50 KB	500 MB	$5
API requests	2 KB	20 MB	$0.20

Optimization can cut your proxy costs by up to 99%.
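The figures in the table follow from simple arithmetic, which the sketch below reproduces. It assumes 10,000 pages (the volume implied by 30 GB at 3 MB per page) and decimal units, where 1 GB is 10^9 bytes. The function name `bandwidth_cost` is illustrative.

```python
KB = 1_000
MB = 1_000_000


def bandwidth_cost(pages, bytes_per_page, price_per_gb=10.0):
    """Cost in dollars of fetching `pages` pages, using decimal units (1 GB = 10**9 bytes)."""
    gigabytes = pages * bytes_per_page / 1e9
    return gigabytes * price_per_gb


approaches = [
    ("Headless, unoptimized", 3 * MB),
    ("Headless, resources blocked", 500 * KB),
    ("Direct HTTP, compressed", 50 * KB),
    ("API requests", 2 * KB),
]

for label, size in approaches:
    print(f"{label}: ${bandwidth_cost(10_000, size):.2f}")
```

Plugging in your own page count, measured page size, and provider price gives a quick estimate of what each optimization is worth before you implement it.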

For more proxy optimization guides and cost-saving strategies, visit DataResearchTools.
