3 Hours Wasted on asyncio Pitfalls That Almost Took Down Production
Source: Dev.to
Incident Overview
Last Friday at 5 PM, right as I was about to close my laptop and sneak out, the alert channel exploded: the online data collection service's timeout rate had spiked to 40%, and every downstream report was blank. I checked the logs and found that the crawler processing thousands of URLs was still using the old synchronous requests library, fetching them one by one. Each request averaged 1.2 seconds, so a full round took nearly 20 minutes, while the business requirement was completion within 5 minutes. The only thought that crossed my mind: rewrite it with asyncio for concurrency and deploy before leaving.
That decision led me into three major pitfalls, and I almost wrecked the service. Below are the hard-earned lessons; I hope they save you the same three hours.
Understanding asyncio
The core of asyncio is the event loop plus coroutines.
- The event loop is a constantly polling scheduler.
- Each coroutine is a task that can voluntarily pause and hand back control.
When a coroutine is waiting for a network response (I/O), the event loop immediately switches to another ready coroutine, keeping the CPU from spinning idle.
The biggest difference from traditional multithreading is that asyncio uses cooperative scheduling within a single thread, avoiding thread‑switching overhead and GIL contention. It especially shines in network‑request‑heavy scenarios.
Typical pattern:
async def coro():
    await async_io_operation()
Gather multiple coroutines with asyncio.gather(). The total duration depends on the slowest task, not the sum of all tasks.
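A minimal sketch of that behavior, using asyncio.sleep to stand in for a 1-second network wait:

import asyncio
import time

async def work(n):
    await asyncio.sleep(1)  # stand-in for 1-second I/O; awaiting hands control back to the loop
    return n

async def main():
    start = time.perf_counter()
    results = await asyncio.gather(*(work(i) for i in range(10)))
    elapsed = time.perf_counter() - start
    print(results, f"{elapsed:.2f}s")  # ten 1-second tasks finish in roughly 1 second, not 10

asyncio.run(main())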
Naïve implementation (what not to do)
import asyncio
import requests  # synchronous library, don't use it in a coroutine!

async def fetch(url):
    # Anti-pattern: calling the synchronous requests library directly inside a coroutine
    resp = requests.get(url, timeout=5)  # this call blocks the entire thread!
    return resp.status_code

async def main():
    urls = ["https://httpbin.org/delay/1"] * 10
    tasks = [fetch(url) for url in urls]
    results = await asyncio.gather(*tasks)
    print(results)

asyncio.run(main())
Running this shows all requests are still sequential. requests.get() is a blocking call; while it waits for the network it never yields control back to the event loop, so only one coroutine runs at a time. The event loop becomes effectively useless.
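As an aside: if you truly cannot swap out a synchronous library, one standard escape hatch is to push the blocking call onto a worker thread with asyncio.to_thread (Python 3.9+). This is only a sketch of that option, not the fix I shipped:

import asyncio
import requests

async def fetch(url):
    # Run the blocking requests call in a worker thread so the event loop stays free
    resp = await asyncio.to_thread(requests.get, url, timeout=5)
    return resp.status_code

async def main():
    urls = ["https://httpbin.org/delay/1"] * 10
    results = await asyncio.gather(*(fetch(u) for u in urls))
    print(results)

asyncio.run(main())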
Correct async HTTP client
import asyncio
import aiohttp

async def fetch(session, url):
    # aiohttp's async request: while awaiting, control is handed back to the event loop
    async with session.get(url, timeout=aiohttp.ClientTimeout(total=5)) as resp:
        return await resp.text()

async def main():
    urls = ["https://httpbin.org/delay/1"] * 10
    async with aiohttp.ClientSession() as session:
        tasks = [fetch(session, url) for url in urls]
        results = await asyncio.gather(*tasks)
        print(f"Completed {len(results)} requests")

asyncio.run(main())
Now the event loop truly runs the ten 1‑second requests concurrently, finishing in just over 1 second instead of 10 seconds. In production the crawler went from 20 minutes to under 2 minutes.
Handling exceptions and concurrency limits
When the URL list grew to several hundred, occasional timeouts or DNS failures made asyncio.gather() raise as soon as the first task failed, and the results of the entire batch were lost.
async def fetch_with_sem(sem, session, url):
    async with sem:  # cap concurrency so we don't exhaust file descriptors in one burst
        try:
            async with session.get(url, timeout=aiohttp.ClientTimeout(total=10)) as resp:
                return url, await resp.text()
        except Exception as e:
            return url, e  # return the exception object so the caller's isinstance check works

async def main():
    urls = [...]  # a few hundred URLs
    sem = asyncio.Semaphore(50)  # limit concurrency to avoid OS or server-side limits
    async with aiohttp.ClientSession() as session:
        tasks = [fetch_with_sem(sem, session, url) for url in urls]
        results = await asyncio.gather(*tasks, return_exceptions=True)  # key!
        for url, content in results:
            if isinstance(content, Exception):
                print(f"{url} failed: {content}")
            else:
                process(content)  # downstream processing
- return_exceptions=True makes gather() return exception objects in the results instead of aborting the whole batch.
- A semaphore caps concurrent connections (e.g., 50) to avoid exhausting file descriptors or hitting remote rate limits.
- Swallowing exceptions and optionally retrying them yields a stable service.
Practical tips
- Never call time.sleep() inside a coroutine – it blocks the entire thread. Use await asyncio.sleep() instead.
- Beware of unlimited concurrency – spawning thousands of connections at once can exhaust file descriptor limits or trigger server rate limiting. Use asyncio.Semaphore or connection-pool limits.
- Implement back-off and retries – transient network issues are normal. Combine return_exceptions=True with exponential backoff retries for robust production-grade code (a sketch follows this list).
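A minimal sketch of that back-off pattern, reusing the semaphore-guarded fetch from above; the helper name fetch_with_retry and the retry parameters are my own illustration, not the exact code I shipped:

import asyncio
import aiohttp

async def fetch_with_retry(sem, session, url, attempts=3, base_delay=0.5):
    # Hypothetical helper: attempt count and delays are illustrative
    for attempt in range(attempts):
        async with sem:
            try:
                async with session.get(url, timeout=aiohttp.ClientTimeout(total=10)) as resp:
                    resp.raise_for_status()
                    return url, await resp.text()
            except Exception as e:
                if attempt == attempts - 1:
                    return url, e  # out of retries: hand the exception back to the caller
        # exponential backoff between attempts: 0.5s, 1s, 2s, ...
        await asyncio.sleep(base_delay * (2 ** attempt))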
Conclusion
Rewriting a synchronous I/O-bound service with asyncio is one of the most satisfying optimizations you can make. But these pitfalls can easily turn it into a nightmare if you're not careful. I lost three hours and very nearly a quiet production Friday. I hope this post saves you from the same fate.