FastAPI Performance: The Hidden Thread Pool Overhead You Might Be Missing
Source: Dev.to
Understanding the Problem
FastAPI is an incredible framework for building high‑performance APIs in Python. Its async capabilities, automatic validation, and excellent documentation make it a joy to work with. However, a subtle performance issue is often overlooked: unnecessary thread‑pool delegation for synchronous dependencies.
How FastAPI Handles Dependencies
FastAPI distinguishes between async and sync callables:
- async def functions – executed directly in the event loop.
- def functions – sent to a thread pool via anyio.to_thread.run_sync.
This behavior applies to both path‑operation functions and dependencies. Internally FastAPI performs a simplified check:
import asyncio
from anyio import to_thread

# Simplified FastAPI logic
if asyncio.iscoroutinefunction(dependency):
    # Run directly in event loop
    result = await dependency()
else:
    # Send to thread pool
    result = await to_thread.run_sync(dependency)
Because class constructors (__init__) are always synchronous, class‑based dependencies are always routed to the thread pool.
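The check is easy to reproduce without FastAPI. The sketch below uses only the standard library; runs_in_threadpool is a hypothetical helper that mirrors the simplified logic above, not FastAPI's actual resolver.

```python
import inspect


class QueryParams:
    """Class-based dependency: calling the class runs a synchronous __init__."""

    def __init__(self, q=None):
        self.q = q


def sync_dependency():
    return {"kind": "sync"}


async def async_dependency():
    return {"kind": "async"}


def runs_in_threadpool(dependency) -> bool:
    # Hypothetical helper mirroring the simplified check:
    # anything that is not a coroutine function gets offloaded.
    return not inspect.iscoroutinefunction(dependency)


classification = {
    dep.__name__: runs_in_threadpool(dep)
    for dep in (QueryParams, sync_dependency, async_dependency)
}
print(classification)
```

Only async_dependency is classified as running on the event loop; the class and the plain function both fall into the thread-pool branch.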
The Thread Pool Overhead
- Default thread pool size: 40 threads.
- Each thread‑pool execution incurs context‑switching, thread synchronization, and possible queuing when all threads are busy.
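To get a feel for the per-call cost, a rough micro-benchmark can compare building an object inline with handing the same constructor to a worker thread. This is a sketch under assumptions: asyncio.to_thread stands in for anyio.to_thread.run_sync, the function names are made up for illustration, and the absolute numbers depend on the machine and Python version.

```python
import asyncio
import time


class QueryParams:
    def __init__(self, q=None, skip=0, limit=100):
        self.q = q
        self.skip = skip
        self.limit = limit


async def build_direct(n):
    # Constructor runs inline on the event loop.
    return [QueryParams(limit=i) for i in range(n)]


async def build_offloaded(n):
    # Constructor is handed to a worker thread, one call at a time.
    # Assumption: asyncio.to_thread approximates the thread-pool hop.
    return [await asyncio.to_thread(QueryParams, limit=i) for i in range(n)]


async def measure(builder, n):
    start = time.perf_counter()
    objects = await builder(n)
    return len(objects), time.perf_counter() - start


def run_comparison(n=2000):
    direct_count, direct_seconds = asyncio.run(measure(build_direct, n))
    offloaded_count, offloaded_seconds = asyncio.run(measure(build_offloaded, n))
    return {
        "direct_count": direct_count,
        "direct_seconds": direct_seconds,
        "offloaded_count": offloaded_count,
        "offloaded_seconds": offloaded_seconds,
    }


if __name__ == "__main__":
    print(run_comparison())
```

Both variants build the same objects; the difference in elapsed time is the cost of the thread hand-off itself.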
Example: Multiple Class‑Based Dependencies
from fastapi import Depends, FastAPI

app = FastAPI()

class QueryParams:
    def __init__(self, q: str | None = None, skip: int = 0, limit: int = 100):
        self.q = q
        self.skip = skip
        self.limit = limit

@app.get("/items/")
async def read_items(params: QueryParams = Depends()):
    return {"q": params.q, "skip": params.skip, "limit": params.limit}
Each request creates a QueryParams instance in the thread pool, even though it only performs simple assignments.
If an endpoint has several such dependencies, the overhead multiplies:
@app.get("/complex-endpoint/")
async def complex_operation(
    auth: AuthParams = Depends(),
    query: QueryParams = Depends(),
    pagination: PaginationParams = Depends(),
    filters: FilterParams = Depends(),
):
    pass  # Four dependencies → four thread-pool tasks
With 100 concurrent requests, that adds up to 400 thread-pool tasks competing for 40 threads; whatever cannot get a thread waits in the queue, which can cause latency spikes.
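The contention can be simulated with the standard library. The sketch below is an approximation under stated assumptions: a ThreadPoolExecutor with 40 workers stands in for the default thread limiter, each simulated request resolves its four dependencies one after another, and a 1 ms sleep stands in for the work done in the thread. PoolStats, handle_request and simulate are illustrative names.

```python
import asyncio
import threading
import time
from concurrent.futures import ThreadPoolExecutor

POOL_SIZE = 40  # matches the default limit quoted above


class PoolStats:
    """Counts how many tasks ran and how many were in a thread at once."""

    def __init__(self):
        self._lock = threading.Lock()
        self.active = 0
        self.peak = 0
        self.total = 0

    def task(self):
        with self._lock:
            self.active += 1
            self.total += 1
            self.peak = max(self.peak, self.active)
        time.sleep(0.001)  # stand-in for the dependency's work
        with self._lock:
            self.active -= 1


async def handle_request(loop, pool, stats, deps):
    # Each dependency is a separate hop into the pool.
    for _ in range(deps):
        await loop.run_in_executor(pool, stats.task)


async def simulate(requests=100, deps=4):
    stats = PoolStats()
    loop = asyncio.get_running_loop()
    with ThreadPoolExecutor(max_workers=POOL_SIZE) as pool:
        await asyncio.gather(
            *(handle_request(loop, pool, stats, deps) for _ in range(requests))
        )
    return stats


if __name__ == "__main__":
    stats = asyncio.run(simulate())
    print(f"tasks run: {stats.total}, peak in threads: {stats.peak}")
```

The total reaches 400 while the peak never exceeds the pool size; everything beyond that limit spends time waiting rather than running.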
Real‑World Impact
- API with 50 endpoints
- Average 3 class‑based dependencies per endpoint
- 1 000 requests per second
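→ 1 000 requests × 3 dependencies = ~3 000 unnecessary thread-pool operations per second in total (~150 000 only if each of the 50 endpoints receives 1 000 requests per second on its own). Even if each operation is fast, the cumulative overhead can become a bottleneck.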
The Solution: fastapi-async-safe-dependencies
A lightweight library that marks certain dependencies as safe to run directly in the event loop, bypassing the thread pool.
Installation
pip install fastapi-async-safe-dependencies
Basic Usage
from fastapi import Depends, FastAPI
from fastapi_async_safe import async_safe, init_app

app = FastAPI()
init_app(app)  # Initialize the library

@async_safe  # Mark as safe for async context
class QueryParams:
    def __init__(self, q: str | None = None, skip: int = 0, limit: int = 100):
        self.q = q
        self.skip = skip
        self.limit = limit

@app.get("/items/")
async def read_items(params: QueryParams = Depends()):
    return {"q": params.q, "skip": params.skip, "limit": params.limit}
What changed?
- Call init_app(app) at startup.
- Decorate dependency classes with @async_safe.
How It Works Under the Hood
When a class is decorated with @async_safe, the library creates an async wrapper:
# Simplified wrapper generated by @async_safe
async def _wrapper(**kwargs):
    return YourClass(**kwargs)  # Direct constructor call
Because the wrapper is a coroutine function, asyncio.iscoroutinefunction returns True, so FastAPI runs it directly in the event loop with no thread-pool involvement.
init_app() walks through all routes and dependencies, replacing class references with these wrappers. The wrapper itself performs no await; it simply executes the synchronous constructor instantly, which is safe when the constructor is non‑blocking.
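The wrapping idea can be sketched in plain Python. This is not the library's source: make_async_wrapper is a hypothetical helper, and copying the signature onto the wrapper is an assumption about what a framework that inspects parameters would need.

```python
import asyncio
import inspect
from typing import Optional


def make_async_wrapper(target):
    """Hypothetical stand-in for what an @async_safe-style decorator could generate."""

    async def wrapper(*args, **kwargs):
        # No await inside: the synchronous callable runs inline on the event loop.
        return target(*args, **kwargs)

    # Expose the original parameters so code that inspects the
    # signature still sees q, skip and limit.
    wrapper.__name__ = target.__name__
    wrapper.__signature__ = inspect.signature(target)
    return wrapper


class QueryParams:
    def __init__(self, q: Optional[str] = None, skip: int = 0, limit: int = 100):
        self.q = q
        self.skip = skip
        self.limit = limit


query_params_dependency = make_async_wrapper(QueryParams)

# The wrapper now passes the coroutine-function check...
print(inspect.iscoroutinefunction(query_params_dependency))
# ...and still builds an ordinary QueryParams instance.
params = asyncio.run(query_params_dependency(q="test", limit=50))
print(params.q, params.skip, params.limit)
```

The same helper works for a plain synchronous function, since it only relies on the target being callable.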
Supporting Inheritance
from fastapi_async_safe import async_safe

@async_safe
class BaseParams:
    def __init__(self, limit: int = 100):
        self.limit = min(limit, 1000)

class QueryParams(BaseParams):
    def __init__(self, q: str | None = None, **kwargs):
        super().__init__(**kwargs)
        self.q = q
If a subclass does need thread‑pool execution (e.g., performs I/O), mark it with @async_unsafe:
from fastapi_async_safe import async_safe, async_unsafe

@async_safe
class BaseParams:
    pass

@async_unsafe  # Will be sent to thread pool
class HeavyParams(BaseParams):
    def __init__(self):
        self.data = some_blocking_operation()
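One way such inherited markers could behave is shown below. This is purely illustrative and makes no claim about how the library stores its flags: mark_safe, mark_unsafe, is_async_safe and the _async_safe_flag attribute are all hypothetical names. The idea is that a class attribute set by a decorator is inherited through normal attribute lookup, and a subclass can override it.

```python
_FLAG = "_async_safe_flag"  # hypothetical attribute name


def mark_safe(cls):
    setattr(cls, _FLAG, True)
    return cls


def mark_unsafe(cls):
    setattr(cls, _FLAG, False)
    return cls


def is_async_safe(cls, default=False):
    # Normal attribute lookup walks the MRO, so subclasses inherit the flag.
    # `default` plays the role of a global opt-in for unmarked classes.
    return getattr(cls, _FLAG, default)


@mark_safe
class BaseParams:
    def __init__(self, limit=100):
        self.limit = min(limit, 1000)


class QueryParams(BaseParams):  # inherits the safe marker
    def __init__(self, q=None, **kwargs):
        super().__init__(**kwargs)
        self.q = q


@mark_unsafe
class HeavyParams(BaseParams):  # overrides the inherited marker
    pass


class Unmarked:
    pass


print(is_async_safe(QueryParams), is_async_safe(HeavyParams), is_async_safe(Unmarked))
```

Passing default=True to is_async_safe treats unmarked classes as safe, which is the same trade-off a global opt-in makes.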
Global Opt‑In
init_app(app, all_classes_safe=True) # Treat all class‑based dependencies as async‑safe
# Use @async_unsafe only for exceptions
Using with Synchronous Functions
The decorator also works with plain functions:
from fastapi_async_safe import async_safe

@async_safe
def get_common_params(q: str | None = None, skip: int = 0, limit: int = 100) -> dict:
    return {"q": q, "skip": skip, "limit": limit}

@app.get("/items/")
async def read_items(params: dict = Depends(get_common_params)):
    return params
Benchmarks & Results
| Scenario | Performance Gain |
|---|---|
| Single class dependency per endpoint | 15–25% ↑ requests/sec |
| Multiple class dependencies | 40–60% ↑ requests/sec |
| 1000+ concurrent requests (p95) | 30–50% ↓ latency |
| Thread‑pool saturation eliminated | — |
Best Practices
When to Use @async_safe
✅ Simple data classes
✅ Parameter‑validation classes
✅ Configuration objects
✅ Non‑blocking utility functions
✅ Pydantic model wrappers
❌ Do NOT use for:
- Database queries
- File I/O
- External API calls
- CPU‑intensive calculations
- Anything that truly blocks the event loop
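The reason for the "do not use" list can be demonstrated with the standard library alone. In this sketch, time.sleep stands in for a blocking database query or file read, and asyncio.to_thread stands in for the thread-pool hop; the function names are illustrative. Three simulated requests run concurrently, first with the blocking call inline and then offloaded.

```python
import asyncio
import time

BLOCK_SECONDS = 0.1


def blocking_dependency():
    time.sleep(BLOCK_SECONDS)  # stand-in for a DB query or file read
    return "done"


async def request_inline():
    # Blocking call on the event loop: nothing else can run meanwhile.
    return blocking_dependency()


async def request_offloaded():
    # Blocking call in a worker thread: the event loop stays free.
    return await asyncio.to_thread(blocking_dependency)


async def timed(handler, concurrent=3):
    start = time.perf_counter()
    await asyncio.gather(*(handler() for _ in range(concurrent)))
    return time.perf_counter() - start


def compare():
    inline_seconds = asyncio.run(timed(request_inline))
    offloaded_seconds = asyncio.run(timed(request_offloaded))
    return inline_seconds, offloaded_seconds


if __name__ == "__main__":
    inline_seconds, offloaded_seconds = compare()
    print(f"inline: {inline_seconds:.2f}s, offloaded: {offloaded_seconds:.2f}s")
```

Inline, the three requests run back to back because each one holds the event loop for the full sleep; offloaded, they overlap. A dependency that behaves like blocking_dependency belongs in the thread pool.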
Adoption Strategy
- Start Small – Apply to your most‑called endpoints.
- Monitor – Verify that latency improves and no regressions appear.
- Expand – Gradually mark more dependencies as async‑safe.
- Consider Global Opt-In – Once confident, use all_classes_safe=True.
Testing
Existing tests remain unchanged:
import pytest
from fastapi.testclient import TestClient

def test_endpoint():
    client = TestClient(app)
    response = client.get("/items/?q=test&limit=50")
    assert response.status_code == 200
    assert response.json()["q"] == "test"
Caveats
- Premature optimization – Only adopt if you observe performance issues.
- Blocking dependencies – Keep them in the thread pool (@async_unsafe).
- Profile first – Use tools like uvicorn --log-level debug or external profilers to confirm bottlenecks before applying the library.
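For a quick before-and-after comparison, latencies can be collected by hand and summarized. The helpers below are a minimal sketch with hypothetical names (measure_latencies, summarize); the callable passed in could be anything worth timing, for example a lambda that issues one TestClient request. It is not a replacement for a real load-testing tool.

```python
import statistics
import time


def measure_latencies(func, calls=200):
    """Call func repeatedly and return the wall-clock duration of each call."""
    samples = []
    for _ in range(calls):
        start = time.perf_counter()
        func()
        samples.append(time.perf_counter() - start)
    return samples


def summarize(samples):
    # quantiles(n=100) returns 99 cut points; index 94 is the 95th percentile.
    percentiles = statistics.quantiles(samples, n=100, method="inclusive")
    return {
        "mean": statistics.fmean(samples),
        "p95": percentiles[94],
        "max": max(samples),
    }


if __name__ == "__main__":
    # Placeholder workload; swap in the call you actually want to measure.
    samples = measure_latencies(lambda: sum(range(1000)))
    print(summarize(samples))
```

Running the same measurement with and without the decorator applied gives a like-for-like view of mean and p95 latency on your own workload.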