Rate Limiting Your API: Algorithms, Implementation, and the Strategic Thinking Behind It
Source: Dev.to
Every API you expose to the internet will eventually be abused—automated scrapers, credential‑stuffing bots, misbehaving integrations, or even a well‑meaning client with a loop that runs too fast. Without rate limiting, a single bad actor can consume all your server resources and degrade the experience for every other user.
Threats Addressed by Rate Limiting
- Resource protection – Prevent any single client from consuming a disproportionate share of CPU, memory, database connections, or bandwidth.
- Cost control – Unconstrained clients can rack up significant charges (e.g., AI inference APIs, SMS providers, payment processors) in minutes.
- Abuse prevention – Credential stuffing and enumeration attacks rely on volume; rate limiting raises the cost for attackers.
- Fair access – In multi‑tenant systems, rate limiting ensures one tenant’s spike doesn’t degrade everyone else’s experience.
Common Algorithms
Fixed Window
- Counts requests per client in a fixed time interval.
- Implementation in Redis: a single INCR with EXPIRE.
- Weakness: the boundary problem. A client can send the maximum at the end of one window and again at the start of the next, briefly doubling the effective rate.
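Here is a minimal in‑memory sketch of the fixed‑window counter. It stands in for the Redis INCR/EXPIRE pair by deriving a window index from the clock, so a new index plays the role of an expired key. The class name `FixedWindowLimiter` and the injectable `clock` parameter are hypothetical. The `clock` parameter exists so the boundary problem can be reproduced deterministically.

```python
import time
from typing import Callable, Dict, Tuple


class FixedWindowLimiter:
    """In-memory fixed-window counter (illustrative stand-in for Redis INCR + EXPIRE)."""

    def __init__(self, limit: int, window_seconds: float,
                 clock: Callable[[], float] = time.time) -> None:
        self.limit = limit
        self.window = window_seconds
        self.clock = clock
        # client_id -> (window_index, count); a new window index acts like key expiry
        self.counters: Dict[str, Tuple[int, int]] = {}

    def allow(self, client_id: str) -> bool:
        window_index = int(self.clock() // self.window)
        current_index, count = self.counters.get(client_id, (window_index, 0))
        if current_index != window_index:
            # In Redis the previous window's key would have expired; start from zero.
            count = 0
        count += 1  # like INCR: the request is counted even if it is then rejected
        self.counters[client_id] = (window_index, count)
        return count <= self.limit
```

Consider a limit of 5 per 60‑second window. A client can send 5 requests at t = 59.9 s and 5 more at t = 60.1 s, and all ten are accepted within 0.2 seconds.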
Sliding Window Log
- Tracks timestamps of every request in the window, eliminating the boundary problem.
- Drawback: memory‑intensive (e.g., 1,000 requests/min × 10,000 clients = 10 million timestamps).
- Best suited for low‑volume, high‑value endpoints like login or password reset.
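The sketch below keeps a per‑client deque of request timestamps and evicts anything older than the window before counting. It is an in‑memory illustration with assumed names (`SlidingWindowLog`, `clock`). A Redis version would typically use a sorted set of timestamps instead. This variant logs only accepted requests, which is a design assumption.

```python
import time
from collections import deque
from typing import Callable, Deque, Dict


class SlidingWindowLog:
    """Stores one timestamp per accepted request; memory grows with the limit."""

    def __init__(self, limit: int, window_seconds: float,
                 clock: Callable[[], float] = time.time) -> None:
        self.limit = limit
        self.window = window_seconds
        self.clock = clock
        self.logs: Dict[str, Deque[float]] = {}

    def allow(self, client_id: str) -> bool:
        now = self.clock()
        log = self.logs.setdefault(client_id, deque())
        # Evict timestamps that have slid out of the trailing window (now - window, now].
        while log and log[0] <= now - self.window:
            log.popleft()
        if len(log) < self.limit:
            log.append(now)
            return True
        return False
```

Replay the fixed‑window boundary scenario: 5 requests at t = 59.9 s, then another at t = 60.1 s. The sixth request is rejected, because all five earlier timestamps are still inside the trailing 60 seconds.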
Sliding Window Counter (recommended default)
- Maintains counters for the current and previous fixed windows, then computes a weighted count based on how far into the current window you are.
- Offers a good balance of accuracy, memory efficiency, and implementation simplicity.
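A sketch of the weighted count follows. It uses the common formulation estimated = previous × (1 − fraction of current window elapsed) + current, and the article's exact formula may differ. The class name, the `clock` parameter, and the rule "accept only if counting this request keeps the estimate at or under the limit" are assumptions. State is just two counters and a window index per client.

```python
import time
from typing import Callable, Dict, Tuple


class SlidingWindowCounter:
    """Approximates a sliding window from two fixed-window counters per client."""

    def __init__(self, limit: int, window_seconds: float,
                 clock: Callable[[], float] = time.time) -> None:
        self.limit = limit
        self.window = window_seconds
        self.clock = clock
        # client_id -> (window_index, current_count, previous_count)
        self.state: Dict[str, Tuple[int, int, int]] = {}

    def allow(self, client_id: str) -> bool:
        now = self.clock()
        window_index = int(now // self.window)
        # Fraction of the current fixed window already elapsed (0.0 up to, not including, 1.0).
        elapsed = (now % self.window) / self.window
        index, current, previous = self.state.get(client_id, (window_index, 0, 0))
        if window_index == index + 1:
            previous, current = current, 0   # rolled into the next window
        elif window_index > index + 1:
            previous, current = 0, 0         # idle for at least one full window
        # The previous window's count weighs less the further we are into the current one.
        estimated = previous * (1 - elapsed) + current
        allowed = estimated + 1 <= self.limit
        if allowed:
            current += 1
        self.state[client_id] = (window_index, current, previous)
        return allowed
```

Take the same boundary scenario. The request at t = 60.1 s is rejected because the previous window's 5 requests still carry almost full weight (about 4.99). By t = 90 s their weight has halved to 2.5, so two more requests fit before the estimate would exceed 5.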
Token Bucket
- Models rate limiting as a bucket that fills at a steady rate.
- Two parameters: refill rate (sustained throughput) and bucket capacity (burst tolerance).
- Used by most cloud providers and maps naturally to tiered pricing models.
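The two parameters map directly onto a lazily refilled bucket. The sketch below adds tokens only when a client is checked, never from a background timer. The class name, the `clock` parameter, and the per‑request `cost` argument are illustrative assumptions.

```python
import time
from typing import Callable, Dict, Tuple


class TokenBucket:
    """refill_per_second = sustained throughput; capacity = burst tolerance."""

    def __init__(self, refill_per_second: float, capacity: float,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self.rate = refill_per_second
        self.capacity = capacity
        self.clock = clock
        # client_id -> (tokens, last_refill_time); new clients start with a full bucket
        self.buckets: Dict[str, Tuple[float, float]] = {}

    def allow(self, client_id: str, cost: float = 1.0) -> bool:
        now = self.clock()
        tokens, last = self.buckets.get(client_id, (self.capacity, now))
        # Add tokens for the time elapsed since the last check, capped at capacity.
        tokens = min(self.capacity, tokens + (now - last) * self.rate)
        if tokens >= cost:
            self.buckets[client_id] = (tokens - cost, now)
            return True
        self.buckets[client_id] = (tokens, now)
        return False
```

Suppose `refill_per_second=1.0` and `capacity=3`. A fresh client can burst 3 requests immediately and then sustain roughly one per second. A pricing tier is simply a different pair of constructor arguments.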
Layered Rate Limiting
- Edge / Load Balancer (Nginx, Cloudflare, AWS API Gateway) – protects application servers from excessive traffic before it reaches them.
- API Gateway or Middleware – enforces business‑level limits by authenticated user, API key, subscription tier, or endpoint.
- Individual Services – in microservice architectures, prevents a misbehaving upstream service from overwhelming a downstream dependency.
Each layer mitigates different failure modes; don’t rely on a single line of defense.
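The following hypothetical sketch makes the layering concrete. Each layer identifies clients differently (edge by IP, gateway by API key), and the first layer that says no determines the outcome. Plain fixed‑window counting keeps it short, and counters are never pruned. `Layer`, `Request`, and `first_rejecting_layer` are illustrative names, not any product's API.

```python
import time
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Optional


@dataclass
class Request:
    ip: str
    api_key: str
    path: str


@dataclass
class Layer:
    """One enforcement point: a name, how it identifies clients, and a per-window limit."""
    name: str
    key: Callable[[Request], str]
    limit: int
    window_seconds: float = 60.0
    counts: Dict[str, int] = field(default_factory=dict)  # never pruned in this sketch

    def allow(self, req: Request, now: float) -> bool:
        # Fixed-window counting for brevity; any algorithm could sit behind this method.
        bucket = f"{self.key(req)}:{int(now // self.window_seconds)}"
        self.counts[bucket] = self.counts.get(bucket, 0) + 1
        return self.counts[bucket] <= self.limit


def first_rejecting_layer(layers: List[Layer], req: Request,
                          now: Optional[float] = None) -> Optional[str]:
    """Return the name of the first layer that rejects the request, or None if all allow it."""
    now = time.time() if now is None else now
    for layer in layers:
        if not layer.allow(req, now):
            return layer.name
    return None
```

A request rejected at the gateway has already been counted at the edge, as it would be if the layers ran as separate components. Conversely, a client rotating API keys from one IP is still caught by the edge layer.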
Transparency to Clients
- Include rate‑limit headers in every response:
  - X-RateLimit-Limit – the maximum number of requests allowed.
  - X-RateLimit-Remaining – how many requests are left in the current window.
  - X-RateLimit-Reset – when the window resets (epoch time).
- On a 429 Too Many Requests response, add a Retry-After header indicating when the client may retry.
- Together, these headers turn rate limiting from a blunt instrument into a collaborative mechanism between your API and its consumers.
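The helper sketched below derives the headers listed above from a client's position in the current window. It adds `Retry-After` only on a 429, expressed in seconds and rounded up so clients do not retry early. The function name and its `used`/`reset_epoch` parameters are assumptions, not part of any framework.

```python
import math
import time
from typing import Dict, Optional, Tuple


def rate_limit_response(limit: int, used: int, reset_epoch: float,
                        now: Optional[float] = None) -> Tuple[int, Dict[str, str]]:
    """Return (status_code, headers) for the `used`-th request in the current window."""
    now = time.time() if now is None else now
    headers = {
        "X-RateLimit-Limit": str(limit),
        "X-RateLimit-Remaining": str(max(0, limit - used)),
        "X-RateLimit-Reset": str(math.ceil(reset_epoch)),  # Unix epoch seconds
    }
    if used > limit:
        # Retry-After as delay-seconds, rounded up and never less than 1.
        headers["Retry-After"] = str(max(1, math.ceil(reset_epoch - now)))
        return 429, headers
    return 200, headers
```

For example, request 101 against a limit of 100, arriving 42.5 seconds before the reset, yields status 429 with `Retry-After: 43` and `X-RateLimit-Remaining: 0`.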
Practical Recommendations
- Default algorithm: Sliding Window Counter – close‑to‑accurate with minimal memory overhead, without the complexity of full logs.
- Identify clients by API key, not by IP address. IP‑based limiting is increasingly unreliable: shared corporate proxies put many legitimate users behind one address, while botnets spread requests across many addresses.
- Treat rate limits as a product decision as well as a technical one: define tier limits, decide when to return 429 versus degrading gracefully, and size burst capacity to match user‑experience expectations.
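The hypothetical sketch below combines these recommendations. The limiter is keyed by API key, and each key's subscription tier supplies the token‑bucket refill rate and burst capacity. The tier names, numbers, and key table are invented for illustration, since choosing real values is the product decision described above.

```python
import time
from dataclasses import dataclass
from typing import Callable, Dict, Optional, Tuple


@dataclass(frozen=True)
class Tier:
    refill_per_second: float  # sustained throughput
    burst: float              # bucket capacity (burst tolerance)


# Hypothetical tiers and key registry; real values are a product decision.
TIERS: Dict[str, Tier] = {
    "free": Tier(refill_per_second=1.0, burst=5),
    "pro": Tier(refill_per_second=10.0, burst=50),
}
API_KEYS: Dict[str, str] = {"key-free-123": "free", "key-pro-456": "pro"}


class TieredLimiter:
    def __init__(self, clock: Callable[[], float] = time.monotonic) -> None:
        self.clock = clock
        self.state: Dict[str, Tuple[float, float]] = {}  # api_key -> (tokens, last_seen)

    def allow(self, api_key: str) -> Optional[bool]:
        """True/False for known keys; None means the key is unknown."""
        tier_name = API_KEYS.get(api_key)
        if tier_name is None:
            return None
        tier = TIERS[tier_name]
        now = self.clock()
        tokens, last = self.state.get(api_key, (tier.burst, now))
        # Token-bucket refill using this key's tier parameters, capped at the burst size.
        tokens = min(tier.burst, tokens + (now - last) * tier.refill_per_second)
        allowed = tokens >= 1
        self.state[api_key] = (tokens - 1 if allowed else tokens, now)
        return allowed
```

Unknown keys return None so the caller can answer 401 rather than 429; that split is an assumption of this sketch.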
Read the full article for complete algorithm implementations, Redis code examples, and a guide to designing rate‑limit tiers for subscription‑based APIs.
Originally published at NovVista.