Rate Limiting: How to Stop Your API From Drowning in Requests
Source: Dev.to
Introduction
Hello! I’m Jairo
Your favorite dev.to writer.
Just kidding — I know I’m not. Just breaking the ice 😄
Last week I was reading an excellent book called System Design Interview by Alex Xu. If you work with backend systems and haven’t read it yet, you probably should.
One concept from the book reminded me of something interesting about software engineers: we all know rate limiting exists, but very few engineers really understand when to use it, how it works internally, and which strategy to choose.
So today let’s talk about one of the most important protections your API can have: rate limiting.
🌧️ A Simple Analogy
Your application is a person walking in the rain, and every raindrop represents an HTTP request hitting your server.
- At first, everything is fine.
- Then the rain gets heavier → more drops, more requests.
- Eventually your application becomes completely soaked — CPU usage spikes, the database struggles, and your server turns into soup.
Not ideal.
What do we do when it’s raining?
We grab an umbrella. (No, not the evil corporation from Resident Evil — a real umbrella.)
That umbrella represents a rate limiter. It doesn’t stop the rain entirely; it simply controls how much rain reaches you. In the same way, a rate limiter allows some requests to pass while blocking the excess ones, protecting your system from overload.
🚫 Why Rate Limiting Matters
Without rate limiting, a client could send:
- 10 requests per second
- 100 requests per second
- 1,000 requests per second
Your application will try to process every request it receives, eventually leading to:
- CPU overload
- Database contention
- Cascading failures
- Full system crash
With rate limiting in place, the server can simply respond with:
HTTP 429 – Too Many Requests
Which is basically your server saying:
“Slow down, my friend.”
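To make that concrete, here is a toy endpoint built on the JDK's built-in `com.sun.net.httpserver` that answers `429` once a request budget is spent. The class name, path, and the idea of a fixed "free calls" budget are illustrative assumptions, not a real limiter:

```java
import com.sun.net.httpserver.HttpServer;
import java.io.IOException;
import java.io.OutputStream;
import java.net.InetSocketAddress;
import java.nio.charset.StandardCharsets;
import java.util.concurrent.atomic.AtomicInteger;

/** Toy server: first {@code freeCalls} requests get 200, the rest get 429. */
public class TooManyRequestsDemo {

    public static HttpServer start(int freeCalls) throws IOException {
        AtomicInteger budget = new AtomicInteger(freeCalls);
        // Port 0 = pick any free port; read it back via server.getAddress()
        HttpServer server = HttpServer.create(new InetSocketAddress(0), 0);
        server.createContext("/api", exchange -> {
            boolean allowed = budget.getAndDecrement() > 0;
            byte[] body = (allowed ? "ok" : "Slow down, my friend.")
                    .getBytes(StandardCharsets.UTF_8);
            exchange.sendResponseHeaders(allowed ? 200 : 429, body.length);
            try (OutputStream os = exchange.getResponseBody()) {
                os.write(body);
            }
        });
        server.start();
        return server;
    }
}
```

A real limiter would replace the fixed budget with one of the strategies below, but the client-facing contract is the same: a `429` status plus a hint to back off.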
📊 Common Rate‑Limiting Strategies
1. Token Bucket
- Idea: A bucket holds a number of tokens. Each incoming request consumes one token.
- Refill: Tokens are added back at a fixed rate.
Example configuration
| Parameter | Value |
|---|---|
| Bucket capacity | 10 tokens |
| Refill rate | 1 token per second |
- Allows short bursts (up to the bucket capacity).
- Once empty, requests must wait for new tokens.
Why popular? It supports bursts while still controlling overall traffic.
2. Leaky Bucket
- Idea: Requests enter a bucket that leaks at a constant rate (like water through a hole).
- If the bucket fills up, new requests are rejected.
- Effect: Forces a steady, predictable request rate, smoothing traffic spikes.
- Downside: Doesn’t handle bursts as well as the token bucket.
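A minimal leaky-bucket sketch (the "leaky bucket as a meter" variant; class name and parameters are illustrative):

```java
/** Leaky bucket: incoming requests fill it, it drains at a constant rate. */
public class LeakyBucket {

    private final int capacity;        // max requests the bucket can hold
    private final long leakIntervalMs; // one request leaks out per interval
    private long lastLeakMs;
    private int water;                 // requests currently in the bucket

    public LeakyBucket(int capacity, long leakIntervalMs) {
        this.capacity = capacity;
        this.leakIntervalMs = leakIntervalMs;
        this.lastLeakMs = System.currentTimeMillis();
    }

    public synchronized boolean allowRequest() {
        leak();
        if (water < capacity) {
            water++;          // request fits in the bucket
            return true;
        }
        return false;         // bucket full: reject
    }

    /** Drain the bucket according to how much time has passed. */
    private void leak() {
        long now = System.currentTimeMillis();
        long leaked = (now - lastLeakMs) / leakIntervalMs;
        if (leaked > 0) {
            water = (int) Math.max(0, water - leaked);
            lastLeakMs = now;
        }
    }
}
```

Note the downside in action: even if the bucket has been idle for a while, it never admits more than `capacity` requests at once, so bursts above that are dropped rather than absorbed.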
3. Sliding Window Log
- Idea: Store the timestamp of every request.
- For a limit of 5 requests per minute, the system checks all timestamps within the last 60 seconds.
- Pros: Very accurate; always uses the real time window.
- Cons: Requires storing many timestamps → can be expensive at large scale.
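A sketch of the log variant, using a deque of timestamps (class name is illustrative; a production version would also need per-client logs and eviction):

```java
import java.util.ArrayDeque;
import java.util.Deque;

/** Sliding window log: one timestamp per request, old entries evicted. */
public class SlidingWindowLog {

    private final int maxRequests;
    private final long windowMs;
    private final Deque<Long> timestamps = new ArrayDeque<>();

    public SlidingWindowLog(int maxRequests, long windowMs) {
        this.maxRequests = maxRequests;
        this.windowMs = windowMs;
    }

    public synchronized boolean allowRequest() {
        long now = System.currentTimeMillis();
        // Drop timestamps that have fallen out of the window
        while (!timestamps.isEmpty() && now - timestamps.peekFirst() >= windowMs) {
            timestamps.pollFirst();
        }
        if (timestamps.size() < maxRequests) {
            timestamps.addLast(now);
            return true;
        }
        return false;
    }
}
```

The memory cost is visible here: the deque grows linearly with the allowed request rate, which is exactly what the sliding window counter below avoids.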
4. Sliding Window Counter (Optimized Sliding Window)
- Idea: Keep only two counters:
  - Requests in the current window
  - Requests in the previous window
- Compute a weighted average between the two counters to estimate the real request rate.
- Pros: Drastically reduces memory usage while retaining good accuracy.
- Common use: Large distributed systems.
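The weighted-average idea can be sketched as follows (class name and parameters are illustrative; the estimate assumes requests in the previous window were evenly spread):

```java
/** Sliding window counter: two counters plus a weighted estimate. */
public class SlidingWindowCounter {

    private final int maxRequests;
    private final long windowMs;
    private long currentWindowStart;
    private int currentCount;
    private int previousCount;

    public SlidingWindowCounter(int maxRequests, long windowMs) {
        this.maxRequests = maxRequests;
        this.windowMs = windowMs;
        this.currentWindowStart = System.currentTimeMillis() / windowMs * windowMs;
    }

    public synchronized boolean allowRequest() {
        long now = System.currentTimeMillis();
        long windowStart = now / windowMs * windowMs;
        if (windowStart != currentWindowStart) {
            // Roll forward; if more than one window passed, the previous count is 0
            previousCount = (windowStart - currentWindowStart == windowMs) ? currentCount : 0;
            currentCount = 0;
            currentWindowStart = windowStart;
        }
        // Weight the previous window by how much of it still overlaps the sliding window
        double previousWeight = 1.0 - (double) (now - windowStart) / windowMs;
        double estimated = previousCount * previousWeight + currentCount;
        if (estimated < maxRequests) {
            currentCount++;
            return true;
        }
        return false;
    }
}
```

Only two integers per client, regardless of the request rate, which is why this shape is popular in large systems.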
🛠️ Simple Java Implementations
The following examples are simplified to illustrate the core ideas.
1. Minimum Interval (one request per second per client)
```java
import java.util.concurrent.ConcurrentHashMap;

public class SimpleRateLimiter {

    private final ConcurrentHashMap<String, Long> lastRequest = new ConcurrentHashMap<>();

    /** Returns true if the request is allowed. */
    public boolean allowRequest(String clientId) {
        long now = System.currentTimeMillis();
        Long last = lastRequest.get(clientId);
        // Note: this check-then-act pair is not atomic; two concurrent requests
        // from the same client could both pass. Acceptable for illustration.
        if (last == null || now - last > 1000) { // at least 1 second since the last request
            lastRequest.put(clientId, now);
            return true;
        }
        return false;
    }
}
```
2. Token Bucket (capacity = 10)
```java
import java.util.concurrent.atomic.AtomicInteger;

public class TokenBucket {

    private final int capacity = 10;
    private final AtomicInteger tokens = new AtomicInteger(capacity);

    /** Returns true if a token was available and consumed. */
    public boolean allowRequest() {
        // Atomically take a token only if one is left, avoiding a
        // check-then-act race between get() and decrementAndGet()
        return tokens.getAndUpdate(t -> t > 0 ? t - 1 : t) > 0;
    }

    /** Refill one token – typically called by a scheduled task. */
    public void refill() {
        // Never exceed the bucket capacity
        tokens.updateAndGet(t -> Math.min(capacity, t + 1));
    }
}
```
A scheduled task can periodically call refill() (e.g., every second).
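One way to wire that up is a daemon `ScheduledExecutorService`. The sketch below combines the bucket and its refill task in one class (the class name and the one-token-per-second rate are illustrative):

```java
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;

/** Token bucket whose refill runs on a background scheduler. */
public class RefillingBucket {

    private final int capacity;
    private final AtomicInteger tokens;

    public RefillingBucket(int capacity) {
        this.capacity = capacity;
        this.tokens = new AtomicInteger(capacity);
        ScheduledExecutorService scheduler =
                Executors.newSingleThreadScheduledExecutor(r -> {
                    Thread thread = new Thread(r);
                    thread.setDaemon(true); // don't block JVM shutdown
                    return thread;
                });
        // Add one token per second, never exceeding capacity
        scheduler.scheduleAtFixedRate(
                () -> tokens.updateAndGet(t -> Math.min(capacity, t + 1)),
                1, 1, TimeUnit.SECONDS);
    }

    public boolean allowRequest() {
        return tokens.getAndUpdate(t -> t > 0 ? t - 1 : t) > 0;
    }

    public int available() {
        return tokens.get();
    }
}
```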
3. Using a Library – Resilience4j
```java
import io.github.resilience4j.ratelimiter.RateLimiter;
import io.github.resilience4j.ratelimiter.RateLimiterConfig;
import io.vavr.control.Try;

import java.time.Duration;
import java.util.function.Supplier;

RateLimiterConfig config = RateLimiterConfig.custom()
        .limitForPeriod(5)                          // max 5 calls...
        .limitRefreshPeriod(Duration.ofSeconds(1))  // ...per second
        .timeoutDuration(Duration.ZERO)             // fail immediately instead of waiting
        .build();

RateLimiter rateLimiter = RateLimiter.of("apiLimiter", config);

Supplier<String> decoratedSupplier =
        RateLimiter.decorateSupplier(rateLimiter, () -> "Hello API");

Try.ofSupplier(decoratedSupplier)
        .onFailure(e -> System.out.println("Rate limit exceeded"));
```
- Pros: Integrates nicely with Spring Boot, Micrometer, and other ecosystem tools.
- Cons: Adds an external dependency; you need to understand its configuration options.
📚 Takeaways
- Rate limiting protects your services from overload, abuse, and cascading failures.
- Choose a strategy that matches your traffic pattern:
- Token Bucket → bursty traffic, flexible.
- Leaky Bucket → smooth, constant flow.
- Sliding Window Log → precise limits, higher memory cost.
- Sliding Window Counter → good trade‑off for distributed systems.
- For production, prefer battle‑tested libraries (e.g., Resilience4j, Bucket4j, Spring Cloud Gateway) over hand‑rolled implementations.
Happy coding, and may your APIs stay dry! 🌂
Rate Limiting Overview
Rate limiting can be implemented in several layers of your architecture.
1. API Gateway
Tools like NGINX, Kong, Cloudflare, or AWS API Gateway commonly enforce limits before traffic even reaches your application.
2. Application Layer
Libraries such as Resilience4j or Bucket4j allow developers to control request flow directly within the service.
3. Distributed Systems
Redis is often used to share rate‑limit counters across multiple instances.
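The core of that pattern is an atomic per-window counter keyed by client and window. The sketch below shows the logic with an in-memory map standing in for Redis; in production the map would be Redis itself (`INCR` the key, and `EXPIRE` it on the first hit of a window so it cleans itself up). The key format and parameters are illustrative assumptions:

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.AtomicLong;

/** Fixed-window counter in the shape a Redis-backed limiter would use. */
public class DistributedStyleLimiter {

    // Stand-in for Redis; with Redis the increment is atomic across instances
    private final ConcurrentHashMap<String, AtomicLong> store = new ConcurrentHashMap<>();
    private final int maxPerWindow;
    private final long windowMs;

    public DistributedStyleLimiter(int maxPerWindow, long windowMs) {
        this.maxPerWindow = maxPerWindow;
        this.windowMs = windowMs;
    }

    public boolean allowRequest(String clientId) {
        long window = System.currentTimeMillis() / windowMs;
        String key = "rate:" + clientId + ":" + window; // e.g. a Redis key
        // Equivalent to Redis INCR on that key
        long count = store.computeIfAbsent(key, k -> new AtomicLong()).incrementAndGet();
        return count <= maxPerWindow;
    }
}
```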
Rate limiting looks simple on the surface, but once your system begins handling real traffic, it quickly becomes clear how important it is.
A well‑designed rate limiter protects your:
- API
- Infrastructure
- Databases
- Users
And sometimes, the difference between a stable system and an outage is surprisingly simple.
Sometimes your API just needs… a good umbrella ☔