How to Design a Rate Limiter in a System Design Interview?
Source: Dev.to
What is a Rate Limiter?
A Rate Limiter is a system component that restricts the number of actions a user (or client) can perform in a given timeframe.
Examples
- API Gateway – Only allow 100 requests per user per minute.
- Login System – Allow only 5 failed attempts in 10 minutes.
- Messaging App – Prevent users from sending more than 20 messages per second.
If users exceed these limits, the system should block their requests (often returning HTTP status code 429 Too Many Requests).

Key Rate‑Limiter System Requirements in Interviews
- Correctness – Requests beyond the limit are rejected.
- Efficiency – Handle millions of requests per second with low latency.
- Scalability – Work in a distributed system across multiple servers.
- Fairness – Avoid loopholes that let clients exceed the limit through bursts (e.g., at window boundaries).
- Configurability – Easy to change limits per user, per API, etc.
You can also ask clarifying questions, e.g., “Do we need a limit per URL and per HTTP method?”
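If the answer is yes, one option is to keep limits in a rule table keyed by HTTP method and path, with a default for endpoints that have no specific rule. This is a minimal, hypothetical sketch. The class `RateLimitRules`, the `Rule` record (Java 16+), and the `"METHOD path"` key format are illustrative assumptions. A real system would more likely load rules from a config file or service.

```java
import java.util.Locale;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

/**
 * Hypothetical rule table: a limit per HTTP method + path, with a default
 * for endpoints that have no specific rule.
 */
public class RateLimitRules {
    /** Allow at most maxRequests within windowMillis. */
    public record Rule(int maxRequests, long windowMillis) {}

    private final Map<String, Rule> rules = new ConcurrentHashMap<>();
    private final Rule defaultRule;

    public RateLimitRules(Rule defaultRule) {
        this.defaultRule = defaultRule;
    }

    /** Adds or replaces the rule for one endpoint; ConcurrentHashMap makes this safe while serving traffic. */
    public void put(String method, String path, Rule rule) {
        rules.put(key(method, path), rule);
    }

    /** Returns the endpoint-specific rule, or the default rule if none is configured. */
    public Rule lookup(String method, String path) {
        return rules.getOrDefault(key(method, path), defaultRule);
    }

    // Normalises the method so "post" and "POST" map to the same rule
    private static String key(String method, String path) {
        return method.toUpperCase(Locale.ROOT) + " " + path;
    }

    public static void main(String[] args) {
        RateLimitRules config = new RateLimitRules(new Rule(100, 60_000)); // default: 100 requests/minute
        config.put("POST", "/login", new Rule(5, 600_000));                 // stricter: 5 per 10 minutes
        System.out.println(config.lookup("post", "/login"));
        System.out.println(config.lookup("GET", "/orders"));
    }
}
```

Because the rule table lives outside the limiting algorithm, limits can change without redeploying code. This covers the configurability requirement.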
Common Algorithms for Rate Limiting
Fixed Window Counter
- Divide time into fixed windows (e.g., every minute).
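- Count requests in the current window and reject them once the count reaches the limit; the counter resets when the next window starts.
- Simple and memory‑efficient, but it can allow bursts at window boundaries: a client can use its full quota at the end of one window and again at the start of the next, sending up to twice the limit in a short span.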
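A minimal Java sketch of the fixed-window counter follows. It takes the current time as a parameter, an assumption made so the boundary case is deterministic. The `main` method reproduces the burst problem: with a limit of 5 per second, 5 requests at 900 ms and 5 more at 1,000 ms all pass, so 10 requests get through in 100 ms.

```java
/**
 * Sketch of a fixed-window counter. The current time is passed in (an
 * assumption for this example) so the boundary case can be reproduced exactly.
 */
public class FixedWindowCounter {
    private final int maxRequests;
    private final long windowMillis;
    private long currentWindow = Long.MIN_VALUE; // index of the window being counted
    private int count = 0;                       // requests accepted in that window

    public FixedWindowCounter(int maxRequests, long windowMillis) {
        this.maxRequests = maxRequests;
        this.windowMillis = windowMillis;
    }

    public synchronized boolean allowRequest(long nowMillis) {
        // With a 1000 ms window: index 0 for 0..999 ms, 1 for 1000..1999 ms, ...
        long window = Math.floorDiv(nowMillis, windowMillis);
        if (window != currentWindow) {
            currentWindow = window; // a new window started: reset the counter
            count = 0;
        }
        if (count < maxRequests) {
            count++;
            return true;
        }
        return false;
    }

    public static void main(String[] args) {
        FixedWindowCounter limiter = new FixedWindowCounter(5, 1_000); // 5 requests per second
        int allowed = 0;
        for (int i = 0; i < 5; i++) if (limiter.allowRequest(900)) allowed++;   // end of window 0
        for (int i = 0; i < 5; i++) if (limiter.allowRequest(1_000)) allowed++; // start of window 1
        System.out.println(allowed + " requests allowed within 100 ms"); // prints "10 requests allowed within 100 ms"
    }
}
```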
Sliding Window Log
- Store timestamps of requests in a log (array/queue).
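- On each request, remove timestamps older than the window; the remaining entries are the requests made within the last window.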
- More accurate but requires memory proportional to request volume.
Sliding Window Counter
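- Keeps counters for the current and previous fixed windows and estimates the rolling count as current + previous × (fraction of the previous window that still overlaps the sliding window).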
- Memory‑efficient, smoother than a fixed window.
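The weighted estimate can be sketched in a few lines of Java. This is a minimal sketch that assumes requests in the previous window were spread evenly across it, which is what makes the linear weighting reasonable. Time is passed in for testability. With a limit of 5 per second, the same 900 ms / 1,000 ms burst that fools the fixed window is stopped here, because at 1,000 ms the previous window still counts at full weight.

```java
/**
 * Sketch of a sliding-window counter. It keeps only two counters and assumes
 * requests in the previous window were spread evenly across it.
 */
public class SlidingWindowCounter {
    private final int maxRequests;
    private final long windowMillis;
    private long currentWindow = Long.MIN_VALUE; // index of the current fixed window
    private int currentCount = 0;
    private int previousCount = 0;

    public SlidingWindowCounter(int maxRequests, long windowMillis) {
        this.maxRequests = maxRequests;
        this.windowMillis = windowMillis;
    }

    public synchronized boolean allowRequest(long nowMillis) {
        long window = Math.floorDiv(nowMillis, windowMillis);
        if (window != currentWindow) {
            // The old window only counts as "previous" if it is directly before this one
            previousCount = (window == currentWindow + 1) ? currentCount : 0;
            currentCount = 0;
            currentWindow = window;
        }
        long elapsed = nowMillis - window * windowMillis;                          // time into the current window
        double previousWeight = (double) (windowMillis - elapsed) / windowMillis; // overlap with the previous window
        double estimated = previousCount * previousWeight + currentCount;
        if (estimated + 1 <= maxRequests) { // allow only if counting this request stays within the limit
            currentCount++;
            return true;
        }
        return false;
    }

    public static void main(String[] args) {
        SlidingWindowCounter limiter = new SlidingWindowCounter(5, 1_000); // 5 requests per second
        int before = 0, after = 0;
        for (int i = 0; i < 5; i++) if (limiter.allowRequest(900)) before++;
        for (int i = 0; i < 5; i++) if (limiter.allowRequest(1_000)) after++;
        System.out.println(before + " allowed at 900 ms, " + after + " at 1000 ms"); // prints "5 allowed at 900 ms, 0 at 1000 ms"
    }
}
```

At 1,500 ms the previous window's 5 requests count as 2.5, so roughly 2 more requests fit. The limit recovers gradually instead of resetting all at once.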
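Token Bucket / Leaky Bucket
- Token bucket: tokens are added at a fixed rate up to a maximum capacity, and each request consumes a token, so short bursts up to the capacity are allowed.
- Leaky bucket: requests enter a bounded queue and leave it (are processed) at a constant rate; requests that arrive when the queue is full are rejected.
- Both smooth traffic and are widely used in production systems.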
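Token bucket code appears later in this article, so here is the leaky bucket in its queue form for comparison. This is a minimal sketch under two assumptions: the caller passes in the current time, and a worker calls `leak()` periodically to take the requests that are due. If `leak()` is called late, it releases the backlog that accumulated in the meantime.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.List;

/**
 * Sketch of a leaky bucket used as a queue: requests wait in a bounded queue
 * and leave (are processed) at a constant rate of one per leakIntervalMillis.
 */
public class LeakyBucket<T> {
    private final int capacity;            // maximum number of waiting requests
    private final long leakIntervalMillis; // time between two departures
    private final Deque<T> queue = new ArrayDeque<>();
    private long lastLeakMillis;           // last departure, or first arrival into an empty bucket

    public LeakyBucket(int capacity, long leakIntervalMillis) {
        this.capacity = capacity;
        this.leakIntervalMillis = leakIntervalMillis;
    }

    /** Adds a request; returns false (reject, e.g. with HTTP 429) when the bucket is full. */
    public synchronized boolean offer(T request, long nowMillis) {
        if (queue.size() >= capacity) {
            return false;
        }
        if (queue.isEmpty()) {
            lastLeakMillis = nowMillis; // idle time must not build up departure credit
        }
        queue.addLast(request);
        return true;
    }

    /** Removes and returns the requests whose turn has come, at most one per interval. */
    public synchronized List<T> leak(long nowMillis) {
        List<T> released = new ArrayList<>();
        while (!queue.isEmpty() && nowMillis - lastLeakMillis >= leakIntervalMillis) {
            released.add(queue.pollFirst());
            lastLeakMillis += leakIntervalMillis;
        }
        return released;
    }

    public static void main(String[] args) {
        LeakyBucket<String> bucket = new LeakyBucket<>(3, 100); // queue of 3, one request per 100 ms
        for (String r : List.of("a", "b", "c", "d")) {
            System.out.println(r + ": " + (bucket.offer(r, 0) ? "queued" : "rejected"));
        }
        for (long t = 100; t <= 300; t += 100) {
            System.out.println(t + " ms -> " + bucket.leak(t));
        }
    }
}
```

The difference from the token bucket is visible here. A burst of 3 requests is accepted, but it is released at a steady one per 100 ms instead of all at once.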
How to Design a Rate Limiter in an Interview?
Below are two popular Java solutions that are easy to explain and code on a whiteboard.
1. Sliding Window Log (Array of Timestamps)
This method keeps a queue containing the timestamp of every accepted request in the current window. Before processing a new request:
- Remove timestamps older than the configured time window.
- If the queue size is below the limit, allow the request and insert the new timestamp.
- Otherwise, reject it.

Java implementation
```java
import java.util.ArrayDeque;
import java.util.Deque;

/**
 * Simple sliding‑window log rate limiter.
 */
public class SlidingWindowLogRateLimiter {
    private final int maxRequests;
    private final long windowSizeInMillis;
    private final Deque<Long> timestamps = new ArrayDeque<>();

    public SlidingWindowLogRateLimiter(int maxRequests, int windowSizeInSeconds) {
        this.maxRequests = maxRequests;
        this.windowSizeInMillis = windowSizeInSeconds * 1000L;
    }

    /**
     * Returns true if the request is allowed, false otherwise.
     */
    public synchronized boolean allowRequest() {
        long now = System.currentTimeMillis();
        // Discard timestamps that are outside the current window
        while (!timestamps.isEmpty() && now - timestamps.peekFirst() > windowSizeInMillis) {
            timestamps.pollFirst();
        }
        if (timestamps.size() < maxRequests) {
            timestamps.addLast(now);
            return true; // request allowed
        }
        return false; // request rejected
    }

    // Demo
    public static void main(String[] args) throws InterruptedException {
        SlidingWindowLogRateLimiter limiter = new SlidingWindowLogRateLimiter(5, 10); // 5 requests per 10 s
        for (int i = 1; i <= 7; i++) {
            System.out.println("Request " + i + ": " + (limiter.allowRequest() ? "allowed" : "rejected"));
            Thread.sleep(1500);
        }
    }
}
```
This solution works well in interviews because it’s simple and intuitive, and it demonstrates your understanding of sliding‑window rate limiting. Only accepted requests are logged, so the queue never holds more than maxRequests timestamps.
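The class above uses a single queue, so every caller shares one limit. The examples at the top are per user, so a common follow-up is to keep one log per client key. The sketch below makes several assumptions. The class name and key format are hypothetical, and time is passed in rather than read from the clock. Idle clients are also never evicted, so a production version would need a cleanup or TTL policy.

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

/**
 * Sketch: one sliding-window log per client key (e.g. a user ID, or
 * "userId:GET /orders" for per-endpoint limits). Time is passed in for testability.
 */
public class PerClientSlidingWindowLimiter {
    private final int maxRequests;
    private final long windowMillis;
    private final Map<String, Deque<Long>> logs = new ConcurrentHashMap<>();

    public PerClientSlidingWindowLimiter(int maxRequests, long windowMillis) {
        this.maxRequests = maxRequests;
        this.windowMillis = windowMillis;
    }

    public boolean allowRequest(String clientKey, long nowMillis) {
        Deque<Long> log = logs.computeIfAbsent(clientKey, k -> new ArrayDeque<>());
        synchronized (log) { // lock only this client's log, so clients don't block each other
            // Same eviction rule as SlidingWindowLogRateLimiter
            while (!log.isEmpty() && nowMillis - log.peekFirst() > windowMillis) {
                log.pollFirst();
            }
            if (log.size() < maxRequests) {
                log.addLast(nowMillis);
                return true;
            }
            return false;
        }
    }

    public static void main(String[] args) {
        PerClientSlidingWindowLimiter limiter = new PerClientSlidingWindowLimiter(2, 1_000); // 2 requests/second per client
        System.out.println("alice: " + limiter.allowRequest("alice", 0));
        System.out.println("alice: " + limiter.allowRequest("alice", 10));
        System.out.println("alice: " + limiter.allowRequest("alice", 20));
        System.out.println("bob:   " + limiter.allowRequest("bob", 20));
    }
}
```

Locking per key rather than on the whole limiter means a busy client does not slow down everyone else.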
2. Token Bucket
The Token Bucket algorithm is widely used in production (e.g., API gateways, microservices).
- Tokens are added at a fixed rate, up to a maximum capacity that bounds the burst size.
- Each request consumes one token.
- If no tokens are available, the request is rejected.

Java implementation
```java
public class TokenBucket {
    private final int capacity;       // maximum tokens, i.e. the largest burst allowed
    private final double refillRate;  // tokens per second
    private double tokens;            // tokens currently available (fractional during refill)
    private long lastRefillTimestamp; // nanoseconds, from System.nanoTime()

    public TokenBucket(int capacity, double refillRate) {
        this.capacity = capacity;
        this.refillRate = refillRate;
        this.tokens = capacity;       // start full so an initial burst is allowed
        this.lastRefillTimestamp = System.nanoTime();
    }

    public synchronized boolean allowRequest() {
        long now = System.nanoTime();
        // Add the tokens earned since the last call, capped at capacity
        double tokensToAdd = ((now - lastRefillTimestamp) / 1e9) * refillRate;
        tokens = Math.min(capacity, tokens + tokensToAdd);
        lastRefillTimestamp = now;
        if (tokens >= 1) {
            tokens -= 1;
            return true; // request allowed
        }
        return false; // request rejected
    }

    // Demo
    public static void main(String[] args) throws InterruptedException {
        TokenBucket bucket = new TokenBucket(10, 5); // burst up to 10, 5 tokens/sec refill
        // One request every 200 ms matches the 5 tokens/sec refill rate, so the bucket never
        // empties and all 15 requests should be allowed. Send faster than 5 per second to see
        // rejections once the initial 10-token burst is used up.
        for (int i = 1; i <= 15; i++) {
            System.out.println("Request " + i + ": " + (bucket.allowRequest() ? "allowed" : "rejected"));
            Thread.sleep(200);
        }
    }
}
```
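Because allowRequest() is synchronized, this implementation is thread‑safe within a single JVM. The bucket state is local to one process, though, so enforcing one limit across multiple servers requires shared state, as discussed in the interview strategy below.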
Interview Strategy (for Java Developers)
When asked “How would you design a rate limiter?” you can progress through these points:
- Fixed‑Window Counter – easy to explain, but allows bursts of up to twice the limit at window boundaries.
- Sliding‑Window Log – use a Deque<Long> to keep timestamps.
- Token Bucket – production‑grade solution for burst handling.
- Distributed Rate Limiting – mention Redis‑based counters or API‑gateway features (e.g., Nginx, Envoy).
Covering breadth (knowledge of algorithms) and depth (working Java code) demonstrates strong system‑design skills.
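For the distributed case, the usual move is to take the counter out of the JVM and put it in a store that every server shares. In Redis this is commonly a fixed-window counter built from `INCR` on a per-client, per-window key plus `EXPIRE`, ideally issued together in a transaction or Lua script so the expiry is never lost. The Java sketch below does not use a real Redis client. `SharedCounterStore` is a hypothetical interface standing in for the store, and `InMemoryStore` is a local stand-in for testing only.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.AtomicLong;

/**
 * Sketch of a fixed-window limiter whose counters live in a shared store, so
 * every application server sees the same count.
 */
public class DistributedFixedWindowLimiter {

    /** Hypothetical abstraction over a shared store such as Redis (INCR + EXPIRE). */
    public interface SharedCounterStore {
        /** Atomically increments the counter at key and returns the new value; the key should expire after ttlMillis. */
        long incrementAndGet(String key, long ttlMillis);
    }

    /** In-memory stand-in for local testing only; it ignores the TTL. */
    public static class InMemoryStore implements SharedCounterStore {
        private final Map<String, AtomicLong> counters = new ConcurrentHashMap<>();

        @Override
        public long incrementAndGet(String key, long ttlMillis) {
            return counters.computeIfAbsent(key, k -> new AtomicLong()).incrementAndGet();
        }
    }

    private final SharedCounterStore store;
    private final int maxRequests;
    private final long windowMillis;

    public DistributedFixedWindowLimiter(SharedCounterStore store, int maxRequests, long windowMillis) {
        this.store = store;
        this.maxRequests = maxRequests;
        this.windowMillis = windowMillis;
    }

    public boolean allowRequest(String clientId, long nowMillis) {
        long window = Math.floorDiv(nowMillis, windowMillis);
        String key = "ratelimit:" + clientId + ":" + window; // one counter per client per window
        // Rejected requests also increment the counter; that is fine because it only grows past the limit
        long count = store.incrementAndGet(key, windowMillis);
        return count <= maxRequests;
    }

    public static void main(String[] args) {
        SharedCounterStore store = new InMemoryStore();
        // Two "servers" share one store, so alice gets 3 requests per minute in total, not 3 per server
        DistributedFixedWindowLimiter serverA = new DistributedFixedWindowLimiter(store, 3, 60_000);
        DistributedFixedWindowLimiter serverB = new DistributedFixedWindowLimiter(store, 3, 60_000);
        for (int i = 1; i <= 4; i++) {
            DistributedFixedWindowLimiter server = (i % 2 == 0) ? serverB : serverA;
            System.out.println("Request " + i + ": " + (server.allowRequest("alice", 1_000 + i) ? "allowed" : "rejected"));
        }
    }
}
```

Putting the window index in the key means old counters simply expire, and no reset logic is needed. The trade-off is the same boundary burst as the single-node fixed window.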
System Design Interview Preparation Material
ByteByteGo
- Website: https://bytebytego.com/?fpr=javarevisited
- Offers system‑design books and a platform for comprehensive preparation.

Codemia.io
- Website: https://codemia.io/?via=javarevisited
- 120+ system‑design problems, many free, with editorial solutions and practice tools.

Exponent
- Website: https://bit.ly/3cNF0vw
- Specialized courses, mock interviews, and system‑design material for top tech companies.
Typical prep combo
- ByteByteGo – theory & fundamentals.
- Codemia – hands‑on practice.
- Exponent – mock interviews and interview‑specific coaching.
Final Thoughts
Rate limiting is one of those interview questions that tests both your algorithm knowledge and system‑design intuition.
- For a clean, interview‑ready answer, use the Sliding Window Log approach (e.g., a Deque<Long> in Java).
- To demonstrate production‑grade expertise, also discuss the Token Bucket algorithm.
That way you cover both the practical coding side and the broader system‑design perspective in a single answer.