How to Design a Rate Limiter in a System Design Interview?

Published: (December 19, 2025 at 03:02 AM EST)
Source: Dev.to

What is a Rate Limiter?

A Rate Limiter is a system component that restricts the number of actions a user (or client) can perform in a given timeframe.

Examples

  • API Gateway – Only allow 100 requests per user per minute.
  • Login System – Allow only 5 failed attempts in 10 minutes.
  • Messaging App – Prevent users from sending more than 20 messages per second.

If users exceed these limits, the system should block their requests (often returning HTTP status code 429 Too Many Requests).
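
As a rough sketch of what such a rejection might look like, the snippet below maps a limiter decision to a status code and attaches a `Retry-After` header, which HTTP permits on 429 responses. The `Response` class and the `respond` method are hypothetical placeholders for illustration, not types from any real framework.

```java
import java.util.LinkedHashMap;
import java.util.Map;

class RateLimitResponses {
    /** Hypothetical minimal response holder (assumption, not a real framework type). */
    static final class Response {
        final int status;
        final Map<String, String> headers = new LinkedHashMap<>();

        Response(int status) {
            this.status = status;
        }
    }

    /**
     * Builds a response from a limiter decision.
     * retryAfterSeconds is only used when the request was rejected.
     */
    static Response respond(boolean allowed, long retryAfterSeconds) {
        if (allowed) {
            return new Response(200);
        }
        Response rejected = new Response(429); // 429 Too Many Requests
        rejected.headers.put("Retry-After", Long.toString(retryAfterSeconds));
        return rejected;
    }
}
```

Telling the client when to retry is optional, but it tends to reduce pointless retries against an already throttled endpoint.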

Key Rate‑Limiter System Requirements in Interviews

  • Correctness – Requests beyond the limit are rejected.
  • Efficiency – Handle millions of requests per second with low latency.
  • Scalability – Work in a distributed system across multiple servers.
  • Fairness – Avoid loopholes, such as bursts at window boundaries, that let a client exceed the intended rate.
  • Configurability – Easy to change limits per user, per API, etc.

You can also ask clarifying questions, e.g., “Do we need a limit per URL and per HTTP method?”

Common Algorithms for Rate Limiting

Fixed Window Counter

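  • Divide time into fixed windows (e.g., every minute).
  • Count requests in the current window, reject once the count reaches the limit, and reset the counter when the next window starts.
  • Simple, but can allow bursts at window boundaries: a client can use a full quota at the end of one window and another at the start of the next, briefly reaching up to twice the limit.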
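
A minimal Java sketch of the fixed-window idea follows. The class name and the explicit `nowMillis` parameter are assumptions made for illustration (passing the time in keeps the example deterministic); a real limiter would more likely read a clock itself.

```java
/** Fixed-window counter sketch: one counter per window, reset when a new window starts. */
class FixedWindowRateLimiter {
    private final int maxRequests;
    private final long windowSizeInMillis;
    private long currentWindow = -1; // index of the window the counter belongs to
    private int count;

    FixedWindowRateLimiter(int maxRequests, long windowSizeInMillis) {
        this.maxRequests = maxRequests;
        this.windowSizeInMillis = windowSizeInMillis;
    }

    /** nowMillis is supplied by the caller (assumption) instead of read from a clock. */
    synchronized boolean allowRequest(long nowMillis) {
        long window = nowMillis / windowSizeInMillis;
        if (window != currentWindow) {
            // A new window has started: forget the old count.
            currentWindow = window;
            count = 0;
        }
        if (count < maxRequests) {
            count++;
            return true;
        }
        return false;
    }
}
```

With a limit of 2 per 1000 ms, requests at 900, 950, 1000, and 1050 ms are all allowed, because the counter resets at 1000 ms. That is the boundary burst in action.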

Sliding Window Log

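  • Store the timestamp of each request in a log (array/queue).
  • On each new request, remove timestamps older than the window and allow the request only if fewer than the limit remain.
  • More accurate, but requires memory proportional to the number of requests kept in the window.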

Sliding Window Counter

  • Uses counters for the current and previous windows, weighted by time.
  • Memory‑efficient (two counters per client) and smoother than a fixed window, though the weighted count is only an approximation.
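
One common way to do the weighting is `estimate = currentCount + previousCount × (1 − elapsed fraction of the current window)`. The sketch below assumes that formula; the class name and the caller-supplied `nowMillis` parameter are illustrative choices, not taken from the article.

```java
/** Sliding-window counter sketch: weights the previous window by how much of it still overlaps. */
class SlidingWindowCounterRateLimiter {
    private final int maxRequests;
    private final long windowSizeInMillis;
    private long currentWindow = -1;
    private int currentCount;
    private int previousCount;

    SlidingWindowCounterRateLimiter(int maxRequests, long windowSizeInMillis) {
        this.maxRequests = maxRequests;
        this.windowSizeInMillis = windowSizeInMillis;
    }

    /** nowMillis is supplied by the caller (assumption) to keep the sketch deterministic. */
    synchronized boolean allowRequest(long nowMillis) {
        long window = nowMillis / windowSizeInMillis;
        if (window != currentWindow) {
            // Moving forward by exactly one window keeps the old count as "previous";
            // a longer gap means the previous window saw no requests.
            previousCount = (window == currentWindow + 1) ? currentCount : 0;
            currentCount = 0;
            currentWindow = window;
        }
        double elapsedFraction = (nowMillis % windowSizeInMillis) / (double) windowSizeInMillis;
        double estimated = currentCount + previousCount * (1.0 - elapsedFraction);
        if (estimated < maxRequests) {
            currentCount++;
            return true;
        }
        return false;
    }
}
```

For instance, with a limit of 10 per second, a client that used all 10 requests at 900 ms is still fully blocked at 1000 ms and gets only 5 more at 1500 ms, since half of the previous window still counts against it.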

Token Bucket / Leaky Bucket

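  • Token Bucket: tokens are added at a fixed rate up to a maximum capacity, and each request consumes a token.
  • Leaky Bucket: requests fill a bucket that drains at a fixed rate; requests that would overflow it are rejected.
  • Token Bucket permits short bursts up to the bucket capacity while capping the average rate; both are widely used in production systems.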
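
The token bucket gets a full Java implementation later in this article, so here is only a rough sketch of the leaky-bucket side, in its "meter" form where each accepted request adds one unit and the bucket drains at a constant rate. The class name, the `startMillis` constructor argument, and the `nowMillis` parameter are illustrative assumptions.

```java
/** Leaky-bucket (meter) sketch: the level drains at a fixed rate; a request adds one unit. */
class LeakyBucket {
    private final double capacity;
    private final double leakRatePerSecond;
    private double level;          // current amount of "water" in the bucket
    private long lastLeakMillis;   // last time the level was updated

    LeakyBucket(double capacity, double leakRatePerSecond, long startMillis) {
        this.capacity = capacity;
        this.leakRatePerSecond = leakRatePerSecond;
        this.lastLeakMillis = startMillis;
    }

    /** nowMillis is supplied by the caller (assumption) to keep the sketch deterministic. */
    synchronized boolean allowRequest(long nowMillis) {
        // Drain whatever leaked out since the last call, never going below empty.
        double leaked = (nowMillis - lastLeakMillis) / 1000.0 * leakRatePerSecond;
        level = Math.max(0.0, level - leaked);
        lastLeakMillis = nowMillis;

        if (level + 1.0 <= capacity) {
            level += 1.0;
            return true;   // fits in the bucket
        }
        return false;      // would overflow
    }
}
```

Compared with the token bucket, the bookkeeping is mirrored: instead of spending tokens from a bucket that refills, requests add to a bucket that empties.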

How to Design a Rate Limiter in an Interview?

Below are two popular Java solutions that are easy to explain and code on a whiteboard.

1. Sliding Window Log (Array of Timestamps)

This method maintains a queue holding the timestamp of each accepted request. Before processing a new request:

  1. Remove timestamps older than the configured time window.
  2. If the queue size is below the limit, allow the request and insert the new timestamp.
  3. Otherwise, reject it.

Java implementation

import java.util.ArrayDeque;
import java.util.Deque;

/**
 * Simple sliding‑window log rate limiter.
 */
public class SlidingWindowLogRateLimiter {
    private final int maxRequests;
    private final long windowSizeInMillis;
    private final Deque<Long> timestamps = new ArrayDeque<>();

    public SlidingWindowLogRateLimiter(int maxRequests, int windowSizeInSeconds) {
        this.maxRequests = maxRequests;
        this.windowSizeInMillis = windowSizeInSeconds * 1000L;
    }

    /**
     * Returns true if the request is allowed, false otherwise.
     */
    public synchronized boolean allowRequest() {
        long now = System.currentTimeMillis();

        // Discard timestamps that are outside the current window
        while (!timestamps.isEmpty() && now - timestamps.peekFirst() > windowSizeInMillis) {
            timestamps.pollFirst();
        }

        if (timestamps.size() < maxRequests) {
            timestamps.addLast(now);
            return true;   // request allowed
        }
        return false;      // request rejected
    }

    // Demo
    public static void main(String[] args) throws InterruptedException {
        SlidingWindowLogRateLimiter limiter = new SlidingWindowLogRateLimiter(5, 10); // 5 requests per 10 s
        for (int i = 1; i <= 7; i++) {
            System.out.println("Request " + i + ": " + (limiter.allowRequest() ? "allowed" : "rejected"));
            Thread.sleep(1500);
        }
    }
}

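This solution suits interviews because it is simple and intuitive, and it shows that you understand sliding‑window rate limiting. Note that one instance tracks a single stream of requests; a per‑user limit needs one limiter per user, each storing up to maxRequests timestamps.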
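
Since the limits in the earlier examples are per user, an interviewer may ask how to extend the class above to many clients. One possible sketch keeps a separate timestamp log per client ID in a `ConcurrentHashMap`. The class name, the `clientId` key, and the caller-supplied `nowMillis` are assumptions for illustration.

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

/** Per-client sliding-window log sketch: one timestamp queue per client ID. */
class PerClientRateLimiter {
    private final int maxRequests;
    private final long windowSizeInMillis;
    private final Map<String, Deque<Long>> logs = new ConcurrentHashMap<>();

    PerClientRateLimiter(int maxRequests, long windowSizeInMillis) {
        this.maxRequests = maxRequests;
        this.windowSizeInMillis = windowSizeInMillis;
    }

    /** nowMillis is supplied by the caller (assumption) to keep the sketch deterministic. */
    boolean allowRequest(String clientId, long nowMillis) {
        Deque<Long> log = logs.computeIfAbsent(clientId, id -> new ArrayDeque<>());
        // Lock only this client's log so different clients do not block each other.
        synchronized (log) {
            while (!log.isEmpty() && nowMillis - log.peekFirst() > windowSizeInMillis) {
                log.pollFirst();
            }
            if (log.size() < maxRequests) {
                log.addLast(nowMillis);
                return true;
            }
            return false;
        }
    }
}
```

This sketch never removes the logs of idle clients, so a longer-running version would need some form of eviction.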

2. Token Bucket

The Token Bucket algorithm is widely used in production (e.g., API gateways, microservices).

  • Tokens are added at a fixed rate, up to a maximum bucket capacity.
  • Each request consumes one token.
  • If no tokens are available, the request is rejected.

Java implementation

public class TokenBucket {
    private final int capacity;
    private final double refillRate; // tokens per second
    private double tokens;
    private long lastRefillTimestamp; // nanoseconds

    public TokenBucket(int capacity, double refillRate) {
        this.capacity = capacity;
        this.refillRate = refillRate;
        this.tokens = capacity;
        this.lastRefillTimestamp = System.nanoTime();
    }

    public synchronized boolean allowRequest() {
        long now = System.nanoTime();
        double tokensToAdd = ((now - lastRefillTimestamp) / 1e9) * refillRate;
        tokens = Math.min(capacity, tokens + tokensToAdd);
        lastRefillTimestamp = now;

        if (tokens >= 1) {
            tokens -= 1;
            return true;    // request allowed
        }
        return false;       // request rejected
    }

    // Demo
    public static void main(String[] args) throws InterruptedException {
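        TokenBucket bucket = new TokenBucket(10, 5); // burst up to 10, 5 tokens/sec refill
        // Send roughly 20 requests/sec, faster than the refill rate, so the initial
        // burst is used up and later requests start being rejected.
        for (int i = 1; i <= 20; i++) {
            System.out.println("Request " + i + ": " + (bucket.allowRequest() ? "allowed" : "rejected"));
            Thread.sleep(50);
        }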
    }
}

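The synchronized method makes this implementation thread‑safe. Because every caller contends on the same lock, a single shared instance can become a bottleneck under very heavy concurrent load.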

Interview Strategy (for Java Developers)

When asked “How would you design a rate limiter?” you can progress through these points:

  1. Fixed‑Window Counter – easy to explain, but allows bursts at window boundaries.
  2. Sliding‑Window Log – use a Deque<Long> to keep timestamps.
  3. Token Bucket – production‑grade solution for burst handling.
  4. Distributed Rate Limiting – mention Redis‑based counters or API‑gateway features (e.g., Nginx, Envoy).
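
To make the distributed point concrete, here is a hedged sketch of a fixed-window limiter whose counters live in a shared store rather than in each server's memory. `SharedCounterStore` is a hypothetical interface invented for this example; with Redis it might be backed by `INCR` plus `EXPIRE`, but no real client API is shown. `InMemoryCounterStore` is only a stand-in so the code runs without a Redis server.

```java
import java.util.HashMap;
import java.util.Map;

/** Hypothetical abstraction over a shared counter store (assumption, not a real client API). */
interface SharedCounterStore {
    /** Atomically increments the counter for key, creating it with the given TTL if absent or expired. */
    long incrementAndGet(String key, long ttlMillis, long nowMillis);
}

/** In-memory stand-in for the shared store; it does not purge expired keys. */
class InMemoryCounterStore implements SharedCounterStore {
    private final Map<String, long[]> entries = new HashMap<>(); // key -> {count, expiresAtMillis}

    @Override
    public synchronized long incrementAndGet(String key, long ttlMillis, long nowMillis) {
        long[] entry = entries.get(key);
        if (entry == null || nowMillis >= entry[1]) {
            entry = new long[] {0, nowMillis + ttlMillis};
            entries.put(key, entry);
        }
        return ++entry[0];
    }
}

/** Fixed-window limiter that keeps its state in the shared store, so any server can enforce it. */
class DistributedFixedWindowLimiter {
    private final SharedCounterStore store;
    private final int maxRequests;
    private final long windowSizeInMillis;

    DistributedFixedWindowLimiter(SharedCounterStore store, int maxRequests, long windowSizeInMillis) {
        this.store = store;
        this.maxRequests = maxRequests;
        this.windowSizeInMillis = windowSizeInMillis;
    }

    boolean allowRequest(String clientId, long nowMillis) {
        // One key per client per window; the key format is an arbitrary choice.
        String key = "rate:" + clientId + ":" + (nowMillis / windowSizeInMillis);
        return store.incrementAndGet(key, windowSizeInMillis, nowMillis) <= maxRequests;
    }
}
```

The key design point is that the increment and the limit check rely on a single atomic operation in the store, so two servers handling the same client cannot both read a stale count.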

Covering breadth (knowledge of algorithms) and depth (working Java code) demonstrates strong system‑design skills.

System Design Interview Preparation Material

ByteByteGo

Codemia.io

Exponent

  • Website: https://bit.ly/3cNF0vw
  • Specialized courses, mock interviews, and system‑design material for top tech companies.

Typical prep combo

  1. ByteByteGo – theory & fundamentals.
  2. Codemia – hands‑on practice.
  3. Exponent – mock interviews and interview‑specific coaching.

Final Thoughts

Rate limiting is one of those interview questions that tests both your algorithm knowledge and system‑design intuition.

  • For a clean, interview‑ready answer, use the Sliding Window Log approach (e.g., a Deque in Java).
  • To demonstrate production‑grade expertise, also discuss the Token Bucket algorithm.

That way you cover both the practical coding side and the broader system‑design perspective in a single answer.
