Rate Limiting: How to Stop Your API From Drowning in Requests
Source: Dev.to
Introduction
Hello! I’m Jairo
Your favorite dev.to writer.
Just kidding — I know I’m not. Just breaking the ice 😄
Last week I was reading an excellent book called System Design Interview by Alex Xu. If you work with backend systems and haven’t read it yet, you probably should.
One concept from the book reminded me of something interesting about software engineers: we all know rate limiting exists, but very few engineers really understand when to use it, how it works internally, and which strategy to choose.
So today let’s talk about one of the most important protections your API can have: rate limiting.
🌧️ A Simple Analogy
Your application is a person walking in the rain, and every raindrop represents an HTTP request hitting your server.
- At first, everything is fine.
- Then the rain gets heavier → more drops, more requests.
- Eventually your application becomes completely soaked — CPU usage spikes, the database struggles, and your server turns into soup.
Not ideal.
What do we do when it’s raining?
We grab an umbrella. (No, not the evil corporation from Resident Evil — a real umbrella.)
That umbrella represents a rate limiter. It doesn’t stop the rain entirely; it simply controls how much rain reaches you. In the same way, a rate limiter allows some requests to pass while blocking the excess ones, protecting your system from overload.
🚫 Why Rate Limiting Matters
Without rate limiting, a client could send:
- 10 requests per second
- 100 requests per second
- 1,000 requests per second
Your application will try to process every request it receives, eventually leading to:
- CPU overload
- Database contention
- Cascading failures
- Full system crash
With rate limiting in place, the server can simply respond with:
HTTP 429 – Too Many Requests
Which is basically your server saying:
“Slow down, my friend.”
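To make that concrete, here is a toy endpoint built on the JDK's built-in `com.sun.net.httpserver` that answers `429` once a request budget is spent. The class name, path, and the idea of a fixed "free calls" budget are illustrative assumptions, not a real limiter:

```java
import com.sun.net.httpserver.HttpServer;
import java.io.IOException;
import java.io.OutputStream;
import java.net.InetSocketAddress;
import java.nio.charset.StandardCharsets;
import java.util.concurrent.atomic.AtomicInteger;

/** Toy server: first {@code freeCalls} requests get 200, the rest get 429. */
public class TooManyRequestsDemo {

    public static HttpServer start(int freeCalls) throws IOException {
        AtomicInteger budget = new AtomicInteger(freeCalls);
        // Port 0 = pick any free port; read it back via server.getAddress()
        HttpServer server = HttpServer.create(new InetSocketAddress(0), 0);
        server.createContext("/api", exchange -> {
            boolean allowed = budget.getAndDecrement() > 0;
            byte[] body = (allowed ? "ok" : "Slow down, my friend.")
                    .getBytes(StandardCharsets.UTF_8);
            exchange.sendResponseHeaders(allowed ? 200 : 429, body.length);
            try (OutputStream os = exchange.getResponseBody()) {
                os.write(body);
            }
        });
        server.start();
        return server;
    }
}
```

A real limiter would replace the fixed budget with one of the strategies below, but the client-facing contract is the same: a `429` status plus a hint to back off.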
📊 Common Rate‑Limiting Strategies
1. Token Bucket
- Idea: A bucket holds a number of tokens. Each incoming request consumes one token.
- Refill: Tokens are added back at a fixed rate.
Example configuration
| Parameter | Value |
|---|---|
| Bucket capacity | 10 tokens |
| Refill rate | 1 token per second |
- Allows short bursts (up to the bucket capacity).
- Once empty, requests must wait for new tokens.
Why popular? It supports bursts while still controlling overall traffic.
2. Leaky Bucket
- Idea: Requests enter a bucket that leaks at a constant rate (like water through a hole).
- If the bucket fills up, new requests are rejected.
- Effect: Forces a steady, predictable request rate, smoothing traffic spikes.
- Downside: Doesn’t handle bursts as well as the token bucket.
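A minimal leaky-bucket sketch (the "leaky bucket as a meter" variant; class name and parameters are illustrative):

```java
/** Leaky bucket: incoming requests fill it, it drains at a constant rate. */
public class LeakyBucket {

    private final int capacity;        // max requests the bucket can hold
    private final long leakIntervalMs; // one request leaks out per interval
    private long lastLeakMs;
    private int water;                 // requests currently in the bucket

    public LeakyBucket(int capacity, long leakIntervalMs) {
        this.capacity = capacity;
        this.leakIntervalMs = leakIntervalMs;
        this.lastLeakMs = System.currentTimeMillis();
    }

    public synchronized boolean allowRequest() {
        leak();
        if (water < capacity) {
            water++;          // request fits in the bucket
            return true;
        }
        return false;         // bucket full: reject
    }

    /** Drain the bucket according to how much time has passed. */
    private void leak() {
        long now = System.currentTimeMillis();
        long leaked = (now - lastLeakMs) / leakIntervalMs;
        if (leaked > 0) {
            water = (int) Math.max(0, water - leaked);
            lastLeakMs = now;
        }
    }
}
```

Note the downside in action: even if the bucket has been idle for a while, it never admits more than `capacity` requests at once, so bursts above that are dropped rather than absorbed.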
3. Sliding Window Log
- Idea: Store the timestamp of every request.
- For a limit of 5 requests per minute, the system checks all timestamps within the last 60 seconds.
- Pros: Very accurate; always uses the real time window.
- Cons: Requires storing many timestamps → can be expensive at large scale.
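A sketch of the log variant, using a deque of timestamps (class name is illustrative; a production version would also need per-client logs and eviction):

```java
import java.util.ArrayDeque;
import java.util.Deque;

/** Sliding window log: one timestamp per request, old entries evicted. */
public class SlidingWindowLog {

    private final int maxRequests;
    private final long windowMs;
    private final Deque<Long> timestamps = new ArrayDeque<>();

    public SlidingWindowLog(int maxRequests, long windowMs) {
        this.maxRequests = maxRequests;
        this.windowMs = windowMs;
    }

    public synchronized boolean allowRequest() {
        long now = System.currentTimeMillis();
        // Drop timestamps that have fallen out of the window
        while (!timestamps.isEmpty() && now - timestamps.peekFirst() >= windowMs) {
            timestamps.pollFirst();
        }
        if (timestamps.size() < maxRequests) {
            timestamps.addLast(now);
            return true;
        }
        return false;
    }
}
```

The memory cost is visible here: the deque grows linearly with the allowed request rate, which is exactly what the sliding window counter below avoids.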
4. Sliding Window Counter (Optimized Sliding Window)
- Idea: Keep only two counters:
  - Requests in the current window
  - Requests in the previous window
- Compute a weighted average between the two counters to estimate the real request rate.
- Pros: Drastically reduces memory usage while retaining good accuracy.
- Common use: Large distributed systems.
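The weighted-average idea can be sketched as follows (class name and parameters are illustrative; the estimate assumes requests in the previous window were evenly spread):

```java
/** Sliding window counter: two counters plus a weighted estimate. */
public class SlidingWindowCounter {

    private final int maxRequests;
    private final long windowMs;
    private long currentWindowStart;
    private int currentCount;
    private int previousCount;

    public SlidingWindowCounter(int maxRequests, long windowMs) {
        this.maxRequests = maxRequests;
        this.windowMs = windowMs;
        this.currentWindowStart = System.currentTimeMillis() / windowMs * windowMs;
    }

    public synchronized boolean allowRequest() {
        long now = System.currentTimeMillis();
        long windowStart = now / windowMs * windowMs;
        if (windowStart != currentWindowStart) {
            // Roll forward; if more than one window passed, the previous count is 0
            previousCount = (windowStart - currentWindowStart == windowMs) ? currentCount : 0;
            currentCount = 0;
            currentWindowStart = windowStart;
        }
        // Weight the previous window by how much of it still overlaps the sliding window
        double previousWeight = 1.0 - (double) (now - windowStart) / windowMs;
        double estimated = previousCount * previousWeight + currentCount;
        if (estimated < maxRequests) {
            currentCount++;
            return true;
        }
        return false;
    }
}
```

Only two integers per client, regardless of the request rate, which is why this shape is popular in large systems.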
🛠️ Simple Java Implementations
The following examples are simplified to illustrate the core ideas.
1. Minimum Interval (one request per second per client)
```java
import java.util.concurrent.ConcurrentHashMap;

public class SimpleRateLimiter {

    private final ConcurrentHashMap<String, Long> lastRequest = new ConcurrentHashMap<>();

    /** Returns true if the request is allowed. */
    public boolean allowRequest(String clientId) {
        long now = System.currentTimeMillis();
        Long last = lastRequest.get(clientId);
        // Note: this check-then-act pair is not atomic; two concurrent requests
        // from the same client could both pass. Acceptable for illustration.
        if (last == null || now - last > 1000) { // at least 1 second since the last request
            lastRequest.put(clientId, now);
            return true;
        }
        return false;
    }
}
```
2. Token Bucket (capacity = 10)
```java
import java.util.concurrent.atomic.AtomicInteger;

public class TokenBucket {

    private final int capacity = 10;
    private final AtomicInteger tokens = new AtomicInteger(capacity);

    /** Returns true if a token was available and consumed. */
    public boolean allowRequest() {
        // Atomically take a token only if one is left, avoiding a
        // check-then-act race between get() and decrementAndGet()
        return tokens.getAndUpdate(t -> t > 0 ? t - 1 : t) > 0;
    }

    /** Refill one token – typically called by a scheduled task. */
    public void refill() {
        // Never exceed the bucket capacity
        tokens.updateAndGet(t -> Math.min(capacity, t + 1));
    }
}
```
A scheduled task can periodically call refill() (e.g., every second).
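One way to wire that up is a daemon `ScheduledExecutorService`. The sketch below combines the bucket and its refill task in one class (the class name and the one-token-per-second rate are illustrative):

```java
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;

/** Token bucket whose refill runs on a background scheduler. */
public class RefillingBucket {

    private final int capacity;
    private final AtomicInteger tokens;

    public RefillingBucket(int capacity) {
        this.capacity = capacity;
        this.tokens = new AtomicInteger(capacity);
        ScheduledExecutorService scheduler =
                Executors.newSingleThreadScheduledExecutor(r -> {
                    Thread thread = new Thread(r);
                    thread.setDaemon(true); // don't block JVM shutdown
                    return thread;
                });
        // Add one token per second, never exceeding capacity
        scheduler.scheduleAtFixedRate(
                () -> tokens.updateAndGet(t -> Math.min(capacity, t + 1)),
                1, 1, TimeUnit.SECONDS);
    }

    public boolean allowRequest() {
        return tokens.getAndUpdate(t -> t > 0 ? t - 1 : t) > 0;
    }

    public int available() {
        return tokens.get();
    }
}
```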
3. Using a Library – Resilience4j
```java
import io.github.resilience4j.ratelimiter.RateLimiter;
import io.github.resilience4j.ratelimiter.RateLimiterConfig;
import io.vavr.control.Try;

import java.time.Duration;
import java.util.function.Supplier;

RateLimiterConfig config = RateLimiterConfig.custom()
        .limitForPeriod(5)                          // max 5 calls...
        .limitRefreshPeriod(Duration.ofSeconds(1))  // ...per second
        .timeoutDuration(Duration.ZERO)             // fail immediately instead of waiting
        .build();

RateLimiter rateLimiter = RateLimiter.of("apiLimiter", config);

Supplier<String> decoratedSupplier =
        RateLimiter.decorateSupplier(rateLimiter, () -> "Hello API");

Try.ofSupplier(decoratedSupplier)
        .onFailure(e -> System.out.println("Rate limit exceeded"));
```
- Pros: Integrates nicely with Spring Boot, Micrometer, and other ecosystem tools.
- Cons: Adds an external dependency; you need to understand its configuration options.
📚 Takeaways
- Rate limiting protects your services from overload, abuse, and cascading failures.
- Choose a strategy that matches your traffic pattern:
- Token Bucket → bursty traffic, flexible.
- Leaky Bucket → smooth, constant flow.
- Sliding Window Log → precise limits, higher memory cost.
- Sliding Window Counter → good trade‑off for distributed systems.
- For production, prefer battle‑tested libraries (e.g., Resilience4j, Bucket4j, Spring Cloud Gateway) over hand‑rolled implementations.
Happy coding, and may your APIs stay dry! 🌂
Rate Limiting Overview
Rate limiting can be implemented in several layers of your architecture.
1. API Gateway
Tools like NGINX, Kong, Cloudflare, or AWS API Gateway commonly enforce limits before traffic even reaches your application.
2. Application Layer
Libraries such as Resilience4j or Bucket4j allow developers to control request flow directly within the service.
3. Distributed Systems
Redis is often used to share rate‑limit counters across multiple instances.
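The core of that pattern is an atomic per-window counter keyed by client and window. The sketch below shows the logic with an in-memory map standing in for Redis; in production the map would be Redis itself (`INCR` the key, and `EXPIRE` it on the first hit of a window so it cleans itself up). The key format and parameters are illustrative assumptions:

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.AtomicLong;

/** Fixed-window counter in the shape a Redis-backed limiter would use. */
public class DistributedStyleLimiter {

    // Stand-in for Redis; with Redis the increment is atomic across instances
    private final ConcurrentHashMap<String, AtomicLong> store = new ConcurrentHashMap<>();
    private final int maxPerWindow;
    private final long windowMs;

    public DistributedStyleLimiter(int maxPerWindow, long windowMs) {
        this.maxPerWindow = maxPerWindow;
        this.windowMs = windowMs;
    }

    public boolean allowRequest(String clientId) {
        long window = System.currentTimeMillis() / windowMs;
        String key = "rate:" + clientId + ":" + window; // e.g. a Redis key
        // Equivalent to Redis INCR on that key
        long count = store.computeIfAbsent(key, k -> new AtomicLong()).incrementAndGet();
        return count <= maxPerWindow;
    }
}
```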
Rate limiting looks simple on the surface, but once your system begins handling real traffic, it quickly becomes clear how important it is.
A well‑designed rate limiter protects your:
- API
- Infrastructure
- Databases
- Users
And sometimes, the difference between a stable system and an outage is surprisingly simple.
Sometimes your API just needs… a good umbrella ☔