Rate Limiting: Concepts, Algorithms, and Distributed Challenges

Published: 3 days ago (February 8, 2026 at 06:15 AM EST)

Source: Dev.to

Introduction

If you’ve ever built an API or backend service, you’ve probably faced one of these problems:

One user sending too many requests
Bots abusing your endpoints
Traffic spikes breaking your service
Retries or cron jobs accidentally overloading your system

This post is about rate limiting, a simple but critical technique for protecting systems from these issues.

In this post we will:

Understand why rate limiting is needed
Learn how common rate limiting algorithms work
See where rate limiting fits in real systems

You don’t need prior knowledge of rate limiting. If you understand basic APIs and requests, you’ll be able to follow along.

Table of Contents

The Problem Rate Limiting Solves
What Rate Limiting Actually Does
Common Rate Limiting Algorithms
- Fixed Window Counter
- Sliding Window
- Token Bucket
- Leaky Bucket
Comparing the Algorithms
Rate‑Limiting Algorithms Overview
Challenges in Distributed Systems
Key Takeaway

The Problem Rate Limiting Solves

When a request hits your server, it consumes resources such as CPU time, memory, database connections, and network bandwidth. Under normal usage this works fine, but problems start when too many requests arrive at the same time.

Typical causes

A single user sending requests in a tight loop
Bots hitting public endpoints
Retry mechanisms without proper back‑off
Sudden traffic spikes after a release or promotion

The server sees all requests as the same; it doesn’t know which request is important and which one is harmful.

Why this becomes a serious problem

If request volume keeps increasing without limits:

Response times go up
Databases start slowing down
Timeouts increase
Error rates spike
Eventually the service becomes unavailable for everyone

Why we can’t just “scale the server”

A common reaction is “let’s just add more servers.” Scaling helps, but it does not solve the root problem:

Unlimited requests will eventually overwhelm any system
Scaling increases cost
Databases and third‑party APIs may not scale the same way

If we keep scaling, we only delay failure.

What systems really need

A way to control how fast requests are allowed
Protection against accidental or intentional abuse
Fairness so one user cannot starve others

This is exactly the problem rate limiting is designed to solve.

What Rate Limiting Actually Does

At its core, rate limiting controls how frequently an action is allowed within a given time period. Most commonly, this action is an API request.

A rate limit usually looks like this:

Allow 100 requests per minute per user
Allow 10 requests per second per IP
Allow 1 request per second for a sensitive endpoint

When the limit is reached, the system does not process further requests until enough time has passed.
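
Limits like these are often easier to reason about when written as data. The sketch below is one possible representation, not a standard: the RateLimitRule class, its field names, and the counter_key helper are hypothetical names chosen for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class RateLimitRule:
    """At most `limit` requests per `period_seconds` for each distinct key."""
    scope: str            # what the limit is counted against: "user", "ip", or "endpoint"
    limit: int
    period_seconds: int


# The three example limits from the list above, written as data.
RULES = [
    RateLimitRule(scope="user", limit=100, period_seconds=60),
    RateLimitRule(scope="ip", limit=10, period_seconds=1),
    RateLimitRule(scope="endpoint", limit=1, period_seconds=1),
]


def counter_key(rule: RateLimitRule, user_id: str, ip: str, path: str) -> str:
    """Build the key under which requests are counted for this rule."""
    identity = {"user": user_id, "ip": ip, "endpoint": path}[rule.scope]
    return f"{rule.scope}:{identity}"
```

Each rule counts requests under its own key, so the same request can be checked against a per-user limit and a per-IP limit independently.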

What happens when a limit is exceeded

The request is rejected
The server responds immediately
Resources are preserved for other users

In HTTP‑based systems this is commonly returned as a 429 Too Many Requests response. Early rejection prevents unnecessary work such as database queries or external API calls.
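
As a rough illustration, a rejection can be built without touching any backend resource. The function below is a framework-agnostic sketch: too_many_requests is a hypothetical helper name, and returning a (status, headers, body) triple is an assumption made for the example. A 429 response may carry a Retry-After header telling the client how many seconds to wait.

```python
import math


def too_many_requests(retry_after_seconds: float) -> tuple[int, dict[str, str], str]:
    """Build a (status, headers, body) triple for a rejected request."""
    # Retry-After takes whole seconds, so round up to avoid inviting an early retry.
    wait = max(1, math.ceil(retry_after_seconds))
    headers = {"Retry-After": str(wait), "Content-Type": "text/plain"}
    return 429, headers, "Too Many Requests\n"


status, headers, body = too_many_requests(2.3)
print(status, headers["Retry-After"])  # 429 3
```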

What rate limiting guarantees

Fair usage – One user cannot consume resources meant for everyone else
Predictable performance – The system remains responsive even under load
Controlled bursts – Some algorithms allow short bursts while still enforcing long‑term limits
System protection – Accidental bugs or misbehaving clients are contained early

What rate limiting does NOT do

It does not authenticate users (it does not verify who the user is)
It is not a complete security solution
It does not replace proper input validation

It is a traffic‑control mechanism, not a security gate.

Common Rate Limiting Algorithms

Different systems use different rate‑limiting algorithms depending on their needs. No algorithm is universally “best”; each makes different trade‑offs among simplicity, accuracy, and flexibility.

Fixed Window Counter

The simplest form of rate limiting.

How it works

Time is divided into fixed windows (e.g., 1 minute, 1 hour).
For each user, the system keeps a counter for the current window.
Every incoming request increments this counter.
Once the counter reaches the limit, further requests are rejected.
When the window ends, the counter is reset to zero.

Example

Limit: 5 requests per minute
Window: 12:00 – 12:01

A user sends 5 requests at 12:00:59 → all allowed.
The counter resets at 12:01:00 → another 5 requests are allowed.

The user has effectively made 10 requests in about one second, twice what a limit of 5 per minute is meant to allow.
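
The steps and the boundary problem above can be reproduced in a few lines. This is a minimal in-memory sketch, not a production implementation: the FixedWindowLimiter class is a hypothetical name, and time is passed in as a plain number of seconds so the example is deterministic.

```python
class FixedWindowLimiter:
    """Fixed window counter: at most `limit` requests per window, per user."""

    def __init__(self, limit: int, window_seconds: int):
        self.limit = limit
        self.window_seconds = window_seconds
        self.state: dict[str, tuple[int, int]] = {}  # user -> (window index, count)

    def allow(self, user: str, now: float) -> bool:
        window = int(now // self.window_seconds)  # which fixed window `now` falls in
        seen_window, count = self.state.get(user, (window, 0))
        if seen_window != window:
            count = 0  # a new window has started, so the counter resets
        if count >= self.limit:
            return False
        self.state[user] = (window, count + 1)
        return True


limiter = FixedWindowLimiter(limit=5, window_seconds=60)
# Five requests just before the window boundary, five just after it.
late = [limiter.allow("alice", now=59.5 + i * 0.1) for i in range(5)]
early = [limiter.allow("alice", now=60.0 + i * 0.1) for i in range(5)]
print(all(late), all(early))  # True True
```

All ten requests are accepted even though they arrive within a single second, which is the boundary burst described above.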

Why Fixed Window fails

Users can exploit window boundaries.
Traffic becomes very bursty.
Backend services can suddenly receive huge spikes.
The system becomes unfair under load.

When Fixed Window can be acceptable

Very low‑traffic systems.
Internal tools.
Prototypes or demos.
Cases where correctness is less important than simplicity.

Most production APIs avoid a plain Fixed Window.

Sliding Window

Sliding-window algorithms fix the burst problem of fixed windows by counting requests over a rolling interval that ends at the current time, rather than over fixed, clock-aligned windows.

How it works

The system always looks at the last N seconds from the current time.
Every request is evaluated against this rolling window.
The system counts how many requests occurred in the previous N seconds.
If that count has already reached the limit, the request is rejected.

Example

Limit: 100 requests per 60 seconds

At any moment, the system checks how many requests happened in the previous 60 seconds.
This prevents sudden spikes caused by window resets, as requests are spread more evenly over time.
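
One way to implement this is a sliding window log, which keeps the timestamps of recent requests. The sketch below is an in-memory illustration under a few assumptions: SlidingWindowLog is a hypothetical name, only accepted requests are recorded, and the limit is set to 5 per 60 seconds to keep the demo short.

```python
from collections import defaultdict, deque


class SlidingWindowLog:
    """Allow at most `limit` requests in any rolling `window_seconds` interval."""

    def __init__(self, limit: int, window_seconds: float):
        self.limit = limit
        self.window_seconds = window_seconds
        self.log = defaultdict(deque)  # user -> timestamps of accepted requests, oldest first

    def allow(self, user: str, now: float) -> bool:
        timestamps = self.log[user]
        # Drop timestamps that are no longer inside the rolling window.
        while timestamps and timestamps[0] <= now - self.window_seconds:
            timestamps.popleft()
        if len(timestamps) >= self.limit:
            return False
        timestamps.append(now)
        return True


limiter = SlidingWindowLog(limit=5, window_seconds=60)
first = [limiter.allow("alice", now=59.5 + i * 0.1) for i in range(5)]
boundary = [limiter.allow("alice", now=60.0 + i * 0.1) for i in range(5)]
later = limiter.allow("alice", now=120.0)
print(all(first), any(boundary), later)  # True False True
```

The second batch is rejected because the first five requests are still inside the last 60 seconds. The per-user deque of timestamps is also where the extra memory cost comes from.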

Pros

Much fairer request distribution.
Traffic spikes are naturally reduced.
No burst problems at window boundaries.

Cons

The system needs to store timestamps of requests.
Memory usage increases with traffic.
More computation per request.

Token Bucket

One of the most commonly used algorithms in production because it balances strict limits with a good user experience.

How it works

Each user has a bucket that holds tokens.
Tokens are added to the bucket at a fixed rate.
Each request consumes one token.
If there are no tokens left, the request is rejected.
The bucket has a maximum capacity, so tokens cannot grow infinitely.

Example

Bucket size: 10 tokens
Refill rate: 1 token per second

| Situation | Result |
| --- | --- |
| User is idle → bucket fills to 10 tokens | Ready for a burst |
| User sends 10 requests instantly | All allowed |
| 11th request | Rejected |
| After 1 second, 1 token is added | 1 request allowed |
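
The scenario in the table can be traced with a small sketch. It assumes one TokenBucket object per user (the class name is hypothetical), takes the current time as an argument, and refills lazily on each call instead of running a background timer.

```python
class TokenBucket:
    """Token bucket holding up to `capacity` tokens, refilled at `refill_rate` per second."""

    def __init__(self, capacity: float, refill_rate: float, now: float = 0.0):
        self.capacity = capacity
        self.refill_rate = refill_rate
        self.tokens = capacity      # start full, as after an idle period
        self.last_refill = now

    def allow(self, now: float) -> bool:
        # Add the tokens earned since the last call, capped at capacity.
        elapsed = max(0.0, now - self.last_refill)
        self.tokens = min(self.capacity, self.tokens + elapsed * self.refill_rate)
        self.last_refill = now
        if self.tokens >= 1:
            self.tokens -= 1        # each request consumes one token
            return True
        return False


bucket = TokenBucket(capacity=10, refill_rate=1.0)
burst = [bucket.allow(now=0.0) for _ in range(10)]
eleventh = bucket.allow(now=0.0)
after_one_second = bucket.allow(now=1.0)
print(all(burst), eleventh, after_one_second)  # True False True
```

Storing only a token count and a timestamp per user is what keeps this approach cheap compared with logging every request.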

Why it works well

Allows short bursts without breaking the system.
Enforces long‑term rate limits.
Provides a better user experience.
Simple and efficient to implement.

Because of these properties, Token Bucket is often the default choice for APIs.

Leaky Bucket

Focuses on producing a smooth and stable output rate.

How it works

Incoming requests are placed into a queue (the bucket).
Requests leave the queue at a constant, fixed rate.
If the queue becomes full, new requests are dropped.

Example

Many requests arrive at once.
The system processes requests at a steady pace.
Extra requests are dropped when the queue is full.
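
A queue-based sketch of this behaviour is shown below. It is an illustration under stated assumptions: LeakyBucket, offer, and leak are hypothetical names, time is passed in explicitly, and the caller is expected to call leak regularly (for example from a worker loop) to drain due requests.

```python
from collections import deque


class LeakyBucket:
    """Queue requests and release them at a constant rate."""

    def __init__(self, capacity: int, leak_rate: float):
        self.capacity = capacity    # maximum number of queued requests
        self.leak_rate = leak_rate  # requests released per second
        self.queue = deque()
        self.last_leak = 0.0

    def offer(self, request, now: float) -> bool:
        """Enqueue a request, or return False if the bucket is full (dropped)."""
        if len(self.queue) >= self.capacity:
            return False
        if not self.queue:
            self.last_leak = now    # an empty bucket earns no credit while idle
        self.queue.append(request)
        return True

    def leak(self, now: float) -> list:
        """Return the requests that are due for processing at time `now`."""
        due = int((now - self.last_leak) * self.leak_rate)
        count = min(due, len(self.queue))
        released = [self.queue.popleft() for _ in range(count)]
        self.last_leak += count / self.leak_rate  # advance by whole leak slots only
        return released


bucket = LeakyBucket(capacity=3, leak_rate=1.0)
accepted = [bucket.offer(i, now=0.0) for i in range(5)]
print(accepted)              # [True, True, True, False, False]
print(bucket.leak(now=1.0))  # [0]
print(bucket.leak(now=3.0))  # [1, 2]
```

Five requests arrive at once, three fit in the queue, and they leave at one per second regardless of how quickly they arrived.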

Pros

Protects downstream systems very well.
Ensures predictable processing speed.
Prevents sudden traffic spikes.

Cons for user‑facing APIs

Burst requests are delayed or dropped.
Latency increases under load.
User experience can suffer.

Leaky Bucket is more suitable for background jobs and pipelines than for interactive APIs.

Comparing the Algorithms

The goal here is to understand which algorithm fits which situation.

High‑level Comparison

| Algorithm | Burst handling | Fairness | Complexity | Common usage |
| --- | --- | --- | --- | --- |
| Fixed Window | Poor | Low | Very low | Simple or low‑traffic systems |
| Sliding Window | Good | High | High | Systems needing accuracy |
| Token Bucket | Excellent | High | Medium | Default for most public APIs |
| Leaky Bucket | Excellent (output) | Medium | Medium | Background jobs, pipelines |

Rate‑Limiting Algorithms Overview

| Algorithm | Accuracy | Burst Capacity | Complexity | Typical Use‑case |
| --- | --- | --- | --- | --- |
| Token Bucket | Very good | High | Medium | Most APIs |
| Leaky Bucket | Poor | Medium | Medium | Background jobs |

Choose the algorithm that best matches your traffic pattern, fairness requirements, and operational constraints.

Challenges in Distributed Systems

So far, everything we have discussed assumes a single server. In real‑world applications this is rarely the case: most systems run on multiple servers behind a load balancer. This introduces several important challenges.
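
One challenge that follows directly from this setup can be shown with a short simulation. It is a sketch under simplifying assumptions: LocalCounter is a hypothetical per-server counter for a single window, and itertools.cycle stands in for a round-robin load balancer. When each server counts on its own, the effective limit is multiplied by the number of servers.

```python
import itertools


class LocalCounter:
    """A per-server counter for one window; it knows nothing about other servers."""

    def __init__(self, limit: int):
        self.limit = limit
        self.counts: dict[str, int] = {}

    def allow(self, user: str) -> bool:
        if self.counts.get(user, 0) >= self.limit:
            return False
        self.counts[user] = self.counts.get(user, 0) + 1
        return True


LIMIT = 5
servers = [LocalCounter(LIMIT) for _ in range(3)]
round_robin = itertools.cycle(servers)  # stand-in for a load balancer

# One user sends 30 requests inside a single window.
accepted = sum(next(round_robin).allow("alice") for _ in range(30))
print(accepted)  # 15: each of the 3 servers admitted its own 5
```

The intended limit was 5, but 15 requests got through because no server can see what the others have already counted.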
