Rate Limiting: Concepts, Algorithms, and Distributed Challenges

Published: February 8, 2026 at 06:15 AM EST
8 min read
Source: Dev.to

Introduction

If you’ve ever built an API or backend service, you’ve probably faced one of these problems:

  • One user sending too many requests
  • Bots abusing your endpoints
  • Traffic spikes breaking your service
  • Retries or cron jobs accidentally overloading your system

This blog is about rate limiting, a simple but critical technique used to protect systems from these issues.

In this post we will:

  • Understand why rate limiting is needed
  • Learn how common rate limiting algorithms work
  • See where rate limiting fits in real systems

You don’t need prior knowledge of rate limiting. If you understand basic APIs and requests, you’ll be able to follow along.

The Problem Rate Limiting Solves

When a request hits your server, it consumes resources such as CPU time, memory, database connections, and network bandwidth. Under normal usage this works fine, but problems start when too many requests arrive at the same time.

Typical causes

  • A single user sending requests in a tight loop
  • Bots hitting public endpoints
  • Retry mechanisms without proper back‑off
  • Sudden traffic spikes after a release or promotion

The server treats every request the same; it cannot tell which request is legitimate and which is harmful.

Why this becomes a serious problem

If request volume keeps increasing without limits:

  • Response times go up
  • Databases start slowing down
  • Timeouts increase
  • Error rates spike
  • Eventually the service becomes unavailable for everyone

Why we can’t just “scale the server”

A common reaction is “let’s just add more servers.” Scaling helps, but it does not solve the root problem:

  • Unlimited requests will eventually overwhelm any system
  • Scaling increases cost
  • Databases and third‑party APIs may not scale the same way

If we keep scaling, we only delay failure.

What systems really need

  • A way to control how fast requests are allowed
  • Protection against accidental or intentional abuse
  • Fairness so one user cannot starve others

This is exactly the problem rate limiting is designed to solve.

What Rate Limiting Actually Does

At its core, rate limiting controls how frequently an action is allowed within a given time period. Most commonly, this action is an API request.

A rate limit usually looks like this:

  • Allow 100 requests per minute per user
  • Allow 10 requests per second per IP
  • Allow 1 request per second for a sensitive endpoint

When the limit is reached, the system does not process further requests until enough time has passed.

What happens when a limit is exceeded

  • The request is rejected
  • The server responds immediately
  • Resources are preserved for other users

In HTTP‑based systems this is commonly returned as a 429 Too Many Requests response. Early rejection prevents unnecessary work such as database queries or external API calls.
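
To make the early-rejection idea concrete, here is a minimal, framework-agnostic sketch. The handler and the `allow`/`process` callables are illustrative names, not part of any real library:

```python
def handle_request(user_id, allow, process):
    """Reject early with 429 before any expensive work is done.

    `allow` is any rate-limit check; `process` does the real work
    (database queries, external API calls, ...).
    """
    if not allow(user_id):
        # Rejecting here preserves resources for other users.
        return 429, {"error": "Too Many Requests"}
    return 200, process()

# Hypothetical check that blocks the user "bot":
handle_request("bot", allow=lambda u: u != "bot", process=lambda: "ok")
# → (429, {'error': 'Too Many Requests'})
```

The key point is the ordering: the limit check runs before `process`, so a rejected request never touches the database or any downstream service.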

What rate limiting guarantees

  • Fair usage – One user cannot consume resources meant for everyone else
  • Predictable performance – The system remains responsive even under load
  • Controlled bursts – Some algorithms allow short bursts while still enforcing long‑term limits
  • System protection – Accidental bugs or misbehaving clients are contained early

What rate limiting does NOT do

  • Authenticate users (it does not verify who the user is)
  • Act as a complete security solution
  • Replace proper input validation

It is a traffic‑control mechanism, not a security gate.

Common Rate Limiting Algorithms

Different systems use different rate‑limiting algorithms depending on their needs. No algorithm is universally “best”; each makes different trade‑offs among simplicity, accuracy, and flexibility.

Fixed Window Counter

The simplest form of rate limiting.

How it works

  1. Time is divided into fixed windows (e.g., 1 minute, 1 hour).
  2. For each user, the system keeps a counter for the current window.
  3. Every incoming request increments this counter.
  4. Once the counter reaches the limit, further requests are rejected.
  5. When the window ends, the counter is reset to zero.

Example

  • Limit: 5 requests per minute
  • Window: 12:00 – 12:01

A user sends 5 requests at 12:00:59 → all allowed.
The counter resets at 12:01:00 → another 5 requests are allowed.

This means the user effectively made 10 requests in 1 second.
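
The steps above can be sketched as a small in-memory counter. This is a single-process illustration with made-up class and method names, not production code:

```python
import time

class FixedWindowLimiter:
    """Allow at most `limit` requests per `window`-second fixed window."""

    def __init__(self, limit, window):
        self.limit = limit
        self.window = window
        self.counters = {}  # user_id -> (window_start, request_count)

    def allow(self, user_id, now=None):
        now = time.time() if now is None else now
        window_start = now - (now % self.window)  # align to fixed boundaries
        start, count = self.counters.get(user_id, (window_start, 0))
        if start != window_start:
            start, count = window_start, 0  # a new window began: reset
        if count >= self.limit:
            return False
        self.counters[user_id] = (start, count + 1)
        return True
```

Feeding it the boundary scenario from the example (5 requests at 12:00:59, then more at 12:01:00) shows exactly the burst problem: the counter resets at the boundary and another full batch is allowed immediately.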

Why Fixed Window fails

  • Users can exploit window boundaries.
  • Traffic becomes very bursty.
  • Backend services can suddenly receive huge spikes.
  • The system becomes unfair under load.

When Fixed Window can be acceptable

  • Very low‑traffic systems.
  • Internal tools.
  • Prototypes or demos.
  • Cases where correctness is less important than simplicity.

Fixed Window is usually avoided in production APIs.

Sliding Window

Sliding‑window algorithms fix the burst problem of fixed windows by looking at the last N seconds from the current time.

How it works

  1. The system maintains a rolling window covering the last N seconds.
  2. Every incoming request is evaluated against this rolling window.
  3. The system counts how many requests occurred in the previous N seconds.
  4. If admitting the request would exceed the limit, it is rejected.

Example

Limit: 100 requests per 60 seconds
  • At any moment, the system checks how many requests happened in the previous 60 seconds.
  • This prevents sudden spikes caused by window resets, as requests are spread more evenly over time.
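
A straightforward way to implement this is the sliding-window log: keep a timestamp per request and evict entries older than the window. The sketch below is single-process and the names are illustrative:

```python
import time
from collections import deque

class SlidingWindowLimiter:
    """Allow at most `limit` requests within any trailing `window` seconds."""

    def __init__(self, limit, window):
        self.limit = limit
        self.window = window
        self.logs = {}  # user_id -> deque of request timestamps

    def allow(self, user_id, now=None):
        now = time.time() if now is None else now
        log = self.logs.setdefault(user_id, deque())
        # Evict timestamps that have slid out of the trailing window.
        while log and log[0] <= now - self.window:
            log.popleft()
        if len(log) >= self.limit:
            return False
        log.append(now)
        return True
```

Note how the per-user deque is exactly the memory cost mentioned below: one stored timestamp per recent request.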

Pros

  • Much fairer request distribution.
  • Traffic spikes are naturally reduced.
  • No burst problems at window boundaries.

Cons

  • The system needs to store timestamps of requests.
  • Memory usage increases with traffic.
  • More computation per request.

Token Bucket

One of the most commonly used algorithms in production because it balances strict limits with a good user experience.

How it works

  • Each user has a bucket that holds tokens.
  • Tokens are added to the bucket at a fixed rate.
  • Each request consumes one token.
  • If there are no tokens left, the request is rejected.
  • The bucket has a maximum capacity, so tokens cannot grow infinitely.

Example

Bucket size: 10 tokens
Refill rate: 1 token per second
| Situation | Result |
| --- | --- |
| User is idle → bucket fills to 10 tokens | Ready for a burst |
| User sends 10 requests instantly | All allowed |
| 11th request | Rejected |
| After 1 second, 1 token is added | 1 request allowed |
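
The behavior in the example above can be sketched with a lazy refill: instead of adding tokens on a timer, recompute the balance from the elapsed time on each request. Names are illustrative, single-process only:

```python
import time

class TokenBucket:
    """Hold up to `capacity` tokens, refilled at `rate` tokens per second."""

    def __init__(self, capacity, rate, now=None):
        self.capacity = capacity
        self.rate = rate
        self.tokens = float(capacity)  # start full: ready for a burst
        self.last = time.monotonic() if now is None else now

    def allow(self, now=None):
        now = time.monotonic() if now is None else now
        # Refill in proportion to elapsed time, capped at capacity.
        self.tokens = min(self.capacity, self.tokens + (now - self.last) * self.rate)
        self.last = now
        if self.tokens >= 1:
            self.tokens -= 1  # each request consumes one token
            return True
        return False
```

The lazy-refill trick keeps the implementation O(1) per request with no background timers, which is part of why this algorithm is so common in practice.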

Why it works well

  • Allows short bursts without breaking the system.
  • Enforces long‑term rate limits.
  • Provides a better user experience.
  • Simple and efficient to implement.

Because of these properties, Token Bucket is often the default choice for APIs.

Leaky Bucket

Focuses on producing a smooth and stable output rate.

How it works

  • Incoming requests are placed into a queue (the bucket).
  • Requests leave the queue at a constant, fixed rate.
  • If the queue becomes full, new requests are dropped.

Example

  1. Many requests arrive at once.
  2. The system processes requests at a steady pace.
  3. Extra requests are dropped when the queue is full.
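
The queue-and-drain behavior above can be sketched as follows. For simplicity the drain is computed lazily from elapsed time rather than run by a worker; names and structure are illustrative:

```python
from collections import deque

class LeakyBucket:
    """Queue up to `capacity` requests; drain them at a constant `rate`/second."""

    def __init__(self, capacity, rate, now=0.0):
        self.capacity = capacity
        self.rate = rate
        self.queue = deque()
        self.last = now

    def _drain(self, now):
        # Requests leave the queue at the fixed output rate.
        leaked = int((now - self.last) * self.rate)
        if leaked > 0:
            for _ in range(min(leaked, len(self.queue))):
                self.queue.popleft()
            self.last = now

    def submit(self, request, now):
        self._drain(now)
        if len(self.queue) >= self.capacity:
            return False  # queue full: the request is dropped
        self.queue.append(request)
        return True
```

In a real system the drained requests would be handed to a worker for processing; here they are simply removed to show the constant output rate.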

Pros

  • Protects downstream systems very well.
  • Ensures predictable processing speed.
  • Prevents sudden traffic spikes.

Cons for user‑facing APIs

  • Burst requests are delayed or dropped.
  • Latency increases under load.
  • User experience can suffer.

Leaky Bucket is more suitable for background jobs and pipelines than for interactive APIs.

Comparing the Algorithms

The goal here is to understand which algorithm fits which situation.

High‑level Comparison

| Algorithm | Burst handling | Fairness | Complexity | Common usage |
| --- | --- | --- | --- | --- |
| Fixed Window | Poor | Low | Very low | Simple or low‑traffic systems |
| Sliding Window | Good | High | High | Systems needing accuracy |
| Token Bucket | Excellent | High | Medium | Default for most public APIs |
| Leaky Bucket | Excellent (output) | Medium | Medium | Background jobs, pipelines |

Rate‑Limiting Algorithms Overview

| Algorithm | Accuracy | Burst capacity | Complexity | Typical use‑case |
| --- | --- | --- | --- | --- |
| Token Bucket | Very good | High | Medium | Most APIs |
| Leaky Bucket | Poor | Medium | Medium | Background jobs |

Choose the algorithm that best matches your traffic pattern, fairness requirements, and operational constraints.

Challenges in Distributed Systems

So far, everything we discussed assumes a single server. In real‑world applications this is rarely the case—most systems run on multiple servers behind a load balancer. This introduces several important challenges.
