[Paper] Designing Scalable Rate Limiting Systems: Algorithms, Architecture, and Distributed Solutions

Published: February 12, 2026 at 04:11 AM EST
4 min read
Source: arXiv - 2602.11741v1

Overview

Bo Guan’s paper tackles a problem that every modern API provider wrestles with: how to enforce rate limits that stay accurate, highly available, and horizontally scalable in a distributed environment. By marrying a classic sliding‑window algorithm with Redis’s sorted‑set data structure and Lua scripting, the author delivers a production‑ready design that can be dropped into large‑scale services without sacrificing latency or consistency.

Key Contributions

  • Concrete architecture for a distributed rate‑limiting service built on Redis (single‑node and Redis Cluster).
  • Quantitative analysis of the trade‑offs among three popular algorithms—Fixed Window, Token Bucket, and Rolling (Sliding) Window—focusing on accuracy vs. memory footprint.
  • Lua‑scripted atomic operations that combine cleanup, counting, and insertion, eliminating race conditions in high‑concurrency scenarios.
  • Three‑layer rule‑management model that separates rule storage, rule compilation (script hashing), and enforcement, enabling hot‑swap of rate‑limit policies without service restarts.
  • CAP‑theoretic justification for choosing Availability + Partition‑tolerance (AP) over strict consistency, with a discussion of how eventual consistency is acceptable for rate limiting.

Methodology

  1. Algorithm selection – The author implements a Rolling Window algorithm because it offers the best trade-off: high precision (no burst‑spike “leakage”) with only modest per‑client state.
  2. Data structure choice – Redis Sorted Sets (ZSETs) store timestamps as scores and request identifiers as members. Insertion, range‑query, and deletion all run in O(log N), where N is the number of recent requests for a key.
  3. Atomic enforcement – A Lua script runs server‑side, performing three steps in one transaction:
    • Remove entries older than the window (cleanup).
    • Count remaining entries (current usage).
    • Insert the new request if the limit isn’t exceeded.
      This guarantees that concurrent requests can’t interleave and cause over‑counting.
  4. Rule management – Rate‑limit policies are stored in a separate Redis hash. When a rule changes, the system re‑hashes the rule parameters and loads a new Lua script version, leaving the existing cached script untouched.
  5. Scalability testing – The design is deployed on a Redis Cluster with sharding and replica nodes. Experiments measure latency, memory usage, and error rates under varying request loads and cluster sizes.
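The three atomic steps (cleanup, count, insert) can be sketched in plain Python, with an in-memory sorted list standing in for the Redis sorted set. This is an illustrative simulation of the logic, not the paper's implementation; in production the steps run inside a single server-side Lua script so they execute atomically.

```python
import bisect

class SlidingWindowLimiter:
    """In-memory sketch of the ZSET-based sliding window.

    Timestamps play the role of ZSET scores; a sorted list per key
    stands in for the sorted set. In the paper's design, the three
    steps in allow() run as one atomic Lua script on the Redis server.
    """

    def __init__(self, limit: int, window_seconds: float):
        self.limit = limit
        self.window = window_seconds
        self.requests: dict[str, list[float]] = {}

    def allow(self, key: str, now: float) -> bool:
        timestamps = self.requests.setdefault(key, [])
        # Step 1: cleanup -- drop entries older than the window
        # (ZREMRANGEBYSCORE in the Redis version).
        cutoff = now - self.window
        del timestamps[:bisect.bisect_right(timestamps, cutoff)]
        # Step 2: count remaining entries (ZCARD).
        if len(timestamps) >= self.limit:
            return False
        # Step 3: insert the new request (ZADD).
        bisect.insort(timestamps, now)
        return True
```

For example, with a limit of 3 requests per second, a fourth request inside the same second is rejected, but is admitted again once older entries age out of the window.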

Results & Findings

| Metric | Fixed Window | Token Bucket | Rolling Window (proposed) |
|---|---|---|---|
| Max error (requests over limit) | Up to 100 % burst | Up to 50 % burst | < 1 % (practically exact) |
| Memory per client (bytes) | ~8 | ~12 | ~16 (due to timestamp storage) |
| Operation latency (µs) | 30–40 | 35–45 | 45–55 (still sub-millisecond) |
| Scalability on 6-node cluster | Linear up to 10 k QPS | Linear up to 12 k QPS | Linear up to 15 k QPS |

Key takeaways: the Rolling Window delivers near‑perfect accuracy with only a modest increase in memory and latency, and it scales cleanly across Redis shards. The Lua‑scripted atomic path eliminates race conditions that plagued earlier implementations.
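The "up to 100 % burst" figure for Fixed Window comes from the window-boundary effect: a client can spend its full quota just before a boundary and again just after it. A minimal, self-contained illustration (the 5-per-second limit and timestamps are hypothetical, chosen only to show the effect):

```python
def fixed_window_allow(counts, key, now, limit, window):
    """Fixed-window counter: one integer per (key, window) bucket."""
    bucket = int(now // window)
    counts[(key, bucket)] = counts.get((key, bucket), 0) + 1
    return counts[(key, bucket)] <= limit

counts = {}
# Five requests just before the 1-second boundary, five just after:
times = [0.96, 0.97, 0.98, 0.99, 0.995, 1.0, 1.01, 1.02, 1.03, 1.04]
allowed = sum(fixed_window_allow(counts, "c", t, limit=5, window=1.0)
              for t in times)
# All 10 requests succeed: ~2x the nominal 5/sec limit inside ~0.1 s.
```

A rolling window avoids this because it counts requests in the trailing interval rather than in fixed calendar buckets.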

Practical Implications

  • API Gateways & Edge Services – Plug the Lua script into existing Redis‑backed gateways (e.g., Kong, Envoy) to get precise per‑client throttling without redesigning the request path.
  • Microservice Meshes – Because the solution is AP‑oriented, it tolerates network partitions; services continue to enforce limits locally, and eventual convergence prevents “soft‑locks.”
  • Cost‑Effective Scaling – Using Redis’s native sharding means you can add nodes to handle higher QPS without rewriting business logic; the same Lua script works across the cluster.
  • Dynamic Policy Updates – Ops teams can adjust limits on‑the‑fly (e.g., during a product launch) by updating the rule hash; the system automatically picks up the new script version, avoiding downtime.
  • Observability – The design naturally exposes counters (current window size) that can be scraped for dashboards, enabling real‑time alerting on abuse spikes.
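The hot-swap flow described above can be sketched as follows: rules live in a store, and each rule change produces a new content-derived version identifier, analogous to the SHA of a freshly loaded Lua script, so enforcement picks up the new policy without a restart. The class and key names here are hypothetical, not from the paper:

```python
import hashlib

class RuleStore:
    """Sketch of the three-layer rule model: rule storage ->
    compiled version (content hash) -> enforcement lookup."""

    def __init__(self):
        self.rules: dict[str, dict] = {}

    def set_rule(self, api_key: str, limit: int, window: float) -> str:
        rule = {"limit": limit, "window": window}
        # Version the rule by hashing its parameters, much as Redis
        # identifies a loaded Lua script by its SHA digest.
        version = hashlib.sha1(
            repr(sorted(rule.items())).encode()).hexdigest()[:8]
        self.rules[api_key] = {**rule, "version": version}
        return version

store = RuleStore()
v1 = store.set_rule("search-api", limit=100, window=1.0)
v2 = store.set_rule("search-api", limit=200, window=1.0)  # hot-swap
```

Because the version changes with the parameters, callers holding the old version keep working against their cached script while new requests resolve to the updated rule.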

Limitations & Future Work

  • Eventual Consistency – In rare partition scenarios, a client might briefly exceed its quota before replicas reconcile; the paper notes this as an acceptable risk but suggests tighter sync for high‑value APIs.
  • Memory Growth for Hot Keys – Extremely bursty clients can inflate the sorted set size; adaptive bucketization or hybrid token‑bucket fallback could mitigate this.
  • Multi‑dimensional Limits – Current design handles a single dimension (e.g., requests per second). Extending to composite limits (IP + user + endpoint) would require additional indexing strategies.
  • Benchmark Diversity – Tests focus on synthetic workloads; future work could evaluate real‑world traffic patterns, including long‑tail distributions and mixed read/write workloads.

Bottom line: Guan’s architecture demonstrates that with the right combination of data structures, scripting, and cluster design, you can build a rate limiter that is precise, low‑latency, and ready for the cloud‑native scale that modern developers demand.

Authors

  • Bo Guan

Paper Information

  • arXiv ID: 2602.11741v1
  • Categories: cs.DC, cs.DB, cs.PF, cs.SE
  • Published: February 12, 2026