RLAAS (Rate Limiting As A Service): Rate Limiting Across Modern Systems

Published: March 15, 2026 at 09:16 PM EDT
5 min read
Source: Dev.to

The Problem Nobody Talks About

Every engineering team knows they need rate limiting, but most solutions only protect one layer — the API gateway.

What happens to everything else?

  • Log floods – A bug sends millions of error logs to your observability stack. Costs spike, dashboards break, and on‑call engineers drown in noise.
  • Metric storms – A chatty service emits 50× the normal Datadog metrics during a deployment. Your bill triples overnight.
  • Kafka cascades – A slow consumer falls behind. Retries pile up. One service can take down the entire event pipeline.
  • Sidecar blind spots – Traffic between services inside a mesh never hits your gateway, so nothing enforces limits there.
  • Copy‑paste rate limiting – Every team re‑implements throttling logic in its own service, with its own bugs and edge cases, and no shared policy.

The root cause is the same in every case: rate limiting is treated as a gateway feature, not a platform capability.
That’s what I set out to fix.

Introducing RLAAS

RLAAS is an open‑source, policy‑driven platform written in Go. It applies consistent rate‑limiting decisions across multiple domains — HTTP, gRPC, telemetry, events, and sidecars — using one unified engine.

Instead of scattered, per‑service throttling code, you define policies once and enforce them everywhere.

RLAAS architecture diagram

Core idea: one policy engine, multiple providers, multiple deployment models.

Algorithms It Supports Today

RLAAS doesn’t lock you into a single algorithm. Each policy independently chooses the algorithm that fits its traffic pattern:

  • Token Bucket – A bucket refills tokens at a fixed rate. Requests consume tokens; when empty, requests are throttled. Great for bursty traffic you want to smooth out without hard‑blocking. Example: allow up to 100 API calls per minute with short bursts permitted.
  • Sliding Window – Tracks requests across a continuously rolling time window. Eliminates the “boundary spike” problem where clients fire double the limit by straddling two fixed‑window edges. Best for accurate per‑user and per‑tenant quota enforcement.
  • Fixed Window – Counts requests in a hard time slot (e.g., 0–60 s). Simple, cheap, and predictable. Best when coarse‑grained limits matter more than precision.
  • Leaky Bucket – Enforces a strict, steady output rate regardless of how bursty the input is. Useful for protecting downstream services that can’t handle spikes even if the total volume is within limits.
  • Concurrent Request Limiter – Caps the number of in‑flight requests at any moment. Essential for protecting slow upstream dependencies from being overwhelmed by parallel callers.

A single RLAAS deployment can run all algorithms simultaneously across different policies and resources.

What RLAAS Integrates With

One policy engine, many integration points:

Integration diagram

Decisions — More Than Just Allow or Deny

Most rate limiters return two answers: pass or reject. RLAAS returns five:

Decision matrix

Shadow mode is especially powerful during rollouts. You can observe exactly what would have been throttled before flipping enforcement on — no surprises, no incidents.

Each policy declares its own action, so one policy can DENY abusive callers while another DROPs noisy telemetry and a third runs in SHADOW mode while the team validates thresholds.
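
The sketch below shows how a per-policy action could turn an "over the limit" signal into a decision. It covers only the four outcomes named in the text (allow, deny, drop, shadow). The fifth is not listed here, so it is left out rather than guessed. The `Action`, `Decision`, `Decide`, and `Proceed` names are hypothetical and not RLAAS's API.

```go
package main

import "fmt"

// Action is a hypothetical type covering the outcomes named in the article.
type Action string

const (
	Allow  Action = "allow"
	Deny   Action = "deny"
	Drop   Action = "drop"
	Shadow Action = "shadow"
)

// Decision carries both the outcome and whether the limit was actually exceeded,
// so shadow-mode traffic can be let through and still be counted.
type Decision struct {
	Action    Action
	OverLimit bool
}

// Proceed reports whether the caller should let the request or event through.
func (d Decision) Proceed() bool {
	return d.Action == Allow || d.Action == Shadow
}

// Decide combines the limiter's verdict with the action configured on the policy.
func Decide(policyAction Action, overLimit bool) Decision {
	if !overLimit {
		return Decision{Action: Allow}
	}
	// Over the limit: the policy's own action decides what happens.
	// In shadow mode that means "would have throttled", but traffic still flows.
	return Decision{Action: policyAction, OverLimit: true}
}

func main() {
	for _, action := range []Action{Deny, Drop, Shadow} {
		d := Decide(action, true)
		fmt.Printf("policy=%s proceed=%t overLimit=%t\n", action, d.Proceed(), d.OverLimit)
	}
}
```

Keeping `OverLimit` separate from the action is what makes a shadow rollout observable: the request proceeds, but the would-be throttle can still be logged or exported as a metric.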

Three Ways to Deploy It

  1. Embedded SDK – Import the library directly into your service. Zero network hop, full control. Works in Go, Python, Java, and TypeScript.
  2. Centralized Service – Deploy rlaas-server as a shared microservice. All your services call it over gRPC or HTTP to get allow/deny decisions. One place to manage all policies.
  3. Sidecar / Agent – Run rlaas-agent as a sidecar next to your workload. No code changes needed. Intercepts traffic at the infrastructure level. Works with Kubernetes, service meshes, and bare‑metal alike.

How Policies Work

Policies are declarative and version‑controlled. You define who the policy applies to, which algorithm to use, the limit, the window, and what to do when the limit is hit.

{
  "id": "nw-payments-logs",
  "org_id": "northwind",
  "resource": "payments.logs",
  "algorithm": "sliding_window",
  "limit": 5000,
  "window_seconds": 60,
  "action": "drop"
}

No code changes. No redeploys. Policy updates take effect immediately.

Why Open Source?

Rate‑limiting logic is not your competitive advantage. It’s infrastructure — a shared, reusable capability that should be transparent, auditable, and community‑driven. Open‑sourcing RLAAS lets teams collaborate on a robust, battle‑tested solution while keeping the focus of each service on its core business logic.

RLAAS – Rate Limiting as a Service

Just as a load balancer or a message queue is infrastructure, rate limiting should be shared, composable, and policy‑driven rather than handcrafted inside each microservice.

RLAAS is MIT‑licensed. Every algorithm, every adapter, and every SDK is open source and built to be extended.

Try It

Docs:
GitHub:

If this solves a problem you’re dealing with, feel free to:

  • Open an issue
  • Contribute an adapter
  • Share it with your team