Rate Limiting: How to Keep Your API from Drowning in Requests

Published: March 11, 2026, 02:34 GMT+8

8 min read

Originally published on Dev.to

Introduction

Hello! I’m Jairo
Your favorite dev.to writer.
Just kidding, I know I’m not. Just breaking the ice 😄

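Last week I was reading a great book, System Design Interview by Alex Xu. If you work on backend systems and haven't read it yet, I recommend checking it out.

One concept in the book reminded me of an interesting pattern among software engineers: we all know rate limiting exists, but surprisingly few of us really understand when to use it, how it works internally, and which strategy to choose.

So today let's talk about one of the most important protections your API can have: rate limiting.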

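🌧️ A Simple Analogy

Your application is like a person walking in the rain: every raindrop is a request hitting your server.

At first, everything is fine.
Then the rain picks up → more raindrops, more requests.
Eventually your app is soaked through: CPU usage spikes, the database struggles, and the server turns into soup.

Not ideal.

What do we do when it rains?
We grab an umbrella. (Not the evil corporation from Resident Evil, a real umbrella.)

The umbrella represents the rate limiter. It doesn't stop the rain entirely; it controls how much of it reaches you. In the same way, a rate limiter lets some requests through while blocking the excess, protecting the system from overload.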

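🚫 Why Rate Limiting Matters

Without rate limiting, a client can send:

10 requests per second
100 requests per second
1,000 requests per second

Your application will try to handle every request it receives, which eventually leads to:

CPU overload
Database contention
Cascading failures
A complete system outage

With rate limiting in place, the server can simply respond with:

HTTP 429 – Too Many Requests

That's basically your server saying:

“Slow down, my friend.”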
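A minimal sketch of what that response can look like on the server side, using the JDK's built-in com.sun.net.httpserver package. The class name RateLimitedServer, the /hello path, and the Retry-After value of 1 second are illustrative assumptions; the allow/deny decision is passed in as a BooleanSupplier so any of the strategies below could be plugged in.

```java
import com.sun.net.httpserver.HttpServer;
import java.io.IOException;
import java.io.OutputStream;
import java.net.InetSocketAddress;
import java.nio.charset.StandardCharsets;
import java.util.function.BooleanSupplier;

public class RateLimitedServer {

    /** Starts a server on a random local port; `limiter` decides whether each request may proceed. */
    public static HttpServer start(BooleanSupplier limiter) throws IOException {
        HttpServer server = HttpServer.create(new InetSocketAddress("127.0.0.1", 0), 0);
        server.createContext("/hello", exchange -> {
            int status;
            String body;
            if (limiter.getAsBoolean()) {
                status = 200;
                body = "Hello API";
            } else {
                status = 429; // Too Many Requests
                body = "Slow down, my friend.";
                // Hint to the client how many seconds to back off (assumed value)
                exchange.getResponseHeaders().set("Retry-After", "1");
            }
            byte[] bytes = body.getBytes(StandardCharsets.UTF_8);
            exchange.sendResponseHeaders(status, bytes.length);
            try (OutputStream out = exchange.getResponseBody()) {
                out.write(bytes);
            }
        });
        server.start();
        return server;
    }
}
```

Keeping the limiter separate from the HTTP handler means the 429 logic stays the same whichever algorithm makes the decision.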

Source: …

📊 Common Rate Limiting Strategies

1. Token Bucket

Idea: A bucket holds a number of tokens. Each incoming request consumes one token.
Refill: Tokens are added back at a fixed rate.

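Example configuration

Parameter	Value
Bucket capacity	10 tokens
Refill rate	1 token per second

Allows short bursts (up to the bucket capacity).
Once the bucket is empty, requests must wait for new tokens.

Why is it popular? It supports bursts while still controlling overall traffic.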
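A minimal sketch of a token bucket, assuming a "lazy" refill: instead of a timer adding tokens, each call computes how many tokens have accrued since the previous call. The class name LazyTokenBucket and the explicit nowNanos parameter (passed in so the behavior is deterministic; in real code you would pass System.nanoTime()) are illustrative choices, not from the original article.

```java
/**
 * Token bucket with lazy refill: tokens are topped up on each call based on
 * elapsed time, so no background timer is needed.
 */
public class LazyTokenBucket {

    private final long capacity;          // maximum tokens, i.e. the largest allowed burst
    private final double tokensPerSecond; // refill rate
    private double tokens;                // tokens currently in the bucket
    private long lastRefillNanos;         // time of the previous refill

    public LazyTokenBucket(long capacity, double tokensPerSecond, long nowNanos) {
        this.capacity = capacity;
        this.tokensPerSecond = tokensPerSecond;
        this.tokens = capacity;           // start full so an initial burst is allowed
        this.lastRefillNanos = nowNanos;
    }

    /** Consumes one token and returns true if one is available at time nowNanos. */
    public synchronized boolean tryConsume(long nowNanos) {
        // Add the tokens accrued since the last call, capped at capacity
        double accrued = (nowNanos - lastRefillNanos) * tokensPerSecond / 1_000_000_000.0;
        tokens = Math.min(capacity, tokens + accrued);
        lastRefillNanos = nowNanos;
        if (tokens >= 1) {
            tokens -= 1;
            return true;
        }
        return false;
    }
}
```

With a capacity of 10 and a refill rate of 1 token per second, as in the example configuration, ten back-to-back requests pass, the eleventh is rejected, and one more is allowed a second later.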

2. Leaky Bucket

Idea: Requests enter a bucket that leaks at a constant rate (like water through a hole).
If the bucket fills up, new requests are rejected.
Effect: Forces a steady, predictable request rate, smoothing traffic spikes.
Downside: Doesn’t handle bursts as well as the token bucket.
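One way to sketch the queue-based leaky bucket described here, assuming the caller polls drain() periodically (for example from a scheduler) and processes whatever it returns. LeakyBucket, offer and drain are hypothetical names, and time is passed in explicitly to keep the example deterministic.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.List;

/** Leaky bucket as a bounded queue that releases requests at a constant rate. */
public class LeakyBucket<T> {

    private final int capacity;            // max requests waiting in the bucket
    private final long leakIntervalNanos;  // one request leaks out per interval
    private final Deque<T> queue = new ArrayDeque<>();
    private long nextLeakNanos;            // earliest time the next request may leave

    public LeakyBucket(int capacity, long leakIntervalNanos, long nowNanos) {
        this.capacity = capacity;
        this.leakIntervalNanos = leakIntervalNanos;
        this.nextLeakNanos = nowNanos;
    }

    /** Adds a request to the bucket; returns false (reject) if the bucket is full. */
    public synchronized boolean offer(T request, long nowNanos) {
        if (queue.isEmpty() && nextLeakNanos < nowNanos) {
            nextLeakNanos = nowNanos; // an idle bucket does not save up leak capacity
        }
        if (queue.size() >= capacity) {
            return false;
        }
        queue.addLast(request);
        return true;
    }

    /** Returns the requests that have leaked out by nowNanos, at most one per interval. */
    public synchronized List<T> drain(long nowNanos) {
        List<T> released = new ArrayList<>();
        while (!queue.isEmpty() && nextLeakNanos <= nowNanos) {
            released.add(queue.pollFirst());
            nextLeakNanos += leakIntervalNanos;
        }
        return released;
    }
}
```

Because output is paced by leakIntervalNanos no matter how fast requests arrive, a burst fills the queue instead of passing through, which is exactly the smoothing (and the weaker burst handling) described above.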

3. Sliding Window Log

Idea: Store the timestamp of every request.
For a limit of 5 requests per minute, the system checks all timestamps within the last 60 seconds.
Pros: Very accurate; always uses the real time window.
Cons: Requires storing many timestamps → can be expensive at large scale.
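A minimal sketch of a sliding window log for a single client. This variant records only accepted requests (some descriptions also log rejected ones); the class name and the explicit nowMillis parameter are illustrative assumptions.

```java
import java.util.ArrayDeque;
import java.util.Deque;

/** Sliding window log: keeps a timestamp for every accepted request. */
public class SlidingWindowLog {

    private final int limit;           // max requests per window
    private final long windowMillis;   // window length, e.g. 60_000 for one minute
    private final Deque<Long> log = new ArrayDeque<>(); // accepted timestamps, oldest first

    public SlidingWindowLog(int limit, long windowMillis) {
        this.limit = limit;
        this.windowMillis = windowMillis;
    }

    public synchronized boolean allowRequest(long nowMillis) {
        // Drop timestamps that are no longer inside the window (nowMillis - windowMillis, nowMillis]
        while (!log.isEmpty() && log.peekFirst() <= nowMillis - windowMillis) {
            log.pollFirst();
        }
        if (log.size() < limit) {
            log.addLast(nowMillis);
            return true;
        }
        return false;
    }
}
```

With a limit of 5 per minute, five requests at 0 s to 4 s pass, a sixth at 5 s is rejected, and a new one is accepted at 60 s once the timestamp from 0 s has left the window.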

4. Sliding Window Counter (Optimized Sliding Window)

Idea: Keep only two counters:
1. Requests in the current window
2. Requests in the previous window
Compute a weighted average between the two counters to estimate the real request rate.
Pros: Drastically reduces memory usage while retaining good accuracy.
Common use: Large distributed systems.
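A minimal sketch of one common form of that weighting: the estimate is the current window's count plus the previous window's count scaled by the fraction of the previous window that still overlaps the sliding window. The class name and the explicit nowMillis parameter are illustrative assumptions.

```java
/** Sliding window counter: two fixed-window counts blended by position in the current window. */
public class SlidingWindowCounter {

    private final int limit;          // max requests per window
    private final long windowMillis;  // window length
    private long currentWindowStart;  // start time of the current fixed window
    private int currentCount;         // requests accepted in the current window
    private int previousCount;        // requests accepted in the previous window

    public SlidingWindowCounter(int limit, long windowMillis) {
        this.limit = limit;
        this.windowMillis = windowMillis;
    }

    public synchronized boolean allowRequest(long nowMillis) {
        long windowStart = nowMillis - (nowMillis % windowMillis); // align to fixed windows
        if (windowStart != currentWindowStart) {
            // Roll over: the old current window becomes "previous" only if it is adjacent
            previousCount = (windowStart - currentWindowStart == windowMillis) ? currentCount : 0;
            currentCount = 0;
            currentWindowStart = windowStart;
        }
        // Weight the previous window by the part of it still covered by the sliding window
        double elapsed = (double) (nowMillis - windowStart) / windowMillis;
        double estimate = currentCount + previousCount * (1.0 - elapsed);
        if (estimate < limit) {
            currentCount++;
            return true;
        }
        return false;
    }
}
```

For example, with a limit of 10 per minute and 10 requests in the previous minute, a request 15 seconds into the next minute sees an estimate of 0 + 10 × 0.75 = 7.5, so only three more requests fit before the estimate reaches 10. Only two integers per client are stored, which is where the memory savings come from.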

🛠️ Simple Java Implementations

The examples below are simplified to illustrate the core ideas.

1. Fixed Window (one request per client per second)

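```java
import java.util.concurrent.ConcurrentHashMap;

public class SimpleRateLimiter {

    private static final long WINDOW_MILLIS = 1000; // window length: 1 second
    private static final int LIMIT = 1;             // max requests per client per window

    // clientId -> {window index, requests counted in that window}
    private final ConcurrentHashMap<String, long[]> windows = new ConcurrentHashMap<>();

    /** Returns true if the request is allowed */
    public boolean allowRequest(String clientId) {
        long window = System.currentTimeMillis() / WINDOW_MILLIS; // which 1-second window we are in
        boolean[] allowed = {false};
        // compute() is atomic per key, so concurrent calls for one client cannot race
        windows.compute(clientId, (id, state) -> {
            if (state == null || state[0] != window) {
                state = new long[] {window, 0}; // new window: reset the counter
            }
            if (state[1] < LIMIT) {
                state[1]++;
                allowed[0] = true;
            }
            return state;
        });
        return allowed[0];
    }
}
```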

2. Token Bucket (capacity = 10)

```java
import java.util.concurrent.atomic.AtomicInteger;

public class TokenBucket {

    private final int capacity = 10;
    private final AtomicInteger tokens = new AtomicInteger(capacity);

    /** Returns true if a token was available (and consumes it) */
    public boolean allowRequest() {
        // Atomically take a token only if one is left; a separate get() followed by
        // decrementAndGet() could let two threads take the last token
        return tokens.getAndUpdate(t -> t > 0 ? t - 1 : t) > 0;
    }

    /** Refill one token, never exceeding capacity – typically called by a scheduled task */
    public void refill() {
        tokens.updateAndGet(t -> Math.min(capacity, t + 1));
    }
}
```

refill() can be called periodically from a scheduled task (for example, once per second).
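A minimal sketch of that scheduling with the JDK's ScheduledExecutorService. RefillScheduler and scheduleRefill are hypothetical names; the helper accepts any Runnable, so with the TokenBucket class you would pass bucket::refill and a period of 1000 ms.

```java
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;

public class RefillScheduler {

    /** Runs `refill` every periodMillis on a single background thread (hypothetical helper). */
    public static ScheduledExecutorService scheduleRefill(Runnable refill, long periodMillis) {
        ScheduledExecutorService scheduler = Executors.newSingleThreadScheduledExecutor();
        // Fixed rate: the first refill fires after one period, then once per period
        scheduler.scheduleAtFixedRate(refill, periodMillis, periodMillis, TimeUnit.MILLISECONDS);
        return scheduler;
    }
}
```

The executor's thread is not a daemon thread, so call shutdown() on the returned scheduler when the application stops.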

3. Using a Library – Resilience4j

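```java
import io.github.resilience4j.ratelimiter.RateLimiter;
import io.github.resilience4j.ratelimiter.RateLimiterConfig;
import io.github.resilience4j.ratelimiter.RequestNotPermitted;

import java.time.Duration;
import java.util.function.Supplier;

public class Resilience4jExample {

    public static void main(String[] args) {
        RateLimiterConfig config = RateLimiterConfig.custom()
                .limitForPeriod(5)                          // max 5 calls...
                .limitRefreshPeriod(Duration.ofSeconds(1))  // ...per 1-second period
                .timeoutDuration(Duration.ofMillis(0))      // fail immediately instead of waiting
                .build();

        RateLimiter rateLimiter = RateLimiter.of("apiLimiter", config);

        Supplier<String> decoratedSupplier =
                RateLimiter.decorateSupplier(rateLimiter, () -> "Hello API");

        // Calls beyond the limit within one period throw RequestNotPermitted
        for (int i = 1; i <= 6; i++) {
            try {
                System.out.println(i + ": " + decoratedSupplier.get());
            } catch (RequestNotPermitted e) {
                System.out.println(i + ": Rate limit exceeded");
            }
        }
    }
}
```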

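Pros: Integrates well with Spring Boot, Micrometer, and other ecosystem tools.
Cons: Adds an external dependency, and you need to learn its configuration options.

📚 Key Takeaways

Rate limiting protects your services from overload, abuse, and cascading failures.
Choose the strategy that matches your traffic pattern:
- Token bucket → bursty traffic, flexible.
- Leaky bucket → smooth, constant traffic.
- Sliding window log → precise limits, higher memory overhead.
- Sliding window counter → a good compromise for distributed systems.
In production, prefer battle-tested libraries (e.g. Resilience4j, Bucket4j, Spring Cloud Gateway) over rolling your own.

Happy coding, and may your APIs stay dry! 🌂

Rate Limiting Overview

Rate limiting can be implemented at several layers of the architecture.

1. API Gateway

Tools such as NGINX, Kong, Cloudflare, or AWS API Gateway usually enforce limits before traffic ever reaches your application.

2. Application Layer

Libraries such as Resilience4j or Bucket4j let developers control request flow directly inside a service.

3. Distributed Systems

Redis is commonly used to share rate-limit counters across multiple instances.

Rate limiting looks simple on the surface, but once your system starts handling real traffic, you quickly realize how important it is.

A well-designed rate limiter protects your:

APIs
Infrastructure
Databases
Users

Sometimes the difference between a stable system and an outage is surprisingly simple.

Sometimes your API just needs… a good umbrella ☔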
