Rate Limiting Patterns

Algorithms

| Algorithm | Accuracy | Burst Handling | Best For |
|---|---|---|---|
| Token Bucket | High | Allows controlled bursts | API rate limiting, traffic shaping |
| Leaky Bucket | | | |

Fixed window problem: with windows that reset on the hour, a user can send the full limit at 11:59 and again at 12:01, getting twice the limit through in about two minutes. A sliding window fixes this.
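A minimal fixed-window counter makes the boundary problem concrete. The class name `FixedWindowCounter` is hypothetical, the store is an in-memory dictionary, and the clock is passed in explicitly so the example is deterministic. With a limit of 100 per hour, 100 requests at 11:59 and another 100 at 12:01 are all accepted:

```python
from collections import defaultdict


class FixedWindowCounter:
    """Hypothetical fixed-window limiter: one counter per (key, window index)."""

    def __init__(self, limit: int, window_sec: int):
        self.limit = limit
        self.window_sec = window_sec
        self.counts = defaultdict(int)  # (key, window index) -> requests admitted

    def allow(self, key: str, now: float) -> bool:
        window = int(now // self.window_sec)  # counter starts from zero at each boundary
        if self.counts[(key, window)] >= self.limit:
            return False
        self.counts[(key, window)] += 1
        return True


# Times are seconds since midnight; one-hour windows reset at 12:00.
limiter = FixedWindowCounter(limit=100, window_sec=3600)
t_1159 = 11 * 3600 + 59 * 60
t_1201 = 12 * 3600 + 1 * 60
accepted = sum(limiter.allow("user-1", t_1159) for _ in range(100))
accepted += sum(limiter.allow("user-1", t_1201) for _ in range(100))
print(accepted)  # 200 requests accepted within two minutes
```

A sliding window would count most of the 11:59 burst against the 12:01 requests and reject nearly all of them.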

Token Bucket

Bucket holds tokens up to capacity. Tokens refill at a fixed rate. Each request consumes one.

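```python
import time


class TokenBucket:
    def __init__(self, capacity: int, refill_rate: float):
        self.capacity = capacity
        self.tokens = float(capacity)  # start full
        self.refill_rate = refill_rate  # tokens per second
        self.last_refill = time.monotonic()

    def allow(self) -> bool:
        now = time.monotonic()
        elapsed = now - self.last_refill
        # Add the tokens earned since the last call, capped at capacity
        self.tokens = min(self.capacity, self.tokens + elapsed * self.refill_rate)
        self.last_refill = now
        if self.tokens >= 1:
            self.tokens -= 1  # each request consumes one token
            return True
        return False
```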
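The Leaky Bucket row in the algorithms table has no example. The sketch below assumes the "leaky bucket as a meter" formulation. A level drains at a constant rate, each request adds one unit, and a request is rejected if it would overflow the bucket. In this form the leaky bucket is effectively the mirror image of the token bucket. The queue-based variant, which delays requests instead of rejecting them to smooth output, is not shown. The clock is a parameter so the behaviour is easy to check.

```python
from typing import Optional


class LeakyBucket:
    """Leaky bucket as a meter: the level drains at leak_rate units per second."""

    def __init__(self, capacity: int, leak_rate: float):
        self.capacity = capacity
        self.leak_rate = leak_rate
        self.level = 0.0
        self.last: Optional[float] = None  # timestamp of the previous call

    def allow(self, now: float) -> bool:
        if self.last is not None:
            # Drain whatever leaked out since the previous request
            self.level = max(0.0, self.level - (now - self.last) * self.leak_rate)
        self.last = now
        if self.level + 1 > self.capacity:
            return False  # this request would overflow the bucket
        self.level += 1
        return True


bucket = LeakyBucket(capacity=3, leak_rate=1.0)
print([bucket.allow(0.0) for _ in range(4)])  # [True, True, True, False]
print(bucket.allow(1.0))  # True: one unit has drained after a second
```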

Sliding Window Counter

A hybrid of the fixed window and the sliding window log. It weights the previous window's count by how much of that window still overlaps the sliding window:

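```python
import time
from collections import defaultdict

# In-memory stand-in for the get_count/increment_count helpers (single process only;
# old windows are never evicted here, so use Redis or similar across instances).
_counts: dict[tuple[str, int], int] = defaultdict(int)


def get_count(key: str, window: int) -> int:
    return _counts[(key, window)]


def increment_count(key: str, window: int) -> None:
    _counts[(key, window)] += 1


def sliding_window_allow(key: str, limit: int, window_sec: int) -> bool:
    now = time.time()
    current_window = int(now // window_sec)
    position_in_window = (now % window_sec) / window_sec  # fraction of the current window elapsed

    prev_count = get_count(key, current_window - 1)
    curr_count = get_count(key, current_window)

    # The previous window counts only for the part that still overlaps the sliding window
    estimated = prev_count * (1 - position_in_window) + curr_count
    if estimated >= limit:
        return False
    increment_count(key, current_window)
    return True
```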
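Here is a worked example of the weighting. Take a limit of 100, with 80 requests counted in the previous window and 30 so far in the current one. The clock is 25% of the way into the current window. The estimate is 80 × (1 − 0.25) + 30 = 90, so the request is allowed. The pure function below is a hypothetical helper, separated from storage so the arithmetic can be checked by hand. It computes the same estimate:

```python
def sliding_window_estimate(prev_count: int, curr_count: int, position_in_window: float) -> float:
    """Estimated requests in the last full window length.

    position_in_window is how far into the current window we are (0.0 to 1.0),
    so (1 - position_in_window) is the share of the previous window that
    still overlaps the sliding window.
    """
    return prev_count * (1 - position_in_window) + curr_count


estimate = sliding_window_estimate(prev_count=80, curr_count=30, position_in_window=0.25)
print(estimate)  # 90.0
print(estimate >= 100)  # False, so the request is allowed
```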

Implementation Options

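| Approach | Scope | Best For |
|---|---|---|
| In-memory | Single server | Zero latency, no dependencies |
| Redis (`INCR` + `EXPIRE`) | Distributed | Multi-instance deployments |
| API Gateway | Edge | No code required, built-in dashboards |
| Middleware | Per-service | Fine-grained per-user/per-endpoint control |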

Use gateway-level limiting as the outer defense and application-level limiting for fine-grained control.

HTTP Headers

Always return rate limit info, even on successful requests:

```
RateLimit-Limit: 1000
RateLimit-Remaining: 742
RateLimit-Reset: 1625097600
Retry-After: 30
```

| Header | When to Include |
|---|---|
| `RateLimit-Limit` | Every response |
| `RateLimit-Remaining` | |
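The following helper emits these headers. The function name and the choice to send `Retry-After` only on throttled responses are assumptions. `RateLimit-Reset` is written as a Unix timestamp. Some conventions, including the IETF RateLimit header fields draft, express it instead as seconds until reset, so pick one and document it.

```python
from typing import Optional


def rate_limit_headers(limit: int, remaining: int, reset_epoch: int,
                       retry_after: Optional[int] = None) -> dict[str, str]:
    """Build rate limit response headers (hypothetical helper)."""
    headers = {
        "RateLimit-Limit": str(limit),
        "RateLimit-Remaining": str(max(0, remaining)),  # never report a negative count
        "RateLimit-Reset": str(reset_epoch),  # Unix timestamp of the window reset
    }
    if retry_after is not None:
        # Only meaningful on throttled (429) responses
        headers["Retry-After"] = str(retry_after)
    return headers


print(rate_limit_headers(1000, 742, 1625097600))
```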

429 Response Body

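```json
{
  "error": {
    "code": "rate_limit_exceeded",
    "message": "Rate limit exceeded. Maximum 1000 requests per hour.",
    "retry_after": 30,
    "limit": 1000,
    "reset_at": "2025-07-01T12:00:00Z"
  }
}
```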

Never return 500 or 503 for rate limiting — 429 is the correct status code.
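Here is a sketch of a complete throttled response: status 429, the rate limit headers, and a JSON error body with `code`, `message`, `retry_after`, `limit`, and `reset_at` fields. Several details are assumptions: the function name `too_many_requests`, its `(status, headers, body)` return shape, and the `window` wording parameter. Adapt them to your framework's response type.

```python
import json
from datetime import datetime, timezone


def too_many_requests(limit: int, retry_after: int, reset_epoch: int,
                      window: str = "hour") -> tuple[int, dict[str, str], str]:
    """Return (status, headers, body) for a throttled request (hypothetical helper)."""
    reset_at = datetime.fromtimestamp(reset_epoch, tz=timezone.utc).strftime("%Y-%m-%dT%H:%M:%SZ")
    body = {
        "error": {
            "code": "rate_limit_exceeded",
            "message": f"Rate limit exceeded. Maximum {limit} requests per {window}.",
            "retry_after": retry_after,
            "limit": limit,
            "reset_at": reset_at,  # same instant as RateLimit-Reset, in ISO 8601
        }
    }
    headers = {
        "Content-Type": "application/json",
        "Retry-After": str(retry_after),
        "RateLimit-Limit": str(limit),
        "RateLimit-Remaining": "0",
        "RateLimit-Reset": str(reset_epoch),
    }
    return 429, headers, json.dumps(body)
```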

Rate Limit Tiers

Apply limits at multiple granularities:

| Scope | Key | Example Limit | Purpose |
|---|---|---|---|
| Per-IP | Client IP | 100 req/min | Abuse prevention |
| Per-User | | | |

Tiered pricing:

| Tier | Rate Limit | Burst | Cost |
|---|---|---|---|
| Free | 100 req/hr | 10 | $0 |
| Pro | 5,000 req/hr | 100 | $49/mo |
| Enterprise | 100,000 req/hr | 2,000 | Custom |
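The sketch below shows one way to turn the pricing tiers into token bucket parameters. It assumes "Burst" maps to bucket capacity and the hourly limit to the refill rate. The `TIERS` mapping and `bucket_params` helper are illustrative, not part of any library.

```python
# Hypothetical tier table mirroring the pricing above: (requests per hour, burst)
TIERS = {
    "free": (100, 10),
    "pro": (5_000, 100),
    "enterprise": (100_000, 2_000),
}


def bucket_params(tier: str) -> tuple[int, float]:
    """Return (capacity, refill_rate in tokens per second) for a tier."""
    per_hour, burst = TIERS[tier]
    return burst, per_hour / 3600


capacity, rate = bucket_params("pro")
print(capacity, round(rate, 3))  # 100 1.389
```

Under this mapping, an idle client can spend its whole burst at once and then sustain the hourly rate. In any one-hour span it can therefore exceed the hourly figure by at most one burst.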

Evaluate from most specific to least specific: per-endpoint > per-user > per-IP.
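Below is a minimal sketch of checking scopes in that order. It stops at the first scope that denies and counts a request against every scope only once it is admitted. Each scope uses a simple in-memory fixed-window count. The class name, key formats (`endpoint:...`, `user:...`, `ip:...`), and limits are assumptions for illustration.

```python
from collections import defaultdict
from typing import Optional


class MultiScopeLimiter:
    """Check per-endpoint, then per-user, then per-IP limits (fixed windows)."""

    def __init__(self, limits: dict[str, int], window_sec: int = 60):
        self.limits = limits  # scope name -> max requests per window
        self.window_sec = window_sec
        self.counts = defaultdict(int)  # (key, window index) -> admitted requests

    def check(self, keys: list[tuple[str, str]], now: float) -> Optional[str]:
        """keys: (scope, key) pairs, most specific first.

        Returns None if allowed, otherwise the name of the scope that denied.
        """
        window = int(now // self.window_sec)
        for scope, key in keys:
            if self.counts[(key, window)] >= self.limits[scope]:
                return scope  # the most specific denial is reported
        for _, key in keys:
            self.counts[(key, window)] += 1  # only admitted requests consume quota
        return None


limiter = MultiScopeLimiter({"endpoint": 2, "user": 5, "ip": 100})
keys = [
    ("endpoint", "endpoint:POST /export:user-42"),
    ("user", "user:user-42"),
    ("ip", "ip:203.0.113.7"),
]
print([limiter.check(keys, now=0.0) for _ in range(3)])  # [None, None, 'endpoint']
```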

Distributed Rate Limiting

Redis-based pattern for consistent limiting across instances:

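```python
import time


def redis_rate_limit(client, key: str, limit: int, window: int) -> bool:
    """Fixed-window counter shared by all instances; `client` is a redis-py Redis client."""
    now = time.time()
    window_key = f"rl:{key}:{int(now // window)}"
    pipe = client.pipeline()  # redis-py pipelines are transactional (MULTI/EXEC) by default
    pipe.incr(window_key)
    pipe.expire(window_key, window * 2)  # stale window keys clean themselves up
    results = pipe.execute()
    return results[0] <= limit  # results[0] is the INCR result
```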

Atomic Lua script (prevents race conditions):

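```lua
local key = KEYS[1]
local limit = tonumber(ARGV[1])
local window = tonumber(ARGV[2])
local current = redis.call("INCR", key)
if current == 1 then
  -- First request in this window: start the expiry clock
  redis.call("EXPIRE", key, window)
end
return current <= limit and 1 or 0
```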

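Never do a separate GET followed by a SET. Concurrent requests in the gap read the same count, updates are lost, and more requests than the limit get through.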
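Here is a deterministic simulation of why the GET/SET gap matters. With a limit of 100, two app instances each read the counter and see 99. Both admit their request, and both write back 100. Two requests got through, but the counter moved by only one. The dictionary stands in for Redis, and `incr` models the atomicity of `INCR`; this is a simplified model, not real client code.

```python
LIMIT = 100
store = {"rl:user-1": 99}  # stand-in for a Redis key

# Non-atomic interleaving: A reads, B reads, A writes, B writes
a_seen = store["rl:user-1"]
b_seen = store["rl:user-1"]
a_allowed = a_seen + 1 <= LIMIT
b_allowed = b_seen + 1 <= LIMIT
store["rl:user-1"] = a_seen + 1
store["rl:user-1"] = b_seen + 1
print(a_allowed, b_allowed, store["rl:user-1"])  # True True 100


def incr(key: str) -> int:
    """Models Redis INCR: read, add one, and write as one indivisible step."""
    store[key] = store.get(key, 0) + 1
    return store[key]


# With an atomic increment each caller sees a distinct value, so one is rejected
store["rl:user-1"] = 99
print([incr("rl:user-1") <= LIMIT for _ in range(2)])  # [True, False]
```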

API Gateway Configuration

NGINX:

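```nginx
http {
    # Shared 10 MB zone keyed by client IP; steady rate of 10 requests/second
    limit_req_zone $binary_remote_addr zone=api:10m rate=10r/s;

    server {
        location /api/ {
            # Allow bursts of up to 20 extra requests, served without delay
            limit_req zone=api burst=20 nodelay;
            # Reject with 429 instead of the module's default 503
            limit_req_status 429;
        }
    }
}
```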

Kong:

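```yaml
plugins:
  - name: rate-limiting
    config:
      minute: 60
      hour: 1000
      policy: redis          # share counters across Kong nodes
      redis_host: redis.internal
```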

Client-Side Handling

Clients must handle 429 gracefully:

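```typescript
async function fetchWithRetry(url: string, maxRetries = 3): Promise<Response> {
  for (let attempt = 0; attempt < maxRetries; attempt++) {
    const res = await fetch(url);
    if (res.status !== 429) return res;

    const retryAfter = res.headers.get("Retry-After");
    const seconds = retryAfter ? Number(retryAfter) : NaN;
    const delay = Number.isFinite(seconds)
      ? seconds * 1000 // server-provided delay in seconds
      : Math.random() * Math.min(1000 * 2 ** attempt, 30000); // exponential backoff with full jitter
    await new Promise((r) => setTimeout(r, delay));
  }
  throw new Error("Rate limit exceeded after retries");
}
```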

- Always respect Retry-After when present
- Use exponential backoff with jitter when absent
- Implement request queuing for batch operations
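The same retry policy can be sketched in Python for services that call rate-limited APIs. It honours `Retry-After` in both forms HTTP allows: delta-seconds or an HTTP-date. Otherwise it uses exponential backoff with "full jitter", a random delay between zero and the capped exponential value. The function name and the `base` and `cap` defaults are assumptions.

```python
import random
from datetime import datetime, timezone
from email.utils import parsedate_to_datetime
from typing import Optional


def retry_delay(attempt: int, retry_after: Optional[str] = None,
                base: float = 1.0, cap: float = 30.0,
                rng: Optional[random.Random] = None) -> float:
    """Seconds to wait before retry number `attempt` (0-based)."""
    if retry_after:
        value = retry_after.strip()
        if value.isdigit():
            return float(value)  # delta-seconds form, e.g. "30"
        try:
            when = parsedate_to_datetime(value)  # HTTP-date form
        except (TypeError, ValueError):  # invalid date (error type varies by Python version)
            when = None
        if when is not None:
            if when.tzinfo is None:
                when = when.replace(tzinfo=timezone.utc)
            return max(0.0, (when - datetime.now(timezone.utc)).total_seconds())
    # No usable Retry-After: exponential backoff with full jitter
    ceiling = min(cap, base * 2 ** attempt)
    return (rng or random).uniform(0, ceiling)
```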

Monitoring

Track these metrics:

- Rate limit hit rate: % of requests returning 429 (alert if >5% sustained)
- Near-limit warnings: requests where remaining < 10% of limit
- Top offenders: keys/IPs hitting limits most frequently
- Limit headroom: how close normal traffic is to the ceiling
- False positives: legitimate users being rate limited
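The sketch below computes the first three metrics from a batch of request records. The `RequestRecord` shape (key, status, and the `RateLimit-Remaining`/`RateLimit-Limit` values sent back) is an assumption. In practice these numbers usually come from a metrics pipeline rather than raw logs.

```python
from collections import Counter
from typing import NamedTuple


class RequestRecord(NamedTuple):
    key: str        # rate limit key, e.g. user ID or IP
    status: int     # HTTP status returned
    remaining: int  # RateLimit-Remaining sent with the response
    limit: int      # RateLimit-Limit sent with the response


def rate_limit_metrics(records: list[RequestRecord], top_n: int = 3) -> dict:
    total = len(records)
    throttled = [r for r in records if r.status == 429]
    near_limit = [r for r in records if r.remaining < 0.1 * r.limit]
    return {
        "hit_rate": len(throttled) / total if total else 0.0,
        "near_limit_rate": len(near_limit) / total if total else 0.0,
        "top_offenders": Counter(r.key for r in throttled).most_common(top_n),
    }


records = [
    RequestRecord("user-1", 200, 500, 1000),
    RequestRecord("user-2", 200, 50, 1000),
    RequestRecord("user-2", 429, 0, 1000),
    RequestRecord("user-3", 429, 0, 1000),
    RequestRecord("user-2", 429, 0, 1000),
]
m = rate_limit_metrics(records)
print(m["hit_rate"])  # 0.6, far above the 5% alert threshold
```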

Anti-Patterns

| Anti-Pattern | Fix |
|---|---|
| Application-only limiting | Always combine with infrastructure-level limits |
| No retry guidance | |

NEVER Do

1. NEVER rate limit health check endpoints: monitoring systems will false-alarm
2. NEVER use client-supplied identifiers as the sole rate limit key: they are trivially spoofed
3. NEVER return 200 OK when rate limiting: clients must know they were throttled
4. NEVER set limits without measuring actual traffic first: you'll block legitimate users or set limits too high to matter
5. NEVER share counters across unrelated tenants: this causes the noisy neighbor problem
6. NEVER skip rate limiting on internal APIs: misbehaving internal services can take down shared infrastructure
7. NEVER implement rate limiting without logging: you need visibility to tune limits and detect abuse
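Rules 1 and 2 can be sketched in code. The hypothetical key function below exempts health checks. It builds the key from server-verified identity, meaning an authenticated user ID or else the connection's IP address, rather than from anything the client merely asserts. The `Request` shape, the exempt paths, and the key prefixes are assumptions.

```python
from dataclasses import dataclass
from typing import Optional

HEALTH_CHECK_PATHS = {"/healthz", "/readyz"}  # assumed paths; use your own


@dataclass
class Request:
    path: str
    remote_addr: str                # from the socket or a trusted proxy, not the client
    user_id: Optional[str] = None   # set only after authentication succeeds
    headers: Optional[dict] = None  # client-supplied, e.g. X-Client-Id (untrusted)


def rate_limit_key(req: Request) -> Optional[str]:
    """Return the limiter key, or None if the request is exempt."""
    if req.path in HEALTH_CHECK_PATHS:
        return None  # never throttle health checks
    if req.user_id:
        return f"user:{req.user_id}"
    # Ignore client-asserted identifiers in headers; fall back to the IP
    return f"ip:{req.remote_addr}"


spoofed = Request("/api/orders", "198.51.100.9", headers={"X-Client-Id": "someone-else"})
print(rate_limit_key(spoofed))  # ip:198.51.100.9
```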
