Rate Limiting Patterns
Algorithms
| Algorithm | Accuracy | Burst Handling | Best For |
|---|---|---|---|
| Token Bucket | High | Allows controlled bursts | API rate limiting, traffic shaping |
| Leaky Bucket | High | Smooths bursts entirely | Steady-rate processing, queues |
| Fixed Window | Low | Allows edge bursts (2x) | Simple use cases, prototyping |
| Sliding Window Log | Very High | Precise control | Strict compliance, billing-critical |
| Sliding Window Counter | High | Good approximation | Production APIs (best tradeoff) |
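Fixed window problem: With hourly windows, a user can send the full limit at 11:59 and again at 12:01. Both bursts are accepted because they land in different windows, so twice the limit passes within two minutes. Sliding window algorithms fix this.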
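The boundary burst can be reproduced with a few lines of Python. This is only a sketch: `FixedWindowCounter` is a hypothetical in-memory class, and the clock is passed in explicitly (seconds since midnight) so the result is deterministic.

```python
class FixedWindowCounter:
    """Fixed-window limiter with an explicit clock so the example is deterministic."""

    def __init__(self, limit: int, window_sec: int):
        self.limit = limit
        self.window_sec = window_sec
        self.counts = {}  # window index -> requests accepted in that window

    def allow(self, now: float) -> bool:
        window = int(now // self.window_sec)
        count = self.counts.get(window, 0)
        if count >= self.limit:
            return False
        self.counts[window] = count + 1
        return True


limiter = FixedWindowCounter(limit=100, window_sec=3600)

# 11:59 and 12:01 expressed as seconds since midnight.
t_1159 = 11 * 3600 + 59 * 60
t_1201 = 12 * 3600 + 1 * 60

accepted_before = sum(limiter.allow(t_1159) for _ in range(100))
accepted_after = sum(limiter.allow(t_1201) for _ in range(100))
total_in_two_minutes = accepted_before + accepted_after
print(total_in_two_minutes)  # 200, twice the configured limit
```

Both bursts fall in different hourly windows, so neither is rejected even though they are two minutes apart.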
Token Bucket
Bucket holds tokens up to capacity. Tokens refill at a fixed rate. Each request consumes one.
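```python
import time


class TokenBucket:
    def __init__(self, capacity: int, refill_rate: float):
        self.capacity = capacity
        self.tokens = float(capacity)  # start with a full bucket
        self.refill_rate = refill_rate  # tokens per second
        self.last_refill = time.monotonic()

    def allow(self) -> bool:
        now = time.monotonic()
        elapsed = now - self.last_refill
        # Add the tokens earned since the last call, capped at capacity.
        self.tokens = min(self.capacity, self.tokens + elapsed * self.refill_rate)
        self.last_refill = now
        if self.tokens >= 1:
            self.tokens -= 1
            return True
        return False
```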
Sliding Window Counter
Hybrid of fixed window and sliding window log — weights the previous window's count by overlap percentage:
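```python
import time


def sliding_window_allow(key: str, limit: int, window_sec: int) -> bool:
    # get_count and increment_count are storage helpers that this document leaves undefined.
    now = time.time()
    current_window = int(now // window_sec)
    position_in_window = (now % window_sec) / window_sec
    prev_count = get_count(key, current_window - 1)
    curr_count = get_count(key, current_window)
    # Weight the previous window by the share of it still inside the sliding window.
    estimated = prev_count * (1 - position_in_window) + curr_count
    if estimated >= limit:
        return False
    increment_count(key, current_window)
    return True
```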
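A worked example may make the weighting concrete. The numbers are hypothetical (limit of 100 per 60 seconds, 80 requests in the previous window, 30 so far in the current one, 15 seconds into the current window), and `estimated_count` is an illustrative helper rather than part of any library.

```python
def estimated_count(prev_count: int, curr_count: int, position_in_window: float) -> float:
    """Weighted request estimate for the sliding window that ends now."""
    return prev_count * (1 - position_in_window) + curr_count


# Hypothetical numbers: limit 100 per 60 s, 15 s into the current window.
limit = 100
estimate = estimated_count(prev_count=80, curr_count=30, position_in_window=15 / 60)
print(estimate)          # 90.0 (80 * 0.75 + 30)
print(estimate < limit)  # True, so the request is allowed
```

Three quarters of the previous window still overlaps the sliding window, so 60 of its 80 requests count against the limit.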
Implementation Options
| Approach | Scope | Best For |
|---|---|---|
| In-memory | Single server | Zero latency, no dependencies |
| Redis (INCR + EXPIRE) | Distributed | Multi-instance deployments |
| API Gateway | Edge | No code, built-in dashboards |
| Middleware | Per-service | Fine-grained per-user/endpoint control |
Use gateway-level limiting as the outer defense and application-level limiting for fine-grained control.
HTTP Headers
Always return rate limit info, even on successful requests:
```
RateLimit-Limit: 1000
RateLimit-Remaining: 742
RateLimit-Reset: 1625097600
Retry-After: 30
```
| Header | When to Include |
|---|---|
| `RateLimit-Limit` | Every response |
| `RateLimit-Remaining` | Every response |
| `RateLimit-Reset` | Every response |
| `Retry-After` | 429 responses only |
429 Response Body
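```json
{
  "error": {
    "code": "rate_limit_exceeded",
    "message": "Rate limit exceeded. Maximum 1000 requests per hour.",
    "retry_after": 30,
    "limit": 1000,
    "reset_at": "2025-07-01T12:00:00Z"
  }
}
```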
Never return 500 or 503 for rate limiting — 429 is the correct status code.
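One way to keep the headers and body consistent is to build them from the same values. The sketch below is framework-agnostic; `build_429_response` is a hypothetical helper, the field names follow this document's 429 body, and it assumes the reset time is tracked as a Unix timestamp.

```python
import json
import math
import time
from datetime import datetime, timezone
from typing import Optional


def build_429_response(limit: int, reset_epoch: int, now: Optional[float] = None):
    """Return (status, headers, body) for a throttled request."""
    now = time.time() if now is None else now
    # Round up so the client never retries before the window actually resets.
    retry_after = max(0, math.ceil(reset_epoch - now))
    reset_at = datetime.fromtimestamp(reset_epoch, tz=timezone.utc)
    headers = {
        "RateLimit-Limit": str(limit),
        "RateLimit-Remaining": "0",
        "RateLimit-Reset": str(reset_epoch),
        "Retry-After": str(retry_after),
        "Content-Type": "application/json",
    }
    body = {
        "error": {
            "code": "rate_limit_exceeded",
            "message": f"Rate limit exceeded. Maximum {limit} requests per hour.",
            "retry_after": retry_after,
            "limit": limit,
            "reset_at": reset_at.strftime("%Y-%m-%dT%H:%M:%SZ"),
        }
    }
    return 429, headers, json.dumps(body)


status, headers, body = build_429_response(1000, 1751371200, now=1751371170)
```

Deriving `Retry-After` and `retry_after` from one computation avoids the header and the body disagreeing.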
Rate Limit Tiers
Apply limits at multiple granularities:
| Scope | Key | Example Limit | Purpose |
|---|---|---|---|
| Per-IP | Client IP | 100 req/min | Abuse prevention |
| Per-User | User ID | 1000 req/hr | Fair usage |
| Per-API-Key | API key | 5000 req/hr | Service-to-service |
| Per-Endpoint | Route + key | 60 req/min on `/search` | Protect expensive ops |
Tiered pricing:
| Tier | Rate Limit | Burst | Cost |
|---|---|---|---|
| Free | 100 req/hr | 10 | $0 |
| Pro | 5,000 req/hr | 100 | $49/mo |
| Enterprise | 100,000 req/hr | 2,000 | Custom |
Evaluate from most specific to least specific: per-endpoint > per-user > per-IP.
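The evaluation order can be sketched as follows. `ScopeCounter` and `LayeredLimiter` are hypothetical names, the limits are the example values from the table, and window expiry is deliberately left out so the focus stays on the ordering.

```python
from dataclasses import dataclass, field


@dataclass
class ScopeCounter:
    limit: int
    used: int = 0

    def has_room(self) -> bool:
        return self.used < self.limit


@dataclass
class LayeredLimiter:
    counters: dict = field(default_factory=dict)

    def _counter(self, scope_key: str, limit: int) -> ScopeCounter:
        return self.counters.setdefault(scope_key, ScopeCounter(limit))

    def check(self, ip: str, user_id: str, route: str):
        """Return (allowed, blocking_scope). Scopes are checked most specific first."""
        scopes = [
            ("per-endpoint", f"endpoint:{route}:{user_id}", 60),
            ("per-user", f"user:{user_id}", 1000),
            ("per-ip", f"ip:{ip}", 100),
        ]
        touched = [(name, self._counter(key, limit)) for name, key, limit in scopes]
        # Reject on the first exhausted scope without consuming quota anywhere.
        for name, counter in touched:
            if not counter.has_room():
                return False, name
        for _, counter in touched:
            counter.used += 1
        return True, None


limiter = LayeredLimiter()
search_results = [limiter.check("203.0.113.7", "u1", "/search") for _ in range(61)]
print(search_results[-1])  # (False, 'per-endpoint')
```

Checking every scope before incrementing any of them means a rejected request does not eat into the other quotas.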
Distributed Rate Limiting
Redis-based pattern for consistent limiting across instances:
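```python
import time


def redis_rate_limit(redis, key: str, limit: int, window: int) -> bool:
    pipe = redis.pipeline()
    now = time.time()
    # One counter per fixed window; the window index is part of the key.
    window_key = f"rl:{key}:{int(now // window)}"
    pipe.incr(window_key)
    pipe.expire(window_key, window * 2)
    results = pipe.execute()
    return results[0] <= limit
```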
Atomic Lua script (prevents race conditions):
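```lua
local key = KEYS[1]
local limit = tonumber(ARGV[1])
local window = tonumber(ARGV[2])
local current = redis.call("INCR", key)
-- Set the TTL only when the key is first created.
if current == 1 then
    redis.call("EXPIRE", key, window)
end
return current <= limit and 1 or 0
```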
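Never do a separate GET followed by SET: concurrent requests can read the same count in the gap, so increments are lost and more requests pass than the limit allows.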
API Gateway Configuration
NGINX:
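```nginx
http {
    limit_req_zone $binary_remote_addr zone=api:10m rate=10r/s;

    server {
        location /api/ {
            limit_req zone=api burst=20 nodelay;
            limit_req_status 429;
        }
    }
}
```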
Kong:
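```yaml
plugins:
  - name: rate-limiting
    config:
      minute: 60
      hour: 1000
      policy: redis
      redis_host: redis.internal
```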
Client-Side Handling
Clients must handle 429 gracefully:
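```typescript
async function fetchWithRetry(url: string, maxRetries = 3): Promise<Response> {
  for (let attempt = 0; attempt < maxRetries; attempt++) {
    const res = await fetch(url);
    if (res.status !== 429) return res;

    // Assumes Retry-After is given in seconds, not as an HTTP date.
    const retryAfter = res.headers.get("Retry-After");
    // Prefer the server's hint; otherwise back off exponentially, capped at 30 s.
    const delay = retryAfter
      ? parseInt(retryAfter, 10) * 1000
      : Math.min(1000 * 2 ** attempt, 30000);
    await new Promise((r) => setTimeout(r, delay));
  }
  throw new Error("Rate limit exceeded after retries");
}
```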
- Always respect Retry-After when present
- Use exponential backoff with jitter when absent
- Implement request queuing for batch operations
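The first two rules can be combined in a single delay calculation. This sketch uses the "full jitter" variant (a uniform draw between zero and the exponential ceiling), which is one common choice among several; `retry_delay` and its parameters are hypothetical, and `Retry-After` is assumed to be given in seconds.

```python
import random
from typing import Optional


def retry_delay(attempt: int, retry_after: Optional[str] = None,
                base: float = 1.0, cap: float = 30.0,
                rng: Optional[random.Random] = None) -> float:
    """Seconds to wait before retry number `attempt` (0-based)."""
    # A numeric Retry-After from the server always wins.
    if retry_after is not None and retry_after.strip().isdigit():
        return float(retry_after)
    rng = rng or random.Random()
    ceiling = min(cap, base * 2 ** attempt)
    # Full jitter: pick uniformly in [0, ceiling] so clients do not retry in lockstep.
    return rng.uniform(0, ceiling)


print(retry_delay(0, retry_after="30"))  # 30.0
```

Without jitter, clients throttled at the same moment retry at the same moment, which recreates the burst that triggered the limit.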
Monitoring
Track these metrics:
- Rate limit hit rate: % of requests returning 429 (alert if >5% sustained)
- Near-limit warnings: requests where remaining < 10% of limit
- Top offenders: keys/IPs hitting limits most frequently
- Limit headroom: how close normal traffic is to the ceiling
- False positives: legitimate users being rate limited
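Several of these metrics can be derived from one pass over request logs. The sketch below assumes a hypothetical log shape of `(key, status, remaining, limit)` tuples; `summarize` is an illustrative helper, and the 5% and 10% thresholds are the ones listed above.

```python
from collections import Counter


def summarize(events, alert_threshold=0.05, top_n=3):
    """events: iterable of (key, status, remaining, limit) tuples (hypothetical log shape)."""
    events = list(events)
    total = len(events)
    throttled = [e for e in events if e[1] == 429]
    # Near-limit: accepted requests with less than 10% of the quota left.
    near_limit = [e for e in events if e[1] != 429 and e[2] < 0.10 * e[3]]
    hit_rate = len(throttled) / total if total else 0.0
    return {
        "hit_rate": hit_rate,
        "alert": hit_rate > alert_threshold,
        "near_limit": len(near_limit),
        "top_offenders": Counter(e[0] for e in throttled).most_common(top_n),
    }


sample = [("a", 200, 500, 1000)] * 17 + [("b", 200, 50, 1000)] + [("a", 429, 0, 1000)] * 2
report = summarize(sample)
```

False positives cannot be computed from these fields alone; they need a separate signal for which clients are legitimate.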
Anti-Patterns
| Anti-Pattern | Fix |
|---|---|
| Application-only limiting | Always combine with infrastructure-level limits |
| No retry guidance | Always include the Retry-After header on 429 responses |
| Inconsistent limits | Same endpoint, same limits across services |
| No burst allowance | Allow controlled bursts for legitimate traffic |
| Silent dropping | Always return 429 so clients can distinguish from errors |
| Global single counter | Per-endpoint counters to protect expensive operations |
| Hard-coded limits | Use configuration, not code constants |
NEVER Do
- NEVER rate limit health check endpoints: monitoring systems will false-alarm
- NEVER use client-supplied identifiers as the sole rate limit key: they are trivially spoofed
- NEVER return 200 OK when rate limiting: clients must know they were throttled
- NEVER set limits without measuring actual traffic first: you'll block legitimate users or set limits too high to matter
- NEVER share counters across unrelated tenants: this creates a noisy neighbor problem
- NEVER skip rate limiting on internal APIs: misbehaving internal services can take down shared infrastructure
- NEVER implement rate limiting without logging: you need visibility to tune limits and detect abuse
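Rate Limiting Patterns
Algorithms
| Algorithm | Accuracy | Burst Handling | Best For |
|---|---|---|---|
| Token Bucket | High | Allows controlled bursts | API rate limiting, traffic shaping |
| Leaky Bucket | High | Smooths bursts entirely | Steady-rate processing, queues |
| Fixed Window | Low | Allows edge bursts (2x) | Simple use cases, prototyping |
| Sliding Window Log | Very High | Precise control | Strict compliance, billing-critical |
| Sliding Window Counter | High | Good approximation | Production APIs (best tradeoff) |
Fixed window problem: A user sends the full limit at 11:59 and again at 12:01, doubling the effective rate. Sliding window fixes this.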
Token Bucket
The bucket holds tokens up to a fixed capacity. Tokens refill at a fixed rate. Each request consumes one token.
```python
import time


class TokenBucket:
    def __init__(self, capacity: int, refill_rate: float):
        self.capacity = capacity
        self.tokens = float(capacity)  # start with a full bucket
        self.refill_rate = refill_rate  # tokens per second
        self.last_refill = time.monotonic()

    def allow(self) -> bool:
        now = time.monotonic()
        elapsed = now - self.last_refill
        # Add the tokens earned since the last call, capped at capacity.
        self.tokens = min(self.capacity, self.tokens + elapsed * self.refill_rate)
        self.last_refill = now
        if self.tokens >= 1:
            self.tokens -= 1
            return True
        return False
```
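Sliding Window Counter
Hybrid of fixed window and sliding window log: it weights the previous window's count by the overlap percentage:
```python
import time


def sliding_window_allow(key: str, limit: int, window_sec: int) -> bool:
    # get_count and increment_count are storage helpers that this document leaves undefined.
    now = time.time()
    current_window = int(now // window_sec)
    position_in_window = (now % window_sec) / window_sec
    prev_count = get_count(key, current_window - 1)
    curr_count = get_count(key, current_window)
    # Weight the previous window by the share of it still inside the sliding window.
    estimated = prev_count * (1 - position_in_window) + curr_count
    if estimated >= limit:
        return False
    increment_count(key, current_window)
    return True
```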
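Implementation Options
| Approach | Scope | Best For |
|---|---|---|
| In-memory | Single server | Zero latency, no dependencies |
| Redis (INCR + EXPIRE) | Distributed | Multi-instance deployments |
| API Gateway | Edge | No code, built-in dashboards |
| Middleware | Per-service | Fine-grained per-user/endpoint control |
Use gateway-level limiting as the outer defense and application-level limiting for fine-grained control.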
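HTTP Headers
Always return rate limit info, even on successful requests:
```
RateLimit-Limit: 1000
RateLimit-Remaining: 742
RateLimit-Reset: 1625097600
Retry-After: 30
```
| Header | When to Include |
|---|---|
| `RateLimit-Limit` | Every response |
| `RateLimit-Remaining` | Every response |
| `RateLimit-Reset` | Every response |
| `Retry-After` | 429 responses only |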
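429 Response Body
```json
{
  "error": {
    "code": "rate_limit_exceeded",
    "message": "Rate limit exceeded. Maximum 1000 requests per hour.",
    "retry_after": 30,
    "limit": 1000,
    "reset_at": "2025-07-01T12:00:00Z"
  }
}
```
Never return 500 or 503 for rate limiting; 429 is the correct status code.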
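Rate Limit Tiers
Apply limits at multiple granularities:
| Scope | Key | Example Limit | Purpose |
|---|---|---|---|
| Per-IP | Client IP | 100 req/min | Abuse prevention |
| Per-User | User ID | 1000 req/hr | Fair usage |
| Per-API-Key | API key | 5000 req/hr | Service-to-service |
| Per-Endpoint | Route + key | 60 req/min on `/search` | Protect expensive ops |
Tiered pricing:
| Tier | Rate Limit | Burst | Cost |
|---|---|---|---|
| Free | 100 req/hr | 10 | $0 |
| Pro | 5,000 req/hr | 100 | $49/mo |
| Enterprise | 100,000 req/hr | 2,000 | Custom |
Evaluate from most specific to least specific: per-endpoint > per-user > per-IP.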
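Distributed Rate Limiting
Redis-based pattern for consistent limiting across instances:
```python
import time


def redis_rate_limit(redis, key: str, limit: int, window: int) -> bool:
    pipe = redis.pipeline()
    now = time.time()
    # One counter per fixed window; the window index is part of the key.
    window_key = f"rl:{key}:{int(now // window)}"
    pipe.incr(window_key)
    pipe.expire(window_key, window * 2)
    results = pipe.execute()
    return results[0] <= limit
```
Atomic Lua script (prevents race conditions):
```lua
local key = KEYS[1]
local limit = tonumber(ARGV[1])
local window = tonumber(ARGV[2])
local current = redis.call("INCR", key)
-- Set the TTL only when the key is first created.
if current == 1 then
    redis.call("EXPIRE", key, window)
end
return current <= limit and 1 or 0
```
Never do a separate GET followed by SET: concurrent requests can read the same count in the gap, so increments are lost and more requests pass than the limit allows.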
API Gateway Configuration
NGINX:
```nginx
http {
    limit_req_zone $binary_remote_addr zone=api:10m rate=10r/s;

    server {
        location /api/ {
            limit_req zone=api burst=20 nodelay;
            limit_req_status 429;
        }
    }
}
```
Kong:
```yaml
plugins:
  - name: rate-limiting
    config:
      minute: 60
      hour: 1000
      policy: redis
      redis_host: redis.internal
```
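Client-Side Handling
Clients must handle 429 gracefully:
```typescript
async function fetchWithRetry(url: string, maxRetries = 3): Promise<Response> {
  for (let attempt = 0; attempt < maxRetries; attempt++) {
    const res = await fetch(url);
    if (res.status !== 429) return res;

    // Assumes Retry-After is given in seconds, not as an HTTP date.
    const retryAfter = res.headers.get("Retry-After");
    // Prefer the server's hint; otherwise back off exponentially, capped at 30 s.
    const delay = retryAfter
      ? parseInt(retryAfter, 10) * 1000
      : Math.min(1000 * 2 ** attempt, 30000);
    await new Promise((r) => setTimeout(r, delay));
  }
  throw new Error("Rate limit exceeded after retries");
}
```
- Always respect Retry-After when present
- Use exponential backoff with jitter when absent
- Implement request queuing for batch operations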
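Monitoring
Track these metrics:
- Rate limit hit rate: % of requests returning 429 (alert if >5% sustained)
- Near-limit warnings: requests where remaining < 10% of limit
- Top offenders: keys/IPs hitting limits most frequently
- Limit headroom: how close normal traffic is to the ceiling
- False positives: legitimate users being rate limited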
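Anti-Patterns
| Anti-Pattern | Fix |
|---|---|
| Application-only limiting | Always combine with infrastructure-level limits |
| No retry guidance | Always include the Retry-After header on 429 responses |
| Inconsistent limits | Same endpoint, same limits across services |
| No burst allowance | Allow controlled bursts for legitimate traffic |
| Silent dropping | Always return 429 so clients can distinguish from errors |
| Global single counter | Per-endpoint counters to protect expensive operations |
| Hard-coded limits | Use configuration, not code constants |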
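NEVER Do
- NEVER rate limit health check endpoints: monitoring systems will false-alarm
- NEVER use client-supplied identifiers as the sole rate limit key: they are trivially spoofed
- NEVER return 200 OK when rate limiting: clients must know they were throttled
- NEVER set limits without measuring actual traffic first: you'll block legitimate users or set limits too high to matter
- NEVER share counters across unrelated tenants: this creates a noisy neighbor problem
- NEVER skip rate limiting on internal APIs: misbehaving internal services can take down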