Rate Limiting (Deep Workflow)
Rate limits balance fairness, availability, and abuse prevention. Design explicitly: who is throttled, what resource is limited, and how clients should back off.
When to Offer This Workflow
Trigger conditions:
- Protecting public APIs, auth endpoints, or expensive operations
- Multi-tenant “noisy neighbor” isolation
- Retry storms after incidents that cause cascading 429/502 responses
Initial offer:
Use six stages: (1) threat & fairness model, (2) dimensions & keys, (3) algorithms & config, (4) distributed enforcement, (5) client protocol & UX, (6) observability & tuning. Confirm the enforcement layer (API gateway vs app middleware vs edge).
Stage 1: Threat & Fairness Model
Goal: Distinguish legitimate bursts (batch jobs, mobile retries) from abuse; align limits with product tiers and SLAs.
Exit condition: Written policy: free vs paid limits, partner caps, burst allowances.
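A written policy is easier to review and enforce when it also exists as data. The sketch below is one possible shape; the tier names follow the exit condition above, while `TierPolicy`, `policy_for`, and every number are placeholder assumptions rather than recommended values.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TierPolicy:
    requests_per_minute: int  # sustained rate the tier is entitled to
    burst: int                # extra requests tolerated in a short spike


# Placeholder numbers for illustration only; real values come from the
# written policy agreed with product and SLA owners.
POLICIES = {
    "free": TierPolicy(requests_per_minute=60, burst=10),
    "paid": TierPolicy(requests_per_minute=600, burst=100),
    "partner": TierPolicy(requests_per_minute=3000, burst=500),
}


def policy_for(tier: str) -> TierPolicy:
    """Return the policy for a tier; unknown tiers get the free-tier limits."""
    return POLICIES.get(tier, POLICIES["free"])
```

Falling back to the most restrictive tier for unknown input is a design choice made here for safety; a product may prefer to reject the request instead.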
Stage 2: Dimensions & Keys
Goal: Choose stable limit keys: authenticated user id > API key > IP (with shared-NAT caveats).
Practices
- Per-tenant and global limits; separate expensive routes (exports, search)
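The key precedence above (user id, then API key, then IP) and the separate treatment of expensive routes can be sketched as a single key-building function. The route paths, the `rl:` prefix, and the function name are hypothetical, chosen only for illustration.

```python
from typing import Optional

# Hypothetical route paths that get their own limit bucket.
EXPENSIVE_ROUTES = {"/exports", "/search"}


def limit_key(route: str, user_id: Optional[str],
              api_key: Optional[str], client_ip: str) -> str:
    """Build a limit key, preferring the most stable identity available."""
    if user_id:
        actor = f"user:{user_id}"
    elif api_key:
        actor = f"key:{api_key}"
    else:
        # Coarse fallback: many clients can share one IP behind a NAT.
        actor = f"ip:{client_ip}"
    # Expensive routes are counted separately from the default bucket.
    bucket = route if route in EXPENSIVE_ROUTES else "default"
    return f"rl:{bucket}:{actor}"
```

For example, `limit_key("/search", "u1", "k1", "203.0.113.7")` yields `rl:/search:user:u1`, while an anonymous call to any other route falls back to `rl:default:ip:203.0.113.7`.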
Stage 3: Algorithms & Config
Goal: Token bucket to allow bounded bursts, or leaky bucket to smooth traffic to a steady rate; sliding window for strict per-minute caps; consider concurrency limits separately from request rate.
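As a concrete reference for the token bucket option, here is a minimal single-process sketch. The class name, parameters, and injectable `clock` are assumptions made for the example; it is not thread-safe and not a distributed implementation.

```python
import time
from typing import Callable


class TokenBucket:
    """Allows bursts up to `capacity`, refilling at `rate` tokens per second."""

    def __init__(self, rate: float, capacity: float,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self.rate = rate
        self.capacity = capacity
        self.clock = clock            # injectable so tests can control time
        self.tokens = capacity        # start full: an initial burst is allowed
        self.updated = clock()

    def allow(self, cost: float = 1.0) -> bool:
        now = self.clock()
        elapsed = now - self.updated
        self.updated = now
        # Refill for the elapsed time, never exceeding capacity.
        self.tokens = min(self.capacity, self.tokens + elapsed * self.rate)
        if self.tokens >= cost:
            self.tokens -= cost
            return True
        return False
```

`capacity` sets the burst size and `rate` the sustained throughput, so the two can be tuned independently. The `cost` argument lets expensive operations consume more than one token.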
Stage 4: Distributed Enforcement
Goal: Central store (Redis, etc.) with atomic increments; handle multi-region (sticky routing vs shared counters); mind clock skew.
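The shared-counter approach can be illustrated with a fixed-window counter (a simpler technique than a sliding window) written against a small store interface. `CounterStore`, `InMemoryStore`, and `allow` are hypothetical names; the in-memory class only stands in for a shared store such as Redis, where the same two operations would map to an atomic increment and a key expiry.

```python
import time
from typing import Dict, Optional, Protocol


class CounterStore(Protocol):
    def incr(self, key: str) -> int: ...
    def expire(self, key: str, seconds: int) -> None: ...


class InMemoryStore:
    """Single-process stand-in for a shared store; for illustration only."""

    def __init__(self) -> None:
        self.counts: Dict[str, int] = {}

    def incr(self, key: str) -> int:
        self.counts[key] = self.counts.get(key, 0) + 1
        return self.counts[key]

    def expire(self, key: str, seconds: int) -> None:
        pass  # a real store would drop the key after `seconds`


def allow(store: CounterStore, key: str, limit: int,
          window_seconds: int, now: Optional[float] = None) -> bool:
    """Count the request in the current window and compare with the limit."""
    now = time.time() if now is None else now
    window = int(now // window_seconds)       # window index from wall-clock time
    window_key = f"{key}:{window}"
    count = store.incr(window_key)            # must be atomic in the real store
    if count == 1:
        # First hit in this window: schedule cleanup of the counter.
        store.expire(window_key, window_seconds * 2)
    return count <= limit
```

Two caveats apply to this sketch. The increment and the expiry are separate calls, so a real deployment would typically combine them atomically (for example in a server-side script) to avoid counters that never expire. The window index is also derived from each caller's clock, so clock skew between servers can place simultaneous requests in different windows.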
Stage 5: Client Protocol & UX
Goal: Consistent 429 responses with Retry-After; document exponential backoff + jitter; optional X-RateLimit-* headers for transparency.
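On the client side, exponential backoff with jitter that also respects Retry-After might look like the sketch below. It uses the "full jitter" variant (a uniform random delay up to an exponentially growing ceiling); the function name, `base`, and `cap` defaults are assumptions, and `retry_after` is assumed to be already parsed into seconds.

```python
import random
from typing import Optional


def backoff_delay(attempt: int, retry_after: Optional[float] = None,
                  base: float = 0.5, cap: float = 30.0,
                  rng: Optional[random.Random] = None) -> float:
    """Seconds to wait before retry number `attempt` (0-based)."""
    rng = rng or random.Random()
    # Exponential ceiling, bounded so delays do not grow without limit.
    ceiling = min(cap, base * (2 ** attempt))
    # Full jitter: spread clients uniformly across [0, ceiling].
    delay = rng.uniform(0, ceiling)
    if retry_after is not None:
        # Never retry sooner than the server asked.
        delay = max(delay, retry_after)
    return delay
```

Jitter matters because clients that were throttled at the same moment would otherwise retry at the same moment, recreating the spike that triggered the 429 responses.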
Stage 6: Observability & Tuning
Goal: Metrics on throttles by route and actor class; alerts on abnormal deny spikes (attack vs misconfigured client).
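A minimal in-process sketch of throttle metrics keyed by route and actor class follows. The class and method names are hypothetical, and a production setup would normally export these counts to a metrics system instead of holding them in memory.

```python
from collections import Counter
from typing import List, Tuple


class ThrottleMetrics:
    """Counts allowed and denied requests per (route, actor_class)."""

    def __init__(self) -> None:
        self.total: Counter = Counter()
        self.denied: Counter = Counter()

    def record(self, route: str, actor_class: str, allowed: bool) -> None:
        key = (route, actor_class)
        self.total[key] += 1
        if not allowed:
            self.denied[key] += 1

    def deny_ratio(self, route: str, actor_class: str) -> float:
        key = (route, actor_class)
        return self.denied[key] / self.total[key] if self.total[key] else 0.0

    def spikes(self, threshold: float,
               min_requests: int = 1) -> List[Tuple[str, str]]:
        """Keys whose deny ratio exceeds `threshold`, in sorted order."""
        return sorted(
            k for k in self.total
            if self.total[k] >= min_requests and self.deny_ratio(*k) > threshold
        )
```

Splitting by actor class helps with triage: a spike confined to one paid tenant suggests a misconfigured client, while a spike across anonymous traffic looks more like an attack.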
Final Review Checklist
- [ ] Policy matches tiers and fairness goals
- [ ] Limit keys stable and hard to spoof
- [ ] Algorithm matches burst vs sustained semantics
- [ ] Distributed correctness considered
- [ ] Client-facing 429 behavior documented
- [ ] Metrics and tuning loop defined
Tips for Effective Guidance
- Coordinate with authentication; anonymous IP limits are coarse.
- Don’t throttle health checks in ways that break monitors.
- GraphQL: consider query cost / depth limits, not only HTTP count.
- WebSockets: separate connection caps from message rate limits.
Handling Deviations
- Edge/CDN: limits may differ from origin; document both layers.