API Failover
Create or improve a lightweight failover layer for AI APIs.
Goals
Build systems that:
- - detect unavailable or degraded providers/models
- classify failures before retrying blindly
- switch to a safe fallback chain
- avoid hammering broken endpoints
- recover back to preferred providers after cooldown
Workflow
- 1. Identify the call path.
- Classify failure modes.
- Define a fallback policy.
- Add health memory.
- Implement guarded retries.
- Emit observable logs.
- Validate with forced-failure tests.
Use the detailed rules below and the bundled scripts instead of re-inventing routing logic each time.
Practical defaults
Error classes
Use these normalized categories:
- - INLINECODE0
- INLINECODE1
- INLINECODE2
- INLINECODE3
- INLINECODE4
- INLINECODE5
- INLINECODE6
- INLINECODE7
- INLINECODE8
Suggested routing behavior
- -
AUTH_ERROR, BAD_REQUEST: fail fast; do not retry other providers unless config explicitly maps to another credential set. - INLINECODE11 : short backoff, then fallback.
- INLINECODE12 ,
SERVER_ERROR, NETWORK_ERROR, MODEL_UNAVAILABLE, UNKNOWN_TRANSIENT: retry briefly, then fallback. - INLINECODE17 : mark provider unavailable for a longer cooldown and fallback immediately.
Circuit breaker defaults
Start with:
- - open after
3 consecutive transient failures - cooldown INLINECODE19
- half-open with
1 probe - close after
1-2 successful probes
Configuration pattern
Keep policy in config, not hard-coded logic.
Recommended shape:
- - provider registry
- task profiles with ordered fallback chains
- retry policy
- circuit-breaker policy
- per-provider overrides
Design guidance
- - Prefer fewer, well-understood providers over large fallback chains.
- Keep the fallback chain semantically compatible when possible.
- Separate "best quality" from "must return something" behavior.
- Keep downgrade rules explicit; avoid silent huge capability drops for critical tasks.
- For tool-using agents, treat provider switching as a reliability event and report it when user-visible quality may change.
Semi-automatic deployment model
Use this skill to discover the environment, generate a production-ish config, run a local HTTP failover proxy, and verify health.
Do not claim full autonomous takeover unless the environment-specific integration is actually completed.
References
Read these only when needed:
- -
references/config-example.yaml for a compact policy example - INLINECODE23 for a more practical multi-provider template
- INLINECODE24 for a ready-to-edit production template
- INLINECODE25 for failure-injection and validation cases
- INLINECODE26 for local proxy deployment and environment-variable setup
- INLINECODE27 for a user-systemd service example
Bundled scripts
scripts/discover_env.py
Inspect the current environment.
scripts/generate_config.py
Generate a production-ish YAML config from simple defaults.
scripts/failover_proxy.py
Run a minimal CLI failover call path.
scripts/http_proxy.py
Expose a single local OpenAI-compatible entrypoint.
Endpoints:
- - INLINECODE32
- INLINECODE33
Optional request header:
scripts/selfcheck.py
Validate that the local proxy is reachable and can process a minimal chat request.
scripts/bootstrap_failover.py
Run the semi-automatic bootstrap flow:
- - discover environment
- generate config
- optionally start the proxy
- run self-check
- print next actions
Example:
CODEBLOCK0
Keep these scripts small and inspectable. Extend them instead of turning SKILL.md into code-heavy instructions.
API 故障转移
为AI API创建或改进轻量级故障转移层。
目标
构建能够实现以下功能的系统:
- - 检测不可用或性能降级的提供商/模型
- 在盲目重试前对故障进行分类
- 切换到安全的回退链
- 避免反复请求故障端点
- 冷却期后恢复至首选提供商
工作流程
- 1. 识别调用路径。
- 对故障模式进行分类。
- 定义回退策略。
- 添加健康状态记忆。
- 实现有防护的重试机制。
- 输出可观测日志。
- 通过强制故障测试进行验证。
使用以下详细规则和捆绑脚本,无需每次重新发明路由逻辑。
实用默认配置
错误分类
使用以下标准化类别:
- - AUTHERROR(认证错误)
- BADREQUEST(错误请求)
- RATELIMIT(速率限制)
- TIMEOUT(超时)
- SERVERERROR(服务器错误)
- NETWORKERROR(网络错误)
- MODELUNAVAILABLE(模型不可用)
- QUOTAEXCEEDED(配额超限)
- UNKNOWNTRANSIENT(未知临时错误)
建议路由行为
- - AUTHERROR、BADREQUEST:快速失败;除非配置明确映射到其他凭证集,否则不重试其他提供商。
- RATELIMIT:短暂退避,然后回退。
- TIMEOUT、SERVERERROR、NETWORKERROR、MODELUNAVAILABLE、UNKNOWNTRANSIENT:短暂重试,然后回退。
- QUOTAEXCEEDED:将提供商标记为长时间不可用,并立即回退。
断路器默认配置
初始设置:
- - 连续3次临时故障后开启
- 冷却时间60-180秒
- 半开状态使用1次探测
- 1-2次成功探测后关闭
配置模式
将策略保留在配置中,而非硬编码逻辑。
推荐结构:
- - 提供商注册表
- 带有有序回退链的任务配置文件
- 重试策略
- 断路器策略
- 按提供商的覆盖设置
设计指导
- - 优先选择少数且充分了解的提供商,而非冗长的回退链。
- 尽可能保持回退链的语义兼容性。
- 区分最佳质量与必须返回结果的行为。
- 明确降级规则;避免关键任务中无声的巨大能力下降。
- 对于使用工具的代理,将提供商切换视为可靠性事件,并在用户可见质量可能发生变化时进行报告。
半自动部署模型
使用此技能发现环境、生成类似生产的配置、运行本地HTTP故障转移代理,并验证健康状态。
除非实际完成特定环境的集成,否则不要声称完全自主接管。
参考资料
仅在需要时阅读以下内容:
- - references/config-example.yaml:简洁策略示例
- references/config-realworld-example.yaml:更实用的多提供商模板
- references/config-production.yaml:可直接编辑的生产模板
- references/test-scenarios.md:故障注入和验证案例
- references/realworld-notes.md:本地代理部署和环境变量设置
- references/api-failover.service:用户systemd服务示例
捆绑脚本
scripts/discover_env.py
检查当前环境。
scripts/generate_config.py
从简单默认值生成类似生产的YAML配置。
scripts/failover_proxy.py
运行最小化CLI故障转移调用路径。
scripts/http_proxy.py
暴露单个本地OpenAI兼容入口点。
端点:
- - POST /v1/chat/completions
- GET /health
可选请求头:
- - X-Failover-Profile: cheap|default|critical|local-first
scripts/selfcheck.py
验证本地代理是否可达并能处理最小聊天请求。
scripts/bootstrap_failover.py
运行半自动引导流程:
- - 发现环境
- 生成配置
- 可选启动代理
- 运行自检
- 打印后续操作
示例:
bash
python3 scripts/bootstrap_failover.py \
--default-model custom-ai-td-ee/gpt-5.4 \
--start-proxy
保持这些脚本小巧且可审查。扩展它们,而不是将SKILL.md变成代码繁重的说明。