API Failover

Create or improve a lightweight failover layer for AI APIs.

Goals

Build systems that:

- detect unavailable or degraded providers/models
classify failures before retrying blindly
switch to a safe fallback chain
avoid hammering broken endpoints
recover back to preferred providers after cooldown

Workflow

1. Identify the call path.
Classify failure modes.
Define a fallback policy.
Add health memory.
Implement guarded retries.
Emit observable logs.
Validate with forced-failure tests.

Use the detailed rules below and the bundled scripts instead of re-inventing routing logic each time.

Practical defaults

Error classes

Use these normalized categories:

- INLINECODE0
INLINECODE1
INLINECODE2
INLINECODE3
INLINECODE4
INLINECODE5
INLINECODE6
INLINECODE7
INLINECODE8

Suggested routing behavior

- AUTH_ERROR, BAD_REQUEST: fail fast; do not retry other providers unless config explicitly maps to another credential set.
INLINECODE11: short backoff, then fallback.
INLINECODE12, SERVER_ERROR, NETWORK_ERROR, MODEL_UNAVAILABLE, UNKNOWN_TRANSIENT: retry briefly, then fallback.
INLINECODE17: mark provider unavailable for a longer cooldown and fallback immediately.

Circuit breaker defaults

Start with:

- open after 3 consecutive transient failures
cooldown INLINECODE19
half-open with 1 probe
close after 1-2 successful probes

Configuration pattern

Keep policy in config, not hard-coded logic.

Recommended shape:

- provider registry
task profiles with ordered fallback chains
retry policy
circuit-breaker policy
per-provider overrides

Design guidance

- Prefer fewer, well-understood providers over large fallback chains.
Keep the fallback chain semantically compatible when possible.
Separate "best quality" from "must return something" behavior.
Keep downgrade rules explicit; avoid silent huge capability drops for critical tasks.
For tool-using agents, treat provider switching as a reliability event and report it when user-visible quality may change.

Semi-automatic deployment model

Use this skill to discover the environment, generate a production-ish config, run a local HTTP failover proxy, and verify health.

Do not claim full autonomous takeover unless the environment-specific integration is actually completed.

References

Read these only when needed:

- references/config-example.yaml for a compact policy example
INLINECODE23 for a more practical multi-provider template
INLINECODE24 for a ready-to-edit production template
INLINECODE25 for failure-injection and validation cases
INLINECODE26 for local proxy deployment and environment-variable setup
INLINECODE27 for a user-systemd service example

Bundled scripts

`scripts/discover_env.py`

Inspect the current environment.

`scripts/generate_config.py`

Generate a production-ish YAML config from simple defaults.

`scripts/failover_proxy.py`

Run a minimal CLI failover call path.

`scripts/http_proxy.py`

Expose a single local OpenAI-compatible entrypoint.

Endpoints:

- INLINECODE32
INLINECODE33

Optional request header:

- INLINECODE34

`scripts/selfcheck.py`

Validate that the local proxy is reachable and can process a minimal chat request.

`scripts/bootstrap_failover.py`

Run the semi-automatic bootstrap flow:

- discover environment
generate config
optionally start the proxy
run self-check
print next actions

Example:

CODEBLOCK0

Keep these scripts small and inspectable. Extend them instead of turning SKILL.md into code-heavy instructions.

API 故障转移

为AI API创建或改进轻量级故障转移层。

目标

构建能够实现以下功能的系统：

- 检测不可用或性能降级的提供商/模型
在盲目重试前对故障进行分类
切换到安全的回退链
避免反复请求故障端点
冷却期后恢复至首选提供商

工作流程

1. 识别调用路径。
对故障模式进行分类。
定义回退策略。
添加健康状态记忆。
实现有防护的重试机制。
输出可观测日志。
通过强制故障测试进行验证。

使用以下详细规则和捆绑脚本，无需每次重新发明路由逻辑。

实用默认配置

错误分类

使用以下标准化类别：

- AUTHERROR（认证错误）
BADREQUEST（错误请求）
RATELIMIT（速率限制）
TIMEOUT（超时）
SERVERERROR（服务器错误）
NETWORKERROR（网络错误）
MODELUNAVAILABLE（模型不可用）
QUOTAEXCEEDED（配额超限）
UNKNOWNTRANSIENT（未知临时错误）

建议路由行为

- AUTHERROR、BADREQUEST：快速失败；除非配置明确映射到其他凭证集，否则不重试其他提供商。
RATELIMIT：短暂退避，然后回退。
TIMEOUT、SERVERERROR、NETWORKERROR、MODELUNAVAILABLE、UNKNOWNTRANSIENT：短暂重试，然后回退。
QUOTAEXCEEDED：将提供商标记为长时间不可用，并立即回退。

断路器默认配置

初始设置：

- 连续3次临时故障后开启
冷却时间60-180秒
半开状态使用1次探测
1-2次成功探测后关闭

配置模式

将策略保留在配置中，而非硬编码逻辑。

推荐结构：

- 提供商注册表
带有有序回退链的任务配置文件
重试策略
断路器策略
按提供商的覆盖设置

设计指导

- 优先选择少数且充分了解的提供商，而非冗长的回退链。
尽可能保持回退链的语义兼容性。
区分最佳质量与必须返回结果的行为。
明确降级规则；避免关键任务中无声的巨大能力下降。
对于使用工具的代理，将提供商切换视为可靠性事件，并在用户可见质量可能发生变化时进行报告。

半自动部署模型

使用此技能发现环境、生成类似生产的配置、运行本地HTTP故障转移代理，并验证健康状态。

除非实际完成特定环境的集成，否则不要声称完全自主接管。

参考资料

仅在需要时阅读以下内容：

- references/config-example.yaml：简洁策略示例
references/config-realworld-example.yaml：更实用的多提供商模板
references/config-production.yaml：可直接编辑的生产模板
references/test-scenarios.md：故障注入和验证案例
references/realworld-notes.md：本地代理部署和环境变量设置
references/api-failover.service：用户systemd服务示例

捆绑脚本

scripts/discover_env.py

检查当前环境。

scripts/generate_config.py

从简单默认值生成类似生产的YAML配置。

scripts/failover_proxy.py

运行最小化CLI故障转移调用路径。

scripts/http_proxy.py

暴露单个本地OpenAI兼容入口点。

端点：

- POST /v1/chat/completions
GET /health

可选请求头：

- X-Failover-Profile: cheap|default|critical|local-first

scripts/selfcheck.py

验证本地代理是否可达并能处理最小聊天请求。

scripts/bootstrap_failover.py

运行半自动引导流程：

- 发现环境
生成配置
可选启动代理
运行自检
打印后续操作

示例：

bash
python3 scripts/bootstrap_failover.py \
--default-model custom-ai-td-ee/gpt-5.4 \
--start-proxy

保持这些脚本小巧且可审查。扩展它们，而不是将SKILL.md变成代码繁重的说明。

api-failoverAPI故障转移

api-failover