paperclip-resilience
Production-grade resilience for AI agents running on Paperclip, orchestrated through OpenClaw.
The Problem
Paperclip agents die silently when providers hit rate limits, sessions crash on gateway restarts, and failed runs leave agents stuck in error state with no recovery path. If you're running agents overnight or in parallel, you need automated recovery — not manual babysitting.
What's Included
| Module | File | Purpose |
|---|---|---|
| Spawn with Fallback | src/spawn-with-fallback.js | Wraps openclaw session spawn with automatic provider failover. If your primary model 429s, it tries the configured fallback. |
| Model Rotation | src/model-rotation.js | Tracks fix attempts per PR/task and rotates through models + thinking levels after repeated failures. |
| Run Recovery | src/run-recovery.js | Detects failed Paperclip heartbeat runs (gateway errors, timeouts, 429s) and re-invokes agents with model fallback. |
| Blocker Routing | src/blocker-routing.js | Scans agent session transcripts for blocked/stuck signals and routes them to configurable destinations (file, stdout, webhook). |
| Task Injection | src/task-injection.js | Enriches spawn task descriptions with issue tracking metadata, PR requirements, and UX design checklists before agent execution. |
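As a rough illustration of the Blocker Routing row above, the sketch below scans transcript lines for blocked/stuck signals and hands the matches to a destination. The signal patterns, the `findBlockers` and `routeBlockers` names, and the destination shape are assumptions made for illustration; they are not taken from src/blocker-routing.js.

```javascript
// Hypothetical signal patterns; the real module's list may differ.
const BLOCKER_SIGNALS = [/\bblocked\b/i, /\bstuck\b/i, /\bcannot proceed\b/i];

// Return every transcript line that matches at least one signal.
function findBlockers(transcript) {
  return transcript
    .split('\n')
    .map((text, index) => ({ line: index + 1, text: text.trim() }))
    .filter(({ text }) => BLOCKER_SIGNALS.some((re) => re.test(text)));
}

// Route matches to a destination. Only stdout and an in-memory collector
// are sketched here; file and webhook routing are left out.
function routeBlockers(blockers, destination) {
  for (const blocker of blockers) {
    const message = `[blocker] line ${blocker.line}: ${blocker.text}`;
    if (destination.type === 'stdout') console.log(message);
    else if (destination.type === 'memory') destination.sink.push(message);
  }
  return blockers.length;
}

const transcript = 'Cloning repo\nI am blocked: missing API token\nDone';
const sink = [];
const routedCount = routeBlockers(findBlockers(transcript), { type: 'memory', sink });
```

Keeping detection separate from routing means a new destination can be added without touching the scanning logic.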
Quick Start
1. Install
```bash
clawhub install paperclip-resilience
```
2. Configure
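```bash
cd skills/paperclip-resilience
cp config.example.json config.json
# Edit config.json with your model aliases and fallback pairs
```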
3. Use Spawn with Fallback
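```bash
# CLI
node skills/paperclip-resilience/src/spawn-with-fallback.js \
  --model sonnet --task "Fix the login bug" --mode run

# Dry run to see what would happen
node skills/paperclip-resilience/src/spawn-with-fallback.js \
  --model opus --task "Refactor auth" --dry-run
```

```javascript
// Programmatic (call from inside an async function)
const { spawnWithFallback, loadConfig } = require('./skills/paperclip-resilience/src/spawn-with-fallback');
const config = loadConfig('./my-config.json');
const result = await spawnWithFallback({ model: 'sonnet', task: 'Fix the bug', config });
```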
4. Set Up Run Recovery (Cron)
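Run recovery is meant to run from your OpenClaw cron schedule so that failed runs are recovered automatically. Start with a dry run to see what it would do:
```bash
node skills/paperclip-resilience/src/run-recovery.js --dry-run --verbose
```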
Once verified, schedule it:
```
*/15 * * * * node skills/paperclip-resilience/src/run-recovery.js
```
5. Model Rotation for PR Fixes
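```bash
# Check whether a PR needs model rotation
node skills/paperclip-resilience/src/model-rotation.js check --pr 42 --repo owner/repo

# Record an attempt
node skills/paperclip-resilience/src/model-rotation.js record --pr 42 --repo owner/repo --model anthropic/claude-sonnet-4-6
```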
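To make the rotation idea concrete, here is a minimal in-memory sketch of tracking attempts per PR and stepping through a ladder of model and thinking-level pairs. The ladder contents, the thinking-level names, the `ATTEMPTS_PER_STEP` threshold, and both function names are assumptions for illustration; the actual module may store state and choose steps differently.

```javascript
// Hypothetical rotation ladder: each step pairs a model with a thinking level.
const LADDER = [
  { model: 'anthropic/claude-sonnet-4-6', thinking: 'low' },
  { model: 'anthropic/claude-sonnet-4-6', thinking: 'high' },
  { model: 'openai-codex/gpt-5.3-codex', thinking: 'high' },
];
const ATTEMPTS_PER_STEP = 2; // assumed number of failures before rotating

const attempts = new Map(); // key: "owner/repo#pr" -> attempt count

// Record one fix attempt and return the new count.
function recordAttempt(repo, pr) {
  const key = `${repo}#${pr}`;
  attempts.set(key, (attempts.get(key) || 0) + 1);
  return attempts.get(key);
}

// Pick the ladder step for the next attempt; stays on the last step once exhausted.
function pickStep(repo, pr) {
  const count = attempts.get(`${repo}#${pr}`) || 0;
  const index = Math.min(Math.floor(count / ATTEMPTS_PER_STEP), LADDER.length - 1);
  return LADDER[index];
}

recordAttempt('owner/repo', 42);
recordAttempt('owner/repo', 42);
const next = pickStep('owner/repo', 42); // second step after two recorded attempts
```

Clamping to the last step keeps a long-failing PR on the strongest configuration instead of wrapping back to the weakest one.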
Configuration
All modules read from config.json in the skill directory, with sensible defaults if no config is provided.
See config.example.json for the full documented schema, and config.schema.json for validation.
Key Configuration Sections
aliases — Map short model names to full provider/model strings:
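```json
{
  "aliases": {
    "sonnet": "anthropic/claude-sonnet-4-6",
    "opus": "anthropic/claude-opus-4-6",
    "codex": "openai-codex/gpt-5.3-codex"
  }
}
```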
fallbacks — Define provider failover pairs:
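```json
{
  "fallbacks": {
    "anthropic/claude-sonnet-4-6": "openai-codex/gpt-5.3-codex",
    "openai-codex/gpt-5.3-codex": "anthropic/claude-sonnet-4-6"
  }
}
```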
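One way to picture how the aliases and fallbacks sections could work together: expand the alias first, then look up the fallback for the resolved model. The `resolveModel` and `modelChain` helpers below are hypothetical names used only for this sketch, and the inline `config` object stands in for a loaded config.json.

```javascript
// Stand-in for a loaded config.json (values mirror the sections above).
const config = {
  aliases: {
    sonnet: 'anthropic/claude-sonnet-4-6',
    codex: 'openai-codex/gpt-5.3-codex',
  },
  fallbacks: {
    'anthropic/claude-sonnet-4-6': 'openai-codex/gpt-5.3-codex',
    'openai-codex/gpt-5.3-codex': 'anthropic/claude-sonnet-4-6',
  },
};

// Expand a short alias; unknown names pass through unchanged.
function resolveModel(name, cfg) {
  return Object.hasOwn(cfg.aliases, name) ? cfg.aliases[name] : name;
}

// Build the ordered list of models to try: primary first, then its fallback.
function modelChain(name, cfg) {
  const primary = resolveModel(name, cfg);
  const fallback = Object.hasOwn(cfg.fallbacks, primary) ? cfg.fallbacks[primary] : undefined;
  return fallback && fallback !== primary ? [primary, fallback] : [primary];
}

const chain = modelChain('sonnet', config);
```

A model with no fallback entry yields a one-element chain, so the caller simply has nothing else to try.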
failurePatterns — Regex patterns that trigger fallback:
```json
{
  "failurePatterns": {
    "patterns": ["credits", "quota", "402", "rate[\\s_-]?limit"]
  }
}
```
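The following sketch shows one plausible way such patterns could be compiled and applied to provider error output, including capping the pattern list and dropping invalid regexes. The cap values, the case-insensitive flag, and the `compileFailurePatterns` and `shouldFallback` names are assumptions, not the skill's actual limits or API.

```javascript
const MAX_PATTERNS = 20;        // assumed cap, not the skill's actual limit
const MAX_PATTERN_LENGTH = 200; // assumed cap

// Compile pattern strings, skipping anything oversized or syntactically invalid.
function compileFailurePatterns(patterns) {
  const compiled = [];
  for (const source of patterns.slice(0, MAX_PATTERNS)) {
    if (typeof source !== 'string') continue;
    if (source.length === 0 || source.length > MAX_PATTERN_LENGTH) continue;
    try {
      compiled.push(new RegExp(source, 'i')); // case-insensitive is an assumption
    } catch {
      // Invalid regex: drop it rather than fail the whole config.
    }
  }
  return compiled;
}

// True when any compiled pattern matches the provider's error output.
function shouldFallback(output, compiled) {
  return compiled.some((re) => re.test(output));
}

const compiled = compileFailurePatterns([
  'credits', 'quota', '402', 'rate[\\s_-]?limit', '(unclosed',
]);
const triggered = shouldFallback('HTTP 429: Rate limit exceeded', compiled);
```

The malformed `(unclosed` entry is silently dropped, so one bad pattern in config.json does not disable fallback for the valid ones.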
Architecture
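```
┌──────────────────┐     ┌─────────────────────┐
│  Task Injection  │────▶│ Spawn with Fallback │
│  (enrich task)   │     │  (provider retry)   │
└──────────────────┘     └──────────┬──────────┘
                                    │
                                    ▼
                         ┌─────────────────────┐
                         │   Paperclip Agent   │
                         │   (heartbeat run)   │
                         └──────────┬──────────┘
                                    │
                        ┌───────────┴───────────┐
                        │                       │
                        ▼                       ▼
              ┌───────────────────┐   ┌───────────────────┐
              │   Run Recovery    │   │  Blocker Routing  │
              │  (detect + wake)  │   │ (escalate stuck)  │
              └─────────┬─────────┘   └───────────────────┘
                        │
                        ▼
              ┌───────────────────┐
              │  Model Rotation   │
              │ (escalate model)  │
              └───────────────────┘
```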
Requirements
- OpenClaw (for session spawning and agent management)
- Paperclip (for heartbeat run monitoring and agent lifecycle)
- Node.js 18+
- At least two LLM provider API keys configured (for fallback to work)
Security
This skill was security-reviewed for ClawHub publication in SUP-453. The code paths that accept user-controlled input now enforce validation up front and fail closed.
Hardened Surfaces
| Surface | Protection |
|---|---|
| Model names | Character allowlist with support for provider suffixes like :free; rejects empty path segments and . / .. traversal segments |
| Task files (@file) | Blocks explicit ../, canonicalizes symlinks with realpath, rejects system paths like /etc/ and /usr/, requires a regular file |
| Task payloads | 1MB max size limit for inline and file-backed task content |
| Spawn mode + labels | Allowlist validation for mode (run, session) and safe-character validation for labels |
| Failure regex config | Caps pattern count/length and drops invalid regexes to reduce ReDoS risk |
| Paperclip issue metadata | Sanitizes API strings, constrains issue identifier extraction, normalizes priority values |
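The task-file checks in the table can be sketched roughly as follows. This is an illustrative ordering of the checks under stated assumptions: `readTaskFile` is a hypothetical name, 1MB is taken as 1024 * 1024 bytes, and the system-path list contains only the two examples named above.

```javascript
const fs = require('node:fs');
const os = require('node:os');
const path = require('node:path');

const MAX_TASK_BYTES = 1024 * 1024;          // 1MB, assuming binary megabytes
const SYSTEM_PREFIXES = ['/etc/', '/usr/'];  // only the examples named in the table

function readTaskFile(requested) {
  // Reject explicit parent-directory segments before touching the filesystem.
  if (requested.split(/[\\/]/).includes('..')) throw new Error('path traversal rejected');
  // Resolve symlinks so the remaining checks apply to the real target.
  const real = fs.realpathSync(requested);
  if (SYSTEM_PREFIXES.some((prefix) => real.startsWith(prefix))) {
    throw new Error('system path rejected');
  }
  const stat = fs.statSync(real);
  if (!stat.isFile()) throw new Error('not a regular file');
  if (stat.size > MAX_TASK_BYTES) throw new Error('task file exceeds 1MB');
  return fs.readFileSync(real, 'utf8');
}

// Usage: write a small task file to a temp directory and read it back.
const dir = fs.mkdtempSync(path.join(os.tmpdir(), 'task-'));
const taskPath = path.join(dir, 'task.md');
fs.writeFileSync(taskPath, 'Fix the login bug');
const task = readTaskFile(taskPath);
```

Running the system-path check after `realpathSync` matters: a symlink in a harmless directory that points into /etc/ is rejected on its resolved target, not its apparent location.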
Security Boundaries
- Process execution: uses execFile, not shell execution
- Dynamic code execution: none (eval / Function not used)
- Credentials: read from environment or external auth files; not embedded in the skill
- File access: limited to explicitly requested files, with traversal and symlink tunnel protections
- Dependencies: zero external runtime dependencies in this package
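To show why execFile-style invocation matters for the process execution boundary, the sketch below passes a task string full of shell metacharacters as a plain argument. It uses `execFileSync` (the synchronous variant) for brevity, and `process.execPath` stands in for the real CLI; both choices are assumptions for the demonstration, not how the skill invokes OpenClaw.

```javascript
const { execFileSync } = require('node:child_process');

// A task string containing shell metacharacters.
const task = 'fix bug; echo $(whoami) && rm -rf /tmp/x';

// Arguments are passed as an array straight to the program, so no shell
// ever parses the string. The child simply echoes its first argument back.
const echoed = execFileSync(
  process.execPath,
  ['-e', 'process.stdout.write(process.argv[1])', task],
  { encoding: 'utf8' }
);
```

The child receives the string byte for byte, so `;`, `$(...)`, and `&&` are never interpreted as commands.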
Verification
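```bash
# Functional coverage
node skills/paperclip-resilience/tests/test-spawn-with-fallback.js

# Full security suite
node skills/paperclip-resilience/tests/test-security.js

# Quick smoke test
node skills/paperclip-resilience/tests/test-security-quick.js
```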
Audit Record
- Last audit: 2026-03-27
- Tracking issue: SUP-453
- Status: ✅ Approved for ClawHub publication
- Details: see SECURITY-AUDIT-REPORT.md
Related Paperclip Issues
These are the upstream gaps this skill works around:
- #276: Auto-requeue agent on failure
- #1845: No crash-recovery wakeup after restart
- #1861: Agent death on 429 with no model fallback
License
MIT
paperclip-resilience
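Production-grade resilience for AI agents running on Paperclip, orchestrated through OpenClaw.
The Problem
Paperclip agents die silently when providers hit rate limits, sessions crash on gateway restarts, and failed runs leave agents stuck in error state with no recovery path. If you're running agents overnight or in parallel, you need automated recovery, not manual babysitting.
What's Included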
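| Module | File | Purpose |
|---|---|---|
| Spawn with Fallback | src/spawn-with-fallback.js | Wraps openclaw session spawn with automatic provider failover. If the primary model returns a 429, it tries the configured fallback. |
| Model Rotation | src/model-rotation.js | Tracks fix attempts per PR/task and rotates through models and thinking levels after repeated failures. |
| Run Recovery | src/run-recovery.js | Detects failed Paperclip heartbeat runs (gateway errors, timeouts, 429s) and re-invokes agents with model fallback. |
| Blocker Routing | src/blocker-routing.js | Scans agent session transcripts for blocked/stuck signals and routes them to configurable destinations (file, stdout, webhook). |
| Task Injection | src/task-injection.js | Enriches spawn task descriptions with issue tracking metadata, PR requirements, and UX design checklists before agent execution. |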
Quick Start
1. Install
```bash
clawhub install paperclip-resilience
```
2. Configure
```bash
cd skills/paperclip-resilience
cp config.example.json config.json
# Edit config.json with your model aliases and fallback pairs
```
3. Use Spawn with Fallback
```bash
# CLI
node skills/paperclip-resilience/src/spawn-with-fallback.js \
  --model sonnet --task "Fix the login bug" --mode run

# Dry run to see what would happen
node skills/paperclip-resilience/src/spawn-with-fallback.js \
  --model opus --task "Refactor auth" --dry-run
```
```javascript
// Programmatic (call from inside an async function)
const { spawnWithFallback, loadConfig } = require('./skills/paperclip-resilience/src/spawn-with-fallback');
const config = loadConfig('./my-config.json');
const result = await spawnWithFallback({ model: 'sonnet', task: 'Fix the bug', config });
```
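4. Set Up Run Recovery (Cron)
Run recovery is meant to run from your OpenClaw cron schedule so that failed runs are recovered automatically. Start with a dry run:
```bash
node skills/paperclip-resilience/src/run-recovery.js --dry-run --verbose
```
Once verified, schedule it:
```
*/15 * * * * node skills/paperclip-resilience/src/run-recovery.js
```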
5. Model Rotation for PR Fixes
```bash
# Check whether a PR needs model rotation
node skills/paperclip-resilience/src/model-rotation.js check --pr 42 --repo owner/repo

# Record an attempt
node skills/paperclip-resilience/src/model-rotation.js record --pr 42 --repo owner/repo --model anthropic/claude-sonnet-4-6
```
Configuration
All modules read from config.json in the skill directory, with sensible defaults if no config is provided.
See config.example.json for the full documented schema, and config.schema.json for validation.
Key Configuration Sections
aliases: Map short model names to full provider/model strings:
```json
{
  "aliases": {
    "sonnet": "anthropic/claude-sonnet-4-6",
    "opus": "anthropic/claude-opus-4-6",
    "codex": "openai-codex/gpt-5.3-codex"
  }
}
```
fallbacks: Define provider failover pairs:
```json
{
  "fallbacks": {
    "anthropic/claude-sonnet-4-6": "openai-codex/gpt-5.3-codex",
    "openai-codex/gpt-5.3-codex": "anthropic/claude-sonnet-4-6"
  }
}
```
failurePatterns: Regex patterns that trigger fallback:
```json
{
  "failurePatterns": {
    "patterns": ["credits", "quota", "402", "rate[\\s_-]?limit"]
  }
}
```
Architecture
```
┌──────────────────┐     ┌─────────────────────┐
│  Task Injection  │────▶│ Spawn with Fallback │
│  (enrich task)   │     │  (provider retry)   │
└──────────────────┘     └──────────┬──────────┘
                                    │
                                    ▼
                         ┌─────────────────────┐
                         │   Paperclip Agent   │
                         │   (heartbeat run)   │
                         └──────────┬──────────┘
                                    │
                        ┌───────────┴───────────┐
                        │                       │
                        ▼                       ▼
              ┌───────────────────┐   ┌───────────────────┐
              │   Run Recovery    │   │  Blocker Routing  │
              │  (detect + wake)  │   │ (escalate stuck)  │
              └─────────┬─────────┘   └───────────────────┘
                        │
                        ▼
              ┌───────────────────┐
              │  Model Rotation   │
              │ (escalate model)  │
              └───────────────────┘
```
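Requirements
- OpenClaw (for session spawning and agent management)
- Paperclip (for heartbeat run monitoring and agent lifecycle)
- Node.js 18+
- At least two LLM provider API keys configured (for fallback to work)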
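Security
This skill was security-reviewed for ClawHub publication in SUP-453. The code paths that accept user-controlled input now enforce validation up front and fail closed.
Hardened Surfaces
| Surface | Protection |
|---|---|
| Model names | Character allowlist with support for provider suffixes like :free; rejects empty path segments and . / .. traversal segments |
| Task files (@file) | Blocks explicit ../, canonicalizes symlinks with realpath, rejects system paths like /etc/ and /usr/, requires a regular file |
| Task payloads | 1MB max size limit for inline and file-backed task content |
| Spawn mode + labels | Allowlist validation for mode (run, session) and safe-character validation for labels |
| Failure regex config | Caps pattern count/length and drops invalid regexes to reduce ReDoS risk |
| Paperclip issue metadata | Sanitizes API strings, constrains issue identifier extraction, normalizes priority values |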
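Security Boundaries
- Process execution: uses execFile, not shell execution
- Dynamic code execution: none (eval / Function not used)
- Credentials: read from environment or external auth files; not embedded in the skill
- File access: limited to explicitly requested files, with traversal and symlink tunnel protections
- Dependencies: zero external runtime dependencies in this package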
Verification
```bash
# Functional coverage
node skills/paperclip-resilience/tests/test-spawn-with-fallback.js

# Full security suite
node skills/paperclip-resilience/tests/test-security.js

# Quick smoke test
node skills/paperclip-resilience/tests/test-security-quick.js
```
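Audit Record
- Last audit: 2026-03-27
- Tracking issue: SUP-453
- Status: ✅ Approved for ClawHub publication
- Details: see SECURITY-AUDIT-REPORT.md
Related Paperclip Issues
These are the upstream gaps this skill works around:
- #276: Auto-requeue agent on failure
- #1845: No crash-recovery wakeup after restart
- #1861: Agent death on 429 with no model fallback
License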
MIT