HARNESS ENGINEER
A production-grade skill for Claude Code and OpenClaw that transforms a repository into a
self-improving software system using six core harness engineering principles.
SIX CORE PRINCIPLES
P1: CONTEXT ENGINEERING
Treat context as a finite, precious resource. Curate aggressively.
See: runtime/context-engineering.md, runtime/compaction.md
P2: TOOL USAGE
Each sub-agent receives only the tools it needs -- no more.
See: tools/TOOL_REGISTRY.md, references/mcp-tools.md
P3: VERIFICATION MECHANISM
Every output is verified by someone other than its producer.
See: agents/reviewer.md, references/testing-standards.md
P4: STATUS MANAGEMENT
State lives outside the context window, in the repo.
See: runtime/status-management.md, templates/handoff.md
P5: OBSERVABILITY AND CLOSED-LOOP FEEDBACK
Track what happens. Feed failures back into the harness, not the code.
See: runtime/observability.md, runtime/memory-system.md
P6: HUMAN SUPERVISION
Humans approve high-impact events. The harness surfaces them explicitly.
See: runtime/autonomy-rules.md, runtime/prioritization.md
NON-NEGOTIABLE RULES
1. CLAUDE.md / AGENTS.md IS GROUND TRUTH -- read it first, every session.
2. CODEBASE OVER DOCS -- when they conflict, trust the code.
3. 40% CONTEXT RULE -- compact or hand off to a subagent before crossing 40% of the context window.
4. NO IMPLEMENTATION WITHOUT a research output, a plan, and validation criteria.
5. GENERATION AND REVIEW ARE ALWAYS SEPARATE -- never the same agent.
6. FAILURE = HARNESS GAP -- fix the harness, not just the symptom.
7. OPTIMIZATION PRIORITY: Security => Correctness => Reliability => Performance => Memory => Maintainability => Cost
8. MINIMAL SCOPE PER SUBAGENT -- Estimate codebase size first (use the platform's file-count or line-count capability, never raw shell). If it exceeds 5K lines, split the work across multiple subagents by module, feature, or layer. Pin the exact files to read (no wandering). Each subagent gets at most one research doc and one code area. If a subagent is killed or times out, the scope was too large: split further.
   Adaptive timeouts: Default timeouts are guidelines, not hard kills. Check process logs before killing. If the agent is actively producing output, extend the timeout instead of killing it. Kill and split only if the agent has been silent or stuck for more than 10 minutes or is producing garbage. Scale timeouts by effort: S = 15 min, M = 20 min, L = 30-40 min.
9. SUBAGENT PERMISSION MODE -- Subagents are spawned by the platform using its native agent mechanism. The permission mode is set by the platform, NOT by this skill. The skill MUST NOT mandate any specific spawn command or permission mode; that decision belongs to the platform's enforcement layer (see PLATFORM_REQUIREMENTS.md Section 8). If the platform's default permission mode is insufficient, the platform operator reconfigures it. The skill never overrides it.
10. ACTIVE MONITORING -- Every time you launch a new batch of subagents, track session IDs, expected output files, and the remaining queue. If the platform provides a cron/scheduler, use it to detect dead agents. If no scheduler is available, check agent status before each dispatch step. Dead agents stall the pipeline, so detect them early.
11. MAX PARALLEL = 5 -- Run at most 5 Claude Code agents simultaneously. If rate or API limit errors occur, drop to 4, then 3, and so on until the errors stop. Resume increasing after 5 clean minutes.
12. TOKEN EXHAUSTION RECOVERY -- If ALL active agents hit rate/API limits (429/500), tokens are exhausted. Wait for the token refresh before retrying. If the platform provides a scheduler, set a recovery job to resume after the refresh. If no scheduler is available, the human operator must restart the cycle manually.
13. 10-MIN STUCK KILL -- If any agent produces no output for more than 10 minutes, log the issue, kill the agent, and split the task into smaller subtasks before respawning. Whenever a subagent is given a long-running command, ALWAYS set a cron job to check on its progress periodically.
14. TRACKING EVERYWHERE -- Every phase, cycle, and step writes to tracking logs: DISPATCH-TRACK, the error log, compact summaries, and progress logs. Recovery must be able to pick up from any interruption point.
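The MAX PARALLEL, adaptive-timeout, and stuck-kill rules above can be sketched as two small decision functions. This is a minimal illustration, not the harness's own implementation: the names `next_parallel_limit` and `timeout_action`, their parameters, the floor of one agent, and the choice of 40 minutes for L-effort are all assumptions. Only the thresholds (ceiling of 5, 5 clean minutes, 10 silent minutes, 15/20/30-40 minute timeouts) come from the rules, and the "producing garbage" check is left out because it needs human or reviewer judgment.

```python
MAX_PARALLEL = 5       # ceiling from the MAX PARALLEL rule
CLEAN_WINDOW_MIN = 5   # clean minutes required before ramping back up
STUCK_AFTER_MIN = 10   # silence threshold from the stuck-kill rule

# Default timeouts in minutes by effort. The rules give 30-40 for L;
# using the upper bound here is an assumption.
EFFORT_TIMEOUT_MIN = {"S": 15, "M": 20, "L": 40}


def next_parallel_limit(current: int, saw_limit_error: bool, clean_minutes: float) -> int:
    """Return the parallel-agent limit to use for the next batch."""
    if saw_limit_error:
        # Step down one at a time (5 -> 4 -> 3 ...). The floor of 1 is an assumption.
        return max(1, current - 1)
    if clean_minutes >= CLEAN_WINDOW_MIN:
        # Ramp back up one step, never above the ceiling.
        return min(MAX_PARALLEL, current + 1)
    return current


def timeout_action(effort: str, elapsed_min: float, silent_min: float) -> str:
    """Decide what to do with a running subagent: 'wait', 'extend', or 'kill_and_split'."""
    if silent_min > STUCK_AFTER_MIN:
        return "kill_and_split"
    if elapsed_min > EFFORT_TIMEOUT_MIN[effort]:
        # Past the default timeout but still producing output: extend, do not kill.
        return "extend"
    return "wait"
```

For example, `timeout_action("S", 16, 1)` yields `"extend"` because the agent is past its 15-minute guideline but was active a minute ago, while `timeout_action("M", 10, 11)` yields `"kill_and_split"`.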
SAFE START GUIDE
Before anything else: read PLATFORM_REQUIREMENTS.md and verify every item.
The harness depends on platform enforcement that cannot be checked from these files alone.
Step 1 -- Verify platform requirements (PLATFORM_REQUIREMENTS.md)
Run through the five platform capability checks before any other step.
Step 2 -- Sandbox first
Run on a throwaway branch. Observe one single-pass cycle before enabling continuous mode.
Step 3 -- Review CONFIG.yaml before every run
| Setting | Starting value | When to change |
|---|---|---|
| loop_mode | single-pass | Change after sandbox validation |
| maxparallelagents | 3 | Increase after confirming behavior |
| block_destructive... | true | Never change |
PRs always require human approval. There is no auto-merge.
Step 4 -- Protect main branch
Require human reviewers on main/trunk in your git host.
Step 5 -- Graduation path: single-pass => maintenance => continuous
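The settings table and the graduation path above can be checked mechanically before a run. The sketch below is hypothetical and rests on several assumptions: CONFIG.yaml has already been parsed into a flat dict, `loop_mode` takes the three graduation-path names as its values, and key names are spelled exactly as they appear in the table. The truncated `block_destructive...` key is matched by its visible prefix because its full name is not given here. `check_config` and `next_loop_mode` are illustrative names, not part of the skill.

```python
GRADUATION_PATH = ["single-pass", "maintenance", "continuous"]
MAX_PARALLEL = 5  # ceiling from the non-negotiable rules


def check_config(config: dict) -> list[str]:
    """Return a list of problems found in an already-parsed config dict."""
    problems = []
    if config.get("loop_mode") not in GRADUATION_PATH:
        problems.append("loop_mode must be one of " + ", ".join(GRADUATION_PATH))
    parallel = config.get("maxparallelagents")
    # bool is a subclass of int in Python, so reject it explicitly.
    if (isinstance(parallel, bool) or not isinstance(parallel, int)
            or not 1 <= parallel <= MAX_PARALLEL):
        problems.append(f"maxparallelagents must be an integer from 1 to {MAX_PARALLEL}")
    # The full key name is truncated in the table, so match on its visible prefix.
    guards = {k: v for k, v in config.items() if k.startswith("block_destructive")}
    if not guards or any(v is not True for v in guards.values()):
        problems.append("block_destructive... must be present and true")
    return problems


def next_loop_mode(current: str) -> str:
    """Advance one step along the graduation path; 'continuous' is the last stage."""
    i = GRADUATION_PATH.index(current)
    return GRADUATION_PATH[min(i + 1, len(GRADUATION_PATH) - 1)]
```

An empty list from `check_config` means the three settings look safe to run with. Moving one stage at a time in `next_loop_mode` mirrors the intent that each mode is observed before the next is enabled.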
HOW TO USE THIS SKILL
When activated in Claude Code or OpenClaw, read in this order:
1. CLAUDE.md or AGENTS.md if present (base context)
2. CONFIG.yaml (runtime settings)
3. runtime/loop.md (execution model)
4. runtime/context-engineering.md (context budget rules)
5. runtime/status-management.md (restore checkpoint if resuming)
6. MEMORY.md (prior failure context)
7. agents/dispatcher.md (task decomposition model, worktree agent)
8. Begin the loop
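The reading order above can be expressed as a small lookup that returns the startup files in sequence. This is a sketch under stated assumptions: `startup_files` and `skill_root` are hypothetical names, all files are assumed to live under one root directory, and a missing file is skipped silently. The list above only says "if present" for CLAUDE.md / AGENTS.md, so a stricter version might raise an error for the others.

```python
from pathlib import Path

# Startup files in the order listed above. The first entry is a pair of
# alternatives: CLAUDE.md is tried first, then AGENTS.md.
READ_ORDER = [
    ("CLAUDE.md", "AGENTS.md"),
    ("CONFIG.yaml",),
    ("runtime/loop.md",),
    ("runtime/context-engineering.md",),
    ("runtime/status-management.md",),
    ("MEMORY.md",),
    ("agents/dispatcher.md",),
]


def startup_files(skill_root: Path) -> list[Path]:
    """Return the startup files that exist under skill_root, in reading order."""
    found = []
    for alternatives in READ_ORDER:
        for name in alternatives:
            path = skill_root / name
            if path.is_file():
                found.append(path)
                break  # take the first alternative that exists
    return found
```

Calling `startup_files(Path("."))` from the skill's directory would give the files to read, in order, before beginning the loop.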
REFERENCE FILES
| File | When to read |
|---|---|
| CLAUDE.md / AGENTS.md | First, every session -- base knowledge |
| CONFIG.yaml | At startup |
| MEMORY.md | At startup and after every failure |
| runtime/loop.md | Each loop cycle |
| runtime/context-engineering.md | Continuously -- governs context budget |
| runtime/compaction.md | When compacting context within a phase |
| runtime/status-management.md | At startup (resume) and after each task |
| runtime/observability.md | After VERIFY phase |
| runtime/memory-system.md | When writing or querying memory |
| runtime/self-improvement.md | After any failure |
| runtime/prioritization.md | When selecting the next task |
| runtime/autonomy-rules.md | When blocked or at human gate |
| agents/dispatcher.md | Before decomposing any task (worktree agent) |
| agents/researcher.md | Research phase (Q-Agent + R-Agent model) |
| agents/planner.md | Plan phase (3-phase: design, outline, master plan) |
| agents/implementer.md | Implement phase (worktree-driven execution) |
| agents/reviewer.md | Review cycle |
| agents/debugger.md | On any failure |
| agents/optimizer.md | Optimization mode |
| agents/garbage-collector.md | GC interval |
| tools/TOOL_REGISTRY.md | Before any tool call |
| tools/tool-router.md | Routing and redaction rules |
| tools/execution-protocol.md | Full tool call lifecycle |
| references/harness-rules.md | Core constraints |
| references/testing-standards.md | Before writing or running tests |
| references/security-performance.md | Before any implementation |
| references/simplification-checklist.md | During review and refactoring |
| references/git-workflow.md | Before any commit or PR |
| references/mcp-tools.md | MCP tool definitions and per-agent sets |
| references/sensitive-paths.md | Forbidden read paths -- enforced in-skill |
| references/constraints.md | Active prevention rules |
| templates/ | Plans, ADRs, handoffs, status docs |