Battle-Tested Agent

19 production-hardened patterns for AI agents. Every one earned from failure.

Use this skill when you are:

- hardening an agent that will run repeatedly or autonomously
- tightening memory, verification, or anti-hallucination behavior
- reducing compaction failures, weak handoffs, or orchestration drift
- reviewing an agent workspace for missing production patterns
- debugging why an agent keeps losing context, guessing, or dropping work

Do not use this skill for:

- persona writing or onboarding polish
- one-off prompt tweaks with no reusable pattern behind them
- adding new tools, servers, or runtime capabilities
- turning a simple workspace into process theater

Default workflow

1. Audit first

Run bash scripts/audit.sh <workspace> to see which patterns are present. The script checks for all 19 patterns and tells you what to fix first.

2. Start with the smallest tier that fits

Implement starter patterns first, then intermediate, then advanced. Do not cargo-cult every pattern into every agent.

3. Patch the actual failure mode

Change the mechanism, not just the wording. "ALWAYS check X" is not a fix — a verification gate is a fix.
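To make the difference between wording and mechanism concrete, here is a minimal sketch of a verification gate in POSIX shell. The function name verify_gate, its two arguments, and the status words it prints are assumptions made for illustration; they are not part of this skill's scripts.

```bash
# Hypothetical verification gate (illustrative, not shipped with this skill).
# Runs an action, then an independent check. Success is reported only when
# the check passes, never on the strength of the action alone.
verify_gate() {
  vg_action=$1   # command string that changes state
  vg_check=$2    # command string that inspects the resulting state
  if ! sh -c "$vg_action"; then
    echo "FAILED: action exited non-zero" >&2
    return 1
  fi
  if ! sh -c "$vg_check"; then
    echo "UNVERIFIED: action ran but the check did not pass" >&2
    return 2
  fi
  echo "VERIFIED"
}

# Example (paths are placeholders):
#   verify_gate "echo done > /tmp/report.txt" "grep -q done /tmp/report.txt"
```

The point of the design is that the agent cannot reach the "VERIFIED" line without the check running, which is something an "ALWAYS check X" instruction cannot guarantee.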

4. Keep patterns lightweight

Add only the pieces that materially reduce failures or operator burden.

Pattern tiers

- Starter (5): baseline reliability for almost every agent
- Intermediate (5): daily-driver patterns for briefs, heartbeats, and recurring work
- Advanced (6): multi-agent orchestration, handoffs, and self-improvement discipline

Pattern clusters

Some patterns reinforce each other naturally. Adopt them together when the failure
mode calls for it:

- Trust chain: WAL Protocol + Anti-Hallucination + Agent Verification. Ensures data is captured, sourced, and measured before reporting.
- Handoff loop: Delegation Rules + Completion Contract + Acceptance Gate + Task State Tracking. Prevents work from disappearing between agents or being certified without proof.
- Survival kit: Working Buffer + Compaction Injection Hardening + Silent Worker Recovery. Keeps context alive across long sessions and prevents silent delegated drift.
- Quality gate: QA Gates + Verify Implementation + Decision Logs. Ensures output quality and traceable reasoning.
- Delegation hardening: Brief Quality Gate + Scoped Verifier Gate. Keeps delegation tight without turning the whole system into bureaucracy.
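As one illustration of the handoff loop, the sketch below shows what an acceptance gate might look like when a worker reports completion. Everything specific here is an assumption: the key=value report format, the field names status and evidence, and the function name accept_handoff are hypothetical and not defined by this skill.

```bash
# Hypothetical acceptance gate for a delegated task (illustrative only).
# Assumes the worker writes a key=value report containing at least:
#   status=done
#   evidence=/path/to/proof
accept_handoff() {
  ah_report=$1
  [ -f "$ah_report" ] || { echo "REJECT: no report"; return 1; }
  ah_status=$(sed -n 's/^status=//p' "$ah_report")
  ah_evidence=$(sed -n 's/^evidence=//p' "$ah_report")
  if [ "$ah_status" != "done" ]; then
    echo "REJECT: status is '${ah_status:-missing}', not done"
    return 1
  fi
  # -s is true only for an existing, non-empty file.
  if [ -z "$ah_evidence" ] || [ ! -s "$ah_evidence" ]; then
    echo "REJECT: no evidence file to prove completion"
    return 1
  fi
  echo "ACCEPT"
}
```

A report that claims status=done but points at a missing or empty evidence file is rejected, which is the "certified without proof" case the cluster is meant to prevent.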

When patterns conflict

If two patterns seem to give contradictory advice:

- Safety patterns win over speed patterns. Ambiguity Gate overrides Simple Path First when the request is ambiguous. Verify before acting, even if the simple path is obvious.
- Evidence patterns win over action patterns. Anti-Hallucination overrides "just try it" when reporting data. Never guess a number to move faster.
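The two precedence rules above can be written down as a small lookup. This is a sketch only: the category labels (safety, speed, evidence, action) and the function name resolve_conflict are illustrative assumptions, and any pairing the text does not cover is deliberately left undecided rather than guessed.

```bash
# Hypothetical precedence lookup (illustrative labels, not a shipped script).
# Encodes only the two rules stated in the text.
resolve_conflict() {
  case "$1:$2" in
    safety:speed|speed:safety)       echo safety ;;
    evidence:action|action:evidence) echo evidence ;;
    *)                               echo undecided ;;
  esac
}

# Example:
#   resolve_conflict speed safety
```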

Assets — how to use them

The assets/ folder contains starter files you copy into your workspace and customize.
They are templates, not drop-in replacements.

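    # Merge delegation and decision-log rules into your existing AGENTS.md
    cp assets/AGENTS-additions.md ~/workspace/  # review, then merge

    # Add QA gates
    cp assets/QA-gates.md ~/workspace/QA.md

    # Set up self-improvement tracking
    mkdir -p ~/workspace/.learnings
    cp assets/learnings-template.md ~/workspace/.learnings/LEARNINGS.md
    cp assets/errors-template.md ~/workspace/.learnings/ERRORS.md
    cp assets/features-template.md ~/workspace/.learnings/FEATURE_REQUESTS.md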

Read references/audit-usage.md for the full rollout order and bootstrap workflow.
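Because the assets are templates rather than drop-in replacements, one cautious way to copy them is to refuse to overwrite a file that already exists in the workspace. The helper below is a minimal sketch under that assumption; copy_template is a hypothetical name and is not provided by this skill.

```bash
# Hypothetical helper (illustrative only): copy a template only when the
# destination is absent, so an existing, customized file is never overwritten.
copy_template() {
  ct_src=$1
  ct_dest=$2
  if [ -e "$ct_dest" ]; then
    echo "SKIP: $ct_dest already exists; merge by hand"
    return 0
  fi
  mkdir -p "$(dirname "$ct_dest")" &&
    cp "$ct_src" "$ct_dest" &&
    echo "COPIED: $ct_dest"
}

# Example, using an asset path named in this document:
#   copy_template assets/QA-gates.md ~/workspace/QA.md
```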

References

- references/starter-patterns.md: WAL, anti-hallucination, ambiguity, simple-path-first, unblock-before-shelve
- references/intermediate-patterns.md: verification, working buffer, QA gates, decision logs, verify implementation
- references/advanced-patterns.md: delegation, brief quality, proof-based handoffs, acceptance gates, orchestration, stale-worker recovery, compaction hardening, recurrence tracking
- references/audit-usage.md: audit script usage, install/copy snippets, and expected outcomes

Included scripts

- scripts/audit.sh — workspace audit for all 19 patterns (supports AGENTS.md, CLAUDE.md, SOUL.md, and system.md)
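Before running the audit, it can help to confirm which of the supported instruction files a workspace actually contains. The function below is a hypothetical pre-check, not a description of how scripts/audit.sh works internally; only the four filenames come from this document.

```bash
# Hypothetical pre-check (not part of scripts/audit.sh): list which of the
# supported instruction files exist in a workspace directory.
find_instruction_files() {
  fif_dir=$1
  for fif_name in AGENTS.md CLAUDE.md SOUL.md system.md; do
    if [ -f "$fif_dir/$fif_name" ]; then
      printf '%s\n' "$fif_name"
    fi
  done
}

# Example (workspace path is a placeholder):
#   find_instruction_files ~/workspace
```

An empty result suggests the audit has nothing to inspect yet, so the starter tier is the place to begin.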

Rules of thumb

- Audit before expanding
- Prefer progressive disclosure over giant core files
- Silence is better than hallucination
- Ambiguity is a stop sign, not permission
- The orchestrator should preserve oversight, not sink into implementation
- Mechanism changes beat wording changes
- After acting, verify the new state before declaring success
- Partial progress is not success; recovery steps matter as much as first-attempt steps

Outcome

A leaner, more resilient agent that survives compaction, hands work off cleanly,
reports only what is verified, and improves without spiraling into bureaucracy.

