Adversarial Review
Structured multi-agent review loop. Catches what a single agent misses.
Session store: ~/.openclaw/workspace/reviews/
Process: Init session → spawn Opus reviewers → collect redlines → position on each → produce v2 → deliver
Complexity Self-Assessment
Run this check whenever you produce a substantial document. Score 1 point per signal present. If score ≥ 3, offer the review loop without being asked.
| # | Signal | Points |
|---|---|---|
| 1 | Has multiple interdependent components (failure in one affects others) | 1 |
| 2 | Involves schema changes, migrations, or index design | 1 |
| 3 | Irreversible or expensive to undo (data loss, structural rework) | 1 |
| 4 | Affects production systems, stored data, or external services | 1 |
| 5 | Introduces new abstractions, taxonomies, or data models | 1 |
| 6 | Has a defined sequence of steps where order matters | 1 |
| 7 | Contains security, access control, or permission logic | 1 |
| 8 | Will be acted on by code or agents without further human review | 1 |
| 9 | Document is longer than ~500 lines or covers 3+ distinct systems | 1 |
| 10 | Scott said "let's build this" or "implement this" at any point in the conversation | 1 |
Score 0–2 → skip. Simple doc, don't add noise.
Score 3–6 → offer. "This scores [N]/10 on complexity. Want me to run the review team on it before we act?"
Score 7–10 → strongly recommend. Don't just offer — make the case. "This scores [N]/10 on complexity — multiple interdependent systems, production consequences, hard to reverse. I'd strongly recommend running the review team before we act on this. Today's taxonomy strategy was a 10/10 and the review caught 14 issues including multiple production-breaking bugs."
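The scoring rule can be sketched in a few lines of Python. This is a minimal illustration, not part of the skill's scripts: the signal names are hypothetical labels for the ten rows of the table, and the function names are assumptions.

```python
# Hypothetical short labels for the ten signals in the table above.
SIGNALS = (
    "interdependent_components",
    "schema_or_migration",
    "irreversible",
    "affects_production",
    "new_abstractions",
    "ordered_steps",
    "security_logic",
    "acted_on_without_review",
    "long_or_multi_system",
    "explicit_build_request",
)


def complexity_score(present: set) -> int:
    """One point per signal present; unknown labels are rejected."""
    unknown = set(present) - set(SIGNALS)
    if unknown:
        raise ValueError(f"unknown signals: {sorted(unknown)}")
    return len(set(present))


def recommendation(score: int) -> str:
    """Map a score to the three bands: 0-2 skip, 3-6 offer, 7-10 strongly recommend."""
    if not 0 <= score <= len(SIGNALS):
        raise ValueError(f"score must be between 0 and {len(SIGNALS)}")
    if score <= 2:
        return "skip"
    if score <= 6:
        return "offer"
    return "strongly recommend"
```

For example, a doc that involves a migration, is hard to undo, and touches production scores 3 and lands in the "offer" band.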
Quick Reference
| Step | Action |
|---|---|
| 0. Init session | scripts/new-review.sh <slug> <doc-path> |
| 1. Choose reviewers | Read references/review-types.md for the right bundle |
| 2. Spawn reviewers | sessions_spawn with model=anthropic/claude-opus-4-6, mode=run, all in parallel |
| 3. Wait | Reviewers auto-announce. Do NOT poll. |
| 4. Save raw output | Write each reviewer result to redlines/reviewer-{role}.md |
| 5. Synthesize | scripts/synthesize.sh <session-dir> → writes redlines/combined.md |
| 6. Position | AGREE / DISAGREE / MODIFY on every redline → write positions.md |
| 7. Produce v2 | Write output/{slug}-v2.md incorporating accepted changes + rejected appendix |
| 8. Deliver | scripts/cp-output.sh <session-name> <destination> |
Session Directory Structure
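~/.openclaw/workspace/reviews/{YYYY-MM-DD}-{slug}/
├── input/
│   └── {original-filename}    ← copy of the doc being reviewed
├── redlines/
│   ├── reviewer-{role}.md     ← raw output from each reviewer
│   └── combined.md            ← synthesize.sh output (sorted by severity)
├── positions.md               ← farsight's AGREE/DISAGREE log
└── output/
    └── {slug}-v2.md           ← final document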
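For orientation, here is a rough Python sketch of what session initialization amounts to, assuming the layout {YYYY-MM-DD}-{slug}/ with input/, redlines/, and output/ subdirectories. It is not the implementation of scripts/new-review.sh; the function name init_session and its parameters are hypothetical.

```python
import shutil
from datetime import date
from pathlib import Path
from typing import Optional


def init_session(root: Path, slug: str, doc: Path, today: Optional[date] = None) -> Path:
    """Create {root}/{YYYY-MM-DD}-{slug}/ and copy the doc under review into input/."""
    day = (today or date.today()).isoformat()  # ISO format is YYYY-MM-DD
    session = root / f"{day}-{slug}"
    for sub in ("input", "redlines", "output"):
        (session / sub).mkdir(parents=True, exist_ok=True)
    # positions.md is written later, during the positioning step.
    shutil.copy2(doc, session / "input" / doc.name)
    return session
```

Calling init_session(Path.home() / ".openclaw/workspace/reviews", "taxonomy", Path("plan.md")) would create a dated session folder under the session store.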
Review Types
| Document Type | Reviewer A | Reviewer B |
|---|---|---|
| Architecture / strategy | Theory & data modeling | Implementation & systems |
| Pipeline / workflow | Sequencing & dependencies | Failure modes & ops |
| Schema / migration | SQL correctness & constraints | Performance & indexes |
| Security design | Threat modeling | Implementation gaps |
| Marketing / positioning | Message clarity & truth | Competitive exposure |
| API / interface design | Consistency & contracts | Consumer experience |
For full persona prompt templates → read references/reviewer-personas.md
For pre-configured bundles → read references/review-types.md
Spawning Reviewers
Spawn ALL reviewers simultaneously — parallel, not sequential. Independent reviewers find different issues.
Model Selection
| Doc Score | Default Model | Rationale |
|---|---|---|
| 7–10 | anthropic/claude-opus-4-6 | Deep reasoning required; subtle architectural flaws need Opus |
| 3–6 | anthropic/claude-sonnet-4-6 | Worth trying; structured prompts may close the gap |
A/B testing note: If Sonnet misses a CRITICAL issue that Opus would have caught on a 3–6 doc, upgrade that doc type to Opus permanently. Track findings in references/model-notes.md as patterns emerge.
Key parameters for every reviewer spawn:
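model: anthropic/claude-opus-4-6   ← or sonnet for docs scoring 3–6
mode: run
runTimeoutSeconds: 300
label: reviewer-{role}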
The task field contains the full reviewer prompt from references/reviewer-personas.md plus the document content to review.
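The model choice and spawn parameters can be assembled as plain data before handing them to sessions_spawn. The sketch below only builds the parameter dictionaries; it does not call sessions_spawn, whose exact call signature is not shown here. The helper names are hypothetical, and the parameter keys (model, mode, runTimeoutSeconds, label, task) are assumed to match the key parameters listed for each spawn.

```python
OPUS = "anthropic/claude-opus-4-6"
SONNET = "anthropic/claude-sonnet-4-6"


def model_for_score(score: int) -> str:
    """Pick the default reviewer model from the complexity score."""
    if score >= 7:
        return OPUS
    if score >= 3:
        return SONNET
    raise ValueError("scores below 3 skip the review loop")


def spawn_params(role: str, score: int, persona_prompt: str, document: str) -> dict:
    """Build one reviewer's parameters; task = persona prompt plus the document."""
    return {
        "model": model_for_score(score),
        "mode": "run",
        "runTimeoutSeconds": 300,
        "label": f"reviewer-{role}",
        "task": f"{persona_prompt}\n\n{document}",
    }


def all_spawn_params(roles: list, score: int, prompts: dict, document: str) -> list:
    """Prepare every reviewer up front so they can be spawned in parallel."""
    return [spawn_params(role, score, prompts[role], document) for role in roles]
```

Building the full list first keeps the spawn step itself trivial: one call per entry, issued together rather than one after another.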
Positioning Rules
For EVERY redline, take an explicit position. No skipping.
| Position | When | Requirement |
|---|---|---|
| AGREE | Critique is correct, change should be made | State what changes |
| DISAGREE | Original design is defensible | Must provide rationale, not just dismissal |
| MODIFY | Issue is real, suggested resolution is wrong | Propose your alternative |
All CRITICAL redlines default to AGREE unless strongly defensible.
At least 1 DISAGREE expected — if zero, you may be rubber-stamping.
Write positions to positions.md in the session directory.
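A quick self-check before writing positions.md can catch skipped redlines and empty rationales. This is a minimal sketch under assumed names: the Position record and check_positions function are hypothetical, and the on-disk format of positions.md is not modeled.

```python
from dataclasses import dataclass

VALID_STANCES = {"AGREE", "DISAGREE", "MODIFY"}


@dataclass
class Position:
    redline_id: str
    stance: str  # AGREE, DISAGREE, or MODIFY
    note: str    # what changes, the rationale, or the proposed alternative


def check_positions(redline_ids: list, positions: list) -> list:
    """Return a list of problems; an empty list means the rules are satisfied."""
    problems = []
    by_id = {p.redline_id: p for p in positions}
    for rid in redline_ids:
        pos = by_id.get(rid)
        if pos is None:
            problems.append(f"{rid}: no position taken")
        elif pos.stance not in VALID_STANCES:
            problems.append(f"{rid}: unknown stance {pos.stance!r}")
        elif not pos.note.strip():
            problems.append(f"{rid}: {pos.stance} requires a written note")
    # Zero DISAGREE is a warning sign, not a hard failure.
    if positions and not any(p.stance == "DISAGREE" for p in positions):
        problems.append("no DISAGREE: possible rubber-stamping")
    return problems
```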
v2 Requirements
- Revision table at the top (what changed and why)
- All AGREE + MODIFY changes incorporated
- Rejected redlines documented in an appendix ("considered and rejected")
- Version bumped, date updated
- Saved to output/{slug}-v2.md
Quality Bar
A good review session produces:
- ≥2 CRITICAL issues (if zero, reviewers weren't adversarial enough; re-spawn with a harder prompt)
- ≥1 DISAGREE from farsight (if zero, consider whether the doc was genuinely perfect or just unchallenged)
- A v2 meaningfully different from v1
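The quality bar can be approximated with a small check. This is a rough sketch: "meaningfully different" is a judgment call, so the text-similarity threshold below is an arbitrary assumption, and the function name quality_gaps is hypothetical.

```python
import difflib


def quality_gaps(severities: list, stances: list, v1: str, v2: str,
                 max_similarity: float = 0.98) -> list:
    """Return the quality-bar items a session failed to meet (empty list = all met)."""
    gaps = []
    if sum(s == "CRITICAL" for s in severities) < 2:
        gaps.append("fewer than 2 CRITICAL issues")
    if "DISAGREE" not in stances:
        gaps.append("no DISAGREE position")
    # Character-level similarity in [0, 1]; 1.0 means v1 and v2 are identical.
    similarity = difflib.SequenceMatcher(None, v1, v2).ratio()
    if similarity > max_similarity:
        gaps.append("v2 is nearly identical to v1")
    return gaps
```

A similarity ratio is only a proxy: a one-line fix to a production-breaking bug is meaningful even if the text barely changes, so treat the last item as a prompt to look again rather than a verdict.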
Redline Format
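[REDLINE-{TYPE}-{N}] {section reference}
Claim: What the doc says
Challenge: The specific objection or gap
Severity: CRITICAL | MAJOR | MINOR
Suggested resolution: What should change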
Full spec → read references/redline-format.md
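A redline in this shape is easy to parse mechanically. The sketch below assumes a header of the form [REDLINE-{TYPE}-{N}] followed by Claim, Challenge, Severity, and Suggested resolution fields, with severities CRITICAL, MAJOR, or MINOR; these labels are assumptions, so check them against the full spec before relying on the parser. The function name parse_redline is hypothetical.

```python
import re

# Assumed header shape: [REDLINE-TYPE-N] followed by a section reference.
HEADER = re.compile(r"^\[REDLINE-(?P<type>[A-Za-z]+)-(?P<num>\d+)\]\s*(?P<section>.*)$")
FIELDS = ("Claim", "Challenge", "Severity", "Suggested resolution")
SEVERITIES = {"CRITICAL", "MAJOR", "MINOR"}


def parse_redline(text: str) -> dict:
    """Parse one redline block into a dict; raise ValueError if it is malformed."""
    lines = [ln.strip() for ln in text.strip().splitlines() if ln.strip()]
    if not lines:
        raise ValueError("empty redline")
    match = HEADER.match(lines[0])
    if match is None:
        raise ValueError(f"bad header: {lines[0]!r}")
    redline = match.groupdict()
    for line in lines[1:]:
        key, sep, value = line.partition(":")
        if sep and key.strip() in FIELDS:
            redline[key.strip()] = value.strip()
    missing = [f for f in FIELDS if f not in redline]
    if missing:
        raise ValueError(f"missing fields: {missing}")
    if redline["Severity"] not in SEVERITIES:
        raise ValueError(f"unknown severity: {redline['Severity']!r}")
    return redline
```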
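Adversarial Review
Structured multi-agent review loop. Catches what a single agent misses.
Session store: ~/.openclaw/workspace/reviews/
Process: Init session → spawn Opus reviewers → collect redlines → position on each → produce v2 → deliver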
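Complexity Self-Assessment
Run this check whenever you produce a substantial document. Score 1 point per signal present. If score ≥ 3, offer the review loop without being asked.
| # | Signal | Points |
|---|---|---|
| 1 | Has multiple interdependent components (failure in one affects others) | 1 |
| 2 | Involves schema changes, migrations, or index design | 1 |
| 3 | Irreversible or expensive to undo (data loss, structural rework) | 1 |
| 4 | Affects production systems, stored data, or external services | 1 |
| 5 | Introduces new abstractions, taxonomies, or data models | 1 |
| 6 | Has a defined sequence of steps where order matters | 1 |
| 7 | Contains security, access control, or permission logic | 1 |
| 8 | Will be acted on by code or agents without further human review | 1 |
| 9 | Document is longer than ~500 lines or covers 3+ distinct systems | 1 |
| 10 | Scott said "let's build this" or "implement this" at any point in the conversation | 1 |
Score 0–2 → skip. Simple doc, don't add noise.
Score 3–6 → offer. "This scores [N]/10 on complexity. Want me to run the review team on it before we act?"
Score 7–10 → strongly recommend. Don't just offer; make the case. "This scores [N]/10 on complexity: multiple interdependent systems, production consequences, hard to reverse. I'd strongly recommend running the review team before we act on this. Today's taxonomy strategy was a 10/10 and the review caught 14 issues including multiple production-breaking bugs."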
Quick Reference
| Step | Action |
|---|---|
| 0. Init session | scripts/new-review.sh <slug> <doc-path> |
| 1. Choose reviewers | Read references/review-types.md for the right bundle |
| 2. Spawn reviewers | sessions_spawn with model=anthropic/claude-opus-4-6, mode=run, all in parallel |
| 3. Wait | Reviewers auto-announce. Do NOT poll. |
| 4. Save raw output | Write each reviewer result to redlines/reviewer-{role}.md |
| 5. Synthesize | scripts/synthesize.sh <session-dir> → writes redlines/combined.md |
| 6. Position | AGREE / DISAGREE / MODIFY on every redline → write positions.md |
| 7. Produce v2 | Write output/{slug}-v2.md incorporating accepted changes + rejected appendix |
| 8. Deliver | scripts/cp-output.sh <session-name> <destination> |
Session Directory Structure
~/.openclaw/workspace/reviews/{YYYY-MM-DD}-{slug}/
├── input/
│   └── {original-filename}    ← copy of the doc being reviewed
├── redlines/
│   ├── reviewer-{role}.md     ← raw output from each reviewer
│   └── combined.md            ← synthesize.sh output (sorted by severity)
├── positions.md               ← farsight's AGREE/DISAGREE log
└── output/
    └── {slug}-v2.md           ← final document
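Review Types
| Document Type | Reviewer A | Reviewer B |
|---|---|---|
| Architecture / strategy | Theory & data modeling | Implementation & systems |
| Pipeline / workflow | Sequencing & dependencies | Failure modes & ops |
| Schema / migration | SQL correctness & constraints | Performance & indexes |
| Security design | Threat modeling | Implementation gaps |
| Marketing / positioning | Message clarity & truth | Competitive exposure |
| API / interface design | Consistency & contracts | Consumer experience |
For full persona prompt templates → read references/reviewer-personas.md
For pre-configured bundles → read references/review-types.md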
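Spawning Reviewers
Spawn ALL reviewers simultaneously: parallel, not sequential. Independent reviewers find different issues.
Model Selection
| Doc Score | Default Model | Rationale |
|---|---|---|
| 7–10 | anthropic/claude-opus-4-6 | Deep reasoning required; subtle architectural flaws need Opus |
| 3–6 | anthropic/claude-sonnet-4-6 | Worth trying; structured prompts may close the gap |
A/B testing note: If Sonnet misses a CRITICAL issue that Opus would have caught on a 3–6 doc, upgrade that doc type to Opus permanently. Track findings in references/model-notes.md as patterns emerge.
Key parameters for every reviewer spawn:
model: anthropic/claude-opus-4-6   ← or sonnet for docs scoring 3–6
mode: run
runTimeoutSeconds: 300
label: reviewer-{role}
The task field contains the full reviewer prompt from references/reviewer-personas.md plus the document content to review.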
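Positioning Rules
For EVERY redline, take an explicit position. No skipping.
| Position | When | Requirement |
|---|---|---|
| AGREE | Critique is correct, change should be made | State what changes |
| DISAGREE | Original design is defensible | Must provide rationale, not just dismissal |
| MODIFY | Issue is real, suggested resolution is wrong | Propose your alternative |
All CRITICAL redlines default to AGREE unless strongly defensible.
At least 1 DISAGREE expected. If zero, you may be rubber-stamping.
Write positions to positions.md in the session directory.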
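v2 Requirements
- Revision table at the top (what changed and why)
- All AGREE + MODIFY changes incorporated
- Rejected redlines documented in an appendix ("considered and rejected")
- Version bumped, date updated
- Saved to output/{slug}-v2.md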
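Quality Bar
A good review session produces:
- ≥2 CRITICAL issues (if zero, reviewers weren't adversarial enough; re-spawn with a harder prompt)
- ≥1 DISAGREE from farsight (if zero, consider whether the doc was genuinely perfect or just unchallenged)
- A v2 meaningfully different from v1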
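Redline Format
[REDLINE-{TYPE}-{N}] {section reference}
Claim: What the doc says
Challenge: The specific objection or gap
Severity: CRITICAL | MAJOR | MINOR
Suggested resolution: What should change
Full spec → read references/redline-format.md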