Adversarial Review

Structured multi-agent review loop. Catches what a single agent misses.

Session store: ~/.openclaw/workspace/reviews/
Process: Init session → spawn Opus reviewers → collect redlines → position on each → produce v2 → deliver

Complexity Self-Assessment

Run this check whenever you produce a substantial document. Score 1 point per signal present. If score ≥ 3, offer the review loop without being asked.

| # | Signal | Points |
|---|--------|--------|
| 1 | Has multiple interdependent components (failure in one affects others) | 1 |
| 2 | Involves schema changes, migrations, or index design | 1 |
| 3 | Irreversible or expensive to undo (data loss, structural rework) | 1 |
| 4 | Affects production systems, stored data, or external services | 1 |
| 5 | Introduces new abstractions, taxonomies, or data models | 1 |
| 6 | Has a defined sequence of steps where order matters | 1 |
| 7 | Contains security, access control, or permission logic | 1 |
| 8 | Will be acted on by code or agents without further human review | 1 |
| 9 | Document is longer than ~500 lines or covers 3+ distinct systems | 1 |
| 10 | Scott said "let's build this" or "implement this" at any point in the conversation | 1 |

Score 0–2 → skip. Simple doc, don't add noise.

Score 3–6 → offer. "This scores [N]/10 on complexity. Want me to run the review team on it before we act?"

Score 7–10 → strongly recommend. Don't just offer — make the case. "This scores [N]/10 on complexity — multiple interdependent systems, production consequences, hard to reverse. I'd strongly recommend running the review team before we act on this. Today's taxonomy strategy was a 10/10 and the review caught 14 issues including multiple production-breaking bugs."
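
The scoring rule above can be sketched in a few lines of Python. This is an illustration only, not part of the skill: the function name `recommend_review` and the action labels it returns are assumptions; only the thresholds (0–2, 3–6, 7–10) come from the text.

```python
def recommend_review(signals):
    """Score one point per signal present and map the total to an action.

    `signals` is a sequence of ten booleans, one per row of the signal table.
    The action labels are hypothetical names for the three tiers in the text.
    """
    score = sum(1 for present in signals if present)
    if score <= 2:
        action = "skip"
    elif score <= 6:
        action = "offer"
    else:
        action = "strongly recommend"
    return score, action


# Signals 1, 2 and 4 present: interdependent components, schema changes,
# production impact.
flags = [True, True, False, True, False, False, False, False, False, False]
print(recommend_review(flags))  # (3, 'offer')
```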

Quick Reference

| Step | Action |
|------|--------|
| 0. Init session | `scripts/new-review.sh <slug> <doc-path>` |
| 1. Choose reviewers | Read `references/review-types.md` for the right bundle |
| 2. Spawn reviewers | `sessions_spawn` with `model=anthropic/claude-opus-4-6`, `mode=run`; all in parallel |
| 3. Wait | Reviewers auto-announce. Do NOT poll. |
| 4. Save raw output | Write each reviewer result to `redlines/reviewer-{role}.md` |
| 5. Synthesize | `scripts/synthesize.sh <session-dir>` → writes `redlines/combined.md` |
| 6. Position | AGREE / DISAGREE / MODIFY on every redline → write `positions.md` |
| 7. Produce v2 | Write `output/{slug}-v2.md` incorporating accepted changes + rejected appendix |
| 8. Deliver | `scripts/cp-output.sh <session-name> <destination>` |

Session Directory Structure

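```
~/.openclaw/workspace/reviews/{YYYY-MM-DD}-{slug}/
├── input/
│   └── {original-filename}     ← copy of the document being reviewed
├── redlines/
│   ├── reviewer-{role}.md      ← raw output from each reviewer
│   └── combined.md             ← output of synthesize.sh (sorted by severity)
├── positions.md                ← farsight's AGREE/DISAGREE record
└── output/
    └── {slug}-v2.md            ← final document
```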
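
For orientation, here is a minimal Python sketch of creating such a session directory. It is not what `scripts/new-review.sh` does internally (that script is not shown here). The function name `init_session` is hypothetical, and it assumes a dated session directory with `input/`, `redlines/`, and `output/` subdirectories. The demo runs in a temporary directory rather than the real session store.

```python
import shutil
import tempfile
from datetime import date
from pathlib import Path


def init_session(root: Path, slug: str, doc: Path, today: date) -> Path:
    """Create {root}/{YYYY-MM-DD}-{slug}/ and copy the document into input/."""
    session = root / f"{today.isoformat()}-{slug}"
    for sub in ("input", "redlines", "output"):
        (session / sub).mkdir(parents=True, exist_ok=True)
    # Work on a copy so the original document is never modified by the review.
    shutil.copy2(doc, session / "input" / doc.name)
    return session


# Demo in a throwaway directory, not ~/.openclaw/workspace/reviews/.
with tempfile.TemporaryDirectory() as tmp:
    doc = Path(tmp) / "plan.md"
    doc.write_text("# Plan\n")
    session = init_session(Path(tmp) / "reviews", "plan", doc, date(2025, 1, 31))
    layout = sorted(p.relative_to(session).as_posix() for p in session.rglob("*"))
    print(session.name, layout)
```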

Review Types

| Document Type | Reviewer A | Reviewer B |
|---------------|------------|------------|
| Architecture / strategy | Theory & data modeling | Implementation & systems |
| Pipeline / workflow | | |

For full persona prompt templates → read references/reviewer-personas.md
For pre-configured bundles → read references/review-types.md

Spawning Reviewers

Spawn ALL reviewers simultaneously — parallel, not sequential. Independent reviewers find different issues.

Model Selection

| Doc Score | Default Model | Rationale |
|-----------|---------------|-----------|
| 7–10 | `anthropic/claude-opus-4-6` | Deep reasoning required; subtle architectural flaws need Opus |
| 3–6 | `anthropic/claude-sonnet-4-6` | Worth trying; structured prompts may close the gap |

A/B testing note: If Sonnet misses a CRITICAL issue that Opus would have caught on a 3–6 doc, upgrade that doc type to Opus permanently. Track findings in references/model-notes.md as patterns emerge.

Key parameters for every reviewer spawn:
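```
model: anthropic/claude-opus-4-6   ← or sonnet for docs scoring 3–6
mode: run
runTimeoutSeconds: 300
label: reviewer-{role}
```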

The task field contains the full reviewer prompt from references/reviewer-personas.md plus the document content to review.
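
As a sketch of how the model-selection table and the spawn parameters fit together, the helper below builds one parameter set per reviewer. It only constructs a dictionary and does not call `sessions_spawn`. The function name, the role names in the demo, and the way the persona prompt and document are joined into `task` are all assumptions.

```python
OPUS = "anthropic/claude-opus-4-6"
SONNET = "anthropic/claude-sonnet-4-6"


def reviewer_spawn_params(score: int, role: str, persona_prompt: str, document: str) -> dict:
    """Build the parameters for one reviewer spawn (hypothetical helper)."""
    if score < 3:
        raise ValueError("scores 0-2 skip the review loop")
    return {
        "model": OPUS if score >= 7 else SONNET,  # per the model-selection table
        "mode": "run",
        "runTimeoutSeconds": 300,
        "label": f"reviewer-{role}",
        # Assumed layout: persona prompt first, then the document to review.
        "task": f"{persona_prompt}\n\n{document}",
    }


# Build every reviewer's parameters up front so they can be spawned together.
roles = ["theory", "implementation"]  # hypothetical role names
batch = [reviewer_spawn_params(8, r, f"You are the {r} reviewer.", "# Doc") for r in roles]
print([p["label"] for p in batch], batch[0]["model"])
```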

Positioning Rules

For EVERY redline, take an explicit position. No skipping.

| Position | When | Requirement |
|----------|------|-------------|
| AGREE | Critique is correct, change should be made | State what changes |
| DISAGREE | | |

All CRITICAL redlines default to AGREE unless a DISAGREE is strongly defensible.
Expect at least 1 DISAGREE. If there are zero, you may be rubber-stamping.

Write positions to positions.md in the session directory.
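
The per-redline rules can be checked mechanically before moving on to v2. The sketch below is one possible check, not part of the skill: the function name, the dictionary shapes, and the redline ids in the demo are assumptions.

```python
VALID_POSITIONS = {"AGREE", "DISAGREE", "MODIFY"}


def check_positions(redlines: dict, positions: dict) -> list:
    """Return a list of problems; an empty list means every rule is met.

    redlines maps redline id -> severity; positions maps redline id -> position.
    """
    problems = []
    for rid, severity in redlines.items():
        position = positions.get(rid)
        if position not in VALID_POSITIONS:
            # Every redline needs an explicit position. No skipping.
            problems.append(f"{rid}: no explicit position")
        elif severity == "CRITICAL" and position == "DISAGREE":
            # CRITICAL defaults to AGREE, so a rejection needs scrutiny.
            problems.append(f"{rid}: CRITICAL rejected, confirm the defence is strong")
    return problems


redlines = {"R1": "CRITICAL", "R2": "MAJOR", "R3": "MINOR"}
positions = {"R1": "DISAGREE", "R2": "MODIFY"}
print(check_positions(redlines, positions))
```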

v2 Requirements

- Revision table at the top (what changed and why)
- All AGREE + MODIFY changes incorporated
- Rejected redlines documented in an appendix ("considered and rejected")
- Version bumped, date updated
- Saved to `output/{slug}-v2.md`
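
One way to derive the revision table and the rejected appendix from the recorded positions is sketched below. The record fields (`id`, `position`, `change`, `reason`) and the table's column names are hypothetical; the skill only requires that the table says what changed and why.

```python
def split_for_v2(positions: list) -> tuple:
    """Partition positions into changes to incorporate and redlines to reject."""
    incorporated = [p for p in positions if p["position"] in ("AGREE", "MODIFY")]
    rejected = [p for p in positions if p["position"] == "DISAGREE"]
    return incorporated, rejected


def revision_table(incorporated: list) -> str:
    """Render the 'what changed and why' table for the top of v2."""
    rows = ["| Redline | Change | Why |", "|---|---|---|"]
    rows += [f"| {p['id']} | {p['change']} | {p['reason']} |" for p in incorporated]
    return "\n".join(rows)


# Illustrative records only; not taken from a real review session.
records = [
    {"id": "R1", "position": "AGREE", "change": "Add rollback step", "reason": "Migration was one-way"},
    {"id": "R2", "position": "DISAGREE", "change": "", "reason": "Out of scope"},
    {"id": "R3", "position": "MODIFY", "change": "Narrow the index", "reason": "Partial fix is enough"},
]
incorporated, rejected = split_for_v2(records)
print(revision_table(incorporated))
```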

Quality Bar

A good review session produces:

- ≥2 CRITICAL issues (if zero, reviewers weren't adversarial enough; re-spawn with a harder prompt)
- ≥1 DISAGREE from farsight (if zero, consider whether the doc was genuinely perfect or just unchallenged)
- A v2 meaningfully different from v1
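
The first two criteria are countable, so a session can be sanity-checked with something like the sketch below. The function name and warning wording are assumptions. Whether v2 is "meaningfully different" from v1 is left to judgment and is not checked here.

```python
def quality_bar(severities: list, positions: list) -> list:
    """Return warnings for a finished session; an empty list means the bar is met."""
    warnings = []
    critical = severities.count("CRITICAL")
    if critical == 0:
        warnings.append("no CRITICAL issues: re-spawn reviewers with a harder prompt")
    elif critical < 2:
        warnings.append("only 1 CRITICAL issue: below the bar of 2")
    if positions.count("DISAGREE") == 0:
        warnings.append("no DISAGREE: was the doc genuinely perfect or just unchallenged?")
    return warnings


print(quality_bar(["CRITICAL", "MAJOR", "MINOR"], ["AGREE", "AGREE", "MODIFY"]))
```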

Redline Format

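```
[REDLINE-{TYPE}-{N}] {section reference}
Claim: what the document says
Challenge: the specific objection or gap
Severity: CRITICAL | MAJOR | MINOR
Proposed resolution: how it should change
```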

Full spec → read `references/redline-format.md`
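
To show how redlines in this shape could be combined and ordered by severity, here is a small parser sketch. It is not `scripts/synthesize.sh`, whose behaviour is not documented here. It assumes a `[REDLINE-{TYPE}-{N}]` header followed by `Key: value` lines, with the English field names `Claim`, `Challenge`, `Severity`, and `Proposed resolution`; the authoritative field names are in the full spec. The sample redlines are invented for the demo.

```python
import re
from dataclasses import dataclass

SEVERITY_ORDER = {"CRITICAL": 0, "MAJOR": 1, "MINOR": 2}
HEADER = re.compile(r"^\[(REDLINE-[^\]]+)\]\s*(.*)$")


@dataclass
class Redline:
    id: str
    section: str
    fields: dict


def parse_redlines(text: str) -> list:
    """Parse redline blocks and return them sorted CRITICAL, MAJOR, MINOR."""
    redlines = []
    for raw in text.splitlines():
        line = raw.strip()
        header = HEADER.match(line)
        if header:
            redlines.append(Redline(header.group(1), header.group(2), {}))
        elif redlines and ":" in line:
            key, value = line.split(":", 1)
            redlines[-1].fields[key.strip().lower()] = value.strip()
    # sorted() is stable, so reviewers' original order survives within a severity.
    return sorted(
        redlines,
        key=lambda r: SEVERITY_ORDER.get(r.fields.get("severity", ""), len(SEVERITY_ORDER)),
    )


sample = """\
[REDLINE-THEORY-1] Section 2
Claim: Tags are unique
Challenge: No constraint enforces it
Severity: MAJOR
Proposed resolution: Add a unique index

[REDLINE-IMPL-1] Section 4
Claim: Migration is reversible
Challenge: A dropped column cannot be restored
Severity: CRITICAL
Proposed resolution: Back up the column first
"""
parsed = parse_redlines(sample)
print([r.id for r in parsed])  # ['REDLINE-IMPL-1', 'REDLINE-THEORY-1']
```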
