cross-model-review

Metadata

name: cross-model-review
version: 2.0.0
description: >
  Adversarial plan review using two different AI models.
  v2: Alternating mode — models swap writer/reviewer each round.
  Fully autonomous loop — no human input between rounds.
  Use when: building features touching auth/payments/data models,
  plans that will take >1hr to implement.
  NOT for: simple one-file fixes, research tasks, quick scripts.
triggers:
  - "review this plan"
  - "cross review"
  - "challenge this"
  - "is this plan solid?"
  - "adversarial review"

When to Activate

Activate this skill when the user:

- Says any trigger phrase above
- Shares a plan and asks for adversarial/second-opinion review
- Asks you to "sanity check" a multi-step implementation plan

Do NOT activate for: simple fixes, one-liners, pure research tasks.

Modes

Static Mode (v1 — backward compatible)

Fixed roles: the planner always writes and the reviewer always reviews. A human must trigger each round.

Alternating Mode (v2 — recommended)

Models swap roles each round. Fully autonomous — no human input between rounds.

Flow:

- Round 1: Model A writes the plan. Model B reviews.
- Round 2: Model B rewrites (based on its own review). Model A reviews.
- Round 3: Model A rewrites (based on its own review). Model B reviews.
- ...and so on, alternating until both agree (the reviewer says APPROVED) or max rounds are hit.
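The role schedule above is simple enough to state as a function. A minimal sketch, assuming 1-based round numbers; roles_for_round is a hypothetical helper for illustration, not part of review.js, which tracks the current round itself and reports it through next-step:

```python
def roles_for_round(round_no: int, model_a: str, model_b: str) -> tuple[str, str]:
    """Return (writer, reviewer) for a 1-based round in alternating mode.

    Odd rounds: Model A writes and Model B reviews.
    Even rounds: Model B writes (acting on its own previous review) and Model A reviews.
    """
    if round_no < 1:
        raise ValueError("rounds are numbered from 1")
    if round_no % 2 == 1:
        return model_a, model_b
    return model_b, model_a


if __name__ == "__main__":
    for r in range(1, 5):
        writer, reviewer = roles_for_round(r, "model-a", "model-b")
        print(f"round {r}: {writer} writes, {reviewer} reviews")
```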

Why this works:

- Each model must implement its own critique: it can't nitpick without owning the fix.
- The other model then reviews that fix, catching over-engineering or proportionality issues.
- Convergence is natural: each round addresses the other model's concerns.

Autonomous Orchestration (Alternating Mode)

You (the main agent) run this loop. It's fully autonomous after kickoff.

Step 1 — Save the plan and init

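```bash
node review.js init \
  --plan /path/to/plan.md \
  --mode alternating \
  --model-a anthropic/claude-opus-4-6 \
  --model-b openai-codex/gpt-5.3-codex \
  --project-context "Brief description for reviewer calibration" \
  --out /home/ubuntu/clawd/tasks/reviews
```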

Capture the workspace path from stdout.
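If you drive init from a script rather than by hand, the call looks roughly like this. A sketch, assuming review.js sits in the working directory and prints the workspace path as the last non-empty line of stdout (the exact output format is not documented here). init_workspace and its run parameter are hypothetical; run is injectable only so the subprocess call can be faked in tests.

```python
import subprocess
from typing import Callable


def init_workspace(plan: str, model_a: str, model_b: str, context: str,
                   out: str = "tasks/reviews",
                   run: Callable[..., subprocess.CompletedProcess] = subprocess.run) -> str:
    """Run `review.js init` in alternating mode and return the workspace path."""
    cmd = ["node", "review.js", "init",
           "--plan", plan,
           "--mode", "alternating",
           "--model-a", model_a,
           "--model-b", model_b,
           "--project-context", context,
           "--out", out]
    result = run(cmd, capture_output=True, text=True)
    if result.returncode != 0:
        raise RuntimeError(f"init failed ({result.returncode}): {result.stderr.strip()}")
    # Assumption: the workspace path is the last non-empty stdout line.
    lines = [line.strip() for line in result.stdout.splitlines() if line.strip()]
    if not lines:
        raise RuntimeError("init printed no workspace path")
    return lines[-1]
```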

Step 2 — The autonomous loop

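```
while true:
    step = next-step(workspace)

    if step.action == "done":
        break  # APPROVED!

    if step.action == "max-rounds":
        ask the user: override or manual fix
        break

    if step.action == "review":
        spawn sub-agent with step.model, step.prompt
        save the response to workspace/round-N-response.json
        parse-round(workspace, round, response)
        continue

    if step.action == "revise":
        spawn sub-agent with step.model, step.prompt
        save the output plan to a temp file
        save-plan(workspace, temp file, version)
        continue
```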
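The autonomous loop for this step can be sketched as runnable Python. This assumes next-step prints its JSON on stdout and treats only exit code 2 as a failure (exit 1 is the normal REVISE outcome). The spawn argument is a hypothetical callable standing in for the agent's sub-agent spawning tool, and review_cli and run_alternating_loop are illustrative names, not part of review.js. The 120 s reviewer and 300 s writer timeouts are the values this skill uses when spawning sub-agents.

```python
import json
import subprocess
import tempfile
from pathlib import Path
from typing import Callable

# Hypothetical stand-in for the agent's sub-agent tool:
# (model, prompt, timeout_seconds) -> raw text output.
Spawn = Callable[[str, str, int], str]


def review_cli(*args: str) -> subprocess.CompletedProcess:
    """Invoke review.js (assumed to be in the working directory)."""
    return subprocess.run(["node", "review.js", *args], capture_output=True, text=True)


def run_alternating_loop(workspace: str, spawn: Spawn, cli=review_cli) -> str:
    """Follow next-step until it reports done or max-rounds; return that action."""
    while True:
        result = cli("next-step", "--workspace", workspace)
        if result.returncode == 2:
            raise RuntimeError(result.stderr)
        step = json.loads(result.stdout)
        action = step["action"]

        if action in ("done", "max-rounds"):
            return action  # caller finalizes, or asks the user on max-rounds

        if action == "review":
            output = spawn(step["model"], step["prompt"], 120)
            response = Path(workspace) / f"round-{step['round']}-response.json"
            response.write_text(output)
            args = ("parse-round", "--workspace", workspace,
                    "--round", str(step["round"]), "--response", str(response))
        elif action == "revise":
            output = spawn(step["model"], step["prompt"], 300)
            with tempfile.NamedTemporaryFile("w", suffix=".md", delete=False) as tmp:
                tmp.write(output)
            args = ("save-plan", "--workspace", workspace,
                    "--plan", tmp.name, "--version", str(step["planVersion"]))
        else:
            raise ValueError(f"unexpected action: {action!r}")

        # Exit code 1 (REVISE) is a normal outcome; only 2 signals an error.
        if cli(*args).returncode == 2:
            raise RuntimeError(f"{args[0]} failed")
```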

Step 3 — Finalize

When the loop exits with APPROVED:
```bash
node review.js finalize --workspace <workspace>
```

Present to the user: rounds taken, issues found and resolved, rubric scores, and the location of plan-final.md.

CLI Reference

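```
Commands:
  init          Create a review workspace
  next-step     Get the next action for the autonomous loop
  parse-round   Parse a reviewer response and update the issue tracker
  save-plan     Save a revised plan version produced by the writer
  finalize      Generate plan-final.md, changelog.md, summary.json
  status        Print the current workspace state

init options:
  --plan              Path to the plan file (required)
  --mode              static (default) or alternating
  --model-a           Model A, writes first (alternating mode, required)
  --model-b           Model B, reviews first (alternating mode, required)
  --reviewer-model    Reviewer model (static mode, required)
  --planner-model     Planner model (static mode, required)
  --project-context   Brief project context for reviewer calibration
  --out               Base output directory (default: tasks/reviews)
  --max-rounds        Maximum rounds (default: 5 static, 8 alternating)
  --token-budget      Context token budget (default: 8000)

next-step options:
  --workspace         Review workspace path (required)
  Returns JSON: { action, model, round, prompt, planVersion, saveTo }
  Actions: review, revise, done, max-rounds

parse-round options:
  --workspace         Review workspace path (required)
  --round             Round number (required)
  --response          Path to the raw reviewer response file (required)

save-plan options:
  --workspace         Review workspace path (required)
  --plan              Path to the revised plan markdown file (required)
  --version           Plan version number (required)

finalize options:
  --workspace         Review workspace path (required)
  --override-reason   Reason for forcing approval while issues remain open
  --ci-force          Required to override in non-TTY mode

status options:
  --workspace         Review workspace path (required)

Exit codes:
  0  Approved / OK
  1  Revise / max rounds hit
  2  Error
```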
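finalize is the one command whose flags interact: --override-reason forces approval while issues remain open, and in non-TTY mode that override additionally requires --ci-force. A sketch of building its argument list and reading the documented exit codes (0 approved/ok, 1 revise or max rounds hit, 2 error). finalize_args, classify_exit, and finalize are hypothetical helpers, and detecting TTY mode via sys.stdin.isatty() is an assumption about what the CLI checks.

```python
import subprocess
import sys
from typing import List, Optional

# Documented review.js exit codes.
EXIT_MEANINGS = {0: "approved/ok", 1: "revise/max-rounds", 2: "error"}


def finalize_args(workspace: str, override_reason: Optional[str] = None,
                  interactive: Optional[bool] = None) -> List[str]:
    """Build the argv for `node review.js finalize`."""
    args = ["node", "review.js", "finalize", "--workspace", workspace]
    if override_reason:
        # Forcing approval while issues are still open.
        args += ["--override-reason", override_reason]
        if interactive is None:
            interactive = sys.stdin.isatty()  # assumption: "TTY mode" means stdin
        if not interactive:
            args.append("--ci-force")  # required for overrides in non-TTY mode
    return args


def classify_exit(code: int) -> str:
    """Map a review.js exit code to its documented meaning."""
    if code not in EXIT_MEANINGS:
        raise ValueError(f"unexpected exit code: {code}")
    return EXIT_MEANINGS[code]


def finalize(workspace: str, override_reason: Optional[str] = None) -> str:
    """Run finalize and return the meaning of its exit code."""
    return classify_exit(subprocess.run(finalize_args(workspace, override_reason)).returncode)
```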

Detailed Orchestration (for agent implementation)

Spawning reviewers

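```
step = next-step(workspace)  # action: review
response = sessions_spawn(model=step.model, task=step.prompt, timeout=120s)

# Save the raw response to workspace/round-{step.round}-response.json

parse-round(workspace, step.round, response_file)
```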

System instruction for reviewer: "You are a senior engineering reviewer. Output ONLY valid JSON matching the schema. No tool calls. No markdown fences. No preamble."
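Since the reviewer must return bare JSON, a strict agent-side check catches fences and preambles before the response reaches parse-round. A minimal sketch; parse_reviewer_output is hypothetical, and because the review schema is not reproduced here it only checks that the whole response is a single JSON object, not the schema's fields.

```python
import json

FENCE = "`" * 3  # a markdown code fence marker


def parse_reviewer_output(text: str) -> dict:
    """Accept only a bare JSON object, per the reviewer system instruction.

    Any preamble or trailing prose makes json.loads fail, because the whole
    whitespace-trimmed response must parse as one JSON value.
    """
    stripped = text.strip()
    if stripped.startswith(FENCE):
        raise ValueError("response is wrapped in a markdown fence")
    try:
        data = json.loads(stripped)
    except json.JSONDecodeError as exc:
        raise ValueError(f"Your response was not valid JSON: {exc}") from exc
    if not isinstance(data, dict):
        raise ValueError("expected a JSON object at the top level")
    return data
```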

Spawning writers

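```
step = next-step(workspace)  # action: revise
revised_plan = sessions_spawn(model=step.model, task=step.prompt, timeout=300s)

# Save the raw output as a temp file

save-plan(workspace, temp_file, step.planVersion)
```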

System instruction for writer: none needed — the prompt is self-contained.

Error handling

- Reviewer timeout/failure: retry once, then ask the user
- Writer timeout/failure: retry once, then ask the user
- Parse error on review JSON: re-prompt the reviewer once with "Your response was not valid JSON"
- Max rounds hit: present the status to the user and ask for an override or a manual fix
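Both retry rules can be sketched as small wrappers. This assumes a sub-agent failure surfaces as a Python exception and that the re-prompt is sent by appending the message to the original prompt. retry_once, review_with_json_reprompt, and the ask_user / ask_reviewer callables are hypothetical.

```python
import json
from typing import Callable, Optional, TypeVar

T = TypeVar("T")


def retry_once(task: Callable[[], T], ask_user: Callable[[str], T]) -> T:
    """Run task; on failure retry once, then hand the decision to the user."""
    last_error: Optional[Exception] = None
    for _attempt in range(2):  # the first try plus exactly one retry
        try:
            return task()
        except Exception as exc:  # e.g. a sub-agent timeout or spawn failure
            last_error = exc
    return ask_user(f"Sub-agent failed twice: {last_error}")


def review_with_json_reprompt(ask_reviewer: Callable[[str], str], prompt: str) -> dict:
    """Re-prompt the reviewer once when its reply is not valid JSON."""
    reply = ask_reviewer(prompt)
    try:
        return json.loads(reply)
    except json.JSONDecodeError:
        # Hypothetical: the follow-up is appended to the original prompt.
        reply = ask_reviewer(prompt + "\n\nYour response was not valid JSON")
        return json.loads(reply)  # a second failure propagates to the caller
```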

Convergence

The loop converges when the reviewer says APPROVED and no CRITICAL/HIGH blockers remain open. The script enforces this: if the reviewer says APPROVED while blockers remain, the verdict is overridden to REVISE.
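The override rule can be sketched as a pure function. The issue fields used here (severity, status, and the value "open") are assumptions, since the issues.json format is not shown in this document; effective_verdict is a hypothetical name, not the script's actual implementation.

```python
BLOCKING_SEVERITIES = {"CRITICAL", "HIGH"}


def effective_verdict(reviewer_verdict: str, issues: list[dict]) -> str:
    """Downgrade APPROVED to REVISE while any CRITICAL/HIGH issue is still open."""
    open_blockers = [
        issue for issue in issues
        if issue.get("severity") in BLOCKING_SEVERITIES and issue.get("status") == "open"
    ]
    if reviewer_verdict == "APPROVED" and open_blockers:
        return "REVISE"
    return reviewer_verdict
```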

Static Mode (v1 — backward compatible)

For static mode, the original orchestration from v1 still works:

Step 1 — Init

```bash
node review.js init --plan <path> --reviewer-model <model> --planner-model <model>
```

Step 2 — Manual loop

For each round: build the reviewer prompt from the template, spawn the reviewer, run parse-round on its response, revise the plan yourself, and continue.

Step 3 — Finalize

Same as alternating mode.

Integration with coding-agent

Before dispatching any plan to coding-agent that:

- Touches auth, payments, or data models
- Has 3+ implementation steps
- Has not already been reviewed adversarially by the user

Run cross-model-review first, and only proceed if it exits with code 0.
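One reading of this gate as code. The list does not say how the criteria combine; this sketch assumes either of the first two makes a plan risky and the third must also hold. PlanSummary, needs_cross_review, and may_dispatch are hypothetical names, not types or functions provided by cross-model-review or coding-agent.

```python
from dataclasses import dataclass

SENSITIVE_AREAS = {"auth", "payments", "data models"}


@dataclass
class PlanSummary:
    # Hypothetical summary of a plan, filled in by the dispatching agent.
    areas: set[str]
    implementation_steps: int
    already_reviewed_adversarially: bool


def needs_cross_review(plan: PlanSummary) -> bool:
    """Assumed logic: (sensitive area OR 3+ steps) AND not already reviewed."""
    risky = bool(plan.areas & SENSITIVE_AREAS) or plan.implementation_steps >= 3
    return risky and not plan.already_reviewed_adversarially


def may_dispatch(review_exit_code: int) -> bool:
    """Only hand the plan to coding-agent when cross-model-review exited 0."""
    return review_exit_code == 0
```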

Notes

- The workspace persists in tasks/reviews/, so it can be referenced later
- issues.json tracks the full lifecycle of all issues
- meta.json stores the mode, models, current round, verdict, and needsRevision flag
- next-step is the state machine: always call it to determine what to do next
- Dedup warnings help catch semantic drift across rounds
- Models must be from different provider families (cross-provider enforcement)
- --project-context is injected into reviewer prompts for calibration
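The cross-provider rule needs some notion of a provider family. A sketch, assuming model IDs have the form provider/model and that the family is the provider prefix with any "-suffix" dropped, so openai-codex/gpt-5.3-codex counts as openai. How review.js actually groups providers may differ; provider_family and check_cross_provider are hypothetical.

```python
def provider_family(model_id: str) -> str:
    """Derive a provider family from an ID such as 'anthropic/claude-opus-4-6'."""
    if "/" not in model_id:
        raise ValueError(f"expected 'provider/model', got {model_id!r}")
    provider = model_id.split("/", 1)[0]
    # Assumption: 'openai-codex' and 'openai' belong to the same family.
    return provider.split("-", 1)[0].lower()


def check_cross_provider(model_a: str, model_b: str) -> None:
    """Raise if both models come from the same provider family."""
    fam_a, fam_b = provider_family(model_a), provider_family(model_b)
    if fam_a == fam_b:
        raise ValueError(f"both models are from the {fam_a!r} family; pick different providers")
```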
