Adversarial Code Review
Overview
A multi-agent review pattern in which one agent builds (or authors) the code, a second agent critiques it, and a third agent critiques the review itself. This layered adversarial approach filters out low-value nitpicks and surfaces only the high-confidence, high-priority issues that deserve human attention.
Core principle: Fewer, higher-quality review comments build trust. Filter ruthlessly, keeping only findings that are both high confidence and high priority. Target roughly two comments per PR.
Dependencies: the Claude Code CLI (for --append-system-prompt) and, optionally, a different model for the review pass than the one used for writing.
When to Use
- Reviewing pull requests before merge, especially when review quality matters more than speed
- As a CI-integrated automated reviewer that developers actually read instead of ignore
- When existing automated reviews produce too much noise and developers have stopped trusting them
- During versioned critique cycles where a plan or design needs iterative refinement
When NOT to Use
- Trivial PRs (typo fixes, dependency bumps, single-line config changes)
- When you need instant feedback during live pairing sessions (too slow for interactive use)
- As a replacement for human review on security-critical or compliance-gated changes
Common Mistakes
| Mistake | Why it's wrong |
|---|---|
| Surfacing every finding to the developer | Noise kills trust. Developers stop reading reviews that cry wolf. Filter to ~2 high-priority, high-confidence comments per PR. |
| Using the same model for writing and reviewing | The model is biased toward its own patterns. Use a different model for review than the one that wrote the code; it catches different classes of issues. |
| Skipping the meta-reviewer (third agent) | Without a check on the reviewer, you get false positives and nitpicks dressed up as critical findings. The meta-reviewer filters the reviewer's output. |
| Running adversarial review without priming the critic | A neutral prompt produces polite, hedging reviews. Tell the reviewer the code likely contains bugs to prime it for genuine criticality. |
| Treating all review comments as equal priority | Without confidence and priority scoring, developers cannot triage. Every comment must carry explicit confidence (high/medium/low) and priority (high/medium/low). |
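The last row is easier to enforce when the reviewer emits one finding per line in a fixed, parseable shape. The sketch below assumes a hypothetical pipe-delimited record, confidence|priority|location|issue|impact|fix, and a hypothetical helper, has_valid_scores, that rejects any finding whose scores are missing or fall outside high/medium/low. It checks the format only, not whether the scores are deserved.

```bash
# Sketch: check that one finding carries explicit confidence and priority.
# Assumes a hypothetical record format, one finding per line:
#   confidence|priority|location|issue|impact|fix
# has_valid_scores LINE succeeds only if both scores are high, medium, or low.
has_valid_scores() {
  printf '%s\n' "$1" | awk -F'|' '
    $1 ~ /^(high|medium|low)$/ && $2 ~ /^(high|medium|low)$/ { ok = 1 }
    END { exit (ok ? 0 : 1) }'
}
```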
The Three-Agent Adversarial Pattern
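```dot
digraph adversarial {
  "Code Change (PR)" -> "Agent 1: Builder";
  "Agent 1: Builder" -> "Agent 2: Reviewer" [label="code + context"];
  "Agent 2: Reviewer" -> "Agent 3: Meta-Reviewer" [label="review comments"];
  "Agent 3: Meta-Reviewer" -> "Filtered Output" [label="~2 high-signal comments"];
  "Agent 3: Meta-Reviewer" -> "Agent 2: Reviewer" [label="rejected comments", style=dashed];
}
```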
Step 1: Set up the builder context
Agent 1 is the code author (or a proxy that understands the change). It produces:
- The diff itself
- A summary of intent ("what this PR is trying to do")
- Related context (linked issues, design docs, test plan)
If you are reviewing someone else's PR, have an agent read the PR description, diff, and linked issues to reconstruct this context.
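When the builder context has to be reconstructed, one simple option is to bundle intent, diff, and related material into a single document the reviewer reads. A minimal sketch; the helper name build_context, its argument order, and the section headings are illustrative assumptions, not part of any tool:

```bash
# Sketch: bundle Agent 1's builder context into one document on stdout.
# Usage (hypothetical helper): build_context INTENT DIFF_FILE [CONTEXT_FILE...]
#   INTENT        short summary of what the PR is trying to do
#   DIFF_FILE     file holding the diff, e.g. saved from git diff main...HEAD
#   CONTEXT_FILE  optional extras: linked issues, design docs, test plan
build_context() {
  local intent="$1" diff_file="$2"
  local ctx
  shift 2
  printf '## Intent\n%s\n\n## Diff\n' "$intent"
  cat "$diff_file"
  for ctx in "$@"; do
    printf '\n## Context: %s\n' "$ctx"
    cat "$ctx"
  done
}
```

For example, assuming the GitHub CLI is installed and PR_NUMBER is set: git diff main...HEAD > pr.diff, then build_context "$(gh pr view "$PR_NUMBER" --json body --jq .body)" pr.diff docs/design.md > context.md, where docs/design.md stands in for whatever design doc the PR links.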
Step 2: Run the adversarial reviewer (Agent 2)
Spawn a sub-agent with a system prompt that primes it for critical review. The key trick: tell the reviewer the code likely contains bugs.
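```bash
git diff main...HEAD | claude -p --append-system-prompt "You are a senior engineer reviewing code that was just generated by an AI coding agent. The agent likely introduced subtle bugs, security issues, or logic errors. Your job is to find them. Do not be polite. Do not hedge. For each issue you find, assign:
- Confidence: high | medium | low
- Priority: high | medium | low
- Category: bug | security | performance | logic | style
Only report issues with medium or higher confidence." \
  "Review this PR diff for bugs and issues."
```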
Model selection: If the code was written by Claude, use a different model for review. Cross-model review catches different failure modes.
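One way to pair the priming trick with machine-readable output is to keep the reviewer's system prompt in a function and ask for one finding per line. A sketch under stated assumptions: the helper build_reviewer_prompt and the pipe-delimited output format are hypothetical conventions, and the wording is illustrative rather than a tuned prompt.

```bash
# Sketch: build the adversarial system prompt for Agent 2 (the reviewer).
# The requested output format is a hypothetical convention that makes
# later filtering scriptable; adapt it to whatever your pipeline parses.
build_reviewer_prompt() {
  cat <<'EOF'
You are a senior engineer reviewing code that an AI coding agent just wrote.
The agent likely introduced subtle bugs, security issues, or logic errors.
Your job is to find them. Do not be polite. Do not hedge.
Report one finding per line in exactly this format:
confidence|priority|file:line-range|issue|why it matters|suggested fix
confidence and priority must each be high, medium, or low.
Only report findings with medium or higher confidence.
EOF
}
```

It plugs into the CLI flags this pattern already uses, for example: git diff main...HEAD | claude -p --append-system-prompt "$(build_reviewer_prompt)" "Review this PR diff for bugs and issues." > review.txt. Following the model-selection advice, the same prompt text can be sent to a different vendor's model for the review pass.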
Step 3: Run the meta-reviewer (Agent 3)
The meta-reviewer receives Agent 2's comments and filters them. Its job is to reject false positives, remove nitpicks that escaped the confidence filter, and validate that remaining comments are actionable.
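```bash
# REVIEWER_OUTPUT holds Agent 2's review, captured from the previous step
claude -p --append-system-prompt "You are reviewing a code review. The reviewer may have produced false positives, nitpicks dressed up as bugs, or low-value comments. Your job is to filter ruthlessly. Keep only comments that are:
1. Genuinely high-confidence issues (not speculative)
2. High priority (would cause bugs, security vulnerabilities, or data loss)
3. Actionable (the developer knows exactly what to fix)
Reject everything else. Target output: 1-3 comments maximum. If the code is fine, say so." \
  "Here is the code review to evaluate:

$REVIEWER_OUTPUT

And the original diff for reference:

$(git diff main...HEAD)"
```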
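Because the meta-reviewer is itself a model, its output can still drift from the contract. A deterministic backstop after Agent 3 can enforce the mechanical part of it. This is a minimal sketch: filter_high_signal is a hypothetical helper, it assumes one finding per line in a confidence|priority|location|issue|impact|fix format, and the default cap of 3 is an assumption chosen to stay near the roughly-two-comments target. It cannot judge false positives; that remains Agent 3's job.

```bash
# Sketch: deterministic backstop applied after the meta-reviewer (Agent 3).
# Reads findings on stdin, one per line, in the hypothetical format
#   confidence|priority|location|issue|impact|fix
# and prints at most MAX (default 3) that are high confidence AND high priority.
filter_high_signal() {
  local max="${1:-3}"
  awk -F'|' -v max="$max" '
    $1 == "high" && $2 == "high" && kept < max + 0 { print; kept++ }'
}
```

An empty result is a legitimate outcome: nothing survived, and a CI job can skip posting a comment in that case.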
Step 4: Format and deliver the filtered output
The final output should contain only the surviving comments, each with:
- Location: File and line range
- Issue: One-sentence description
- Why it matters: Impact if not fixed
- Suggested fix: Concrete code change
- Confidence: high
- Priority: high
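The Step 4 fields map directly onto a small template. A minimal sketch, again assuming a hypothetical confidence|priority|location|issue|impact|fix record per finding; format_comment and the list layout are illustrative choices:

```bash
# Sketch: render one surviving finding as a PR comment with the Step 4 fields.
# Input: one hypothetical record  confidence|priority|location|issue|impact|fix
format_comment() {
  printf '%s\n' "$1" | awk -F'|' '{
    printf "- Location: %s\n", $3
    printf "- Issue: %s\n", $4
    printf "- Why it matters: %s\n", $5
    printf "- Suggested fix: %s\n", $6
    printf "- Confidence: %s\n", $1
    printf "- Priority: %s\n", $2
  }'
}
```

Running each surviving line through format_comment yields the final comment body. Because the record is pipe-delimited, a suggested fix that itself contains a | character (a shell pipeline, say) would need a different encoding, such as JSON.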
CI Integration with --append-system-prompt
To run this as an automated CI reviewer, add a workflow step that pipes the diff through the three-agent chain:
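```yaml
# .github/workflows/adversarial-review.yml
name: Adversarial Code Review
on:
  pull_request:
    types: [opened, synchronize]
permissions:
  contents: read
  pull-requests: write
jobs:
  review:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
        with:
          fetch-depth: 0
      - name: Install Claude Code CLI
        run: npm install -g @anthropic-ai/claude-code
      - name: Run adversarial review
        env:
          ANTHROPIC_API_KEY: ${{ secrets.ANTHROPIC_API_KEY }}
          GH_TOKEN: ${{ github.token }}
        run: |
          DIFF=$(git diff origin/main...HEAD)
          # Agent 2: adversarial reviewer (diff arrives on stdin)
          REVIEW=$(printf '%s\n' "$DIFF" | claude -p --append-system-prompt \
            "You are reviewing code that an AI agent just wrote. It likely introduced bugs. Find them. For each issue, give confidence (high/medium/low), priority (high/medium/low), file, line range, and a concrete fix. Only report issues with medium or higher confidence." \
            "Review this diff.")
          # Agent 3: meta-reviewer filters the review against the diff
          FILTERED=$(printf 'Review to filter:\n\n%s\n\nOriginal diff:\n\n%s\n' "$REVIEW" "$DIFF" | claude -p --append-system-prompt \
            "Filter this code review to only high-confidence, high-priority issues. Max 3 comments. Reject nitpicks and false positives." \
            "Filter the code review provided as input.")
          # Post as a PR comment; quoting keeps the newlines an unquoted echo would collapse
          printf '%s\n' "$FILTERED" > review.md
          gh pr comment ${{ github.event.pull_request.number }} --body-file review.md
```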
The --append-system-prompt flag is what makes this work in CI: it appends the adversarial priming to Claude Code's default system prompt instead of the user message, so the diff stays clean as the primary input.
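Since trivial PRs (typo fixes, dependency bumps, single-line config changes) are a poor fit for this pattern, a CI job may want to skip the chain for very small diffs. A minimal sketch, assuming total changed lines is an acceptable proxy for "trivial"; is_trivial_change is a hypothetical helper and the default threshold of 5 lines is arbitrary. It reads git diff --numstat output on stdin.

```bash
# Sketch: decide whether a diff is small enough to skip adversarial review.
# Reads git diff --numstat output (added<TAB>deleted<TAB>path) on stdin.
# Binary files report "-" for both counts; they are treated as non-trivial.
# Succeeds (exit 0) when total changed lines <= THRESHOLD (default 5).
is_trivial_change() {
  local threshold="${1:-5}"
  awk -v limit="$threshold" '
    $1 == "-" || $2 == "-" { binary = 1; next }
    { total += $1 + $2 }
    END { exit ((binary == 0 && total <= limit + 0) ? 0 : 1) }'
}
```

In the workflow, the gate would run before the reviewer step, e.g. if git diff --numstat origin/main...HEAD | is_trivial_change; then echo "Trivial change, skipping review"; exit 0; fi. A line count misses dependency bumps that rewrite large lockfiles, so a path-based check may be needed alongside it.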
Versioned Critique Cycle (For Plans and Designs)
For longer-form work like architecture plans, use a versioned critique loop:
1. plan_v1: Initial plan authored by Agent 1
2. critique_opus_v1: Critique from one model (e.g., Claude)
3. critique_gpt_v1: Critique from a different model (e.g., GPT) for cross-model coverage
4. revise: Author incorporates valid critiques
5. plan_v2: Revised plan; repeat if needed
This ensures the plan survives scrutiny from multiple perspectives before implementation begins. Each critique version is saved so you can trace which feedback was incorporated and which was rejected (and why).
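Saving every artifact under a predictable name is what makes the cycle traceable. A minimal sketch of one naming scheme for round N; the helper cycle_artifacts and the .md extension are assumptions, and the stems follow the plan/critique naming used in this section:

```bash
# Sketch: print the artifact names for critique round N of the versioned
# cycle: the plan under review, one critique per model, and the revision.
cycle_artifacts() {
  local n="$1"
  printf '%s\n' \
    "plan_v${n}.md" \
    "critique_opus_v${n}.md" \
    "critique_gpt_v${n}.md" \
    "plan_v$((n + 1)).md"
}
```

For example, cycle_artifacts 1 lists plan_v1.md, critique_opus_v1.md, critique_gpt_v1.md, and plan_v2.md.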
Quick Reference
| Item | Details |
|---|---|
| Agent 1 (Builder) | Produces diff, intent summary, and context |
| Agent 2 (Reviewer) | Adversarially primed critic; scores by confidence + priority |
| Agent 3 (Meta-Reviewer) | Filters reviewer output, rejects false positives |
| Target output | ~2 high-confidence, high-priority comments per PR |
| CI flag | --append-system-prompt for injecting adversarial priming |
| Model strategy | Use a different model for review than for writing |
| Priming trick | "AI agent likely introduced bugs, find them" |
| Versioned cycle | plan_v1 -> critique_v1 -> revise -> plan_v2 |
Key Principles
1. Fewer comments build more trust. A reviewer that posts two real bugs gets read. A reviewer that posts twenty nitpicks gets ignored. Filter to ~2 high-priority, high-confidence findings.
2. Cross-model review catches what self-review misses. A model is biased toward its own idioms. Use a different model for review than the one that wrote the code.
3. Prime the critic for genuine adversarial behavior. Telling the reviewer "an AI agent likely introduced bugs, find them" produces dramatically more critical and useful reviews than a neutral prompt.
4. The meta-reviewer is non-negotiable. Without a third agent checking the reviewer, false positives leak through and erode developer trust in the entire system.
5. Every comment must be actionable. If the developer cannot immediately understand what to fix and why, the comment has failed. Include file, line, impact, and concrete fix.
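The actionability rule can be partly checked before a comment is posted. A minimal sketch, assuming a hypothetical confidence|priority|location|issue|impact|fix record: is_actionable only confirms that the location names a line (path:line or path:start-end) and that issue, impact, and fix are non-blank. Whether the fix is correct still needs the meta-reviewer or a human.

```bash
# Sketch: reject findings that lack what a developer needs to act on them.
# Input: one hypothetical record  confidence|priority|location|issue|impact|fix
# Requires a location of the form path:line or path:start-end and
# non-blank issue, impact, and fix fields.
is_actionable() {
  printf '%s\n' "$1" | awk -F'|' '
    NF >= 6 && $3 ~ /:[0-9]+(-[0-9]+)?$/ &&
    $4 ~ /[^ \t]/ && $5 ~ /[^ \t]/ && $6 ~ /[^ \t]/ { ok = 1 }
    END { exit (ok ? 0 : 1) }'
}
```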
Attribution
Based on techniques from the Coding Agents: AI Driven Dev Conference. Sid (Anthropic) demonstrated the three-agent adversarial pattern and CI integration with --append-system-prompt. Ankit (Databricks) contributed the cross-model review strategy. Demetrios introduced the priming trick of telling the reviewer that bugs were likely introduced. Chad described the versioned critique cycle for iterative plan refinement.