Council v2
A hardened OpenClaw skill for multi-model council reviews.
It dispatches independent reviewers, collects structured JSON, and applies a
mechanical synthesis protocol so the final verdict is driven by votes and
critical findings — not orchestrator vibes.
Primary entrypoint: `bash skills/council-v2/scripts/council.sh review [file]`
When to Use
Use when a single model reviewing its own work is not enough:
- Code review before merge or deployment
- Plan review before committing resources
- Architecture review for important technical decisions
- Decision review when multiple plausible options exist
- Security-sensitive or irreversible choices
- Pre-flight review, adversarial critique, or second-opinion work
When Not to Use
Do not use for:
- One-line fixes or trivial edits
- Low-stakes decisions where overhead exceeds risk
- Purely factual lookups with no judgment call
- Work already reviewed recently with no material change
Council Shape
Two tiers are supported:
- Standard: 3 reviewers for routine code, plan, and decision reviews
- Full: 5 reviewers for high-stakes, security-sensitive, or irreversible choices
Tier selection heuristic
Use Standard when: routine code changes, internal plans, reversible decisions,
low blast radius. Use Full when: security-critical, production-facing architecture,
irreversible commitments, high cost of being wrong, or when you want maximum coverage.
When in doubt, start Standard. Escalate to Full if the Standard result is split or
if critical findings surface that need more perspectives.
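The heuristic above can be sketched as a small function. This is only an illustration: the `ReviewContext` flags, the function names, and the reading of "split" as "any disagreement among votes" are assumptions, not part of the skill.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class ReviewContext:
    """Hypothetical risk flags describing the work under review."""
    security_critical: bool = False
    production_facing: bool = False
    irreversible: bool = False
    high_cost_of_error: bool = False


def choose_tier(ctx: ReviewContext) -> str:
    """Return "full" if any high-stakes flag is set, otherwise "standard"."""
    high_stakes = (
        ctx.security_critical
        or ctx.production_facing
        or ctx.irreversible
        or ctx.high_cost_of_error
    )
    return "full" if high_stakes else "standard"


def should_escalate(tier: str, votes: List[str], critical_findings: int) -> bool:
    """Escalate a Standard run when votes disagree or criticals surfaced.

    Assumption: "split" means the reviewers did not all cast the same vote.
    """
    if tier != "standard":
        return False
    split = len(set(votes)) > 1
    return split or critical_findings > 0
```

When in doubt, `choose_tier(ReviewContext())` returns `"standard"`, which matches the "start Standard, escalate if needed" advice.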
Cost note
Full Council runs 5 model calls instead of 3, roughly 1.7x the token cost of Standard.
Use Full when the cost of a bad decision exceeds the cost of the extra API calls,
which is almost always the case for security, architecture, and irreversible choices.
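The arithmetic behind the multiplier is simple enough to check directly. The sketch below assumes every reviewer call costs about the same; the function names and the dollar figures in the usage note are hypothetical.

```python
STANDARD_CALLS = 3  # reviewers in the Standard tier
FULL_CALLS = 5      # reviewers in the Full tier


def full_cost_multiplier() -> float:
    """Token cost of Full relative to Standard, assuming equal cost per call."""
    return FULL_CALLS / STANDARD_CALLS


def extra_call_cost(cost_per_call: float) -> float:
    """Additional spend for upgrading one review from Standard to Full."""
    return (FULL_CALLS - STANDARD_CALLS) * cost_per_call


def full_is_justified(cost_of_bad_decision: float, cost_per_call: float) -> bool:
    """Apply the rule of thumb: bad-decision cost must exceed the extra calls."""
    return cost_of_bad_decision > extra_call_cost(cost_per_call)
```

For example, with a made-up cost of 0.50 per reviewer call, the upgrade costs 1.00 extra, so almost any decision with real downside clears the bar.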
Detailed role composition and synthesis rules live in:
- `references/review-types.md`
- `references/role-prompts.md`
- `references/synthesis-rules.md`
Review Types
| Type | Typical use |
|---|---|
| `code` | Source files, scripts, patches, PR diffs |
| `plan` | Proposals, project plans, rollout plans |
| `architecture` | Systems design, infra decisions, workflows |
| `decision` | A/B/C choices with tradeoffs |

Definitions: `references/review-types.md`
Quick Start
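```bash
# Standard code review
bash skills/council-v2/scripts/council.sh review code src/auth.py

# Force a Full-tier plan review
bash skills/council-v2/scripts/council.sh review plan proposal.md --tier full

# Architecture review from stdin
cat design.md | bash skills/council-v2/scripts/council.sh review architecture --tier full

# Decision review with options (quoted so "Cloud SQL" stays one argument)
bash skills/council-v2/scripts/council.sh review decision options.md --options "SQLite,Postgres,Cloud SQL"

# Emit the orchestration plan as JSON
bash skills/council-v2/scripts/council.sh review code src/auth.py --format json
```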
How It Works
1. Loads content from file or stdin
2. Selects Standard or Full tier
3. Builds reviewer prompts from `references/role-prompts.md`
4. Emits an orchestration plan suitable for `sessions_spawn`
5. Collects reviewer JSON outputs
6. Runs `python3 scripts/synthesize.py ...`
7. Returns synthesis with mechanical result, minority report, and conditions
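To make the last three steps concrete, here is a toy version of a vote-driven synthesis. It is one plausible reading of the exit-code table under Interpreting Results, not the skill's actual protocol, which lives in the synthesis rules reference. The reviewer record shape (`reviewer`, `vote`, `findings` with `severity` and `text`), the vote labels, and the snake_case output keys are all assumptions.

```python
from collections import Counter
from typing import Dict, List

# Exit codes follow the table under "Interpreting Results".
EXIT_CODES = {"approve": 0, "reject": 1, "blocked": 1, "approve_with_conditions": 2}


def synthesize(reviews: List[Dict]) -> Dict:
    """Toy mechanical synthesis over hypothetical reviewer records."""
    if not reviews:
        raise ValueError("no reviewer outputs to synthesize")

    votes = Counter(r["vote"] for r in reviews)
    findings = [f for r in reviews for f in r.get("findings", [])]
    critical_blocks = [f["text"] for f in findings if f["severity"] == "critical"]
    conditions = [f["text"] for f in findings if f["severity"] == "warning"]

    n = len(reviews)
    if critical_blocks:                   # any critical finding blocks approval
        result = "blocked"
    elif votes["reject"] * 2 > n:         # strict majority rejected
        result = "reject"
    elif votes["approve"] * 2 > n:        # strict majority approved
        result = "approve"
    else:                                 # mixed or conditional majority
        result = "approve_with_conditions"

    # Dissenters are reviewers who did not cast the most common vote.
    majority_vote = votes.most_common(1)[0][0]
    minority = [r["reviewer"] for r in reviews if r["vote"] != majority_vote]

    return {
        "mechanical_result": result,
        "critical_blocks": critical_blocks,
        "conditions": conditions,
        "minority_report": minority,
        "exit_code": EXIT_CODES[result],
    }
```

The point of the sketch is that nothing in the verdict depends on the orchestrator's judgment: the same reviewer outputs always produce the same result.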
Interpreting Results
The synthesizer returns structured JSON and a meaningful exit code:
| Exit code | Meaning | What to do |
|---|---|---|
| `0` | Approve: clear majority, no criticals | Ship it |
| `1` | Reject or Blocked: majority rejected or a critical finding blocked approval | Address the critical findings or rethink the approach |
| `2` | Approve with conditions: mixed or conditional majority | Fix the flagged conditions, then re-review or proceed with documented risk |
| `3` | Error: invalid input or synthesis failure | Check reviewer JSON for malformed output; see Error Handling below |
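A caller can branch on the exit code without parsing any output. The sketch below maps each code from the table to its recommended action; the helper names are hypothetical, and the demo uses a stand-in Python one-liner in place of the real synthesizer so that it runs anywhere.

```python
import subprocess
import sys
from typing import List

# Actions copied from the exit-code table above.
ACTIONS = {
    0: "approve: ship it",
    1: "reject or blocked: address the critical findings or rethink the approach",
    2: "approve with conditions: fix flagged conditions, then re-review or proceed with documented risk",
    3: "error: check reviewer JSON for malformed output",
}


def interpret_exit_code(code: int) -> str:
    """Translate a synthesizer exit code into the recommended next step."""
    return ACTIONS.get(code, f"unknown exit code {code}")


def run_and_interpret(cmd: List[str]) -> str:
    """Run a command and interpret its return code using the table."""
    result = subprocess.run(cmd, capture_output=True, text=True)
    return interpret_exit_code(result.returncode)


if __name__ == "__main__":
    # Stand-in process that exits with code 2; swap in the real synthesizer call.
    stand_in = [sys.executable, "-c", "import sys; sys.exit(2)"]
    print(run_and_interpret(stand_in))
```

Treating unknown codes as their own case avoids silently reading a crash as an approval.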
Reading the synthesis output
- `mechanical_result`: The vote-driven verdict. This is the answer.
- `critical_blocks`: Any critical findings that auto-blocked approval. Address these first.
- `conditions`: Aggregated recommendations from warning-level findings. These are your fix list.
- `minority_report`: The strongest dissent from the majority. Read this even if you agree with the majority; it is often where the best insight lives.
- `anti_consensus_check`: Fires on unanimous decisions. Treat the counterargument seriously.
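The reading order above can be turned into a small triage helper. This is a sketch under assumptions: the key names are the snake_case spellings of the fields listed above, and the value shapes (lists of strings for blocks and conditions, a string for the dissent and counterargument) are guesses rather than the documented schema.

```python
import json
from typing import List


def triage(synthesis_json: str) -> List[str]:
    """Return the synthesis fields as an ordered reading list."""
    data = json.loads(synthesis_json)
    lines = [f"VERDICT: {data.get('mechanical_result', 'unknown')}"]
    for block in data.get("critical_blocks") or []:
        lines.append(f"FIX FIRST: {block}")            # criticals come first
    for cond in data.get("conditions") or []:
        lines.append(f"TODO: {cond}")                  # then the fix list
    if data.get("minority_report"):
        lines.append(f"DISSENT: {data['minority_report']}")
    if data.get("anti_consensus_check"):
        lines.append(f"COUNTERARGUMENT: {data['anti_consensus_check']}")
    return lines


if __name__ == "__main__":
    sample = json.dumps({
        "mechanical_result": "approve_with_conditions",
        "critical_blocks": [],
        "conditions": ["add rate limiting"],
        "minority_report": "Reviewer C: rollback plan untested",
        "anti_consensus_check": None,
    })
    print("\n".join(triage(sample)))
```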
Error Handling
Reviewer returns invalid JSON
`synthesize.py` validates every reviewer output against required fields. If a reviewer
returns malformed JSON, synthesis exits with code 3 and prints an error message.
What to do:
1. Check the raw reviewer output for the failing model
2. Re-run that single reviewer (the orchestration plan shows which models to dispatch)
3. If the model consistently fails, substitute it (see Model override flags below)
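Before re-running a reviewer, it can help to see exactly why its output failed. The checker below is a minimal sketch of required-field validation; the field names in `REQUIRED_FIELDS` are hypothetical placeholders, and the authoritative list is whatever `references/schema.md` defines.

```python
import json
from typing import List

# Hypothetical required fields; replace with the ones from the real schema.
REQUIRED_FIELDS = ("reviewer", "vote", "findings")


def validate_reviewer_output(raw: str) -> List[str]:
    """Return a list of problems; an empty list means the output passed."""
    try:
        data = json.loads(raw)
    except json.JSONDecodeError as exc:
        return [f"not valid JSON: {exc.msg}"]
    if not isinstance(data, dict):
        return ["top-level JSON value must be an object"]
    return [
        f"missing required field: {name}"
        for name in REQUIRED_FIELDS
        if name not in data
    ]
```

Running this over each raw output identifies the failing reviewer, which is the one to re-dispatch in step 2.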
Provider is down or times out
If a provider fails to respond, the review set will be incomplete. Run synthesis on
whatever outputs you have — a 2-of-3 Standard review is still useful. Note the missing
reviewer in your assessment.
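One way to keep an incomplete run honest is to compute the coverage note mechanically. In this sketch the reviewer names and the `outputs` mapping (reviewer name to raw output, or `None` when the provider failed) are assumptions for illustration.

```python
from typing import Dict, List, Optional, Tuple


def split_available(
    expected: List[str], outputs: Dict[str, Optional[str]]
) -> Tuple[Dict[str, str], List[str]]:
    """Separate reviewers that returned output from those that did not."""
    available = {name: outputs[name] for name in expected if outputs.get(name)}
    missing = [name for name in expected if name not in available]
    return available, missing


def coverage_note(expected: List[str], missing: List[str]) -> str:
    """Describe how complete the review set is, naming any missing reviewers."""
    responded = len(expected) - len(missing)
    note = f"{responded}-of-{len(expected)} reviewers responded"
    if missing:
        note += "; missing: " + ", ".join(missing)
    return note
```

Only the `available` outputs would be passed on to synthesis, and the note goes into the written assessment alongside the verdict.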
Model override flags
Override any model at the command line:
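```bash
bash skills/council-v2/scripts/council.sh review code src/auth.py \
  --opus claude-sonnet-4 \
  --gpt gpt-4.1 \
  --grok grok-3
```

Available flags: `--opus`, `--gpt`, `--grok`, `--deepseek`, `--gemini`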
Model Diversity
The council's value comes from different providers with different training data and
different biases reviewing the same decision. The specific model versions (Opus,
GPT-5.4, Grok 4, etc.) matter less than the diversity. Swap in whatever top-tier
models you have access to — what matters is that they are not all from the same
provider.
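After swapping models, a quick check confirms the council still spans multiple providers. The helper below is a sketch: the slot names and the slot-to-provider mapping are supplied by the caller, since an override can point a slot at any model.

```python
from collections import Counter
from typing import Dict


def diversity_report(provider_by_slot: Dict[str, str]) -> Dict:
    """Summarize how many distinct providers back the reviewer slots."""
    counts = Counter(provider_by_slot.values())
    return {
        "distinct_providers": len(counts),
        # Providers that back more than one reviewer slot.
        "duplicated": sorted(p for p, n in counts.items() if n > 1),
        # True only when several slots all share a single provider.
        "all_same_provider": len(provider_by_slot) > 1 and len(counts) == 1,
    }
```

For instance, `diversity_report({"opus": "Anthropic", "gpt": "OpenAI", "grok": "xAI"})` reports three distinct providers and no duplicates, while a council whose slots all resolve to one provider would set `all_same_provider`.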
Retrospectives
`scripts/retro.sh` generates a structured retrospective template for reviewing past
council decisions against actual outcomes.
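```bash
# Review the 5 most recent decisions in the directory
bash skills/council-v2/scripts/retro.sh ./council-outputs/ 5
```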
When to run retros
Run monthly, or after any decision where the outcome surprised you. The retro surfaces:
- Which reviewers provided signal vs. noise
- Whether critical findings were real or false alarms
- Whether synthesis preserved minority views accurately
- Prompt changes to consider for `role-prompts.md`
Feed retro findings back into references/role-prompts.md to calibrate the council.
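Once a retro has been filled in, the "real or false alarm" question can be tallied per reviewer. This sketch assumes you record each past finding as a dict with `reviewer`, `severity`, and a `confirmed` flag set during the retro; that record shape is hypothetical and is not the format of the generated template.

```python
from collections import defaultdict
from typing import Dict, List


def critical_precision(records: List[Dict]) -> Dict[str, float]:
    """Fraction of each reviewer's critical findings that proved real."""
    totals = defaultdict(lambda: [0, 0])  # reviewer -> [confirmed, total]
    for rec in records:
        if rec["severity"] != "critical":
            continue  # only critical findings can block approval
        totals[rec["reviewer"]][1] += 1
        if rec["confirmed"]:
            totals[rec["reviewer"]][0] += 1
    return {name: confirmed / total for name, (confirmed, total) in totals.items()}
```

A reviewer whose critical findings are rarely confirmed is a candidate for prompt recalibration.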
Notes
- Requires `bash`, `python3`, and OpenClaw reviewer dispatch capability
- Model aliases can be overridden; see Model override flags above
- Synthesis rules are documented in `references/synthesis-rules.md`
References
- `references/review-types.md`: review type definitions and tier recommendations
- `references/role-prompts.md`: reviewer role prompts and shared output instructions
- `references/schema.md`: JSON schemas for reviewer output and synthesis output
- `references/synthesis-rules.md`: mechanical synthesis protocol and edge cases