CAID Multi-Agent Coordination

This skill implements the Centralized Asynchronous Isolated Delegation (CAID) paradigm for coordinating multiple agents working on shared artifacts.

⚠️ CRITICAL WARNINGS FROM PAPER:

- Use CAID from the outset: don't run a single agent first as a fallback. The sequential strategy costs nearly 2x with minimal gain.
- Physical worktree isolation is mandatory: soft isolation (instruction-only) degrades performance on complex tasks.
- Engineer limits are strict: 2 for PaperBench-style tasks, 4 for Commit0-style tasks, never more than 8.
- Higher cost/runtime trade-off: CAID improves accuracy, not speed, because integration is sequential and test-gated.

Core Principles

1. Centralized Task Delegation: a manager agent decomposes tasks into dependency-aware units
2. Asynchronous Execution: multiple engineer agents work concurrently
3. Isolated Workspaces: each agent works in its own isolated branch/worktree
4. Structured Integration: progress is merged via git commit/merge with test verification

When to Use This Skill

Use CAID from the outset for:

- Long-horizon tasks with multiple interdependent files
- A clear dependency structure (imports, test mappings)
- Parallelizable work
- Integration that can be verified by executable tests

Don't use as a fallback: running a single agent first and then CAID is inefficient (cost and runtime are nearly additive, with minimal performance gain).

Use a single agent for:

- Isolated, single-file changes
- Tasks with no clear parallelization opportunities
- Exploratory/research-oriented tasks

Coordination Workflow

0. Manager Pre-Setup (CRITICAL)

Before ANY delegation, the manager must:

1. Prepare runtime environment

- Ensure dependencies are installed
- Set up a virtual environment

2. Organize entry points

- Create the main entry files
- Ensure import paths work

3. Add minimal function stubs

- Empty function definitions so imports don't fail
- Type signatures, if available

4. Commit to main branch

- All engineer branches are created from a consistent base
- Without this, engineers start from divergent states

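```bash
# Pre-setup commit
git add .
git commit -m "setup: initial stubs and entry points"
git push origin main
```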
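Before making the setup commit, the manager can confirm that every stub actually imports. The sketch below is minimal and assumes a standard Python package layout. The helper name `find_import_failures` is hypothetical.

```python
import importlib
import pkgutil


def find_import_failures(package_name: str) -> dict[str, str]:
    """Import every module under `package_name` and collect the ones that fail.

    Returns a mapping of module name -> "ExceptionType: message". An empty dict
    means every stub imports cleanly and the setup commit is safe to make.
    """
    failures: dict[str, str] = {}
    package = importlib.import_module(package_name)
    # onerror silences walk_packages' own subpackage imports; each yielded
    # module is imported explicitly below, so failures are still recorded.
    walker = pkgutil.walk_packages(package.__path__, prefix=package_name + ".",
                                   onerror=lambda name: None)
    for info in walker:
        try:
            importlib.import_module(info.name)
        except Exception as exc:  # a stub should never raise at import time
            failures[info.name] = f"{type(exc).__name__}: {exc}"
    return failures
```

Run it against the project package right before `git add .`. A non-empty result means a stub or entry point still needs fixing.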

1. Task Analysis & Dependency Graph Creation

Manager's role: Before delegating, analyze the task structure:

- Identify atomic units of work (files, functions, modules)
- Build a dependency graph G=(V,E), where edges indicate dependencies
- Define Ready(v) ⇔ all dependencies of v are completed
- Only delegate tasks that are Ready (all dependencies satisfied)

Commit0-style tasks (clear file structure):

1. Check import statements to identify file-level dependencies
2. Collect executable test cases from the repository
3. Examine which files the tests exercise
4. Identify components to implement earlier (upstream dependencies)
5. Delegate at file level first; only split to function level if a file has many unimplemented functions
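The import-reading step (step 1) can be automated with the standard-library `ast` module. This minimal sketch assumes the source files have already been read into a `{module_name: source}` mapping. Relative imports (`from . import x`) are ignored for brevity, and both helper names are hypothetical.

```python
import ast


def imported_modules(source: str) -> set[str]:
    """Return the absolute module names imported by one Python source file."""
    names: set[str] = set()
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, ast.Import):
            names.update(alias.name for alias in node.names)
        elif isinstance(node, ast.ImportFrom) and node.level == 0 and node.module:
            names.add(node.module)
    return names


def file_dependency_graph(sources: dict[str, str]) -> dict[str, set[str]]:
    """Map each repo module to the repo modules it imports (its upstream deps).

    Stdlib and third-party imports are dropped, so edges only connect units
    that engineers actually have to implement.
    """
    internal = set(sources)
    return {mod: (imported_modules(src) & internal) - {mod}
            for mod, src in sources.items()}
```

Modules with an empty dependency set are the upstream candidates to delegate first.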

PaperBench-style tasks (inferred structure):

1. Read the paper to identify its main contribution
2. Infer the implementation order from that contribution
3. Use at most 2 engineers: the manager's task is harder here, and more agents destabilize execution

Dependency graph construction:
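```text
Ready_t(v_j) ⇔ ∀(v_i, v_j) ∈ E : v_i ∈ Completed_t

Delegate only tasks from { v ∈ V | Ready_t(v) }
```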
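The Ready predicate maps directly onto a set comprehension. This minimal sketch assumes the graph is stored as `{task: set_of_upstream_tasks}`. It also excludes tasks that are already completed, since those need no further delegation. The standard-library `graphlib` module is used only to reject cyclic graphs, which would otherwise never become fully Ready.

```python
from graphlib import TopologicalSorter


def ready_set(deps: dict[str, set[str]], completed: set[str]) -> set[str]:
    """Ready_t: not yet completed, and every upstream dependency is completed."""
    return {v for v, upstream in deps.items()
            if v not in completed and upstream <= completed}


def assert_acyclic(deps: dict[str, set[str]]) -> None:
    """A cyclic graph never drains; graphlib detects that up front."""
    TopologicalSorter(deps).prepare()  # raises graphlib.CycleError on a cycle
```

`TopologicalSorter` also offers `get_ready()`/`done()`, which do the same bookkeeping if a mutable scheduler object is preferred over recomputing the set each round.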

2. Workspace Isolation Setup

Create PHYSICALLY isolated worktrees (not soft isolation):

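```bash
# The main branch is the single source of truth
git worktree add -b engineer-1 ../workspace-engineer-1 main
git worktree add -b engineer-2 ../workspace-engineer-2 main
# ...one worktree per engineer
```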

⚠️ WARNING: Soft isolation (same workspace, instruction-level constraints) degrades performance below the single-agent baseline on PaperBench. Physical git worktree isolation is mandatory.

Key isolation principles:

- Each engineer operates in its own git worktree (physical filesystem isolation)
- All worktrees are derived from the main branch
- Engineers modify files only within their assigned workspace
- Restricted files (shared across engineers): __init__.py, config files, global constants. Engineers must NOT commit changes to these.
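An engineer, or the manager before merging, can check a branch's changed files against the restricted list. In this minimal sketch, the changed paths could come from `git diff --name-only main...HEAD` run inside the engineer's worktree. The pattern list is an illustrative assumption to adapt per repository.

```python
from fnmatch import fnmatch

# Hypothetical restricted-file patterns; adapt to the repository's shared files.
RESTRICTED_PATTERNS = ("*__init__.py", "*config*.py", "src/constants.py")


def restricted_violations(changed_paths: list[str],
                          patterns: tuple[str, ...] = RESTRICTED_PATTERNS) -> list[str]:
    """Return the changed paths that match any restricted pattern, sorted."""
    return sorted(path for path in changed_paths
                  if any(fnmatch(path, pattern) for pattern in patterns))
```

A non-empty result means the commit should be reworked before submission, rather than left for the manager to untangle at merge time.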

3. Dependency-Aware Task Delegation

STRICT Engineer Limits:

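| Task Type | Max Engineers | Why |
|---|---|---|
| PaperBench-style | 2 | Inferred dependencies; more engineers destabilize the run |
| Commit0-style | 4 | Clear file structure; beyond this, integration overhead outweighs parallelism |
| Any task | 8 | Hard cap; never exceed |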

⚠️ Critical: Increasing the number of engineers beyond the optimum degrades performance due to integration overhead and conflict-resolution costs.
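The engineer limits can be enforced mechanically when the manager creates worktrees. This is a minimal sketch. The task-style keys, the branch names (`engineer-N`), the directory layout, and the fallback to the hard cap for unknown styles are all illustrative assumptions. The emitted commands are standard `git worktree add -b <new-branch> <path> <commit-ish>` invocations.

```python
# Per-style limits; unknown task styles fall back to the hard cap (assumption).
ENGINEER_LIMITS = {"paperbench": 2, "commit0": 4}
HARD_CAP = 8


def engineer_count(task_style: str, requested: int) -> int:
    """Clamp the requested engineer count to the per-style limit and hard cap."""
    limit = min(ENGINEER_LIMITS.get(task_style, HARD_CAP), HARD_CAP)
    return max(1, min(requested, limit))


def worktree_commands(task_style: str, requested: int,
                      base: str = "main") -> list[list[str]]:
    """Build one `git worktree add` command per engineer, all branched from `base`."""
    return [["git", "worktree", "add", "-b", f"engineer-{i}",
             f"../workspace-engineer-{i}", base]
            for i in range(1, engineer_count(task_style, requested) + 1)]
```

Each command can be passed to `subprocess.run(cmd, check=True)` from the repository root after the pre-setup commit.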

Task prioritization heuristics:
The manager should prioritize tasks that:

1. Enable earlier test execution (expose evaluation signals sooner)
2. Lie closer to the upstream end of the dependency chain
3. Are simpler: simple functions come before complex ones
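One way to encode these heuristics is a lexicographic sort key over the Ready tasks. This is a minimal sketch. The `Candidate` fields (`enables_tests`, `depth`, `complexity`) are hypothetical estimates that the manager would fill in from its dependency analysis.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Candidate:
    task_id: str
    enables_tests: bool  # does finishing this task let some test run?
    depth: int           # distance from the upstream end of its chain (0 = most upstream)
    complexity: int      # rough size estimate, e.g. functions to implement


def prioritize(ready: list[Candidate]) -> list[Candidate]:
    """Test-enabling first, then upstream first, then simplest first."""
    return sorted(ready, key=lambda c: (not c.enables_tests, c.depth, c.complexity))
```

Because the key is a tuple, test exposure strictly dominates position and size. A weighted score is an alternative if the heuristics should trade off against each other.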

Round definition:

One round = complete cycle of delegation → implementation → dependency update

Recommended iteration limits (from paper experiments):

| Role | Max Iterations |
|---|---|
| Manager | 50 |
| Each Engineer | 80 |
| Total Rounds | ~22 (varies by task) |

Delegation algorithm:

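```text
At round t:
1. Ready_Set = { v ∈ V | Ready_t(v) }    // all dependencies satisfied
2. Select up to N tasks from Ready_Set (N = max parallel engineers, see limits above)
3. Apply the prioritization heuristics
4. Delegate to available engineers
5. Wait for completion signals
6. Update dependency state after each successful integration
```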
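To see how the engineer cap shapes the number of rounds, the delegation algorithm can be dry-run on a toy graph. This minimal simulation assumes every delegated task succeeds and merges within its round. Real runs interleave asynchronously and can fail. The function name and the 50-round safety cap are hypothetical.

```python
from typing import Callable, Optional


def simulate_rounds(deps: dict[str, set[str]], n_engineers: int,
                    max_rounds: int = 50,
                    key: Optional[Callable[[str], object]] = None) -> list[list[str]]:
    """Return the batch of tasks delegated in each round."""
    completed: set[str] = set()
    rounds: list[list[str]] = []
    for _ in range(max_rounds):
        # Step 1: Ready_Set = tasks whose dependencies are all completed.
        ready = [v for v, upstream in deps.items()
                 if v not in completed and upstream <= completed]
        if not ready:
            break  # everything is done, or the remaining tasks are blocked
        # Steps 2-3: order by priority (name order by default), keep at most N.
        batch = sorted(ready, key=key)[:n_engineers]
        # Steps 4-6: delegate, wait, integrate; here success is simply assumed.
        completed.update(batch)
        rounds.append(batch)
    return rounds
```

With two engineers, the toy graph `{"utils": set(), "io": set(), "core": {"utils"}, "api": {"core", "io"}}` needs three rounds. With one engineer it needs four.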

Task assignment JSON format (structured communication — NO free-form dialog):

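```json
{
  "task_id": "string",
  "task_description": "string",
  "target_files": ["path/to/file.py"],
  "target_functions": ["function_name"],
  "dependencies": ["task_id_1", "task_id_2"],
  "expected_outcome": "description of success criteria",
  "verification_command": "pytest tests/test_file.py -v",
  "restricted_files": ["src/__init__.py", "src/config.py"],
  "priority": "high|medium|low"
}
```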

Key: All communication uses structured JSON, not free-form dialog. This prevents inter-agent misalignment (primary failure mode in multi-agent systems).
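Because assignments are structured, the engineer side can reject a malformed message before doing any work. This is a minimal validator sketch using only the standard `json` module. The field names follow the task-assignment format (`task_id`, `target_files`, `verification_command`, ...). The overlap check between `target_files` and `restricted_files` is an added assumption, not part of the format itself.

```python
import json

# Expected field -> Python type after json.loads.
ASSIGNMENT_FIELDS = {
    "task_id": str, "task_description": str, "target_files": list,
    "target_functions": list, "dependencies": list, "expected_outcome": str,
    "verification_command": str, "restricted_files": list, "priority": str,
}
PRIORITIES = {"high", "medium", "low"}


def parse_assignment(raw: str) -> dict:
    """Parse and validate one assignment; raise ValueError (incl. JSONDecodeError) if invalid."""
    msg = json.loads(raw)
    if not isinstance(msg, dict):
        raise ValueError("assignment must be a JSON object")
    missing = sorted(ASSIGNMENT_FIELDS.keys() - msg.keys())
    if missing:
        raise ValueError(f"missing fields: {missing}")
    for name, expected in ASSIGNMENT_FIELDS.items():
        if not isinstance(msg[name], expected):
            raise ValueError(f"{name} must be {expected.__name__}")
    if msg["priority"] not in PRIORITIES:
        raise ValueError(f"priority must be one of {sorted(PRIORITIES)}")
    clash = set(msg["target_files"]) & set(msg["restricted_files"])
    if clash:
        raise ValueError(f"target files are restricted: {sorted(clash)}")
    return msg
```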

4. Asynchronous Execution Loop

Event loop pattern:

1. Delegate → Manager assigns tasks to available engineers
2. Execute → Engineers work concurrently in isolated worktrees
3. Self-Verify → Engineer runs tests, fixes failures
4. Complete → Engineer submits a commit when ALL tests pass
5. Integrate → Manager attempts a merge to main
6. Conflict Resolution (if needed) → Responsible engineer resolves
7. Update → Manager updates the dependency graph
8. Repeat → Continue until all tasks complete or limits are reached
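The Execute and Integrate steps can be sketched with `asyncio`. Engineers run concurrently, while the manager integrates their commits one at a time in completion order, mirroring the sequential, test-gated merge. This is a minimal simulation. `asyncio.sleep` stands in for implementation plus self-verification in a worktree, and the commit IDs are placeholders.

```python
import asyncio


async def engineer(task_id: str, work_seconds: float) -> str:
    """Simulated engineer: work in isolation, then hand back a commit."""
    await asyncio.sleep(work_seconds)  # stand-in for implement + self-verify
    return f"commit-{task_id}"         # placeholder commit identifier


async def run_round(tasks: dict[str, float]) -> list[str]:
    """Delegate all tasks at once; integrate results in the order they finish."""
    integrated: list[str] = []
    jobs = [asyncio.create_task(engineer(t, s)) for t, s in tasks.items()]
    for next_done in asyncio.as_completed(jobs):
        commit = await next_done
        # One coroutine performs every merge, so integration stays sequential
        # even though the engineering work overlapped.
        integrated.append(commit)
    return integrated
```

In a real run, the append would be replaced by the manager's merge attempt and a dependency-graph update. That update may make new tasks Ready for engineers who are now idle.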

Engineer self-verification (MANDATORY before submission):

- Run the relevant tests that import/reference the modified files
- If there is no explicit mapping, run the repository's default test command
- Any failed test or runtime exception MUST be resolved
- Use concrete error logs and tracebacks for iterative refinement
- Only submit the commit after ALL tests pass
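The self-verification gate can be wrapped so an engineer gets a pass/fail flag plus the tail of the log for the next fix attempt. This is a minimal sketch using `subprocess.run`. The command string would be the assignment's verification command, or the repository's default test command. The 40-line tail is an arbitrary choice.

```python
import shlex
import subprocess


def self_verify(command: str, worktree: str = ".",
                tail_lines: int = 40) -> tuple[bool, str]:
    """Run the verification command inside the engineer's worktree.

    Returns (passed, log_tail). Only exit code 0 counts as a pass, so pytest's
    "no tests collected" (exit code 5) is treated as a failure as well.
    """
    proc = subprocess.run(shlex.split(command), cwd=worktree,
                          capture_output=True, text=True)
    log = (proc.stdout + proc.stderr).splitlines()
    return proc.returncode == 0, "\n".join(log[-tail_lines:])
```

An engineer loop would call `self_verify(...)`, feed the returned traceback tail back into the next fix attempt, and commit only once the first element is `True`.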

5. Integration via Merge

Merge workflow:

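```bash
# Manager attempts the merge
git checkout main
git merge <engineer-branch>

# If conflicts:
# 1. The engineer who produced the conflicting commit is responsible for resolving them
# 2. The engineer pulls the latest main: git pull origin main
# 3. Resolves the conflicts locally
# 4. Re-runs tests to ensure the resolution didn't break anything
# 5. Resubmits the commit
# 6. The manager retries the merge
```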
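On the manager side, a merge attempt can be wrapped so that a conflict leaves `main` clean and simply reports failure. The manager then hands the task back to the responsible engineer. This is a minimal sketch. `--no-ff` (one merge commit per engineer task) and `--no-edit` (no editor prompt) are design choices. The `run` parameter exists only so the function can be exercised without a real repository.

```python
import subprocess
from typing import Callable


def try_merge(branch: str, repo: str = ".",
              run: Callable[..., subprocess.CompletedProcess] = subprocess.run) -> bool:
    """Merge `branch` into main. On conflict, abort the merge and return False."""
    run(["git", "checkout", "main"], cwd=repo, check=True)
    result = run(["git", "merge", "--no-ff", "--no-edit", branch], cwd=repo)
    if result.returncode != 0:
        # Restore main to its pre-merge state (this errors harmlessly if no
        # merge is in progress); the engineer resolves and resubmits.
        run(["git", "merge", "--abort"], cwd=repo)
        return False
    return True
```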

The main branch is the single source of truth throughout execution.

6. Context Management for Manager

To prevent context explosion, the manager uses the LLMSummarizingCondenser pattern:

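```text
Periodically:
1. Summarize previous interaction rounds
2. Preserve structured artifacts:
   - Dependency graph (current state)
   - Completed tasks (with commit hashes)
   - Unresolved errors (with traceback summaries)
3. Discard detailed dialog history
4. Maintain execution traceability without bloat
```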

Compressed execution history format:
CODEBLOCK7
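Whatever the exact format, the preserved artifacts (dependency graph, completed tasks with commit hashes, unresolved errors with traceback summaries) can be kept as a small serializable record. That record survives every condensation while the dialog itself is discarded. This is a minimal sketch. The `CondensedState` name and fields are hypothetical. `summary` is where an LLM-written recap of earlier rounds would go. Keeping the last three traceback lines is an arbitrary assumption.

```python
import json
from dataclasses import asdict, dataclass, field


def summarize_traceback(traceback_text: str, keep: int = 3) -> str:
    """Keep only the last non-empty lines of a traceback (usually the error itself)."""
    lines = [line for line in traceback_text.splitlines() if line.strip()]
    return "\n".join(lines[-keep:])


@dataclass
class CondensedState:
    round_index: int
    dependency_graph: dict[str, list[str]]                     # task -> upstream tasks
    completed: dict[str, str] = field(default_factory=dict)    # task -> commit hash
    unresolved: dict[str, str] = field(default_factory=dict)   # task -> traceback summary
    summary: str = ""  # recap of earlier rounds; an LLM would write this

    def to_json(self) -> str:
        return json.dumps(asdict(self), indent=2, sort_keys=True)
```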

7. Worktree Synchronization & Cleanup

State synchronization when main advances:

CODEBLOCK8
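All worktrees share one repository, so the local `main` ref is visible from every engineer's worktree. One way to sync is therefore to merge `main` into each engineer branch via `git -C <worktree>`. This minimal sketch assumes merge (rather than rebase) is the chosen sync strategy and that engineers have committed their work first. It reports which worktrees hit conflicts so their engineers can resolve them. The `run` parameter exists only for testing without a repository.

```python
import subprocess
from typing import Callable


def sync_worktrees(worktrees: list[str],
                   run: Callable[..., subprocess.CompletedProcess] = subprocess.run) -> list[str]:
    """Merge the latest main into each engineer worktree; return those that conflicted."""
    conflicted: list[str] = []
    for path in worktrees:
        result = run(["git", "-C", path, "merge", "--no-edit", "main"])
        if result.returncode != 0:
            # Conflict (or uncommitted work in the way): left in place for
            # that worktree's engineer to resolve locally.
            conflicted.append(path)
    return conflicted
```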

Worktree cleanup (after completion or limit reached):

CODEBLOCK9

Worktrees are deleted after all assigned tasks are completed or when the engineer reaches the predefined iteration limit.
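The cleanup rule above can be encoded directly. This is a minimal sketch. The `EngineerState` record and its fields are hypothetical. The default limit of 80 reuses the per-engineer iteration figure from the paper's limits. The emitted commands are standard `git worktree remove` and `git branch -d`.

```python
from dataclasses import dataclass


@dataclass
class EngineerState:
    worktree: str
    branch: str
    tasks_remaining: int
    iterations_used: int


def cleanup_commands(engineers: list[EngineerState],
                     iteration_limit: int = 80) -> list[list[str]]:
    """Commands that remove worktrees of engineers who are finished or out of budget."""
    commands: list[list[str]] = []
    for eng in engineers:
        if eng.tasks_remaining == 0 or eng.iterations_used >= iteration_limit:
            # `git worktree remove` refuses if the worktree has uncommitted changes.
            commands.append(["git", "worktree", "remove", eng.worktree])
            # -d (not -D) refuses to delete a branch that was never merged,
            # so unintegrated work is not silently lost.
            commands.append(["git", "branch", "-d", eng.branch])
    return commands
```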

8. Termination Conditions

- Success: All units completed and integrated into main
- Failure: Maximum rounds/iterations reached with unresolved tasks
- Incomplete: The task is considered incomplete if any units remain unresolved

Iteration limits (from paper):

- Manager: 50 iterations
- Each engineer: 80 iterations
- Total rounds: ~22 (varies by task)
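The termination outcomes can be checked after every round. This is a minimal sketch. The function name and string labels are hypothetical. "Failure" and "incomplete" coincide here because hitting a limit with unresolved units is exactly what leaves the task incomplete.

```python
def termination_status(all_units: set[str], integrated: set[str],
                       rounds_used: int, max_rounds: int) -> str:
    """Return "success", "failure", or "running" for the current state."""
    if all_units <= integrated:
        return "success"  # every unit completed and merged into main
    if rounds_used >= max_rounds:
        return "failure"  # limit reached with unresolved units: incomplete
    return "running"
```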

9. Manager Final Review

After the asynchronous loop completes, the manager does a final review before submitting the final product.

Final review checklist:

1. Verify all tasks from the dependency graph are completed
2. Run the full test suite: INLINECODE4
3. Check integration completeness (all commits merged)
4. Review any unresolved errors or warnings
5. Validate that the final state matches the expected outcome
6. Submit the final product only after verification

CODEBLOCK10
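Most of the checklist can be gathered into one report before submission. This is a minimal sketch, and the function name is hypothetical. The inputs are assumed to come from three places:

- The manager's dependency graph
- The output of `git branch --no-merged main`, which lists branches whose tips are not yet in `main`, one per line after stripping whitespace
- The exit status of the full test suite

```python
def final_review(graph_tasks: set[str], completed: set[str],
                 unmerged_branches: list[str], full_suite_passed: bool,
                 open_errors: list[str]) -> list[str]:
    """Return blocking problems; an empty list means the product can be submitted."""
    problems: list[str] = []
    missing = sorted(graph_tasks - completed)
    if missing:
        problems.append(f"tasks not completed: {missing}")
    if unmerged_branches:
        problems.append(f"branches not merged into main: {sorted(unmerged_branches)}")
    if not full_suite_passed:
        problems.append("full test suite is failing")
    problems.extend(f"unresolved error: {err}" for err in open_errors)
    return problems
```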

Implementation Guidelines

Using OpenClaw Sub-agents

For OpenClaw, the sessions_spawn tool enables parallel agent execution:

Spawn engineer agents:

CODEBLOCK11

Check progress:

CODEBLOCK12

Worktree Synchronization

When main advances, update worktrees:

CODEBLOCK13

This ensures engineers work from latest integrated state.

Verification Intensity vs Efficiency Trade-off

From paper analysis (Section 4.4):

| Strategy | Pass Rate | Runtime | When to Use |
|---|---|---|---|
| Round-Manager Review | 60.2% | 3689s | Maximum correctness required |
| Engineer Self-Verification | 55.1% | 2244s | Default: balanced |
| Efficiency-Prioritized | 54.0% | 1909s | Time-critical, acceptable risk |

Default: Engineer self-verification without repeated manager review.
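Reading the table as marginal trade-offs makes the default easier to justify. The arithmetic below uses only the three rows of the verification trade-off table. The derived differences and ratios are computed here, not reported by the paper.

```python
# (pass rate %, runtime s) per strategy, copied from the trade-off table.
STRATEGIES = {
    "round_manager_review": (60.2, 3689),
    "engineer_self_verification": (55.1, 2244),
    "efficiency_prioritized": (54.0, 1909),
}


def marginal(from_key: str, to_key: str) -> tuple[float, float]:
    """Pass-rate gain (percentage points) and runtime multiplier when switching."""
    (p0, t0), (p1, t1) = STRATEGIES[from_key], STRATEGIES[to_key]
    return round(p1 - p0, 1), round(t1 / t0, 2)


# Self-verification -> manager review: +5.1 points for ~1.64x the runtime.
print(marginal("engineer_self_verification", "round_manager_review"))
# Efficiency-prioritized -> self-verification: +1.1 points for ~1.18x the runtime.
print(marginal("efficiency_prioritized", "engineer_self_verification"))
```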

Common Pitfalls & Solutions

| Pitfall | Solution |
|---|---|
| Using CAID as fallback after single-agent fails | Use from outset; sequential costs ~2x with minimal gain |
| Soft isolation (instruction-only) | Use physical git worktree isolation for every engineer |

Cost/Runtime Expectations

CAID trade-offs (vs single-agent):

- Higher API cost: multiple agents mean more LLM calls
- Similar or longer wall-clock time: integration is sequential/test-gated
- Substantially higher accuracy: +26.7% on PaperBench, +14.3% on Commit0

When worth it: Long-horizon shared-artifact tasks where correctness matters more than speed.

Example Workflows

See references/examples.md for concrete implementation examples including:

- Commit0-style library implementation
- PaperBench-style paper reproduction
- Bug fixing (single-file vs multi-file)
- Feature addition with API and frontend

References

- Paper: "Effective Strategies for Asynchronous Software Engineering Agents" (arXiv:2603.21489v1)
- GitHub: https://github.com/JiayiGeng/async-swe-agents
- Built on OpenHands agent SDK principles
