Subagent Sheepdog
Use this skill whenever work is:
- delegated to a subagent or child session
- launched as a background process
- driven through browser automation
- long-running enough that progress reporting matters
- vulnerable to "ghost delegation" (claimed launch, but no real work underway)
Do not use this skill for tiny direct tasks where startup verification would add noise.
Purpose
This skill prevents a common failure mode in agent systems:
an agent launches something, gets back a session/process/tab handle, and falsely reports that work is running even though startup already failed or never truly began.
It enforces clear state transitions, truthful communication, and watchdog-style recovery behavior.
Core Rule
Do not claim work is running until startup is verified.
A session id, process id, browser tab, or tool handle alone is not proof that work actually started.
Treat the period after launch and before verification as launching_unverified.
Only after verification may the task become running.
If startup fails before real execution begins, classify it as failed_to_start.
If work began and failed later, classify it as failed_after_work_started.
Task States
Use these states consistently:
- launching_unverified
- running
- delayed
- failed_to_start
- failed_after_work_started
- completed
State meanings
launching_unverified
Launch was attempted, but startup health is not yet confirmed.
running
Startup was verified and meaningful execution is underway.
delayed
Work is still running, but later than expected.
failed_to_start
The launch path failed before real work began.
failed_after_work_started
Real work began, then later failed.
completed
Task finished successfully with real output or result.
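A minimal sketch of the states above as an enum with a transition table. The state names come from this skill; the `ALLOWED` table and `can_transition` helper are illustrative assumptions, since the skill describes the transitions in prose rather than listing them.

```python
from enum import Enum


class TaskState(Enum):
    LAUNCHING_UNVERIFIED = "launching_unverified"
    RUNNING = "running"
    DELAYED = "delayed"
    FAILED_TO_START = "failed_to_start"
    FAILED_AFTER_WORK_STARTED = "failed_after_work_started"
    COMPLETED = "completed"


# Assumed transition table: inferred from the state meanings, not specified by the skill.
ALLOWED = {
    TaskState.LAUNCHING_UNVERIFIED: {TaskState.RUNNING, TaskState.FAILED_TO_START},
    TaskState.RUNNING: {
        TaskState.DELAYED,
        TaskState.FAILED_AFTER_WORK_STARTED,
        TaskState.COMPLETED,
    },
    TaskState.DELAYED: {TaskState.FAILED_AFTER_WORK_STARTED, TaskState.COMPLETED},
    # A retry re-enters launching_unverified rather than jumping straight to running.
    TaskState.FAILED_TO_START: {TaskState.LAUNCHING_UNVERIFIED},
    TaskState.FAILED_AFTER_WORK_STARTED: set(),
    TaskState.COMPLETED: set(),
}


def can_transition(current: TaskState, target: TaskState) -> bool:
    """Return True if moving from current to target is allowed by the table."""
    return target in ALLOWED[current]
```

The useful property is that there is no edge from launching_unverified to completed or delayed: a task has to pass through running, which requires verification.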
Required Launch Sequence
Whenever this skill applies, follow this sequence:
1. Announce launch briefly
2. Attempt launch
3. Enter launching_unverified
4. Verify startup health
5. If verified, mark running
6. If not verified, mark failed_to_start
7. Retry only if the fix is obvious and mechanical
8. Re-verify before claiming success
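The sequence can be sketched as a single function. This is an illustration under assumptions: `run_launch_sequence`, and the `launch`, `verify`, and `report` callables, are hypothetical names, and a real runtime would supply its own launch and health-check calls.

```python
from typing import Callable, Optional, Tuple


def run_launch_sequence(
    launch: Callable[[], object],
    verify: Callable[[object], Tuple[bool, str]],
    report: Callable[[str], None] = print,
) -> Tuple[str, Optional[object]]:
    """Return (state, handle). State is 'running' only after verify passes."""
    report("Launching now. Verifying startup.")
    try:
        handle = launch()
    except Exception as exc:  # the launch call itself was rejected
        report(f"Startup failed. No work started. Cause: {exc}.")
        return "failed_to_start", None

    # A handle exists now, but that alone proves nothing.
    state = "launching_unverified"
    ok, reason = verify(handle)
    if ok:
        state = "running"
        report("Verified: work is actually running now.")
    else:
        state = "failed_to_start"
        report(f"Startup failed. No work started. Cause: {reason}.")
    return state, handle
```

Retry is deliberately left out here; the caller decides whether the reported cause is an obvious mechanical fix before launching again.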
Verification Rules By Work Type
A. Subagent / child-session launches
Do not claim success unless you verify at least:
- the spawn call succeeded
- the child session actually exists
- the child is active or otherwise accepted
- the runtime did not reject the launch immediately
Examples of failed_to_start here:
- invalid spawn parameters
- unsupported runtime combination
- missing thread binding
- rejected session mode
- missing required agent/runtime field
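A hedged sketch of the child-session checks. `SpawnResult` and its fields are hypothetical, as are the `active` and `accepted` status strings; real runtimes return different shapes, so treat this as the pattern rather than an API.

```python
from dataclasses import dataclass
from typing import Optional, Set, Tuple


@dataclass
class SpawnResult:
    """Hypothetical shape of a spawn response; adapt to the real runtime."""
    ok: bool
    session_id: Optional[str] = None
    status: Optional[str] = None
    error: Optional[str] = None


def verify_child_start(result: SpawnResult, known_sessions: Set[str]) -> Tuple[bool, str]:
    """Return (verified, reason). A session id alone is never enough."""
    if not result.ok:
        return False, result.error or "spawn call failed"
    if result.error:  # the runtime accepted the call but rejected the launch
        return False, result.error
    if not result.session_id or result.session_id not in known_sessions:
        return False, "child session does not exist"
    if result.status not in {"active", "accepted"}:
        return False, f"child session status is {result.status!r}"
    return True, ""
```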
B. Background exec / process launches
Do not claim success unless you verify at least:
- the process exists
- the process did not immediately exit
- the working directory is valid
- the executable exists
- no immediate fatal startup error is present in the initial result/log
Examples of failed_to_start here:
- command not found
- No such file or directory
- permission denied at startup
- invalid working directory
- immediate non-zero exit
- rejected runtime host/sandbox mismatch
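For process launches, the checks above map fairly directly onto the Python standard library. This is a sketch, not a definitive implementation: `launch_and_verify` and the 0.5 second grace period are assumptions, and treating an immediate zero exit as still unverified is a judgment call the skill does not spell out.

```python
import os
import shutil
import subprocess
import tempfile
import time
from typing import List, Optional, Tuple


def launch_and_verify(
    argv: List[str], cwd: str, grace_seconds: float = 0.5
) -> Tuple[str, str, Optional[subprocess.Popen]]:
    """Return (state, reason, process); state is 'running' only if verified."""
    if not os.path.isdir(cwd):
        return "failed_to_start", "invalid working directory", None
    if shutil.which(argv[0]) is None:
        return "failed_to_start", "command not found", None

    # A temp file captures early stderr without leaving a pipe that must be drained.
    log = tempfile.TemporaryFile()
    try:
        proc = subprocess.Popen(argv, cwd=cwd, stdout=subprocess.DEVNULL, stderr=log)
    except OSError as exc:  # for example, permission denied at startup
        log.close()
        return "failed_to_start", str(exc), None

    time.sleep(grace_seconds)  # give an immediate crash time to show up
    code = proc.poll()
    if code is None:
        log.close()  # the child keeps its own copy of the descriptor
        return "running", "", proc

    log.seek(0)
    stderr_text = log.read().decode(errors="replace").strip()
    log.close()
    if code != 0:
        return "failed_to_start", f"immediate exit {code}: {stderr_text}", None
    # Exited cleanly right away: not running, and not proven completed either.
    return "launching_unverified", "exited immediately with code 0; check for real output", None
```

A process id returned by `Popen` is exactly the kind of handle the core rule warns about; only the `poll()` result after the grace period says whether the process survived startup.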
C. Browser-driven work
Do not claim browser work is underway unless you verify at least the relevant part of the path:
- browser/session exists
- target page actually loaded
- expected page or UI element is present
- no login wall, interstitial, or error page blocked progress
Examples of failed_to_start here:
- browser opened but target page failed to load
- redirected to login before task could begin
- error page prevented interaction
- required UI element never appeared
If the page loaded and meaningful interaction began before failure, use failed_after_work_started instead.
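The browser rules can be sketched as a pure classification over a page snapshot. Everything named here is hypothetical: `PageSnapshot`, its fields, and the `/login` and `/signin` URL markers are assumptions for illustration, to be filled in from whatever browser tool is actually in use.

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass
class PageSnapshot:
    """Hypothetical summary of what the browser tool reports after navigation."""
    session_alive: bool
    loaded: bool
    url: str
    expected_element_present: bool
    is_error_page: bool = False


def find_blocker(snap: PageSnapshot, login_markers=("/login", "/signin")) -> str:
    """Return the first blocking condition, or an empty string if none."""
    if not snap.session_alive:
        return "browser session does not exist"
    if not snap.loaded:
        return "target page failed to load"
    if snap.is_error_page:
        return "error page prevented interaction"
    if any(marker in snap.url for marker in login_markers):  # assumed heuristic
        return "redirected to login before task could begin"
    if not snap.expected_element_present:
        return "required UI element never appeared"
    return ""


def classify_browser_state(snap: PageSnapshot, interaction_started: bool) -> Tuple[str, str]:
    """Return (state, reason), separating failed launch from failed work."""
    blocker = find_blocker(snap)
    if not blocker:
        return "running", ""
    if interaction_started:
        return "failed_after_work_started", blocker
    return "failed_to_start", blocker
```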
Heartbeat Behavior
Heartbeat is a watchdog, not the main progress loop.
Use heartbeat to:
- surface overdue tasks
- detect stalled work
- detect ghost delegation
- catch startup failures that were not properly reported
Do not use heartbeat to:
- aggressively poll every active task
- replace direct milestone updates
- substitute for proper launch verification
Ghost delegation
A ghost delegation is when work was described as launched, but:
- no real worker exists
- startup failed immediately
- browser setup never reached a usable page
- the agent implied progress without verified execution
Heartbeat should surface that clearly instead of returning a normal all-clear.
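A minimal sketch of a heartbeat pass that surfaces ghost delegations and overdue work instead of returning an all-clear. `TaskRecord`, its fields, and the 60 second verification grace period are assumptions; the skill does not prescribe a record format or a timeout.

```python
from dataclasses import dataclass
from typing import Iterable, List, Optional


@dataclass
class TaskRecord:
    """Hypothetical bookkeeping entry for one delegated task."""
    name: str
    state: str
    launched_at: float
    eta_at: Optional[float] = None
    worker_alive: bool = False


def heartbeat_scan(tasks: Iterable[TaskRecord], now: float, verify_grace: float = 60.0) -> List[str]:
    """Return findings to surface; an empty list means a genuine all-clear."""
    findings = []
    for t in tasks:
        if t.state == "launching_unverified" and now - t.launched_at > verify_grace:
            findings.append(f"{t.name}: launch never verified (possible ghost delegation)")
        elif t.state in ("running", "delayed") and not t.worker_alive:
            findings.append(f"{t.name}: reported {t.state} but no worker exists (ghost delegation)")
        elif t.state == "running" and t.eta_at is not None and now > t.eta_at:
            findings.append(f"{t.name}: overdue, should be marked delayed")
    return findings
```

The scan reads existing records once per heartbeat and does not poll workers itself, which keeps it a watchdog rather than a progress loop.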
Communication Rules
Prefer concise truth over reassuring fiction.
Good launch-phase language
- “Launching now. Verifying startup.”
- “Launch attempted; checking whether the worker actually started.”
- “Startup is being verified before I call this running.”
Good failure language
- “Startup failed. No work started. Cause: {reason}.”
- “The launch was rejected before execution began. No work started.”
- “The browser path failed before meaningful interaction began. No work started.”
Good running language
- “Verified: work is actually running now.”
- “Startup is confirmed. First milestone ETA ~{time}.”
Good delayed language
- “Work is still running, but slower than expected.”
- “The task is active but behind the original ETA.”
Good late failure language
- “Work did start, but failed later. Cause: {reason}.”
Bad language
- “Started successfully” before verification
- “Working on it” when only a handle exists
- “It’s analyzing now” before startup health is confirmed
- any ETA that implies verified execution before verification happened
Retry Policy
Retry only when the fix is obvious and mechanical.
Examples:
- correcting a known launch parameter
- switching to the proper thread-bound mode
- fixing a wrong working directory
- switching to an available executable/runtime
If retrying:
1. report the original failure
2. say you are retrying
3. return to launching_unverified
4. verify again before claiming success
Do not loop forever.
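One way to keep retries bounded, sketched under assumptions: `launch_with_retry`, the `fix` callback (which returns True only when it applied an obvious mechanical correction), and the default of a single retry are all hypothetical choices, not limits set by this skill.

```python
from typing import Callable, Tuple


def launch_with_retry(
    launch: Callable[[], object],
    verify: Callable[[object], Tuple[bool, str]],
    fix: Callable[[str], bool],
    max_retries: int = 1,
    report: Callable[[str], None] = print,
) -> str:
    """Return the final state: 'running' or 'failed_to_start'."""
    retries = 0
    while True:
        handle = launch()  # state is launching_unverified until verify passes
        ok, reason = verify(handle)
        if ok:
            report("Verified: work is actually running now.")
            return "running"
        report(f"Startup failed. No work started. Cause: {reason}.")
        # Stop when the retry budget is spent or no mechanical fix applies.
        if retries >= max_retries or not fix(reason):
            return "failed_to_start"
        retries += 1
        report("Retrying with the corrected launch settings.")
```

Every pass goes back through `verify`, so a retry can never be reported as running on the strength of the fix alone.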
Best-Next-Step Rule
When blocked, recommend one best next step instead of dumping a large menu unless the user explicitly asks for options.
Good:
- “Best next step: switch to one-shot run mode because persistent thread binding is unavailable here.”
Less good:
- giving 4–6 loosely ranked possibilities without guidance
Generic Example
Bad
“Background worker launched and running now.”
(But only a session id exists; no verification was done.)
Good
“Launch attempted; verifying startup.”
Then either:
- “Verified: the worker is actually running now.”
or:
- “Startup failed. No work started. Cause: missing thread binding for session mode.”
Browser Example
Bad
“Opened the browser and I’m working on the site now.”
(But the browser only opened a login wall.)
Good
“Browser launched, but the target workflow did not start. The page redirected to login before the task could begin. No work started.”
Completion Standard
A task should only be described as completed when there is evidence of a real result, such as:
- output artifacts
- a verified state change
- a final summary tied to actual work performed
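For the output-artifact case, a small evidence check might look like the sketch below. `completion_evidence` is a hypothetical helper, and "a non-empty file exists at each expected path" is an assumed stand-in for evidence; verified state changes and summaries need their own checks.

```python
import os
from typing import Iterable, List, Tuple


def completion_evidence(artifact_paths: Iterable[str]) -> Tuple[bool, List[str]]:
    """Return (has_evidence, missing) for the expected output artifacts."""
    paths = list(artifact_paths)
    missing = [
        p for p in paths
        if not (os.path.isfile(p) and os.path.getsize(p) > 0)
    ]
    # No expected artifacts at all means no evidence, not success.
    return bool(paths) and not missing, missing
```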
Summary
This skill teaches a simple discipline:
- launch carefully
- verify startup
- report truthfully
- use heartbeat as backup
- distinguish failed launch from failed work
The result is better trust, clearer status reporting, and fewer ghost tasks.