🦈 The Shark Pattern
A shark that stops swimming dies. An agent that waits for tools wastes compute.
Works with: Claude Code · Codex · Gemini CLI · Cursor · Windsurf · Aider · OpenClaw · any LLM agent
When to Use This Skill
Trigger this skill when the user says:
- "use the shark pattern"
- "non-blocking agent"
- "never wait for tools"
- "spawn background workers"
- "parallel subagents"
- "keep the main agent moving"
- or when you notice you're about to block on a slow tool (web fetch, SSH, build, test run, API call)
The Rule
Every LLM turn must complete in under 30 seconds.
If any operation would take longer:
1. Spawn a remora (sessions_spawn with mode: "run")
2. Continue reasoning immediately
3. Incorporate remora results when they arrive
You are never in I/O wait. You are always reasoning about something.
Lifecycle
CODEBLOCK0
No nested remoras. Work that is already running inside a remora executes inline; remoras cannot spawn their own remoras. Only the main shark spawns.
The Pattern
Bad (Ralph-style blocking):
CODEBLOCK1
Good (Shark-style non-blocking):
CODEBLOCK2
Implementation
When applying the Shark Pattern, structure your work like this:
1. Identify blocking operations
Before calling any tool, ask: "Will this take more than 20-30 seconds?"
Slow tools (always spawn):
- Web searches / page fetches
- SSH commands on remote machines
- Build / test / CI runs
- File system scans over large directories
- API calls with unknown latency
- LLM inference calls (coding agents)
Fast tools (run inline, never spawn):
- Reading local files
- Simple calculations
- String manipulation
- Memory lookups
2. Spawn remoras
CODEBLOCK3
Spawn multiple remoras in parallel when possible — don't serialize unless there's a data dependency.
3. Keep the main fin moving
After spawning, immediately continue:
- Plan the next step
- Work on a different part of the task
- Summarize what you know so far
- Prepare to incorporate results
4. Incorporate results
When remora results arrive, weave them in and continue. Never re-do work a remora already completed.
If your runtime keeps subagents alive after completion, close them once you've incorporated their result. In Codex that means: wait for the remora, use its output, then close_agent(id) unless you intentionally plan to reuse that same agent.
Timing Budget
| Operation | Budget | Action |
|---|---|---|
| File read | < 2s | Inline |
| Web search | 5-30s | Spawn |
| SSH command | 10-120s | Spawn |
| Build/test | 30-300s | Spawn |
| Coding agent | 60-600s | Spawn |
| Memory search | < 3s | Inline |
Example: Multi-Step Research Task
Without Shark (blocking):
CODEBLOCK4
With Shark (non-blocking):
CODEBLOCK5
Output Format
Announce on start
🦈 Shark mode — spawning [N] remoras for [tasks], continuing...
Progress bar (chat-friendly, Unicode only — no images needed)
Use this format after each remora or pilot fish completes. Works in Telegram, Discord, Signal, iMessage — anywhere.
CODEBLOCK6
Symbols:
- ◉ = remora (completed)
- ○ = remora (pending)
- ⊙ = remora (running)
- ◈ = pilot fish (time-bounded)
- ████████████ = done bar (12 blocks)
- ██████░░░░░░ = partial (filled = elapsed / total budget)
- ░░░░░░░░░░░░ = not started
Progress fill: filled = round(elapsed / timeout * 12) blocks of █, remainder ░
Only post an update when something changes (remora completes or pilot fish starts/ends). Don't spam — one update per event.
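The fill rule can be sketched as a small function. This is a minimal TypeScript sketch, not part of the skill itself: renderBar and BAR_WIDTH are hypothetical names, ░ is assumed to be the empty block, and clamping the fill to the 0..12 range for overdue remoras is an assumption.

```typescript
// Hypothetical helper: renders the 12-block progress bar.
const BAR_WIDTH: number = 12;

function renderBar(elapsedSeconds: number, timeoutSeconds: number): string {
  // filled = round(elapsed / timeout * 12); clamping is an assumption.
  const ratio: number = timeoutSeconds > 0 ? elapsedSeconds / timeoutSeconds : 0;
  const filled: number = Math.max(0, Math.min(BAR_WIDTH, Math.round(ratio * BAR_WIDTH)));
  let bar: string = "";
  for (let i = 0; i < BAR_WIDTH; i++) {
    bar += i < filled ? "█" : "░";
  }
  return bar;
}
```

For example, a remora 15s into a 30s budget renders as six filled and six empty blocks.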
Final synthesis
After all remoras done:
🦈 All fins in — synthesising [N] results + pilot draft
Then deliver the report.
The Pilot Fish Sub-Pattern
Pilot fish swim alongside sharks doing prep work. When you have idle time, use it.
When one remora returns early and others are still running:
1. Spawn a pilot fish: a time-bounded analysis sub-agent
2. Give it only the partial results so far, plus a hard timeout equal to the estimated remaining wait
3. Let it pre-validate, pre-analyse, find patterns, draft conclusions
4. Kill it (or let it self-terminate) when the last primary remora completes
5. Incorporate whatever the pilot fish produced into the final synthesis
CODEBLOCK7
Pilot Fish Rules
- Always time-bounded: pass runTimeoutSeconds equal to estimated remaining wait
- Never blocks: spawned async, main agent continues
- Opportunistic — if it finishes early, bonus; if killed mid-run, partial output is still useful
- One at a time — don't stack pilot fish on pilot fish
- Task: pre-validate data, find gaps, draft structure, flag anomalies, prepare questions
Example
CODEBLOCK8
Decision Tree — When to Spawn
Before every tool call, ask: "Will this take more than 10 seconds?"
CODEBLOCK9
Always spawn: web search/fetch, SSH, build/test, coding agents, CI triggers, API calls with unknown latency
Always inline: file read, memory lookup, string ops, math, local config reads
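The check above can be sketched as a lookup plus a threshold. This is a minimal TypeScript sketch: the tool-kind strings and the decide function are hypothetical names chosen for illustration, and the 10-second threshold is the one quoted in this section.

```typescript
type Action = "inline" | "spawn";

// Hypothetical tool-kind labels mirroring the two lists above.
const ALWAYS_SPAWN: string[] = [
  "web_search", "web_fetch", "ssh", "build", "test",
  "coding_agent", "ci_trigger", "api_unknown_latency",
];
const ALWAYS_INLINE: string[] = [
  "file_read", "memory_lookup", "string_op", "math", "local_config_read",
];
const THRESHOLD_SECONDS: number = 10;

function decide(kind: string, estimatedSeconds: number): Action {
  // Known categories win over the estimate.
  if (ALWAYS_SPAWN.indexOf(kind) !== -1) return "spawn";
  if (ALWAYS_INLINE.indexOf(kind) !== -1) return "inline";
  // Anything else: spawn only if it is expected to exceed the threshold.
  return estimatedSeconds > THRESHOLD_SECONDS ? "spawn" : "inline";
}
```

Putting the category lists first means a "fast" SSH call is still spawned, which matches the "always spawn" wording.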
Error Handling
Remoras will fail, time out, or return garbage. Plan for it.
remora timeout
◉ [A] task ████████████ ⏱ 30s [timeout]
- Treat as partial result: use whatever was returned
- Do not re-spawn the same task (wastes time, likely to time out again)
- Note the gap in synthesis: "A timed out — data may be incomplete"
- If A's result is critical, spawn a smaller-scoped follow-up remora
remora crash / error
◉ [A] task ████████████ ❌ [error: connection refused]
- Log the error inline in the progress bar
- Continue synthesis without that result
- Mention the failure in the final report
- Optionally file an issue / alert if it's infrastructure
- If the runtime still shows the remora as open after completion or error, clean it up immediately. In Codex, close completed remoras with close_agent(id) once their output is delivered.
Partial results (most common)
- Most useful: a remora that timed out at 28s has 28s of work in it
- Always check if partial output is usable before discarding
- Progress bar: ⏱ = timeout with partial, ❌ = hard error with nothing
>50% remoras failed
- Degrade gracefully: fall back to sequential for remaining work
- Note in report: "⚠️ degraded mode — N/M remoras failed"
All remoras failed
- Fall back to sequential execution for the most critical task only
- Do not spawn another full fleet — you're likely hitting a systemic issue
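The two failure thresholds above can be expressed as one decision function. This is a minimal TypeScript sketch; nextMode and the mode names are hypothetical, and exactly 50% failures is treated as still healthy because the rule says "more than 50%".

```typescript
type FleetMode = "parallel" | "degraded-sequential" | "critical-only";

function nextMode(failed: number, total: number): FleetMode {
  // All remoras failed: likely systemic, run only the most critical task.
  if (total > 0 && failed === total) return "critical-only";
  // More than half failed: finish remaining work sequentially.
  if (total > 0 && failed / total > 0.5) return "degraded-sequential";
  return "parallel";
}
```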
Forgetting to spawn the pilot fish (most common mistake)
- You finished a fast inline task, a remora is still running, and you just... wait
- Symptom: main agent idle, no pilot fish, time wasted
- Fix: always ask after any remora completes early — "what can I pre-draft right now?"
- Even if you have nothing obvious, draft the output structure, prepare questions, or outline the synthesis
Pilot fish killed mid-run
- Normal and expected: whatever it produced is still useful
- Incorporate partial pilot fish output into synthesis
- Don't wait for it or re-spawn it
Terminology
- Remora = a sessions_spawn call with mode: "run", runtime: "subagent", and runTimeoutSeconds set. A remora is specifically a timed sub-agent; untimed subagents are not remoras.
- Pilot fish = a remora spawned after another remora completes, with a short timeout sized to the estimated remaining wait. Purpose: pre-analysis only, never primary work.
- Fleet = the full set of remoras spawned for one task
- Fin moving = the main agent is doing useful work (not waiting)
- No nested remoras = remoras always execute inline — only the main shark spawns
runTimeoutSeconds — confirmed real
Verified against OpenClaw source: runTimeoutSeconds: z.number().int().min(0).optional(). It maps to the subagent wait timeout. Use it. It hard-kills the sub-agent process after N seconds, and partial output is returned.
Pilot Fish Sizing Formula
CODEBLOCK12
- estimatedRemaining = how long you think the slowest remaining remora will take
- Cap at 25s so pilot fish always finishes before the main synthesis turn
- If you don't know: use 20s as default
Example: slowest remaining remora estimated at 30s → pilot fish timeout = min(24, 25) = 24s
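The worked example (30s estimated, 24s timeout) is consistent with taking 80% of the estimate and capping the result at 25s. Below is a minimal TypeScript sketch under that assumption; the 80% factor is inferred from the example rather than confirmed, and pilotTimeoutSeconds is a hypothetical name.

```typescript
const PILOT_CAP_SECONDS: number = 25;     // cap stated above
const PILOT_DEFAULT_SECONDS: number = 20; // default when no estimate exists

function pilotTimeoutSeconds(estimatedRemaining?: number): number {
  if (estimatedRemaining === undefined) return PILOT_DEFAULT_SECONDS;
  // Assumed safety margin: 80% of the estimate, in whole seconds
  // (runTimeoutSeconds is declared as an integer).
  const scaled: number = Math.floor((estimatedRemaining * 4) / 5);
  return Math.min(scaled, PILOT_CAP_SECONDS);
}
```

With a 30s estimate this returns 24; with a 60s estimate the cap applies and it returns 25.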
Hard Limits
- Never use yieldMs > 30000 in exec calls: this holds the main turn hostage
- Never process(action=poll, timeout > 20000) in the main session, for the same reason
- Never add sleep or wait loops in the main thread
- Always set runTimeoutSeconds on remoras: unbounded sub-agents are not sharks
- Always clean up completed remoras: if your runtime requires explicit teardown, do it right after incorporating the result
- Max 8 concurrent remoras — beyond this, context overhead exceeds the gain
- Never stack pilot fish — one at a time, no pilot fish spawning pilot fish
- Spawn tasks ≤ 3 sentences — longer task descriptions need decomposition first
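Several of these limits can be checked mechanically before a spawn. The following is a minimal TypeScript sketch: SpawnRequest, checkSpawn, and the naive sentence counter are hypothetical helpers for illustration, not part of any runtime API.

```typescript
interface SpawnRequest {
  task: string;
  runTimeoutSeconds?: number;
}

const MAX_CONCURRENT_REMORAS: number = 8;
const MAX_TASK_SENTENCES: number = 3;

// Naive sentence count: splits on ., ! or ? followed by whitespace or end of text.
function countSentences(text: string): number {
  return text
    .split(/[.!?]+(?:\s+|$)/)
    .filter((part: string): boolean => part.trim().length > 0).length;
}

// Returns a list of violated limits; an empty list means the spawn is allowed.
function checkSpawn(req: SpawnRequest, runningRemoras: number): string[] {
  const problems: string[] = [];
  if (req.runTimeoutSeconds === undefined) {
    problems.push("runTimeoutSeconds is not set");
  }
  if (runningRemoras >= MAX_CONCURRENT_REMORAS) {
    problems.push("already at 8 concurrent remoras");
  }
  if (countSentences(req.task) > MAX_TASK_SENTENCES) {
    problems.push("task is longer than 3 sentences; decompose it first");
  }
  return problems;
}
```

Returning every violation at once, rather than failing on the first, lets the main agent fix the request in a single pass.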
Enforcing the 30-Second Timeout
The 30s cap isn't just a guideline — here's how to actually enforce it per runtime.
OpenClaw subagents
sessions_spawn({
task: "...",
mode: "run",
runtime: "subagent",
runTimeoutSeconds: 30 // hard kill after 30s — agent gets SIGTERM
})
runTimeoutSeconds is enforced by the OpenClaw runtime — the sub-agent process is killed if it exceeds it. Partial output is still returned.
exec calls (shell, SSH, scripts)
exec({
command: "some-slow-command",
timeout: 30, // hard kill in seconds
background: true, // don't block the main agent turn
yieldMs: 500 // poll back quickly to check
})
timeout kills the process. background: true means the main agent doesn't wait; it gets a session handle and can check back with process(poll).
Gemini CLI via exec
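timeout 30 gemini -p "task here"
# or on Windows (PowerShell). Start-Process has no -Timeout parameter, so wait on the process object:
$p = Start-Process gemini -ArgumentList '-p "task"' -PassThru
if (-not $p.WaitForExit(30000)) { $p.Kill() }
Wrap the CLI invocation with an OS-level timeout: the timeout command on Linux, or Start-Process -PassThru plus WaitForExit on Windows.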
Pilot fish — always use runTimeoutSeconds
sessions_spawn({
task: "pre-analyse partial results, draft structure, flag gaps",
mode: "run",
runTimeoutSeconds: Math.floor(estimatedRemainingMs / 1000), // whole seconds; die before the last remora
})
Set it to slightly less than your estimated remaining wait, so the pilot fish always finishes before you need to synthesise.
What happens when timeout fires
- Sub-agent/process is killed
- Whatever output was produced so far is returned
- Main agent treats it as a partial result — still useful for synthesis
- Log: [timeout] in the progress bar instead of INLINECODE36
CODEBLOCK17
The LLM turn itself
You can't hard-kill an LLM mid-turn, but you can:
1. Keep prompts tight: don't ask for exhaustive analysis in one turn
2. Use thinking: "none" for fast sub-tasks that don't need deep reasoning
3. Break large tasks into smaller shark-able chunks upfront
Rule of thumb: if a task description is >3 sentences, it probably needs to be split into remoras.
Compatibility — Claude, Codex, Gemini CLI
The Shark Pattern is runtime-agnostic. Remoras can be any agent type.
OpenClaw (Claude / Sonnet / Opus)
CODEBLOCK18
Codex
CODEBLOCK19
Codex-specific lifecycle:
- Spawn with spawn_agent(...) or the runtime-equivalent remora launcher
- Check completion with INLINECODE39
- If you want to reuse the same remora, send more work with INLINECODE40
- Otherwise, once the remora has completed and you've incorporated its result, call close_agent(id) so the agent does not linger in the session
Gemini CLI
Gemini CLI is a local process — spawn via exec with a timeout:
exec({
command: "gemini -p \"task description here\"",
timeout: 30, // hard cap in seconds
background: true, // don't block main agent
yieldMs: 500 // check back quickly
})
For Gemini sub-tasks, use exec with timeout + background: true rather than sessions_spawn. Treat the process handle the same way: continue working, collect output when it lands.
Mixed fleets
You can mix runtimes in the same shark run:
CODEBLOCK21
Which to use when
| Task type | Best runtime |
|---|---|
| Code generation / editing | Codex |
| Web search / summarise | Gemini CLI |
| Multi-step reasoning | Claude subagent |
| File ops / SSH / shell | exec (background) |
| Pre-analysis / drafting | Claude subagent (pilot fish) |
shark-exec Sub-Skill
For slow shell commands (>5s), use the shark-exec companion skill:
- Located at shark-exec/SKILL.md in this repo
- Wraps any exec call in background + cron poller
- Guarantees main turn completes in <30s even for 10-minute commands
- Use it instead of inline exec whenever the command might block
Loop Enforcement (Ralph-style)
The 30-second rule is best enforced at the shell level, not inside a turn.
Use shark.sh (or shark.ps1 on Windows) to run Claude in a bounded loop:
CODEBLOCK22
Each iteration:
1. Builds a fresh prompt: skill context + task + current state
2. Runs claude --print with a hard timeout 25s shell wrapper
3. If Claude times out → loop continues (this is expected; the shark pattern means short turns)
4. If Claude writes .shark-done → loop exits
This is identical to the Ralph Loop pattern, but with the Shark Pattern as the prompt — Claude spawns remoras for slow work, keeps each turn under 25s, and the shell loop enforces the hard cut.
When to use the loop vs direct claude
| Use case | Approach |
|---|---|
| Single fast task (<30s total) | INLINECODE53 directly |
| Multi-step task, slow tools | ./shark.sh "..." loop |
| CI/build watching | shark-exec (background + cron) |
| Interactive chat | OpenClaw main session |
Environment variables
| Variable | Default | Description |
|---|---|---|
| SHARK_MAX_LOOPS | 50 | Maximum iterations before giving up |
| SHARK_LOOP_TIMEOUT | 25 | Per-turn timeout in seconds (hard kill) |
Completion protocol
When Claude determines the task is done, it writes to .shark-done:
TASK_COMPLETE
<brief summary of what was accomplished>
The loop detects this file and exits cleanly.
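Reading the completion file can be sketched as follows. This is a minimal TypeScript sketch of the protocol described above, operating on the file contents as a string; parseSharkDone and the Completion shape are hypothetical names, and the actual shark.sh / shark.ps1 scripts may detect completion differently (for example by file existence alone).

```typescript
interface Completion {
  done: boolean;
  summary: string;
}

function parseSharkDone(contents: string): Completion {
  const lines: string[] = contents.split(/\r?\n/);
  // The first line must be the TASK_COMPLETE marker.
  const done: boolean = (lines[0] || "").trim() === "TASK_COMPLETE";
  // Everything after the marker is the brief summary.
  const summary: string = done ? lines.slice(1).join("\n").trim() : "";
  return { done: done, summary: summary };
}
```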
Commands
When the user invokes these commands, follow the instructions for each.
/shark <task>
Apply the Shark Pattern to the given task. Decompose, spawn remoras for slow ops, keep the main fin moving. Follow all rules in this SKILL.md.
/shark-loop <task> [--max-loops N] [--timeout S]
Run the external shark loop enforcer. Execute:
$env:SHARK_MAX_LOOPS = "<N>"
$env:SHARK_LOOP_TIMEOUT = "<S>"
powershell.exe -ExecutionPolicy Bypass -File "<skill_dir>/shark.ps1" "<task>"
Defaults: --max-loops 50, --timeout 25. On Linux/Mac use shark.sh instead.
/shark-status
Check current shark state:
1. Read <skill_dir>/shark-exec/state/pending.json and report active background jobs (label, command, elapsed time, whether overdue past maxSeconds)
2. If .shark-done exists, show its contents
3. If SHARK_LOG.md exists, show the last 10 lines
4. If nothing exists, report "No active shark jobs."
/shark-clean
Remove shark state files: .shark-done, SHARK_LOG.md, shark-exec/state/pending.json. Report what was cleaned.
/shark-autotune
Analyse timing history and recommend optimal settings.
1. Read <skill_dir>/state/timings.jsonl. Each line is:
CODEBLOCK25
2. If no data, report "No timing data yet. Run tasks with /shark first."
3. Compute and report:
  - Total runs (unique task_hash values) and total loops
  - Median turn time (p50) and p95 turn time
  - Timeout rate: % of turns with result "timeout"
  - Loops to completion: median and max (count loops per task_hash that has a "done" entry)
  - Wasted headroom: sum of (timeout_s - elapsed_s) for result "ok" turns
  - Optimal timeout: p95 turn time + 3s buffer, rounded up to nearest 5s
  - Optimal max_loops: p95 loops-to-completion + 2
4. Show recommendations:
CODEBLOCK26
5. If timeout rate > 30%: "Consider breaking tasks into smaller steps."
6. If median turn time < 5s: "Most turns complete fast. Consider lowering timeout."
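The core of the computation can be sketched like this. It is a minimal TypeScript sketch, not the skill's actual implementation: the field names elapsed_s, timeout_s, result, and task_hash are assumed from the metric descriptions, the nearest-rank percentile method is an assumption (the source does not name one), and recommend is a hypothetical function name.

```typescript
interface Timing {
  task_hash: string;
  elapsed_s: number;
  timeout_s: number;
  result: "ok" | "timeout" | "done";
}

interface Recommendation {
  medianTurn: number;
  p95Turn: number;
  timeoutRate: number;
  optimalTimeout: number;
  optimalMaxLoops: number | null; // null when no run has a "done" entry
}

// Nearest-rank percentile over a non-empty list.
function percentile(values: number[], p: number): number {
  const sorted: number[] = values.slice().sort((a: number, b: number): number => a - b);
  const rank: number = Math.max(1, Math.ceil((p * sorted.length) / 100));
  return sorted[rank - 1];
}

function recommend(timings: Timing[]): Recommendation | null {
  if (timings.length === 0) return null; // caller reports "No timing data yet."
  const elapsed: number[] = timings.map((t: Timing): number => t.elapsed_s);
  const timeouts: number = timings.filter((t: Timing): boolean => t.result === "timeout").length;

  // Count loops per run, and remember which runs reached "done".
  const loops: { [hash: string]: number } = {};
  const finished: { [hash: string]: boolean } = {};
  for (const t of timings) {
    loops[t.task_hash] = (loops[t.task_hash] || 0) + 1;
    if (t.result === "done") finished[t.task_hash] = true;
  }
  const loopCounts: number[] = Object.keys(finished).map((h: string): number => loops[h]);

  const p95: number = percentile(elapsed, 95);
  return {
    medianTurn: percentile(elapsed, 50),
    p95Turn: p95,
    timeoutRate: timeouts / timings.length,
    // p95 + 3s buffer, rounded up to the nearest 5s.
    optimalTimeout: Math.ceil((p95 + 3) / 5) * 5,
    optimalMaxLoops: loopCounts.length > 0 ? percentile(loopCounts, 95) + 2 : null,
  };
}
```

With only a handful of samples the nearest-rank p95 is simply the slowest turn, so the recommendation becomes more meaningful as timing history accumulates.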
Timing Instrumentation
Both shark.sh and shark.ps1 automatically record per-loop timings to state/timings.jsonl. Each entry includes:
- ts: Unix timestamp
- INLINECODE79: loop iteration number
- elapsed_s: actual wall-clock seconds for this turn
- timeout_s: configured timeout for this run
- result: "ok" (completed), "timeout" (hit limit), "done" (task finished)
- task_hash: 8-char hash correlating loops within a single run
Use /shark-autotune to analyse this data and tune your settings.
References
- Ralph Loop (sequential baseline): ghuntley.com/ralph/
- OpenClaw sessions_spawn docs: spawn with mode: "run", INLINECODE89
- Gemini CLI: INLINECODE90
- The name: sharks use ram ventilation — they literally die if they stop moving
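🦈 The Shark Pattern (鲨鱼模式)
A shark that stops swimming dies. An agent that waits for tools wastes compute.
Works with: Claude Code · Codex · Gemini CLI · Cursor · Windsurf · Aider · OpenClaw · any LLM agent
When to Use This Skill
Trigger this skill when the user says:
- "use the shark pattern"
- "non-blocking agent"
- "never wait for tools"
- "spawn background workers"
- "parallel subagents"
- "keep the main agent moving"
- or when you notice you're about to block on a slow tool (web fetch, SSH, build, test run, API call)
The Rule
Every LLM turn must complete in under 30 seconds.
If any operation would take longer:
1. Spawn a remora (sessions_spawn with mode: "run")
2. Continue reasoning immediately
3. Incorporate remora results when they arrive
You are never in I/O wait. You are always reasoning about something.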
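Lifecycle
┌─────────────┐
│  Decompose  │  Split the task into N independent subtasks
└──────┬──────┘
       │ Spawn N remoras (+ 1 pilot fish when the first one finishes early)
       ▼
┌─────────────┐
│    Spawn    │  sessions_spawn × N, all in parallel, record session IDs
└──────┬──────┘
       │ Main agent keeps reasoning (never waits)
       ▼
┌─────────────┐  timeout/crash
│   Monitor   │ ──────────────────► mark ⏱/❌ (partial results are still valuable)
└──────┬──────┘
       │ All done or deadline reached
       ▼
┌─────────────┐
│  Aggregate  │  Collect results, note failures, merge the pilot fish draft
└──────┬──────┘
       │
       ▼
┌─────────────┐
│   Report    │  Single coherent response, noting the number of failures
└─────────────┘
No nested remoras. Work that is already running inside a remora executes inline; remoras cannot spawn their own remoras. Only the main shark spawns.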
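The Pattern
Bad (Ralph-style blocking):
think → call slow tool → wait 60s → think → call slow tool → wait 45s → ...
Good (Shark-style non-blocking):
think → spawn remora(slow tool) → think about something else
      → spawn remora(another tool) → synthesise partial results
      → receive remora results → incorporate → keep swimming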
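Implementation
When applying the Shark Pattern, structure your work like this:
1. Identify blocking operations
Before calling any tool, ask: "Will this take more than 20-30 seconds?"
Slow tools (always spawn):
- Web searches / page fetches
- SSH commands on remote machines
- Build / test / CI runs
- File system scans over large directories
- API calls with unknown latency
- LLM inference calls (coding agents)
Fast tools (run inline, never spawn):
- Reading local files
- Simple calculations
- String manipulation
- Memory lookups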
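2. Spawn remoras
sessions_spawn({
  task: "Perform the slow operation and return the result",
  mode: "run",
  runtime: "subagent",
  streamTo: "parent" // optional: stream output back
})
Spawn multiple remoras in parallel when possible. Don't serialize unless there's a data dependency.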
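3. Keep the main fin moving
After spawning, immediately continue:
- Plan the next step
- Work on a different part of the task
- Summarize what you know so far
- Prepare to incorporate results
4. Incorporate results
When remora results arrive, weave them in and continue. Never re-do work a remora already completed.
If your runtime keeps subagents alive after completion, close them once you've incorporated their result. In Codex that means: wait for the remora, use its output, then call close_agent(id) unless you intentionally plan to reuse that same agent.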
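Timing Budget
| Operation | Budget | Action |
|---|---|---|
| File read | < 2s | Inline |
| Web search | 5-30s | Spawn |
| SSH command | 10-120s | Spawn |
| Build/test | 30-300s | Spawn |
| Coding agent | 60-600s | Spawn |
| Memory search | < 3s | Inline |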
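Example: Multi-Step Research Task
Without Shark (blocking):
1. Search web for X [wait 15s]
2. Search web for Y [wait 12s]
3. Fetch page Z [wait 8s]
4. SSH check server [wait 30s]
Total: ~65s blocked
With Shark (non-blocking):
1. Spawn: search X [0s - spawned]
2. Spawn: search Y [0s - spawned]
3. Spawn: fetch Z [0s - spawned]
4. Spawn: SSH check [0s - spawned]
5. Plan the synthesis while waiting [15s of real thinking]
6. All results arrive → synthesise
Total: ~15s thinking + max(tool time) in parallel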
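Output Format
Announce on start
🦈 Shark mode — spawning [N] remoras for [tasks], continuing...
Progress bar (chat-friendly, Unicode only, no images needed)
Use this format after each remora or pilot fish completes. Works in Telegram, Discord, Signal, iMessage, anywhere.
🦈 3 remoras · 1 pilot fish
◉ [A] task name ████████████ ✅ 9s
◉ [B] task name ████████████ ✅ 33s
○ [C] task name ░░░░░░░░░░░░ pending
◈ [P] pilot fish ██████░░░░░░ ~14s left
↳ continuing...
Symbols:
- ◉ = remora (completed)
- ○ = remora (pending)
- ⊙ = remora (running)
- ◈ = pilot fish (time-bounded)
- ████████████ = done bar (12 blocks)
- ██████░░░░░░ = partial (filled = elapsed / total budget)
- ░░░░░░░░░░░░ = not started
Progress fill: filled = round(elapsed / timeout * 12) blocks of █, remainder ░
Only post an update when something changes (remora completes or pilot fish starts/ends). Don't spam: one update per event.
Final synthesis
After all remoras are done:
🦈 All fins in — synthesising [N] results + pilot draft
Then deliver the report.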
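The Pilot Fish Sub-Pattern
Pilot fish swim alongside sharks doing prep work. When you have idle time, use it.
When one remora returns early and others are still running:
1. Spawn a pilot fish: a time-bounded analysis sub-agent
2. Give it only the partial results so far, plus a hard timeout equal to the estimated remaining wait
3. Let it pre-validate, pre-analyse, find patterns, draft conclusions
4. Kill it (or let it self-terminate) when the last primary remora completes
5. Incorporate whatever the pilot fish produced into the final synthesis
Remora A ──────► result (early)
Remora B ────────────────────────────► result
Remora C ──────────────────────────────────► result
Main: spawn A, B, C
A completes → spawn pilot fish (A's result, timeout = estimated remaining time)
Pilot fish: pre-analyse A, draft partial report, validate data...
B completes → pilot fish still running, pass in B's result (or kill and reuse)
C completes → kill pilot fish, synthesise A+B+C+pilot draft
Pilot Fish Rules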