Context Management
Prevent context exhaustion, enforce spawn discipline, and make compaction survivable.
Core Concepts
1. Fixed baseline: Typically 5-15% of context is consumed before any conversation starts: system prompt, workspace files, skill descriptions, tool definitions. The figure varies by setup (more skills/files = higher baseline).
2. 60/40 rule: ~60% of consumed context is tool outputs, ~40% conversation. Tool outputs are the primary target for savings.
3. Compaction is lossy: Summaries stack cumulatively. Each cycle raises the floor. After 3+ compactions, summaries alone can consume 30%+ of context.
4. Sub-agents are disposable context: A sub-agent can burn most of its context investigating something; only the summary (~500 tokens) enters main context.
All percentages are relative to the model's context window. Check session_status for actual window size and usage.
Procedures
When Context Pressure Rises
After every tool-heavy operation (>5 tool calls), assess:
1. Run session_status to check usage
2. If below 50%: continue normally
3. If 50-70%: spawn sub-agents for remaining tool-heavy work (>3 tool calls)
4. If 70-85%: spawn sub-agents for ANY tool work (>1 tool call). Warn user.
5. If above 85%: write checkpoint (see below), suggest /compact or /new
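The usage bands above can be read as a simple lookup. The following is a minimal sketch in Python; the function name and return strings are illustrative, and treating exactly 70% and 85% as part of the 70-85% band is an assumption, since the list leaves the boundaries open.

```python
def pressure_action(usage_pct: float) -> str:
    """Return the action for a given context usage, in percent of the window."""
    if usage_pct < 50:
        return "continue normally"
    if usage_pct < 70:
        return "spawn sub-agents for tool-heavy work (>3 tool calls)"
    if usage_pct <= 85:
        return "spawn sub-agents for any tool work (>1 tool call); warn user"
    return "write checkpoint; suggest /compact or /new"
```

The usage value would come from session_status; how that value is parsed is left out here.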
"What's Eating My Context?" — Estimation Method
An exact per-component breakdown is not available. Estimate instead:
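    Fixed baseline:         ~5-15% (system prompt + workspace files + skills + tools)
    Each user message:      ~100-500 tokens
    Each assistant reply:   ~200-1000 tokens
    Each tool call result:  ~500-5000 tokens (exec/read heavy, search lighter)
    Compaction summary:     ~2000-5000 tokens (cumulative!)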
Count messages and tool calls in recent history, multiply by midpoint estimates. Report as ranges, not false precision. For per-operation cost detail, read references/operation-costs.md.
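As a rough illustration of the counting step, the sketch below multiplies message and tool-call counts by the per-item token ranges this skill uses and reports a low/mid/high range rather than a single number. The function name, the dictionary keys, and the 200k window in the example are assumptions made for illustration.

```python
# (low, high) token ranges per item, as used by the estimation method.
TOKEN_RANGES = {
    "user_message": (100, 500),
    "assistant_reply": (200, 1000),
    "tool_result": (500, 5000),
    "compaction_summary": (2000, 5000),
}
BASELINE_PCT = (5.0, 15.0)  # fixed baseline, percent of the window


def estimate_usage_pct(counts: dict, window_tokens: int) -> dict:
    """Estimate context usage as a low/mid/high percentage range."""
    low = sum(TOKEN_RANGES[kind][0] * n for kind, n in counts.items())
    high = sum(TOKEN_RANGES[kind][1] * n for kind, n in counts.items())

    def pct(tokens: float, baseline: float) -> float:
        return round(baseline + 100 * tokens / window_tokens, 1)

    return {
        "low": pct(low, BASELINE_PCT[0]),
        "mid": pct((low + high) / 2, sum(BASELINE_PCT) / 2),
        "high": pct(high, BASELINE_PCT[1]),
    }


# Example: 10 user messages, 10 replies, 20 tool results in a 200k window.
estimate = estimate_usage_pct(
    {"user_message": 10, "assistant_reply": 10, "tool_result": 20},
    window_tokens=200_000,
)
print(estimate)  # {'low': 11.5, 'mid': 42.0, 'high': 72.5}
```

The wide gap between low and high is the point: tool results dominate the uncertainty, which is why the estimate should be reported as a range.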
Spawn Policy
If .context-policy.yml exists in workspace root, use it as guidance for spawn thresholds and task categories. Otherwise use these defaults:
Always spawn (regardless of context level):
- Test suites (>3 tests)
- Multi-file audits (>5 files)
- Build/deploy pipelines
- Research tasks (web search + analysis)
- Bulk file operations
Never spawn (keep in main session):
- Single commands
- Conversations / discussions
- Quick edits (1-3 files)
- Status checks
- Tasks requiring user input mid-execution
Context-dependent (spawn when context exceeds threshold):
- Above 50%: spawn if task involves >5 tool calls
- Above 70%: spawn if task involves >2 tool calls
When spawning, write detailed task descriptions. Sub-agents have no conversation context — they only know what the task field tells them.
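The default policy can be sketched as a small decision function. The category identifiers below are hypothetical labels for the task types listed above (the policy file's actual schema is not shown here), so treat this as an illustration of the rule order, not a reference implementation.

```python
# Hypothetical category labels for the task types in the default policy.
ALWAYS_SPAWN = {"test_suite", "multi_file_audit", "build_deploy", "research", "bulk_file_ops"}
NEVER_SPAWN = {"single_command", "conversation", "quick_edit", "status_check", "needs_user_input"}


def should_spawn(category: str, usage_pct: float, tool_calls: int) -> bool:
    """Apply the defaults: fixed categories first, then context-dependent thresholds."""
    if category in ALWAYS_SPAWN:
        return True
    if category in NEVER_SPAWN:
        return False
    if usage_pct > 70:
        return tool_calls > 2
    if usage_pct > 50:
        return tool_calls > 5
    return False
```

Checking the fixed categories before the thresholds keeps "never spawn" tasks in the main session even when context is nearly full.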
Pre-Compaction Checkpoint
Before compaction or /new, write .context-checkpoint.md in the workspace root (the agent reads this post-compaction):
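    Context Checkpoint: {date} {time}

    Current Task
    {what you are working on}

    Key State
    {bullet list of current state: what is done, what is in progress}

    Decisions Made This Session
    {numbered list of decisions with rationale}

    Files Changed
    {list of files modified this session}

    Next Steps
    {what to do after resuming}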
This file survives compaction. On session start or post-compaction, check for it and use it to restore context. Delete after consuming.
Coordination with OpenClaw memoryFlush: OpenClaw may fire its own pre-compaction flush, which writes to the daily log. The checkpoint is complementary: the flush saves to the daily log, while the checkpoint saves structured resume state. Both should exist. If memoryFlush fires first, compaction may already be in progress, so for critical sessions write checkpoints proactively at 75% rather than waiting for 85%.
The scripts/context-checkpoint.sh script handles basic write/read/clear. For the full 5-section checkpoint, write the file directly — multiline content works better that way.
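Writing the file directly might look like the sketch below. It is not the bundled script; the function name is made up, the five section titles follow the checkpoint template, and the Markdown heading markers are an assumption about how the file is laid out.

```python
from datetime import datetime
from pathlib import Path
from typing import Optional

# The five checkpoint sections, in order.
SECTIONS = (
    "Current Task",
    "Key State",
    "Decisions Made This Session",
    "Files Changed",
    "Next Steps",
)


def write_checkpoint(workspace: Path, content: dict, now: Optional[datetime] = None) -> Path:
    """Write .context-checkpoint.md in the workspace root and return its path."""
    stamp = (now or datetime.now()).strftime("%Y-%m-%d %H:%M")
    lines = [f"# Context Checkpoint - {stamp}", ""]
    for title in SECTIONS:
        # Missing sections are written explicitly so the file always has all five.
        lines += [f"## {title}", content.get(title, "(nothing recorded)"), ""]
    path = Path(workspace) / ".context-checkpoint.md"
    path.write_text("\n".join(lines), encoding="utf-8")
    return path
```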
Post-Compaction Recovery
After compaction or /new:
1. Read .context-checkpoint.md if it exists
2. Read today's daily log if the workspace has one (e.g. memory/{today}.md)
3. Resume from the checkpoint's "Next Steps"
4. Delete the checkpoint file after restoring context
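The recovery steps above could be sketched as follows. The function name and the returned dictionary are illustrative, and locating "Next Steps" by a plain text search is a simplification that assumes the checkpoint contains that title once.

```python
from pathlib import Path


def recover_context(workspace: Path, today: str) -> dict:
    """Read the checkpoint and today's daily log, then delete the checkpoint."""
    workspace = Path(workspace)
    checkpoint = workspace / ".context-checkpoint.md"
    daily_log = workspace / "memory" / f"{today}.md"

    restored = {"checkpoint": None, "daily_log": None, "next_steps": None}
    if checkpoint.exists():
        text = checkpoint.read_text(encoding="utf-8")
        restored["checkpoint"] = text
        # Everything after the "Next Steps" title is what to resume from.
        _, found, tail = text.partition("Next Steps")
        restored["next_steps"] = tail.strip() if found else None
    if daily_log.exists():
        restored["daily_log"] = daily_log.read_text(encoding="utf-8")

    # Delete only after the content has been read into memory.
    if restored["checkpoint"] is not None:
        checkpoint.unlink()
    return restored
```

Deleting last means a failure while reading leaves the checkpoint in place for another attempt.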
Proactive Warning Template
When context exceeds 65%, warn:
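    ⚠️ Context: {percent}% ({used}k/{total}k used). Estimated remaining: ~{remaining calls} tool calls. {recommendation}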
Recommendations by level:
- 65%: "Spawning sub-agents for remaining tool-heavy work."
- 75%: "Recommend compacting soon. Writing checkpoint."
- 85%: "Context critical. Writing checkpoint now. Suggest /compact or /new."
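A sketch of how the warning might be assembled is shown below. The exact wording of the line is one rendering of the template, treating each level as a lower bound is an assumption, and the per-call cost of 2750 tokens is simply the midpoint of the 500-5000 range used for tool results.

```python
def recommendation(pct: float) -> str:
    """Pick the recommendation text for a usage level; empty below 65%."""
    if pct >= 85:
        return "Context critical. Writing checkpoint now. Suggest /compact or /new."
    if pct >= 75:
        return "Recommend compacting soon. Writing checkpoint."
    if pct >= 65:
        return "Spawning sub-agents for remaining tool-heavy work."
    return ""


def format_warning(used_tokens: int, total_tokens: int, avg_tool_tokens: int = 2750) -> str:
    """Build the warning line; avg_tool_tokens is an assumed per-call cost."""
    pct = round(100 * used_tokens / total_tokens)
    remaining_calls = (total_tokens - used_tokens) // avg_tool_tokens
    return (
        f"⚠️ Context: {pct}% ({used_tokens // 1000}k/{total_tokens // 1000}k used). "
        f"Estimated remaining: ~{remaining_calls} tool calls. {recommendation(pct)}"
    )


print(format_warning(140_000, 200_000))
```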
Session Profiling & Config Advice
After significant work (or on request), profile the current session and recommend config changes.
Step 1: Classify the Session Pattern
Run session_status. Count approximate tool calls and message exchanges. Classify:
| Pattern | Signature | Example |
|---|---|---|
| Tool-heavy | Most context from tool results, many exec/read/web calls | Audits, migrations, test suites, debugging |
| Conversational | Most context from messages, few tool calls | Planning, discussion, decisions |
| Mixed | Roughly even split | Feature builds (discuss → code → test → discuss) |
| Bursty | Long quiet periods with intense tool bursts | Monitoring + incident response |
Step 2: Recommend Config
There are four settings that matter. When explaining them to the user, always describe what they do in practice, not just the setting name:
1. When to compress the conversation (reserveTokensFloor)
How full the context gets before the agent summarises and compresses the history. A higher number means it compresses sooner — producing a shorter summary with more room left afterwards.
- 30000: waits until nearly full. Risk: huge summary, little room after.
- 50000: compresses at ~75% full. Good balance.
- 60000: compresses early at ~70%. Maximum breathing room.
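One way to read these percentages: if compaction starts once the free space falls to reserveTokensFloor, the trigger point is (window - floor) / window. The sketch below works through that arithmetic. Both the trigger rule and the 200,000-token window are assumptions; they are chosen because they reproduce the ~75% and ~70% figures quoted above, and a different window size shifts every percentage.

```python
def compaction_trigger_pct(window_tokens: int, reserve_tokens_floor: int) -> float:
    """Percent of the window in use when only the reserve is left free."""
    return 100 * (window_tokens - reserve_tokens_floor) / window_tokens


for floor in (30_000, 50_000, 60_000):
    print(floor, compaction_trigger_pct(200_000, floor))
# 30000 85.0
# 50000 75.0
# 60000 70.0
```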
2. How quickly old tool output is cleared (pruning TTL)
After you stop talking for this long, the agent clears old command outputs, file reads, and search results from memory. Shorter = more aggressive cleanup.
- 5m: only clears after 5 minutes of silence. Rarely fires during active work.
- 2m: clears after 2 minutes. Good for most workflows.
- 1m: aggressive. Clears fast, but you might need to re-read files.
3. How many recent exchanges are protected from cleanup (keepLastAssistants)
When clearing old tool output, this many of your most recent back-and-forth exchanges are kept untouched.
- 3: keeps more history visible. Good for conversations.
- 2: moderate protection.
- 1: only the last exchange is safe. Most aggressive cleanup.
4. Minimum size before tool output gets trimmed (minPrunableToolChars)
Only tool results larger than this (in characters) are eligible for trimming. Lower = more things get cleaned up.
- 50000 (default): only trims very large outputs (long file reads, huge command output).
- 10000: also trims medium outputs. Catches more.
- 5000: aggressive. Most tool results are eligible.
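To see how settings 2 to 4 interact, here is a toy model of the eligibility check as described in this section. It is not OpenClaw's implementation: the `ToolResult` type, the function name, and the way idle time and exchange age are represented are all invented for illustration.

```python
from dataclasses import dataclass


@dataclass
class ToolResult:
    chars: int          # size of the tool output in characters
    exchanges_ago: int  # 0 = produced in the most recent exchange


def is_prunable(
    result: ToolResult,
    idle_seconds: int,
    ttl_seconds: int,
    keep_last_assistants: int,
    min_prunable_tool_chars: int,
) -> bool:
    """Toy check: all three conditions must hold before an output is cleared."""
    if idle_seconds < ttl_seconds:
        return False  # TTL has not elapsed, nothing is cleared yet
    if result.exchanges_ago < keep_last_assistants:
        return False  # inside the protected recent exchanges
    return result.chars > min_prunable_tool_chars
```

Under this reading, lowering any one setting widens what gets cleaned up, but a large TTL can still prevent pruning entirely during active work.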
Recommended combinations by work style:
| Work style | Compress at | Clear after | Protect | Trim above |
|------------|------------|-------------|---------|------------|
| Tool-heavy (audits, tests, debugging) | 60000 | 1m | 1 | 10000 |
| Conversational (planning, discussion) | 30000 | 5m | 3 | 50000 |
| Mixed (code → test → discuss) | 50000 | 2m | 2 | 10000 |
| Bursty (monitoring + incidents) | 50000 | 2m | 1 | 10000 |
Additional tips:
- Sessions with browser/canvas work: Ensure those tools are protected from cleanup in the config
- Long-running sessions (>2h): Use a higher compression trigger to survive multiple rounds
Step 3: Report
Use a compact list format — tables render poorly on mobile and narrow chat windows. For each setting, show current vs recommended only if they differ. Skip settings that are already correct.
    📊 Session profile: {pattern}
    Context: {percent}…
Lead with what's already right (builds confidence), then highlight what needs changing and why. Keep it short — the user wants a verdict, not a lecture.
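The "show only what differs" rule, applied to the recommended combinations table, might be sketched like this. The values come from that table; the key `pruningTtl` is a placeholder, since the real config key for the pruning TTL is not named in this document, and the function name is illustrative.

```python
# Recommended values per work style, from the combinations table.
# "pruningTtl" is a placeholder key; the real config key name is not given here.
PROFILES = {
    "tool-heavy": {"reserveTokensFloor": 60000, "pruningTtl": "1m",
                   "keepLastAssistants": 1, "minPrunableToolChars": 10000},
    "conversational": {"reserveTokensFloor": 30000, "pruningTtl": "5m",
                       "keepLastAssistants": 3, "minPrunableToolChars": 50000},
    "mixed": {"reserveTokensFloor": 50000, "pruningTtl": "2m",
              "keepLastAssistants": 2, "minPrunableToolChars": 10000},
    "bursty": {"reserveTokensFloor": 50000, "pruningTtl": "2m",
               "keepLastAssistants": 1, "minPrunableToolChars": 10000},
}


def settings_to_change(current: dict, style: str) -> list:
    """Return one line per setting whose current value differs from the profile."""
    recommended = PROFILES[style]
    return [
        f"{name}: {current.get(name)} -> {value}"
        for name, value in recommended.items()
        if current.get(name) != value
    ]


current = {"reserveTokensFloor": 30000, "pruningTtl": "5m",
           "keepLastAssistants": 1, "minPrunableToolChars": 10000}
print(settings_to_change(current, "tool-heavy"))
# ['reserveTokensFloor: 30000 -> 60000', 'pruningTtl: 5m -> 1m']
```

An empty list means every setting already matches, in which case the report can say so and stop.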
If changes are recommended, tell the user everything up front before asking for approval:
1. Exact file being modified (full path, taken from gateway config.get)
2. Exact changes: setting name, current value, new value
3. What happens: gateway restart (~2-3 second pause, auto-reconnects)
4. Safety net: backup taken first, rollback doc written to temp directory
Example closing:
CODEBLOCK4
For multiple changes, list each one. Never summarise as "4 changes" — spell them out.
Never ask "want me to apply?" without the user seeing the exact file, exact values, and exact consequences. The user decides with full information, not blind trust.
If the user agrees, follow the full procedure below.
Step 4: Learn Over Time
After giving advice, note the session pattern and outcome in the daily log (if the workspace keeps one). Over multiple sessions, patterns emerge — the user's typical work style becomes clear and default config can be permanently tuned.
Applying Config Changes — Mandatory Procedure
When recommending config changes, follow this exact sequence. No shortcuts.
1. Find the Config File
Run gateway config.get to get the config file path and current values. Do not assume the path — it varies by installation.
2. Backup First
CODEBLOCK5
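For illustration only, a timestamped backup could also be taken from Python as sketched below. The function name and the `.bak-<timestamp>` suffix are assumptions, and the config path must still come from gateway config.get.

```python
import shutil
from datetime import datetime
from pathlib import Path


def backup_config(config_path: Path) -> Path:
    """Copy the config next to itself with a timestamp suffix; return the backup path."""
    config_path = Path(config_path)
    stamp = datetime.now().strftime("%Y%m%d-%H%M%S")
    backup_path = config_path.with_name(f"{config_path.name}.bak-{stamp}")
    shutil.copy2(config_path, backup_path)  # copy2 also preserves file metadata
    return backup_path
```

Returning the backup path makes it easy to quote the full path to the user later in the procedure.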
3. Write a Rollback Document
Write a rollback doc to a location the user can access (not the agent workspace — the user may not have access to it). Use a temp directory (/tmp/ on Linux/macOS, or the system temp dir). Include:
CODEBLOCK6
Tell the user where this file is.
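A rollback document can be generated in the system temp directory along these lines. The file name, the layout of the note, and the function signature are hypothetical; only the use of a temp directory and the kind of information recorded (file, changes, backup location) follow from this procedure.

```python
import tempfile
from pathlib import Path


def write_rollback_doc(config_path: str, backup_path: str, changes: dict) -> Path:
    """Write a plain-text rollback note to the system temp dir; return its path.

    changes maps setting name -> (old_value, new_value).
    """
    lines = [
        "Rollback instructions",
        f"Config file: {config_path}",
        f"Backup: {backup_path}",
        "",
        "Changes applied:",
    ]
    lines += [f"- {name}: {old} -> {new}" for name, (old, new) in changes.items()]
    lines += ["", "To roll back: copy the backup over the config file, then restart the gateway."]
    # Hypothetical file name; any user-accessible temp location works.
    doc = Path(tempfile.gettempdir()) / "context-config-rollback.txt"
    doc.write_text("\n".join(lines), encoding="utf-8")
    return doc
```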
4. Explain to the User BEFORE Applying
Tell them:
- Which file is being modified (full path; get it from gateway config.get)
- What values change (before → after table)
- What "restart" means: the OpenClaw gateway process restarts (not the machine, not any other service). Brief 2-3 second pause, then the session reconnects automatically.
- Where the backup is (full path)
- Where the rollback doc is (full path)
- How to check if something goes wrong
5. Apply with gateway config.patch
Use the gateway tool with action: config.patch. Include a clear note parameter; this message is delivered to the user after the gateway restarts.
6. Post-Restart Confirmation (MANDATORY)
After the gateway restarts and the session reconnects, immediately confirm to the user:
CODEBLOCK7
Never stay silent after a restart. The user needs to know:
1. We're back
2. The changes landed
3. Where to find the rollback doc
4. That we're ready to continue
Reference Docs
For detailed config options and profiles: references/config-guide.md
For per-operation cost estimates: references/operation-costs.md