Context Compactor
Automatic context compaction for OpenClaw when using local models that don't properly report token limits or context overflow errors.
The Problem
Cloud APIs (Anthropic, OpenAI) report context overflow errors, allowing OpenClaw's built-in compaction to trigger. Local models (MLX, llama.cpp, Ollama) often:
- Silently truncate context
- Return garbage when context is exceeded
- Don't report accurate token counts
This leaves you with broken conversations when context gets too long.
The Solution
Context Compactor estimates tokens client-side and proactively summarizes older messages before hitting the model's limit.
How It Works
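```
┌────────────────────────────────────────────────────────────┐
│ 1. Message arrives                                         │
│ 2. before_agent_start hook fires                           │
│ 3. Plugin estimates total context tokens                   │
│ 4. If over maxTokens:                                      │
│    a. Split messages into old and recent                   │
│    b. Summarize old messages (LLM or fallback)             │
│    c. Inject summary as compacted context                  │
│ 5. Agent sees: summary + recent messages + new message     │
└────────────────────────────────────────────────────────────┘
```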
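To make the flow concrete, here is a minimal TypeScript sketch of steps 3 to 5. It is not the plugin's source: the `ChatMessage` shape, `compactIfNeeded`, and the `summarize` callback are hypothetical, and it assumes the estimate is simply characters divided by `charsPerToken` and that the summary is injected as a system message.

```typescript
// Hypothetical message shape; OpenClaw's real message type is likely richer.
interface ChatMessage {
  role: "user" | "assistant" | "system";
  content: string;
}

// Hypothetical subset of the plugin options used by this sketch.
interface CompactionSettings {
  maxTokens: number;
  keepRecentTokens: number;
  charsPerToken: number;
}

// Step 3: rough client-side estimate, characters divided by charsPerToken (rounded up).
function estimateTokens(text: string, charsPerToken: number): number {
  return Math.ceil(text.length / charsPerToken);
}

// Steps 3-5: compact the history only when the estimate exceeds maxTokens.
function compactIfNeeded(
  history: ChatMessage[],
  incoming: ChatMessage,
  settings: CompactionSettings,
  summarize: (old: ChatMessage[]) => string,
): ChatMessage[] {
  const total = [...history, incoming].reduce(
    (sum, m) => sum + estimateTokens(m.content, settings.charsPerToken),
    0,
  );
  if (total <= settings.maxTokens) {
    return [...history, incoming]; // under the limit: nothing to do
  }

  // Step 4a: walk backwards from the newest message, keeping whole messages
  // until the next one would push the kept total past keepRecentTokens.
  let kept = 0;
  let split = history.length;
  while (split > 0) {
    const cost = estimateTokens(history[split - 1].content, settings.charsPerToken);
    if (kept + cost > settings.keepRecentTokens) break;
    kept += cost;
    split--;
  }
  const old = history.slice(0, split);
  const recent = history.slice(split);
  if (old.length === 0) {
    return [...history, incoming]; // nothing old enough to summarize
  }

  // Steps 4b-4c: summarize the old part and inject it ahead of the recent messages.
  const summary: ChatMessage = {
    role: "system",
    content: `Summary of earlier conversation:\n${summarize(old)}`,
  };
  // Step 5: the agent sees summary + recent messages + the new message.
  return [summary, ...recent, incoming];
}
```

Keeping messages whole while walking back from the newest one means the recent window never cuts a message in half; the trade-off is that a single very long message can push everything before it into the summary.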
Installation
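```bash
# One-command install (recommended)
npx jasper-context-compactor setup

# Restart the gateway
openclaw gateway restart
```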
The setup command automatically:
- Copies plugin files to `~/.openclaw/extensions/context-compactor/`
- Adds the plugin config to `openclaw.json` with sensible defaults
Configuration
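Add to `openclaw.json`:
```json
{
  "plugins": {
    "entries": {
      "context-compactor": {
        "enabled": true,
        "config": {
          "maxTokens": 8000,
          "keepRecentTokens": 2000,
          "summaryMaxTokens": 1000,
          "charsPerToken": 4
        }
      }
    }
  }
}
```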
Options
| Option | Default | Description |
|---|---|---|
| `enabled` | `true` | Enable/disable the plugin |
| `maxTokens` | `8000` | Max estimated context tokens before compaction triggers |
| `keepRecentTokens` | `2000` | Tokens to preserve from recent messages |
| `summaryMaxTokens` | `1000` | Max tokens for the summary |
| `charsPerToken` | `4` | Characters per token used for the token estimate |
| `summaryModel` | (session model) | Model to use for summarization |
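A sketch of how these options might be resolved, assuming user values are shallow-merged over the documented defaults; `CompactorConfig`, `resolveConfig`, and `summaryModelFor` are hypothetical names, not the plugin's API:

```typescript
// Mirrors the documented options table; the plugin's internal type is an assumption.
interface CompactorConfig {
  enabled: boolean;
  maxTokens: number;
  keepRecentTokens: number;
  summaryMaxTokens: number;
  charsPerToken: number;
  summaryModel?: string; // left unset, the session model is used
}

const DEFAULT_CONFIG: CompactorConfig = {
  enabled: true,
  maxTokens: 8000,
  keepRecentTokens: 2000,
  summaryMaxTokens: 1000,
  charsPerToken: 4,
};

// Keys present in the user's `config` object override the defaults.
function resolveConfig(user: Partial<CompactorConfig>): CompactorConfig {
  return { ...DEFAULT_CONFIG, ...user };
}

// Hypothetical helper: pick the summarization model, falling back to the session model.
function summaryModelFor(config: CompactorConfig, sessionModel: string): string {
  return config.summaryModel ?? sessionModel;
}
```

For example, `resolveConfig({ maxTokens: 6000, keepRecentTokens: 1500 })` keeps `summaryMaxTokens` at 1000 and `charsPerToken` at 4, so a tuning preset only needs to list the values it changes.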
Tuning for Your Model
MLX (8K context models):
```json
{
  "maxTokens": 6000,
  "keepRecentTokens": 1500,
  "charsPerToken": 4
}
```
Larger context (32K models):
```json
{
  "maxTokens": 28000,
  "keepRecentTokens": 4000,
  "charsPerToken": 4
}
```
Small context (4K models):
```json
{
  "maxTokens": 3000,
  "keepRecentTokens": 800,
  "charsPerToken": 4
}
```
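One way to compare the tuning presets (6000/1500, 28000/4000 and 3000/800 for `maxTokens`/`keepRecentTokens`) is to compute how much room each leaves. This sketch assumes context windows of 8,192, 32,768 and 4,096 tokens, that `summaryMaxTokens` stays at its default of 1000, and that a compacted context is at most summary plus recent tokens; `Preset` and `headroom` are hypothetical helpers.

```typescript
// Hypothetical preset description; contextWindow is the model's real limit.
interface Preset {
  label: string;
  contextWindow: number;
  maxTokens: number;
  keepRecentTokens: number;
  summaryMaxTokens: number;
}

// Assumes a compacted context is at most summaryMaxTokens + keepRecentTokens.
function headroom(p: Preset): { replyRoom: number; growthRoom: number } {
  return {
    // Left for the reply (and estimation error) when the context sits right at maxTokens.
    replyRoom: p.contextWindow - p.maxTokens,
    // How far the conversation can grow after a compaction before the next one triggers.
    growthRoom: p.maxTokens - (p.summaryMaxTokens + p.keepRecentTokens),
  };
}

const presets: Preset[] = [
  { label: "MLX 8K", contextWindow: 8192, maxTokens: 6000, keepRecentTokens: 1500, summaryMaxTokens: 1000 },
  { label: "32K", contextWindow: 32768, maxTokens: 28000, keepRecentTokens: 4000, summaryMaxTokens: 1000 },
  { label: "4K", contextWindow: 4096, maxTokens: 3000, keepRecentTokens: 800, summaryMaxTokens: 1000 },
];

for (const p of presets) {
  const h = headroom(p);
  console.log(`${p.label}: reply room ${h.replyRoom}, growth room ${h.growthRoom}`);
}
```

Under these assumptions the 4K preset leaves the least room to grow between compactions (1,200 tokens), while `replyRoom` is the margin that absorbs both the model's reply and any underestimate from `charsPerToken`.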
Commands
/compact-now
Force-clears the summary cache and triggers a fresh compaction on the next message.
```
/compact-now
```
/context-stats
Show current context token usage and whether compaction would trigger.
```
/context-stats
```
Output:
```
📊 Context Stats
Messages: 47 total
Estimated tokens: ~6,234
Limit: 8,000
Usage: 77.9%
✅ Within limits
```
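A minimal sketch of how such a report could be assembled from the numbers `/context-stats` is described as showing (message count, estimated tokens, limit, usage, and whether compaction would trigger). `formatContextStats` is a hypothetical name, and the over-limit wording is an assumption:

```typescript
// Hypothetical formatter; the over-limit line's wording is an assumption.
function formatContextStats(messageCount: number, estimatedTokens: number, limit: number): string {
  const usagePercent = (estimatedTokens / limit) * 100;
  // Compaction triggers when the estimate is over maxTokens, so "over" means strictly greater.
  const verdict = estimatedTokens > limit ? "⚠️ Over limit: compaction will trigger" : "✅ Within limits";
  return [
    "📊 Context Stats",
    `Messages: ${messageCount} total`,
    `Estimated tokens: ~${estimatedTokens.toLocaleString("en-US")}`,
    `Limit: ${limit.toLocaleString("en-US")}`,
    `Usage: ${usagePercent.toFixed(1)}%`,
    verdict,
  ].join("\n");
}

console.log(formatContextStats(47, 6234, 8000));
```

The usage figure is just the estimate divided by `maxTokens`: 6,234 of 8,000 tokens is 77.925%, shown as 77.9%.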
How Summarization Works
When compaction triggers:
1. Split messages into "old" (to summarize) and "recent" (to keep)
2. Generate a summary using the session model (or the configured `summaryModel`)
3. Cache the summary to avoid regenerating it for the same content
4. Inject the context with the summary prepended
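The caching step can be sketched as a map keyed by the exact content of the old messages, so the same input never triggers a second summarization call. `SummaryCache` is hypothetical; its `clear()` models what `/compact-now` is described as doing to the summary cache.

```typescript
// Hypothetical message shape used by this sketch.
interface ChatMessage {
  role: "user" | "assistant" | "system";
  content: string;
}

class SummaryCache {
  private entries = new Map<string, string>();

  // Identical old messages (same roles, same text, same order) produce the same key.
  private keyFor(old: ChatMessage[]): string {
    return JSON.stringify(old.map((m) => [m.role, m.content]));
  }

  // Return the cached summary, or call summarize once and remember the result.
  getOrCreate(old: ChatMessage[], summarize: (old: ChatMessage[]) => string): string {
    const key = this.keyFor(old);
    const cached = this.entries.get(key);
    if (cached !== undefined) return cached;
    const summary = summarize(old);
    this.entries.set(key, summary);
    return summary;
  }

  // Drop every cached summary so the next compaction summarizes afresh.
  clear(): void {
    this.entries.clear();
  }
}
```

Keying on the full text is the simplest correct choice; a real implementation might hash the key to save memory.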
If the LLM runtime isn't available (e.g., during startup), a fallback truncation-based summary is used.
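The exact fallback behavior is not described here. A minimal sketch of one truncation-based approach, assuming it clips each old message to an excerpt and stops once the `summaryMaxTokens` budget (converted to characters with `charsPerToken`) is spent; `fallbackSummary` is a hypothetical name:

```typescript
// Hypothetical message shape used by this sketch.
interface ChatMessage {
  role: "user" | "assistant" | "system";
  content: string;
}

// Truncation-based fallback: no LLM call, just clipped excerpts within a budget.
function fallbackSummary(old: ChatMessage[], summaryMaxTokens: number, charsPerToken: number): string {
  // Convert the token budget into a character budget with the same ratio as the estimator.
  const budgetChars = summaryMaxTokens * charsPerToken;
  // Share the budget evenly across messages, with a small floor so excerpts stay readable.
  const perMessage = Math.max(20, Math.floor(budgetChars / Math.max(old.length, 1)));
  const lines: string[] = [];
  let used = 0;
  for (const m of old) {
    const excerpt = m.content.length > perMessage ? m.content.slice(0, perMessage - 1) + "…" : m.content;
    const line = `${m.role}: ${excerpt}`;
    if (used + line.length > budgetChars) break; // budget spent: drop the remaining messages
    lines.push(line);
    used += line.length + 1; // +1 for the newline that joins the lines
  }
  return lines.join("\n");
}
```

Clipping every message, rather than keeping only the first few in full, keeps a trace of each earlier turn; the trade-off is that later messages are dropped once the budget runs out.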
Differences from Built-in Compaction
| Feature | Built-in | Context Compactor |
|---|---|---|
| Trigger | Model reports overflow | Token estimate threshold |
| Works with local models | ❌ (needs an overflow error) | ✅ |
| Persists to transcript | ✅ | ❌ (session-only) |
| Summarization | Pi runtime | Plugin LLM call |
Context Compactor complements the built-in compaction: it catches cases before they hit the model's hard limit.
Troubleshooting
Summary quality is poor:
- Try a better `summaryModel`
- Increase `summaryMaxTokens`
- The fallback truncation is used if the LLM runtime isn't available
Compaction triggers too often:
- Increase `maxTokens`
- Decrease `keepRecentTokens` (keeps less recent history verbatim and summarizes more, leaving more room before the next compaction)
Not compacting when expected:
- Check `/context-stats` to see current usage
- Verify `enabled: true` in the config
- Check logs for `[context-compactor]` messages
Characters-per-token estimate is off:
- The default of 4 works for English
- Try 3 for CJK languages
- Try 5 for highly technical content
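Assuming the estimate is a plain ratio (characters divided by `charsPerToken`, as the option name suggests), the effect of the suggested values on the same 6,000-character history can be worked out directly. `estimateTokens` is a hypothetical helper:

```typescript
// Hypothetical estimator: characters divided by charsPerToken, rounded up.
function estimateTokens(text: string, charsPerToken: number): number {
  return Math.ceil(text.length / charsPerToken);
}

const sample = "x".repeat(6000); // stand-in for 6,000 characters of conversation
for (const ratio of [3, 4, 5]) {
  console.log(`charsPerToken=${ratio}: ~${estimateTokens(sample, ratio)} tokens`);
}
```

A smaller ratio yields a larger estimate (2,000 tokens at 3 versus 1,200 at 5), so compaction triggers sooner; lower the ratio if the model's real tokenizer produces more tokens than the estimate.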
Logs
Enable debug logging:
```json
{
  "plugins": {
    "entries": {
      "context-compactor": {
        "config": {
          "logLevel": "debug"
        }
      }
    }
  }
}
```
Look for:
- `[context-compactor] Current context: ~XXXX tokens`
- `[context-compactor] Compacted X messages → summary`
Links
- GitHub: https://github.com/E-x-O-Entertainment-Studios-Inc/openclaw-context-compactor
- OpenClaw Docs: https://docs.openclaw.ai/concepts/compaction