Context Compactor

Automatic context compaction for OpenClaw when using local models that don't properly report token limits or context overflow errors.

The Problem

Cloud APIs (Anthropic, OpenAI) report context overflow errors, allowing OpenClaw's built-in compaction to trigger. Local models (MLX, llama.cpp, Ollama) often:

- Silently truncate context
- Return garbage when context is exceeded
- Don't report accurate token counts

This leaves you with broken conversations when context gets too long.

The Solution

Context Compactor estimates tokens client-side and proactively summarizes older messages before hitting the model's limit.
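
As a rough illustration of client-side estimation, the sketch below divides character counts by a characters-per-token ratio. The `ChatMessage` shape and both function names are hypothetical; only `charsPerToken` is a plugin setting, and the plugin's actual estimator may differ.

```typescript
// Hypothetical message shape, used only for this illustration.
interface ChatMessage {
  role: "user" | "assistant" | "system";
  content: string;
}

// Estimate tokens as characters divided by an assumed characters-per-token ratio.
// Note: String.length counts UTF-16 code units, which is only an approximation.
function estimateTokens(text: string, charsPerToken: number = 4): number {
  if (charsPerToken <= 0) {
    throw new RangeError("charsPerToken must be positive");
  }
  return Math.ceil(text.length / charsPerToken);
}

// Sum the per-message estimates over the whole conversation.
function estimateContextTokens(messages: ChatMessage[], charsPerToken: number = 4): number {
  return messages.reduce((total, m) => total + estimateTokens(m.content, charsPerToken), 0);
}
```

Rounding up per message makes the estimate slightly pessimistic, which is the safer direction when the goal is to compact before the model's limit is reached.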

How It Works

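```
┌────────────────────────────────────────────────────────────┐
│  1. Message arrives                                        │
│  2. before_agent_start hook fires                          │
│  3. Plugin estimates total context tokens                  │
│  4. If over maxTokens:                                     │
│     a. Split messages into "old" and "recent"              │
│     b. Summarize old messages (LLM or fallback)            │
│     c. Inject summary as compacted context                 │
│  5. Agent sees: summary + recent + new message             │
└────────────────────────────────────────────────────────────┘
```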

Installation

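```bash
# One-command setup (recommended)
npx jasper-context-compactor setup

# Restart the gateway
openclaw gateway restart
```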

The setup command automatically:

- Copies plugin files to `~/.openclaw/extensions/context-compactor/`
- Adds plugin config to `openclaw.json` with sensible defaults

Configuration

Add to openclaw.json:

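```json
{
  "plugins": {
    "entries": {
      "context-compactor": {
        "enabled": true,
        "config": {
          "maxTokens": 8000,
          "keepRecentTokens": 2000,
          "summaryMaxTokens": 1000,
          "charsPerToken": 4
        }
      }
    }
  }
}
```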
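
Before restarting the gateway, it can help to sanity-check the values. The sketch below mirrors the config keys from this document's example (`maxTokens: 8000`, `keepRecentTokens: 2000`, `summaryMaxTokens: 1000`, `charsPerToken: 4`). The `CompactorConfig` type, `validateConfig`, and the rule that the recent window plus the summary should fit under `maxTokens` are assumptions for illustration, not checks the plugin is known to perform.

```typescript
// Field names mirror the openclaw.json keys; the validation rules are assumptions.
interface CompactorConfig {
  enabled: boolean;
  maxTokens: number;
  keepRecentTokens: number;
  summaryMaxTokens: number;
  charsPerToken: number;
}

// Values taken from the example configuration in this document.
const EXAMPLE_CONFIG: CompactorConfig = {
  enabled: true,
  maxTokens: 8000,
  keepRecentTokens: 2000,
  summaryMaxTokens: 1000,
  charsPerToken: 4,
};

// Return human-readable problems; an empty list means the config looks sane.
function validateConfig(cfg: CompactorConfig): string[] {
  const problems: string[] = [];
  if (cfg.maxTokens <= 0) problems.push("maxTokens must be positive");
  if (cfg.charsPerToken <= 0) problems.push("charsPerToken must be positive");
  // After compaction the agent sees summary + recent messages, so both
  // should fit under the trigger threshold with room for the new message.
  if (cfg.keepRecentTokens + cfg.summaryMaxTokens >= cfg.maxTokens) {
    problems.push("keepRecentTokens + summaryMaxTokens should be below maxTokens");
  }
  return problems;
}
```

If the last rule fails, the compacted context would already sit at or above the threshold, so compaction would likely fire again on the very next message.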

Options

| Option | Default | Description |
| --- | --- | --- |
| `enabled` | `true` | Enable/disable the plugin |
| `maxTokens` | | |

Tuning for Your Model

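MLX (8K context models):

```json
{
  "maxTokens": 6000,
  "keepRecentTokens": 1500,
  "charsPerToken": 4
}
```

Larger context (32K models):

```json
{
  "maxTokens": 28000,
  "keepRecentTokens": 4000,
  "charsPerToken": 4
}
```

Small context (4K models):

```json
{
  "maxTokens": 3000,
  "keepRecentTokens": 800,
  "charsPerToken": 4
}
```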

Commands

`/compact-now`

Force-clear the summary cache and trigger a fresh compaction on the next message.

```
/compact-now
```

`/context-stats`

Show current context token usage and whether compaction would trigger.

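```
/context-stats
```

Output:

```
📊 Context Stats

Messages: 47 total
  - User: 23
  - Assistant: 24
  - System: 0

Estimated Tokens: ~6,234
Limit: 8,000
Usage: 77.9%

✅ Within limits
```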
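
The usage figure is simple arithmetic: 6,234 estimated tokens against a limit of 8,000 is 77.9%, which is under the threshold. A minimal sketch of that calculation follows; `ContextStats` and `contextStats` are hypothetical names, and treating "over `maxTokens`" as a strict greater-than comparison is an assumption.

```typescript
// Hypothetical result shape for a stats calculation.
interface ContextStats {
  estimatedTokens: number;
  limit: number;
  usagePercent: number; // rounded to one decimal place
  wouldCompact: boolean;
}

function contextStats(estimatedTokens: number, maxTokens: number): ContextStats {
  // Round to one decimal place, e.g. 0.77925 -> 77.9.
  const usagePercent = Math.round((estimatedTokens / maxTokens) * 1000) / 10;
  return {
    estimatedTokens,
    limit: maxTokens,
    usagePercent,
    // Assumption: compaction triggers only when the estimate exceeds the limit.
    wouldCompact: estimatedTokens > maxTokens,
  };
}

console.log(contextStats(6234, 8000));
```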

How Summarization Works

When compaction triggers:

1. Split messages into "old" (to summarize) and "recent" (to keep)
2. Generate a summary using the session model (or the configured `summaryModel`)
3. Cache the summary to avoid regenerating it for the same content
4. Inject the context with the summary prepended
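
The split and cache steps can be sketched as follows. This is a simplified illustration, not the plugin's code: `splitMessages`, `cachedSummary`, and the `ChatMessage` shape are hypothetical, the summarizer is passed in as a synchronous callback (a real LLM call would be asynchronous), and keying the cache on the serialized old messages is an assumption.

```typescript
interface ChatMessage {
  role: "user" | "assistant" | "system";
  content: string;
}

const estimate = (text: string, charsPerToken: number): number =>
  Math.ceil(text.length / charsPerToken);

// Walk backwards from the newest message, keeping messages while they fit
// in the keepRecentTokens budget; everything earlier counts as "old".
function splitMessages(
  messages: ChatMessage[],
  keepRecentTokens: number,
  charsPerToken: number = 4,
): { old: ChatMessage[]; recent: ChatMessage[] } {
  let budget = keepRecentTokens;
  let cut = messages.length;
  for (const msg of [...messages].reverse()) {
    const cost = estimate(msg.content, charsPerToken);
    if (cost > budget) break;
    budget -= cost;
    cut -= 1;
  }
  return { old: messages.slice(0, cut), recent: messages.slice(cut) };
}

// Cache summaries keyed by the exact content of the old messages,
// so identical input never triggers a second summarization.
const summaryCache = new Map<string, string>();

function cachedSummary(
  old: ChatMessage[],
  summarize: (msgs: ChatMessage[]) => string,
): string {
  const key = JSON.stringify(old.map((m) => [m.role, m.content]));
  const hit = summaryCache.get(key);
  if (hit !== undefined) return hit;
  const summary = summarize(old);
  summaryCache.set(key, summary);
  return summary;
}
```

The agent would then receive the cached summary followed by the `recent` messages and the new message.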

If the LLM runtime isn't available (e.g., during startup), a fallback truncation-based summary is used.
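
The source does not describe how the fallback builds its summary. One plausible shape, shown purely as an assumption, keeps a short prefix of each old message and caps the total length at roughly `summaryMaxTokens` worth of characters. `truncationSummary` and `perMessageChars` are hypothetical names.

```typescript
interface ChatMessage {
  role: "user" | "assistant" | "system";
  content: string;
}

// Hypothetical fallback: keep a short prefix of each old message,
// then cap the whole summary at summaryMaxTokens * charsPerToken characters.
function truncationSummary(
  old: ChatMessage[],
  summaryMaxTokens: number,
  charsPerToken: number = 4,
  perMessageChars: number = 200,
): string {
  const maxChars = summaryMaxTokens * charsPerToken;
  const lines = old.map((m) => {
    // Collapse whitespace so the prefix carries as much text as possible.
    const text = m.content.replace(/\s+/g, " ").trim();
    const clipped =
      text.length > perMessageChars ? text.slice(0, perMessageChars) + "..." : text;
    return `${m.role}: ${clipped}`;
  });
  return lines.join("\n").slice(0, maxChars);
}
```

A summary built this way loses meaning that an LLM summary would preserve, which is consistent with the troubleshooting note that summary quality drops when the LLM runtime is unavailable.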

Differences from Built-in Compaction

| Feature | Built-in | Context Compactor |
| --- | --- | --- |
| Trigger | Model reports overflow | Token estimate threshold |
| Works with local models | | |

Context Compactor is complementary: it compacts the conversation before it reaches the model's hard limit.

Troubleshooting

Summary quality is poor:

- Try a better `summaryModel`
- Increase `summaryMaxTokens`
- The fallback truncation is used if the LLM runtime isn't available

Compaction triggers too often:

- Increase `maxTokens`
- Decrease `keepRecentTokens` (keeps less, summarizes earlier)

Not compacting when expected:

- Check `/context-stats` to see current usage
- Verify `enabled: true` in config
- Check logs for `[context-compactor]` messages

Characters per token wrong:

- Default of 4 works for English
- Try 3 for CJK languages
- Try 5 for highly technical content
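
Rather than guessing, the ratio can be measured when true token counts are available for a few representative samples, for instance from the model's own tokenizer. The helper below is a hypothetical sketch; `Sample` and `calibrateCharsPerToken` are not part of the plugin.

```typescript
// A sample of text together with its true token count,
// obtained from whatever tokenizer the model actually uses.
interface Sample {
  text: string;
  tokens: number;
}

// Average characters per token across all samples.
function calibrateCharsPerToken(samples: Sample[]): number {
  const totalChars = samples.reduce((n, s) => n + s.text.length, 0);
  const totalTokens = samples.reduce((n, s) => n + s.tokens, 0);
  if (totalTokens <= 0) {
    throw new RangeError("samples must contain at least one token");
  }
  return totalChars / totalTokens;
}
```

Rounding the result down before putting it in `charsPerToken` makes the estimate err on the high side, so compaction triggers slightly early rather than too late.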

Logs

Enable debug logging:

```json
{
  "plugins": {
    "entries": {
      "context-compactor": {
        "config": {
          "logLevel": "debug"
        }
      }
    }
  }
}
```

Look for:

- `[context-compactor] Current context: ~XXXX tokens`
- `[context-compactor] Compacted X messages → summary`

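Context Compactor

Automatic context compaction for OpenClaw when using local models that don't properly report token limits or context overflow errors.

The Problem

Cloud APIs (Anthropic, OpenAI) report context overflow errors, which triggers OpenClaw's built-in compaction. Local models (MLX, llama.cpp, Ollama) often:

- Silently truncate context
- Return garbage when context is exceeded
- Don't report accurate token counts

This breaks conversations once the context gets too long.

The Solution

Context Compactor estimates tokens client-side and proactively summarizes older messages before the model's limit is reached.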

How It Works

```
┌────────────────────────────────────────────────────────────┐
│  1. Message arrives                                        │
│  2. before_agent_start hook fires                          │
│  3. Plugin estimates total context tokens                  │
│  4. If over maxTokens:                                     │
│     a. Split messages into "old" and "recent"              │
│     b. Summarize old messages (LLM or fallback)            │
│     c. Inject summary as compacted context                 │
│  5. Agent sees: summary + recent + new message             │
└────────────────────────────────────────────────────────────┘
```

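Installation

```bash
# One-command setup (recommended)
npx jasper-context-compactor setup

# Restart the gateway
openclaw gateway restart
```

The setup command automatically:

- Copies plugin files to `~/.openclaw/extensions/context-compactor/`
- Adds plugin config to `openclaw.json` with sensible defaults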

Configuration

Add to `openclaw.json`:

```json
{
  "plugins": {
    "entries": {
      "context-compactor": {
        "enabled": true,
        "config": {
          "maxTokens": 8000,
          "keepRecentTokens": 2000,
          "summaryMaxTokens": 1000,
          "charsPerToken": 4
        }
      }
    }
  }
}
```

Options

| Option | Default | Description |
| --- | --- | --- |
| `enabled` | `true` | Enable/disable the plugin |
| `maxTokens` | | |

Tuning for Your Model

MLX (8K context models):

```json
{
  "maxTokens": 6000,
  "keepRecentTokens": 1500,
  "charsPerToken": 4
}
```

Larger context (32K models):

```json
{
  "maxTokens": 28000,
  "keepRecentTokens": 4000,
  "charsPerToken": 4
}
```

Small context (4K models):

```json
{
  "maxTokens": 3000,
  "keepRecentTokens": 800,
  "charsPerToken": 4
}
```

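Commands

`/compact-now`

Force-clear the summary cache and trigger a fresh compaction on the next message.

```
/compact-now
```

`/context-stats`

Show current context token usage and whether compaction would trigger.

```
/context-stats
```

Output:

```
📊 Context Stats

Messages: 47 total
  - User: 23
  - Assistant: 24
  - System: 0

Estimated Tokens: ~6,234
Limit: 8,000
Usage: 77.9%

✅ Within limits
```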

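How Summarization Works

When compaction triggers:

1. Split messages into "old" (to summarize) and "recent" (to keep)
2. Generate a summary using the session model (or the configured `summaryModel`)
3. Cache the summary to avoid regenerating it for the same content
4. Inject the context with the summary prepended

If the LLM runtime isn't available (e.g., during startup), a fallback truncation-based summary is used.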

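Differences from Built-in Compaction

| Feature | Built-in | Context Compactor |
| --- | --- | --- |
| Trigger | Model reports overflow | Token estimate threshold |
| Works with local models | | |

Context Compactor is complementary: it compacts the conversation before it reaches the model's hard limit.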

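Troubleshooting

Summary quality is poor:

- Try a better `summaryModel`
- Increase `summaryMaxTokens`
- The fallback truncation is used if the LLM runtime isn't available

Compaction triggers too often:

- Increase `maxTokens`
- Decrease `keepRecentTokens` (keeps less, summarizes earlier)

Not compacting when expected:

- Check `/context-stats` to see current usage
- Verify `enabled: true` in config
- Check logs for `[context-compactor]` messages

Characters per token wrong:

- Default of 4 works for English
- Try 3 for CJK languages
- Try 5 for highly technical content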

Logs

Enable debug logging:

```json
{
  "plugins": {
    "entries": {
      "context-compactor": {
        "config": {
          "logLevel": "debug"
        }
      }
    }
  }
}
```

Look for:

- `[context-compactor] Current context: ~XXXX tokens`
- `[context-compactor] Compacted X messages → summary`

Links

- GitHub: https://github.com/E-x-O-Entertainment-Studios-Inc/openclaw-context-compactor
- OpenClaw docs: https://docs.openclaw.ai/concepts/compaction
