session-distiller
Batch-process OpenClaw session transcripts into structured daily memory files (memory/YYYY-MM-DD.md). Ensures no session knowledge is lost — even from short conversations that end before the real-time memory flush threshold.
Platform: macOS. Reads from ~/.openclaw/agents/main/sessions/. Requires trash CLI.
Components
| Script | Purpose |
|---|
| INLINECODE3 | Batch distill closed sessions + live distill-in-place for approved group chats |
| INLINECODE4 |
Poll context usage, warn at threshold, auto-distill + alert at hard gate |
Prerequisites
- - Python 3.8+
- LiteLLM proxy running at
http://localhost:4000 (used for LLM distillation calls) - INLINECODE6 CLI (
brew install trash) — safe file removal to macOS Trash - OpenClaw gateway running (for context-gate.py status polling)
distill.py Usage
CODEBLOCK0
context-gate.py Usage
CODEBLOCK1
Environment Variables
| Variable | Default | Description |
|---|
| INLINECODE8 | INLINECODE9 | Advisory warning threshold (%) |
| INLINECODE10 |
60 | Hard gate threshold — triggers auto-distill (%) |
|
BOT_TOKEN |
(empty) | Telegram bot token (required for alerts) |
|
CHAT_ID |
(empty) | Telegram chat ID for alerts (required for alerts) |
Configuration
Live Allowlist (LIVE_ALLOWLIST_KEYS)
In scripts/distill.py, the LIVE_ALLOWLIST_KEYS dict controls which sessions get live distill-in-place. Keys are session keys from sessions.json (stable across UUID rotations). Add entries as:
CODEBLOCK2
Key Paths
| Path | Purpose |
|---|
| INLINECODE18 | Source session JSONL files |
| INLINECODE19 |
Session key → UUID index |
|
~/.openclaw/workspace/memory/ | Output daily memory files |
|
prompts/distill.txt | LLM distillation prompt template |
|
offsets.json | Live session offset tracker (runtime state, auto-created) |
|
gate-state.json | Context gate per-session state (runtime state, auto-created) |
LiteLLM Endpoint
Distillation calls go to http://localhost:4000/v1/chat/completions with model claude-opus-4-6. Change the model or endpoint in the distill_transcript() function if needed.
Scheduling
Batch Distill (distill.py)
Daily at 03:00 CST via OpenClaw cron. Runs as a sub-agent in an isolated session.
Known issue: The 03:00 cron has a 600s timeout. With ~44 sessions × LLM calls, it times out (5 consecutive errors observed). Mitigations: run with --min-age-hours 48 to reduce batch size, or split into multiple cron runs.
Context Gate (context-gate.py)
Every 5 minutes, 07:00–22:00 via shell crontab:
CODEBLOCK3
References
- - ROADMAP.md — Future ideas, known issues, Phase 4 community release plans
Changelog
v0.5.1 — 2026-03-17
Hotfix: removed runtime state files from repo and added .gitignore.
- - Added .gitignore covering offsets.json, gate-state.json, captains-log-ingested.json, granola-ingested.json
- Removed state files from git tracking (git rm --cached)
v0.5.0 — 2026-03-17
Configurable paths + flag aliases + de-identification.
- - Added configurable paths constants block (
MEETING_NOTES_DIR, DAILY_LOG_DIR, DAILY_LOG_PATTERN, MEMORY_DIR) — override via env vars without editing the script - Added
--meeting-notes flag as the preferred replacement for INLINECODE33 - Added
--daily-log flag as the preferred replacement for INLINECODE35 - INLINECODE36 and
--captains-log still work but now emit a deprecation warning to stderr - Removed deployment-specific identifiers from docs (SKILL.md, ROADMAP.md); replace
<your-name> / <your-repo> with your own values - Bumped
__version__ to INLINECODE41
v0.4.1 — 2026-03-15
Behavior change: --captains-log mode now moves source files to memory/captains-log/ingested/ after successful ingestion (default cleanup, consistent with --granola mode).
- - Added
--no-move flag to preserve source files in place - Added
--trash flag to trash instead of move - INLINECODE47 is now clean after ingestion — no redundant duplicate files
- Backfill: existing pre-ingested source files moved to
ingested/ on first run
v0.4.0 — 2026-03-14
New feature: Captain's Log AM/PM ingestion (--captains-log mode, closes #7).
- -
--captains-log flag scans memory/ for captains-log-YYYY-MM-DD-am.md and captains-log-YYYY-MM-DD-pm.md files - No LLM call — logs are already structured summaries, appended as-is
- AM ingested before PM for same date (chronological order guaranteed)
- Appends
## Captain's Log — AM (Morning Watch) / ## Captain's Log — PM (Dog Watch) sections - INLINECODE56 sidecar for idempotent re-runs
- Source files preserved — not moved or trashed (reference artifacts)
- INLINECODE57 supported for backlog processing
- Completes the daily LTM pipeline: AM log → Granola meetings → PM log → session distillations
v0.3.1 — 2026-03-14
New feature: Granola meeting note distillation (--granola mode, closes #4).
- -
--granola flag scans memory/granola/ and distills un-ingested meeting notes into their corresponding memory/YYYY-MM-DD.md daily files - Dedicated prompt file
prompts/distill-granola.txt tuned for already-summarized meeting notes (extraction focus vs. summarization) - INLINECODE63 sidecar tracks ingested UUIDs — idempotent on re-runs
- UUID dedup against target daily file (same logic as session distillation)
- Source files moved to
memory/granola/ingested/ on success (safe default) - INLINECODE65 flag to trash instead of move to ingested/
- INLINECODE66 to cap batch size (recommended for large backlogs)
- INLINECODE67 files skipped by default
- Appends
## Granola — {title} ({uuid8}) sections to daily files - Creates daily file if none exists for that date
- First run processed 130 candidates: 76 distilled, 20 skipped, 6 NO_DISTILL
v0.2.0 — 2026-03-12
BREAKING CHANGE (OpenClaw 2026.3.11): Gateway API no longer populates totalTokens, remainingTokens, or percentUsed for live sessions. context-gate.py was completely blind as a result.
Fix: JSONL fallback added to context-gate.py. When API fields are null, gate reads session file on disk, counts message chars, estimates tokens at chars÷4. Token source tagged in log output ([api], [api:inputTokens], or [jsonl-estimate]). Gate will auto-switch back to API source when upstream fixes the regression.
Also fixed: distill.py script path in Session Distiller cron was pointing to old projects/session-distiller/distill.py location — updated to skills/session-distiller/scripts/distill.py. This was causing 5 consecutive cron failures.
Other changes:
- - Cron timeout increased 600s → 1800s
- ROADMAP.md updated with known issue entry for token count regression
Upstream issue status (2026-03-14): OpenClaw upstream addressed the token count issue at the display layer only (openclaw/openclaw #43987, #45268). The underlying API fields (totalTokens, remainingTokens, percentUsed) remain null as of 2026.3.13. The JSONL fallback introduced in v0.2.0 is the permanent workaround. See ROADMAP.md and /session-distiller #5 for details.
v0.1.0 — 2026-03-11
Initial skill packaging of session-distiller project. Converted from projects/session-distiller/ to structured OpenClaw skill. No functional logic changes. Added --version flags to both scripts. Removed hardcoded credentials from context-gate.py (now requires BOT_TOKEN/CHAT_ID env vars). Adjusted relative paths for new scripts/ directory layout.
session-distiller
将OpenClaw会话记录批量处理为结构化的每日记忆文件(memory/YYYY-MM-DD.md)。确保不会丢失任何会话知识——即使是那些在实时记忆刷新阈值之前结束的简短对话。
平台: macOS。从 ~/.openclaw/agents/main/sessions/ 读取。需要 trash 命令行工具。
组件
| 脚本 | 用途 |
|---|
| scripts/distill.py | 批量提取已关闭会话 + 对已批准的群聊进行实时原地提取 |
| scripts/context-gate.py |
轮询上下文使用情况,在阈值时发出警告,在硬性门控时自动提取并发出警报 |
前置条件
- - Python 3.8+
- 在 http://localhost:4000 运行的 LiteLLM 代理(用于LLM提取调用)
- trash 命令行工具(brew install trash)——安全地将文件移至macOS废纸篓
- 运行的OpenClaw网关(用于context-gate.py状态轮询)
distill.py 使用方法
bash
批量:预览已关闭会话的试运行
python3 scripts/distill.py --dry-run
批量:处理所有符合条件的已关闭会话(默认:24小时以上)
python3 scripts/distill.py
批量:提取后保留源文件
python3 scripts/distill.py --no-trash
批量:处理特定会话文件
python3 scripts/distill.py --file
批量:更改最小时间阈值
python3 scripts/distill.py --min-age-hours 48
实时:提取所有已批准的群聊会话
python3 scripts/distill.py --live [--dry-run]
实时:按UUID提取特定会话
python3 scripts/distill.py --live-session [--dry-run]
会议记录:提取到每日记忆文件(推荐)
python3 scripts/distill.py --meeting-notes [--dry-run] [--limit N] [--trash]
会议记录:已弃用的别名(发出弃用警告)
python3 scripts/distill.py --granola [--dry-run] [--limit N] [--trash]
每日日志:将AM/PM文件导入每日记忆文件(推荐)
python3 scripts/distill.py --daily-log [--dry-run] [--limit N]
每日日志:已弃用的别名(发出弃用警告)
python3 scripts/distill.py --captains-log [--dry-run] [--limit N]
版本
python3 scripts/distill.py --version
context-gate.py 使用方法
bash
实时运行——轮询网关,发送Telegram警报
python3 scripts/context-gate.py
试运行——记录将要执行的操作,不发送警报
python3 scripts/context-gate.py --dry-run
版本
python3 scripts/context-gate.py --version
环境变量
| 变量 | 默认值 | 描述 |
|---|
| CONTEXTWARNPCT | 40 | 建议性警告阈值(%) |
| CONTEXTHARDPCT |
60 | 硬性门控阈值——触发自动提取(%) |
| BOTTOKEN | (空)_ | Telegram机器人令牌(警报必需) |
| CHATID | (空)_ | 用于警报的Telegram聊天ID(警报必需) |
配置
实时白名单(LIVEALLOWLISTKEYS)
在 scripts/distill.py 中,LIVEALLOWLISTKEYS 字典控制哪些会话获得实时原地提取。键是来自 sessions.json 的会话键(在UUID轮换中保持稳定)。添加条目如下:
python
LIVEALLOWLISTKEYS = {
agent:main:telegram:group:-5166698025: Claw & Order,
}
关键路径
| 路径 | 用途 |
|---|
| ~/.openclaw/agents/main/sessions/ | 源会话JSONL文件 |
| ~/.openclaw/agents/main/sessions/sessions.json |
会话键 → UUID索引 |
| ~/.openclaw/workspace/memory/ | 输出每日记忆文件 |
| prompts/distill.txt | LLM提取提示模板 |
| offsets.json | 实时会话偏移跟踪器(运行时状态,自动创建) |
| gate-state.json | 每个会话的上下文门控状态(运行时状态,自动创建) |
LiteLLM 端点
提取调用发送到 http://localhost:4000/v1/chat/completions,使用模型 claude-opus-4-6。如有需要,可在 distill_transcript() 函数中更改模型或端点。
调度
批量提取(distill.py)
通过OpenClaw cron每天北京时间03:00执行。作为子代理在隔离会话中运行。
已知问题: 03:00的cron有600秒超时。约44个会话×LLM调用,会超时(观察到5次连续错误)。缓解措施:使用 --min-age-hours 48 减少批处理大小,或拆分为多个cron运行。
上下文门控(context-gate.py)
通过shell crontab每5分钟执行一次,07:00–22:00:
/5 7-22 /Users//.openclaw/scripts/cron-context-gate.sh
参考
变更日志
v0.5.1 — 2026-03-17
热修复:从仓库中移除运行时状态文件并添加.gitignore。
- - 添加了覆盖offsets.json、gate-state.json、captains-log-ingested.json、granola-ingested.json的.gitignore
- 从git跟踪中移除状态文件(git rm --cached)
v0.5.0 — 2026-03-17
可配置路径 + 标志别名 + 去标识化。
- - 添加了可配置路径常量块(MEETINGNOTESDIR、DAILYLOGDIR、DAILYLOGPATTERN、MEMORY_DIR)——通过环境变量覆盖,无需编辑脚本
- 添加了 --meeting-notes 标志作为 --granola 的推荐替代
- 添加了 --daily-log 标志作为 --captains-log 的推荐替代
- --granola 和 --captains-log 仍然有效,但现在会向stderr发出弃用警告
- 从文档(SKILL.md、ROADMAP.md)中移除了部署特定的标识符;将 / 替换为您自己的值
- 将 version 提升至 0.5.0
v0.4.1 — 2026-03-15
行为变更:--captains-log 模式现在在成功导入后将源文件移动到 memory/captains-log/ingested/(默认清理,与 --granola 模式一致)。
- - 添加了 --no-move 标志以保留源文件在原位
- 添加了 --trash 标志以将文件移至废纸篓而非移动
- 导入后 memory/ 保持干净——无冗余重复文件
- 回填:首次运行时将已有的预导入源文件移至 ingested/
v0.4.0 — 2026-03-14
新功能:船长日志AM/PM导入(--captains-log 模式,关闭#7)。
- - --captains-log 标志扫描 memory/ 中的 captains-log-YYYY-MM-DD-am.md 和 captains-log-YYYY-MM-DD-pm.md 文件
- 无需LLM调用——日志已经是结构化摘要,按原样追加
- 同一日期的AM在PM之前导入(保证时间顺序)
- 追加 ## Captains Log — AM (Morning Watch) / ## Captains Log — PM (Dog Watch) 部分
- captains-log-ingested.json 边车文件用于幂等重新运行
- 源文件保留——不移动或丢弃(参考工件)
- 支持 --limit N 用于积压处理
- 完成每日LTM流水线:AM日志 → Granola会议 → PM日志 → 会话提取
v0.3.1 — 2026-03-14
新功能:Granola会议记录提取(--granola 模式,关闭#4)。
- - --granola 标志扫描 memory/granola/ 并将未导入的会议记录提取到对应的 memory/YYYY-MM-DD.md 每日文件中
- 专用提示