Token Optimizer
Comprehensive toolkit for reducing token usage and API costs in OpenClaw deployments. Combines smart model routing, optimized heartbeat intervals, usage tracking, and multi-provider strategies.
Quick Start
Immediate actions (no config changes needed):
1. Generate optimized AGENTS.md (BIGGEST WIN!):
CODEBLOCK0
2. Check what context you ACTUALLY need:
CODEBLOCK1
3. Install optimized heartbeat:
CODEBLOCK2
4. Enforce cheaper models for casual chat:
CODEBLOCK3
5. Check current token budget:
CODEBLOCK4
Expected savings: 50-80% reduction in token costs for typical workloads (context optimization is the biggest factor!).
Core Capabilities
0. Lazy Skill Loading (NEW in v3.0 — BIGGEST WIN!)
The single highest-impact optimization available. Most agents burn 3,000–15,000 tokens per session loading skill files they never use. Stop that first.
The pattern:
1. Create a lightweight SKILLS.md catalog in your workspace (~300 tokens: a list of skills and when to load each)
2. Only load individual SKILL.md files when a task actually needs them
3. Apply the same logic to memory files: load MEMORY.md at startup, daily logs only on demand
Token savings:
| Library size | Before (eager) | After (lazy) | Savings |
|---|---|---|---|
| 5 skills | ~3,000 tokens | ~600 tokens | 80% |
| 10 skills | ~6,500 tokens | ~750 tokens | 88% |
| 20 skills | ~13,000 tokens | ~900 tokens | 93% |
Quick implementation in AGENTS.md:
CODEBLOCK5
Full implementation (with catalog template + optimizer script):
CODEBLOCK6
The companion skill openclaw-skill-lazy-loader includes a SKILLS.md.template, an AGENTS.md.template lazy-loading section, and a context_optimizer.py CLI that recommends exactly which skills to load for any given task.
Lazy loading handles context loading costs. The remaining capabilities below handle runtime costs. Together they cover the full token lifecycle.
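The lazy-loading pattern can be sketched in a few lines. This is only an illustration under stated assumptions: the catalog layout, the skill names, and the helpers `select_skills` and `savings_percent` are hypothetical and are not the format used by openclaw-skill-lazy-loader.

```python
# Hypothetical catalog: skill name -> keywords that suggest the skill is needed.
CATALOG = {
    "git-helper": ["commit", "branch", "merge"],
    "csv-tools": ["csv", "spreadsheet"],
    "deploy": ["deploy", "release"],
}

def select_skills(task, catalog):
    """Return only the skills whose keywords appear in the task text."""
    text = task.lower()
    return [name for name, keywords in catalog.items()
            if any(keyword in text for keyword in keywords)]

def savings_percent(eager_tokens, lazy_tokens):
    """Share of tokens saved by lazy loading, rounded to a whole percent."""
    return round(100 * (eager_tokens - lazy_tokens) / eager_tokens)

# Only the matching skill files would be loaded for this task.
print(select_skills("Merge the feature branch and deploy", CATALOG))
# Reproduces the 10-skill row of the savings table.
print(savings_percent(6500, 750))
```

Substring matching is the simplest possible trigger; a real catalog would probably want something less prone to false positives.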
1. Context Optimization (NEW!)
Biggest token saver — Only load files you actually need, not everything upfront.
Problem: Default OpenClaw loads ALL context files every session:
- SOUL.md, AGENTS.md, USER.md, TOOLS.md, MEMORY.md
- docs/**/*.md (hundreds of files)
- memory/2026-*.md (daily logs)
- Total: often 50K+ tokens before the user even speaks!
Solution: Lazy loading based on prompt complexity.
Usage:
CODEBLOCK7
Examples:
CODEBLOCK8
Output format:
CODEBLOCK9
Integration pattern:
Before loading context for a new session:
CODEBLOCK10
Generate optimized AGENTS.md:
CODEBLOCK11
Expected savings: 50-80% reduction in context tokens.
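To make "lazy loading based on prompt complexity" concrete, here is a minimal sketch. The tier names, file lists, keyword hints, and word-count threshold are all assumptions chosen for illustration; the real context_optimizer.py may classify prompts quite differently.

```python
# Assumed tiers and file lists; the real script may use others.
CONTEXT_TIERS = {
    "minimal": ["SOUL.md", "IDENTITY.md"],
    "standard": ["SOUL.md", "IDENTITY.md", "memory/TODAY.md"],
    "full": ["SOUL.md", "IDENTITY.md", "MEMORY.md", "memory/TODAY.md"],
}
# Hypothetical keywords that hint at a complex task.
COMPLEX_HINTS = ("architecture", "analyze", "design", "refactor")

def context_level(prompt):
    """Classify a prompt as needing minimal, standard, or full context."""
    text = prompt.lower()
    if any(hint in text for hint in COMPLEX_HINTS):
        return "full"
    if len(text.split()) <= 2:  # very short prompts are treated as chat
        return "minimal"
    return "standard"

def recommend(prompt):
    """Return the context level and the files to load for it."""
    level = context_level(prompt)
    files = CONTEXT_TIERS[level]
    return {"context_level": level,
            "recommended_files": files,
            "file_count": len(files)}

print(recommend("hi"))
print(recommend("analyze our entire architecture"))
```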
2. Smart Model Routing (ENHANCED!)
Automatically classify tasks and route to appropriate model tiers.
NEW: Communication pattern enforcement — Never waste Opus tokens on "hi" or "thanks"!
Usage:
CODEBLOCK12
Examples:
CODEBLOCK13
Patterns enforced to Haiku (NEVER Sonnet/Opus):
Communication:
- Greetings: hi, hey, hello, yo
- Thanks: thanks, thank you, thx
- Acknowledgments: ok, sure, got it, understood
- Short responses: yes, no, yep, nope
- Single words or very short phrases
Background tasks:
- Heartbeat checks: "check email", "monitor servers"
- Cronjobs: "scheduled task", "periodic check", "reminder"
- Document parsing: "parse CSV", "extract data from log", "read JSON"
- Log scanning: "scan error logs", "process logs"
Integration pattern:
CODEBLOCK14
Customization:
Edit ROUTING_RULES or COMMUNICATION_PATTERNS in scripts/model_router.py to adjust patterns and keywords.
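A rough sketch of how pattern enforcement of this kind might work is shown below. It is not the contents of model_router.py: the names `COMMUNICATION`, `BACKGROUND_KEYWORDS`, and `route`, and the choice of Sonnet as the fallback tier, are assumptions; only the phrases themselves come from the lists above.

```python
import re

# Chat phrases from the list above, optionally followed by punctuation.
COMMUNICATION = re.compile(
    r"^(hi|hey|hello|yo|thanks|thank you|thx|ok|sure|got it|understood"
    r"|yes|no|yep|nope)[\s!.,]*$"
)
# Background-task phrases from the list above.
BACKGROUND_KEYWORDS = (
    "check email", "monitor servers", "scheduled task", "periodic check",
    "reminder", "parse csv", "extract data from log", "read json",
    "scan error logs", "process logs",
)

def route(prompt, default_tier="sonnet"):
    """Return "haiku" for chat or background prompts, else the default tier."""
    text = prompt.strip().lower()
    if COMMUNICATION.match(text):
        return "haiku"
    if any(keyword in text for keyword in BACKGROUND_KEYWORDS):
        return "haiku"
    return default_tier

for prompt in ("thanks!", "scan error logs from last night",
               "design a microservices architecture"):
    print(prompt, "->", route(prompt))
```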
3. Heartbeat Optimization
Reduce API calls from heartbeat polling with smart interval tracking:
Setup:
CODEBLOCK15
Commands:
CODEBLOCK16
How it works:
- Tracks the last check time for each type (email, calendar, weather, etc.)
- Enforces minimum intervals before re-checking
- Respects quiet hours (23:00-08:00) and skips all checks during them
- Returns HEARTBEAT_OK when nothing needs attention (saves tokens)
Default intervals:
- Email: 60 minutes
- Calendar: 2 hours
- Weather: 4 hours
- Social: 2 hours
- Monitoring: 30 minutes
Integration in HEARTBEAT.md:
CODEBLOCK17
Expected savings: 50% reduction in heartbeat API calls.
Model enforcement: Heartbeat should ALWAYS use Haiku — see updated HEARTBEAT.template.md for model override instructions.
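The interval and quiet-hours logic described in this section could look roughly like the following. This is a sketch, not the code in heartbeat_optimizer.py: the function names and the in-memory `last_checked` mapping are assumptions, while the intervals and the quiet-hours window come from the lists above.

```python
from datetime import datetime, timedelta

# Default intervals from the list above, in minutes.
DEFAULT_INTERVALS = {"email": 60, "calendar": 120, "weather": 240,
                     "social": 120, "monitoring": 30}

def in_quiet_hours(now, start_hour=23, end_hour=8):
    """True between 23:00 and 08:00, a window that wraps past midnight."""
    return now.hour >= start_hour or now.hour < end_hour

def should_check(check_type, last_checked, now, intervals=DEFAULT_INTERVALS):
    """Decide whether a check is due; last_checked maps type -> datetime."""
    if in_quiet_hours(now):
        return False
    last = last_checked.get(check_type)
    if last is None:  # never checked before
        return True
    return now - last >= timedelta(minutes=intervals[check_type])

now = datetime(2026, 1, 5, 10, 0)
state = {"email": datetime(2026, 1, 5, 9, 30)}
print(should_check("email", state, now))    # checked 30 minutes ago
print(should_check("weather", state, now))  # never checked
```

Passing `now` in explicitly, rather than calling `datetime.now()` inside the function, keeps the logic easy to test.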
4. Cronjob Optimization (NEW!)
Problem: Cronjobs often default to expensive models (Sonnet/Opus) even for routine tasks.
Solution: Explicitly specify Haiku for routine scheduled tasks, which covers roughly 90% of cronjobs.
See: assets/cronjob-model-guide.md for comprehensive guide with examples.
Quick reference:
| Task Type | Model | Example |
|---|---|---|
| Monitoring/alerts | Haiku | Check server health, disk space |
| Data parsing | Haiku | Extract CSV/JSON/logs |
| Reminders | Haiku | Daily standup, backup reminders |
| Simple reports | Haiku | Status summaries |
| Content generation | Sonnet | Blog summaries (quality matters) |
| Deep analysis | Sonnet | Weekly insights |
| Complex reasoning | Not Opus | Never use Opus for cronjobs |
Example (good):
CODEBLOCK18
Example (bad):
CODEBLOCK19
Savings: Using Haiku instead of Opus for 10 daily cronjobs = $17.70/month saved per agent.
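The savings figure can be checked with a short calculation. The per-MTok prices are taken from the provider table in the Multi-Provider Strategy section; the 4,000 tokens per run and the 30-day month are assumptions inferred because they reproduce the stated number, not values given by the source.

```python
HAIKU_USD_PER_MTOK = 0.25   # from the provider table
OPUS_USD_PER_MTOK = 15.00   # from the provider table

def monthly_savings(jobs_per_day, tokens_per_run, days=30):
    """Monthly cost difference between running every job on Opus vs Haiku."""
    mtok = jobs_per_day * tokens_per_run * days / 1_000_000
    return mtok * (OPUS_USD_PER_MTOK - HAIKU_USD_PER_MTOK)

# Assumption: 4,000 tokens per run, one run per job per day.
print(round(monthly_savings(10, 4_000), 2))
```

Jobs with larger prompts scale the saving linearly, so it is worth plugging in measured token counts.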
Integration with model_router:
CODEBLOCK20
5. Token Budget Tracking
Monitor usage and alert when approaching limits:
Setup:
CODEBLOCK21
Output format:
CODEBLOCK22
Status levels:
- ok: Below 80% of daily limit
- INLINECODE12: 80-99% of daily limit
- INLINECODE13: Over daily limit
Integration pattern:
Before starting expensive operations, check budget:
CODEBLOCK23
Customization:
Edit daily_limit_usd and warn_threshold parameters in function calls.
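As an illustration of how the two parameters interact, here is a minimal status check. The function name `budget_status`, the default limit of $5.00, and the status labels "warning" and "exceeded" are hypothetical; only `daily_limit_usd`, `warn_threshold`, the `ok` status, and the 80% boundary come from the text above.

```python
def budget_status(spent_usd, daily_limit_usd=5.0, warn_threshold=0.8):
    """Map today's spend to a status level."""
    ratio = spent_usd / daily_limit_usd
    if ratio >= 1.0:            # at or over the daily limit
        return "exceeded"       # hypothetical label
    if ratio >= warn_threshold: # 80-99% of the limit by default
        return "warning"        # hypothetical label
    return "ok"

for spent in (1.00, 4.00, 5.50):
    print(f"${spent:.2f} -> {budget_status(spent)}")
```

Treating spend exactly equal to the limit as exceeded is a design choice made here; the source only says "over daily limit".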
6. Multi-Provider Strategy
See references/PROVIDERS.md for comprehensive guide on:
- Alternative providers (OpenRouter, Together.ai, Google AI Studio)
- Cost comparison tables
- Routing strategies by task complexity
- Fallback chains for rate-limited scenarios
- API key management
Quick reference:
| Provider | Model | Cost/MTok | Use Case |
|---|---|---|---|
| Anthropic | Haiku 4 | $0.25 | Simple tasks |
| Anthropic | Sonnet 4.5 | $3.00 | Balanced default |
| Anthropic | Opus 4 | $15.00 | Complex reasoning |
| OpenRouter | Gemini 2.5 Flash | $0.075 | Bulk operations |
| Google AI | Gemini 2.0 Flash Exp | FREE | Dev/testing |
| Together | Llama 3.3 70B | $0.18 | Open alternative |
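A fallback chain for rate-limited scenarios might be structured as in the sketch below. Everything in it is hypothetical: `RateLimited`, `call_with_fallback`, and the stub providers stand in for real client calls, which are not shown in this document.

```python
class RateLimited(Exception):
    """Raised by a provider callable when it is rate limited (hypothetical)."""

def call_with_fallback(prompt, providers):
    """Try each (name, callable) in order; return (name, reply) from the first success."""
    errors = []
    for name, call in providers:
        try:
            return name, call(prompt)
        except RateLimited as exc:
            errors.append(f"{name}: {exc}")
    raise RuntimeError("all providers rate limited: " + "; ".join(errors))

# Stub providers standing in for real API clients.
def primary(prompt):
    raise RateLimited("429 Too Many Requests")

def backup(prompt):
    return "reply to " + prompt

print(call_with_fallback("hi", [("primary", primary), ("backup", backup)]))
```

Ordering the list from cheapest to most expensive provider keeps the fallback from silently raising costs.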
Configuration Patches
See assets/config-patches.json for advanced optimizations:
Implemented by this skill:
- ✅ Heartbeat optimization (fully functional)
- ✅ Token budget tracking (fully functional)
- ✅ Model routing logic (fully functional)
Native OpenClaw 2026.2.15 — apply directly:
- ✅ Session pruning (contextPruning: cache-ttl): auto-trims old tool results after the Anthropic cache TTL expires
- ✅ Bootstrap size limits (bootstrapMaxChars / bootstrapTotalMaxChars): caps workspace file injection size
- ✅ Long cache retention (cacheRetention: "long" for Opus): amortizes cache write costs
Requires OpenClaw core support:
- ⏳ Prompt caching (Anthropic API feature; verify current status)
- ⏳ Lazy context loading (use the context_optimizer.py script today)
- ⏳ Multi-provider fallback (partially supported)
Apply config patches:
CODEBLOCK24
Native OpenClaw Diagnostics (2026.2.15+)
OpenClaw 2026.2.15 added built-in commands that complement this skill's Python scripts. Use these first for quick diagnostics before reaching for the scripts.
Context breakdown
/context list → token count per injected file (shows exactly what's eating your prompt)
/context detail → full breakdown including tools, skills, and system prompt sections
Use before applying bootstrap_size_limits: see which files are oversized, then set bootstrapMaxChars accordingly.
Per-response usage tracking
/usage tokens → append token count to every reply
/usage full → append tokens + cost estimate to every reply
/usage cost → show cumulative cost summary from session logs
/usage off → disable usage footer
Combine with token_tracker.py: /usage cost gives session totals; token_tracker.py tracks the daily budget.
Session status
/status → model, context %, last response tokens, estimated cost
Cache TTL Heartbeat Alignment (NEW in v1.4.0)
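The problem: Anthropic bills cache writes at a premium over base input (1.25x for the 5-minute TTL, 2x for the 1-hour TTL), while cache reads cost about 0.1x base input. If your agent goes idle and the 1h cache TTL expires, the next request re-writes the entire prompt cache, which is expensive.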
The fix: Set heartbeat interval to 55min (just under the 1h TTL). The heartbeat keeps the cache warm, so every subsequent request pays cache-read rates instead.
CODEBLOCK28
Apply to your OpenClaw config:
CODEBLOCK29
Who benefits: Anthropic API key users only. OAuth profiles already default to 1h heartbeat (OpenClaw smart default). API key profiles default to 30min — bumping to 55min is both cheaper (fewer calls) and cache-warm.
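The trade-off can be estimated with a small calculation. The multipliers below (2x base input for a 1-hour cache write, 0.1x for a cache read) are assumptions based on Anthropic's published prompt-caching pricing and should be checked against current rates; the 50K-token prompt and the $3/MTok base price are illustrative, and heartbeat output tokens are ignored.

```python
def cache_request_costs(prompt_tokens, base_usd_per_mtok,
                        write_mult=2.0, read_mult=0.1):
    """Cost of one request that writes the cache vs one that reads it."""
    base = prompt_tokens / 1_000_000 * base_usd_per_mtok
    return {"write": base * write_mult, "read": base * read_mult}

costs = cache_request_costs(50_000, 3.00)
# Number of keep-alive heartbeats that cost as much as one avoided re-write.
break_even = (costs["write"] - costs["read"]) / costs["read"]

print({name: round(usd, 4) for name, usd in costs.items()})
print(round(break_even))
```

Under these assumptions, keeping the cache warm pays off when the next real request arrives within roughly that many heartbeats; across much longer idle periods, letting the cache expire may be cheaper.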
Deployment Patterns
For Personal Use
1. Install optimized INLINECODE28
2. Run budget checks before expensive operations
3. Manually route complex tasks to Opus only when needed
Expected savings: 20-30%
For Managed Hosting (xCloud, etc.)
1. Default all agents to Haiku
2. Route user interactions to Sonnet
3. Reserve Opus for explicitly complex requests
4. Use Gemini Flash for background operations
5. Implement daily budget caps per customer
Expected savings: 40-60%
For High-Volume Deployments
1. Use multi-provider fallback (OpenRouter + Together.ai)
2. Implement aggressive routing (80% Gemini, 15% Haiku, 5% Sonnet)
3. Deploy local Ollama for offline/cheap operations
4. Batch heartbeat checks (every 2-4 hours, not 30 min)
Expected savings: 70-90%
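To see where a figure in this range could come from, the blended price of the 80/15/5 routing mix can be computed from the per-MTok prices in the provider table. This sketch assumes every token is billed at the single listed rate (no input/output split) and compares against an all-Sonnet baseline, which is itself an assumption.

```python
# Prices in $/MTok from the provider table; shares from the routing mix above.
PRICES = {"gemini_flash": 0.075, "haiku": 0.25, "sonnet": 3.00}
MIX = {"gemini_flash": 0.80, "haiku": 0.15, "sonnet": 0.05}

def blended_price(prices, mix):
    """Average $/MTok when traffic is split across models by share."""
    return sum(prices[model] * share for model, share in mix.items())

blended = blended_price(PRICES, MIX)
reduction = 1 - blended / PRICES["sonnet"]
print(f"${blended:.4f}/MTok, {reduction:.0%} below all-Sonnet")
```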
Integration Examples
Workflow: Smart Task Handling
CODEBLOCK30
Workflow: Optimized Heartbeat
CODEBLOCK31
Troubleshooting
Issue: Scripts fail with "module not found"
- Fix: Ensure Python 3.7+ is installed. Scripts use only stdlib.
Issue: State files not persisting
- Fix: Check that the ~/.openclaw/workspace/memory/ directory exists and is writable.
Issue: Budget tracking shows $0.00
- Fix: token_tracker.py needs integration with OpenClaw's session_status tool. It currently tracks only manually recorded usage.
Issue: Routing suggests wrong model tier
- Fix: Customize ROUTING_RULES in model_router.py for your specific patterns.
Maintenance
Daily:
- Check budget status: INLINECODE34
Weekly:
- Review routing accuracy (are suggestions correct?)
- Adjust heartbeat intervals based on activity
Monthly:
- Compare costs before/after optimization
- Review and update PROVIDERS.md with new options
Cost Estimation
Example: 100K tokens/day workload
Without skill:
- 50K context tokens + 50K conversation tokens = 100K total
- All Sonnet: 100K × $3/MTok = $0.30/day = $9/month
| Strategy | Context | Model | Daily Cost | Monthly | Savings |
|---|---|---|---|---|---|
| Baseline (no optimization) | 50K | Sonnet | $0.30 | $9.00 | 0% |
| Context opt only | 10K (-80%) | Sonnet | $0.18 | $5.40 | 40% |
| Model routing only | 50K | Mixed | $0.18 | $5.40 | 40% |
| Both (this skill) | 10K | Mixed | $0.09 | $2.70 | 70% |
| Aggressive + Gemini | 10K | Gemini | $0.03 | $0.90 | 90% |
Key insight: In this example, context optimization alone (50K → 10K tokens) saves as much as model routing alone (40% each), and combining the two reaches 70%.
xCloud hosting scenario (100 customers, 50K tokens/customer/day):
- Baseline (all Sonnet, full context): $450/month
- With token-optimizer: $135/month
- Savings: $315/month per 100 customers (70%)
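The figures in this section follow from simple arithmetic, reproduced here as a check. The sketch assumes a flat $3/MTok Sonnet rate applied to all tokens and a 30-day month, as the worked baseline above does; the helper names are illustrative.

```python
SONNET_USD_PER_MTOK = 3.00

def monthly_cost(tokens_per_day, usd_per_mtok, days=30):
    """Monthly cost for a steady daily token volume at a flat rate."""
    return tokens_per_day / 1_000_000 * usd_per_mtok * days

def savings_percent(baseline, optimized):
    """Reduction relative to the baseline, as a whole percent."""
    return round(100 * (baseline - optimized) / baseline)

baseline = monthly_cost(100_000, SONNET_USD_PER_MTOK)     # 50K context + 50K conversation
context_opt = monthly_cost(60_000, SONNET_USD_PER_MTOK)   # 10K context + 50K conversation
xcloud = monthly_cost(100 * 50_000, SONNET_USD_PER_MTOK)  # 100 customers

print(round(baseline, 2), round(context_opt, 2),
      savings_percent(baseline, context_opt), xcloud)
```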
Resources
Scripts (4 total)
- context_optimizer.py: Context loading optimization and lazy loading (NEW!)
- model_router.py: Task classification, model suggestions, and communication enforcement (ENHANCED!)
- heartbeat_optimizer.py: Interval management and check scheduling
- token_tracker.py: Budget monitoring and alerts
References
- PROVIDERS.md: Alternative AI providers, pricing, and routing strategies
Assets (3 total)
- HEARTBEAT.template.md: Drop-in optimized heartbeat template with Haiku enforcement (ENHANCED!)
- cronjob-model-guide.md: Complete guide for choosing models in cronjobs (NEW!)
- config-patches.json: Advanced configuration examples
Future Enhancements
Ideas for extending this skill:
1. Auto-routing integration: Hook into the OpenClaw message pipeline
2. Real-time usage tracking: Parse session_status automatically
3. Cost forecasting: Predict monthly spend based on recent usage
4. Provider health monitoring: Track API latency and failures
5. A/B testing: Compare quality across different routing strategies
Token Optimizer
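A comprehensive toolkit for reducing token usage and API costs in OpenClaw deployments. It combines smart model routing, optimized heartbeat intervals, usage tracking, and multi-provider strategies.
Quick Start
Immediate actions (no config changes needed):
1. Generate an optimized AGENTS.md (biggest win!):
bash
python3 scripts/context_optimizer.py generate-agents
# Creates AGENTS.md.optimized; review it and replace your current AGENTS.md
2. Check what context you actually need:
bash
python3 scripts/context_optimizer.py recommend 'hi, how are you?'
# Shows: only 2 files needed (not 50+!)
3. Install the optimized heartbeat:
bash
cp assets/HEARTBEAT.template.md ~/.openclaw/workspace/HEARTBEAT.md
4. Enforce cheaper models for casual chat:
bash
python3 scripts/model_router.py 'thanks!'
# Single-provider Anthropic setup: use Sonnet, not Opus
# Multi-provider setup (OpenRouter/Together): use Haiku for maximum savings
5. Check the current token budget:
bash
python3 scripts/token_tracker.py check
Expected savings: 50-80% reduction in token costs for typical workloads (context optimization is the biggest factor!).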
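Core Capabilities
0. Lazy Skill Loading (new in v3.0, biggest win!)
The single highest-impact optimization available. Most agents burn 3,000–15,000 tokens per session loading skill files they never use. Stop that first.
The pattern:
1. Create a lightweight SKILLS.md catalog in your workspace (~300 tokens: a list of skills and when to load each)
2. Only load individual SKILL.md files when a task actually needs them
3. Apply the same logic to memory files: load MEMORY.md at startup, daily logs only on demand
Token savings:
| Library size | Before (eager) | After (lazy) | Savings |
|---|---|---|---|
| 5 skills | ~3,000 tokens | ~600 tokens | 80% |
| 10 skills | ~6,500 tokens | ~750 tokens | 88% |
| 20 skills | ~13,000 tokens | ~900 tokens | 93% |
Quick implementation in AGENTS.md:
markdown
## Skills
At session start: read SKILLS.md (index only, ~300 tokens).
Load individual skill files only when a task needs them.
Never load all skills upfront.
Full implementation (with catalog template + optimizer script):
bash
clawhub install openclaw-skill-lazy-loader
The companion skill openclaw-skill-lazy-loader includes a SKILLS.md.template, an AGENTS.md.template lazy-loading section, and a context_optimizer.py CLI that recommends exactly which skills to load for any given task.
Lazy loading handles context loading costs. The capabilities below handle runtime costs. Together they cover the full token lifecycle.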
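1. Context Optimization (new!)
Biggest token saver: only load the files you actually need, not everything upfront.
Problem: Default OpenClaw loads ALL context files every session:
- SOUL.md, AGENTS.md, USER.md, TOOLS.md, MEMORY.md
- docs/**/*.md (hundreds of files)
- memory/2026-*.md (daily logs)
- Total: often 50K+ tokens before the user even speaks!
Solution: Lazy loading based on prompt complexity.
Usage:
bash
python3 scripts/context_optimizer.py recommend '<user prompt>'
Examples:
bash
# Simple greeting → minimal context (only 2 files!)
context_optimizer.py recommend 'hi'
# → Load: SOUL.md, IDENTITY.md
# → Skip: everything else
# → Savings: ~80% of context
# Standard work → selective loading
context_optimizer.py recommend 'write a function'
# → Load: SOUL.md, IDENTITY.md, memory/TODAY.md
# → Skip: docs, old memory, knowledge base
# → Savings: ~50% of context
# Complex task → full context
context_optimizer.py recommend 'analyze our entire architecture'
# → Load: SOUL.md, IDENTITY.md, MEMORY.md, memory/TODAY+YESTERDAY.md
# → Conditional: relevant docs only
# → Savings: ~30% of context
Output format:
json
{
  "complexity": "simple",
  "context_level": "minimal",
  "recommended_files": ["SOUL.md", "IDENTITY.md"],
  "file_count": 2,
  "savings_percent": 80,
  "skip_patterns": ["docs/**/*.md", "memory/20*.md"]
}
Integration pattern:
Before loading context for a new session:
python
from context_optimizer import recommend_context_bundle

user_prompt = "thanks for your help"
recommendation = recommend_context_bundle(user_prompt)
if recommendation["context_level"] == "minimal":
    # Load only SOUL.md + IDENTITY.md
    # Skip everything else
    # Saves ~80% of tokens!
    pass
Generate optimized AGENTS.md:
bash
context_optimizer.py generate-agents
# Creates AGENTS.md.optimized with lazy-loading instructions
# Review it and replace your current AGENTS.md
Expected savings: 50-80% reduction in context tokens.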
2. Smart Model Routing (enhanced!)
Automatically classify tasks and route them to appropriate model tiers.
New: communication pattern enforcement. Never waste Opus tokens on "hi" or "thanks"!
Usage:
bash
python3 scripts/model_router.py '<user prompt>' [current model] [forced tier]
Examples:
bash
# Communication (new!) → always Haiku
python3 scripts/model_router.py 'thanks!'
python3 scripts/model_router.py 'hi'
python3 scripts/model_router.py 'ok got it'
# → Enforced: Haiku (never Sonnet/Opus for casual chat)
# Simple task → suggests Haiku
python3 scripts/model_router.py 'read the log file'
# Medium task → suggests Sonnet
python3 scripts/model_router.py 'write a function to parse JSON'
# Complex task → suggests Opus
python3 scripts/model_router.py 'design a microservices architecture'
Patterns enforced to Haiku (never Sonnet/Opus):
Communication:
- Greetings: hi, hey, hello, yo
- Thanks: thanks, thank you, thx
- Acknowledgments: ok, sure, got it, understood
- Short responses: yes, no, yep, nope
- Single words or very short phrases
Background tasks:
- Heartbeat checks: check email, monitor servers
- Cronjobs: scheduled task, periodic check, reminder
- Document parsing: parse CSV, extract data from log, read JSON
- Log scanning: scan error logs, process logs
Integration pattern:
python
from model_router import route_task

user_prompt = "show me the config"
routing = route_task(user_prompt)
if routing["should_switch"]:
    # Use routing["recommended_model"]
    # Saves routing["cost_savings_percent"]
    pass
Customization:
Edit ROUTING_RULES or COMMUNICATION_PATTERNS in scripts/model_router.py to adjust patterns and keywords.
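3. Heartbeat Optimization
Reduce API calls from heartbeat polling with smart interval tracking:
Setup:
bash
# Copy the template to the workspace
cp assets/HEARTBEAT.template.md ~/.openclaw/workspace/HEARTBEAT.md
# Plan which checks should run
python3 scripts/heartbeat_optimizer.py plan
Commands:
bash
# Check whether a specific type should run now
heartbeat_optimizer.py check email
heartbeat_optimizer.py check calendar
# Record that a check was performed
heartbeat_optimizer.py record email
# Update a check interval (seconds)
heartbeat_optimizer.py interval email 7200  # 2 hours
# Reset state
heartbeat_optimizer.py reset
How it works:
- Tracks the last check time for each type (email, calendar, weather, etc.)
- Enforces minimum intervals before re-checking
- Respects quiet hours (23:00-08:00) and skips all checks during them
- Returns HEARTBEAT_OK when nothing needs attention (saves tokens)
Default intervals:
- Email: 60 minutes
- Calendar: 2 hours
- Weather: 4 hours
- Social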