Token Optimizer

Comprehensive toolkit for reducing token usage and API costs in OpenClaw deployments. Combines smart model routing, optimized heartbeat intervals, usage tracking, and multi-provider strategies.

Quick Start

Immediate actions (no config changes needed):

1. Generate optimized AGENTS.md (BIGGEST WIN!):

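    python3 scripts/context_optimizer.py generate-agents
    # Creates AGENTS.md.optimized; review it, then replace your current AGENTS.md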

2. Check what context you ACTUALLY need:

    python3 scripts/context_optimizer.py recommend "hi, how are you?"
    # Shows: only 2 files needed (not 50+!)

3. Install optimized heartbeat:

    cp assets/HEARTBEAT.template.md ~/.openclaw/workspace/HEARTBEAT.md

4. Enforce cheaper models for casual chat:

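    python3 scripts/model_router.py "thanks!"
    # Single-provider Anthropic setup: uses Sonnet, not Opus
    # Multi-provider setup (OpenRouter/Together): uses Haiku for maximum savings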

5. Check current token budget:

    python3 scripts/token_tracker.py check

Expected savings: 50-80% reduction in token costs for typical workloads (context optimization is the biggest factor!).

Core Capabilities

0. Lazy Skill Loading (NEW in v3.0 — BIGGEST WIN!)

The single highest-impact optimization available. Most agents burn 3,000–15,000 tokens per session loading skill files they never use. Stop that first.

The pattern:

1. Create a lightweight SKILLS.md catalog in your workspace (~300 tokens: a list of skills and when to load each).
2. Only load individual SKILL.md files when a task actually needs them.
3. Apply the same logic to memory files: load MEMORY.md at startup and daily logs only on demand.
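
The pattern above can be sketched in Python roughly as follows. This is an illustration only: the `SkillCatalog` class, the keyword-trigger mapping, and the `<root>/<skill>/SKILL.md` directory layout are assumptions made for this example, not part of OpenClaw or of the companion skill.

```python
from pathlib import Path


class SkillCatalog:
    """Hypothetical lazy loader: keep a small index, read SKILL.md files on demand."""

    def __init__(self, root, triggers):
        # triggers maps a skill name to keywords that suggest the skill is needed
        # (an assumed catalog format for this sketch).
        self.root = Path(root)
        self.triggers = triggers
        self._loaded = {}  # cache so each skill file is read at most once

    def skills_for(self, task):
        """Return the names of skills whose keywords appear in the task text."""
        text = task.lower()
        return [name for name, words in self.triggers.items()
                if any(word in text for word in words)]

    def load(self, name):
        """Read one skill file the first time it is requested."""
        if name not in self._loaded:
            path = self.root / name / "SKILL.md"
            self._loaded[name] = path.read_text(encoding="utf-8")
        return self._loaded[name]

    def context_for(self, task):
        """Load only the skills this task needs; everything else stays on disk."""
        return {name: self.load(name) for name in self.skills_for(task)}
```

A greeting such as "hi" matches no keywords, so `context_for` returns an empty dict and no skill file is read at all.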

Token savings:

Library size	Before (eager)	After (lazy)	Savings
5 skills	~3,000 tokens	~600 tokens	80%
10 skills	~6,500 tokens	~750 tokens	88%
20 skills	~13,000 tokens	~900 tokens	93%

Quick implementation in AGENTS.md:

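    ## Skills

    At session start: read SKILLS.md (index only, ~300 tokens).
    Load individual skill files only when a task needs them.
    Never load all skills upfront.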

Full implementation (with catalog template + optimizer script):

    clawhub install openclaw-skill-lazy-loader

The companion skill openclaw-skill-lazy-loader includes a SKILLS.md.template, an AGENTS.md.template lazy-loading section, and a context_optimizer.py CLI that recommends exactly which skills to load for any given task.

Lazy loading handles context loading costs. The remaining capabilities below handle runtime costs. Together they cover the full token lifecycle.

1. Context Optimization (NEW!)

Biggest token saver — Only load files you actually need, not everything upfront.

Problem: Default OpenClaw loads ALL context files every session:

- SOUL.md, AGENTS.md, USER.md, TOOLS.md, MEMORY.md
- docs/**/*.md (hundreds of files)
- memory/2026-*.md (daily logs)
- Total: often 50K+ tokens before the user even speaks!

Solution: Lazy loading based on prompt complexity.

Usage:
    python3 scripts/context_optimizer.py recommend "<user prompt>"

Examples:
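    # Simple greeting → minimal context (only 2 files!)
    context_optimizer.py recommend "hi"
    # → Load: SOUL.md, IDENTITY.md
    # → Skip: everything else
    # → Savings: ~80% of context

    # Standard work → selective loading
    context_optimizer.py recommend "write a function"
    # → Load: SOUL.md, IDENTITY.md, memory/TODAY.md
    # → Skip: docs, old memory, knowledge base
    # → Savings: ~50% of context

    # Complex task → full context
    context_optimizer.py recommend "analyze our entire architecture"
    # → Load: SOUL.md, IDENTITY.md, MEMORY.md, memory/TODAY+YESTERDAY.md
    # → Conditional: relevant docs only
    # → Savings: ~30% of context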

Output format:
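    {
      "complexity": "simple",
      "context_level": "minimal",
      "recommended_files": ["SOUL.md", "IDENTITY.md"],
      "file_count": 2,
      "savings_percent": 80,
      "skip_patterns": ["docs/**/*.md", "memory/20*.md"]
    }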

Integration pattern:
Before loading context for a new session:
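    from context_optimizer import recommend_context_bundle

    user_prompt = "thanks for your help"
    recommendation = recommend_context_bundle(user_prompt)

    if recommendation["context_level"] == "minimal":
        # Load only SOUL.md + IDENTITY.md
        # Skip everything else
        # Saves ~80% of tokens!
        pass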

Generate optimized AGENTS.md:
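    context_optimizer.py generate-agents
    # Creates AGENTS.md.optimized with lazy-loading instructions
    # Review it, then replace your current AGENTS.md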

Expected savings: 50-80% reduction in context tokens.

2. Smart Model Routing (ENHANCED!)

Automatically classify tasks and route to appropriate model tiers.

NEW: Communication pattern enforcement — Never waste Opus tokens on "hi" or "thanks"!

Usage:
    python3 scripts/model_router.py "<user prompt>" [current_model] [force_tier]

Examples:
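    # Communication (NEW!) → always Haiku
    python3 scripts/model_router.py "thanks!"
    python3 scripts/model_router.py "hi"
    python3 scripts/model_router.py "ok got it"
    # → Enforced: Haiku (never Sonnet/Opus for casual chat)

    # Simple task → suggests Haiku
    python3 scripts/model_router.py "read the log file"

    # Medium task → suggests Sonnet
    python3 scripts/model_router.py "write a function to parse JSON"

    # Complex task → suggests Opus
    python3 scripts/model_router.py "design a microservices architecture"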

Patterns enforced to Haiku (NEVER Sonnet/Opus):

Communication:

- Greetings: hi, hey, hello, yo
- Thanks: thanks, thank you, thx
- Acknowledgments: ok, sure, got it, understood
- Short responses: yes, no, yep, nope
- Single words or very short phrases

Background tasks:

- Heartbeat checks: "check email", "monitor servers"
- Cronjobs: "scheduled task", "periodic check", "reminder"
- Document parsing: "parse CSV", "extract data from log", "read JSON"
- Log scanning: "scan error logs", "process logs"
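
As a rough illustration of how the enforced patterns above could be checked, here is a minimal sketch. The names `COMMUNICATION`, `BACKGROUND_KEYWORDS`, and `enforce_cheap_tier` are hypothetical and written for this example; the actual ROUTING_RULES and COMMUNICATION_PATTERNS in scripts/model_router.py may be structured differently.

```python
import re

# Hypothetical patterns modeled on the lists above; the real
# COMMUNICATION_PATTERNS in scripts/model_router.py may differ.
COMMUNICATION = re.compile(
    r"^(?:(?:hi|hey|hello|yo|thanks|thank you|thx|ok|sure|got it|understood|"
    r"yes|no|yep|nope)[\s!.,?]*)+$",
    re.IGNORECASE,
)
BACKGROUND_KEYWORDS = (
    "check email", "monitor servers", "scheduled task", "periodic check",
    "reminder", "parse csv", "extract data from log", "read json",
    "scan error logs", "process logs",
)


def enforce_cheap_tier(prompt):
    """Return "haiku" when the prompt matches an enforced pattern, else None."""
    text = prompt.strip()
    if COMMUNICATION.match(text):
        return "haiku"  # greetings, thanks, acknowledgments, short replies
    if len(text.split()) <= 1:
        return "haiku"  # single-word (or empty) prompts
    lowered = text.lower()
    if any(keyword in lowered for keyword in BACKGROUND_KEYWORDS):
        return "haiku"  # heartbeat, cron, parsing, and log-scanning tasks
    return None  # no enforcement; normal tier classification decides
```

Returning `None` rather than a default tier keeps enforcement separate from classification: only prompts that match a listed pattern are pinned to the cheap model.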

Integration pattern:
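    from model_router import route_task

    user_prompt = "show me the config"
    routing = route_task(user_prompt)

    if routing["should_switch"]:
        # Use routing["recommended_model"]
        # Saves routing["cost_savings_percent"]
        pass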

Customization:
Edit ROUTING_RULES or COMMUNICATION_PATTERNS in scripts/model_router.py to adjust patterns and keywords.

3. Heartbeat Optimization

Reduce API calls from heartbeat polling with smart interval tracking:

Setup:
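    # Copy the template to your workspace
    cp assets/HEARTBEAT.template.md ~/.openclaw/workspace/HEARTBEAT.md

    # Plan which checks should run
    python3 scripts/heartbeat_optimizer.py plan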

Commands:
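    # Check whether a specific type should run now
    heartbeat_optimizer.py check email
    heartbeat_optimizer.py check calendar

    # Record that a check was performed
    heartbeat_optimizer.py record email

    # Update a check interval (seconds)
    heartbeat_optimizer.py interval email 7200  # 2 hours

    # Reset state
    heartbeat_optimizer.py reset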

How it works:

- Tracks the last check time for each type (email, calendar, weather, etc.)
- Enforces minimum intervals before re-checking
- Respects quiet hours (23:00-08:00) and skips all checks during them
- Returns HEARTBEAT_OK when nothing needs attention (saves tokens)

Default intervals:

- Email: 60 minutes
- Calendar: 2 hours
- Weather: 4 hours
- Social: 2 hours
- Monitoring: 30 minutes
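
The interval and quiet-hours logic described above can be sketched like this. It is a simplified stand-in, not the code in heartbeat_optimizer.py: the function names and the in-memory `last_run` dict are assumptions for the example (the real script persists its state to disk).

```python
from datetime import datetime, timedelta

# Default intervals from the list above, in minutes.
DEFAULT_INTERVALS = {
    "email": 60, "calendar": 120, "weather": 240, "social": 120, "monitoring": 30,
}
QUIET_START, QUIET_END = 23, 8  # quiet hours 23:00-08:00


def in_quiet_hours(now):
    """True between 23:00 and 08:00 (the window wraps past midnight)."""
    return now.hour >= QUIET_START or now.hour < QUIET_END


def due_checks(last_run, now):
    """Return the check types whose minimum interval has elapsed."""
    if in_quiet_hours(now):
        return []  # skip everything during quiet hours
    due = []
    for kind, minutes in DEFAULT_INTERVALS.items():
        previous = last_run.get(kind)  # None means the check never ran
        if previous is None or now - previous >= timedelta(minutes=minutes):
            due.append(kind)
    return due


def heartbeat(last_run, now):
    """Return "HEARTBEAT_OK" when nothing is due, else the list of due checks."""
    due = due_checks(last_run, now)
    return due if due else "HEARTBEAT_OK"
```

Passing `now` in explicitly, instead of calling `datetime.now()` inside the functions, makes the quiet-hours and interval rules easy to test.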

Integration in HEARTBEAT.md:
CODEBLOCK17

Expected savings: 50% reduction in heartbeat API calls.

Model enforcement: Heartbeat should ALWAYS use Haiku — see updated HEARTBEAT.template.md for model override instructions.

4. Cronjob Optimization (NEW!)

Problem: Cronjobs often default to expensive models (Sonnet/Opus) even for routine tasks.

Solution: Explicitly specify Haiku for the roughly 90% of scheduled tasks that are routine.

See assets/cronjob-model-guide.md for a comprehensive guide with examples.

Quick reference:

Task Type	Model	Example
Monitoring/alerts	Haiku	Check server health, disk space
Data parsing

Example (good):
CODEBLOCK18

Example (bad):
CODEBLOCK19

Savings: Using Haiku instead of Opus for 10 daily cronjobs = $17.70/month saved per agent.

Integration with model_router:
CODEBLOCK20

5. Token Budget Tracking

Monitor usage and alert when approaching limits:

Setup:
CODEBLOCK21

Output format:
CODEBLOCK22

Status levels:

- ok: Below 80% of daily limit
- INLINECODE12: 80-99% of daily limit
- INLINECODE13: Over daily limit
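
A minimal sketch of the threshold logic behind these status levels is shown below. Only the `ok` label and the `daily_limit_usd` and `warn_threshold` parameter names come from this document; the labels `"warning"` and `"exceeded"`, the 5.0 default limit, and the handling of exactly 100% are placeholders assumed for the example, so check token_tracker.py for the actual values.

```python
def budget_status(spent_usd, daily_limit_usd=5.0, warn_threshold=0.8):
    """Classify today's spend against a daily limit.

    The labels "warning" and "exceeded" and the 5.0 default are assumptions
    for this sketch; only "ok" is named in the text above.
    """
    if daily_limit_usd <= 0:
        raise ValueError("daily_limit_usd must be positive")
    ratio = spent_usd / daily_limit_usd
    if ratio >= 1.0:
        return "exceeded"  # at or over the limit (boundary handling is assumed)
    if ratio >= warn_threshold:
        return "warning"   # 80-99% of the limit with the default threshold
    return "ok"            # below the warning threshold
```

For example, `budget_status(4.5)` falls in the 80-99% band with these defaults, so an expensive operation could be deferred or routed to a cheaper model.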

Integration pattern:
Before starting expensive operations, check budget:
CODEBLOCK23

Customization:
Edit daily_limit_usd and warn_threshold parameters in function calls.

6. Multi-Provider Strategy

See references/PROVIDERS.md for a comprehensive guide on:

- Alternative providers (OpenRouter, Together.ai, Google AI Studio)
- Cost comparison tables
- Routing strategies by task complexity
- Fallback chains for rate-limited scenarios
- API key management

Quick reference:

Provider	Model	Cost/MTok	Use Case
Anthropic	Haiku 4	$0.25	Simple tasks
Anthropic

Configuration Patches

See assets/config-patches.json for advanced optimizations:

Implemented by this skill:

- ✅ Heartbeat optimization (fully functional)
- ✅ Token budget tracking (fully functional)
- ✅ Model routing logic (fully functional)

Native OpenClaw 2026.2.15 — apply directly:

- ✅ Session pruning (contextPruning: cache-ttl): auto-trims old tool results after the Anthropic cache TTL expires
- ✅ Bootstrap size limits (bootstrapMaxChars / bootstrapTotalMaxChars): caps workspace file injection size
- ✅ Cache retention long (cacheRetention: "long" for Opus): amortizes cache write costs

Requires OpenClaw core support:

- ⏳ Prompt caching (Anthropic API feature; verify current status)
- ⏳ Lazy context loading (use the context_optimizer.py script today)
- ⏳ Multi-provider fallback (partially supported)

Apply config patches:
CODEBLOCK24

Native OpenClaw Diagnostics (2026.2.15+)

OpenClaw 2026.2.15 added built-in commands that complement this skill's Python scripts. Use these first for quick diagnostics before reaching for the scripts.

Context breakdown

/context list    → token count per injected file (shows exactly what's eating your prompt)
/context detail  → full breakdown including tools, skills, and system prompt sections

Use before applying bootstrap_size_limits — see which files are oversized, then set bootstrapMaxChars accordingly.

Per-response usage tracking

/usage tokens    → append token count to every reply
/usage full      → append tokens + cost estimate to every reply
/usage cost      → show cumulative cost summary from session logs
/usage off       → disable usage footer

Combine with token_tracker.py — /usage cost gives session totals; token_tracker.py tracks daily budget.

Session status

/status          → model, context %, last response tokens, estimated cost

Cache TTL Heartbeat Alignment (NEW in v1.4.0)

The problem: Anthropic charges ~3.75x more for cache writes than cache reads. If your agent goes idle and the 1h cache TTL expires, the next request re-writes the entire prompt cache — expensive.

The fix: Set heartbeat interval to 55min (just under the 1h TTL). The heartbeat keeps the cache warm, so every subsequent request pays cache-read rates instead.

CODEBLOCK28

Apply to your OpenClaw config:
CODEBLOCK29

Who benefits: Anthropic API key users only. OAuth profiles already default to 1h heartbeat (OpenClaw smart default). API key profiles default to 30min — bumping to 55min is both cheaper (fewer calls) and cache-warm.
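
The trade-off can be checked with a few lines of arithmetic. The sketch below assumes a fixed heartbeat interval and the 1h cache TTL mentioned above; the function names are illustrative and are not OpenClaw settings.

```python
def keeps_cache_warm(interval_min, ttl_min=60):
    """A heartbeat keeps the cache warm only if it fires before the TTL expires."""
    return interval_min < ttl_min


def heartbeats_per_day(interval_min):
    """Number of complete heartbeat intervals in 24 hours."""
    return (24 * 60) // interval_min


for interval in (30, 55, 60):
    print(interval, keeps_cache_warm(interval), heartbeats_per_day(interval))
```

With these assumptions, a 55-minute interval stays inside the TTL while making 26 calls per day instead of the 48 made at 30 minutes; a 60-minute interval is treated here as too late to keep the cache warm.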

Deployment Patterns

For Personal Use

1. Install optimized INLINECODE28
2. Run budget checks before expensive operations
3. Manually route complex tasks to Opus only when needed

Expected savings: 20-30%

For Managed Hosting (xCloud, etc.)

1. Default all agents to Haiku
2. Route user interactions to Sonnet
3. Reserve Opus for explicitly complex requests
4. Use Gemini Flash for background operations
5. Implement daily budget caps per customer

Expected savings: 40-60%

For High-Volume Deployments

1. Use multi-provider fallback (OpenRouter + Together.ai)
2. Implement aggressive routing (80% Gemini, 15% Haiku, 5% Sonnet)
3. Deploy local Ollama for offline/cheap operations
4. Batch heartbeat checks (every 2-4 hours, not 30 min)

Expected savings: 70-90%

Integration Examples

Workflow: Smart Task Handling

CODEBLOCK30

Workflow: Optimized Heartbeat

CODEBLOCK31

Troubleshooting

Issue: Scripts fail with "module not found"

- Fix: Ensure Python 3.7+ is installed. Scripts use only stdlib.

Issue: State files not persisting

- Fix: Check that ~/.openclaw/workspace/memory/ directory exists and is writable.

Issue: Budget tracking shows $0.00

- Fix: token_tracker.py is not yet integrated with OpenClaw's session_status tool, so it only tracks usage that has been recorded manually.

Issue: Routing suggests wrong model tier

- Fix: Customize ROUTING_RULES in model_router.py for your specific patterns.

Maintenance

Daily:

- Check budget status: INLINECODE34

Weekly:

- Review routing accuracy (are suggestions correct?)
- Adjust heartbeat intervals based on activity

Monthly:

- Compare costs before/after optimization
- Review and update PROVIDERS.md with new options

Cost Estimation

Example: 100K tokens/day workload

Without skill:

- 50K context tokens + 50K conversation tokens = 100K total
- All Sonnet: 100K × $3/MTok = $0.30/day = $9/month

Strategy	Context	Model	Daily Cost	Monthly	Savings
Baseline (no optimization)	50K	Sonnet	$0.30	$9.00	0%
Context opt only	10K (-80%)	Sonnet	$0.18	$5.40	40%
Model routing only	50K	Mixed	$0.18	$5.40	40%
Both (this skill)	10K	Mixed	$0.09	$2.70	70%
Aggressive + Gemini	10K	Gemini	$0.03	$0.90	90%

Key insight: In this example, context optimization alone (50K → 10K tokens) saves as much as model routing alone (40% each), and combining the two reaches 70%.
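
The flat-rate rows of the estimate can be reproduced with a short calculation. This sketch covers only the all-Sonnet cases at the $3/MTok rate used above and assumes a 30-day month; the "Mixed" and Gemini rows depend on model mixes and prices not given here, so they are left out. The helper names are made up for the example.

```python
SONNET_USD_PER_MTOK = 3.0  # rate used in the example above


def daily_cost(tokens, usd_per_mtok=SONNET_USD_PER_MTOK):
    """Cost in USD of one day's tokens at a flat per-million-token rate."""
    return tokens / 1_000_000 * usd_per_mtok


def savings_percent(baseline, optimized):
    """Whole-number percentage saved relative to the baseline cost."""
    return round((1 - optimized / baseline) * 100)


baseline = daily_cost(50_000 + 50_000)     # full context, all Sonnet
context_opt = daily_cost(10_000 + 50_000)  # context trimmed from 50K to 10K

print(f"baseline:    ${baseline:.2f}/day, ${baseline * 30:.2f}/month")
print(f"context opt: ${context_opt:.2f}/day, ${context_opt * 30:.2f}/month")
print(f"savings:     {savings_percent(baseline, context_opt)}%")
```

The same helper gives the hosting baseline: `daily_cost(100 * 50_000) * 30` is the monthly cost of 100 customers at 50K tokens each per day on Sonnet.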

xCloud hosting scenario (100 customers, 50K tokens/customer/day):

- Baseline (all Sonnet, full context): $450/month
- With token-optimizer: $135/month
- Savings: $315/month per 100 customers (70%)

Resources

Scripts (4 total)

- context_optimizer.py: Context loading optimization and lazy loading (NEW!)
- model_router.py: Task classification, model suggestions, and communication enforcement (ENHANCED!)
- heartbeat_optimizer.py: Interval management and check scheduling
- token_tracker.py: Budget monitoring and alerts

References

- PROVIDERS.md — Alternative AI providers, pricing, and routing strategies

Assets (3 total)

- HEARTBEAT.template.md: Drop-in optimized heartbeat template with Haiku enforcement (ENHANCED!)
- cronjob-model-guide.md: Complete guide for choosing models in cronjobs (NEW!)
- config-patches.json: Advanced configuration examples

Future Enhancements

Ideas for extending this skill:

1. Auto-routing integration: hook into the OpenClaw message pipeline
2. Real-time usage tracking: parse session_status automatically
3. Cost forecasting: predict monthly spend based on recent usage
4. Provider health monitoring: track API latency and failures
5. A/B testing: compare quality across different routing strategies
