Proxy Token Optimizer
Reduces LLM API costs for the openclaw-manager multi-tenant proxy platform through four strategies:
1. Model-tier routing: route prompts to the cheapest capable model
2. Heartbeat optimization: cheapest model plus longer intervals for heartbeat calls
3. Context lazy loading: load only the context files each prompt actually needs
4. Platform usage analytics: real data from PostgreSQL, not estimates
Why these strategies matter
The openclaw-manager platform proxies LLM requests for multiple OpenClaw instances through providers like zai-proxy, zai-coding-proxy, and kimi-coding-proxy. Each provider offers models at different price points (e.g., glm-4.7 vs glm-4.7-flashx). Without optimization, every request — including simple greetings and heartbeat pings — uses the default (expensive) model, and every session loads the full context regardless of need. These four strategies target the highest-impact cost drivers.
Quick start
All instance-side scripts run locally with no dependencies. Platform-side scripts need DB access.
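```bash
# Model routing: which model should handle this prompt?
python3 scripts/model_router.py 'thanks!'
# → {"tier": "cheap", "recommended_model": "zai-proxy/glm-4.7-flashx"}

# Context optimization: which files does this prompt need?
python3 scripts/context_optimizer.py recommend 'hi'
# → {"context_level": "minimal", "recommended_files": ["SOUL.md", "IDENTITY.md"]}

# Heartbeat config: generate an openclaw.json patch
python3 scripts/heartbeat_config.py patch
# → {"agents": {"defaults": {"heartbeat": {"every": "55m", "model": "zai-proxy/glm-4.7-flashx"}}}}

# Unified CLI (all commands in one place)
python3 scripts/cli.py --help
```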
Scripts reference
Instance-side (pure local, no network, no DB)
scripts/model_router.py
Routes prompts to the right model tier based on complexity analysis.
Tier logic:
- cheap → `glm-4.7-flashx`: Greetings, acknowledgments, heartbeats, cron jobs, log parsing. Roughly 5-10x cheaper than the standard tier.
- standard → `glm-4.7`: Code writing, debugging, explanations. Default for unclear prompts.
- premium → `glm-4.7` (or `k2p5` for kimi): Architecture design, deep analysis, strategy planning.
Supports Chinese and English patterns. Provider-aware — works with zai-proxy, zai-coding-proxy, and kimi-coding-proxy.
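```bash
python3 scripts/model_router.py <prompt> [provider]
python3 scripts/model_router.py compare    # show all provider models
```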
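The tier logic described above could be sketched roughly as follows. This is an illustration only, not the code in `scripts/model_router.py`: the keyword patterns, the `route` function name, and the `TIER_MODELS` table are assumptions, and only the zai-proxy models named in the tier list are mapped.

```python
import re

# Hypothetical patterns; the real script's rules are not shown in this document.
CHEAP_PATTERNS = [
    r"^(hi|hello|hey|thanks|thank you|ok|okay|got it)\b",
    r"^(你好|谢谢|好的|收到)",
    r"\b(heartbeat|cron|parse logs?)\b",
]
PREMIUM_PATTERNS = [
    r"\b(architecture|deep analysis|strategy)\b",
    r"(架构|深度分析|策略)",
]

# Tier -> model mapping taken from the tier list above (zai-proxy only).
TIER_MODELS = {
    "cheap": "zai-proxy/glm-4.7-flashx",
    "standard": "zai-proxy/glm-4.7",
    "premium": "zai-proxy/glm-4.7",
}


def route(prompt: str) -> dict:
    """Classify a prompt into a tier and return the recommended model."""
    text = prompt.strip().lower()
    if any(re.search(p, text) for p in PREMIUM_PATTERNS):
        tier = "premium"
    elif any(re.search(p, text) for p in CHEAP_PATTERNS):
        tier = "cheap"
    else:
        # Unclear prompts fall back to the standard tier.
        tier = "standard"
    return {"tier": tier, "recommended_model": TIER_MODELS[tier]}


if __name__ == "__main__":
    print(route("thanks!"))
```

Checking premium patterns first is a design choice in this sketch: a prompt that opens with a greeting but then asks for architecture work should not be downgraded to the cheap tier.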
scripts/context_optimizer.py
Analyzes prompt complexity to recommend which context files to load, reducing unnecessary token consumption.
Context levels:
| Level | When | Files loaded | Token savings |
|---|---|---|---|
| minimal | "hi", "thanks", short messages | SOUL.md + IDENTITY.md (2 files) | ~80% |
| standard | "write a function", normal work | + memory/TODAY.md + conditional files | ~50% |
| full | "design architecture", complex tasks | + MEMORY.md + all conditional files | ~30% |
Also generates an optimized AGENTS.md template with lazy-loading rules baked in:
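```bash
python3 scripts/context_optimizer.py recommend <prompt>
python3 scripts/context_optimizer.py generate-agents    # creates AGENTS.md.optimized
```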
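As a rough illustration of how the three context levels might be selected, the sketch below maps a prompt to a level and a file list. The length thresholds, the keyword list, and the `recommend_context` name are assumptions made for this example. The conditional files mentioned in the table are omitted because the document does not name them.

```python
# Files loaded at every level, per the context-level table.
MINIMAL_FILES = ["SOUL.md", "IDENTITY.md"]

# Hypothetical markers of a complex task; the real heuristics are not shown.
COMPLEX_MARKERS = ("architecture", "design", "refactor", "架构", "设计")


def recommend_context(prompt: str) -> dict:
    """Pick a context level and the files to load for a prompt."""
    text = prompt.strip().lower()
    if any(marker in text for marker in COMPLEX_MARKERS):
        level = "full"
        files = MINIMAL_FILES + ["memory/TODAY.md", "MEMORY.md"]
    elif len(text.split()) <= 3 and len(text) <= 20:
        # Very short messages such as "hi" or "thanks".
        level = "minimal"
        files = list(MINIMAL_FILES)
    else:
        level = "standard"
        files = MINIMAL_FILES + ["memory/TODAY.md"]
    return {"context_level": level, "recommended_files": files}


if __name__ == "__main__":
    print(recommend_context("hi"))
```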
scripts/heartbeat_config.py
Generates openclaw.json configuration patches for heartbeat optimization:
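- Forces the heartbeat model to `glm-4.7-flashx` (cheapest available)
- Sets the interval to 55 minutes (keeps the prompt cache warm within the 1-hour TTL and avoids cache rebuild cost)

```bash
python3 scripts/heartbeat_config.py recommend [cache TTL in minutes]
python3 scripts/heartbeat_config.py patch    # prints a JSON patch for openclaw.json
```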
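A minimal sketch of how such a patch could be built is shown below. The patch shape follows the quick-start output. The 5-minute safety margin and the `heartbeat_patch` function are assumptions chosen so that a 60-minute cache TTL yields the documented 55-minute interval; the actual script may derive the interval differently.

```python
import json


def heartbeat_patch(cache_ttl_minutes: int = 60,
                    margin_minutes: int = 5,
                    model: str = "zai-proxy/glm-4.7-flashx") -> dict:
    """Build an openclaw.json patch that fires just before the cache expires."""
    # Assumed rule: stay a fixed margin under the cache TTL, never below 1 minute.
    interval = max(1, cache_ttl_minutes - margin_minutes)
    return {
        "agents": {
            "defaults": {
                "heartbeat": {"every": f"{interval}m", "model": model}
            }
        }
    }


if __name__ == "__main__":
    print(json.dumps(heartbeat_patch(), indent=2))
```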
Platform-side (requires DB connection)
These scripts query the usage_records PostgreSQL table for real data. Run from the openclaw-manager project root with the virtualenv activated.
scripts/usage_report.py
Generates usage reports from actual database records — not estimates.
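```bash
python3 scripts/usage_report.py overview [days]           # platform-wide summary
python3 scripts/usage_report.py instance <name> [days]    # single-instance detail
```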
Overview includes: total calls/tokens, per-provider breakdown, per-model breakdown, top 10 instances by consumption, 7-day daily trend.
Instance report includes: per-model distribution, daily trend, lifetime totals.
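The per-provider breakdown in the overview is essentially a grouped aggregate over `usage_records`. The sketch below shows the idea using an in-memory SQLite database as a stand-in for PostgreSQL so that it runs without a server. The column names (`instance_name`, `provider`, `model`, `total_tokens`) and the sample rows are hypothetical; the real table schema is not given in this document.

```python
import sqlite3


def provider_breakdown(conn: sqlite3.Connection) -> list:
    """Return call counts and token totals per provider, largest first."""
    rows = conn.execute(
        "SELECT provider, COUNT(*) AS calls, SUM(total_tokens) AS tokens "
        "FROM usage_records GROUP BY provider ORDER BY tokens DESC"
    ).fetchall()
    return [{"provider": p, "calls": c, "tokens": t} for p, c, t in rows]


# Synthetic data for illustration only; not real platform usage.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE usage_records "
    "(instance_name TEXT, provider TEXT, model TEXT, total_tokens INTEGER)"
)
conn.executemany(
    "INSERT INTO usage_records VALUES (?, ?, ?, ?)",
    [
        ("alpha", "zai-proxy", "glm-4.7", 1200),
        ("alpha", "zai-proxy", "glm-4.7-flashx", 300),
        ("beta", "kimi-coding-proxy", "k2p5", 900),
    ],
)

if __name__ == "__main__":
    for row in provider_breakdown(conn):
        print(row)
```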
scripts/quota_advisor.py
Compares actual 24-hour usage against quota plan limits to find mismatches:
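- Wasteful: usage below 20% of the plan limit → suggest downgrade
- Throttled: usage above 80% of the plan limit → suggest upgrade

```bash
python3 scripts/quota_advisor.py analyze    # check all instances
python3 scripts/quota_advisor.py plans      # show available quota plans
```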
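The two thresholds above amount to a simple ratio check. The sketch below illustrates it; the `advise` function and its return labels are assumptions for this example, and the unit of the limit (tokens or calls) is left to the caller because the document does not specify it.

```python
def advise(used: int, plan_limit: int) -> str:
    """Compare 24-hour usage against a plan limit and suggest an action."""
    if plan_limit <= 0:
        raise ValueError("plan_limit must be positive")
    ratio = used / plan_limit
    if ratio < 0.20:
        return "downgrade"  # wasteful: paying for unused quota
    if ratio > 0.80:
        return "upgrade"    # throttled: close to or over the limit
    return "ok"


if __name__ == "__main__":
    print(advise(100, 1000))
```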
Unified CLI
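`scripts/cli.py` wraps all of the above into a single entry point:

```bash
python3 scripts/cli.py route <prompt>            # model routing
python3 scripts/cli.py context <prompt>          # context recommendation
python3 scripts/cli.py generate-agents           # generate AGENTS.md
python3 scripts/cli.py heartbeat                 # heartbeat config
python3 scripts/cli.py overview [days]           # platform usage (requires DB)
python3 scripts/cli.py report <name> [days]      # instance report (requires DB)
python3 scripts/cli.py advisor                   # quota advice (requires DB)
```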
Project integration points
This skill works with existing openclaw-manager infrastructure:
| Component | File | How this skill uses it |
|---|---|---|
| Provider config | `config/model.yaml` | Model names/endpoints for routing |
| Proxy routing | `config_service.py` | Where `_inject_proxy_providers()` registers models |
| Usage recording | `proxy_common/usage_recorder.py` | Source of real usage data |
| Quota plans | `config/llm_proxy.yaml` | Plan definitions for quota advisor |
| Instance model | `app/models.py` | Instance metadata for reports |
Expected savings
| Strategy | Mechanism | Impact |
|---|---|---|
| Context lazy loading | Fewer tokens per request | 50-80% context reduction |
| Model routing (flashx) | Lower per-token price | 5-10x cheaper on simple tasks |
| Heartbeat → flashx | Lower heartbeat cost | Significant per-instance savings |
| Heartbeat interval 55min | Fewer API calls | ~45% fewer heartbeat calls |
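The ~45% figure for heartbeat calls can be reproduced with a short calculation. The baseline interval is not stated in this document; the sketch assumes a 30-minute default purely because that value yields the quoted reduction, so treat it as an assumption rather than a documented setting.

```python
def heartbeat_call_reduction(baseline_minutes: float, new_minutes: float) -> float:
    """Fraction of heartbeat calls saved per day when the interval is lengthened."""
    minutes_per_day = 24 * 60
    baseline_calls = minutes_per_day / baseline_minutes
    new_calls = minutes_per_day / new_minutes
    return 1 - new_calls / baseline_calls


if __name__ == "__main__":
    # Assumed 30-minute baseline versus the documented 55-minute interval.
    saved = heartbeat_call_reduction(30, 55)
    print(f"{saved:.1%} fewer heartbeat calls")
```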
Proxy Token Optimizer
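Reduces LLM API costs for the openclaw-manager multi-tenant proxy platform through four strategies:
1. Model-tier routing: route prompts to the cheapest capable model
2. Heartbeat optimization: cheapest model plus longer intervals for heartbeat calls
3. Context lazy loading: load only the context files each prompt actually needs
4. Platform usage analytics: real data from PostgreSQL, not estimates
Why these strategies matter
The openclaw-manager platform proxies LLM requests for multiple OpenClaw instances through providers such as zai-proxy, zai-coding-proxy, and kimi-coding-proxy. Each provider offers models at different price points (for example, glm-4.7 vs glm-4.7-flashx). Without optimization, every request, including simple greetings and heartbeat pings, uses the default (expensive) model, and every session loads the full context regardless of need. These four strategies target the highest-impact cost drivers.
Quick start
All instance-side scripts run locally with no dependencies. Platform-side scripts need database access.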
```bash
# Model routing: which model should handle this prompt?
python3 scripts/model_router.py '谢谢!'
# → {"tier": "cheap", "recommended_model": "zai-proxy/glm-4.7-flashx"}

# Context optimization: which files does this prompt need?
python3 scripts/context_optimizer.py recommend '你好'
# → {"context_level": "minimal", "recommended_files": ["SOUL.md", "IDENTITY.md"]}

# Heartbeat config: generate an openclaw.json patch
python3 scripts/heartbeat_config.py patch
# → {"agents": {"defaults": {"heartbeat": {"every": "55m", "model": "zai-proxy/glm-4.7-flashx"}}}}

# Unified CLI (all commands in one place)
python3 scripts/cli.py --help
```
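Scripts reference
Instance-side (pure local, no network, no DB)
scripts/model_router.py
Routes prompts to the right model tier based on complexity analysis.
Tier logic:
- cheap → `glm-4.7-flashx`: Greetings, acknowledgments, heartbeats, cron jobs, log parsing. Roughly 5-10x cheaper than the standard tier.
- standard → `glm-4.7`: Code writing, debugging, explanations. Default for unclear prompts.
- premium → `glm-4.7` (or `k2p5` for kimi): Architecture design, deep analysis, strategy planning.
Supports Chinese and English patterns. Provider-aware: works with zai-proxy, zai-coding-proxy, and kimi-coding-proxy.

```bash
python3 scripts/model_router.py <prompt> [provider]
python3 scripts/model_router.py compare    # show all provider models
```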
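scripts/context_optimizer.py
Analyzes prompt complexity to recommend which context files to load, reducing unnecessary token consumption.
Context levels:
| Level | When | Files loaded | Token savings |
|---|---|---|---|
| minimal | "hi", "thanks", short messages | SOUL.md + IDENTITY.md (2 files) | ~80% |
| standard | "write a function", normal work | + memory/TODAY.md + conditional files | ~50% |
| full | "design architecture", complex tasks | + MEMORY.md + all conditional files | ~30% |

Also generates an optimized AGENTS.md template with lazy-loading rules built in:

```bash
python3 scripts/context_optimizer.py recommend <prompt>
python3 scripts/context_optimizer.py generate-agents    # creates AGENTS.md.optimized
```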
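scripts/heartbeat_config.py
Generates openclaw.json configuration patches for heartbeat optimization:
- Forces the heartbeat model to `glm-4.7-flashx` (cheapest available)
- Sets the interval to 55 minutes (keeps the prompt cache warm within the 1-hour TTL and avoids cache rebuild cost)

```bash
python3 scripts/heartbeat_config.py recommend [cache TTL in minutes]
python3 scripts/heartbeat_config.py patch    # prints a JSON patch for openclaw.json
```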
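Platform-side (requires DB connection)
These scripts query the usage_records PostgreSQL table for real data. Run them from the openclaw-manager project root with the virtualenv activated.
scripts/usage_report.py
Generates usage reports from actual database records rather than estimates.

```bash
python3 scripts/usage_report.py overview [days]           # platform-wide summary
python3 scripts/usage_report.py instance <name> [days]    # single-instance detail
```

Overview includes: total calls/tokens, per-provider breakdown, per-model breakdown, top 10 instances by consumption, 7-day daily trend.
Instance report includes: per-model distribution, daily trend, lifetime totals.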
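scripts/quota_advisor.py
Compares actual 24-hour usage against quota plan limits to find mismatches:
- Wasteful: usage below 20% of the plan limit → suggest downgrade
- Throttled: usage above 80% of the plan limit → suggest upgrade

```bash
python3 scripts/quota_advisor.py analyze    # check all instances
python3 scripts/quota_advisor.py plans      # show available quota plans
```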
Unified CLI
`scripts/cli.py` wraps all of the above into a single entry point:

```bash
python3 scripts/cli.py route <prompt>            # model routing
python3 scripts/cli.py context <prompt>          # context recommendation
python3 scripts/cli.py generate-agents           # generate AGENTS.md
python3 scripts/cli.py heartbeat                 # heartbeat config
python3 scripts/cli.py overview [days]           # platform usage (requires DB)
python3 scripts/cli.py report <name> [days]      # instance report (requires DB)
python3 scripts/cli.py advisor                   # quota advice (requires DB)
```
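Project integration points
This skill works with existing openclaw-manager infrastructure:
| Component | File | How this skill uses it |
|---|---|---|
| Provider config | `config/model.yaml` | Model names/endpoints for routing |
| Proxy routing | `config_service.py` | Where `_inject_proxy_providers()` registers models |
| Usage recording | `proxy_common/usage_recorder.py` | Source of real usage data |
| Quota plans | `config/llm_proxy.yaml` | Plan definitions for quota advisor |
| Instance model | `app/models.py` | Instance metadata for reports |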
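Expected savings
| Strategy | Mechanism | Impact |
|---|---|---|
| Context lazy loading | Fewer tokens per request | 50-80% context reduction |
| Model routing (flashx) | Lower per-token price | 5-10x cheaper on simple tasks |
| Heartbeat → flashx | Lower heartbeat cost | Significant per-instance savings |
| Heartbeat interval 55min | Fewer API calls | ~45% fewer heartbeat calls |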