Proxy Token Optimizer

Reduces LLM API costs for the openclaw-manager multi-tenant proxy platform through four strategies:

1. Model-tier routing: route each prompt to the cheapest capable model
2. Heartbeat optimization: use the cheapest model and longer intervals for heartbeat calls
3. Context lazy loading: load only the context files each prompt actually needs
4. Platform usage analytics: real data from PostgreSQL, not estimates

Why these strategies matter

The openclaw-manager platform proxies LLM requests for multiple OpenClaw instances through providers like zai-proxy, zai-coding-proxy, and kimi-coding-proxy. Each provider offers models at different price points (e.g., glm-4.7 vs glm-4.7-flashx). Without optimization, every request — including simple greetings and heartbeat pings — uses the default (expensive) model, and every session loads the full context regardless of need. These four strategies target the highest-impact cost drivers.

Quick start

All instance-side scripts run locally with no dependencies. Platform-side scripts need DB access.

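```bash
# Model routing: which model should handle this prompt?
python3 scripts/model_router.py 'Thanks!'
# → {"tier": "cheap", "recommended_model": "zai-proxy/glm-4.7-flashx"}

# Context optimization: which files does this prompt need?
python3 scripts/context_optimizer.py recommend 'hello'
# → {"context_level": "minimal", "recommended_files": ["SOUL.md", "IDENTITY.md"]}

# Heartbeat config: generate an openclaw.json patch
python3 scripts/heartbeat_config.py patch
# → {"agents": {"defaults": {"heartbeat": {"every": "55m", "model": "zai-proxy/glm-4.7-flashx"}}}}

# Unified CLI (all commands in one place)
python3 scripts/cli.py --help
```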

Scripts reference

Instance-side (pure local, no network, no DB)

`scripts/model_router.py`

Routes prompts to the right model tier based on complexity analysis.

Tier logic:

- cheap → glm-4.7-flashx: Greetings, acknowledgments, heartbeats, cron jobs, log parsing. Cost savings: 5-10x vs. the standard tier.
- standard → glm-4.7: Code writing, debugging, explanations. Default for unclear prompts.
- premium → glm-4.7 (or k2p5 for kimi): Architecture design, deep analysis, strategy planning.

Supports Chinese and English patterns. Provider-aware — works with zai-proxy, zai-coding-proxy, and kimi-coding-proxy.

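```bash
python3 scripts/model_router.py <prompt> [provider]
python3 scripts/model_router.py compare   # show all provider models
```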
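The tier logic above could be sketched roughly as follows. This is a minimal illustration, not the actual contents of `scripts/model_router.py`: the keyword patterns, the `classify` and `route` functions, and the `TIER_MODELS` mapping are assumptions made for the example. Only the tier names and the zai-proxy model names come from the text.

```python
import re

# Tier -> model for one provider. Model names come from the tier list above;
# the dictionary layout is a hypothetical illustration.
TIER_MODELS = {
    "zai-proxy": {
        "cheap": "glm-4.7-flashx",
        "standard": "glm-4.7",
        "premium": "glm-4.7",
    },
}

# Assumed example patterns (English and Chinese); a real router would need more.
CHEAP_PATTERNS = [
    r"^(hi|hello|hey|thanks|thank you|ok|okay)\b",
    r"^(你好|谢谢|好的)",
    r"\b(heartbeat|cron|parse logs?)\b",
]
PREMIUM_PATTERNS = [
    r"\b(architecture|design|strategy|deep analysis)\b",
    r"(架构|策略|深度分析)",
]


def classify(prompt: str) -> str:
    """Return 'cheap', 'standard', or 'premium' for a prompt."""
    text = prompt.strip().lower()
    # Check premium first so a greeting followed by a hard task is not downgraded.
    if any(re.search(p, text) for p in PREMIUM_PATTERNS):
        return "premium"
    if any(re.search(p, text) for p in CHEAP_PATTERNS):
        return "cheap"
    # Unclear prompts fall back to the standard tier.
    return "standard"


def route(prompt: str, provider: str = "zai-proxy") -> dict:
    """Map a prompt to a tier and a provider-qualified model name."""
    tier = classify(prompt)
    return {"tier": tier, "recommended_model": f"{provider}/{TIER_MODELS[provider][tier]}"}


print(route("Thanks!"))
```

Checking premium patterns before cheap ones is a design choice in this sketch: misrouting a hard prompt to the cheap model is usually more costly than overpaying for an easy one.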

`scripts/context_optimizer.py`

Analyzes prompt complexity to recommend which context files to load, reducing unnecessary token consumption.

Context levels:

| Level | When | Files loaded | Token savings |
| --- | --- | --- | --- |
| minimal | "hi", "thanks", short msgs | SOUL.md + IDENTITY.md (2) | ~80% |
| standard | | | |

Also generates an optimized AGENTS.md template with lazy-loading rules baked in:

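```bash
python3 scripts/context_optimizer.py recommend <prompt>
python3 scripts/context_optimizer.py generate-agents   # creates AGENTS.md.optimized
```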
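As a rough sketch of how a lazy-loading recommendation might work, the example below returns the two minimal-level files for greetings and short messages and falls back to everything otherwise. The `recommend` function, the 20-character threshold, the greeting set, and the `full` fallback level are assumptions for illustration; only the `minimal` level and its two files come from the table above.

```python
# Files for the minimal level, as listed in the context-level table.
MINIMAL_FILES = ["SOUL.md", "IDENTITY.md"]

# Assumed greeting set and length threshold for "short messages".
GREETINGS = {"hi", "hello", "hey", "thanks", "thank you", "你好", "谢谢"}
SHORT_MESSAGE_CHARS = 20


def recommend(prompt: str, available_files: list) -> dict:
    """Recommend which context files to load for a prompt."""
    text = prompt.strip().lower().rstrip("!.?！。？")
    if text in GREETINGS or len(text) <= SHORT_MESSAGE_CHARS:
        files = [f for f in MINIMAL_FILES if f in available_files]
        return {"context_level": "minimal", "recommended_files": files}
    # Hypothetical fallback: load every available file for anything longer.
    return {"context_level": "full", "recommended_files": list(available_files)}


# MEMORY.md is a made-up file name used only to show the filtering.
files = ["SOUL.md", "IDENTITY.md", "AGENTS.md", "MEMORY.md"]
print(recommend("你好", files))
```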

`scripts/heartbeat_config.py`

Generates openclaw.json configuration patches for heartbeat optimization:

- Forces the heartbeat model to glm-4.7-flashx (cheapest available)
- Sets the interval to 55 minutes (keeps the prompt cache warm within its 1-hour TTL and avoids cache rebuild cost)

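```bash
python3 scripts/heartbeat_config.py recommend [cache_ttl_minutes]
python3 scripts/heartbeat_config.py patch   # outputs a JSON patch for openclaw.json
```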
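The relationship between cache TTL and heartbeat interval can be sketched as below. The patch shape (`agents.defaults.heartbeat` with `every` and `model`) follows the example output of `heartbeat_config.py patch` in this document. The `heartbeat_patch` function and the fixed 5-minute safety margin are assumptions chosen so that a 60-minute TTL yields the documented 55-minute interval.

```python
import json


def heartbeat_patch(cache_ttl_minutes: int = 60,
                    safety_margin_minutes: int = 5,
                    model: str = "zai-proxy/glm-4.7-flashx") -> dict:
    """Build an openclaw.json patch that fires a heartbeat just before the cache expires."""
    interval = cache_ttl_minutes - safety_margin_minutes
    if interval <= 0:
        raise ValueError("cache TTL must be longer than the safety margin")
    return {
        "agents": {
            "defaults": {
                "heartbeat": {"every": f"{interval}m", "model": model}
            }
        }
    }


print(json.dumps(heartbeat_patch(), indent=2))
```

Staying slightly under the TTL means each heartbeat refreshes the cache before it expires, so the next real request does not pay to rebuild it.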

Platform-side (requires DB connection)

These scripts query the usage_records PostgreSQL table for real data. Run from the openclaw-manager project root with the virtualenv activated.

`scripts/usage_report.py`

Generates usage reports from actual database records — not estimates.

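```bash
python3 scripts/usage_report.py overview [days]          # platform-wide summary
python3 scripts/usage_report.py instance <name> [days]   # single-instance detail
```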

Overview includes: total calls/tokens, per-provider breakdown, per-model breakdown, top 10 instances by consumption, 7-day daily trend.

Instance report includes: per-model distribution, daily trend, lifetime totals.
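The kind of aggregation behind the overview might look like the sketch below. The document names the `usage_records` table but not its schema, so the column names (`instance`, `provider`, `model`, `total_tokens`) are hypothetical, the rows are made-up sample data, and an in-memory SQLite database stands in for PostgreSQL so the example runs without a server.

```python
import sqlite3


def overview(conn, top_n: int = 10) -> dict:
    """Aggregate usage_records into totals, a per-model breakdown, and top instances."""
    total_calls, total_tokens = conn.execute(
        "SELECT COUNT(*), COALESCE(SUM(total_tokens), 0) FROM usage_records"
    ).fetchone()
    by_model = conn.execute(
        "SELECT model, COUNT(*), SUM(total_tokens) FROM usage_records "
        "GROUP BY model ORDER BY SUM(total_tokens) DESC"
    ).fetchall()
    top_instances = conn.execute(
        "SELECT instance, SUM(total_tokens) AS tokens FROM usage_records "
        "GROUP BY instance ORDER BY tokens DESC LIMIT ?",
        (top_n,),
    ).fetchall()
    return {
        "total_calls": total_calls,
        "total_tokens": total_tokens,
        "by_model": by_model,
        "top_instances": top_instances,
    }


# Hypothetical schema and sample rows, for illustration only.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE usage_records (instance TEXT, provider TEXT, model TEXT, total_tokens INTEGER)"
)
conn.executemany(
    "INSERT INTO usage_records VALUES (?, ?, ?, ?)",
    [
        ("alpha", "zai-proxy", "glm-4.7", 1200),
        ("alpha", "zai-proxy", "glm-4.7-flashx", 300),
        ("beta", "zai-proxy", "glm-4.7-flashx", 500),
    ],
)
report = overview(conn)
print(report)
```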

`scripts/quota_advisor.py`

Compares actual 24-hour usage against quota plan limits to find mismatches:

- Wasteful: Usage below 20% of plan limit → suggest downgrade
- Throttled: Usage above 80% of plan limit → suggest upgrade

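```bash
python3 scripts/quota_advisor.py analyze   # check all instances
python3 scripts/quota_advisor.py plans     # show available quota plans
```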
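The two thresholds above amount to a simple classification, sketched here. The `advise` function, its return shape, and the `ok` label for usage between the thresholds are assumptions for illustration; the 20% and 80% cut-offs come from the list above.

```python
def advise(used_24h: float, plan_limit: float,
           low: float = 0.20, high: float = 0.80) -> tuple:
    """Classify 24-hour usage against a plan limit and suggest an action."""
    if plan_limit <= 0:
        raise ValueError("plan_limit must be positive")
    ratio = used_24h / plan_limit
    if ratio < low:
        return ("wasteful", "downgrade")
    if ratio > high:
        return ("throttled", "upgrade")
    # Usage between the two thresholds: the plan fits.
    return ("ok", None)


print(advise(100, 1000))   # usage at 10% of the limit
print(advise(900, 1000))   # usage at 90% of the limit
```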

Unified CLI

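`scripts/cli.py` wraps all of the above into a single entry point:

```bash
python3 scripts/cli.py route <prompt>          # model routing
python3 scripts/cli.py context <prompt>        # context recommendation
python3 scripts/cli.py generate-agents         # generate AGENTS.md
python3 scripts/cli.py heartbeat               # heartbeat config
python3 scripts/cli.py overview [days]         # platform usage (requires DB)
python3 scripts/cli.py report <name> [days]    # instance report (requires DB)
python3 scripts/cli.py advisor                 # quota advice (requires DB)
```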
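One plausible way to structure such an entry point is with `argparse` subcommands, sketched below. The subcommand names follow the `cli.py` usage given in this document; whether the real script uses `argparse`, and the default of 7 days, are assumptions.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    """Build a parser with one subcommand per wrapped script."""
    parser = argparse.ArgumentParser(prog="cli.py")
    sub = parser.add_subparsers(dest="command", required=True)

    # Instance-side commands that take a prompt.
    for name in ("route", "context"):
        p = sub.add_parser(name)
        p.add_argument("prompt")

    # Commands without arguments.
    for name in ("generate-agents", "heartbeat", "advisor"):
        sub.add_parser(name)

    # Platform-side reports; the default window of 7 days is an assumption.
    p = sub.add_parser("overview")
    p.add_argument("days", nargs="?", type=int, default=7)
    p = sub.add_parser("report")
    p.add_argument("name")
    p.add_argument("days", nargs="?", type=int, default=7)
    return parser


args = build_parser().parse_args(["report", "alpha", "30"])
print(args.command, args.name, args.days)
```

A real entry point would then dispatch on `args.command` to the matching script's function.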

Project integration points

This skill works with existing openclaw-manager infrastructure:

| Component | File | How this skill uses it |
| --- | --- | --- |
| Provider config | `config/model.yaml` | Model names/endpoints for routing |
| Proxy routing | | |

Expected savings

| Strategy | Mechanism | Impact |
| --- | --- | --- |
| Context lazy loading | Fewer tokens per request | 50-80% context reduction |
| Model routing (flashx) | | |
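To see how context reduction and model routing compound, the following back-of-the-envelope sketch may help. The token counts are invented for illustration, output tokens are ignored, and prices are expressed only as a ratio between the standard and cheap models. The 80% context reduction and the 5x price ratio are taken from the ranges quoted in this document.

```python
def savings_fraction(context_tokens: float, prompt_tokens: float,
                     context_reduction: float, price_ratio: float) -> float:
    """Fraction of input cost saved when context is trimmed and a cheaper model is used.

    price_ratio is how many times cheaper the routed model is per token.
    """
    baseline = context_tokens + prompt_tokens
    optimized = (context_tokens * (1 - context_reduction) + prompt_tokens) / price_ratio
    return 1 - optimized / baseline


# Hypothetical greeting request: 10,000 context tokens and a 50-token prompt.
saved = savings_fraction(10_000, 50, context_reduction=0.8, price_ratio=5)
print(f"{saved:.1%}")
```

Under these assumptions the two strategies together save roughly 96% of the input cost for a simple request, more than either would alone.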

Proxy Token Optimizer

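Reduces LLM API costs for the openclaw-manager multi-tenant proxy platform through four strategies:

1. Model-tier routing: route each prompt to the cheapest capable model
2. Heartbeat optimization: use the cheapest model and longer intervals for heartbeat calls
3. Context lazy loading: load only the context files each prompt actually needs
4. Platform usage analytics: real data from PostgreSQL, not estimates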

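Why these strategies matter

The openclaw-manager platform proxies LLM requests for multiple OpenClaw instances through providers such as zai-proxy, zai-coding-proxy, and kimi-coding-proxy. Each provider offers models at different price points (e.g., glm-4.7 vs glm-4.7-flashx). Without optimization, every request, including simple greetings and heartbeat pings, uses the default (expensive) model, and every session loads the full context whether or not it is needed. These four strategies target the highest-impact cost drivers.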

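Quick start

All instance-side scripts run locally with no dependencies. Platform-side scripts need database access.

```bash
# Model routing: which model should handle this prompt?
python3 scripts/model_router.py 'Thanks!'
# → {"tier": "cheap", "recommended_model": "zai-proxy/glm-4.7-flashx"}

# Context optimization: which files does this prompt need?
python3 scripts/context_optimizer.py recommend 'hello'
# → {"context_level": "minimal", "recommended_files": ["SOUL.md", "IDENTITY.md"]}

# Heartbeat config: generate an openclaw.json patch
python3 scripts/heartbeat_config.py patch
# → {"agents": {"defaults": {"heartbeat": {"every": "55m", "model": "zai-proxy/glm-4.7-flashx"}}}}

# Unified CLI (all commands in one place)
python3 scripts/cli.py --help
```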

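Scripts reference

Instance-side (pure local, no network, no DB)

`scripts/model_router.py`

Routes prompts to the right model tier based on complexity analysis.

Tier logic:

- cheap → glm-4.7-flashx: Greetings, acknowledgments, heartbeats, cron jobs, log parsing. Cost savings: 5-10x vs. the standard tier.
- standard → glm-4.7: Code writing, debugging, explanations. Default for unclear prompts.
- premium → glm-4.7 (or k2p5 for kimi): Architecture design, deep analysis, strategy planning.

Supports Chinese and English patterns. Provider-aware: works with zai-proxy, zai-coding-proxy, and kimi-coding-proxy.

```bash
python3 scripts/model_router.py <prompt> [provider]
python3 scripts/model_router.py compare   # show all provider models
```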

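`scripts/context_optimizer.py`

Analyzes prompt complexity to recommend which context files to load, reducing unnecessary token consumption.

Context levels:

| Level | When | Files loaded | Token savings |
| --- | --- | --- | --- |
| minimal | "hello", "thanks", short messages | SOUL.md + IDENTITY.md (2) | ~80% |
| standard | | | |

Also generates an optimized AGENTS.md template with lazy-loading rules built in:

```bash
python3 scripts/context_optimizer.py recommend <prompt>
python3 scripts/context_optimizer.py generate-agents   # creates AGENTS.md.optimized
```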

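`scripts/heartbeat_config.py`

Generates openclaw.json configuration patches for heartbeat optimization:

- Forces the heartbeat model to glm-4.7-flashx (cheapest available)
- Sets the interval to 55 minutes (keeps the prompt cache warm within its 1-hour TTL and avoids cache rebuild cost)

```bash
python3 scripts/heartbeat_config.py recommend [cache_ttl_minutes]
python3 scripts/heartbeat_config.py patch   # outputs a JSON patch for openclaw.json
```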

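Platform-side (requires DB connection)

These scripts query the usage_records PostgreSQL table for real data. Run them from the openclaw-manager project root with the virtualenv activated.

`scripts/usage_report.py`

Generates usage reports from actual database records, not estimates.

```bash
python3 scripts/usage_report.py overview [days]          # platform-wide summary
python3 scripts/usage_report.py instance <name> [days]   # single-instance detail
```

Overview includes: total calls/tokens, per-provider breakdown, per-model breakdown, top 10 instances by consumption, 7-day daily trend.

Instance report includes: per-model distribution, daily trend, lifetime totals.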

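`scripts/quota_advisor.py`

Compares actual 24-hour usage against quota plan limits to find mismatches:

- Wasteful: Usage below 20% of plan limit → suggest downgrade
- Throttled: Usage above 80% of plan limit → suggest upgrade

```bash
python3 scripts/quota_advisor.py analyze   # check all instances
python3 scripts/quota_advisor.py plans     # show available quota plans
```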

Unified CLI

`scripts/cli.py` wraps all of the above into a single entry point:

```bash
python3 scripts/cli.py route <prompt>          # model routing
python3 scripts/cli.py context <prompt>        # context recommendation
python3 scripts/cli.py generate-agents         # generate AGENTS.md
python3 scripts/cli.py heartbeat               # heartbeat config
python3 scripts/cli.py overview [days]         # platform usage (requires DB)
python3 scripts/cli.py report <name> [days]    # instance report (requires DB)
python3 scripts/cli.py advisor                 # quota advice (requires DB)
```

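Project integration points

This skill works with existing openclaw-manager infrastructure:

| Component | File | How this skill uses it |
| --- | --- | --- |
| Provider config | `config/model.yaml` | Model names/endpoints for routing |
| Proxy routing | | |

Expected savings

| Strategy | Mechanism | Impact |
| --- | --- | --- |
| Context lazy loading | Fewer tokens per request | 50-80% context reduction |
| Model routing (flashx) | | |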
