Agent Audit
Scan your entire OpenClaw setup and get actionable cost/performance recommendations.
What This Skill Does
1. Scans config: reads the OpenClaw config to map models to agents and tasks
2. Analyzes cron history: checks every cron job's model, token usage, runtime, and success rate
3. Classifies tasks: determines the complexity level of each task
4. Calculates costs: per agent, per cron, and per task type, using provider pricing
5. Recommends changes: with confidence levels and risk warnings
6. Generates report: a markdown report with specific savings estimates
Running the Audit
```bash
python3 {baseDir}/scripts/audit.py
```
Options:
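```bash
python3 {baseDir}/scripts/audit.py --format markdown            # Full report (default)
python3 {baseDir}/scripts/audit.py --format summary             # Quick summary only
python3 {baseDir}/scripts/audit.py --dry-run                    # Show what would be analyzed
python3 {baseDir}/scripts/audit.py --output /path/to/report.md  # Save to file
```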
How It Works
Phase 1: Discovery
- Read the OpenClaw config (`~/.openclaw/openclaw.json` or similar)
- List all cron jobs and their configurations
- List all agents and their default models
- Detect provider (Anthropic, OpenAI, Google, xAI) from model names
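Provider detection from a model name can be sketched as a simple substring lookup. This is a minimal illustration, not the logic in `audit.py`: the `PROVIDER_HINTS` table and the `detect_provider` function are hypothetical names, and the hint strings are assumptions about how model names are usually written.

```python
# Hypothetical hint table: substrings that commonly appear in each provider's model names.
PROVIDER_HINTS = {
    "Anthropic": ("claude", "anthropic/"),
    "OpenAI": ("gpt", "openai/"),
    "Google": ("gemini", "google/"),
    "xAI": ("grok", "xai/"),
}


def detect_provider(model: str) -> str:
    """Return the provider whose hint appears in the model name, else "unknown"."""
    name = model.lower()
    for provider, hints in PROVIDER_HINTS.items():
        if any(hint in name for hint in hints):
            return provider
    return "unknown"


print(detect_provider("anthropic/claude-opus-4"))  # prints Anthropic
print(detect_provider("some-local-model"))         # prints unknown
```

Returning "unknown" rather than guessing keeps unrecognized models out of the cost calculation until pricing for them is known.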
Phase 2: History Analysis
- Pull cron job run history (last 7 days by default)
- Calculate per-job: avg tokens, avg runtime, success rate, model used
- Pull session history where available
- Calculate total token spend by model tier
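The per-job statistics listed above could be computed roughly as follows. The record fields (`job`, `model`, `tokens`, `runtime_s`, `ok`) are assumptions made for this sketch; the actual shape of OpenClaw's run history may differ.

```python
from collections import defaultdict
from statistics import mean


def summarize_runs(runs):
    """Group run records by job and compute average tokens, runtime, and success rate."""
    by_job = defaultdict(list)
    for run in runs:
        by_job[run["job"]].append(run)

    summary = {}
    for job, job_runs in by_job.items():
        summary[job] = {
            "model": job_runs[-1]["model"],  # model used by the most recent run
            "runs": len(job_runs),
            "avg_tokens": mean(r["tokens"] for r in job_runs),
            "avg_runtime_s": mean(r["runtime_s"] for r in job_runs),
            "success_rate": sum(1 for r in job_runs if r["ok"]) / len(job_runs),
        }
    return summary


# Hypothetical history records, for illustration only.
history = [
    {"job": "health-check", "model": "anthropic/claude-opus-4", "tokens": 300, "runtime_s": 2.0, "ok": True},
    {"job": "health-check", "model": "anthropic/claude-opus-4", "tokens": 500, "runtime_s": 4.0, "ok": False},
]
print(summarize_runs(history)["health-check"]["success_rate"])  # prints 0.5
```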
Phase 3: Task Classification
Classify each task into complexity tiers:
| Tier | Examples | Recommended Models |
|---|---|---|
| Simple | Health checks, status reports, reminders, notifications | Cheapest tier (Haiku, GPT-4o-mini, Flash, Grok-mini) |
| Medium | Content drafts, research, summarization, data analysis | Mid tier (Sonnet, GPT-4o, Pro, Grok) |
| Complex | Coding, architecture, security review, nuanced writing | Top tier (Opus, GPT-4.5, Ultra, Grok-2) |
Classification signals:
- Simple: Short output (<500 tokens), low thinking requirement, repetitive pattern, status/health tasks
- Medium: Medium output, some reasoning needed, creative but templated, research tasks
- Complex: Long output, multi-step reasoning, code generation, security-critical, tasks that previously failed on weaker models
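A rough sketch of how these signals might combine into a tier is shown below. Only the 500-token threshold comes from this document; the keyword lists, the 2000-token "long output" cutoff, and the function name are assumptions for illustration. The detailed heuristics live in references/task-classification.md.

```python
SIMPLE_MAX_OUTPUT_TOKENS = 500   # from the "Simple" signal above
LONG_OUTPUT_TOKENS = 2000        # assumed cutoff for "long output"

# Assumed keyword lists, derived loosely from the tier examples.
SIMPLE_KEYWORDS = ("health", "status", "reminder", "notification")
COMPLEX_KEYWORDS = ("code", "coding", "architecture", "security")


def classify_task(name, avg_output_tokens, failed_on_weaker_model=False):
    """Return "simple", "medium", or "complex" for a task."""
    lowered = name.lower()
    if failed_on_weaker_model:
        return "complex"
    if any(word in lowered for word in COMPLEX_KEYWORDS):
        return "complex"
    if avg_output_tokens >= LONG_OUTPUT_TOKENS:
        return "complex"
    if avg_output_tokens < SIMPLE_MAX_OUTPUT_TOKENS and any(
        word in lowered for word in SIMPLE_KEYWORDS
    ):
        return "simple"
    # Anything without a clear simple or complex signal stays in the middle tier.
    return "medium"


print(classify_task("Knox Bot Health Check", 300))  # prints simple
print(classify_task("Security review", 300))        # prints complex
```

The complex checks run first so that a short but security-critical task is never labeled simple.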
Phase 4: Recommendations
For each task where the model tier doesn't match complexity:
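```
⚠️ RECOMMENDATION: Downgrade "Knox Bot Health Check" from opus to haiku
   Current: anthropic/claude-opus-4 ($15/M input, $75/M output)
   Suggested: anthropic/claude-haiku ($0.25/M input, $1.25/M output)
   Reason: Simple status check, averaging 300 output tokens
   Estimated savings: $X.XX/month
   Risk: Low (the task is simple pattern matching)
   Confidence: High
```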
Safety Rules — NEVER Recommend Downgrading:
- Coding/development tasks
- Security reviews or audits
- Tasks that have previously failed on weaker models
- Tasks where the user explicitly chose a higher model
- Complex multi-step reasoning tasks
- Anything the user flagged as critical
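The savings estimate and the safety gate can be sketched together. The prices are the ones quoted in this document's example recommendation. The task fields, the 1,000 input tokens per run, and the 720 runs per month (hourly for 30 days) are assumptions chosen to make the arithmetic concrete, and the function names are hypothetical.

```python
# USD per million tokens (input, output), as quoted in the example recommendation.
PRICES = {
    "anthropic/claude-opus-4": (15.00, 75.00),
    "anthropic/claude-haiku": (0.25, 1.25),
}

PROTECTED_CATEGORIES = {"coding", "security"}  # never downgraded


def may_downgrade(task):
    """Apply the safety rules: return False if any rule forbids a downgrade."""
    if task.get("category") in PROTECTED_CATEGORIES:
        return False
    if task.get("tier") == "complex":
        return False
    return not (
        task.get("failed_on_weaker_model")
        or task.get("user_pinned_model")
        or task.get("critical")
    )


def monthly_cost(model, avg_in, avg_out, runs_per_month):
    """Estimated monthly cost in USD for one task on one model."""
    price_in, price_out = PRICES[model]
    return (avg_in * price_in + avg_out * price_out) / 1_000_000 * runs_per_month


def estimate_savings(task, suggested_model):
    """Return monthly savings in USD, or None when the safety rules block the change."""
    if not may_downgrade(task):
        return None
    usage = (task["avg_input_tokens"], task["avg_output_tokens"], task["runs_per_month"])
    return monthly_cost(task["model"], *usage) - monthly_cost(suggested_model, *usage)


# Hypothetical task record for illustration.
task = {
    "name": "Knox Bot Health Check",
    "model": "anthropic/claude-opus-4",
    "category": "health",
    "tier": "simple",
    "avg_input_tokens": 1000,
    "avg_output_tokens": 300,
    "runs_per_month": 720,
}
print(f"${estimate_savings(task, 'anthropic/claude-haiku'):.2f}/month")  # prints $26.55/month
```

Under these assumptions the task costs about $27.00 per month on Opus and $0.45 on Haiku. Checking the safety rules before computing anything means a blocked task never produces a savings figure that could be mistaken for a recommendation.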
Phase 5: Report Generation
Output a clean markdown report with:
1. Overview: total agents, crons, monthly spend estimate
2. Per-agent breakdown: model, usage, cost
3. Per-cron breakdown: model, frequency, avg tokens, cost
4. Recommendations: sorted by savings potential
5. Total potential savings: monthly estimate
6. One-liner config changes: exact model strings to swap
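The recommendations and total-savings parts of the report could be rendered along these lines. This is a sketch under assumed field names (`task`, `current`, `suggested`, `monthly_savings`), not the output format `audit.py` actually produces.

```python
def render_recommendations(recommendations):
    """Render recommendations as a markdown table, largest savings first."""
    ranked = sorted(recommendations, key=lambda r: r["monthly_savings"], reverse=True)
    lines = [
        "| Task | Current | Suggested | Savings/month |",
        "|---|---|---|---|",
    ]
    for rec in ranked:
        lines.append(
            f"| {rec['task']} | {rec['current']} | {rec['suggested']} "
            f"| ${rec['monthly_savings']:.2f} |"
        )
    total = sum(rec["monthly_savings"] for rec in ranked)
    lines.append("")
    lines.append(f"Total potential savings: ${total:.2f}/month")
    return "\n".join(lines)


# Hypothetical recommendations for illustration.
recs = [
    {"task": "Daily reminder", "current": "sonnet", "suggested": "haiku", "monthly_savings": 1.50},
    {"task": "Health check", "current": "opus", "suggested": "haiku", "monthly_savings": 26.55},
]
print(render_recommendations(recs))
```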
Model Pricing Reference
See references/model-pricing.md for current pricing across all providers.
Update this file when prices change.
Task Classification Details
See references/task-classification.md for detailed heuristics
on how tasks are classified into complexity tiers.
Important Notes
- This skill is read-only: it never changes your config automatically
- All recommendations include risk levels and confidence scores
- When unsure about a task's complexity, it defaults to keeping the current model
- The audit should be re-run periodically (monthly) as usage patterns change
- Token counts are estimates based on cron history — actual costs depend on your provider's billing