Cron Cost Guard
Prevent silent token budget burns from misconfigured AI agent cron jobs.
Quick Audit
Run this checklist whenever you set up a cron job or investigate a cost spike:
1. Session Isolation (Critical)
Check every cron job for session binding conflicts:
```
cron list (includeDisabled: true)
```
Red flags:
- `sessionKey: "agent:main:main"` with `sessionTarget: "isolated"` → stale binding, will cause model conflicts
- `agentId` pointing to a different agent than the session owner → cross-agent model contamination
- `consecutiveErrors >= 3` → likely stuck in a retry loop
- `lastError` containing `LiveSessionModelSwitchError` → model-switch loop confirmed
Fix: Remove the job and recreate it without `sessionKey`. Set `sessionTarget: "isolated"`.
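The red flags above can be expressed as a small shell check. This is a minimal sketch, assuming each job's fields have already been extracted from the job listing and are passed in as strings; `flag_job`, its argument order, and the explicit "session owner" argument are hypothetical. It also treats any non-empty `sessionKey` on an isolated job as stale, which generalizes the `agent:main:main` example.

```bash
#!/usr/bin/env bash
# Hypothetical helper: flag session-isolation red flags for one cron job.
# Args: sessionKey sessionTarget agentId sessionOwner consecutiveErrors lastError
# (assumption: fields were already extracted from the job listing; "" = unset).
flag_job() {
  local session_key="$1" session_target="$2" agent_id="$3" owner="$4"
  local consecutive_errors="${5:-0}" last_error="$6"
  local flags=()

  # Any sessionKey on an isolated job is treated as a stale binding (assumption).
  if [[ -n "$session_key" && "$session_target" == "isolated" ]]; then
    flags+=("stale-binding")
  fi
  # agentId differing from the session owner risks cross-agent contamination.
  if [[ -n "$agent_id" && -n "$owner" && "$agent_id" != "$owner" ]]; then
    flags+=("cross-agent")
  fi
  # Three or more consecutive errors suggests a retry loop.
  if (( consecutive_errors >= 3 )); then
    flags+=("retry-loop")
  fi
  # This error name in lastError confirms a model-switch loop.
  if [[ "$last_error" == *LiveSessionModelSwitchError* ]]; then
    flags+=("model-switch-loop")
  fi

  # Print "ok" or a space-separated list of flags.
  if (( ${#flags[@]} == 0 )); then
    echo "ok"
  else
    echo "${flags[*]}"
  fi
}
```

For example, `flag_job 'agent:main:main' isolated main main 0 ''` reports `stale-binding`, which maps to the fix above: recreate the job without `sessionKey`.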
2. Model Conflicts
In multi-agent setups (e.g., Agent A on Claude, Agent B on GPT), each agent's crons must be scoped to that agent only.
- Set `agentId` explicitly on every cron job
- Set `model` explicitly in the payload when available
- Never let Agent B's cron inherit Agent A's session model
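A minimal sketch of these scoping rules, assuming the job's `agentId`, its payload `model` (empty when unset), and the agent that should own the job are passed in as strings. `check_scope` is a hypothetical helper, and a missing model is reported as a soft `model-not-pinned` finding because the rule only applies when the field is available.

```bash
#!/usr/bin/env bash
# Hypothetical helper: check that a cron job is scoped to exactly one agent.
# Args: agentId payloadModel expectedAgent ("" = unset).
check_scope() {
  local agent_id="$1" model="$2" expected_agent="$3"
  local problems=()

  # agentId must be set explicitly and must match the intended agent.
  if [[ -z "$agent_id" ]]; then
    problems+=("missing-agentId")
  elif [[ "$agent_id" != "$expected_agent" ]]; then
    problems+=("wrong-agent")
  fi
  # Without an explicit model the job may inherit another agent's session model.
  if [[ -z "$model" ]]; then
    problems+=("model-not-pinned")
  fi

  if (( ${#problems[@]} == 0 )); then echo "ok"; else echo "${problems[*]}"; fi
}
```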
3. System Prompt Size
Audit injected workspace files:
```bash
wc -c MEMORY.md SOUL.md AGENTS.md TOOLS.md USER.md QUEUE.md
```
Target: < 20KB total injected. Move large files (playbooks, heartbeat templates, reference docs) to references/ for on-demand reading.
| Size | Status |
|---|---|
| < 20KB | Healthy |
| 20-40KB | Trim soon |
| > 40KB | Trim now: every API call is bloated |
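The size audit and the status bands in the table can be scripted together. This sketch assumes 1 KB = 1024 bytes and silently skips files that do not exist; `total_injected_bytes` and `classify_prompt_size` are hypothetical helper names.

```bash
#!/usr/bin/env bash
# Hypothetical helper: sum the byte sizes of the given files.
# Missing files are skipped rather than treated as errors (assumption).
total_injected_bytes() {
  local total=0 f size
  for f in "$@"; do
    [[ -f "$f" ]] || continue
    size=$(wc -c < "$f")
    total=$(( total + size ))
  done
  echo "$total"
}

# Hypothetical helper: map a byte count onto the status bands.
# Assumes KB = 1024 bytes; exactly 40KB still counts as "Trim soon".
classify_prompt_size() {
  local bytes="$1" kb=1024
  if (( bytes < 20 * kb )); then
    echo "Healthy"
  elif (( bytes <= 40 * kb )); then
    echo "Trim soon"
  else
    echo "Trim now"
  fi
}
```

Usage: `classify_prompt_size "$(total_injected_bytes MEMORY.md SOUL.md AGENTS.md TOOLS.md USER.md QUEUE.md)"`.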
4. Cost Monitoring
```
sessions_list (limit: 10, messageLimit: 1)
```
Look for sessions with a high `estimatedCostUsd` but few output tokens; that pattern is the signature of a retry loop.
| Metric | Healthy | Warning | Critical |
|---|---|---|---|
| Cron consecutive errors | 0 | 1-2 | ≥3 |
| Session cost (cron) | < $0.50 | $0.50-2.00 | > $2.00 |
| Model switch retries | 0 | 1-2 | ≥3 |
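The thresholds in this table translate directly into two small classifiers. This is a sketch under stated assumptions: the function names are hypothetical, the boundary values $0.50 and $2.00 fall in the Warning band, and the cost comparison runs in `awk` because bash arithmetic is integer-only.

```bash
#!/usr/bin/env bash
# Hypothetical helper for integer metrics (cron consecutive errors,
# model switch retries): 0 = Healthy, 1-2 = Warning, >=3 = Critical.
classify_count() {
  local n="$1"
  if (( n == 0 )); then echo "Healthy"
  elif (( n <= 2 )); then echo "Warning"
  else echo "Critical"; fi
}

# Hypothetical helper for cron session cost in USD:
# < 0.50 = Healthy, 0.50-2.00 = Warning, > 2.00 = Critical.
# awk does the floating-point comparison; "+ 0" forces a numeric value.
classify_cost() {
  awk -v c="$1" 'BEGIN {
    cost = c + 0
    if (cost < 0.50) print "Healthy"
    else if (cost <= 2.00) print "Warning"
    else print "Critical"
  }'
}
```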
Diagnosis: Token Spike
For detailed diagnosis steps and post-incident checklist, read references/diagnosis.md.
Prevention Rules
- Every cron job: `sessionTarget: "isolated"`, no stale `sessionKey`
- Every cron job: explicit `timeoutSeconds` (never unlimited)
- Multi-agent: explicit `agentId` matching the agent that should run it
- After changing the default model: audit all cron jobs for conflicts
- Weekly: check `consecutiveErrors` across all jobs; anything ≥ 3 needs investigation
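The per-job rules and the weekly check can be sketched as two shell helpers. Everything here is hypothetical: `lint_job` takes `sessionTarget`, `sessionKey`, and `timeoutSeconds` as strings; an empty, non-numeric, or zero timeout is assumed to mean "unlimited"; and `weekly_error_sweep` reads one `<job-name> <consecutiveErrors>` pair per line, which you would need to produce from your own job listing.

```bash
#!/usr/bin/env bash
# Hypothetical helper: lint one cron job against the per-job prevention rules.
# Args: sessionTarget sessionKey timeoutSeconds ("" = unset).
lint_job() {
  local session_target="$1" session_key="$2" timeout="$3"
  local issues=()

  [[ "$session_target" == "isolated" ]] || issues+=("sessionTarget-not-isolated")
  [[ -z "$session_key" ]] || issues+=("stale-sessionKey")
  # Assumption: unset, non-numeric, or zero timeout is treated as unlimited.
  if ! [[ "$timeout" =~ ^[0-9]+$ ]] || (( timeout == 0 )); then
    issues+=("no-timeout")
  fi

  if (( ${#issues[@]} == 0 )); then echo "ok"; else echo "${issues[*]}"; fi
}

# Hypothetical weekly sweep: read "<job-name> <consecutiveErrors>" lines on
# stdin and print the name of every job at or above the threshold of 3.
weekly_error_sweep() {
  local name errors
  while read -r name errors; do
    [[ -n "$name" ]] || continue
    if (( ${errors:-0} >= 3 )); then
      echo "$name"
    fi
  done
}
```

For example, `printf 'daily-report 0\nnightly-sync 3\n' | weekly_error_sweep` prints only `nightly-sync`.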