ClawCheck

Two-phase audit: a fast deterministic scan catches structural issues, then you (the agent) do a deep quality evaluation on the flagged areas.

When to Use

- After initial setup or major config changes
- Before publishing skills to ClawHub (quality gate)
- Periodic health check (weekly cron or manual)
- When something feels off but openclaw doctor says "ok"
- After installing new skills or updating OpenClaw

What This Checks vs Built-in

This skill	openclaw doctor (built-in)
Secrets exposure + token hygiene	Config JSON schema validation
Cron ops health + prompt quality review

How It Works: Two Phases

Phase 1: Deterministic Scan (fast, free)

Run the script to get a structural baseline:

    python3 {baseDir}/scripts/audit.py

Individual modules:
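
    python3 {baseDir}/scripts/audit.py --security
    python3 {baseDir}/scripts/audit.py --cron
    python3 {baseDir}/scripts/audit.py --config
    python3 {baseDir}/scripts/audit.py --skills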

This produces JSON with scores, findings, and the bottom/top skill lists. Use this as your triage map for Phase 2.
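
As a rough illustration of using the Phase 1 output as a triage map, the sketch below orders sections from weakest to strongest. The JSON shape (a `sections` object whose entries carry `score` and `finding_count`) and the helper name `triage_order` are assumptions made for illustration; check them against what the script actually emits.

```python
import json

# Hypothetical Phase 1 output; the real script's fields may differ.
SAMPLE = """
{
  "score": 82,
  "sections": {
    "security": {"score": 65, "finding_count": 3},
    "cron": {"score": 95, "finding_count": 1},
    "config": {"score": 88, "finding_count": 2},
    "skills": {"score": 80, "finding_count": 1}
  }
}
"""


def triage_order(report: dict) -> list:
    """Return section names sorted by ascending score (weakest first)."""
    sections = report.get("sections", {})
    return sorted(sections, key=lambda name: sections[name]["score"])


report = json.loads(SAMPLE)
print(triage_order(report))  # ['security', 'skills', 'config', 'cron']
```

Starting Phase 2 with the first entries of that list keeps the deep review focused on the areas the scan flagged.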

Phase 2: Deep Quality Audit (you, the agent)

After running the script, perform these evaluations. Budget your depth based on what the user asked for ("quick check" = Phase 1 only, "full audit" or "quality review" = both phases).

2a. Config Quality Review

Read ~/.openclaw/openclaw.json and evaluate:

- Heartbeat prompt: Read agents.defaults.heartbeat.prompt. Is it specific enough to catch real issues? Does it avoid heavy operations? A good heartbeat prompt is < 200 words, checks 2-3 things, and has clear escalation criteria.
- Model choices: Is the primary model appropriate for the workload? Are fallbacks a meaningful step-down (not the same tier)? Is the subagent model cheaper than primary?
- Compaction thresholds: Are reserveTokens and keepRecentTokens reasonable for the context window size? Rule of thumb: reserve should be 15-20% of contextTokens.
- Session maintenance: Are pruneAfter, maxEntries, rotateBytes set to values that match the usage pattern? Heavy cron usage needs more aggressive pruning.
- Cron maxConcurrentRuns: Is it high enough for the number of frequent jobs? Count jobs with */ in their schedule expression.

Score each aspect 1-5. Report specific improvements.
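
Two of the checks above reduce to simple arithmetic. The sketch below shows one possible way to compute them; the function names and the flat `schedule` key on each job record are hypothetical, since the actual layout of the config and jobs files is not shown here.

```python
def reserve_ratio(reserve_tokens: int, context_tokens: int) -> float:
    """Fraction of the context window held back by reserveTokens."""
    if context_tokens <= 0:
        raise ValueError("context_tokens must be positive")
    return reserve_tokens / context_tokens


def reserve_in_range(reserve_tokens: int, context_tokens: int,
                     low: float = 0.15, high: float = 0.20) -> bool:
    """Apply the 15-20% rule of thumb from the checklist."""
    return low <= reserve_ratio(reserve_tokens, context_tokens) <= high


def count_frequent_jobs(jobs: list) -> int:
    """Count jobs whose schedule expression contains '*/' (assumed 'schedule' key)."""
    return sum(1 for job in jobs if "*/" in str(job.get("schedule", "")))


print(reserve_ratio(150_000, 800_000))      # 0.1875
print(reserve_in_range(150_000, 800_000))   # True
print(count_frequent_jobs([
    {"schedule": "*/5 * * * *"},
    {"schedule": "0 9 * * 1"},
    {"schedule": "*/30 * * * *"},
]))                                          # 2
```

The frequent-job count is only a starting point for judging maxConcurrentRuns; jobs that rarely overlap may need less headroom than the raw count suggests.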

2b. Cron Prompt Quality Review

Read ~/.openclaw/cron/jobs.json. Select the 5 most important enabled jobs using this heuristic:

1. Any job in error state (from Phase 1 findings)
2. Jobs with highest frequency × timeoutSeconds (most resource-consuming)
3. Jobs running on expensive models (opus/primary)
4. If still under 5, pick by business impact (backups, monitoring, user-facing)
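
The first three steps of the heuristic can be approximated with a single sort, as in the sketch below. It is a simplification, not the skill's own logic: `state`, `runs_per_day`, `model`, and `enabled` are hypothetical fields (only payload.timeoutSeconds is named in this document), expensive models act only as a tie-breaker, and the business-impact step is left to judgment.

```python
EXPENSIVE_MODELS = ("opus",)  # assumption: substring match on the model name


def select_jobs(jobs: list, limit: int = 5) -> list:
    """Pick up to `limit` enabled jobs: errors first, then costliest, then expensive models."""
    enabled = [job for job in jobs if job.get("enabled", True)]

    def rank(job: dict) -> tuple:
        in_error = job.get("state") == "error"
        timeout = job.get("payload", {}).get("timeoutSeconds", 0)
        cost = job.get("runs_per_day", 0) * timeout
        expensive = any(m in job.get("model", "") for m in EXPENSIVE_MODELS)
        # Tuples sort ascending, so negate/invert to put priority jobs first.
        return (not in_error, -cost, not expensive)

    return sorted(enabled, key=rank)[:limit]


jobs = [
    {"name": "backup", "runs_per_day": 1, "model": "sonnet",
     "payload": {"timeoutSeconds": 600}},
    {"name": "scanner", "runs_per_day": 48, "model": "opus",
     "payload": {"timeoutSeconds": 120}},
    {"name": "digest", "runs_per_day": 1, "model": "sonnet", "state": "error",
     "payload": {"timeoutSeconds": 300}},
    {"name": "old", "runs_per_day": 24, "enabled": False,
     "payload": {"timeoutSeconds": 900}},
]
print([job["name"] for job in select_jobs(jobs)])  # ['digest', 'scanner', 'backup']
```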

For each selected job evaluate:

- Prompt clarity: Specific enough to execute without guessing? Clear steps, expected output format, error handling?
- Safety: Has guardrails? ("NEVER run git push", "read-only", "do not edit files directly")
- Efficiency: Token-efficient? Flag prompts > 1500 chars that run on expensive models. Could the prompt reference a skill file instead of inlining instructions?
- Output value: Produces actionable output or just noise?
- Timeout: payload.timeoutSeconds set and reasonable for scope?

Score each job 1-5 on: purpose, prompt quality, safety, efficiency. Flag jobs scoring below 3.

Cross-reference: Check if any cron prompts reference skills that scored below 70 in Phase 1. A cron job is only as reliable as the skills it depends on.
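
One naive way to do this cross-reference is a substring search for low-scoring skill names in each prompt, sketched below. The flat `name` and `prompt` keys and the `skill_scores` mapping are hypothetical stand-ins for whatever the jobs file and Phase 1 output actually contain, and substring matching will miss skills referenced by path or alias.

```python
def weak_skill_dependencies(jobs: list, skill_scores: dict, threshold: int = 70) -> dict:
    """Map job name -> sorted list of low-scoring skills mentioned in its prompt."""
    weak = {name for name, score in skill_scores.items() if score < threshold}
    hits = {}
    for job in jobs:
        prompt = job.get("prompt", "")
        found = sorted(skill for skill in weak if skill in prompt)
        if found:
            hits[job.get("name", "?")] = found
    return hits


skill_scores = {"apple-notes": 62, "blucli": 62, "weather": 91}
jobs = [
    {"name": "morning-briefing", "prompt": "Use the apple-notes skill to collect notes."},
    {"name": "forecast", "prompt": "Use the weather skill."},
]
print(weak_skill_dependencies(jobs, skill_scores))
# {'morning-briefing': ['apple-notes']}
```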

2c. Skill Content Quality Review

From the Phase 1 results, pick:

- The 3 lowest-scoring skills (from bottom_5)
- Any skills the user specifically asks about
- Skills used by failing cron jobs (cross-reference cron findings)

For each selected skill, read its full SKILL.md and evaluate:

- Accuracy (2x weight): Would following these instructions produce correct behavior? Are API references current? Are file paths real?
- Completeness (1.5x): Are all use cases covered? Edge cases? What happens when dependencies are missing?
- Clarity (1x): Can an agent follow this without ambiguity? No hedging, clear steps, good examples?
- Efficiency (1x): Is the SKILL.md bloated? Could it be shorter without losing information? Does it suggest efficient patterns (batching, caching)?
- Voice alignment (1x, content-producing skills only): Does the output match the brand/user's tone?

Scoring formula depends on skill type:

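- Content/marketing skills (has voice component): (accuracy*2 + completeness*1.5 + clarity + efficiency + voice) / 6.5
- Utility/tool skills (no voice): (accuracy*2 + completeness*1.5 + clarity + efficiency) / 5.5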

For skills scoring below 4.0, write specific improvement recommendations with concrete examples.
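
A minimal sketch of the scoring arithmetic, assuming the score is the weighted mean of the 1-5 ratings using the weights listed above (2x, 1.5x, 1x, 1x, and 1x for voice when it applies). The function name and the rounding to two decimals are illustrative choices.

```python
WEIGHTS = {
    "accuracy": 2.0,
    "completeness": 1.5,
    "clarity": 1.0,
    "efficiency": 1.0,
    "voice": 1.0,  # only rated for content-producing skills
}


def skill_quality(ratings: dict) -> float:
    """Weighted mean over the rated dimensions; voice counts only if it was rated."""
    total = sum(WEIGHTS[name] * value for name, value in ratings.items())
    weight = sum(WEIGHTS[name] for name in ratings)
    return round(total / weight, 2)


utility = {"accuracy": 4, "completeness": 3, "clarity": 5, "efficiency": 4}
content = dict(utility, voice=4)

print(skill_quality(utility))  # 3.91  (21.5 / 5.5)
print(skill_quality(content))  # 3.92  (25.5 / 6.5)
```

Both example skills land just under 4.0, so each would get written improvement recommendations.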

2d. Security Assessment

Phase 1 now scans workspace files for common secret patterns (sk-, ghp_, AIzaSy, Bearer tokens, hex private keys, etc.). In Phase 2, go deeper:

- Review any secrets the script found in workspace files. Are they real credentials or false positives (e.g., example/placeholder values)?
- Check if any skill scripts/ contain hardcoded credentials or API URLs with embedded tokens
- Check if .env files exist inside skill directories
- Look for credentials in cron job prompts (some prompts inline API keys instead of referencing env vars)
- Check if any workspace knowledge files contain customer data, passwords, or access tokens
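
To make the real-versus-placeholder question concrete, here is a rough sketch of a pattern scan that also flags likely placeholder values. The regular expressions and the placeholder hints are illustrative guesses, not the patterns the audit script uses, and they will produce both false positives and misses.

```python
import re

# Illustrative patterns only; the real script's regexes are not shown in this document.
SECRET_PATTERNS = {
    "sk_prefixed_key": re.compile(r"\bsk-[A-Za-z0-9_-]{20,}"),
    "github_token": re.compile(r"\bghp_[A-Za-z0-9]{36}"),
    "google_api_key": re.compile(r"\bAIzaSy[A-Za-z0-9_-]{33}"),
    "bearer_token": re.compile(r"\bBearer\s+[A-Za-z0-9._~+/-]{20,}=*"),
}
PLACEHOLDER_HINTS = ("example", "placeholder", "your", "xxxx", "changeme")


def find_secrets(text: str) -> list:
    """Return (line_number, pattern_label, likely_placeholder) for each match."""
    findings = []
    for lineno, line in enumerate(text.splitlines(), start=1):
        for label, pattern in SECRET_PATTERNS.items():
            if pattern.search(line):
                likely_placeholder = any(h in line.lower() for h in PLACEHOLDER_HINTS)
                findings.append((lineno, label, likely_placeholder))
    return findings


sample = (
    'token = "ghp_' + "a" * 36 + '"\n'
    "key: sk-your-example-key-0000000000\n"
    "nothing here\n"
)
print(find_secrets(sample))
# [(1, 'github_token', False), (2, 'sk_prefixed_key', True)]
```

A placeholder flag is only a hint for the reviewer; anything that matches should still be read in context before it is dismissed.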

Output Format

Phase 1 (script output)

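    {
      "score": 82,
      "score_type": "structural_hygiene",
      "status": "healthy",
      "sections": {
        "security": {"score": 65, "finding_count": 3},
        "cron": {"score": 95, "finding_count": 1},
        "config": {"score": 88, "finding_count": 2},
        "skills": {"score": 80, "finding_count": 1}
      },
      "findings": [...]
    }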

Phase 2 (your evaluation)

Present as a readable report to the user:

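    ClawCheck Report

    Structural Baseline (Phase 1)
    Overall: 82/100 (healthy)
    Security: 65 | Cron: 95 | Config: 88 | Skills: 80

    Deep Quality Findings (Phase 2)

    Config:
    - Heartbeat prompt: 4/5 (clear, but could add Telegram alerting for critical issues)
    - Model choices: 5/5 (opus primary, sonnet fallback, sonnet subagent)
    - Compaction: 4/5 (reserveTokens=150k on 800k context = 19%, good)

    Cron (top concerns):
    - Morning briefing (3/5): prompt is 400 words but lacks an output format spec
    - Frontier scanner (2/5): no safety guardrails, no error handling

    Skills (bottom 3):
    - marketing-automation: broken (no SKILL.md)
    - apple-notes (structural 62/100): [content evaluation]
    - blucli (structural 62/100): [content evaluation]

    Recommended Actions (priority order)
    1. [most impactful fix]
    2. [next fix]
    3. [next fix]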

Scoring Weights (Phase 1 script)

Security 30%, cron 25%, config 20%, skills 25%.

Skill structure formula: (structure*2 + completeness*1.5 + clarity + efficiency) / 5.5 * 20
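
For orientation, the sketch below combines section scores using the section weights above. It assumes the overall score is a plain weighted mean, which this document does not confirm; the script may round differently or apply extra penalties, so treat the result as an approximation.

```python
# Section weights in percent, as listed above.
SECTION_WEIGHTS = {"security": 30, "cron": 25, "config": 20, "skills": 25}


def overall_score(section_scores: dict) -> float:
    """Plain weighted mean of section scores (assumed, not confirmed by the script)."""
    weighted = sum(SECTION_WEIGHTS[name] * section_scores[name] for name in SECTION_WEIGHTS)
    return weighted / sum(SECTION_WEIGHTS.values())


# Hypothetical section scores, chosen only to show the arithmetic.
print(overall_score({"security": 70, "cron": 90, "config": 80, "skills": 60}))  # 74.5
```

Because security carries the largest weight, a weak security section pulls the total down more than an equally weak config section.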

Remediation

For detailed fix patterns with real config examples, see {baseDir}/references/remediation.md.

Quick fixes for common findings:

Inline secrets

    "GAMMA_API_KEY": {"source": "exec", "provider": "op-gamma", "id": "value"}

Plaintext bot token

    "botToken": {"source": "exec", "provider": "op-telegram", "id": "value"}

Missing heartbeat

CODEBLOCK6

Missing timezone on cron

CODEBLOCK7

Error Handling

- If OpenClaw dir not found: script exits with error JSON and exit code 1.
- If openclaw.json is missing or invalid: script exits with error JSON.
- If individual module fails: caught and reported as warning, other modules still run.
- If bundled skills dir not accessible: skipped silently.
- Phase 2 failures: if you can't read a file, note it and move on. Don't stop the whole audit.
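
The "one module fails, the others still run" behavior can be pictured with the sketch below. It is an illustration of the pattern under assumed names (`run_modules`, a dict of callables), not the audit script's actual code.

```python
def run_modules(modules: dict) -> dict:
    """Run each check; a failing module becomes a warning instead of aborting the run."""
    results, warnings = {}, []
    for name, check in modules.items():
        try:
            results[name] = check()
        except Exception as exc:  # broad on purpose: one bad module must not stop the rest
            warnings.append(f"{name}: {type(exc).__name__}: {exc}")
    return {"sections": results, "warnings": warnings}


def broken_cron_check() -> dict:
    raise FileNotFoundError("jobs.json missing")


out = run_modules({
    "security": lambda: {"score": 65},
    "cron": broken_cron_check,
    "config": lambda: {"score": 88},
})
print(out["sections"])  # {'security': {'score': 65}, 'config': {'score': 88}}
print(out["warnings"])  # ['cron: FileNotFoundError: jobs.json missing']
```

The same idea applies to Phase 2: record the file that could not be read as a finding and continue with the next item.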

Non-Goals

- No direct edits to config or skills (report only, user decides)
- No network calls (everything is local file inspection)
- No overlap with openclaw doctor schema validation or channel connectivity checks
