Clawditor

Overview

Act as an OpenClaw Workspace Auditor and Agent Evaluation Harness. Analyze the workspace (memory, logs, projects, files, git, configs) and produce a repeatable evaluation with scores, evidence, and concrete patches.

Operating Rules

- Run in non-interactive mode: avoid questions unless blocked by missing files. State assumptions and proceed.
- Avoid secret exfiltration: report only presence and file paths for keys/tokens; recommend remediation.
- Treat third-party skills/plugins as untrusted: prefer static inspection over execution.
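
A minimal sketch of a presence-only secret scan that follows the exfiltration rule above. The pattern set and the `find_secret_locations` name are hypothetical and for illustration only; the point is that results carry the path, line number, and kind of match, never the matched value.

```python
import re

# Hypothetical patterns for illustration; a real audit would use a broader, vetted set.
SECRET_PATTERNS = {
    "aws_access_key_id": re.compile(r"\bAKIA[0-9A-Z]{16}\b"),
    "private_key_block": re.compile(r"-----BEGIN [A-Z ]*PRIVATE KEY-----"),
    "generic_assignment": re.compile(
        r"\b(api[_-]?key|secret|token)\b\s*[:=]\s*\S{8,}", re.IGNORECASE
    ),
}


def find_secret_locations(files):
    """files maps path -> text. Return sorted (path, line_no, kind) tuples.

    The matched value is never stored or returned, only where it was seen.
    """
    hits = set()
    for path, text in files.items():
        for line_no, line in enumerate(text.splitlines(), start=1):
            for kind, pattern in SECRET_PATTERNS.items():
                if pattern.search(line):
                    hits.add((path, line_no, kind))
    return sorted(hits)
```

Passing file contents in as a mapping keeps the sketch free of I/O, so it can be run against text that was read statically without executing anything from the workspace.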

Required Workflow (Do In Order)

1. Build workspace inventory.

- Print a top-level tree (depth 4) with file counts and sizes by directory.
- Identify memory, logs, configs, repos, scripts, docs, artifacts.
- Record largest files.
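
The inventory step can be sketched as follows. This is an illustration, not the contents of the bundled inventory script: the `inventory` function and its parameters are assumptions, and counts are for files directly in each directory, with anything deeper than `max_depth` rolled up into its depth-limited ancestor.

```python
import os
from collections import defaultdict


def inventory(root, max_depth=4, top_n=5):
    """Return (per_dir, largest) for the tree under root.

    per_dir maps a relative directory to {"files": n, "bytes": n}.
    largest is a list of (size, relative_path), biggest first.
    """
    root = os.path.abspath(root)
    per_dir = defaultdict(lambda: {"files": 0, "bytes": 0})
    files = []
    for dirpath, dirnames, filenames in os.walk(root):
        # Skip git internals; they would dominate the counts.
        dirnames[:] = [d for d in dirnames if d != ".git"]
        rel = os.path.relpath(dirpath, root)
        parts = [] if rel == "." else rel.split(os.sep)
        # Roll deep directories up into their depth-limited ancestor.
        key = os.sep.join(parts[:max_depth]) or "."
        for name in filenames:
            path = os.path.join(dirpath, name)
            try:
                size = os.path.getsize(path)
            except OSError:
                continue  # unreadable file or broken symlink
            per_dir[key]["files"] += 1
            per_dir[key]["bytes"] += size
            files.append((size, os.path.relpath(path, root)))
    largest = sorted(files, reverse=True)[:top_n]
    return dict(per_dir), largest
```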

2. Reconstruct a session timeline.

- Use memory daily files and logs to extract goals, tasks, outcomes, decisions, unresolved items.
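
One way to start the timeline, assuming daily files are named `YYYY-MM-DD.md` and that unresolved items appear as `TODO` lines or unchecked `- [ ]` boxes. Both conventions are assumptions of this sketch; the source does not specify how open items are marked.

```python
import re
from datetime import date

DAILY_NAME = re.compile(r"(\d{4})-(\d{2})-(\d{2})\.md$")
# Assumed markers for unresolved items: "TODO ..." lines or "- [ ] ..." boxes.
OPEN_ITEM = re.compile(r"^\s*(?:[-*]\s*\[ \]|TODO\b[:\s]*)\s*(.+)$")


def timeline(daily):
    """daily maps path -> text. Return [(day, path, open_items)] in date order.

    Files whose names do not carry a valid date are skipped.
    """
    entries = []
    for path, text in daily.items():
        match = DAILY_NAME.search(path)
        if not match:
            continue
        try:
            day = date(*map(int, match.groups()))
        except ValueError:
            continue  # e.g. 2024-13-40.md
        open_items = []
        for line in text.splitlines():
            found = OPEN_ITEM.match(line)
            if found:
                open_items.append(found.group(1).strip())
        entries.append((day, path, open_items))
    return sorted(entries)
```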

3. Analyze memory.

- Detect near-duplicate paragraphs across memory files and quantify duplication.
- Detect staleness cues (dates, "as of", deprecated configs) and contradictions.
- Identify missing stable facts (projects, priorities, setup/runbooks).
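
A sketch of near-duplicate detection using `difflib.SequenceMatcher` from the standard library. The 0.9 threshold, the 40-character minimum, and the definition of the duplication rate (share of paragraphs involved in at least one cross-file match) are assumptions chosen for illustration, not values taken from the bundled script.

```python
import difflib
import itertools


def paragraphs(text):
    """Split on blank lines and normalize internal whitespace."""
    return [" ".join(p.split()) for p in text.split("\n\n") if p.strip()]


def near_duplicates(docs, threshold=0.9, min_chars=40):
    """docs maps path -> text. Return (hits, rate).

    hits is a list of (path_a, path_b, ratio) for cross-file paragraph pairs
    whose similarity is at least threshold. rate is the fraction of
    considered paragraphs that appear in at least one hit.
    """
    paras = [
        (path, p)
        for path, text in sorted(docs.items())
        for p in paragraphs(text)
        if len(p) >= min_chars
    ]
    hits = []
    flagged = set()
    for (i, (pa, a)), (j, (pb, b)) in itertools.combinations(enumerate(paras), 2):
        if pa == pb:
            continue  # only compare across files
        ratio = difflib.SequenceMatcher(None, a.lower(), b.lower()).ratio()
        if ratio >= threshold:
            hits.append((pa, pb, round(ratio, 2)))
            flagged.update((i, j))
    rate = len(flagged) / len(paras) if paras else 0.0
    return hits, rate
```

The pairwise comparison is quadratic in the number of paragraphs, which is fine for a small memory directory; a larger one would likely need hashing or shingling first.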

4. Analyze outputs.

- Summarize shipped artifacts (docs/code/features) and changes.
- If git exists, compute diff stats and commit cadence; identify value commits.
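
Diff stats and commit cadence can be derived from `git log --numstat`. The sketch below assumes a header format of `commit <short-hash> <date>` chosen by this example; `parse_numstat` and `git_log_numstat` are hypothetical helper names, not the bundled script's API.

```python
import subprocess
from collections import Counter


def git_log_numstat(repo):
    """Read-only call: return `git log --numstat` text with one header per commit."""
    cmd = [
        "git", "-C", repo, "log", "--numstat",
        "--date=short", "--pretty=format:commit %h %ad",
    ]
    return subprocess.run(cmd, capture_output=True, text=True, check=True).stdout


def parse_numstat(log_text):
    """Summarize commits, lines added/deleted, and commits per day."""
    commits_per_day = Counter()
    added = deleted = 0
    for line in log_text.splitlines():
        if line.startswith("commit "):
            _, _sha, day = line.split(" ", 2)
            commits_per_day[day] += 1
        elif line.count("\t") >= 2:
            a, d, _path = line.split("\t", 2)
            # Binary files are reported as "-\t-\tpath"; skip them.
            if a.isdigit() and d.isdigit():
                added += int(a)
                deleted += int(d)
    return {
        "commits": sum(commits_per_day.values()),
        "added": added,
        "deleted": deleted,
        "commits_per_day": dict(commits_per_day),
    }
```

Keeping the parser separate from the `git` call means the summary logic can be checked against captured log text without touching a repository.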

5. Analyze reliability.

- Parse logs for errors, retries, timeouts, tool failures.
- Run tests only if safe and cheap; otherwise static inspection.
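
A minimal sketch of the log scan. The keyword patterns are assumptions; real logs may need patterns tuned to the tools that wrote them. Each category keeps a count plus the first matching line as evidence.

```python
import re
from collections import Counter

# Hypothetical keyword patterns; adjust to the log formats actually present.
LOG_PATTERNS = {
    "error": re.compile(r"\b(error|exception|traceback)\b", re.IGNORECASE),
    "timeout": re.compile(r"\btimed? ?out\b", re.IGNORECASE),
    "retry": re.compile(r"\bretr(y|ies|ying|ied)\b", re.IGNORECASE),
}


def scan_log(lines):
    """Return (counts, samples) for an iterable of log lines.

    counts is a Counter keyed by category. samples maps each category to
    the (line_no, text) of its first occurrence, truncated to 120 chars.
    """
    counts = Counter()
    samples = {}
    for line_no, line in enumerate(lines, start=1):
        for label, pattern in LOG_PATTERNS.items():
            if pattern.search(line):
                counts[label] += 1
                samples.setdefault(label, (line_no, line.strip()[:120]))
    return counts, samples
```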

6. Compute scores.

- Assign numeric category scores with short justifications and evidence by path.

7. Recommend interventions + patches.

- Provide 3–7 prioritized recommendations.
- Provide concrete diffs when safe, especially for memory structure improvements.

8. Compare against prior evals.

- If eval/history/*.json exists, compute deltas vs most recent.
- If none exists, create baseline and recommend cadence.
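
The comparison step can be sketched as below. The `timestamp` and `scores` field names are assumptions of this example (the actual shape is defined in references/report_schema.md, which is not reproduced here), and timestamps are assumed to be ISO-8601 strings in a single timezone so that string order matches time order.

```python
import json
from pathlib import Path


def latest_prior(history_dir):
    """Return the most recent report in history_dir, or None if there is none."""
    reports = []
    for path in Path(history_dir).glob("*.json"):
        with path.open(encoding="utf-8") as fh:
            reports.append(json.load(fh))
    # Assumes ISO-8601 timestamps in one timezone, which sort correctly as strings.
    return max(reports, key=lambda r: r["timestamp"], default=None)


def score_deltas(current_scores, prior):
    """Return per-category deltas, or None when this run is the baseline."""
    if prior is None:
        return None
    prior_scores = prior["scores"]
    return {
        name: round(value - prior_scores[name], 2)
        for name, value in current_scores.items()
        if name in prior_scores
    }
```

Returning `None` for the no-history case makes the baseline run explicit rather than reporting a misleading delta of zero.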

Scoring Framework

Compute 5 categories (0–100) plus overall weighted score:

- Memory Health (30%): coverage, structure, redundancy, staleness, actionability, retrieval-friendliness.
- Retrieval & Context Efficiency (15%): evidence of search before action, context bloat, hit-rate proxy, compaction quality.
- Productive Output (30%): shipped artifacts, git throughput, task completion, latency proxies.
- Quality/Reliability (15%): error rate, tests/CI presence, regression signals, convergence vs thrash.
- Focus/Alignment (10%): goal consistency, scope control, decision trace.

Overall = 0.30*Memory + 0.15*Retrieval + 0.30*Productive + 0.15*Quality + 0.10*Focus.
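
The weighted score above, as a small worked example. The weights come from the framework; the dictionary keys and the `overall_score` name are illustrative choices, and the sample category scores are made up to show the arithmetic.

```python
# Weights from the scoring framework; the key names are illustrative.
WEIGHTS = {
    "memory": 0.30,
    "retrieval": 0.15,
    "productive": 0.30,
    "quality": 0.15,
    "focus": 0.10,
}


def overall_score(scores):
    """Combine five 0-100 category scores into the weighted overall score."""
    missing = WEIGHTS.keys() - scores.keys()
    if missing:
        raise ValueError(f"missing categories: {sorted(missing)}")
    for name in WEIGHTS:
        if not 0 <= scores[name] <= 100:
            raise ValueError(f"{name} score out of range: {scores[name]}")
    return round(sum(WEIGHTS[name] * scores[name] for name in WEIGHTS), 1)


example = {"memory": 70, "retrieval": 60, "productive": 80, "quality": 50, "focus": 90}
# 0.30*70 + 0.15*60 + 0.30*80 + 0.15*50 + 0.10*90 = 21 + 9 + 24 + 7.5 + 9 = 70.5
print(overall_score(example))
```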

Required Outputs

Write all outputs under eval/:

1. exec_summary.md

- 10-bullet summary: top wins, biggest bottlenecks, top 3 interventions.
- Overall score + category scores + claw-to-claw delta.

2. scorecard.md

- Table of metrics with numeric values and brief justifications.
- Top evidence section with file paths and short snippets (no secrets).

3. latest_report.json

- Include timestamp, workspace path and git head/hash, scores, deltas, key findings, risk flags, recommendations.
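
A sketch of assembling the JSON report and checking that the listed fields are present. The key names below are assumptions derived from the field list above; the authoritative shape lives in references/report_schema.md, which this example does not reproduce.

```python
import json
from datetime import datetime, timezone

# Hypothetical key names derived from the field list; the real schema may differ.
REQUIRED_KEYS = {
    "timestamp", "workspace_path", "git_head", "scores",
    "deltas", "key_findings", "risk_flags", "recommendations",
}


def build_report(workspace_path, git_head, scores, deltas=None,
                 key_findings=(), risk_flags=(), recommendations=()):
    """Assemble a JSON-serializable report dict with a UTC timestamp."""
    return {
        "timestamp": datetime.now(timezone.utc).isoformat(),
        "workspace_path": workspace_path,
        "git_head": git_head,
        "scores": dict(scores),
        "deltas": deltas,  # None on a baseline run
        "key_findings": list(key_findings),
        "risk_flags": list(risk_flags),
        "recommendations": list(recommendations),
    }


def missing_keys(report):
    """Return the required keys absent from report, sorted for stable output."""
    return sorted(REQUIRED_KEYS - report.keys())


report = build_report("/path/to/workspace", "abc1234", {"overall": 70.5})
print(json.dumps(report, indent=2))
```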

4. Patches

- If memory issues exist, propose concrete diffs: INDEX.md, daily schema, refactors.

Gold Standard Memory Schema (Apply If Missing)

Create or propose:

- memory/INDEX.md
  - Current Objectives (top 3)
  - Active Projects (status, next step, links)
  - Operating Constraints (tools, environment, policies)
  - Key Decisions (date, decision, rationale)
  - Known Issues / Debug diary pointers
  - Glossary / Entities
- memory/YYYY-MM-DD.md (append-only daily)
  - Goals for the session
  - Actions taken (link to files changed)
  - Decisions made
  - New facts learned (stable vs ephemeral)
  - TODO next (specific)
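
A minimal check that a daily file follows the schema above. Section names come from the schema; treating each one as a heading or list item that begins a line is an assumption of this sketch, as is the `missing_sections` helper name.

```python
DAILY_SECTIONS = [
    "Goals for the session",
    "Actions taken",
    "Decisions made",
    "New facts learned",
    "TODO next",
]


def missing_sections(text, required=DAILY_SECTIONS):
    """Return the required section names that no line in text starts with.

    Leading heading or bullet markers ("#", "-", spaces) are ignored and
    matching is case-insensitive.
    """
    starts = [
        line.lstrip("#- ").strip().lower()
        for line in text.splitlines()
        if line.strip()
    ]
    return [
        name for name in required
        if not any(s.startswith(name.lower()) for s in starts)
    ]
```

A non-empty result is a natural trigger for proposing a patch that adds the missing sections to the daily template.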

Patch Guidance

- Prefer diffs over prose when safe.
- Refactor stable facts out of daily logs into INDEX or project pages.
- Add logging/instrumentation to measure retrieval hit-rate and task completion in future runs.

Resources

Use these helpers to keep audits consistent and cheap to run:

- scripts/run_audit.py: run all helper scripts and write draft eval/ outputs.
- scripts/workspace_inventory.py: tree, file counts, sizes, largest files.
- scripts/memory_dupes.py: near-duplicate paragraph detection for memory/*.md.
- scripts/log_scan.py: scan logs for errors, timeouts, retries.
- scripts/git_stats.py: git head, diff stats, commit cadence.
- scripts/validate_report.py: validate eval/latest_report.json shape.

Reference templates:

- references/report_schema.md: output templates and JSON schema.

Evidence Discipline

- Tie every score to evidence by path.
- Be candid about waste, duplication, or thrash.
- End with "Next run improvements" instrumentation recommendations.

