Skill Quality Check 🔍

Universal quality assessment framework for AI Agent Skills. Evaluates any SKILL.md file across 5 dimensions, outputting a quantified score and actionable improvement suggestions. Designed to work with skills built for Claude, Cursor, Codex, OpenClaw, or any AI agent.

When to Use

- Before installing a new Skill from any source
- After writing your own Skill (self-check)
- When comparing the quality of similar Skills
- When evaluating Skills for ClawHub/SkillHub submission
- As a companion to Skill Creator: learn to write, then learn to audit

Audit Protocol

Step 1: Locate and Read the Target Skill

Find the SKILL.md file:

CODEBLOCK0

Then scan the directory for supporting files:
CODEBLOCK1
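
As a rough illustration of this step, the sketch below checks a Skill directory for SKILL.md and the optional folders. The function name `scan_skill_dir` is hypothetical, and the folder names (scripts/, references/, assets/) are the ones this document lists for a Skill directory.

```python
from pathlib import Path

# Optional folders this guide lists alongside SKILL.md (all lazy-loaded).
OPTIONAL_DIRS = ("scripts", "references", "assets")


def scan_skill_dir(skill_dir: Path) -> dict:
    """Report whether SKILL.md exists and which optional folders are present."""
    return {
        "has_skill_md": (skill_dir / "SKILL.md").is_file(),
        "optional_dirs": [d for d in OPTIONAL_DIRS if (skill_dir / d).is_dir()],
    }
```

A missing SKILL.md should stop the audit early, since every later step reads that file.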

Step 2: YAML Frontmatter Review

SKILL.md must have YAML frontmatter with only these fields:

CODEBLOCK2

Review checklist:

- [ ] Do name and description exist?
- [ ] Is description under 150 characters (trigger-level content must be concise)?
- [ ] Does description include trigger keywords ("when to use")?
- [ ] Are there extra fields wasting Level 1 tokens?
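
The mechanical parts of this checklist could be automated roughly as follows. This is a minimal sketch, not a full YAML parser: it assumes frontmatter is delimited by `---` lines and only folds indented continuation lines into the preceding key. The names `read_frontmatter` and `review_frontmatter` are hypothetical. Whether the description contains good trigger keywords still needs human judgment.

```python
import re

ALLOWED_FIELDS = {"name", "description"}
MAX_DESCRIPTION_CHARS = 150


def read_frontmatter(text: str) -> dict:
    """Parse top-level `key: value` pairs between the leading '---' lines.

    Simplified on purpose: indented lines are appended to the previous key,
    which covers `description: >` blocks but not general YAML.
    """
    match = re.match(r"^---\s*\n(.*?)\n---\s*(?:\n|$)", text, re.DOTALL)
    if not match:
        return {}
    fields, current = {}, None
    for line in match.group(1).splitlines():
        top = re.match(r"^([A-Za-z_][\w-]*):\s*(.*)$", line)
        if top:
            current = top.group(1)
            value = top.group(2).strip()
            # Block scalar indicators carry no text of their own.
            fields[current] = "" if value in (">", "|", ">-", "|-") else value
        elif current and line.strip():
            fields[current] = (fields[current] + " " + line.strip()).strip()
    return fields


def review_frontmatter(text: str) -> list:
    """Return a list of checklist violations (empty means all checks pass)."""
    fields = read_frontmatter(text)
    issues = []
    for required in ("name", "description"):
        if not fields.get(required):
            issues.append(f"missing {required}")
    if len(fields.get("description", "")) >= MAX_DESCRIPTION_CHARS:
        issues.append("description is 150 characters or longer")
    extra = sorted(set(fields) - ALLOWED_FIELDS)
    if extra:
        issues.append("extra fields: " + ", ".join(extra))
    return issues
```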

Step 3: Description Quality Assessment

Description is Level 1 content — the AI uses it to decide whether to trigger the Skill. It is a trigger, not a manual.

✅ Good Description:
CODEBLOCK3

❌ Bad Description:

This is a comprehensive guide to Test-Driven Development using the
red-green-refactor cycle. First, write a failing test that describes
the behavior you want. Then write the minimum code to make it pass...

(Too long — contains Level 2 content that belongs in SKILL.md body)

Scoring rubric (each dimension 0-10):

#	Dimension	Question
1	Trigger Accuracy	Does it clearly state when to use this Skill?
2

Step 4: SKILL.md Body Quality Assessment

Four assessment dimensions (0-10 each):

4.1 Progressive Disclosure

Does it follow the three-layer loading principle?

Layer	Content	When Loaded
Level 1	name + description	Always in context
Level 2

Review checklist:

- [ ] Trigger conditions → should be in Description (Level 1)
- [ ] Execution steps, tool instructions → SKILL.md body (Level 2)
- [ ] Detailed docs, scripts, templates → references/scripts (Level 3)
- [ ] SKILL.md body under 500 lines?

4.2 Role Setting

Does the Skill open with a clear role or context definition?

✅ Good example:
CODEBLOCK5

4.3 Examples

Are there sufficient, relevant, and diverse examples?

Claude's prompting guidance recommends 3-5 examples that are:

- Relevant: tied to real use cases
- Diverse: cover edge cases
- Structured: wrapped in XML tags

Review checklist:

- [ ] Input/output example pairs present?
- [ ] Core use cases covered?
- [ ] Edge cases shown?
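
A quick count can flag a body with too few or too many examples. The sketch below assumes examples are wrapped in `<example>` tags; that tag name is an assumption, since this guide only says "XML tags". Relevance and edge-case coverage still have to be judged by reading the examples.

```python
import re


def count_examples(body: str) -> int:
    """Count <example>...</example> blocks (the tag name is an assumption)."""
    return len(re.findall(r"<example>.*?</example>", body, re.DOTALL))


def example_count_ok(body: str) -> bool:
    """True when the body holds the recommended 3-5 examples."""
    return 3 <= count_examples(body) <= 5
```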

4.4 Instruction Clarity

Are instructions clear, actionable, and unambiguous?

Review checklist:

- [ ] Steps listed with numbered lists?
- [ ] Conditional branches explained?
- [ ] Error/exception handling covered?
- [ ] Output format specified (e.g. JSON structure)?

Step 5: Resource Layer Assessment

Are bundled resources used appropriately?

Resource	When to Use	Review Question
scripts/	Deterministic/repeated code execution	Is there repetitive code that should be a script?
references/

Review checklist:

- [ ] Long docs in SKILL.md body that should be in references/?
- [ ] Repeated code snippets that should be scripts?
- [ ] Scripts have correct paths and dependency notes?
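
The "correct paths" item can be spot-checked by comparing the paths SKILL.md mentions with what is on disk. This is a sketch under the assumption that resources are referenced as relative paths starting with scripts/ or references/. The function name `missing_resources` is hypothetical.

```python
import re
from pathlib import Path


def missing_resources(skill_dir: Path) -> list:
    """List scripts/ and references/ paths named in SKILL.md that do not exist."""
    body = (skill_dir / "SKILL.md").read_text(encoding="utf-8")
    found = re.findall(r"\b(?:scripts|references)/[\w./-]+", body)
    # Drop a trailing period picked up from the end of a sentence.
    mentioned = {path.rstrip(".") for path in found}
    return sorted(p for p in mentioned if not (skill_dir / p).exists())
```

An empty list means every mentioned path resolves. It does not confirm that dependency notes are present.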

Step 6: Performance Impact Assessment

6.1 Level 1 Token Cost

Formula:
CODEBLOCK6

Benchmarks:

- Excellent: < 50 tokens
- Good: 50-100 tokens
- Too long: > 150 tokens → needs trimming
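
The estimate and the benchmarks can be combined in a few lines. This sketch assumes the character-based approximation this guide gives (roughly 4 English characters per token). The "unrated" label for 100-150 tokens is an assumption, because the benchmarks above leave that band undefined.

```python
def level1_token_cost(description: str) -> float:
    """Rough Level 1 cost: about 4 English characters per token."""
    return len(description) / 4


def rate_level1_cost(tokens: float) -> str:
    """Map an estimated token count onto the benchmark bands."""
    if tokens < 50:
        return "excellent"
    if tokens <= 100:
        return "good"
    if tokens > 150:
        return "too long"
    return "unrated"  # 100-150 tokens is not covered by the benchmarks
```

For example, a 120-character description is estimated at 30 tokens, which falls in the "excellent" band.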

6.2 Level 2 Volume

Review checklist:

- [ ] SKILL.md body over 500 lines (~5000 tokens)?
- [ ] Repetitive content that can be trimmed?
- [ ] AI-common-knowledge content that should be deleted?
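
The line-count item is easy to measure. The sketch below assumes the body is everything after the closing `---` of the frontmatter; if no frontmatter is found, the whole file is counted. The function names are hypothetical.

```python
MAX_BODY_LINES = 500


def body_line_count(text: str) -> int:
    """Count lines after the closing '---' of the frontmatter, if any."""
    lines = text.splitlines()
    if lines and lines[0].strip() == "---":
        for i in range(1, len(lines)):
            if lines[i].strip() == "---":
                return len(lines) - (i + 1)
    return len(lines)


def body_too_long(text: str) -> bool:
    """True when the body exceeds the 500-line guideline."""
    return body_line_count(text) > MAX_BODY_LINES
```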

6.3 Mis-trigger Risk

High-risk signals:

- Multiple Skills with overlapping Description keywords
- Vague Descriptions (e.g. "general-purpose assistant")
- Too many installed Skills (>10) increases mis-trigger risk
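
Keyword overlap between installed Skills can be surfaced with a pairwise comparison. This is a crude sketch: it treats two keywords as overlapping when one is a prefix of the other (so "debug" matches "debugging"), which is an assumption about what counts as overlap, and it does not filter common words such as "use" or "when". All function names are hypothetical.

```python
import re
from itertools import combinations


def keywords(description: str) -> set:
    """Lowercased word tokens of two or more characters."""
    return set(re.findall(r"[a-z][a-z0-9-]+", description.lower()))


def shared_triggers(desc_a: str, desc_b: str) -> set:
    """Keywords that match exactly or by prefix; the shorter form is kept."""
    shared = set()
    for x in keywords(desc_a):
        for y in keywords(desc_b):
            if x.startswith(y) or y.startswith(x):
                shared.add(min(x, y, key=len))
    return shared


def overlapping_triggers(skills: dict) -> dict:
    """Map each pair of Skill names to the trigger keywords they share."""
    overlaps = {}
    for (a, desc_a), (b, desc_b) in combinations(sorted(skills.items()), 2):
        shared = shared_triggers(desc_a, desc_b)
        if shared:
            overlaps[(a, b)] = shared
    return overlaps
```

Any non-empty result is a prompt to differentiate the Descriptions, not an automatic penalty.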

Step 7: Comprehensive Scoring

Aggregate all dimension scores into the final report.

CODEBLOCK7
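
The aggregation can be sketched as below, assuming the section weights from the report template in this document: frontmatter 10, description 40, body 40, resources 10, plus a performance adjustment between -5 and +2. Clamping the total to 0-100 is an assumption, since the maximum raw sum is 102. The grade bands are the ones the template lists, and "Passable" is one possible English label for the 50-69 band.

```python
SECTION_MAX = {"frontmatter": 10, "description": 40, "body": 40, "resources": 10}
PERFORMANCE_RANGE = (-5, 2)


def total_score(sections: dict, performance: int) -> int:
    """Sum section scores plus the performance adjustment, clamped to 0-100."""
    for name, cap in SECTION_MAX.items():
        if not 0 <= sections[name] <= cap:
            raise ValueError(f"{name} must be between 0 and {cap}")
    low, high = PERFORMANCE_RANGE
    if not low <= performance <= high:
        raise ValueError(f"performance must be between {low} and {high}")
    raw = sum(sections[name] for name in SECTION_MAX) + performance
    return max(0, min(100, raw))  # clamping is an assumption, not from the guide


def grade(score: int) -> str:
    """Map a total score onto the grade bands of the report template."""
    if score >= 85:
        return "Excellent"
    if score >= 70:
        return "Good"
    if score >= 50:
        return "Passable"
    return "Poor"
```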

Scoring Reference

Score	Grade	Meaning	Action
85-100	🟢 Excellent	Meets all best practices	Install directly
70-84

Common Issue Diagnosis

Symptom	Cause	Fix
Description too long	Frontmatter >150 tokens	Move details to body, keep only trigger keywords
Body too long

Skill Quality Check vs. Skill Vetter

Use both in sequence: Vet for safety first, then audit for quality.

Quick Audit Commands

CODEBLOCK8

Output Requirements

Every audit report must include:

1. Overall score (X/100) with grade label
2. Five dimension subscores (radar chart optional)
3. Improvement recommendations (P0/P1/P2 priority)
4. Clear "install or not" conclusion

Do not say "this Skill is pretty good" — deliver a specific score, specific issues, and specific fixes.

Good Skills deserve thorough auditing. Bad Skills deserve honest feedback. 🔍🦀

Examples

Example 1: Perfect Description (Score 10/10)

Input:
CODEBLOCK9

Audit Result:

- Trigger Accuracy 10/10: explicitly states when to use
- Conciseness 10/10: well under 150 chars
- Keyword Coverage 10/10: all key triggers present
- Non-Redundancy 10/10: no AI-common-knowledge filler
- Description Score: 40/40

Example 2: Manual-Style Description (Score 3/10)

Input:
CODEBLOCK10

Audit Result:

- Trigger Accuracy 5/10: mentions TDD but buried in explanation
- Conciseness 1/10: 280+ chars, reads like a manual
- Keyword Coverage 5/10: "TDD" present but no concise trigger list
- Non-Redundancy 1/10: explains the TDD cycle (Level 2 content in Level 1)
- Description Score: 12/40

P0 Recommendation:

Rewrite Description to be under 150 chars. Move the cycle explanation to SKILL.md body.

Example 3: Good Role Setting (Score 9/10)

Input:
CODEBLOCK11

Audit Result:

- Role clarity 9/10: clear persona and domain
- Skill boundary 9/10: clearly defined scope of responsibility
- Context specificity 9/10: project-specific tools named

Minor improvement (P2): Could add one sentence about what this Skill does NOT cover (e.g. OCR, scanned PDFs).

Example 4: Poor Role Setting (Score 2/10)

Input:
CODEBLOCK12

Audit Result:

- Role clarity 2/10: "assistant" is too generic
- Skill boundary 1/10: "various tasks" defines nothing
- Context specificity 1/10: no project-specific information

P0 Recommendation:

Replace generic language with specific domain context. Define what the Skill does and does not cover.

Example 5: Well-Layered Skill (Score 8/10)

Directory structure:
CODEBLOCK13

Audit Result:

- Progressive Disclosure 9/10: clear layer separation
- Body size 9/10: 80 lines is ideal (not bloated)
- Resource usage 8/10: all heavy content in references/
- Resource Layering Score: 8.5/10

Minor improvement (P2): Could add a brief Layer 1 summary in Description listing which references/ files are most relevant.

Example 6: Bloated SKILL.md (Score 2/10)

Symptom: SKILL.md has 620 lines including a 300-line API reference pasted directly in the body.

Audit Result:

- Progressive Disclosure 1/10: Level 3 content in Level 2
- Body size 1/10: 620 lines far exceeds 500-line guideline
- Conciseness 1/10: 300-line API doc belongs in references/

P0 Recommendation:

Move the API reference to references/api-spec.md. SKILL.md body should be execution flow only (under 500 lines).

Example 7: Mis-Trigger Risk (Score -3 Performance Impact)

Scenario: User has 12 Skills installed. Two of them have "debug" in their Description:

Skill	Description trigger keyword
systematic-debugging	"debugging, error, bug"
general-helper	"debug, logs, errors, general assistance"

Audit Result:

- Mis-trigger Risk: -3 penalty
- The overlap means "debug" alone can't reliably select the right Skill

P1 Recommendation:

Differentiate: systematic-debugging should use "systematic-debugging, root-cause" (more specific); general-helper should remove "debug" entirely or move it lower in priority.

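Skill Quality Check 🔍

A universal quality assessment framework for AI Agent Skills. It evaluates any SKILL.md file across 5 dimensions and outputs a quantified score with actionable improvement suggestions. It applies to Skills built for Claude, Cursor, Codex, OpenClaw, or any other AI agent.

When to Use

- Before installing a new Skill from any source
- After writing your own Skill (self-check)
- When comparing the quality of similar Skills
- When evaluating Skills for ClawHub/SkillHub submission
- As a companion to Skill Creator: learn to write first, then learn to audit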

Audit Protocol

Step 1: Locate and Read the Target Skill

Find the SKILL.md file:

Path priority (in order):

1. User-specified path
//SKILL.md

# Common locations by platform:
# OpenClaw: ~/.openclaw/skills//SKILL.md
# QClaw: ~/.qclaw/skills//SKILL.md
# Claude Code: ~/.claude/skills//SKILL.md
# Cursor: ~/.cursor/skills//SKILL.md
# Codex: ~/.codex/skills//SKILL.md

3. /skills//SKILL.md
//SKILL.md

If the Skill was installed from GitHub and there is no local copy, fetch it with curl:

curl -s https://raw.githubusercontent.com///main/skills//SKILL.md

Then scan the directory for supporting files:

skill-name/
├── SKILL.md ✅ required
├── scripts/ ✅ optional (lazy-loaded)
├── references/ ✅ optional (lazy-loaded)
└── assets/ ✅ optional (lazy-loaded)

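Step 2: YAML Frontmatter Review

SKILL.md must have YAML frontmatter with only these fields:

name: ✅ required
description: > ✅ required

The following fields are not recommended in frontmatter:

❌ version → package metadata

❌ author → non-standard

❌ license → not necessary

❌ compatibility → not needed by most Skills

❌ tags → non-standard

Review checklist:

- [ ] Do name and description exist?
- [ ] Is description under 150 characters (trigger-level content must be concise)?
- [ ] Does description include trigger keywords ("when to use")?
- [ ] Are there extra fields wasting Level 1 tokens?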

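Step 3: Description Quality Assessment

Description is Level 1 content: the AI uses it to decide whether to trigger the Skill. It is a trigger, not a manual.

✅ Good Description:

TDD (test-driven development) workflow. Use when writing new features, adding tests, or debugging. Keywords: test-driven, TDD, red-green-refactor.

❌ Bad Description:

This is a comprehensive guide to Test-Driven Development using the red-green-refactor cycle. First, write a failing test that describes the behavior you want. Then write the minimum code to make it pass...

(Too long: contains Level 2 content that belongs in the SKILL.md body)

Scoring rubric (each dimension 0-10):

#	Dimension	Question
1	Trigger Accuracy	Does it clearly state when to use this Skill?
2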

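Step 4: SKILL.md Body Quality Assessment

Four assessment dimensions (0-10 each):

4.1 Progressive Disclosure

Does it follow the three-layer loading principle?

Layer	Content	When Loaded
Level 1	name + description	Always in context
Level 2

Review checklist:

- [ ] Trigger conditions → should be in Description (Level 1)
- [ ] Execution steps, tool instructions → SKILL.md body (Level 2)
- [ ] Detailed docs, scripts, templates → references/scripts (Level 3)
- [ ] SKILL.md body under 500 lines?

4.2 Role Setting

Does the Skill open with a clear role or context definition?

✅ Good example:

PDF Processing Skill

You are a professional document preparation assistant specializing in PDF creation and editing workflows...

4.3 Examples

Are there sufficient, relevant, and diverse examples?

Claude's prompting guidance recommends 3-5 examples that are:

- Relevant: tied to real use cases
- Diverse: cover edge cases
- Structured: wrapped in XML tags

Review checklist:

- [ ] Input/output example pairs present?
- [ ] Core use cases covered?
- [ ] Edge cases shown?

4.4 Instruction Clarity

Are instructions clear, actionable, and unambiguous?

Review checklist:

- [ ] Steps listed with numbered lists?
- [ ] Conditional branches explained?
- [ ] Error/exception handling covered?
- [ ] Output format specified (e.g. JSON structure)?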

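Step 5: Resource Layer Assessment

Are bundled resources used appropriately?

Resource	When to Use	Review Question
scripts/	Deterministic/repeated code execution	Is there repetitive code that should be a script?
references/

Review checklist:

- [ ] Long docs in SKILL.md body that should be in references/?
- [ ] Repeated code snippets that should be scripts?
- [ ] Scripts have correct paths and dependency notes?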

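Step 6: Performance Impact Assessment

6.1 Level 1 Token Cost

Formula:

Level 1 cost ≈ len(description) / 4 tokens
(English: about 4 characters ≈ 1 token)

Benchmarks:

- Excellent: < 50 tokens
- Good: 50-100 tokens
- Too long: > 150 tokens → needs trimming

6.2 Level 2 Volume

Review checklist:

- [ ] SKILL.md body over 500 lines (~5000 tokens)?
- [ ] Repetitive content that can be trimmed?
- [ ] AI-common-knowledge content that should be deleted?

6.3 Mis-trigger Risk

High-risk signals:

- Multiple Skills with overlapping Description keywords
- Vague Descriptions (e.g. "general-purpose assistant")
- Too many installed Skills (>10) increases mis-trigger risk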

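Step 7: Comprehensive Scoring

Aggregate all dimension scores into the final report.

Skill Audit Report
═══════════════════════════════════════════════════════════════
Skill: [Skill name]
Source: [local path / GitHub URL / ClawHub]
Audit date: [date]
───────────────────────────────────────────────────────────────
I. YAML Frontmatter Compliance [X/10]
✅ [Passed items]
❌ [Issues]

II. Description Quality [X/40]
Trigger Accuracy [X/10]
Conciseness [X/10]
Keyword Coverage [X/10]
Non-Redundancy [X/10]

III. Body Quality [X/40]
Progressive Disclosure [X/10]
Role Setting [X/10]
Examples [X/10]
Instruction Clarity [X/10]

IV. Resource Layer [X/10]
scripts/ usage [X/5]
references/ usage [X/5]

V. Performance Impact [-5 to +2]
Level 1 cost [penalty/bonus]
Level 2 volume [penalty/bonus]
Mis-trigger risk [penalty/bonus]
───────────────────────────────────────────────────────────────
Total: X / 100
───────────────────────────────────────────────────────────────
Grade:
🟢 Excellent (85-100): worth installing, top quality
🟡 Good (70-84): usable, room for improvement
🔴 Passable (50-69): usable but needs optimization
⚫ Poor (<50): not recommended
───────────────────────────────────────────────────────────────
VI. Improvement Recommendations (in priority order)

🔴 P0 (must fix):
- [specific issue and fix]

🟡 P1 (strongly recommended):
- [specific issue and fix]

🟢 P2 (optional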
