Skill Quality Check 🔍
Universal quality assessment framework for AI Agent Skills. Evaluates any SKILL.md file across 5 dimensions, outputting a quantified score and actionable improvement suggestions. Designed to work with skills built for Claude, Cursor, Codex, OpenClaw, or any AI agent.
When to Use
- Before installing a new Skill from any source
- After writing your own Skill (self-check)
- Comparing quality of similar Skills
- Evaluating Skills for ClawHub/SkillHub submission
- As companion to Skill Creator — learn to write, then learn to audit
Audit Protocol
Step 1: Locate and Read the Target Skill
Find the SKILL.md file:
CODEBLOCK0
Then scan the directory for supporting files:
CODEBLOCK1
Step 2: YAML Frontmatter Review
SKILL.md must have YAML frontmatter with only these fields:
CODEBLOCK2
Review checklist:
- [ ] Do name and description both exist?
- [ ] Is description under 150 characters (trigger-level content must be concise)?
- [ ] Does description include trigger keywords ("when to use")?
- [ ] Are there extra fields wasting Level 1 tokens?
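The mechanical parts of this checklist can be automated. The sketch below is illustrative only: `parse_frontmatter` and `review_frontmatter` are hypothetical helper names, the allowed field set is assumed to be just name and description, and the parser handles only flat `key: value` lines rather than full YAML. Whether the description contains good trigger keywords still needs a human or model judgment.

```python
# Assumption: only these two fields are permitted in the frontmatter.
ALLOWED_FIELDS = {"name", "description"}
MAX_DESCRIPTION_CHARS = 150


def parse_frontmatter(text: str) -> dict[str, str]:
    """Parse flat `key: value` pairs between the opening and closing '---' lines.

    Simplification: nested or multi-line YAML values are not handled.
    """
    lines = text.splitlines()
    if not lines or lines[0].strip() != "---":
        return {}
    fields: dict[str, str] = {}
    for line in lines[1:]:
        if line.strip() == "---":
            break
        key, sep, value = line.partition(":")
        if sep and key.strip():
            fields[key.strip()] = value.strip().strip("\"'")
    return fields


def review_frontmatter(text: str) -> list[str]:
    """Return checklist findings; an empty list means nothing was flagged."""
    fields = parse_frontmatter(text)
    findings = []
    for required in ("name", "description"):
        if not fields.get(required):
            findings.append(f"missing required field: {required}")
    description = fields.get("description", "")
    if len(description) >= MAX_DESCRIPTION_CHARS:
        findings.append(
            f"description is {len(description)} chars (limit: under {MAX_DESCRIPTION_CHARS})"
        )
    extra = sorted(set(fields) - ALLOWED_FIELDS)
    if extra:
        findings.append(f"extra fields: {', '.join(extra)}")
    return findings
```

Calling `review_frontmatter` on the full text of a SKILL.md file returns one finding per failed check, which can be copied into the audit report.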
Step 3: Description Quality Assessment
Description is Level 1 content — the AI uses it to decide whether to trigger the Skill. It is a trigger, not a manual.
✅ Good Description:
CODEBLOCK3
❌ Bad Description:
This is a comprehensive guide to Test-Driven Development using the
red-green-refactor cycle. First, write a failing test that describes
the behavior you want. Then write the minimum code to make it pass...
(Too long — contains Level 2 content that belongs in SKILL.md body)
Scoring rubric (each dimension 0-10):
| # | Dimension | Question |
|---|---|---|
| 1 | Trigger Accuracy | Does it clearly state when to use this Skill? |
| 2 | Conciseness | Under 150 chars? No explanatory filler? |
| 3 | Keyword Coverage | Does it include trigger keywords (e.g. TDD, debug, pdf)? |
| 4 | Non-Redundancy | Does it avoid restating what AI already knows? |
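The examples later in this document report the description score as the sum of these four dimensions, out of 40. A minimal sketch of that aggregation, assuming a plain unweighted sum (the function and key names are hypothetical):

```python
DIMENSIONS = ("trigger_accuracy", "conciseness", "keyword_coverage", "non_redundancy")


def description_score(scores: dict[str, int]) -> tuple[int, int]:
    """Sum the four 0-10 dimension scores and return (total, maximum)."""
    for name in DIMENSIONS:
        value = scores[name]
        if not 0 <= value <= 10:
            raise ValueError(f"{name} must be between 0 and 10, got {value}")
    total = sum(scores[name] for name in DIMENSIONS)
    return total, 10 * len(DIMENSIONS)


# Scores taken from Example 2 (manual-style description).
total, maximum = description_score(
    {"trigger_accuracy": 5, "conciseness": 1, "keyword_coverage": 5, "non_redundancy": 1}
)
print(f"Description Score: {total}/{maximum}")  # Description Score: 12/40
```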
Step 4: SKILL.md Body Quality Assessment
Five assessment dimensions (0-10 each):
4.1 Progressive Disclosure
Does it follow the three-layer loading principle?
| Layer | Content | When Loaded |
|---|---|---|
| Level 1 | name + description | Always in context |
| Level 2 | SKILL.md body | On skill trigger |
| Level 3 | scripts/ + references/ + assets/ | On execution, never in context |
Review checklist:
- [ ] Trigger conditions → should be in Description (Level 1)
- [ ] Execution steps, tool instructions → SKILL.md body (Level 2)
- [ ] Detailed docs, scripts, templates → references/scripts (Level 3)
- [ ] SKILL.md body under 500 lines?
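The 500-line check is easy to script. The sketch below assumes the body is everything after the closing `---` of the frontmatter and counts blank lines too; `body_line_count` and `body_within_limit` are hypothetical names.

```python
MAX_BODY_LINES = 500


def body_line_count(skill_md: str) -> int:
    """Count lines after the frontmatter's closing '---' (whole file if none)."""
    lines = skill_md.splitlines()
    if lines and lines[0].strip() == "---":
        for index, line in enumerate(lines[1:], start=1):
            if line.strip() == "---":
                # Everything after the closing delimiter is the Level 2 body.
                return len(lines) - index - 1
    return len(lines)


def body_within_limit(skill_md: str) -> bool:
    """True when the body is under the 500-line guideline."""
    return body_line_count(skill_md) < MAX_BODY_LINES
```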
4.2 Role Setting
Does the Skill open with a clear role or context definition?
✅ Good example:
CODEBLOCK5
4.3 Examples
Are there sufficient, relevant, and diverse examples?
Anthropic's prompting guidance for Claude recommends 3-5 examples that are:
- Relevant: tied to real use cases
- Diverse: cover edge cases
- Structured: wrapped in XML tags
Review checklist:
- [ ] Input/output example pairs present?
- [ ] Core use cases covered?
- [ ] Edge cases shown?
4.4 Instruction Clarity
Are instructions clear, actionable, and unambiguous?
Review checklist:
- [ ] Steps listed with numbered lists?
- [ ] Conditional branches explained?
- [ ] Error/exception handling covered?
- [ ] Output format specified (e.g. JSON structure)?
Step 5: Resource Layer Assessment
Are bundled resources used appropriately?
| Resource | When to Use | Review Question |
|---|---|---|
| scripts/ | Deterministic/repeated code execution | Is there repetitive code that should be a script? |
| references/ | Detailed docs, API specs, domain knowledge | Is there >10k chars of docs not in references/? |
| assets/ | Templates, images, fonts for output | Are there files that should be assets, not inline content? |
Review checklist:
- [ ] Long docs in SKILL.md body that should be in references/?
- [ ] Repeated code snippets that should be scripts?
- [ ] Scripts have correct paths and dependency notes?
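A rough way to gather the facts for this step is to count the files in each resource directory and measure SKILL.md. The sketch below is a heuristic, not part of the framework: `resource_layer_report` is a hypothetical helper, and it treats "SKILL.md over 10,000 characters with an empty references/" as a hint that docs are inlined. It cannot tell whether the content really is reference material.

```python
from pathlib import Path

RESOURCE_DIRS = ("scripts", "references", "assets")
DOC_CHAR_THRESHOLD = 10_000  # the ">10k chars" figure from the table above


def count_files(directory: Path) -> int:
    """Count regular files under a directory; 0 if it does not exist."""
    if not directory.is_dir():
        return 0
    return sum(1 for path in directory.rglob("*") if path.is_file())


def resource_layer_report(skill_dir: Path) -> dict:
    """Summarise resource directory usage and SKILL.md size for one Skill."""
    skill_md = skill_dir / "SKILL.md"
    chars = len(skill_md.read_text(encoding="utf-8")) if skill_md.is_file() else 0
    file_counts = {name: count_files(skill_dir / name) for name in RESOURCE_DIRS}
    return {
        "file_counts": file_counts,
        "skill_md_chars": chars,
        # Heuristic: a large SKILL.md with nothing in references/ suggests inline docs.
        "should_split_docs": chars > DOC_CHAR_THRESHOLD and file_counts["references"] == 0,
    }
```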
Step 6: Performance Impact Assessment
6.1 Level 1 Token Cost
Formula:
CODEBLOCK6
Benchmarks:
- Excellent: < 50 tokens
- Good: 50-100 tokens
- Too long: > 150 tokens → needs trimming
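Once a token count is available, the benchmarks map to a simple lookup. The sketch below takes the count as input and does not estimate tokens itself. Note that the benchmarks leave 101-150 tokens unlabelled; the hypothetical `rate_level1_tokens` helper returns "Unrated" for that range rather than inventing a band.

```python
def rate_level1_tokens(tokens: int) -> str:
    """Map a Level 1 token count to the benchmark labels listed above."""
    if tokens < 0:
        raise ValueError("token count cannot be negative")
    if tokens < 50:
        return "Excellent"
    if tokens <= 100:
        return "Good"
    if tokens > 150:
        return "Too long"
    # 101-150 tokens: not covered by the benchmarks, so no label is assigned.
    return "Unrated"
```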
6.2 Level 2 Volume
Review checklist:
- [ ] SKILL.md body over 500 lines (~5000 tokens)?
- [ ] Repetitive content that can be trimmed?
- [ ] AI-common-knowledge content that should be deleted?
6.3 Mis-trigger Risk
High-risk signals:
- Multiple Skills with overlapping Description keywords
- Vague Descriptions (e.g. "general-purpose assistant")
- Too many installed Skills (>10) increases mis-trigger risk
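Keyword overlap between installed Skills can be approximated in code. The sketch below assumes each Skill's triggers are available as a comma-separated string, and it treats two keywords as overlapping when one is a prefix of the other (so "debug" matches "debugging"). That matching rule and the function names are illustrative assumptions; a real agent's trigger logic may behave differently.

```python
from itertools import combinations

MAX_SKILLS = 10  # more installed Skills than this raises mis-trigger risk


def keywords(triggers: str) -> set[str]:
    """Split a comma-separated trigger list into lowercase keywords."""
    return {part.strip().lower() for part in triggers.split(",") if part.strip()}


def shared_stems(triggers_a: str, triggers_b: str) -> set[str]:
    """Return the shorter keyword of every prefix-matching pair."""
    shared = set()
    for a in keywords(triggers_a):
        for b in keywords(triggers_b):
            short, long_ = sorted((a, b), key=len)
            if long_.startswith(short):
                shared.add(short)
    return shared


def overlapping_triggers(skills: dict[str, str]) -> dict[tuple[str, str], set[str]]:
    """Map each pair of Skill names to the trigger stems they share."""
    overlaps = {}
    for (name_a, trig_a), (name_b, trig_b) in combinations(sorted(skills.items()), 2):
        shared = shared_stems(trig_a, trig_b)
        if shared:
            overlaps[(name_a, name_b)] = shared
    return overlaps


def too_many_skills(skills: dict[str, str]) -> bool:
    """True when the installed Skill count exceeds the threshold above."""
    return len(skills) > MAX_SKILLS
```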
Step 7: Comprehensive Scoring
Aggregate all dimension scores into the final report.
CODEBLOCK7
Scoring Reference
| Score | Grade | Meaning | Action |
|---|---|---|---|
| 85-100 | 🟢 Excellent | Meets all best practices | Install directly |
| 70-84 | 🟡 Good | Meets most standards, minor issues | Install, address P1 items |
| 50-69 | 🔴 Acceptable | Functional but with obvious defects | Fork and fix, or wait for update |
| <50 | ⚫ Poor | Fails best practices | Do not install, find alternatives |
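The table translates directly into a lookup. A minimal sketch, assuming fractional scores fall into the band of the nearest lower boundary (for example, 84.5 counts as Good); `grade_for` is a hypothetical name.

```python
def grade_for(score: float) -> tuple[str, str]:
    """Return (grade, recommended action) for an overall score out of 100."""
    if not 0 <= score <= 100:
        raise ValueError(f"score must be between 0 and 100, got {score}")
    if score >= 85:
        return "Excellent", "Install directly"
    if score >= 70:
        return "Good", "Install, address P1 items"
    if score >= 50:
        return "Acceptable", "Fork and fix, or wait for update"
    return "Poor", "Do not install, find alternatives"
```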
Common Issue Diagnosis
| Symptom | Cause | Fix |
|---|---|---|
| Description too long | Frontmatter >150 tokens | Move details to body, keep only trigger keywords |
| Body too long | SKILL.md >500 lines | Split into references/ |
| No examples | Text-only instructions | Add 3-5 XML-wrapped example pairs |
| Vague role | No clear Skill boundary | Add role-setting paragraph |
| AI-common-knowledge filler | Explaining what AI already knows | Delete, keep only project-specific context |
| Not layered | Docs in body | Move to references/ |
| Mis-triggers | Overlapping or vague keywords | Differentiate Descriptions |
Skill Quality Check vs. Skill Vetter
| Dimension | Skill Vetter | Skill Quality Check |
|---|---|---|
| Goal | Security review | Quality review |
| Core question | Will this Skill harm me? | Is this Skill well-written? |
| Focus | Malicious code, permission abuse | Writing standards, performance |
| When | Before any install | When assessing quality |
| Output | Security report | Quality score + recommendations |
Use both in sequence: Vet for safety first, then audit for quality.
Quick Audit Commands
CODEBLOCK8
Output Requirements
Every audit report must include:
- Overall score (X/100) with grade label
- Five dimension subscores (radar chart optional)
- Improvement recommendations (P0/P1/P2 priority)
- Clear "install or not" conclusion
Do not say "this Skill is pretty good" — deliver a specific score, specific issues, and specific fixes.
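These requirements can be enforced with a small completeness check before a report is delivered. The sketch below is one possible shape, not a prescribed format: the `AuditReport` class and its field names are hypothetical, and it only verifies that each required part is present.

```python
from dataclasses import dataclass
from typing import Optional

VALID_PRIORITIES = {"P0", "P1", "P2"}


@dataclass
class AuditReport:
    overall_score: float                    # X in "X/100"
    grade: str                              # grade label, e.g. "Good"
    subscores: dict[str, float]             # dimension name -> subscore
    recommendations: list[tuple[str, str]]  # (priority, recommendation text)
    install: Optional[bool] = None          # the "install or not" conclusion

    def missing_parts(self) -> list[str]:
        """List every required element that is absent or malformed."""
        problems = []
        if not 0 <= self.overall_score <= 100:
            problems.append("overall score must be within 0-100")
        if not self.grade:
            problems.append("grade label is missing")
        if len(self.subscores) != 5:
            problems.append(f"expected 5 dimension subscores, got {len(self.subscores)}")
        if not self.recommendations:
            problems.append("no improvement recommendations")
        for priority, _ in self.recommendations:
            if priority not in VALID_PRIORITIES:
                problems.append(f"unknown priority: {priority}")
        if self.install is None:
            problems.append("no install-or-not conclusion")
        return problems
```

A report is ready to send only when `missing_parts()` returns an empty list.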
Good Skills deserve thorough auditing. Bad Skills deserve honest feedback. 🔍🦀
Examples
Example 1: Perfect Description (Score 10/10)
Input:
CODEBLOCK9
Audit Result:
- Trigger Accuracy 10/10: explicitly states when to use
- Conciseness 10/10: well under 150 chars
- Keyword Coverage 10/10: all key triggers present
- Non-Redundancy 10/10: no AI-common-knowledge filler
- Description Score: 40/40
Example 2: Manual-Style Description (Score 3/10)
Input:
CODEBLOCK10
Audit Result:
- Trigger Accuracy 5/10: mentions TDD but buried in explanation
- Conciseness 1/10: 280+ chars, reads like a manual
- Keyword Coverage 5/10: "TDD" present but no concise trigger list
- Non-Redundancy 1/10: explains the TDD cycle (Level 2 content in Level 1)
- Description Score: 12/40
P0 Recommendation:
Rewrite Description to be under 150 chars. Move the cycle explanation to SKILL.md body.
Example 3: Good Role Setting (Score 9/10)
Input:
CODEBLOCK11
Audit Result:
- Role clarity 9/10: clear persona and domain
- Skill boundary 9/10: clearly defined scope of responsibility
- Context specificity 9/10: project-specific tools named
Minor improvement (P2): Could add one sentence about what this Skill does NOT cover (e.g. OCR, scanned PDFs).
Example 4: Poor Role Setting (Score 2/10)
Input:
CODEBLOCK12
Audit Result:
- Role clarity 2/10: "assistant" is too generic
- Skill boundary 1/10: "various tasks" defines nothing
- Context specificity 1/10: no project-specific information
P0 Recommendation:
Replace generic language with specific domain context. Define what the Skill does and does not cover.
Example 5: Well-Layered Skill (Score 8/10)
Directory structure:
CODEBLOCK13
Audit Result:
- Progressive Disclosure 9/10: clear layer separation
- Body size 9/10: 80 lines is ideal (not bloated)
- Resource usage 8/10: all heavy content in references/
- Resource Layering Score: 8.5/10
Minor improvement (P2): Could add a brief Level 1 summary in Description listing which references/ files are most relevant.
Example 6: Bloated SKILL.md (Score 2/10)
Symptom: SKILL.md has 620 lines including a 300-line API reference pasted directly in the body.
Audit Result:
- Progressive Disclosure 1/10: Level 3 content in Level 2
- Body size 1/10: 620 lines far exceeds 500-line guideline
- Conciseness 1/10: 300-line API doc belongs in references/
P0 Recommendation:
Move the API reference to references/api-spec.md. SKILL.md body should be execution flow only (under 500 lines).
Example 7: Mis-Trigger Risk (Performance Impact: -3)
Scenario: User has 12 Skills installed. Two of them have "debug" in their Description:
| Skill | Description trigger keyword |
|---|---|
| systematic-debugging | "debugging, error, bug" |
| general-helper | "debug, logs, errors, general assistance" |
Audit Result:
- Mis-trigger Risk: -3 penalty
- The overlap means "debug" alone can't reliably select the right Skill
P1 Recommendation:
Differentiate: systematic-debugging should use "systematic-debugging, root-cause" (more specific); general-helper should remove "debug" entirely or move it lower in priority.