Skill: skill-scorer
Overview
A meta-skill that evaluates the quality of other skills. Given a SKILL.md file (or a complete skill folder), it performs a systematic audit across 8 dimensions, assigns a score out of 100, identifies issues by severity, and generates actionable optimization suggestions.
This skill synthesizes quality criteria from Anthropic's official skill authoring best practices, the Skill Engineering Standard (v1.4.3), and community-tested patterns from production skill ecosystems.
When to Activate
User provides a skill and asks any of:
- "帮我评分/打分/检测/质检 这个 skill"
- "review/audit/score/grade/lint this skill"
- "这个 skill 写得怎么样?" / "is this skill any good?"
- "帮我优化这个 skill" (evaluate first, then suggest improvements)
- Provides a SKILL.md and expects quality feedback
Do NOT activate for: creating a new skill from scratch → use skill-creator. This skill is for evaluation, not generation.
Core Workflow
Step 0: Load the Skill Under Test
Determine what the user has provided:
| Input | Action |
|---|---|
| Single SKILL.md file | Evaluate that file |
| Skill folder (with references/) | Evaluate all files, cross-reference consistency |
| URL / GitHub link | Fetch and evaluate |
| Pasted markdown content | Treat as SKILL.md |
If the user has not provided a skill → ask: "请提供要评估的 SKILL.md 文件或 skill 文件夹路径。"
Input validation — before proceeding to Step 1, verify the input is actually a skill:
| Check | Condition | Action |
|---|---|---|
| Binary / garbled content | File is not valid text, or text is unreadable gibberish | STOP. Report: "This file does not appear to be a valid SKILL.md: it contains binary or unreadable content. Please provide a markdown-based skill file." Do NOT attempt to score. |
| No skill markers at all | Text is valid but contains zero skill indicators (no YAML frontmatter ---, no markdown headings resembling skill sections, no workflow/instructions) | STOP. Report: "This appears to be a {detected_type} file (e.g., Python script, JSON config, plain prose), not a SKILL.md. skill-scorer evaluates SKILL.md files only." Do NOT force-fit 8 dimensions onto non-skill content. |
| Partial skill structure | Has some skill-like elements (e.g., YAML frontmatter exists but body is minimal, or has headings but no workflow) | PROCEED with caveats. Evaluate normally, but note in the report header: "⚠️ This file has incomplete skill structure; scores reflect what is present." Score missing sections as 0 in relevant dimensions rather than guessing. |
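The three checks above can be sketched as a small classifier. This is a minimal illustration, not part of the skill: the function name `classify_input`, the regular expressions, and the "all three markers present" threshold are assumptions chosen for the example, and the heuristics are deliberately crude (for instance, a Python comment at the start of a line would be mistaken for a heading).

```python
import re

# Hypothetical heuristics: the skill text does not prescribe how each check is implemented.
FRONTMATTER = re.compile(r"\A---[ \t]*\n.*?\n---[ \t]*(?:\n|\Z)", re.DOTALL)
HEADING = re.compile(r"^#{1,6}[ \t]+\S", re.MULTILINE)
WORKFLOW_HINT = re.compile(r"\b(step\s+\d+|workflow|instructions?)\b", re.IGNORECASE)


def classify_input(raw: bytes) -> str:
    """Return 'binary', 'not_a_skill', 'partial', or 'skill'."""
    # Check 1: binary or undecodable content means STOP without scoring.
    if b"\x00" in raw:
        return "binary"
    try:
        text = raw.decode("utf-8")
    except UnicodeDecodeError:
        return "binary"

    # Count which skill indicators are present.
    markers = sum([
        bool(FRONTMATTER.search(text)),
        bool(HEADING.search(text)),
        bool(WORKFLOW_HINT.search(text)),
    ])

    if markers == 0:
        return "not_a_skill"  # Check 2: STOP and report the detected file type
    if markers < 3:
        return "partial"      # Check 3: PROCEED with the incomplete-structure caveat
    return "skill"
```

Only the `skill` and `partial` results would continue to Step 1; the other two end the run with the corresponding report message.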
Step 1: Parse Skill Structure
Extract and inventory:
- YAML frontmatter fields (name, description, version, compatibility)
- Section headings and their order
- References to external files (references/, scripts/, assets/)
- Total line count and estimated token count of the SKILL.md body
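A rough sketch of this inventory step, assuming the skill is available as a single string. The function name `inventory`, the flat `key: value` frontmatter handling (not a full YAML parser), and the four-characters-per-token estimate are all assumptions made for illustration.

```python
import re


def inventory(skill_md: str) -> dict:
    """Collect the Step 1 inventory from the text of a SKILL.md file."""
    frontmatter, body = {}, skill_md
    match = re.match(r"---[ \t]*\n(.*?)\n---[ \t]*(?:\n|\Z)", skill_md, re.DOTALL)
    if match:
        body = skill_md[match.end():]
        # Simplification: only flat `key: value` lines are read, not nested YAML.
        for line in match.group(1).splitlines():
            key, sep, value = line.partition(":")
            if sep and key.strip():
                frontmatter[key.strip()] = value.strip()

    # Markdown headings in document order.
    headings = re.findall(r"^#{1,6}[ \t]+(.+?)[ \t]*$", body, re.MULTILINE)
    # Paths under the three conventional folders; trailing punctuation is excluded.
    refs = sorted(set(re.findall(r"\b(?:references|scripts|assets)/[\w./-]*\w", body)))

    return {
        "frontmatter": frontmatter,
        "headings": headings,
        "external_refs": refs,
        "line_count": len(body.splitlines()),
        # Rough assumption: about 4 characters per token.
        "estimated_tokens": len(body) // 4,
    }
```

The resulting dictionary gives the later dimensions something concrete to check, for example whether every path in `external_refs` exists in the skill folder.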
Step 2: Run 8-Dimension Evaluation
Read references/rubric.md for the complete scoring rubric.
Evaluate the skill across these 8 dimensions (each scored 0-100, then weighted):
| # | Dimension | Weight | What It Measures |
|---|---|---|---|
| 1 | Metadata & Triggering | 15% | Name clarity, description quality, trigger coverage |
| 2 | Structure & Architecture | 15% | File organization, section order, progressive disclosure |
| 3 | Instruction Clarity | 15% | Actionability, conciseness, examples, tone |
| 4 | Workflow & Logic | 15% | Step completeness, parameter handling, validation |
| 5 | Error Handling | 10% | Fallbacks, edge cases, failure recovery |
| 6 | Context Efficiency | 10% | Token budget, redundancy, information density |
| 7 | Portability & Compatibility | 10% | Self-containment, cross-platform support |
| 8 | Safety & Robustness | 10% | No injection risk, no hallucination traps, identity lock |
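The weighting in the table can be illustrated with a short calculation. The weights come from the table above; the function name `overall_score` and the rounding to one decimal place are assumptions for this sketch, since the rubric file itself is not shown here.

```python
# Weights in percent, taken from the dimension table; they sum to 100.
WEIGHTS = {
    "Metadata & Triggering": 15,
    "Structure & Architecture": 15,
    "Instruction Clarity": 15,
    "Workflow & Logic": 15,
    "Error Handling": 10,
    "Context Efficiency": 10,
    "Portability & Compatibility": 10,
    "Safety & Robustness": 10,
}


def overall_score(dimension_scores: dict) -> float:
    """Combine eight 0-100 dimension scores into one weighted score out of 100."""
    missing = WEIGHTS.keys() - dimension_scores.keys()
    if missing:
        raise ValueError(f"missing dimension scores: {sorted(missing)}")
    for name in WEIGHTS:
        if not 0 <= dimension_scores[name] <= 100:
            raise ValueError(f"{name} score must be between 0 and 100")
    total = sum(WEIGHTS[name] * dimension_scores[name] for name in WEIGHTS)
    # Rounding to one decimal is an assumption, not a rule from the skill.
    return round(total / 100, 1)
```

For example, a skill scoring 90 on the four 15% dimensions and 70 on the four 10% dimensions gets (4 × 15 × 90 + 4 × 10 × 70) / 100 = 82.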
Step 3: Identify Issues
For each issue found, classify severity:
| Severity | Meaning | Score Impact |
|---|---|---|
| 🔴 Critical | Skill will malfunction or not trigger | -10 to -15 per issue |
| 🟡 Warning | Skill works but suboptimally | -3 to -8 per issue |
| 🟢 Suggestion | Nice-to-have improvement | -1 to -2 per issue |
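One way to keep deductions inside these ranges is to validate each issue when it is recorded. The sketch below is illustrative only: the `Issue` and `Severity` names are hypothetical, and it assumes that deductions are subtracted from a dimension's starting score of 100 and floored at 0, which the table does not state explicitly.

```python
from dataclasses import dataclass
from enum import Enum


class Severity(Enum):
    # (minimum, maximum) points deducted per issue, from the severity table.
    CRITICAL = (10, 15)
    WARNING = (3, 8)
    SUGGESTION = (1, 2)


@dataclass(frozen=True)
class Issue:
    dimension: str
    severity: Severity
    deduction: int
    description: str

    def __post_init__(self):
        low, high = self.severity.value
        if not low <= self.deduction <= high:
            raise ValueError(
                f"{self.severity.name} deductions must be between {low} and {high}"
            )


def dimension_score(issues, dimension: str) -> int:
    """Assumed model: start at 100, subtract each issue's deduction, floor at 0."""
    deducted = sum(i.deduction for i in issues if i.dimension == dimension)
    return max(0, 100 - deducted)
```

Because every lost point is attached to an `Issue`, each dimension score can be traced back to specific findings.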
Step 4: Generate Report
Read references/report-template.md for the output format.
The report includes:
1. Score Card: Overall score + per-dimension breakdown
2. Issue List: All findings sorted by severity
3. Top 3 Quick Wins: Highest-impact fixes with before/after examples
4. Optimization Roadmap: Prioritized improvement plan
Step 5: Offer Follow-Up
After presenting the report, ask:
- "需要我帮你自动修复这些问题吗?" (auto-fix mode)
- "需要对某个维度深入分析吗?" (deep-dive mode)
- "需要生成优化后的 SKILL.md 吗?" (rewrite mode)
Output Rules
1. Bilingual report: Chinese first, English after, no interleaving. Always output the complete report in Chinese, then a --- separator, then the complete report in English. Never mix languages within a section. Both versions must contain identical scores, issues, and suggestions; only the language differs.
2. Score must be justified. Every deducted point must trace to a specific issue.
3. Suggestions must be actionable. Include before/after code snippets, not vague advice.
4. Be constructive, not destructive. Lead with what the skill does well before listing issues.
- ❌ Never inflate scores to be polite — honest assessment helps the user improve.
- ❌ Never evaluate based on domain correctness of the skill's content (e.g., whether hotel recommendations are good) — only evaluate skill engineering quality.
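The bilingual ordering rule can be sketched as a small assembly step. This is an assumption-laden illustration: `assemble_report` is a hypothetical helper, and comparing the sequence of numbers in both versions is only a crude proxy for "identical scores, issues, and suggestions".

```python
import re

NUMBER = re.compile(r"\d+(?:\.\d+)?")


def assemble_report(chinese: str, english: str) -> str:
    """Join the two complete reports: Chinese first, a --- separator, then English."""
    # Crude consistency check (an assumption): both versions should mention
    # the same numbers in the same order, e.g. scores and issue counts.
    if NUMBER.findall(chinese) != NUMBER.findall(english):
        raise ValueError("Chinese and English reports disagree on numeric values")
    return f"{chinese.rstrip()}\n\n---\n\n{english.rstrip()}\n"
```

A check like this catches a score edited in one language but not the other; it does not verify that the issue descriptions themselves match.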
References
| File | Contents | When to read |
|---|---|---|
| references/report-template.md | Output format and report structure | Step 4: generating report |
| references/anti-patterns.md | Common skill mistakes and how to detect them | Step 3: finding issues |