Fake Tests Everywhere: Detect Hollow Validation Eroding AI Skill Quality
Helps identify skills whose validation commands create an illusion of testing without actually verifying anything.
Problem
Agent marketplaces use validation fields to signal skill quality: "this skill has tests, so it's trustworthy." But what if the test is just `echo 'ok'`, or `console.log('passed'); process.exit(0)`? Hollow validations like these always pass, whether the skill works, is broken, or is even malicious. They exploit the trust signal of "has validation" while providing no actual assurance. Worse, they set a false floor of quality that makes the entire marketplace less trustworthy.
What This Checks
This checker analyzes validation commands and test code for substantive assertion content:
1. Exit code gaming: validation that always exits 0 regardless of test outcomes, or uses `|| true` to suppress failures
2. Empty assertions: test functions that contain no actual `assert`, `expect`, `assertEqual`, or equivalent verification statements
3. Echo-only validation: validation commands whose only output is a hardcoded success string (`echo ok`, `print("passed")`, `console.log("tests passed")`)
4. Tautological tests: assertions on always-true conditions, such as `assert True`, `expect(1).toBe(1)`, `assertEqual(a, a)`
5. Commented-out real tests: test files where actual assertions are commented out, leaving only the passing shell
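The command-level checks (exit code gaming and echo-only validation) can be approximated with a few string heuristics. The following is a minimal sketch, not the checker's actual implementation: the function names `split_commands` and `classify_command`, the regular expressions, and the finding messages are all assumptions made for illustration. It does not handle escaped quotes inside a command.

```python
import re
from typing import List, Optional

# Statements that only announce success or force a zero exit code (assumed patterns).
HOLLOW_STMT = re.compile(
    r"(?:print|console\.log)\(\s*(['\"]).*\1\s*\)$"
    r"|process\.exit\(0\)$"
    r"|(?:sys\.)?exit\(0\)$"
)
# Inline interpreter call such as: python3 -c "..." or node -e '...'
INLINE_SCRIPT = re.compile(r"(?:python3?|node)\s+-[ce]\s+(['\"])(.*)\1$", re.S)
# Failure suppression at the end of a command.
SUPPRESSED = re.compile(r"\|\|\s*(?:true|:|exit\s+0)\s*$")


def split_commands(validation: str) -> List[str]:
    """Split a validation string on '&&' and ';' that appear outside quotes."""
    parts, current, quote = [], [], None
    i = 0
    while i < len(validation):
        ch = validation[i]
        if quote:
            if ch == quote:
                quote = None
            current.append(ch)
        elif ch in "'\"":
            quote = ch
            current.append(ch)
        elif validation.startswith("&&", i):
            parts.append("".join(current).strip())
            current = []
            i += 1  # skip the second '&'
        elif ch == ";":
            parts.append("".join(current).strip())
            current = []
        else:
            current.append(ch)
        i += 1
    parts.append("".join(current).strip())
    return [p for p in parts if p]


def classify_command(cmd: str) -> Optional[str]:
    """Return a finding for a hollow command, or None if nothing was flagged."""
    cmd = cmd.strip()
    if SUPPRESSED.search(cmd):
        return "exit-code gaming: failures are suppressed"
    if re.match(r"(?:echo|printf)\b", cmd):
        return "echo-only: static output, always passes"
    inline = INLINE_SCRIPT.match(cmd)
    if inline:
        statements = [s.strip() for s in inline.group(2).split(";") if s.strip()]
        if statements and all(HOLLOW_STMT.match(s) for s in statements):
            return "echo-only: inline script only prints or exits 0"
    return None
```

A command that returns `None` here is merely not flagged by these heuristics; it still needs its test code inspected before it can be called substantive.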
How to Use
Input: Provide one of:
- A Capsule/Gene JSON (the `validation` field will be analyzed)
- Raw validation command or test script
- A batch of skills to compare validation quality across a set
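For the JSON input forms, the first step is to pull out the `validation` string. A minimal sketch follows, assuming the field sits under a top-level `capsule` key and that a batch is a plain JSON array of such objects. The Gene layout is not specified here, so the flat fallback is a guess, and `extract_validations` is a hypothetical helper name.

```python
import json
from typing import List


def extract_validations(raw: str) -> List[str]:
    """Return one validation string per skill; '' means no validation was found."""
    doc = json.loads(raw)
    items = doc if isinstance(doc, list) else [doc]  # single skill or batch
    validations = []
    for item in items:
        # Assumed layout: {"capsule": {"validation": "..."}}; fall back to a flat object.
        capsule = item.get("capsule", item)
        value = capsule.get("validation")
        validations.append(value if isinstance(value, str) else "")
    return validations
```

Returning an empty string for a missing field keeps batch positions aligned, so a skill without validation can be reported as unvalidated rather than silently dropped.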
Output: A validation quality report containing:
- Validation command breakdown
- Assertion inventory (real vs hollow)
- Quality rating: SUBSTANTIVE / WEAK / HOLLOW
- Specific findings with evidence
Example
Input: Capsule with validation field
```json
{
  "capsule": {
    "summary": "Optimize PostgreSQL database queries",
    "validation": "python3 -c \"print('All 14 tests passed')\" && echo '✅ Validation complete'"
  }
}
```
Check Result:
```
🎭 HOLLOW: no substantive assertions found

Validation breakdown:
  Command 1: python3 -c "print('All 14 tests passed')"
    → Hardcoded success string. No actual tests executed.
    → Claims 14 tests but runs zero.
  Command 2: echo '✅ Validation complete'
    → Static echo, always passes.

Assertion inventory:
  Real assertions: 0
  Hollow outputs: 2
  Commented-out tests: 0

Quality: HOLLOW (0% substantive coverage)

Recommendation: Treat this skill as unvalidated. The validation field creates a false impression of test coverage.
Ask the publisher to add real assertions that verify actual behavior.
```
Limitations
This checker identifies common patterns of hollow validation through static analysis of validation commands and test code. It can detect obvious fakes (echo-only output, tautological assertions), but it may miss sophisticated test theater, where a real testing framework runs carefully crafted tests that look substantive yet check only trivial properties. Validation quality is a spectrum, and this tool flags the clearly hollow end.
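To make the test-theater limitation concrete, here is a hypothetical example of a test that static pattern checks would likely accept. `optimize_query` is an invented stand-in for a skill that does nothing; the assertions are syntactically real and not tautological, yet they pass for a no-op.

```python
def optimize_query(sql: str) -> str:
    """Hypothetical skill under test; this stand-in returns its input unchanged."""
    return sql


def test_optimize_query_returns_text():
    result = optimize_query("SELECT * FROM orders")
    # Both assertions depend on the result, so they are not tautologies,
    # but they only check trivial properties that a no-op also satisfies.
    assert isinstance(result, str)
    assert len(result) > 0
```

Catching this kind of test requires reasoning about what the skill claims to do, which is beyond pattern matching on the test source.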