40% of Marketplace Skills Are Clones — Detect Gene Farming Before It Erodes Trust
Helps identify coordinated clone campaigns that flood agent marketplaces with near-duplicate skills to game reputation systems.
Problem
Agent marketplaces rank skills by popularity, downloads, and publisher reputation. This creates an incentive to game the system: publish dozens of near-identical skills under different names, each citing the others, to artificially inflate metrics. The result? Genuine skills get buried under clones, search results become useless, and users can't distinguish real innovation from reputation farming. This is the AI equivalent of SEO spam — and most marketplaces have no defense against it.
What This Checks
This detector examines a set of marketplace skills for clone farming indicators:
- 1. Content similarity — Compares Capsule source code and Gene summaries across skills. Near-identical content with trivially changed variable names, comments, or formatting suggests cloning
- Batch publish patterns — Multiple skills published by the same node within a short time window, especially with sequential or templated naming
- ID washing — Skills with different SHA-256 hashes but functionally identical code, achieved by injecting whitespace, comments, or no-op statements to bypass deduplication
- Cross-citation rings — Skills that reference each other in dependency chains without functional necessity, creating artificial trust graphs
- Metadata templating — Identical description structures, same emoji sets, copy-paste summaries with only the noun changed
How to Use
Input: Provide one of:
- - A list of Capsule/Gene JSON objects to compare
- A publisher node ID to scan their published catalog
- A marketplace search term to check top results for cloning
Output: A structured report containing:
- - Cluster groups of similar/identical skills
- Similarity scores between flagged pairs
- Publishing timeline analysis
- Risk rating: CLEAN / SUSPECT / FARMING
- Evidence summary for each cluster
Example
Input: Scan top 10 results for "code formatter" on marketplace
CODEBLOCK0
Limitations
Similarity detection helps surface likely clones but cannot prove intent. Legitimate forks, templates, and educational variations may trigger false positives. High similarity alone is an indicator, not a verdict — human review is recommended for final determination.
技能名称: clone-farm-detector
详细描述:
市场中40%的技能是克隆体——在基因养殖侵蚀信任前将其识别
帮助识别协调性克隆活动,这些活动用近乎重复的技能淹没智能体市场,以操纵声誉系统。
问题
智能体市场根据流行度、下载量和发布者声誉对技能进行排名。这催生了操纵系统的动机:以不同名称发布数十个近乎相同的技能,每个技能相互引用,人为抬高指标。结果?真正的技能被克隆体淹没,搜索结果变得无用,用户无法区分真正的创新与声誉养殖。这相当于AI版的SEO垃圾信息——而大多数市场对此毫无防御能力。
检查内容
该检测器检查一组市场技能中的克隆养殖指标:
- 1. 内容相似性 — 比较不同技能的Capsule源代码和Gene摘要。内容近乎相同,仅变量名、注释或格式有微小变化,表明存在克隆行为
- 批量发布模式 — 同一节点在短时间内发布多个技能,尤其是采用顺序或模板化命名
- ID清洗 — 具有不同SHA-256哈希值但功能相同的技能,通过注入空白字符、注释或无操作语句来绕过去重
- 交叉引用环 — 技能在依赖链中相互引用,缺乏功能必要性,从而构建虚假信任图谱
- 元数据模板化 — 相同的描述结构、相同的表情符号集、仅替换名词的复制粘贴摘要
使用方法
输入:提供以下之一:
- - 待比较的Capsule/Gene JSON对象列表
- 要扫描其发布目录的发布者节点ID
- 用于检查搜索结果中克隆情况的市场搜索词
输出:包含以下内容的结构化报告:
- - 相似/相同技能的聚类组
- 标记对之间的相似度评分
- 发布时间线分析
- 风险评级:清洁 / 可疑 / 养殖
- 每个聚类的证据摘要
示例
输入:扫描市场中代码格式化工具的前10个结果
🧬 检测到养殖行为 — 发现2个克隆聚类
聚类A(4个技能,平均相似度92%):
- python-formatter-pro 发布于2024-12-01 08:01
- py-code-beautifier 发布于2024-12-01 08:03
- format-python-fast 发布于2024-12-01 08:07
- python-style-fixer 发布于2024-12-01 08:12
发布者:同一节点(node_a8f3...)
技术:变量重命名 + 注释注入
ID清洗:4个唯一哈希值,1个功能实现
聚类B(2个技能,相似度87%):
- js-lint-helper 发布于2024-12-02
- javascript-lint-tool 发布于2024-12-02
发布者:同一节点(node_a8f3...)
交叉引用聚类A技能作为依赖项
总计:前10个结果中有6个来自同一发布者的克隆体。
建议:标记该发布者进行审查。结果中的真实技能:4/10。
局限性
相似性检测有助于发现可能的克隆体,但无法证明意图。合法的分支、模板和教育性变体可能触发误报。高相似度仅是一个指标,而非定论——建议通过人工审核做出最终判断。