SWARM Safety Skill

Study how intelligence swarms — and where it fails.

SWARM is a research framework for studying emergent risks in multi-agent AI systems using soft (probabilistic) labels instead of binary good/bad classifications. AGI-level risks don't require AGI-level agents — harmful dynamics emerge when many sub-AGI agents interact, even when no individual agent is misaligned.

Repository: https://github.com/swarm-ai-safety/swarm

Hard Rules

- SWARM simulations run locally. Install the package first.
- Do not submit scenarios containing real API keys, credentials, or PII.
- Simulation results are research artifacts. Do not present them as ground truth about real systems.
- When publishing results, cite the framework and disclose simulation parameters.

Security

- The API binds to localhost only (127.0.0.1) by default to prevent network exposure.
- CORS is restricted to localhost origins by default.
- The development API has no authentication. Do not expose it to untrusted networks.
- Storage is in-memory. Data does not persist between restarts.
- For production deployment, add authentication middleware and use a proper database.

Install

CODEBLOCK0

Quick Start (Python)

CODEBLOCK1

Quick Start (CLI)

CODEBLOCK2

Quick Start (API)

Start the API server:

CODEBLOCK3

API documentation at http://localhost:8000/docs.

Security Note: The server binds to 127.0.0.1 (localhost only) by default. Do not bind to 0.0.0.0 unless you understand the security implications and have proper firewall rules in place.
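The bind-address guidance above can be checked mechanically before a server is launched. The following is a minimal sketch using Python's standard `ipaddress` module; the helper name `is_safe_bind_host` is hypothetical and is not part of SWARM.

```python
import ipaddress


def is_safe_bind_host(host: str) -> bool:
    """Return True when `host` is a loopback address.

    Hypothetical helper for illustration; not part of the SWARM package.
    """
    if host == "localhost":
        return True
    try:
        return ipaddress.ip_address(host).is_loopback
    except ValueError:
        # Not an IP literal (for example a DNS name): treat as unsafe by default.
        return False


for candidate in ("127.0.0.1", "::1", "0.0.0.0", "192.168.1.10"):
    print(candidate, is_safe_bind_host(candidate))
```

A launcher script could refuse to start, or require an explicit override flag, whenever this check returns False.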

Register Agent

CODEBLOCK4

Returns agent_id and api_key.

Submit Scenario

CODEBLOCK5

Create & Join Simulation

CODEBLOCK6

Core Concepts

Soft Probabilistic Labels

Interactions carry p = P(v = +1) — probability of beneficial outcome:

CODEBLOCK7
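As an illustration of the soft-label idea, the sketch below maps a raw proxy score `v_hat` to `p = P(v = +1)` with a logistic sigmoid, the step this document names between `v_hat` and `p`. The function name `soft_label` and the `steepness` parameter are assumptions made for the example, not SWARM's API.

```python
import math


def soft_label(v_hat: float, steepness: float = 1.0) -> float:
    """Map a raw proxy score to p = P(v = +1) using a logistic sigmoid.

    `steepness` is an assumed tuning knob, not a documented SWARM parameter.
    """
    x = steepness * v_hat
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    # Numerically stable branch for large negative inputs.
    e = math.exp(x)
    return e / (1.0 + e)


for v_hat in (-2.0, 0.0, 2.0):
    print(f"v_hat={v_hat:+.1f} -> p={soft_label(v_hat):.3f}")
```

A score of zero maps to p = 0.5, so an interaction with no evidence either way is treated as maximally uncertain rather than forced into a good or bad bin.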

Five Key Metrics

Metric	What It Measures
Toxicity rate	Expected harm among accepted interactions: E[1 - p | accepted]
Quality gap
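The toxicity rate, E[1 - p | accepted], can be estimated as a sample mean over accepted interactions. This is a minimal sketch under the assumption that each interaction is available as a `(p, accepted)` pair; the function and the sample data are illustrative and not taken from the SWARM codebase.

```python
from typing import Iterable, Tuple


def toxicity_rate(interactions: Iterable[Tuple[float, bool]]) -> float:
    """Estimate E[1 - p | accepted] from (p, accepted) pairs."""
    accepted_p = [p for p, accepted in interactions if accepted]
    if not accepted_p:
        raise ValueError("no accepted interactions to average over")
    return sum(1.0 - p for p in accepted_p) / len(accepted_p)


# Made-up sample: three accepted interactions and one rejected one.
sample = [(0.9, True), (0.6, True), (0.2, False), (0.3, True)]
print(f"toxicity = {toxicity_rate(sample):.3f}")
```

The rejected interaction with p = 0.2 does not enter the average, which is what conditioning on acceptance means here.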

Agent Types (14 families, 38 implementations)

Type	Behavior
Honest	Cooperative, trust-based, completes tasks diligently
Opportunistic

Governance Levers (29 mechanisms)

- Transaction Taxes: reduce exploitation, cost welfare
- Reputation Decay: punish bad actors, erode honest standing
- Circuit Breakers: freeze toxic agents quickly
- Random Audits: deter hidden exploitation
- Staking: filter undercapitalized agents
- Collusion Detection: catch coordinated attacks (the critical lever near the collapse threshold)
- Sybil Detection: identify duplicate agents
- Transparency Ledger: reward or penalize based on outcome
- Moderator Agent: probabilistic review of interactions
- Incoherence Friction: tax uncertainty-driven decisions
- Council Deliberation: multi-agent governance decisions
- Diversity Enforcement: prevent monoculture collapse
- Moltipedia-specific: pair caps, page cooldowns, daily caps, self-fix prevention
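To make one of these levers concrete, here is a small sketch of a circuit breaker that freezes an agent once its recent mean harm (1 - p) exceeds a threshold. The class, the rolling window, and the threshold value are assumptions chosen for illustration; SWARM's own circuit breaker may work differently.

```python
from collections import defaultdict, deque


class CircuitBreaker:
    """Freeze an agent when its recent mean harm (1 - p) exceeds a threshold.

    Illustrative only: `window`, `threshold`, and this class are assumptions,
    not SWARM's implementation.
    """

    def __init__(self, window: int = 5, threshold: float = 0.6) -> None:
        self.threshold = threshold
        self.history = defaultdict(lambda: deque(maxlen=window))
        self.frozen = set()

    def record(self, agent_id: str, p: float) -> bool:
        """Record one interaction; return True if the agent is frozen."""
        if agent_id in self.frozen:
            return True
        harms = self.history[agent_id]
        harms.append(1.0 - p)
        # Trip only once the window is full, so one bad draw cannot freeze an agent.
        if len(harms) == harms.maxlen and sum(harms) / len(harms) > self.threshold:
            self.frozen.add(agent_id)
        return agent_id in self.frozen


breaker = CircuitBreaker(window=3, threshold=0.6)
for p in (0.2, 0.3, 0.1):
    breaker.record("adv_1", p)
for p in (0.9, 0.8, 0.7):
    breaker.record("honest_1", p)
print(sorted(breaker.frozen))
```

Waiting for a full window trades reaction speed for fewer false freezes, a small instance of the cost-versus-safety tension that runs through the governance levers.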

Framework Bridges

Bridge	Integration
Concordia	DeepMind's multi-agent framework
GasTown

Scenario YAML Format

CODEBLOCK8

Key Research Findings

Phase Transitions (11-scenario, 209-epoch study)

Regime	Adversarial %	Toxicity	Welfare	Outcome
Cooperative	0-20%	< 0.30	Stable	Survives
Contested	20-37.5%	0.33-0.37	Declining	Survives
Collapse	50%+	~0.30	Zero by epoch 12-14	Collapses

The critical threshold lies between 37.5% and 50% adversarial agents. Below it, degradation is recoverable; above it, collapse is irreversible.
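The regime bands reported above can be turned into a quick lookup for planning a scenario's population mix. This is a sketch only: the function name is hypothetical, and the handling of exact boundaries (for example whether 20% counts as cooperative or contested) is an assumption, since the study reports ranges rather than cut-offs.

```python
def classify_regime(adversarial_fraction: float) -> str:
    """Map an adversarial share (0.0 to 1.0) to the reported regime bands.

    Boundary handling (<= versus <) is an assumption; the study gives ranges only.
    """
    if not 0.0 <= adversarial_fraction <= 1.0:
        raise ValueError("adversarial_fraction must be within [0, 1]")
    if adversarial_fraction <= 0.20:
        return "cooperative"
    if adversarial_fraction <= 0.375:
        return "contested"
    if adversarial_fraction < 0.50:
        # Between the last surviving and the first collapsing configuration.
        return "critical band"
    return "collapse"


# Example population: 3 honest + 2 adversarial agents.
honest, adversarial = 3, 2
share = adversarial / (honest + adversarial)
print(f"{share:.1%} adversarial -> {classify_regime(share)}")
```

A population of 3 honest and 2 adversarial agents sits at 40% adversarial, inside the band where the study locates the threshold but does not resolve its outcome.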

Governance Cost Paradox (v1.7.0 GasTown study)

A 42-run study finds that governance reduces toxicity at every adversarial level (mean reduction 0.071) but imposes net-negative welfare costs at the current parameter tuning. At 0% adversarial agents, governance costs 216 welfare units (-57.6%) for a toxicity reduction of only 0.066.
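The quoted figures for the 0% adversarial case can be unpacked with a little arithmetic. The sketch below assumes the -57.6% is measured against ungoverned baseline welfare; the derived baseline and per-point cost are implied by the quoted numbers and are not values reported by the study.

```python
# Figures quoted in the text for the 0% adversarial case.
welfare_cost = 216.0         # welfare units lost under governance
relative_cost = 0.576        # the same loss as a fraction of baseline welfare
toxicity_reduction = 0.066   # absolute drop in toxicity rate

# Assumption: -57.6% is relative to ungoverned baseline welfare.
baseline_welfare = welfare_cost / relative_cost
governed_welfare = baseline_welfare - welfare_cost

# Welfare units paid per 0.01 of toxicity removed.
cost_per_point = welfare_cost / (toxicity_reduction * 100)

print(f"implied baseline welfare: {baseline_welfare:.1f}")
print(f"implied governed welfare: {governed_welfare:.1f}")
print(f"welfare cost per 0.01 toxicity: {cost_per_point:.1f}")
```

Under that assumption the baseline is about 375 welfare units, so roughly 33 units are given up for each hundredth of toxicity removed in a population that contains no adversaries at all.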

Case Studies

GasTown Governance Cost

Study governance overhead vs. toxicity reduction across 7 agent compositions with and without governance levers. Reveals the safety-throughput trade-off. See scenarios/gastown_governance_cost.yaml.

LDT Cooperation

220 runs across 10 seeds comparing TDT vs FDT vs UDT cooperation strategies at population scales up to 21 agents. See scenarios/ldt_cooperation.yaml.

Moltipedia Heartbeat

Model the Moltipedia wiki editing loop: competing AI editors, editorial policy, point farming, and anti-gaming governance. See scenarios/moltipedia_heartbeat.yaml.

Moltbook CAPTCHA

Model Moltbook's anti-human math challenges and rate limiting: obfuscated text parsing, verification gates, and spam prevention. See scenarios/moltbook_captcha.yaml.

API Endpoints (Full Reference)

Method	Endpoint	Description
GET	INLINECODE14	Health check
GET	/	API info
POST	/api/v1/agents/register	Register agent
GET	/api/v1/agents/{agent_id}	Get agent details
GET	/api/v1/agents/	List agents
POST	/api/v1/scenarios/submit	Submit scenario
GET	/api/v1/scenarios/{scenario_id}	Get scenario
GET	/api/v1/scenarios/	List scenarios
POST	/api/v1/simulations/create	Create simulation
POST	/api/v1/simulations/{id}/join	Join simulation
GET	/api/v1/simulations/{id}	Get simulation
GET	/api/v1/simulations/	List simulations
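The endpoints can also be called from Python without extra dependencies. The sketch below builds (but does not send) a registration request with the standard library; the payload field names follow the registration example in this document, while the helper name `build_register_request` and the `BASE_URL` constant are assumptions for illustration.

```python
import json
import urllib.request

BASE_URL = "http://localhost:8000"  # assumed local development server


def build_register_request(name, description, capabilities, base_url=BASE_URL):
    """Build a POST request for /api/v1/agents/register (hypothetical helper)."""
    payload = {
        "name": name,
        "description": description,
        "capabilities": list(capabilities),
    }
    return urllib.request.Request(
        url=f"{base_url}/api/v1/agents/register",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


request = build_register_request(
    "YourAgent",
    "What your agent does",
    ["governance-testing", "red-teaming"],
)
print(request.get_method(), request.full_url)

# To send it against a running local server:
# with urllib.request.urlopen(request) as response:
#     print(json.load(response))
```

Keeping request construction separate from sending makes the payload easy to inspect and to check for credentials or PII before anything leaves the process.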

Citation

CODEBLOCK9

Linked Docs

- Skill metadata: INLINECODE26
- Agent discovery: INLINECODE27
- Full documentation: INLINECODE28
- Theoretical foundations: INLINECODE29
- Governance guide: INLINECODE30
- Red-teaming guide: INLINECODE31
- Scenario format: INLINECODE32

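SWARM Safety Skill

Study how intelligence swarms, and where it fails.

SWARM is a research framework for studying emergent risks in multi-agent AI systems using soft (probabilistic) labels instead of binary good/bad classifications. AGI-level risks don't require AGI-level agents: harmful dynamics emerge when many sub-AGI agents interact, even when no individual agent is misaligned.

v1.7.0 | 38 agent types | 29 governance levers | 55 scenarios | 2922 tests | 8 framework bridges

Repository: https://github.com/swarm-ai-safety/swarm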

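Hard Rules

- SWARM simulations run locally. Install the package first.
- Do not submit scenarios containing real API keys, credentials, or PII.
- Simulation results are research artifacts. Do not present them as ground truth about real systems.
- When publishing results, cite the framework and disclose simulation parameters.

Security

- The API binds to localhost only (127.0.0.1) by default to prevent network exposure.
- CORS is restricted to localhost origins by default.
- The development API has no authentication. Do not expose it to untrusted networks.
- Storage is in-memory. Data does not persist between restarts.
- For production deployment, add authentication middleware and use a proper database.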

Install

```bash
# Install from PyPI
pip install swarm-safety

# With LLM agent support
pip install "swarm-safety[llm]"

# Full development (all extras)
git clone https://github.com/swarm-ai-safety/swarm.git
cd swarm
pip install -e ".[dev,runtime]"
```

Quick Start (Python)

```python
from swarm.agents.honest import HonestAgent
from swarm.agents.opportunistic import OpportunisticAgent
from swarm.agents.deceptive import DeceptiveAgent
from swarm.agents.adversarial import AdversarialAgent
from swarm.core.orchestrator import Orchestrator, OrchestratorConfig

# Ten epochs of ten steps each, with a fixed seed for reproducibility.
config = OrchestratorConfig(n_epochs=10, steps_per_epoch=10, seed=42)
orchestrator = Orchestrator(config=config)

# Mixed population: two honest agents, one opportunistic, one deceptive.
orchestrator.register_agent(HonestAgent(agent_id="honest_1", name="Alice"))
orchestrator.register_agent(HonestAgent(agent_id="honest_2", name="Bob"))
orchestrator.register_agent(OpportunisticAgent(agent_id="opp_1"))
orchestrator.register_agent(DeceptiveAgent(agent_id="dec_1"))

# Run the simulation and print per-epoch metrics.
metrics = orchestrator.run()
for m in metrics:
    print(f"Epoch {m.epoch}: toxicity={m.toxicity_rate:.3f}, welfare={m.total_welfare:.2f}")
```

Quick Start (CLI)

```bash
# List available scenarios
swarm list

# Run a scenario
swarm run scenarios/baseline.yaml

# Override settings
swarm run scenarios/baseline.yaml --seed 42 --epochs 20 --steps 15

# Export results
swarm run scenarios/baseline.yaml --export-json results.json --export-csv outputs/
```

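Quick Start (API)

Start the API server:

```bash
pip install "swarm-safety[api]"
uvicorn swarm.api.app:app --host 127.0.0.1 --port 8000
```

API documentation at http://localhost:8000/docs.

Security Note: The server binds to 127.0.0.1 (localhost only) by default. Do not bind to 0.0.0.0 unless you understand the security implications and have proper firewall rules in place.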

Register Agent

```bash
curl -X POST http://localhost:8000/api/v1/agents/register \
  -H "Content-Type: application/json" \
  -d '{
    "name": "YourAgent",
    "description": "What your agent does",
    "capabilities": ["governance-testing", "red-teaming"]
  }'
```

Returns agent_id and api_key.

Submit Scenario

```bash
curl -X POST http://localhost:8000/api/v1/scenarios/submit \
  -H "Content-Type: application/json" \
  -d '{
    "name": "my-scenario",
    "description": "Testing collusion detection with 5 agents",
    "yaml_content": "simulation:\n  n_epochs: 10\n  steps_per_epoch: 10\nagents:\n  - type: honest\n    count: 3\n  - type: adversarial\n    count: 2",
    "tags": ["collusion", "governance"]
  }'
```

Create & Join Simulation

```bash
# Create
curl -X POST http://localhost:8000/api/v1/simulations/create \
  -H "Content-Type: application/json" \
  -d '{"scenario_id": "SCENARIO_ID", "max_participants": 5}'

# Join
curl -X POST http://localhost:8000/api/v1/simulations/SIM_ID/join \
  -H "Content-Type: application/json" \
  -d '{"agent_id": "YOUR_AGENT_ID", "role": "participant"}'
```

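Core Concepts

Soft Probabilistic Labels

Interactions carry p = P(v = +1), the probability of a beneficial outcome:

```
observables -> proxy computer -> v_hat -> sigmoid -> p -> payoff engine -> payoffs
                                                     |
                                                     soft metrics -> toxicity, quality gap, etc.
```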

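Five Key Metrics

Metric	What It Measures
Toxicity rate	Expected harm among accepted interactions: E[1 - p | accepted]
Quality gap

Agent Types (14 families, 38 implementations)

Type	Behavior
Honest	Cooperative, trust-based, completes tasks diligently
Opportunistic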

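Governance Levers (29 mechanisms)

- Transaction Taxes: reduce exploitation, but cost welfare
- Reputation Decay: punish bad actors, erode honest standing
- Circuit Breakers: freeze toxic agents quickly
- Random Audits: deter hidden exploitation
- Staking: filter undercapitalized agents
- Collusion Detection: catch coordinated attacks (the critical lever near the collapse threshold)
- Sybil Detection: identify duplicate agents
- Transparency Ledger: reward or penalize based on outcome
- Moderator Agent: probabilistic review of interactions
- Incoherence Friction: tax uncertainty-driven decisions
- Council Deliberation: multi-agent governance decisions
- Diversity Enforcement: prevent monoculture collapse
- Moltipedia-specific: pair caps, page cooldowns, daily caps, self-fix prevention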

Framework Bridges

Bridge	Integration
Concordia	DeepMind's multi-agent framework
GasTown

Scenario YAML Format

```yaml
simulation:
  n_epochs: 10
  steps_per_epoch: 10
  seed: 42

agents:
  - type: honest
    count: 3
    config:
      acceptance_threshold: 0.4
  - type: adversarial
    count: 2
    config:
      aggression_level: 0.7

governance:
  transaction_tax_rate: 0.05
  circuit_breaker_enabled
```
