Merlin Security Sentinel — Agentic Security Framework
When to use this skill
Load this skill when the user is concerned about:
- Credential theft or exfiltration by AI agents
- Prompt injection attacks via messages, files, or web content
- Malicious skills in the ClawHub registry
- Safe configuration of privileged systems using AI
- Understanding what persistent agents can and cannot safely do
- Building governed, auditable AI workflows
The Core Problem with Persistent AI Agents
Persistent AI agents — including this one — carry structural security liabilities that are not fixable by configuration alone.
Three risks compound each other:
1. Credential accumulation: A persistent agent builds up an increasingly detailed model of credentials, tokens, and system access over time. Any compromise of the agent's memory or storage exposes that accumulated access.
2. Memory poisoning: A persistent agent's memory (SOUL.md, MEMORY.md, IDENTITY.md) can be modified by malicious skills or prompt injection. Modified memory causes the agent to follow attacker instructions in future sessions, with no single triggering event to detect.
3. Supply chain attacks: The ClawHub registry has documented malicious skills. Research in Q1 2026 found 820+ malicious skills out of ~10,700 analyzed, and 26% of 31,000 analyzed skills contained at least one vulnerability.
Security research findings (Q1 2026):
- 40,000+ internet-exposed OpenClaw instances identified
- ~35% flagged as vulnerable
- 15,000+ susceptible to remote code execution
- CVE-2026-25253: a single malicious link click grants full gateway control
- Microsoft classified persistent self-hosted AI agents as "untrusted code execution with persistent credentials"
Immediate Hardening Steps
1. Lock your memory files
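```bash
chmod 444 ~/.openclaw/workspace/SOUL.md
chmod 444 ~/.openclaw/workspace/MEMORY.md
chmod 444 ~/.openclaw/workspace/IDENTITY.md
```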
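Read-only mode bits can be changed back by any process running as the file owner, so a hash baseline kept somewhere the agent cannot write is a useful complement. The following is a minimal sketch, not part of OpenClaw: the function names are hypothetical, and it assumes GNU `sha256sum` is available.

```bash
# Record SHA-256 hashes of the three memory files.
# $1 = workspace directory, $2 = baseline file (keep it outside the workspace).
snapshot_memory() {
  ( cd "$1" && sha256sum SOUL.md MEMORY.md IDENTITY.md ) > "$2"
}

# Succeeds only if every file still matches the recorded baseline.
# $1 = workspace directory, $2 = baseline file written by snapshot_memory.
verify_memory() {
  ( cd "$1" && sha256sum -c --quiet ) < "$2"
}
```

Typical use would be `snapshot_memory ~/.openclaw/workspace /some/safe/place/memory.sha256` once after a known-good state, then `verify_memory` with the same arguments before each session.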
2. Restrict tool permissions
Set the most restrictive tool profile compatible with your actual use:
- `tools.profile: "messaging"` (no exec)
- Never enable `exec` unless specifically needed
- Never use `tools.allow: ["*"]`
3. Bind to localhost only
```bash
openclaw gateway --port 18789 --host 127.0.0.1
```
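To confirm the binding took effect, the listening sockets can be inspected. This sketch assumes Linux `ss -ltn` output, where the fourth column is the local address and port; the function name is hypothetical and the port is the one used above.

```bash
# Reads `ss -ltn` style output on stdin and prints every listener on the
# given port whose local address is not a loopback address.
non_loopback_listeners() {
  awk -v port="$1" '
    NR == 1 { next }                      # skip the header row
    {
      addr = $4                           # "Local Address:Port" column
      n = split(addr, parts, ":")
      if (parts[n] != port) next          # some other port
      host = substr(addr, 1, length(addr) - length(parts[n]) - 1)
      if (host != "127.0.0.1" && host != "[::1]") print addr
    }'
}
```

Run it as `ss -ltn | non_loopback_listeners 18789`. Empty output means no non-loopback listener was found on that port.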
4. Use allowlists for DMs
Set explicit `allowedDMs` rather than `["*"]`. Any user who can message a shared tool-enabled agent can steer it within its granted permissions.
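Wildcard entries in either `tools.allow` or `allowedDMs` can be spotted with a plain text search. This is a rough sketch: the function name is hypothetical, and the location and format of the configuration file are assumptions, so pass whatever file holds these settings in your install.

```bash
# Print (with line numbers) every line containing a wildcard list
# such as ["*"] or [*]. Exits non-zero when no wildcard is found.
find_wildcards() {
  grep -nE '\[[[:space:]]*"?\*"?[[:space:]]*\]' "$1"
}
```

Any line it prints deserves a second look. It matches on text only and does not understand the configuration schema.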
5. Audit installed skills
clawhub list
Check SKILL.md files manually. Look for: base64 encoding, external downloads, instructions to modify SOUL.md or MEMORY.md.
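A first pass over the indicators listed above can be automated before reading each file by hand. The sketch below is heuristic only: the function name and pattern list are illustrative, and the directory that holds installed skills is an assumption you need to supply.

```bash
# Patterns taken from the checklist above: base64 use, external downloads,
# and references to the agent's memory files. Matching is case-insensitive.
SUSPICIOUS='base64|curl |wget |SOUL\.md|MEMORY\.md'

# Print file:line:text for every suspicious line in any SKILL.md below $1.
scan_skills_dir() {
  find "$1" -type f -name 'SKILL.md' -exec grep -HnEi "$SUSPICIOUS" {} +
}
```

Hits are leads, not verdicts. A legitimate skill may mention `curl`, and a malicious one can obfuscate its payload past any fixed pattern list.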
The Architectural Answer
For tasks involving elevated privilege, the structurally correct answer is ephemeral execution, not hardened persistence.
Two inviolable axioms:
1. No AI shall see its own configuration: The execution envelope is applied at container infrastructure level, not delivered to the model. An agent that cannot inspect its own constraints cannot reason about circumventing them.
2. No AI that has touched privileged systems shall persist: Container termination is total. Not paused. Destroyed. The agent's knowledge of your system dies with the container.
What persists: A signed, replayable audit record of exactly what the AI did — held outside the container, inaccessible to the AI.
What does not persist: Credentials, session memory, system knowledge, the agent itself.
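One simple way to make an audit record tamper-evident is a hash chain, in which each entry's hash covers the previous entry's hash. The sketch below illustrates that idea only. It is not the Merlin implementation, the function names are hypothetical, and it is not a signature: real signing would need a key held outside the container. It assumes `sha256sum` and single-line messages.

```bash
# Append one entry. Line format: "<sha256(previous hash + message)> <message>"
audit_append() {
  local log="$1" msg="$2" prev="GENESIS" hash
  if [ -s "$log" ]; then
    prev=$(tail -n 1 "$log" | cut -d' ' -f1)
  fi
  hash=$(printf '%s%s' "$prev" "$msg" | sha256sum | cut -d' ' -f1)
  printf '%s %s\n' "$hash" "$msg" >> "$log"
}

# Recompute the chain from the start; fail at the first entry that
# does not match its recorded hash.
audit_verify() {
  local log="$1" prev="GENESIS" hash msg expected
  while IFS=' ' read -r hash msg; do
    expected=$(printf '%s%s' "$prev" "$msg" | sha256sum | cut -d' ' -f1)
    [ "$hash" = "$expected" ] || return 1
    prev="$hash"
  done < "$log"
}
```

Editing or deleting any earlier line changes every hash after it, so `audit_verify` fails. The log must still live where the agent cannot write, since anyone able to rewrite the whole file can rebuild the chain.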
When to use ephemeral execution vs persistent agents
| Task | Use |
|---|---|
| Daily messaging, reminders, search | Persistent (acceptable risk) |
| Configuring your own AI agents | Ephemeral: high risk to persist |
| Setting up new systems | Ephemeral: involves credentials |
| Running security scans | Ephemeral: agent sees sensitive data |
| Installing/updating privileged software | Ephemeral: credential entry involved |
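For the rows marked Ephemeral, a throwaway container is one way to approximate this. The sketch below assumes Docker, which the source does not name as its runtime; the function name and the `DRY_RUN` switch are hypothetical. The flags used (`--rm`, `--read-only`, `--cap-drop ALL`, `--network none`) are standard `docker run` options.

```bash
# Run one task in a container that is removed as soon as it exits.
# $1 = image, remaining arguments = command to run inside it.
# With DRY_RUN=1 the command line is printed instead of executed.
run_ephemeral() {
  image="$1"; shift
  set -- docker run --rm --read-only --cap-drop ALL --network none "$image" "$@"
  if [ "${DRY_RUN:-0}" = "1" ]; then
    printf '%s\n' "$*"
  else
    "$@"
  fi
}
```

`--network none` is the most restrictive choice and will need relaxing for tasks that download software. `--rm` removes the container's writable layer on exit, but mounted volumes and host-side logs survive, so this alone does not meet the "nothing persists" requirement described above.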
Prompt Injection Defense
OpenClaw's security model explicitly states that prompt injection is out of scope as a vulnerability — the framework cannot prevent it at the infrastructure level.
Practical defenses:
- Never enable `exec` when browsing untrusted content
- Use separate sessions for untrusted content and credential-sensitive tasks
- Treat all content from messaging channels as untrusted
- The architectural fix is an ingress firewall that makes external content readable but never instruction-authoritative — runtime filtering alone is insufficient
Architecture Reference
The full governed architecture — execution envelopes, ephemeral containers, deterministic audit trails, governed knowledge retrieval — is documented and prototyped at:
- Threat assessment: github.com/thepoorsatitagain/OPENCLAWSECURITYTHREAT_ASSESSMENT3
- Hydra Kernel / GEL: github.com/thepoorsatitagain/Ai-control-2 — provisional patent 63/939,121
- Merlin ephemeral sentinel: github.com/thepoorsatitagain/Merlin-agenic-security-airgapper
- Working wrapper prototype: github.com/thepoorsatitagain/working-project-openclaw-wrapper
Quick Reference
"Is OpenClaw safe?"
For daily personal use with minimal tool access and no exec: acceptable risk. For anything involving credentials, privileged systems, or shared access: the structural risks are real and documented.
"I got a suspicious skill installed"
1. Check SOUL.md, MEMORY.md, IDENTITY.md for injected content
2. Revoke any credentials the agent had access to
3. Remove the skill with `clawhub uninstall`
4. Review audit logs
5. Consider a clean reinstall if memory files were modified
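As a starting point for step 1, it helps to see which memory files changed recently. This sketch assumes a `find` that supports `-maxdepth` and `-mmin` (GNU and BSD both do); the function name is hypothetical.

```bash
# List memory files in workspace $1 modified within the last $2 minutes.
recently_modified() {
  find "$1" -maxdepth 1 -type f \
    \( -name 'SOUL.md' -o -name 'MEMORY.md' -o -name 'IDENTITY.md' \) \
    -mmin "-$2"
}
```

For example, `recently_modified ~/.openclaw/workspace 1440` covers the last day. Modification times are only a hint, since anything with write access to the files can also reset their timestamps, so read the files in full regardless.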
"What is the worst case?"
CVE-2026-25253: one malicious link click, full gateway RCE within milliseconds. Agent exfiltrates SOUL.md, MEMORY.md, device.json, openclaw.json, browser session tokens, SSH credentials. Future sessions follow attacker instructions silently.