OpenClaw Security - PII Audit Skill
Multi-region async PII detection engine for OpenClaw sessions. Detects 8 categories of sensitive personal data across 10 country/region jurisdictions and logs audit events locally as NDJSON.
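Quick Overview (PII Audit)
Basics
- Skill name: openclaw-security
- Capability: multi-region asynchronous PII detection, with support for background auditing and local compliance records
Detection scope
- 8 labels: PHONE / EMAIL / PERSON_NAME / ADDRESS / PASSPORT / BANK_CARD / NATIONAL_ID / SOCIAL_ACCOUNT
- 10 regions: CN / US / AU / SG / MY / TH / ID / DE / UK / FR (international mobile numbers with a +CC prefix are also supported)
- Source types: input / prompt / context / knowledge_base
Key rules
- Risk levels: high (identity documents, bank cards, or combined information), low (a single weak identifier)
- Smart sampling: input 100% (5 min), prompt 20% (24 h), context 20% (1 h), knowledge_base 100% (24 h)
- Callers do not need to decide whether to skip a scan; to force a scan, use --no-cache
- Background scans must not use --text; use --file + --delete-after-read
- Input is capped at 32,768 characters; longer input is truncated and recorded with truncated: true
- Audit results are written locally as NDJSON and kept for 7 days by default; run cleanup.py --dry-run first to rehearse a cleanup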
Quick Start
Scan via file (recommended for background / automated scans):
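```powershell
python scripts/audit_worker.py --session-id SESSION_001 --source-type input --file content.txt
```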
Scan via file + auto-delete (secure temp-file workflow):
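```powershell
python scripts/audit_worker.py --session-id SESSION_001 --source-type input --file tmp_scan.txt --delete-after-read
```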
Scan via stdin:
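```powershell
# Sample text: "Zhang San's mobile number is 13812345678"
echo "张三的手机号是13812345678" | python scripts/audit_worker.py --session-id SESSION_001 --source-type input
```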
Quick manual test (WARNING: content visible in process list):
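```powershell
python scripts/audit_worker.py --session-id S001 --source-type input --text "short test" --json
```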
Source Types
- input: user input text
- prompt: system or user prompts
- context: conversation context
- knowledge_base: knowledge base content
Detection Labels
PHONE, EMAIL, PERSON_NAME, ADDRESS, PASSPORT, BANK_CARD, NATIONAL_ID, SOCIAL_ACCOUNT
Supported Regions
CN, US, AU, SG, MY, TH, ID, DE, UK, FR (+ INTL via +CC phone prefix)
Risk Levels
- high: NATIONAL_ID / PASSPORT / BANK_CARD detected, or a combination of PERSON_NAME + contact info + ADDRESS
- low: a single weak identifier (EMAIL, SOCIAL_ACCOUNT, or PHONE alone)
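The two risk levels can be sketched as a small function. This is an illustration only, not the engine's actual logic: the function name is hypothetical, "contact info" is assumed to mean PHONE or EMAIL, and any label set that does not match a high rule is treated as low.

```python
from typing import Iterable

# Labels that count as high risk on their own, per the rules above.
STRONG_LABELS = {"NATIONAL_ID", "PASSPORT", "BANK_CARD"}
# Assumption: "contact info" means a phone number or an email address.
CONTACT_LABELS = {"PHONE", "EMAIL"}


def classify_risk(labels: Iterable[str]) -> str:
    """Return "high" or "low" for a collection of detected labels."""
    found = set(labels)
    if found & STRONG_LABELS:
        return "high"
    has_contact = bool(found & CONTACT_LABELS)
    if "PERSON_NAME" in found and "ADDRESS" in found and has_contact:
        return "high"
    # Everything else is treated as low in this sketch; the real engine
    # may grade other combinations differently.
    return "low"
```

For example, a lone EMAIL is low, while PERSON_NAME together with PHONE and ADDRESS is high.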
Smart Sampling
The audit worker includes built-in smart sampling to efficiently handle large contexts:
- User input (input): 100% scan rate, 5-minute cache TTL. Every user message is scanned, but identical repeats within 5 minutes are skipped.
- System prompts (prompt): 20% scan rate, 24-hour cache TTL. Prompts rarely change; the first scan is cached for 24 hours.
- Conversation context (context): 20% scan rate, 1-hour cache TTL. Context overlaps heavily, so only 1 in 5 submissions is sampled.
- Knowledge base (knowledge_base): 100% first-scan rate, 24-hour cache TTL. Static content is fully scanned once, then deduplicated for 24 hours.
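One way to picture how a scan rate and a cache TTL combine is the sketch below. It is not the worker's implementation: the class name, the in-memory cache, and the "skip_cached" / "skip_sampled" return values are assumptions made for illustration. Only the rates and TTLs come from the list above.

```python
import hashlib
import random
import time
from typing import Callable, Dict, Optional, Tuple

# (scan rate, cache TTL in seconds) per source type, from the list above.
SAMPLING: Dict[str, Tuple[float, int]] = {
    "input": (1.0, 5 * 60),
    "prompt": (0.2, 24 * 3600),
    "context": (0.2, 3600),
    "knowledge_base": (1.0, 24 * 3600),
}


class SamplingGate:
    """Decide whether a submission is scanned, deduplicated, or sampled out."""

    def __init__(self, rng: Optional[random.Random] = None,
                 clock: Callable[[], float] = time.time) -> None:
        self._rng = rng or random.Random()
        self._clock = clock
        # (source type, content hash) -> time at which the cache entry expires
        self._seen: Dict[Tuple[str, str], float] = {}

    def decide(self, source_type: str, text: str, no_cache: bool = False) -> str:
        if no_cache:
            return "scan"  # forced scan bypasses both cache and sampling
        rate, ttl = SAMPLING[source_type]
        digest = hashlib.sha256(text.encode("utf-8")).hexdigest()
        key = (source_type, digest)
        now = self._clock()
        expiry = self._seen.get(key)
        if expiry is not None and now < expiry:
            return "skip_cached"  # identical content scanned recently
        if self._rng.random() >= rate:
            return "skip_sampled"  # not selected by the scan rate
        self._seen[key] = now + ttl
        return "scan"
```

With this shape the caller submits everything and reads the decision back, which matches the rule that callers never decide for themselves whether to skip.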
Bypass sampling for manual / forced scans:
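```powershell
python scripts/audit_worker.py --session-id S001 --source-type context --text "text" --no-cache
```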
Async Audit Workflow
When auditing session content as a background task:
1. Respond to the user first. Never block the main response for an audit.
2. Feed all content types. The script decides internally whether to actually scan, based on sampling config and cache, so the Agent does not need to decide when to skip.
3. Use a temp file + --delete-after-read. NEVER pass content via --text in background scans: write the content to a temp file, pass --file, and let the script delete it.
4. Run the audit in the background:

```powershell
# Step 1: Write content to a temp file (no PII in command-line args)
$tmpFile = [System.IO.Path]::GetTempFileName()
[System.IO.File]::WriteAllText($tmpFile, $userInput, [System.Text.Encoding]::UTF8)

# Step 2: Background scan; the script reads and deletes the temp file.
# The path is quoted so that temp directories containing spaces still work.
Start-Process -NoNewWindow -FilePath python -ArgumentList "scripts/audit_worker.py --session-id $sid --source-type input --file `"$tmpFile`" --delete-after-read"

# Same pattern for other source types:
$tmpPrompt = [System.IO.Path]::GetTempFileName()
[System.IO.File]::WriteAllText($tmpPrompt, $systemPrompt, [System.Text.Encoding]::UTF8)
Start-Process -NoNewWindow -FilePath python -ArgumentList "scripts/audit_worker.py --session-id $sid --source-type prompt --file `"$tmpPrompt`" --delete-after-read"
```

5. Review results in openclaw-security-audit/YYYY-MM-DD/events.ndjson.
6. All outcomes (detected, clean, skipped) are logged for a complete audit trail.
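For callers that launch the worker from Python rather than PowerShell, the same temp-file pattern might look like the sketch below. The helper names are hypothetical; only the script path and the flags (--session-id, --source-type, --file, --delete-after-read) are taken from the commands above.

```python
import os
import subprocess
import sys
import tempfile
from typing import List


def write_temp_content(text: str) -> str:
    """Write text to a new UTF-8 temp file and return its path."""
    fd, path = tempfile.mkstemp(suffix=".txt")
    with os.fdopen(fd, "w", encoding="utf-8") as handle:
        handle.write(text)
    return path


def build_audit_command(session_id: str, source_type: str, path: str) -> List[str]:
    """Build the worker command line; the content itself never appears in it."""
    return [
        sys.executable, "scripts/audit_worker.py",
        "--session-id", session_id,
        "--source-type", source_type,
        "--file", path,
        "--delete-after-read",
    ]


def audit_in_background(session_id: str, source_type: str, text: str) -> subprocess.Popen:
    """Start the scan and return immediately, without waiting for it."""
    path = write_temp_content(text)
    return subprocess.Popen(
        build_audit_command(session_id, source_type, path),
        stdout=subprocess.DEVNULL,
        stderr=subprocess.DEVNULL,
    )
```

Passing the arguments as a list avoids shell quoting problems with temp paths, and because the process is not awaited, the main response is never blocked.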
Retention
Default: 7 days. Cleanup:
```powershell
python scripts/cleanup.py --days 7
```
Dry run first:
```powershell
python scripts/cleanup.py --days 7 --dry-run
```
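To show what a retention pass with a dry-run mode does, here is a minimal sketch. It is not the code of cleanup.py: it assumes audit events live in one YYYY-MM-DD subdirectory per day under the audit directory, and that a directory expires once its date is more than the given number of days before today. All function names are hypothetical.

```python
import shutil
from datetime import date, datetime, timedelta
from pathlib import Path
from typing import List, Optional


def expired_day_dirs(audit_dir: Path, days: int,
                     today: Optional[date] = None) -> List[Path]:
    """List date-named subdirectories older than the retention window."""
    cutoff = (today or date.today()) - timedelta(days=days)
    expired = []
    for child in sorted(audit_dir.iterdir()):
        if not child.is_dir():
            continue
        try:
            day = datetime.strptime(child.name, "%Y-%m-%d").date()
        except ValueError:
            continue  # ignore directories that are not date-named
        if day < cutoff:
            expired.append(child)
    return expired


def cleanup(audit_dir: Path, days: int = 7, dry_run: bool = False,
            today: Optional[date] = None) -> List[Path]:
    """Delete expired day directories, or only report them on a dry run."""
    targets = expired_day_dirs(audit_dir, days, today)
    for path in targets:
        if dry_run:
            print(f"would delete {path}")
        else:
            shutil.rmtree(path)
    return targets
```

Because the dry run returns the same list that a real run would delete, the two can be compared before anything is removed.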
Input Size Limit
Maximum input: 32,768 characters (32K). Content exceeding this limit is truncated to the first 32K characters. The audit record then carries truncated: true, and input_chars holds the original character count.
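The truncation rule can be sketched in a few lines. The function name is hypothetical, and counting "characters" with Python's len (Unicode code points) is an assumption about how the worker measures input.

```python
from typing import Dict, Tuple, Union

MAX_INPUT_CHARS = 32_768  # the 32K cap described above


def truncate_input(text: str) -> Tuple[str, Dict[str, Union[int, bool]]]:
    """Return the text to scan plus the size fields for the audit record."""
    input_chars = len(text)  # original size, before any truncation
    truncated = input_chars > MAX_INPUT_CHARS
    return text[:MAX_INPUT_CHARS], {
        "input_chars": input_chars,
        "truncated": truncated,
    }
```

Input at or under the cap passes through unchanged with truncated set to False.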
Audit Record Schema
Every scan invocation writes an NDJSON record — including clean and skipped outcomes.
Each NDJSON line contains:
- event_id: UUID
- session_id: caller-provided session ID (required)
- source_type: one of input, prompt, context, knowledge_base
- status: detected, clean, or skipped
- labels: array of detected PII types (detected only)
- regions: array of matched region/country codes (detected only)
- risk_level: high or low (detected only)
- matched_count: number of PII matches
- matches: array of {label, confidence, masked_preview, region} (detected only)
- content_hash: SHA256 prefix for dedup (no raw content stored)
- input_chars: original input size in characters
- truncated: whether input was truncated to 32K
- created_at: ISO 8601 UTC timestamp
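Since each line is a standalone JSON object, an audit log can be reviewed with a short reader. The sketch below tallies outcomes and collects sessions with a high-risk detection. The function name is hypothetical, and it assumes the status, session_id, and risk_level fields described in the schema above.

```python
import json
from collections import Counter
from typing import Dict, Iterable, List, Union


def summarize_events(lines: Iterable[str]) -> Dict[str, Union[Dict[str, int], List[str]]]:
    """Count records per status and list sessions with high-risk findings."""
    statuses: Counter = Counter()
    high_risk_sessions = set()
    for line in lines:
        line = line.strip()
        if not line:
            continue  # tolerate blank lines between records
        record = json.loads(line)
        statuses[record["status"]] += 1
        # risk_level is only present on detected records
        if record.get("risk_level") == "high":
            high_risk_sessions.add(record["session_id"])
    return {
        "statuses": dict(statuses),
        "high_risk_sessions": sorted(high_risk_sessions),
    }
```

An open file handle works as the lines argument, so a day's events.ndjson can be passed in directly after opening it with encoding="utf-8".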
Safety Rules
- NEVER store raw sensitive values; store only masked previews + content hash
- NEVER pass content via --text in background scans; use --file + --delete-after-read
- Audit logs are local-only, never transmitted externally
- All file I/O uses UTF-8 encoding explicitly, with file locking for concurrent safety
- No external dependencies: stdlib only
- Input capped at 32K characters to prevent resource exhaustion
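The first rule, storing masked previews and a content hash instead of raw values, could be implemented along these lines. This is a sketch under assumptions: the masking style (keep two characters at each end) and the 16-character hash prefix are illustrative choices, not documented behavior, and both function names are hypothetical.

```python
import hashlib


def mask_value(value: str, keep: int = 2) -> str:
    """Keep the first and last `keep` characters and star out the rest."""
    if len(value) <= keep * 2:
        return "*" * len(value)  # too short to reveal anything safely
    hidden = len(value) - keep * 2
    return value[:keep] + "*" * hidden + value[-keep:]


def content_hash(text: str, prefix_len: int = 16) -> str:
    """Return a SHA-256 hex prefix usable for dedup without storing content."""
    return hashlib.sha256(text.encode("utf-8")).hexdigest()[:prefix_len]
```

A record built from these two helpers lets identical submissions be recognised by hash while the log never contains the original value.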
Configuration
Environment variable override for audit output directory:
```powershell
$env:OPENCLAW_AUDIT_DIR = "C:\path\to\custom\audit\dir"
```
See references/patterns.md for detection pattern details.