SentrySkills - Always-On Security Guard
SentrySkills is designed to run AUTOMATICALLY on every task. It provides three-stage protection (preflight → runtime → output) with predictive risk analysis.
⚠️ Installation ≠ Automatic Activation
After installing SentrySkills, you must configure your AGENTS.md to make it run automatically. See install/firsttime_setup.md for details.
How It Works
On every task, SentrySkills:
1. Preflight - Analyzes user intent and planned actions
2. Runtime - Monitors behavior during execution
3. Output - Redacts sensitive data before response
4. Predictive - Warns about potential risks
Quick Enable (One Command)
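Run this command to append the SentrySkills instructions to your ~/.codex/AGENTS.md:

    cat >> ~/.codex/AGENTS.md << EOF
    SentrySkills - Security First
    Before each response, run:
    python ~/.codex/sentryskills/shared/scripts/self_guard_runtime_hook_template.py \
      input.json \
      --policy-profile balanced \
      --out result.json
    Then check final_action in result.json and follow its instructions.
    EOF
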
Restart Codex and you're protected!
When to Use SentrySkills
Use SentrySkills when you need AI agents to operate safely with:
- Sensitive data access - Agents reading credentials, secrets, or private information
- System modifications - Agents executing commands, writing files, or changing configurations
- External communications - Agents making network requests or calling external APIs
- Code generation - Agents producing code that might contain vulnerabilities
- Production environments - Any scenario where security cannot be compromised
- Multi-turn conversations - Detect subtle manipulation across multiple interactions
Examples:
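    ✅ Use: When the agent needs to read environment variables or configuration files
    ✅ Use: When the agent is asked to execute shell commands
    ✅ Use: When the agent generates database queries or API calls
    ✅ Use: When the agent modifies system files or configuration
    ❌ Skip: Simple read-only queries against public documentation
    ❌ Skip: Basic explanations that need no system access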
Skill Package Structure
This is a skill package that orchestrates multiple sub-skills:
1. using-sentryskills - User-facing entry point
2. sentryskills-orchestrator - Central coordination
3. sentryskills-preflight - Pre-execution checks
4. sentryskills-runtime - Runtime monitoring
5. sentryskills-output - Output validation & redaction
Each sub-skill has its own SKILL.md with specific requirements.
Execution Requirements
1. Run guard checks before each external output.
2. Process sequence: preflight → runtime → output guard → final decision
3. Block: The original response is prohibited; the agent must refuse or redact.
4. Downgrade: The agent must soften its wording and declare uncertainty.
5. Explanatory responses must also go through the output guard.
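
The sequence and the block/downgrade rules above can be sketched in a few lines of Python. This is only an illustration of the idea, not the SentrySkills implementation: the `Decision` enum, the "most restrictive stage wins" rule, and the refusal and hedging strings are all assumptions made for the example.

```python
from enum import IntEnum


class Decision(IntEnum):
    """Hypothetical severity ordering: a higher value is more restrictive."""
    ALLOW = 0
    DOWNGRADE = 1
    BLOCK = 2


def final_decision(preflight: Decision, runtime: Decision, output_guard: Decision) -> Decision:
    """Combine the three stage results; the most restrictive one wins (assumed rule)."""
    return max(preflight, runtime, output_guard)


def apply_decision(decision: Decision, draft: str) -> str:
    """Turn a draft response into the text that may actually be sent."""
    if decision is Decision.BLOCK:
        # Block: the original response must not be sent.
        return "I can't help with that request."
    if decision is Decision.DOWNGRADE:
        # Downgrade: soften the wording and declare uncertainty.
        return "I'm not certain, but: " + draft
    return draft
```

Under these assumptions, a single `DOWNGRADE` from any stage is enough to change the final response, and explanatory text passes through `apply_decision` like any other output.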
Recommended Usage
Default (turn_dir layout)
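    python shared/scripts/self_guard_runtime_hook_template.py \
      shared/references/input_schema.json \
      --policy shared/references/runtime_policy.balanced.json \
      --policy-profile balanced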
With summary output
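    python shared/scripts/self_guard_runtime_hook_template.py \
      shared/references/input_schema.json \
      --out ./sentry_skill_log/sentryskills_summary.json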
Legacy event stream
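    python shared/scripts/self_guard_runtime_hook_template.py \
      shared/references/input_schema.json \
      --log-layout legacy \
      --events-log ./sentry_skill_log/sentryskills_events.jsonl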
Mandatory Logging Protocol
1. Text-only judgment is prohibited - The runtime hook must execute on every turn
2. Input JSON must include project_path (an absolute path, to avoid drift)
3. Final response must provide:
   - self_guard_final_action
   - self_guard_trace_id
   - self_guard_events_log (path to index or legacy events)
4. If script execution fails, declare "security self-check not completed" and adopt a conservative output strategy
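
The protocol above could be wrapped in a small helper like the one below. It is a hedged sketch rather than the project's own code: `build_input` and `run_guard` are hypothetical names, the `trace_id` and `events_log` keys in the result file are assumptions (only `final_action` is mentioned in this document), and "downgrade" is just one possible reading of "conservative output strategy".

```python
import json
import subprocess
from pathlib import Path


def build_input(project_path: str, **fields) -> dict:
    """Build the hook input; project_path is resolved to an absolute path to avoid drift."""
    return {"project_path": str(Path(project_path).resolve()), **fields}


def run_guard(command: list[str], result_path: Path) -> dict:
    """Run the hook command and return the three fields the final response must carry.

    Assumes the hook writes a JSON object with `final_action`, `trace_id`,
    and `events_log` keys to `result_path` (key names are assumptions).
    """
    try:
        subprocess.run(command, check=True, capture_output=True, timeout=30)
        result = json.loads(result_path.read_text(encoding="utf-8"))
        return {
            "self_guard_final_action": result["final_action"],
            "self_guard_trace_id": result["trace_id"],
            "self_guard_events_log": result["events_log"],
        }
    except (subprocess.SubprocessError, OSError, ValueError, KeyError):
        # The hook failed or produced unusable output: fall back conservatively.
        return {
            "self_guard_final_action": "downgrade",  # assumed conservative default
            "self_guard_trace_id": None,
            "self_guard_events_log": None,
            "note": "security self-check not completed",
        }
```

A caller would pass the same command line shown under Recommended Usage as `command`, and point `result_path` at the file given to `--out`.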
Default Log Layout
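Log root: ./sentry_skill_log/
Per-turn directories:
- ./sentry_skill_log/turns/<YYYYMMDDHHMMSSid>/input.json
- ./sentry_skill_log/turns/<YYYYMMDDHHMMSSid>/result.json
Global index:
- ./sentry_skill_log/index.jsonl
Session state:
- ./sentry_skill_log/.self_guard_state/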
Policy Profiles
- balanced: Standard security (default)
- strict: Maximum security
- permissive: Minimal interference
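
As a rough sketch of how a profile name might map to a policy file: the helper below follows the `runtime_policy.<profile>.json` naming that appears in this document. The `policy_path` function is hypothetical, and the existence of separate `strict` and `permissive` files under that naming is an assumption.

```python
from pathlib import Path

# Profile names taken from the list above.
PROFILES = ("balanced", "strict", "permissive")


def policy_path(profile: str = "balanced", root: Path = Path("shared/references")) -> Path:
    """Map a profile name to its policy file, assuming runtime_policy.<profile>.json."""
    if profile not in PROFILES:
        raise ValueError(f"unknown policy profile: {profile!r}")
    return root / f"runtime_policy.{profile}.json"
```

Rejecting unknown names early avoids silently running with no policy at all.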
Detection Coverage
Preflight Stage
- Prompt injection patterns
- Malicious intent detection
- Sensitive topic inference
- Action classification
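
To make the idea of rule-based preflight checks concrete, here is a minimal sketch of pattern matching for prompt injection. The rule IDs and regular expressions are invented for illustration and are not the project's actual detection rules; `preflight_check` is a hypothetical name.

```python
import re

# Hypothetical rules: (rule id, compiled pattern).
INJECTION_RULES = [
    ("PI-001", re.compile(r"ignore (all |any )?(previous|prior) instructions", re.I)),
    ("PI-002", re.compile(r"reveal (your |the )?system prompt", re.I)),
]


def preflight_check(user_text: str) -> dict:
    """Return a decision plus the IDs of every rule that matched the input."""
    matched = [rule_id for rule_id, pattern in INJECTION_RULES if pattern.search(user_text)]
    return {"decision": "block" if matched else "continue", "matched_rules": matched}
```

Returning the matched rule IDs alongside the decision keeps the result explainable, in line with the reason codes described under Event Types.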
Runtime Stage
- Event monitoring
- Source tracking
- Anomaly detection
- Behavioral analysis
Output Stage
- Sensitive data redaction
- Source disclosure handling
- Confidence assessment
- Safe response generation
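
Sensitive data redaction in the output stage might look roughly like the sketch below. The two patterns (an AWS-style access key ID and an email address) and the `[REDACTED:...]` placeholder format are assumptions chosen for the example, not rules taken from SentrySkills.

```python
import re

# Hypothetical patterns for the sketch; each label maps to a regex.
REDACTION_PATTERNS = {
    "aws_access_key": re.compile(r"\bAKIA[0-9A-Z]{16}\b"),
    "email": re.compile(r"\b[\w.+-]+@[\w-]+\.[\w.-]+\b"),
}


def redact(text: str) -> tuple[str, list[str]]:
    """Replace every match with a placeholder and report which labels fired."""
    hits = []
    for label, pattern in REDACTION_PATTERNS.items():
        text, count = pattern.subn(f"[REDACTED:{label}]", text)
        if count:
            hits.append(label)
    return text, hits
```

The list of labels that fired can double as reason codes when the output guard records its decision.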
Predictive Analysis
- Resource exhaustion prediction
- Scope creep detection
- Privilege escalation warning
- Data exfiltration path analysis
- Multi-turn grooming detection
Integration
As Codex Skill
Copy to skills/sentryskills/ and reference in agent configuration.
Configuration Files
- shared/references/runtime_policy.*.json - Security policy profiles
- shared/references/detection_rules.json - Detection rule definitions
- shared/references/input_schema.json - Input validation schema
Testing
    # Test predictive analysis
    python test_predictive_analysis.py
    # Test integration
    python test_integration.py
Event Types
The system emits structured events for:
- preflight_result - Pre-execution check outcome
- runtime_result - Runtime monitoring outcome
- output_guard_result - Output validation outcome
- predictive_analysis_result - Risk prediction (if enabled)
- final_decision - Overall decision with rationale
- hook_end - Completion with duration
Each event includes:
- Trace ID for correlation
- Decision (block/downgrade/allow/continue)
- Reason codes
- Matched rules
- Metadata
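
One way to picture such an event is as a small record serialized to one JSON line. The sketch below is an assumption about shape only: the `GuardEvent` class and its field names are hypothetical and may not match the keys the hook actually writes.

```python
import json
import uuid
from dataclasses import asdict, dataclass, field

# Decision values listed above.
DECISIONS = {"block", "downgrade", "allow", "continue"}


@dataclass
class GuardEvent:
    event_type: str   # e.g. "preflight_result"
    decision: str     # one of DECISIONS
    trace_id: str     # shared by all events of one turn, for correlation
    reason_codes: list = field(default_factory=list)
    matched_rules: list = field(default_factory=list)
    metadata: dict = field(default_factory=dict)

    def __post_init__(self):
        if self.decision not in DECISIONS:
            raise ValueError(f"unknown decision: {self.decision!r}")

    def to_jsonl(self) -> str:
        """Serialize the event as a single JSON line."""
        return json.dumps(asdict(self), ensure_ascii=False)


def new_trace_id() -> str:
    """Generate a random 32-character hexadecimal trace ID."""
    return uuid.uuid4().hex
```

Emitting one line per event keeps the log appendable and easy to filter by trace ID.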
Performance
- Typical latency: 50-100ms per check
- Memory: <50MB baseline
- Zero external dependencies (Python stdlib only)
Security Properties
- No data exfiltration: All processing is local
- No LLM calls: Pure rule-based and heuristic
- Audit trail: Complete event log for compliance
- Transparent: All decisions include reason codes
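SentrySkills - Always-On Security Guard
SentrySkills is designed to run automatically on every task. It provides three-stage protection (preflight → runtime → output) with predictive risk analysis.
⚠️ Installation ≠ Automatic Activation
After installing SentrySkills, you must configure your AGENTS.md so that it runs automatically. See install/firsttime_setup.md for details.
How It Works
On every task, SentrySkills:
1. Preflight - Analyzes user intent and planned actions
2. Runtime - Monitors behavior during execution
3. Output - Redacts sensitive data before response
4. Predictive - Warns about potential risks
Quick Enable (One Command)
Run this command to append the SentrySkills instructions to your ~/.codex/AGENTS.md:

    cat >> ~/.codex/AGENTS.md << EOF
    SentrySkills - Security First
    Before each response, run:
    python ~/.codex/sentryskills/shared/scripts/self_guard_runtime_hook_template.py \
      input.json \
      --policy-profile balanced \
      --out result.json
    Then check final_action in result.json and follow its instructions.
    EOF

Restart Codex and you're protected!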
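When to Use SentrySkills
Use SentrySkills when you need AI agents to operate safely with:
- Sensitive data access - Agents reading credentials, secrets, or private information
- System modifications - Agents executing commands, writing files, or changing configurations
- External communications - Agents making network requests or calling external APIs
- Code generation - Agents producing code that might contain vulnerabilities
- Production environments - Any scenario where security cannot be compromised
- Multi-turn conversations - Detect subtle manipulation across multiple interactions
Examples:
    ✅ Use: When the agent needs to read environment variables or configuration files
    ✅ Use: When the agent is asked to execute shell commands
    ✅ Use: When the agent generates database queries or API calls
    ✅ Use: When the agent modifies system files or configuration
    ❌ Skip: Simple read-only queries against public documentation
    ❌ Skip: Basic explanations that need no system access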
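Skill Package Structure
This is a skill package that orchestrates multiple sub-skills:
1. using-sentryskills - User-facing entry point
2. sentryskills-orchestrator - Central coordination
3. sentryskills-preflight - Pre-execution checks
4. sentryskills-runtime - Runtime monitoring
5. sentryskills-output - Output validation & redaction
Each sub-skill has its own SKILL.md with specific requirements.
Execution Requirements
1. Run guard checks before each external output.
2. Process sequence: preflight → runtime → output guard → final decision
3. Block: The original response is prohibited; the agent must refuse or redact.
4. Downgrade: The agent must soften its wording and declare uncertainty.
5. Explanatory responses must also go through the output guard.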
Recommended Usage
Default (turn_dir layout)

    python shared/scripts/self_guard_runtime_hook_template.py \
      shared/references/input_schema.json \
      --policy shared/references/runtime_policy.balanced.json \
      --policy-profile balanced

With summary output

    python shared/scripts/self_guard_runtime_hook_template.py \
      shared/references/input_schema.json \
      --out ./sentry_skill_log/sentryskills_summary.json

Legacy event stream

    python shared/scripts/self_guard_runtime_hook_template.py \
      shared/references/input_schema.json \
      --log-layout legacy \
      --events-log ./sentry_skill_log/sentryskills_events.jsonl

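Mandatory Logging Protocol
1. Text-only judgment is prohibited - The runtime hook must execute on every turn
2. Input JSON must include project_path (an absolute path, to avoid drift)
3. Final response must provide:
   - self_guard_final_action
   - self_guard_trace_id
   - self_guard_events_log (path to index or legacy events)
4. If script execution fails, declare "security self-check not completed" and adopt a conservative output strategy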
Default Log Layout
Log root: ./sentry_skill_log/
Per-turn directories:
- ./sentry_skill_log/turns/<YYYYMMDDHHMMSSid>/input.json
- ./sentry_skill_log/turns/<YYYYMMDDHHMMSSid>/result.json
Global index:
- ./sentry_skill_log/index.jsonl
Session state:
- ./sentry_skill_log/.self_guard_state/
Policy Profiles
- balanced: Standard security (default)
- strict: Maximum security
- permissive: Minimal interference
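Detection Coverage
Preflight Stage
- Prompt injection patterns
- Malicious intent detection
- Sensitive topic inference
- Action classification
Runtime Stage
- Event monitoring
- Source tracking
- Anomaly detection
- Behavioral analysis
Output Stage
- Sensitive data redaction
- Source disclosure handling
- Confidence assessment
- Safe response generation
Predictive Analysis
- Resource exhaustion prediction
- Scope creep detection
- Privilege escalation warning
- Data exfiltration path analysis
- Multi-turn grooming detection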
Integration
As Codex Skill
Copy to skills/sentryskills/ and reference in agent configuration.
Configuration Files
- shared/references/runtime_policy.*.json - Security policy profiles
- shared/references/detection_rules.json - Detection rule definitions
- shared/references/input_schema.json - Input validation schema
Testing

    # Test predictive analysis
    python test_predictive_analysis.py
    # Test integration
    python test_integration.py

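Event Types
The system emits structured events for:
- preflight_result - Pre-execution check outcome
- runtime_result - Runtime monitoring outcome
- output_guard_result - Output validation outcome
- predictive_analysis_result - Risk prediction (if enabled)
- final_decision - Overall decision with rationale
- hook_end - Completion with duration
Each event includes:
- Trace ID for correlation
- Decision (block/downgrade/allow/continue)
- Reason codes
- Matched rules
- Metadata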
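Performance
- Typical latency: 50-100ms per check
- Memory: <50MB baseline
- Zero external dependencies (Python stdlib only)
Security Properties
- No data exfiltration: All processing is local
- No LLM calls: Pure rule-based and heuristic
- Audit trail: Complete event log for compliance
- Transparent: All decisions include reason codes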