SentrySkills - Always-On Security Guard

SentrySkills is designed to run AUTOMATICALLY on every task. It provides three-stage protection (preflight → runtime → output) with predictive risk analysis.

⚠️ Installation ≠ Automatic Activation

After installing SentrySkills, you must configure your AGENTS.md to make it run automatically. See install/firsttime_setup.md for details.

How It Works

On every task, SentrySkills:

1. Preflight - Analyzes user intent and planned actions
2. Runtime - Monitors behavior during execution
3. Output - Redacts sensitive data before the response is sent
4. Predictive - Warns about potential risks
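The stage ordering above can be pictured as a small pipeline. This is a hypothetical illustration, not SentrySkills code. The function name, the severity ordering (block > downgrade > allow/continue), the "strictest decision wins" rule and the early stop on block are all assumptions. The predictive layer is left out because it only adds warnings.

```python
# Hypothetical sketch: stage names follow the README; the severity ordering,
# "strictest decision wins" and stop-on-block behavior are assumptions.
from typing import Callable, Dict, List, Tuple

# Higher number = more restrictive (assumed ordering).
SEVERITY = {"allow": 0, "continue": 0, "downgrade": 1, "block": 2}

Stage = Callable[[dict], str]


def run_pipeline(task: dict, stages: List[Tuple[str, Stage]]) -> Tuple[str, Dict[str, str]]:
    """Run stages in order; return (final_decision, per-stage decisions)."""
    decisions: Dict[str, str] = {}
    final = "allow"
    for name, stage in stages:
        decision = stage(task)
        decisions[name] = decision
        # Keep the most restrictive decision seen so far.
        if SEVERITY[decision] > SEVERITY[final]:
            final = decision
        if decision == "block":
            break  # assumed: later stages cannot make a blocked response safe
    return final, decisions
```

For example, with a preflight stage returning "continue", a runtime stage returning "downgrade" and an output stage returning "allow", the overall result is "downgrade".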

Quick Enable (One Command)

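Run this command to append the SentrySkills instructions to your ~/.codex/AGENTS.md:

```bash
# Quoting 'EOF' writes the block to AGENTS.md literally (no expansion,
# backslash line continuations preserved).
cat >> ~/.codex/AGENTS.md << 'EOF'

SentrySkills - Security First

Before every response, run:
python ~/.codex/sentryskills/shared/scripts/self_guard_runtime_hook_template.py \
  input.json \
  --policy-profile balanced \
  --out result.json
Then check final_action in result.json and follow its instruction.
EOF
```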

Restart Codex and you're protected!

When to Use SentrySkills

Use SentrySkills when you need AI agents to operate safely with:

- Sensitive data access - Agents reading credentials, secrets, or private information
- System modifications - Agents executing commands, writing files, or changing configurations
- External communications - Agents making network requests or calling external APIs
- Code generation - Agents producing code that might contain vulnerabilities
- Production environments - Any scenario where security cannot be compromised
- Multi-turn conversations - Detect subtle manipulation across multiple interactions

Examples:
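```
✅ Use: when the agent needs to read environment variables or configuration files
✅ Use: when the agent is asked to execute shell commands
✅ Use: when the agent generates database queries or API calls
✅ Use: when the agent modifies system files or configurations
❌ Skip: simple read-only queries about public documentation
❌ Skip: basic explanations that need no system access
```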

Skill Package Structure

This is a skill package that orchestrates multiple sub-skills:

1. using-sentryskills - User-facing entry point
2. sentryskills-orchestrator - Central coordination
3. sentryskills-preflight - Pre-execution checks
4. sentryskills-runtime - Runtime monitoring
5. sentryskills-output - Output validation & redaction

Each sub-skill has its own SKILL.md with specific requirements.

Execution Requirements

1. Run guard checks before every external output
2. Processing sequence: preflight → runtime → output guard → final decision
3. Block: the original response must not be sent; refuse or redact instead
4. Downgrade: tone down the response and explicitly declare uncertainty
5. Explanatory responses must also pass through the output guard
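A minimal sketch of honoring the final action once the hook has run. It assumes result.json exposes a final_action field with one of the documented decisions (block, downgrade, allow, continue). The refusal text, the uncertainty notice and the fail-closed default for unknown values are illustrative choices, not SentrySkills behavior.

```python
# Hypothetical sketch: wording of REFUSAL / UNCERTAINTY_NOTICE and the
# fail-closed default are assumptions, not part of SentrySkills.
import json
from pathlib import Path

UNCERTAINTY_NOTICE = "Note: this answer could not be fully verified and may be incomplete."
REFUSAL = "I can't provide that response because the security check blocked it."


def apply_final_action(draft: str, final_action: str) -> str:
    """Return the text that may actually be sent for a given final_action."""
    if final_action == "block":
        # Block: never send the original draft (redacting is the other option).
        return REFUSAL
    if final_action == "downgrade":
        # Downgrade: declare uncertainty; softening the wording is left to the agent.
        return f"{UNCERTAINTY_NOTICE}\n\n{draft}"
    if final_action in ("allow", "continue"):
        return draft
    # Unknown value: fail closed (assumed conservative default).
    return REFUSAL


def read_final_action(result_path: Path) -> str:
    """Read final_action from the hook's result.json."""
    return json.loads(result_path.read_text(encoding="utf-8"))["final_action"]
```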

Recommended Usage

Default (turn_dir layout)

CODEBLOCK2

With summary output

CODEBLOCK3

Legacy event stream

CODEBLOCK4
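A sketch of calling the runtime hook from Python with the same arguments as the Quick Enable command (positional input file, --policy-profile, --out). The hook location is an assumption about where the package was copied; adjust it to your install. The summary-output and legacy event-stream variants are not shown because their extra flags are not listed here.

```python
# Hypothetical sketch: HOOK path is an assumed install location.
import subprocess
import sys
from pathlib import Path

HOOK = Path("~/.codex/sentryskills/shared/scripts/self_guard_runtime_hook_template.py").expanduser()


def build_hook_command(input_json: Path, out_json: Path, profile: str = "balanced", hook: Path = HOOK) -> list:
    """Return argv for one hook run, mirroring the Quick Enable command."""
    return [sys.executable, str(hook), str(input_json), "--policy-profile", profile, "--out", str(out_json)]


def run_hook(input_json: Path, out_json: Path, profile: str = "balanced") -> bool:
    """Run the hook; return False on failure so the caller can fall back."""
    cmd = build_hook_command(input_json, out_json, profile)
    completed = subprocess.run(cmd, capture_output=True, text=True)
    return completed.returncode == 0 and out_json.exists()
```

Returning a boolean instead of raising keeps the failure path explicit, which matters because a failed run must trigger the conservative fallback.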

Mandatory Logging Protocol

1. Text-only judgment is prohibited: the runtime hook must execute on every turn
2. The input JSON must include project_path (an absolute path, to avoid path drift)
3. The final response must provide:

- self_guard_final_action
- self_guard_trace_id
- self_guard_events_log (path to the index or to the legacy event log)

4. If script execution fails, declare "security self-check not completed" and adopt a conservative output strategy
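A minimal sketch of the protocol's input and output sides. Only project_path is named in this README; any other input fields must follow shared/references/input_schema.json, so the extra keyword fields here are placeholders. Reading the trace ID from a trace_id key in the hook result is also an assumption.

```python
# Hypothetical sketch: extra input fields and the result key "trace_id" are assumptions.
from pathlib import Path
from typing import Optional


def build_input(project_path: str, **fields) -> dict:
    """Build hook input with project_path resolved to an absolute path."""
    payload = dict(fields)
    payload["project_path"] = str(Path(project_path).expanduser().resolve())
    return payload


def guard_footer(result: Optional[dict], events_log: str) -> str:
    """Return the fields every final response must carry, or the failure declaration."""
    if result is None:
        # Hook failed: declare it; the caller must then answer conservatively.
        return "security self-check not completed"
    return "\n".join([
        f"self_guard_final_action: {result.get('final_action', 'unknown')}",
        f"self_guard_trace_id: {result.get('trace_id', 'unknown')}",
        f"self_guard_events_log: {events_log}",
    ])
```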

Default Log Layout

Log root: ./sentry_skill_log/

Per-turn directories:

- ./sentry_skill_log/turns/YYYYMMDD_HHMMSS_<id>/input.json
- ./sentry_skill_log/turns/YYYYMMDD_HHMMSS_<id>/result.json

Global index:

- ./sentry_skill_log/index.jsonl

Session state:

- ./sentry_skill_log/.self_guard_state/
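A sketch for reading the global index back, for example when auditing past turns. It assumes the default log root is ./sentry_skill_log/ and that index.jsonl follows the usual JSON Lines convention (one JSON object per line). The record fields themselves are not documented here, so the code does not interpret them.

```python
# Hypothetical sketch: assumes JSON Lines format and the default log root.
import json
from pathlib import Path
from typing import Iterable, Iterator, List


def parse_index_lines(lines: Iterable[str]) -> Iterator[dict]:
    """Parse JSON Lines: one JSON object per non-empty line."""
    for line in lines:
        line = line.strip()
        if line:
            yield json.loads(line)


def read_index(log_root: Path = Path("./sentry_skill_log")) -> List[dict]:
    """Return all records from <log_root>/index.jsonl, or [] if it does not exist yet."""
    index = log_root / "index.jsonl"
    if not index.exists():
        return []
    with index.open(encoding="utf-8") as fh:
        return list(parse_index_lines(fh))
```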

Policy Profiles

- balanced: Standard security (default)
- strict: Maximum security
- permissive: Minimal interference
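A sketch of mapping a profile name to its policy file. It assumes there is one file per profile matching the documented shared/references/runtime_policy.*.json pattern (e.g. runtime_policy.strict.json); check the actual file names in your copy of the package.

```python
# Hypothetical sketch: the per-profile file naming is an assumption.
import json
from pathlib import Path

PROFILES = ("balanced", "strict", "permissive")


def policy_path(package_root: Path, profile: str = "balanced") -> Path:
    """Resolve the policy file for a profile, rejecting unknown names."""
    if profile not in PROFILES:
        raise ValueError(f"unknown policy profile {profile!r}; expected one of {PROFILES}")
    return package_root / "shared" / "references" / f"runtime_policy.{profile}.json"


def load_policy(package_root: Path, profile: str = "balanced") -> dict:
    """Load the JSON policy for the given profile."""
    return json.loads(policy_path(package_root, profile).read_text(encoding="utf-8"))
```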

Detection Coverage

Preflight Stage

- Prompt injection patterns
- Malicious intent detection
- Sensitive topic inference
- Action classification
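Because detection is rule-based, a preflight prompt-injection check reduces to matching the request against named patterns and reporting which rules fired. The regexes below are hypothetical examples, not the rules shipped in the package's detection rule file.

```python
# Illustrative only: hypothetical patterns, not SentrySkills' detection rules.
import re
from typing import List

INJECTION_PATTERNS = {
    "ignore_previous": re.compile(r"\bignore\s+(all\s+)?(previous|prior)\s+instructions\b", re.I),
    "reveal_system_prompt": re.compile(r"\b(reveal|print|show)\s+(your\s+)?system\s+prompt\b", re.I),
    "role_override": re.compile(r"\byou\s+are\s+now\s+(in\s+)?developer\s+mode\b", re.I),
}


def preflight_matches(user_text: str) -> List[str]:
    """Return the names of injection patterns found in the user's request."""
    return [name for name, rx in INJECTION_PATTERNS.items() if rx.search(user_text)]
```

Returning rule names rather than a bare boolean mirrors the "matched rules" field each event carries.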

Runtime Stage

- Event monitoring
- Source tracking
- Anomaly detection
- Behavioral analysis
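A sketch of runtime event monitoring with source tracking: every observed event records where its data came from, and reads of sensitive files are flagged. The event shape ({"type", "path" or "url"}), the event type "file_read" and the sensitive-path globs are all assumptions for illustration.

```python
# Hypothetical sketch: event keys, "file_read" type and globs are assumptions.
from dataclasses import dataclass, field
from fnmatch import fnmatch
from typing import List

SENSITIVE_GLOBS = ("*.env", "*id_rsa*", "*/.aws/credentials", "*.pem")


@dataclass
class RuntimeMonitor:
    sources: List[str] = field(default_factory=list)   # provenance of observed data
    findings: List[str] = field(default_factory=list)  # reason codes raised so far

    def observe(self, event: dict) -> str:
        """Record the event's source and return 'continue' or 'downgrade'."""
        origin = event.get("path") or event.get("url") or "unknown"
        self.sources.append(origin)
        if event.get("type") == "file_read" and any(fnmatch(origin, g) for g in SENSITIVE_GLOBS):
            self.findings.append(f"sensitive_read:{origin}")
            return "downgrade"
        return "continue"
```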

Output Stage

- Sensitive data redaction
- Source disclosure handling
- Confidence assessment
- Safe response generation
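Output redaction can be sketched as pattern substitution that also reports which rules fired. The three patterns (an AWS-style access key ID, a PEM private-key block and a password assignment) are common secret formats chosen for this example, not the package's actual rule set.

```python
# Illustrative sketch: example patterns only, not SentrySkills' redaction rules.
import re
from typing import List, Tuple

REDACTIONS = [
    ("aws_access_key_id", re.compile(r"\bAKIA[0-9A-Z]{16}\b")),
    ("private_key_block", re.compile(
        r"-----BEGIN [A-Z ]*PRIVATE KEY-----.*?-----END [A-Z ]*PRIVATE KEY-----", re.S)),
    ("password_assignment", re.compile(r"(?i)\b(password|passwd|secret)\s*[:=]\s*\S+")),
]


def redact(text: str) -> Tuple[str, List[str]]:
    """Replace matches with [REDACTED:<rule>]; return (clean_text, matched_rules)."""
    matched = []
    for name, rx in REDACTIONS:
        text, count = rx.subn(f"[REDACTED:{name}]", text)
        if count:
            matched.append(name)
    return text, matched
```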

Predictive Analysis

- Resource exhaustion prediction
- Scope creep detection
- Privilege escalation warning
- Data exfiltration path analysis
- Multi-turn grooming detection
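Multi-turn grooming is hard to spot in any single turn, so one possible heuristic is a decaying cumulative risk score. This is a hypothetical technique, not necessarily what SentrySkills does. The per-turn scores (0 to 1) are assumed to come from earlier checks, and the decay and threshold values are illustrative.

```python
# Hypothetical heuristic: decay/threshold values and score source are assumptions.
from typing import Iterable, List


def grooming_alerts(turn_scores: Iterable[float], decay: float = 0.7, threshold: float = 1.0) -> List[int]:
    """Return indices of turns where the decayed cumulative risk reaches threshold.

    cumulative = cumulative * decay + score, so repeated mild requests add up
    while isolated ones fade away.
    """
    cumulative = 0.0
    alerts = []
    for i, score in enumerate(turn_scores):
        cumulative = cumulative * decay + score
        if cumulative >= threshold:
            alerts.append(i)
    return alerts
```

With these defaults, four consecutive turns scored 0.4 trigger an alert on the fourth turn, while a steady 0.1 per turn never does (it converges to about 0.33).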

Integration

As Codex Skill

Copy the package to skills/sentryskills/ and reference it in your agent configuration.

Configuration Files

- shared/references/runtime_policy.*.json - Security policy profiles
- shared/references/detection_rules.json - Detection rule definitions
- shared/references/input_schema.json - Input validation schema

Testing

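```bash
# Test predictive analysis
python test_predictive_analysis.py

# Test integration
python test_integration.py
```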

Event Types

The system emits structured events for:

- preflight_result - Pre-execution check outcome
- runtime_result - Runtime monitoring outcome
- output_guard_result - Output validation outcome
- predictive_analysis_result - Risk prediction (if enabled)
- final_decision - Overall decision with rationale
- hook_end - Completion with duration

Each event includes:

- Trace ID for correlation
- Decision (block/downgrade/allow/continue)
- Reason codes
- Matched rules
- Metadata
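Since every event carries a trace ID, reviewing a run mostly means grouping events by trace. The sketch below assumes event dicts with trace_id, type, decision and reason_codes keys, and that the overall-decision event is typed final_decision. These key names and the sample reason code are assumptions, not a documented schema.

```python
# Hypothetical sketch: event key names and the final_decision type are assumptions.
from collections import defaultdict
from typing import Dict, List


def summarize_traces(events: List[dict]) -> Dict[str, dict]:
    """Group events by trace ID, collecting event types, reason codes and the final decision."""
    traces: Dict[str, dict] = defaultdict(lambda: {"events": [], "final": None, "reasons": []})
    for ev in events:
        t = traces[ev["trace_id"]]
        t["events"].append(ev["type"])
        t["reasons"].extend(ev.get("reason_codes", []))
        if ev["type"] == "final_decision":
            t["final"] = ev.get("decision")
    return dict(traces)
```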

Performance

- Typical latency: 50-100 ms per check
- Memory: under 50 MB baseline
- Zero external dependencies (Python standard library only)

Security Properties

- No data exfiltration: all processing is local
- No LLM calls: purely rule-based and heuristic
- Audit trail: complete event log for compliance
- Transparent: every decision includes reason codes
