Self-Improving Intent Security Agent
Install
CODEBLOCK0
Use this skill to structure and document intent validation workflows. It does not ship a production runtime engine that automatically intercepts agent actions; instead, it provides templates, examples, and local scripts that help you build, simulate, or document that workflow.
Scope Clarification
- This package includes Markdown templates, examples, and helper shell scripts
- The helper shell scripts operate on local files only
- Automatic enforcement, anomaly detection, rollback execution, and learning application must be implemented by the host agent or surrounding system
Quick Reference
| Situation | Action |
|---|---|
| Starting autonomous task | Capture intent specification (goal, constraints, expected behavior) |
| Before each action | Validate against intent, check authorization |
| Action violates intent | Document the violation and follow the rollback workflow |
| Unusual behavior detected | Log an anomaly, assess severity, and decide whether to halt or roll back |
| Task completes | Analyze outcome, extract patterns, update strategies |
| High-risk operation | Require human approval before execution |
| Need transparency | Review audit log with full action history |
| Strategy improves | A/B test new approach, adopt if better |
| Recurring violation | Promote to permanent constraint in CLAUDE.md |
Setup
Create the .agent/ directory in the project root:
CODEBLOCK1
Copy templates from assets/ or create files with headers. Review the included shell scripts before running them if you want to understand exactly what they do.
For a complete conversation-driven working folder, scaffold a run pack:
CODEBLOCK2
This creates:
- conversation.md for the user/agent transcript
- report.md for the final summary
- a local .agent/ tree with intent, audit, violation, rollback, learning, and strategy files
Intent Specification Format
Before executing autonomous tasks, capture structured intent:
CODEBLOCK3
Save to .agent/intents/INT-YYYYMMDD-XXX.md.
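As a minimal sketch of how a host agent might hold an intent in memory before writing it to disk, the types below mirror the fields named in this document (goal, constraints, expected behavior, risk level). The names `IntentSpec`, `checkIntentSpec`, and `intentPath` are hypothetical and not part of the skill, and the goal string is invented for the example.

```typescript
// Hypothetical types for illustration; the skill ships templates, not a typed API.
type RiskLevel = "low" | "medium" | "high";

interface IntentSpec {
  id: string;                 // e.g. "INT-20250115-001"
  goal: string;               // a single, clear objective
  constraints: string[];      // boundaries the agent must respect
  expectedBehavior: string[]; // patterns the agent should follow
  riskLevel: RiskLevel;
}

// Returns a list of problems; an empty list means the spec is usable.
function checkIntentSpec(spec: IntentSpec): string[] {
  const problems: string[] = [];
  // Assumed ID shape: INT-YYYYMMDD-XXX, where XXX is three digits or capitals.
  if (!/^INT-\d{8}-[A-Z0-9]{3}$/.test(spec.id)) {
    problems.push("id must look like INT-YYYYMMDD-XXX");
  }
  if (spec.goal.trim() === "") problems.push("goal is empty");
  if (spec.constraints.length === 0) problems.push("no constraints listed");
  if (spec.expectedBehavior.length === 0) problems.push("no expected behavior listed");
  return problems;
}

// Path convention stated in the text: .agent/intents/INT-YYYYMMDD-XXX.md
function intentPath(spec: IntentSpec): string {
  return ".agent/intents/" + spec.id + ".md";
}

const exampleIntent: IntentSpec = {
  id: "INT-20250115-001",
  goal: "Refactor the logging module", // invented goal for the example
  constraints: ["Only modify files in ./src", "No network calls"],
  expectedBehavior: ["Read files before modifying them", "Run tests after changes"],
  riskLevel: "medium",
};
```

Returning a list of problems rather than throwing lets the caller show every gap in the specification at once, which fits the advice to list all constraints up front.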
Validation Workflow
Conversation-Driven Workflow
Use this when you want the skill to document not just the intent, but the full user and agent interaction over time.
Recommended Sequence
1. Capture the user request in conversation.md
2. Translate it into a structured intent in .agent/intents/
3. Record allowed and blocked actions in .agent/audit/
4. Log suspicious behavior in .agent/violations/ANOMALIES.md
5. Log hard validation failures in .agent/violations/
6. Record recovery steps in .agent/audit/ROLLBACKS.md
7. Extract reusable learnings in .agent/learnings/
8. Promote stable improvements into .agent/learnings/STRATEGIES.md
9. Summarize the run in report.md
Good Fit
- High-risk or privacy-sensitive tasks
- Tasks where you need a human-readable transcript
- Demos and evaluations
- Incident reviews and postmortems
Example
See examples/customer-feedback-demo/ for a full run showing:
- intent capture
- per-action validation
- anomaly detection
- blocked violation
- rollback
- learning promotion
Pre-Execution Validation
Before each action, validate:
1. Goal Alignment: Does this action serve the stated goal?
2. Constraint Check: Does it respect all boundaries?
3. Behavior Match: Does it fit expected patterns?
4. Authorization: Do we have permission for this?
If ANY check fails → block action, log violation.
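The four checks and the "any failure blocks" rule can be sketched as a small function. This is an illustration under stated assumptions, not the skill's implementation: the skill ships no runtime validator, and `ProposedAction`, `ActionCheck`, `validateAction`, and the example predicates are hypothetical names that a host agent would replace with its own logic.

```typescript
// Hypothetical shapes for illustration; the skill itself ships no runtime validator.
type CheckName = "goalAlignment" | "constraintCheck" | "behaviorMatch" | "authorization";

interface ProposedAction {
  kind: string;   // e.g. "read", "delete"
  target: string; // e.g. "./feedback/temp.txt"
}

interface CheckResult {
  check: CheckName;
  passed: boolean;
  reason: string;
}

type ActionCheck = (action: ProposedAction) => CheckResult;

interface Verdict {
  allowed: boolean;
  failures: CheckResult[];
}

// Runs every check without short-circuiting so a violation entry can list all failures.
function validateAction(action: ProposedAction, checks: ActionCheck[]): Verdict {
  const failures = checks.map((check) => check(action)).filter((r) => !r.passed);
  return { allowed: failures.length === 0, failures: failures };
}

// Example checks for a read-only "process customer feedback files" intent.
const feedbackChecks: ActionCheck[] = [
  (a) => ({ check: "goalAlignment", passed: a.kind === "read", reason: "processing only needs reads" }),
  (a) => ({
    check: "constraintCheck",
    passed: a.kind === "read" && a.target.indexOf("./feedback/") === 0,
    reason: "only read ./feedback, no file modifications",
  }),
  (a) => ({ check: "behaviorMatch", passed: a.kind !== "delete", reason: "deletion is not expected for this task" }),
  () => ({ check: "authorization", passed: true, reason: "assumed granted by the host" }),
];

const verdict = validateAction({ kind: "delete", target: "./feedback/temp.txt" }, feedbackChecks);
// verdict.allowed is false; verdict.failures holds the three failed checks.
```

Collecting all failures instead of stopping at the first one costs little and gives the violation log the per-check reasons it asks for.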
Example Validation
CODEBLOCK4
Logging Violations
When validation fails, log to .agent/violations/:
CODEBLOCK5
Anomaly Detection
Monitor execution for behavioral anomalies:
Anomaly Types
| Type | Description | Response |
|---|---|---|
| Goal Drift | Actions diverging from stated goal | Halt, request clarification |
| Capability Misuse | Using tools inappropriately | Rollback to checkpoint |
| Side Effects | Unexpected consequences detected | Log warning, continue with monitoring |
| Resource Exceeded | CPU/memory/time limits breached | Throttle or halt |
| Pattern Deviation | Behavior differs from expected | Log for analysis |
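One way a host agent could encode the table above is a typed lookup. The identifiers here are hypothetical labels for the table rows, not names the skill defines.

```typescript
// Hypothetical labels for the rows of the anomaly table; not defined by the skill.
type AnomalyType =
  | "goal_drift"
  | "capability_misuse"
  | "side_effects"
  | "resource_exceeded"
  | "pattern_deviation";

type AnomalyResponse =
  | "halt_and_request_clarification"
  | "rollback_to_checkpoint"
  | "log_warning_and_monitor"
  | "throttle_or_halt"
  | "log_for_analysis";

// Record<AnomalyType, ...> makes the compiler reject a missing or misspelled row.
const ANOMALY_RESPONSES: Record<AnomalyType, AnomalyResponse> = {
  goal_drift: "halt_and_request_clarification",
  capability_misuse: "rollback_to_checkpoint",
  side_effects: "log_warning_and_monitor",
  resource_exceeded: "throttle_or_halt",
  pattern_deviation: "log_for_analysis",
};

function responseFor(type: AnomalyType): AnomalyResponse {
  return ANOMALY_RESPONSES[type];
}
```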
Anomaly Logging
Log to .agent/violations/ANOMALIES.md:
CODEBLOCK6
Learning Workflow
After task completion, log learnings to .agent/learnings/:
CODEBLOCK7
Rollback Operations
Creating Checkpoints
Before risky operations:
CODEBLOCK8
Rollback on Violation
Roll back automatically when an intent is violated:
CODEBLOCK9
Rollback Log
Track in .agent/audit/ROLLBACKS.md:
CODEBLOCK10
Strategy Evolution
When the agent learns a better approach:
A/B Testing
1. Baseline: Current strategy (90% of tasks)
2. Candidate: New strategy (10% of tasks)
3. Measure: Compare success rate, time, resource usage
4. Validate: Safety checks pass
5. Adopt: Roll out if candidate is 10%+ better
6. Rollback: Revert if candidate degrades performance
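The steps above can be sketched as a decision function. Several things here are assumptions rather than anything the skill specifies: the names `ArmStats`, `assignArm`, and `decide`; the `minSamples` floor of 20; comparing success rate only (the text also mentions time and resource usage); and reading "10%+ better" as a relative improvement in success rate.

```typescript
// Hypothetical sketch: names, the sample-size floor, and the relative reading
// of "10%+ better" are assumptions, not part of the skill.
interface ArmStats {
  tasks: number;
  successes: number;
}

type AbDecision = "adopt" | "keep_testing" | "rollback";

// Sends every tenth task to the candidate, giving the 90/10 split.
function assignArm(taskIndex: number): "baseline" | "candidate" {
  return taskIndex % 10 === 0 ? "candidate" : "baseline";
}

function successRate(stats: ArmStats): number {
  return stats.tasks === 0 ? 0 : stats.successes / stats.tasks;
}

function decide(
  baseline: ArmStats,
  candidate: ArmStats,
  safetyChecksPass: boolean,
  minSamples: number = 20
): AbDecision {
  if (!safetyChecksPass) return "rollback"; // safety outranks performance
  if (candidate.tasks < minSamples) return "keep_testing"; // too little data to judge
  const base = successRate(baseline);
  const cand = successRate(candidate);
  if (cand < base) return "rollback"; // candidate degrades performance
  if (cand > base && cand >= base * 1.1) return "adopt"; // at least 10% better, relative
  return "keep_testing";
}
```

For example, a baseline at 80 successes out of 100 and a candidate at 18 out of 20 yields "adopt", while a candidate at 17 out of 20 stays in "keep_testing" because 0.85 is below the 0.88 threshold.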
Strategy Log
Track in .agent/learnings/STRATEGIES.md:
CODEBLOCK11
Promoting to Permanent Memory
When learnings are broadly applicable, promote to project files:
Promotion Targets
| Target | What Belongs There |
|---|---|
| INLINECODE21 | Intent patterns, common constraints for this project |
| INLINECODE22 | Agent-specific workflows, validation rules |
| .github/copilot-instructions.md | Security guidelines, constraint templates |
| SECURITY.md | Security-critical constraints and validation rules |
When to Promote
Promote when:
- Violation occurs 3+ times (recurring constraint)
- Learning applies across multiple task types
- Strategy is adopted and proven (success rate 90%+)
- Security pattern prevents entire class of violations
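The "3+ times" rule lends itself to a simple count. The sketch below assumes violations have already been parsed into records with a comparable summary string; `ViolationRecord` and `recurringViolations` are hypothetical names, and matching on exact summary text is a simplification.

```typescript
// Hypothetical record shape; real violation entries are markdown files under .agent/violations/.
interface ViolationRecord {
  id: string;      // e.g. "VIO-20250115-001"
  summary: string; // e.g. "Attempted to modify files outside ./src"
}

// Returns the summaries that occur at least `threshold` times (3 in the rule above).
function recurringViolations(log: ViolationRecord[], threshold: number = 3): string[] {
  // A prototype-free object avoids collisions with keys such as "constructor".
  const counts: { [summary: string]: number } = Object.create(null);
  for (const v of log) {
    counts[v.summary] = (counts[v.summary] || 0) + 1;
  }
  return Object.keys(counts).filter((summary) => (counts[summary] || 0) >= threshold);
}

const violationLog: ViolationRecord[] = [
  { id: "VIO-20250115-001", summary: "Attempted to modify files outside ./src" },
  { id: "VIO-20250118-002", summary: "Attempted to modify files outside ./src" },
  { id: "VIO-20250120-003", summary: "Attempted to modify files outside ./src" },
  { id: "VIO-20250121-004", summary: "Attempted a network call" }, // invented entry for contrast
];

const toPromote = recurringViolations(violationLog);
// toPromote contains only "Attempted to modify files outside ./src"
```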
Promotion Examples
Violation (recurring):
VIO-20250115-001: Attempted to modify files outside ./src
VIO-20250118-002: Attempted to modify files outside ./src
VIO-20250120-003: Attempted to modify files outside ./src
Promote to CLAUDE.md:
CODEBLOCK12
Learning (proven strategy):
LRN-20250115-005: Batch processing with checkpoints every 10 files
Results: 95% success, 40% faster, easy rollback on failures
Promote to AGENTS.md:
CODEBLOCK13
Configuration
Environment Variables
Important: All environment variables are optional. The skill works with sensible defaults without any configuration.
Security Note: This skill does NOT require any credentials or secrets. All data stays local in the .agent/ directory. No data is transmitted externally.
CODEBLOCK14
Configuration File
Create .agent/config.json:
CODEBLOCK15
ID Generation
Format: TYPE-YYYYMMDD-XXX
- INT: Intent specification
- INLINECODE29: Violation (failed validation)
- INLINECODE30: Anomaly (behavioral deviation)
- INLINECODE31: Learning (insight from execution)
- INLINECODE32: Strategy (new approach)
- INLINECODE33: Rollback operation
- INLINECODE34: Checkpoint
Examples: INT-20250115-001, VIO-20250115-A3F, INLINECODE37
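A minimal sketch of producing and checking IDs in the TYPE-YYYYMMDD-XXX format follows. The skill does not say how the XXX part is chosen, so the numeric sequence in `makeId` is an assumption. The validator accepts capital letters and digits because the examples above show both `001` and `A3F`. All function names are hypothetical.

```typescript
// Hypothetical helpers; the skill does not prescribe how IDs are generated.
function zeroPad(value: number, width: number): string {
  let text = String(value);
  while (text.length < width) text = "0" + text;
  return text;
}

// Builds TYPE-YYYYMMDD-XXX from a prefix, a date (read as UTC), and a 1..999 sequence number.
function makeId(prefix: string, date: Date, sequence: number): string {
  if (Math.floor(sequence) !== sequence || sequence < 1 || sequence > 999) {
    throw new Error("sequence must be an integer from 1 to 999");
  }
  const day =
    zeroPad(date.getUTCFullYear(), 4) +
    zeroPad(date.getUTCMonth() + 1, 2) + // getUTCMonth() is zero-based
    zeroPad(date.getUTCDate(), 2);
  return prefix + "-" + day + "-" + zeroPad(sequence, 3);
}

// Accepts numeric and alphanumeric suffixes, since both appear in the examples.
function isValidId(id: string): boolean {
  return /^[A-Z]+-\d{8}-[A-Z0-9]{3}$/.test(id);
}

const firstIntentId = makeId("INT", new Date(Date.UTC(2025, 0, 15)), 1);
// firstIntentId === "INT-20250115-001"
```

Reading the date in UTC keeps IDs stable across machines in different time zones. A host that prefers local dates would swap in the non-UTC getters.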
Priority Guidelines
| Priority/Severity | When to Use |
|---|---|
| INLINECODE38 | Immediate security risk, data loss, system compromise |
| INLINECODE39 | Intent violation, unauthorized action, goal drift |
| medium | Anomaly detected, suboptimal strategy, warning condition |
| low | Minor deviation, optimization opportunity, observation |
Best Practices
Intent Specification
1. Be specific - Vague goals lead to validation failures
2. List all constraints - Implicit boundaries often get violated
3. Define expected behavior - Helps catch deviations early
4. Set correct risk level - Triggers appropriate approval gates
Validation
1. Validate early - Before execution, not after
2. Fail safe - Block on doubt, don't assume permission
3. Log all violations - Even if they seem minor
4. Review regularly - Patterns emerge over time
Learning
1. Let it learn - Requires sample size to be effective
2. Monitor A/B tests - Don't adopt blindly
3. Safety first - Reject strategies that reduce safety
4. Promote proven patterns - Turn learnings into permanent rules
Audit
1. Keep detailed logs - Debugging requires context
2. Archive old logs - Retention policies prevent bloat
3. Review anomalies - Often reveal edge cases
4. Share learnings - Team benefits from documented patterns
Detection Triggers
Automatically apply intent security when:
High-Risk Operations:
- File deletion or bulk modifications
- API calls with write permissions
- Command execution with elevated privileges
- Database modifications
- Deployment operations
Autonomous Workflows:
- Multi-step task sequences
- Background job execution
- Scheduled automation
- Agent-initiated operations
Learning Opportunities:
- Task completes successfully
- Failure with identifiable cause
- User provides correction
- Better approach discovered
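As a rough sketch of the high-risk trigger list, a host agent might tag each operation with a kind and gate the high-risk ones behind human approval, as the Quick Reference suggests. The `OperationKind` labels and the `gateFor` function are hypothetical; real classification would need to inspect the actual command or API call.

```typescript
// Hypothetical operation labels covering the high-risk list above, plus one low-risk example.
type OperationKind =
  | "file_delete"
  | "bulk_modify"
  | "api_write"
  | "privileged_command"
  | "database_write"
  | "deploy"
  | "file_read";

const HIGH_RISK_KINDS: OperationKind[] = [
  "file_delete",
  "bulk_modify",
  "api_write",
  "privileged_command",
  "database_write",
  "deploy",
];

type ApprovalGate = "require_human_approval" | "validate_only";

// High-risk kinds wait for a human; everything else still goes through validation.
function gateFor(kind: OperationKind): ApprovalGate {
  return HIGH_RISK_KINDS.indexOf(kind) !== -1 ? "require_human_approval" : "validate_only";
}
```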
Hook Integration (Optional)
Enable automatic intent validation through agent hooks.
Setup (Claude Code / Codex)
Create .claude/settings.json:
CODEBLOCK16
Available Hook Scripts
| Script | Hook Type | Purpose |
|---|---|---|
| INLINECODE43 | UserPromptSubmit | Prompts for intent specification |
| INLINECODE44 | PostToolUse | Validates actions against intent |
| scripts/learning-capture.sh | TaskComplete | Captures learnings after tasks |
See references/hooks-setup.md for detailed configuration.
Quick Commands
CODEBLOCK17
Examples
See examples/README.md for detailed usage examples:
- Basic intent specification and validation
- Handling violations and rollbacks
- Learning from task outcomes
- Strategy evolution through A/B testing
- Security monitoring and anomaly detection
References
Multi-Agent Support
Works with Claude Code, Codex CLI, GitHub Copilot, and OpenClaw. See references/multi-agent.md for agent-specific configurations.
Safety Guarantees
✓ Intent Alignment - Every action validated against goal
✓ Permission Boundaries - Cannot exceed authorized scope
✓ Reversibility - Checkpoint-based rollback
✓ Auditability - Complete action history
✓ Bounded Learning - Safety-constrained improvements
✓ Human Oversight - Approval gates for high-risk operations
License
MIT
Note: This skill provides strong safety mechanisms but requires proper configuration and usage. Always:
- Define clear, specific intents
- Review violation logs regularly
- Monitor learning effectiveness
- Keep approval gates enabled for high-risk operations
- Test in non-production environments first
Intent-based security is a powerful approach, but human judgment remains essential.
Self-Improving Intent Security Agent
Install
bash
npx skills add nishantapatil3/self-improving-intent-security-agent
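Use this skill to structure and document intent validation workflows. It does not ship a production runtime engine that automatically intercepts agent actions; instead, it provides templates, examples, and local scripts that help you build, simulate, or document that workflow.
Scope Clarification
- This package includes Markdown templates, examples, and helper shell scripts
- The helper shell scripts operate on local files only
- Automatic enforcement, anomaly detection, rollback execution, and learning application must be implemented by the host agent or surrounding system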
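Quick Reference
| Situation | Action |
|---|---|
| Starting autonomous task | Capture intent specification (goal, constraints, expected behavior) |
| Before each action | Validate against intent, check authorization |
| Action violates intent | Document the violation and follow the rollback workflow |
| Unusual behavior detected | Log an anomaly, assess severity, and decide whether to halt or roll back |
| Task completes | Analyze outcome, extract patterns, update strategies |
| High-risk operation | Require human approval before execution |
| Need transparency | Review audit log with full action history |
| Strategy improves | A/B test new approach, adopt if better |
| Recurring violation | Promote to permanent constraint in CLAUDE.md |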
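Setup
Create the .agent/ directory in the project root:
bash
mkdir -p .agent/{intents,violations,learnings,audit}
Copy templates from assets/ or create files with headers. Review the included shell scripts before running them if you want to understand exactly what they do.
For a complete conversation-driven working folder, scaffold a run pack:
bash
./scripts/scaffold-run.sh examples/my-demo customer_feedback medium
This creates:
- conversation.md for the user/agent transcript
- report.md for the final summary
- a local .agent/ tree with intent, audit, violation, rollback, learning, and strategy files
Intent Specification Format
Before executing autonomous tasks, capture structured intent: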
markdown
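[INT-YYYYMMDD-XXX] task_name
Created: ISO-8601 timestamp
Risk Level: low | medium | high
Status: active | completed | violated
Goal
What you want to achieve (a single, clear objective)
Constraints
- Boundary 1 (e.g., only modify files in ./src)
- Boundary 2 (e.g., no network calls)
- Boundary 3 (e.g., preserve existing test coverage)
Expected Behavior
- Pattern 1 (e.g., read files before modifying them)
- Pattern 2 (e.g., run tests after changes)
- Pattern 3 (e.g., create backups of modified files)
Context
- Relevant files: path/to/file.ext
- Environment: development | staging | production
- Previous attempts: INT-20250115-001 (if retry)
Save to .agent/intents/INT-YYYYMMDD-XXX.md.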
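Validation Workflow
Conversation-Driven Workflow
Use this when you want the skill to document not just the intent, but the full user and agent interaction over time.
Recommended Sequence
1. Capture the user request in conversation.md
2. Translate it into a structured intent in .agent/intents/
3. Record allowed and blocked actions in .agent/audit/
4. Log suspicious behavior in .agent/violations/ANOMALIES.md
5. Log hard validation failures in .agent/violations/
6. Record recovery steps in .agent/audit/ROLLBACKS.md
7. Extract reusable learnings in .agent/learnings/
8. Promote stable improvements into .agent/learnings/STRATEGIES.md
9. Summarize the run in report.md
Good Fit
- High-risk or privacy-sensitive tasks
- Tasks where you need a human-readable transcript
- Demos and evaluations
- Incident reviews and postmortems
Example
See examples/customer-feedback-demo/ for a full run showing:
- intent capture
- per-action validation
- anomaly detection
- blocked violation
- rollback
- learning promotion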
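Pre-Execution Validation
Before each action, validate:
1. Goal Alignment: Does this action serve the stated goal?
2. Constraint Check: Does it respect all boundaries?
3. Behavior Match: Does it fit expected patterns?
4. Authorization: Do we have permission for this?
If ANY check fails → block action, log violation.
Example Validation
yaml
Intent: "Process customer feedback files"
Constraints: ["Only read ./feedback", "No file modifications"]
Action: "delete ./feedback/temp.txt"
Validation:
  - Goal Alignment: ❌ Deleting is not processing
  - Constraint Check: ❌ Violates "No file modifications"
  - Behavior Match: ❌ This action is not expected for this task
  - Authorization: ✓ (but blocked by the other checks)
Result: BLOCKED → Log violation → Consider rollback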
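Logging Violations
When validation fails, log to .agent/violations/:
markdown
[VIO-YYYYMMDD-XXX] violation_type
Logged: ISO-8601 timestamp
Severity: low | medium | high | critical
Intent: INT-20250115-001
Status: pending review
What Happened
The action that was attempted
Validation Failures
- Goal Alignment: [reason]
- Constraint Check: [constraint violated]
- Behavior Match: [how it deviated]
Action Taken
- [ ] Action blocked
- [ ] Checkpoint rollback
- [ ] Alert sent
- [ ] Execution halted
Root Cause
Why the agent attempted this action (if it can be analyzed)
Prevention
How to prevent this in the future
Metadata
- Related Intent: INT-20250115-001
- Action Type: file_delete | api_call | command_execution
- Risk Level: high
- See Also: VIO-20250110-002 (if recurring)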
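Anomaly Detection
Monitor execution for behavioral anomalies:
Anomaly Types
| Type | Description | Response |
|---|---|---|
| Goal Drift | Actions diverging from stated goal | Halt, request clarification |
| Capability Misuse | Using tools inappropriately | Rollback to checkpoint |
| Side Effects | Unexpected consequences detected | Log warning, continue with monitoring |
| Resource Exceeded | CPU/memory/time limits breached | Throttle or halt |
| Pattern Deviation | Behavior differs from expected | Log for analysis |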
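Anomaly Logging
Log to .agent/violations/ANOMALIES.md:
markdown
[ANO-YYYYMMDD-XXX] anomaly_type
Detected: ISO-8601 timestamp
Severity: low | medium | high
Intent: INT-20250115-001
Anomaly Details
The anomalous behavior that was detected
Evidence
Assessment
Why this is an anomaly
Response Taken
- [ ] Continue monitoring
- [ ] Constraint applied
- [ ] Rolled back
- [ ] Execution halted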
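Learning Workflow
After task completion, log learnings to .agent/learnings/:
markdown
[LRN-YYYYMMDD-XXX] category
Logged: ISO-8601 timestamp
Intent: INT-20250115-001
Outcome: success | failure | partial
What Was Learned
The pattern or insight discovered
Evidence
- Success rate: 95%
- Execution time: 2.3s
- Actions taken: 15
- Checkpoints: 3
Strategy Impact
How this affects future execution
Application Scope
- Tasks: file_processing, data_transformation
- Risk levels: low, medium
- Conditions: when X and Y are true
Safety Check
- Complexity: low | medium | high
- Performance: comparison against baseline
- Risk: assessment
Metadata
- Category: pattern | optimization | error_handling | security
- Confidence: low | medium | high
- Sample size: N tasks observed
- Pattern key: file.batch_processing (if recurring)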
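Rollback Operations
Creating Checkpoints
Before risky operations:
typescript
// `agent` and `currentIntent` are supplied by the host agent runtime; this skill does not ship them.
const checkpoint = await agent.checkpoint.create({
  intent: currentIntent,
  reason: "Before bulk file operation"
});
Rollback on Violation
Roll back automatically when an intent is violated: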
typescript
//