Self-Improving Intent Security Agent

Install

```bash
npx skills add nishantapatil3/self-improving-intent-security-agent
```

Use this skill to structure and document intent validation workflows. It does not ship a production runtime engine that automatically intercepts agent actions; instead, it provides templates, examples, and local scripts that help you build, simulate, or document that workflow.

Scope Clarification

- This package includes markdown templates, examples, and helper shell scripts
- The helper shell scripts operate on local files only
- Automatic enforcement, anomaly detection, rollback execution, and learning application must be implemented by the host agent or surrounding system

Quick Reference

Situation	Action
Starting autonomous task	Capture intent specification (goal, constraints, expected behavior)
Before each action

Setup

Create .agent/ directory in project root:

```bash
mkdir -p .agent/{intents,violations,learnings,audit}
```

Copy templates from assets/ or create files with headers. Review the included shell scripts before running them if you want to understand exactly what they do.

For a complete conversation-driven working folder, scaffold a run pack:

```bash
./scripts/scaffold-run.sh examples/my-demo customer_feedback medium
```

This creates:

- conversation.md for the user/agent transcript
- report.md for the final summary
- a local .agent/ tree with intent, audit, violation, rollback, learning, and strategy files

Intent Specification Format

Before executing autonomous tasks, capture structured intent:

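```markdown
[INT-YYYYMMDD-XXX] task_name

Created: ISO-8601 timestamp
Risk Level: low | medium | high
Status: active | completed | violated

Goal

What you want to achieve (single, clear objective)

Constraints

- Boundary 1 (e.g., only modify files in ./src)
- Boundary 2 (e.g., no network calls)
- Boundary 3 (e.g., preserve existing test coverage)

Expected Behavior

- Pattern 1 (e.g., read files before modifying them)
- Pattern 2 (e.g., run tests after changes)
- Pattern 3 (e.g., create backups of modified files)

Context

- Related files: path/to/file.ext
- Environment: development | staging | production
- Previous attempts: INT-20250115-001 (if retry)
```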

Save to .agent/intents/INT-YYYYMMDD-XXX.md.
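For tooling that writes these files programmatically, the intent record can be modeled as a typed object and rendered into the layout above. The TypeScript below is a minimal sketch: `IntentSpec`, `renderIntent`, and `intentPath` are hypothetical names that are not part of this skill, and the Context section is omitted for brevity.

```typescript
// Hypothetical types mirroring the intent template; not an API shipped by this skill.
type RiskLevel = "low" | "medium" | "high";
type IntentStatus = "active" | "completed" | "violated";

interface IntentSpec {
  id: string; // TYPE-YYYYMMDD-XXX, e.g. "INT-20250115-001"
  taskName: string;
  createdAt: string; // ISO-8601 timestamp
  riskLevel: RiskLevel;
  status: IntentStatus;
  goal: string; // a single, clear objective
  constraints: string[];
  expectedBehavior: string[];
}

// Render the record in the same section order as the template
// (the Context section is left out to keep the sketch short).
function renderIntent(spec: IntentSpec): string {
  const bullets = (items: string[]): string => items.map((item) => `- ${item}`).join("\n");
  return [
    `[${spec.id}] ${spec.taskName}`,
    "",
    `Created: ${spec.createdAt}`,
    `Risk Level: ${spec.riskLevel}`,
    `Status: ${spec.status}`,
    "",
    "Goal",
    "",
    spec.goal,
    "",
    "Constraints",
    "",
    bullets(spec.constraints),
    "",
    "Expected Behavior",
    "",
    bullets(spec.expectedBehavior),
    "",
  ].join("\n");
}

// Storage location used in this document: .agent/intents/<ID>.md
function intentPath(spec: IntentSpec): string {
  return `.agent/intents/${spec.id}.md`;
}
```

Writing the returned string to `intentPath(spec)` is left to whatever file API the host agent provides.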

Validation Workflow

Conversation-Driven Workflow

Use this when you want the skill to document not just the intent, but the full user and agent interaction over time.

Recommended Sequence

1. Capture the user request in INLINECODE6
2. Translate it into a structured intent in INLINECODE7
3. Record allowed and blocked actions in INLINECODE8
4. Log suspicious behavior in INLINECODE9
5. Log hard validation failures in INLINECODE10
6. Record recovery steps in INLINECODE11
7. Extract reusable learnings in INLINECODE12
8. Promote stable improvements into INLINECODE13
9. Summarize the run in INLINECODE14

Good Fit

- High-risk or privacy-sensitive tasks
- Tasks where you need a human-readable transcript
- Demos and evaluations
- Incident reviews and postmortems

Example

See examples/customer-feedback-demo/ for a full run showing:

- intent capture
- per-action validation
- anomaly detection
- blocked violation
- rollback
- learning promotion

Pre-Execution Validation

Before each action, validate:

1. Goal Alignment: Does this action serve the stated goal?
2. Constraint Check: Does it respect all boundaries?
3. Behavior Match: Does it fit expected patterns?
4. Authorization: Do we have permission for this?

If ANY check fails → block action, log violation.
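The four checks and the block-on-any-failure rule can be sketched in TypeScript as independent predicates whose results are all collected, so a violation entry can list every failing check rather than only the first. This is an illustration, not part of this skill: `ProposedAction`, `Check`, `validateAction`, and action kinds such as `file_delete` are hypothetical, and the sample checks hard-code a read-only intent ("process customer feedback files", read only ./feedback, do not modify files) instead of loading one from `.agent/intents/`.

```typescript
// Hypothetical shapes; this skill does not ship a validation runtime.
interface ProposedAction {
  kind: string; // e.g. "file_read", "file_delete"
  target: string; // e.g. "./feedback/temp.txt"
}

interface CheckResult {
  name: string;
  passed: boolean;
  reason?: string; // only set when the check fails
}

type Check = (action: ProposedAction) => CheckResult;

interface ValidationOutcome {
  allowed: boolean;
  failures: CheckResult[];
}

// Run every check, then block the action if ANY of them failed.
function validateAction(action: ProposedAction, checks: Check[]): ValidationOutcome {
  const failures = checks.map((check) => check(action)).filter((result) => !result.passed);
  return { allowed: failures.length === 0, failures };
}

const makeResult = (name: string, passed: boolean, reason: string): CheckResult =>
  passed ? { name, passed } : { name, passed, reason };

// Example checks for a read-only "process customer feedback files" intent.
const readOnlyKinds = new Set(["file_read", "file_list"]);
const feedbackChecks: Check[] = [
  (a) => makeResult("goal_alignment", a.kind !== "file_delete", "deletion is not processing"),
  (a) =>
    makeResult(
      "constraint_check",
      readOnlyKinds.has(a.kind) && a.target.startsWith("./feedback/"),
      "only reads inside ./feedback are allowed",
    ),
  (a) => makeResult("behavior_match", readOnlyKinds.has(a.kind), "not an expected action for this task"),
  () => makeResult("authorization", true, ""), // assumes permission was granted
];

// Three checks fail here, so the delete is blocked and should be logged as a violation.
const outcome = validateAction({ kind: "file_delete", target: "./feedback/temp.txt" }, feedbackChecks);
```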

Example Validation

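```yaml
Intent: Process customer feedback files
Constraints: [Read only ./feedback, Don't modify files]

Action: delete ./feedback/temp.txt
Validation:
  - Goal Alignment: ❌ Deletion is not processing
  - Constraint Check: ❌ Violates "don't modify files"
  - Behavior Match: ❌ Not an expected action for this task
  - Authorization: ✓ (but blocked by other checks)

Result: BLOCKED → Log violation → Consider rollback
```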

Logging Violations

When validation fails, log to .agent/violations/:

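```markdown
[VIO-YYYYMMDD-XXX] violation_type

Logged: ISO-8601 timestamp
Severity: low | medium | high | critical
Intent: INT-20250115-001
Status: pending review

What Happened

The action that was attempted

Validation Failures

- Goal Alignment: [reason]
- Constraint Check: [violated constraint]
- Behavior Match: [how it deviated]

Actions Taken

- [ ] Action blocked
- [ ] Rolled back to checkpoint
- [ ] Alert sent
- [ ] Execution paused

Root Cause

Why the agent attempted this action (if it can be analyzed)

Prevention

How to prevent this in the future

Metadata

- Related intent: INT-20250115-001
- Action type: file_delete | api_call | command_execution
- Risk level: high
- See also: VIO-20250110-002 (if recurring)
```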

Anomaly Detection

Monitor execution for behavioral anomalies:

Anomaly Types

Type	Description	Response
Goal Drift	Actions diverging from stated goal	Halt, request clarification
Capability Misuse

Anomaly Logging

Log to .agent/violations/ANOMALIES.md:

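```markdown
[ANO-YYYYMMDD-XXX] anomaly_type

Detected: ISO-8601 timestamp
Severity: low | medium | high
Intent: INT-20250115-001

Anomaly Details

What unusual behavior was detected

Evidence

- Metrics that triggered the alert
- Baseline vs. actual values
- Timeline of the deviation

Assessment

Why this is anomalous

Response Taken

- [ ] Continued monitoring
- [ ] Constraints applied
- [ ] Rolled back
- [ ] Execution paused
```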

Learning Workflow

After task completion, log learnings to .agent/learnings/:

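```markdown
[LRN-YYYYMMDD-XXX] category

Logged: ISO-8601 timestamp
Intent: INT-20250115-001
Outcome: success | failure | partial

What Was Learned

Pattern or insight discovered

Evidence

- Success rate: 95%
- Execution time: 2.3s
- Actions taken: 15
- Checkpoints: 3

Strategy Impact

How this affects future executions

Application Scope

- Tasks: file_processing, data_transformation
- Risk levels: low, medium
- Conditions: when X and Y are true

Safety Check

- Complexity: low | medium | high
- Performance: compared against baseline
- Risk: assessed

Metadata

- Category: pattern | optimization | error_handling | security
- Confidence: low | medium | high
- Sample size: N tasks observed
- Pattern key: file.batch_processing (if recurring)
```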

Rollback Operations

Creating Checkpoints

Before risky operations:

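```typescript
// `agent` and `currentIntent` come from the host agent runtime (not shipped by this skill)
const checkpoint = await agent.checkpoint.create({
  intent: currentIntent,
  reason: "Before bulk file operation",
});
```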

Rollback on Violation

Automatic rollback when an intent is violated:

CODEBLOCK9
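To make the checkpoint-then-rollback cycle concrete, here is a minimal in-memory sketch in TypeScript. It assumes the workspace can be modeled as a map from file path to contents and keeps full copies as snapshots; `CheckpointStore` is a hypothetical class, and a real host agent would more likely snapshot files on disk or rely on version control.

```typescript
// Workspace modeled as path -> file contents (an assumption for illustration).
type Workspace = Map<string, string>;

interface Checkpoint {
  id: string;
  intentId: string; // e.g. "INT-20250115-001"
  reason: string; // why the checkpoint was taken
  snapshot: Workspace; // independent copy of the workspace at that moment
}

class CheckpointStore {
  private readonly checkpoints: Checkpoint[] = [];

  // Take a checkpoint before a risky operation.
  create(id: string, intentId: string, reason: string, workspace: Workspace): Checkpoint {
    const checkpoint: Checkpoint = { id, intentId, reason, snapshot: new Map(workspace) };
    this.checkpoints.push(checkpoint);
    return checkpoint;
  }

  // On a violation, restore the most recent checkpoint for the intent in place.
  // Returns the checkpoint that was used, or undefined if none exists.
  rollback(intentId: string, workspace: Workspace): Checkpoint | undefined {
    for (let i = this.checkpoints.length - 1; i >= 0; i--) {
      const checkpoint = this.checkpoints[i];
      if (checkpoint.intentId !== intentId) continue;
      workspace.clear();
      checkpoint.snapshot.forEach((contents, path) => workspace.set(path, contents));
      return checkpoint;
    }
    return undefined;
  }
}
```

Restoring in place (clearing, then copying the snapshot back) also removes files created after the checkpoint, which is what returning to the pre-operation state requires.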

Rollback Log

Track in .agent/audit/ROLLBACKS.md:

CODEBLOCK10

Strategy Evolution

When the agent learns better approaches:

A/B Testing

1. Baseline: Current strategy (90% of tasks)
2. Candidate: New strategy (10% of tasks)
3. Measure: Compare success rate, time, resource usage
4. Validate: Safety checks pass
5. Adopt: Roll out if candidate is 10%+ better
6. Rollback: Revert if candidate degrades performance
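The 90/10 split and the adoption rule can be sketched as two small TypeScript functions. Assumptions to note: "10%+ better" is read as a relative improvement in success rate only (time and resource usage are left out), `minTrials` is an arbitrary illustrative threshold, and all names here are hypothetical.

```typescript
// Per-arm results collected during the test (hypothetical shape).
interface ArmStats {
  successes: number;
  trials: number;
  safetyChecksPassed: boolean;
}

type Arm = "baseline" | "candidate";
type Decision = "adopt" | "rollback" | "keep_testing";

// Send about 10% of tasks to the candidate strategy and 90% to the baseline.
// The random source is injectable so assignment can be tested deterministically.
function assignArm(random: () => number = Math.random): Arm {
  return random() < 0.1 ? "candidate" : "baseline";
}

function successRate(stats: ArmStats): number {
  return stats.trials === 0 ? 0 : stats.successes / stats.trials;
}

// Adopt only if safety checks pass and the candidate's success rate is at least
// 10% higher (relative) than the baseline; roll back if it is unsafe or worse.
function decide(baseline: ArmStats, candidate: ArmStats, minTrials = 30): Decision {
  if (!candidate.safetyChecksPassed) return "rollback";
  if (candidate.trials < minTrials) return "keep_testing";
  const base = successRate(baseline);
  const cand = successRate(candidate);
  if (cand < base) return "rollback";
  return cand >= base * 1.1 ? "adopt" : "keep_testing";
}
```

Checking safety before any performance comparison means an unsafe candidate is rejected no matter how well it scores, matching the "Safety first" practice.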

Strategy Log

Track in .agent/learnings/STRATEGIES.md:

CODEBLOCK11

Promoting to Permanent Memory

When learnings are broadly applicable, promote to project files:

Promotion Targets

Target	What Belongs There
INLINECODE21	Intent patterns, common constraints for this project
INLINECODE22

When to Promote

Promote when:

- Violation occurs 3+ times (recurring constraint)
- Learning applies across multiple task types
- Strategy is adopted and proven (success rate 90%+)
- Security pattern prevents entire class of violations
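Read as a checklist, these criteria fit a small TypeScript function that returns the reasons a learning qualifies for promotion. This sketch assumes any single criterion is enough and that the inputs (recurrence counts, success rates, a reviewer's judgment) have already been gathered from the `.agent/` logs; `PromotionSignals` and `promotionReasons` are hypothetical names.

```typescript
// Hypothetical summary of what the .agent/ logs say about one learning.
interface PromotionSignals {
  violationOccurrences: number; // times the same violation has recurred
  taskTypesApplied: number; // distinct task types where the learning held
  strategyAdopted: boolean; // adopted after A/B testing
  strategySuccessRate: number; // 0..1
  preventsViolationClass: boolean; // reviewer judgment
}

// Returns every criterion that is met; an empty list means "do not promote yet".
// Assumes any single criterion is sufficient.
function promotionReasons(s: PromotionSignals): string[] {
  const reasons: string[] = [];
  if (s.violationOccurrences >= 3) reasons.push("recurring violation (3+ occurrences)");
  if (s.taskTypesApplied > 1) reasons.push("applies across multiple task types");
  if (s.strategyAdopted && s.strategySuccessRate >= 0.9) {
    reasons.push("adopted strategy with 90%+ success rate");
  }
  if (s.preventsViolationClass) reasons.push("prevents an entire class of violations");
  return reasons;
}
```

Returning the list of reasons, rather than a bare boolean, gives the promoted rule in CLAUDE.md or AGENTS.md a ready-made justification.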

Promotion Examples

Violation (recurring):

VIO-20250115-001: Attempted to modify files outside ./src

VIO-20250118-002: Attempted to modify files outside ./src

VIO-20250120-003: Attempted to modify files outside ./src

Promote to CLAUDE.md:
CODEBLOCK12

Learning (proven strategy):

LRN-20250115-005: Batch processing with checkpoints every 10 files

Results: 95% success, 40% faster, easy rollback on failures

Promote to AGENTS.md:
CODEBLOCK13

Configuration

Environment Variables

Important: All environment variables are optional. The skill works with sensible defaults without any configuration.

Security Note: This skill does NOT require any credentials or secrets. All data stays local in the .agent/ directory. No data is transmitted externally.

CODEBLOCK14

Configuration File

Create .agent/config.json:

CODEBLOCK15

ID Generation

Format: TYPE-YYYYMMDD-XXX

- INT: Intent specification
- VIO: Violation (failed validation)
- ANO: Anomaly (behavioral deviation)
- LRN: Learning (insight from execution)
- INLINECODE32: Strategy (new approach)
- INLINECODE33: Rollback operation
- INLINECODE34: Checkpoint

Examples: INT-20250115-001, VIO-20250115-A3F, INLINECODE37
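A minimal TypeScript sketch for building and parsing IDs in this format. It assumes a zero-padded per-day sequence for generated suffixes (as in INT-20250115-001), while the parser also accepts three-character alphanumeric suffixes (as in VIO-20250115-A3F). Using UTC for the date is another assumption, and `makeId` and `parseId` are hypothetical helpers.

```typescript
// Date portion of the ID, YYYYMMDD. Using UTC is an assumption; the format
// itself does not say which time zone to use.
function yyyymmdd(date: Date): string {
  const year = String(date.getUTCFullYear());
  const month = String(date.getUTCMonth() + 1).padStart(2, "0");
  const day = String(date.getUTCDate()).padStart(2, "0");
  return `${year}${month}${day}`;
}

// Build TYPE-YYYYMMDD-XXX from a type prefix (e.g. "INT", "VIO") and a
// per-day sequence number rendered as three zero-padded digits.
function makeId(type: string, date: Date, sequence: number): string {
  if (!/^[A-Z]+$/.test(type)) throw new Error(`invalid type prefix: ${type}`);
  if (!Number.isInteger(sequence) || sequence < 1 || sequence > 999) {
    throw new Error(`sequence out of range: ${sequence}`);
  }
  return `${type}-${yyyymmdd(date)}-${String(sequence).padStart(3, "0")}`;
}

interface ParsedId {
  type: string;
  date: string; // YYYYMMDD
  suffix: string; // "001", "A3F", ...
}

// Accept both numeric and uppercase alphanumeric three-character suffixes.
const ID_PATTERN = /^([A-Z]+)-(\d{8})-([0-9A-Z]{3})$/;

function parseId(id: string): ParsedId | null {
  const match = ID_PATTERN.exec(id);
  return match ? { type: match[1], date: match[2], suffix: match[3] } : null;
}
```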

Priority Guidelines

Priority/Severity	When to Use
INLINECODE38	Immediate security risk, data loss, system compromise
INLINECODE39

Best Practices

Intent Specification

1. Be specific - Vague goals lead to validation failures
2. List all constraints - Implicit boundaries often get violated
3. Define expected behavior - Helps catch deviations early
4. Set correct risk level - Triggers appropriate approval gates

Validation

1. Validate early - Before execution, not after
2. Fail safe - Block on doubt, don't assume permission
3. Log all violations - Even if they seem minor
4. Review regularly - Patterns emerge over time

Learning

1. Let it learn - Learning needs a sufficient sample size to be effective
2. Monitor A/B tests - Don't adopt blindly
3. Safety first - Reject strategies that reduce safety
4. Promote proven patterns - Turn learnings into permanent rules

Audit

1. Keep detailed logs - Debugging requires context
2. Archive old logs - Retention policies prevent bloat
3. Review anomalies - Often reveal edge cases
4. Share learnings - Team benefits from documented patterns

Detection Triggers

Automatically apply intent security when:

High-Risk Operations:

- File deletion or bulk modifications
- API calls with write permissions
- Command execution with elevated privileges
- Database modifications
- Deployment operations

Autonomous Workflows:

- Multi-step task sequences
- Background job execution
- Scheduled automation
- Agent-initiated operations

Learning Opportunities:

- Task completes successfully
- Failure with identifiable cause
- User provides correction
- Better approach discovered
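The first two trigger groups translate into a simple gate that decides whether an operation should go through intent validation. In this TypeScript sketch the operation categories and the `Operation` fields are hypothetical labels chosen to mirror the lists above; the learning-opportunity triggers are event-driven and are not modeled.

```typescript
// Hypothetical operation categories mirroring the trigger lists above.
type OperationKind =
  | "file_delete"
  | "bulk_modify"
  | "api_write"
  | "elevated_command"
  | "db_modify"
  | "deploy"
  | "file_read"
  | "api_read";

interface Operation {
  kind: OperationKind;
  initiatedByAgent: boolean; // agent-initiated rather than user-requested
  partOfMultiStepTask: boolean; // one step in a multi-step sequence
  background: boolean; // background job or scheduled automation
}

// High-risk operations from the first list.
const HIGH_RISK_KINDS: ReadonlySet<OperationKind> = new Set<OperationKind>([
  "file_delete",
  "bulk_modify",
  "api_write",
  "elevated_command",
  "db_modify",
  "deploy",
]);

// Apply intent security if the operation is high-risk OR runs autonomously.
function requiresIntentSecurity(op: Operation): boolean {
  const autonomous = op.initiatedByAgent || op.partOfMultiStepTask || op.background;
  return HIGH_RISK_KINDS.has(op.kind) || autonomous;
}
```

Treating any autonomous context as a trigger errs on the side of validating more, in line with the "Fail safe" practice.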

Hook Integration (Optional)

Enable automatic intent validation through agent hooks.

Setup (Claude Code / Codex)

Create .claude/settings.json:

CODEBLOCK16

Available Hook Scripts

Script	Hook Type	Purpose
INLINECODE43	UserPromptSubmit	Prompts for intent specification
INLINECODE44

See references/hooks-setup.md for detailed configuration.

Quick Commands

CODEBLOCK17

Examples

See examples/README.md for detailed usage examples:

- Basic intent specification and validation
- Handling violations and rollbacks
- Learning from task outcomes
- Strategy evolution through A/B testing
- Security monitoring and anomaly detection

References

- Architecture - System design and components
- Intent Security - Validation and authorization
- Self-Improvement - Learning mechanisms
- Hooks Setup - Automation configuration
- API Reference - Programmatic usage

Multi-Agent Support

Works with Claude Code, Codex CLI, GitHub Copilot, and OpenClaw. See references/multi-agent.md for agent-specific configurations.

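Safety Guarantees

These properties hold only when the host agent or surrounding system implements the enforcement described under Scope Clarification: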

✓ Intent Alignment - Every action validated against goal
✓ Permission Boundaries - Cannot exceed authorized scope
✓ Reversibility - Checkpoint-based rollback
✓ Auditability - Complete action history
✓ Bounded Learning - Safety-constrained improvements
✓ Human Oversight - Approval gates for high-risk operations

License

MIT

Note: This skill provides strong safety mechanisms but requires proper configuration and usage. Always:

- Define clear, specific intents
- Review violation logs regularly
- Monitor learning effectiveness
- Keep approval gates enabled for high-risk operations
- Test in non-production environments first

Intent-based security is a powerful approach, but human judgment remains essential.

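Self-Improving Intent Security Agent

Install

```bash
npx skills add nishantapatil3/self-improving-intent-security-agent
```

Use this skill to build and document intent validation workflows. It does not provide a production runtime engine that automatically intercepts agent actions; instead, it provides templates, examples, and local scripts that help you build, simulate, or document that workflow.

Scope Clarification

- This package includes Markdown templates, examples, and helper shell scripts
- The helper shell scripts operate on local files only
- Automatic enforcement, anomaly detection, rollback execution, and learning application must be implemented by the host agent or surrounding system

Quick Reference

Situation	Action
Starting autonomous task	Capture intent specification (goal, constraints, expected behavior)
Before each action

Setup

```bash
mkdir -p .agent/{intents,violations,learnings,audit}
```

Copy templates from assets/ or create files with headers. If you want to know exactly what the shell scripts do, review them before running them.

For a complete conversation-driven working folder, scaffold a run pack:

```bash
./scripts/scaffold-run.sh examples/my-demo customer_feedback medium
```

This creates:

- conversation.md for the user/agent transcript
- report.md for the final summary
- a local .agent/ tree with intent, audit, violation, rollback, learning, and strategy files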

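Intent Specification Format

Before executing autonomous tasks, capture structured intent:

```markdown
[INT-YYYYMMDD-XXX] task_name

Created: ISO-8601 timestamp
Risk Level: low | medium | high
Status: active | completed | violated

Goal

What you want to achieve (single, clear objective)

Constraints

- Boundary 1 (e.g., only modify files in ./src)
- Boundary 2 (e.g., no network calls)
- Boundary 3 (e.g., preserve existing test coverage)

Expected Behavior

- Pattern 1 (e.g., read files before modifying them)
- Pattern 2 (e.g., run tests after changes)
- Pattern 3 (e.g., create backups of modified files)

Context

- Related files: path/to/file.ext
- Environment: development | staging | production
- Previous attempts: INT-20250115-001 (if retry)
```

Save to .agent/intents/INT-YYYYMMDD-XXX.md.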

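Validation Workflow

Conversation-Driven Workflow

Use this workflow when you want the skill to document not just the intent, but the full user and agent interaction over time.

Good Fit

- High-risk or privacy-sensitive tasks
- Tasks that need a human-readable transcript
- Demos and evaluations
- Incident reviews and postmortems

Example

See examples/customer-feedback-demo/ for a full run showing:

- intent capture
- per-action validation
- anomaly detection
- blocked violation
- rollback
- learning promotion

Pre-Execution Validation

Before each action, validate:

1. Goal Alignment: Does this action serve the stated goal?
2. Constraint Check: Does it respect all boundaries?
3. Behavior Match: Does it fit expected patterns?
4. Authorization: Do we have permission for this?

If ANY check fails → block action, log violation.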

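Example Validation

```yaml
Intent: Process customer feedback files
Constraints: [Read only ./feedback, Don't modify files]

Action: delete ./feedback/temp.txt
Validation:
  - Goal Alignment: ❌ Deletion is not processing
  - Constraint Check: ❌ Violates "don't modify files"
  - Behavior Match: ❌ Not an expected action for this task
  - Authorization: ✓ (but blocked by other checks)

Result: BLOCKED → Log violation → Consider rollback
```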

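Logging Violations

When validation fails, log to .agent/violations/:

```markdown
[VIO-YYYYMMDD-XXX] violation_type

Logged: ISO-8601 timestamp
Severity: low | medium | high | critical
Intent: INT-20250115-001
Status: pending review

What Happened

The action that was attempted

Validation Failures

- Goal Alignment: [reason]
- Constraint Check: [violated constraint]
- Behavior Match: [how it deviated]

Actions Taken

- [ ] Action blocked
- [ ] Rolled back to checkpoint
- [ ] Alert sent
- [ ] Execution paused

Root Cause

Why the agent attempted this action (if it can be analyzed)

Prevention

How to prevent this in the future

Metadata

- Related intent: INT-20250115-001
- Action type: file_delete | api_call | command_execution
- Risk level: high
- See also: VIO-20250110-002 (if recurring)
```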

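Anomaly Detection

Monitor execution for behavioral anomalies:

Anomaly Types

Type	Description	Response
Goal Drift	Actions diverging from stated goal	Halt, request clarification
Capability Misuse

Anomaly Logging

Log to .agent/violations/ANOMALIES.md:

```markdown
[ANO-YYYYMMDD-XXX] anomaly_type

Detected: ISO-8601 timestamp
Severity: low | medium | high
Intent: INT-20250115-001

Anomaly Details

What unusual behavior was detected

Evidence

- Metrics that triggered the alert
- Baseline vs. actual values
- Timeline of the deviation

Assessment

Why this is anomalous

Response Taken

- [ ] Continued monitoring
- [ ] Constraints applied
- [ ] Rolled back
- [ ] Execution paused
```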

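Learning Workflow

After task completion, log learnings to .agent/learnings/:

```markdown
[LRN-YYYYMMDD-XXX] category

Logged: ISO-8601 timestamp
Intent: INT-20250115-001
Outcome: success | failure | partial

What Was Learned

Pattern or insight discovered

Evidence

- Success rate: 95%
- Execution time: 2.3s
- Actions taken: 15
- Checkpoints: 3

Strategy Impact

How this affects future executions

Application Scope

- Tasks: file_processing, data_transformation
- Risk levels: low, medium
- Conditions: when X and Y are true

Safety Check

- Complexity: low | medium | high
- Performance: compared against baseline
- Risk: assessed

Metadata

- Category: pattern | optimization | error_handling | security
- Confidence: low | medium | high
- Sample size: N tasks observed
- Pattern key: file.batch_processing (if recurring)
```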

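Rollback Operations

Creating Checkpoints

Before risky operations:

```typescript
// `agent` and `currentIntent` come from the host agent runtime (not shipped by this skill)
const checkpoint = await agent.checkpoint.create({
  intent: currentIntent,
  reason: "Before bulk file operation",
});
```

Rollback on Violation

Automatic rollback when an intent is violated:

```typescript
//
```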
