Memory Self-Heal Skill
Use this skill when the agent starts failing repeatedly, stalls, or keeps asking the user for steps that could be inferred from prior evidence.
Goals
1. Recover execution without user micromanagement
2. Reuse previous fixes from memory/logs/tasks
3. Escalate only with minimal unblock input when truly blocked
4. Leave reusable evidence for future runs
When To Trigger
Trigger when any of these appear:
- The same or a similar error occurs 2+ times in one task
- Tool call fails due to argument mismatch, missing config, auth wall, or context overflow
- Agent claims completion without verifiable artifact
- Task progress stalls (no new artifact across 2 cycles)
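The trigger conditions above can be sketched as a single predicate. This is a minimal illustration, not part of the skill itself: `TaskState` and all of its field names are hypothetical, and "similar" errors are assumed to have already been normalized to the same signature string.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class TaskState:
    """Hypothetical snapshot of one task; every field name is an assumption."""
    error_signatures: list[str] = field(default_factory=list)  # one entry per failure
    tool_call_failed: bool = False  # arg mismatch, missing config, auth wall, overflow
    claimed_done: bool = False
    artifacts: list[str] = field(default_factory=list)
    cycles_without_new_artifact: int = 0


def should_trigger(state: TaskState) -> bool:
    """Return True when any of the listed trigger conditions holds."""
    repeated = any(
        state.error_signatures.count(sig) >= 2
        for sig in set(state.error_signatures)
    )
    false_done = state.claimed_done and not state.artifacts
    stalled = state.cycles_without_new_artifact >= 2
    return repeated or state.tool_call_failed or false_done or stalled
```

For example, `should_trigger(TaskState(error_signatures=["timeout", "timeout"]))` fires on the repeated-error condition.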
Inputs
- Current task objective
- Latest error/output
- Available evidence locations (memory, tasks, logs)
Portable Evidence Scan Order
Scan these in order; skip missing paths silently:
1. memory/ (or equivalent workspace memory path)
2. tasks/ or queue files
3. runtime logs / channel logs
4. skill docs (skills/*/SKILL.md) for known fallback recipes
5. core docs (TOOLS.md, CAPABILITIES.md, AGENTS.md)
Shell examples (use whichever shell is active):
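    # PowerShell
    Get-ChildItem -Path memory, tasks -Recurse -File -ErrorAction SilentlyContinue |
      Select-String -Pattern 'error|blocked|retry|fallback|auth|token|proxy|timeout|context' -Context 2

    # POSIX shell
    rg -n 'error|blocked|retry|fallback|auth|token|proxy|timeout|context' memory tasks 2>/dev/null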
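Where neither shell tool is available, the same scan can be approximated in Python. This is a sketch under assumptions: `scan_evidence` is a hypothetical helper, the keyword list mirrors the shell examples, and matching is case-insensitive (which matches Select-String's default but not ripgrep's).

```python
import re
from pathlib import Path

# Same keyword list as the shell examples; case-insensitivity is an assumption.
PATTERN = re.compile(
    r"error|blocked|retry|fallback|auth|token|proxy|timeout|context",
    re.IGNORECASE,
)


def scan_evidence(roots, pattern=PATTERN):
    """Yield (path, line_number, line) for matching lines, in scan order."""
    for root in roots:
        base = Path(root)
        if not base.exists():
            continue  # skip missing paths silently
        if base.is_file():
            files = [base]
        else:
            files = sorted(p for p in base.rglob("*") if p.is_file())
        for path in files:
            try:
                text = path.read_text(encoding="utf-8", errors="ignore")
            except OSError:
                continue  # unreadable files are skipped, not fatal
            for number, line in enumerate(text.splitlines(), start=1):
                if pattern.search(line):
                    yield path, number, line.strip()
```

Passing the roots in the documented order, for example `scan_evidence(["memory", "tasks", "logs"])`, keeps memory hits ahead of log hits.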
Failure Classification
Classify first, then act:
- syntax_or_args: command syntax/argument mismatch
- auth_or_config: key/token/env/config missing or invalid
- network_or_reachability: timeout, DNS, handshake, region restrictions
- ui_login_wall: page requires manual login/attach
- resource_limit: context window, rate limit, memory pressure
- false_done: no artifact/evidence but reported complete
- unknown: no confident class
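A first-match rule table is one simple way to implement this classification step. The sketch below uses the class labels syntax_or_args, auth_or_config, network_or_reachability, ui_login_wall, resource_limit, false_done, and unknown; the regex patterns and the rule order are illustrative assumptions and would need tuning against real error output.

```python
import re

# Ordered (class, pattern) rules; the first match wins.
# Patterns are illustrative guesses, not an exhaustive catalogue.
RULES = [
    ("false_done", r"no artifact|claimed (done|complete)"),
    ("ui_login_wall", r"login required|sign in to continue|captcha"),
    ("resource_limit", r"context (length|window)|rate limit|429|out of memory"),
    ("network_or_reachability", r"timed? ?out|dns|handshake|connection (refused|reset)"),
    ("auth_or_config", r"unauthorized|forbidden|401|403|invalid (api )?key|missing (env|config)"),
    ("syntax_or_args", r"unknown (option|flag)|unrecognized argument|invalid argument|syntax error|usage:"),
]


def classify(message: str) -> str:
    """Map raw error text to a failure class, or 'unknown' if nothing matches."""
    text = message.lower()
    for label, pattern in RULES:
        if re.search(pattern, text):
            return label
    return "unknown"
```

Returning "unknown" rather than guessing keeps the later recovery tiers from applying a fix meant for a different class.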
Recovery Policy (3-Tier)
Attempt 1: Direct Fix
- Apply best-known fix from memory for same class/signature
- Re-run the smallest validating action
- Record result
Attempt 2: Safe Fallback
- Switch to alternate tool/path with lower fragility
- Narrow scope (smaller input, shorter query, one target)
- Re-run validation
Attempt 3: Controlled Escalation
- Mark the task as blocked and state the minimum unblock input
- Provide the exact next action the user must take (one command or one UI step)
- Do not loop further until new input arrives
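The three tiers can be sketched as a short driver that tries each attempt once and stops. This is an illustration only: `recover`, the `Action` type, and the shape of the returned dict are assumptions, and each action is expected to apply its fix and then run the smallest validating step itself.

```python
from typing import Callable, Optional

# An action applies a fix, re-runs the smallest validation, and returns True on success.
Action = Callable[[], bool]


def recover(direct_fix: Optional[Action], fallback: Optional[Action],
            unblock_hint: str) -> dict:
    """Run Attempt 1, then Attempt 2; escalate with one exact next action."""
    attempts = []
    for name, action in (("direct_fix", direct_fix), ("safe_fallback", fallback)):
        if action is None:
            attempts.append((name, "skipped"))
            continue
        try:
            ok = action()
        except Exception as exc:  # a crashing fix counts as a failed attempt
            ok = False
            attempts.append((name, f"raised {type(exc).__name__}"))
        else:
            attempts.append((name, "ok" if ok else "failed"))
        if ok:
            return {"final": "success", "attempts": attempts, "next_action": None}
    # Attempt 3: controlled escalation; the caller must not loop until new input arrives.
    return {"final": "blocked", "attempts": attempts, "next_action": unblock_hint}
```

The function never retries on its own, so the caller decides whether new user input justifies another cycle.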
Safety Rules
- Never auto-run destructive operations without confirmation
- Never log secrets/tokens in memory files
- Max 3 retries per blocker signature per task
- Prefer deterministic steps over broad speculative retries
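Two of these rules lend themselves to small guards: the retry cap per blocker signature and keeping secrets out of memory files. The sketch below is hedged accordingly: `RetryBudget` and `redact` are hypothetical helpers, and the secret pattern only covers simple key=value or key: value pairs, so it should not be treated as a complete scrubber.

```python
import re
from collections import Counter

MAX_RETRIES = 3  # per blocker signature, per task


class RetryBudget:
    """Tracks retries per signature; create one instance per task."""

    def __init__(self, limit: int = MAX_RETRIES):
        self.limit = limit
        self.used = Counter()

    def try_acquire(self, signature: str) -> bool:
        """Return True and consume one retry, or False when the budget is spent."""
        if self.used[signature] >= self.limit:
            return False
        self.used[signature] += 1
        return True


# Illustrative pattern: a secret-looking key followed by '=' or ':' and a value.
SECRET = re.compile(r"(?i)\b(token|api[_-]?key|password|secret)\b(\s*[=:]\s*)(\S+)")


def redact(text: str) -> str:
    """Replace secret-looking values before text is written to memory."""
    return SECRET.sub(r"\1\2[REDACTED]", text)
```

Calling `redact` on every line before a memory write, and `try_acquire` before every retry, keeps both rules enforced in one place.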
Completion Contract
Do not claim done unless all are true:
- At least one artifact exists and is readable (file/link/output)
- The original task objective is explicitly mapped to artifact(s)
- No unresolved blocker for current objective
Required output block:
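    Completion Check
    - Objective met: yes/no
    - Artifact: <path or URL or command output reference>
    - Validation: <what was checked>
    - Remaining blocker: <none or exact unblock input>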
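The contract can be enforced mechanically before any "done" message is sent. The following sketch assumes artifacts are local files (links or command output would need their own readability check); `completion_check` and its report labels are illustrative rather than a fixed format.

```python
import os


def completion_check(objective_met, artifact_paths, validation, blocker="none"):
    """Return (done, report); done is True only if every contract item holds."""
    # Assumption: artifacts are local files that must exist and be readable.
    readable = [p for p in artifact_paths
                if os.path.isfile(p) and os.access(p, os.R_OK)]
    done = bool(objective_met) and bool(readable) and blocker == "none"
    report = "\n".join([
        "Completion Check",
        f"- Objective met: {'yes' if objective_met else 'no'}",
        f"- Artifact: {', '.join(readable) if readable else 'none'}",
        f"- Validation: {validation}",
        f"- Remaining blocker: {blocker}",
    ])
    return done, report
```

An agent would emit `report` verbatim and claim completion only when `done` is true.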
Memory Writeback Template
Append one concise entry after each self-heal cycle:
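    Self-heal: <date-time> <short task>
    - Signature: <normalized error signature>
    - Class: <classification>
    - Attempt 1: <action> -> <result>
    - Attempt 2: <action> -> <result>
    - Final: <success | blocked>
    - Artifact/Evidence: <path|url|log ref>
    - Reusable rule: <one-line rule>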
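Appending the entry can be automated so that every cycle leaves the same shape of evidence. This is a sketch under assumptions: `append_self_heal_entry`, its parameters, the field labels, and the timestamp format are all hypothetical choices, and callers are expected to redact secrets before passing text in.

```python
from datetime import datetime, timezone
from pathlib import Path


def append_self_heal_entry(memory_file, task, signature, failure_class,
                           attempts, final, evidence, rule, now=None):
    """Append one concise self-heal entry and return the text that was written."""
    stamp = (now or datetime.now(timezone.utc)).strftime("%Y-%m-%d %H:%M")
    lines = [
        f"Self-heal: {stamp} {task}",
        f"- Signature: {signature}",
        f"- Class: {failure_class}",
    ]
    # attempts is a list of (action, result) pairs, numbered from 1
    lines += [f"- Attempt {i}: {action} -> {result}"
              for i, (action, result) in enumerate(attempts, start=1)]
    lines += [
        f"- Final: {final}",
        f"- Artifact/Evidence: {evidence}",
        f"- Reusable rule: {rule}",
    ]
    entry = "\n".join(lines) + "\n\n"
    path = Path(memory_file)
    path.parent.mkdir(parents=True, exist_ok=True)
    with path.open("a", encoding="utf-8") as handle:  # append, never overwrite
        handle.write(entry)
    return entry
```

Opening the file in append mode keeps earlier entries intact, which is what makes the reusable rules available to later runs.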
Generic Known Fixes (Seed Set)
- Command mismatch on Windows: prefer native PowerShell cmdlets
- Token mismatch/auth failure: verify active config source and token scope
- WebSocket/timeouts: test reachability + proxy/no_proxy consistency
- Context overflow: split task into smaller units and reduce payload
- False completion: enforce artifact validation before final response
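One way to use the seed set is as a fallback behind rules learned from memory. In the sketch below, the pairing of each seed fix with a failure class is an assumption made for illustration, and `best_known_fix` is a hypothetical helper.

```python
# Seed fixes keyed by failure class; the class-to-fix pairing is an assumption.
SEED_FIXES = {
    "syntax_or_args": "On Windows, prefer native PowerShell cmdlets",
    "auth_or_config": "Verify the active config source and token scope",
    "network_or_reachability": "Test reachability and proxy/no_proxy consistency",
    "resource_limit": "Split the task into smaller units and reduce payload",
    "false_done": "Enforce artifact validation before the final response",
}


def best_known_fix(failure_class, learned=None):
    """Prefer a rule learned from memory, then the seed set, else None."""
    if learned and failure_class in learned:
        return learned[failure_class]
    return SEED_FIXES.get(failure_class)
```

A `None` result means no known fix exists for that class, so the direct-fix attempt can be skipped.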
Integration Notes
- Works with autonomy/task-tracker skills but does not depend on them
- If a project has custom memory paths, adapt scan roots dynamically
- Keep entries short to avoid memory bloat