agent-failure-loop

If the same mistake repeats three times, a rule is automatically created.

Agents lose their memory when a session ends. They make the same mistake yesterday, today, and tomorrow.
This skill builds an end-to-end self-improvement loop that automatically detects → classifies → tracks → promotes failures into rules.

1. Why Do Agents Repeat the Same Mistakes?
2. Architecture
3. 5-Layer Pipeline
4. Failure Type Classification
5. Recording Format
6. Promotion Conditions and Logic
7. Installation
8. Quick Start (5 Minutes)
9. Cron Integration
10. Before/After Demo
11. Comparison with Competing Skills
12. Cross-Platform Configuration
13. Script Reference
14. FAQ

Why Do Agents Repeat the Same Mistakes?

Structural limitations of AI agents:

Problem	Cause	Result
No memory between sessions	Context is limited to the current session	Yesterday's failure is repeated today
No failure records

Problems with existing approaches:

- Manual rule addition: Humans manually write rules in AGENTS.md → tedious and frequently missed
- Conversation learning (self-improving-agent): Analyzes only conversation patterns → cannot detect execution failures → no enforcement
- Python self-improvement (actual-self-improvement): Implementation exists, but → no cron integration → no auto-promotion → ultimately manual

agent-failure-loop bridges all these gaps:

CODEBLOCK0

Architecture

CODEBLOCK1

5-Layer Pipeline

Layer 0: Event Occurrence

Failure events naturally occur during the agent's normal workflow.

Detection Criteria:

Event	Detection Method	Example
Tool execution failure	exit code ≠ 0, error message	npm install failure, API 4xx/5xx
User correction

What the agent should do: Record immediately to Layer 1 when the above events are detected. This behavior should be specified as a rule in AGENTS.md/CLAUDE.md.
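The exit-code criterion above can be illustrated with a minimal sketch. The function name `detect_tool_failure` and the shape of the returned event dictionary are assumptions made for illustration; they are not part of the skill's scripts.

```python
import subprocess
import sys
from typing import List, Optional


def detect_tool_failure(cmd: List[str]) -> Optional[dict]:
    """Run a command and return a failure event if its exit code is non-zero."""
    result = subprocess.run(cmd, capture_output=True, text=True)
    if result.returncode == 0:
        return None  # success: nothing to record
    return {
        "type": "ERROR",  # failure type named in this document
        "command": " ".join(cmd),
        "exit_code": result.returncode,
        "message": result.stderr.strip()[:200],  # keep the record short
    }


if __name__ == "__main__":
    # A child Python process that exits with status 2 stands in for a failing tool.
    print(detect_tool_failure([sys.executable, "-c", "import sys; sys.exit(2)"]))
```

An agent-side rule would hand any non-`None` event straight to Layer 1 rather than collecting events for later.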

Layer 1: Raw Recording (Immediate, Real-time)

Record in memory/failures/YYYY-MM-DD.md immediately upon failure detection.

Key Principles:

- Record immediately after detection (not batched)
- Agent records directly (no script needed)
- One file per day, accumulated chronologically
- Structured format (see Recording Format below)

Directory Structure:
CODEBLOCK2
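Although the agent normally writes these records itself, the "one file per day, append in order" principle can be sketched in a few lines. The entry layout below is a hypothetical stand-in (the authoritative layout is the one in the Recording Format section), and `record_failure` is an illustrative name.

```python
from datetime import datetime
from pathlib import Path
from typing import Optional


def record_failure(base_dir: Path, ftype: str, title: str, cause: str,
                   lesson: str, now: Optional[datetime] = None) -> Path:
    """Append one failure entry to base_dir/YYYY-MM-DD.md and return the path."""
    now = now or datetime.now()
    base_dir.mkdir(parents=True, exist_ok=True)
    path = base_dir / f"{now:%Y-%m-%d}.md"
    # Hypothetical entry layout; adapt it to the documented recording format.
    entry = (
        f"## [{ftype}] {title}\n"
        f"- time: {now:%H:%M}\n"
        f"- cause: {cause}\n"
        f"- lesson: {lesson}\n\n"
    )
    with path.open("a", encoding="utf-8") as fh:  # append keeps chronological order
        fh.write(entry)
    return path
```

Opening the file in append mode means repeated failures on the same day accumulate in one file without any read-modify-write step.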

Layer 2: Structured Analysis (Batch)

sync-learnings.py parses failures/ and generates structured analysis results in .learnings/.

Execution timing: Cron (daily-reflection) or manual execution

Input: memory/failures/*.md
Output:
CODEBLOCK3

Processing Steps:

1. Parse all .md files in failures/ in date order
2. Extract type, cause, and lesson from each entry
3. Normalize cause text to generate pattern keys (MD5 hash)
4. Group by identical pattern keys
5. Patterns with 3+ repetitions → registered as promotion candidates in promotable.json
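Steps 3 to 5 can be sketched as follows. This is a minimal illustration, not the code of sync-learnings.py: the normalization shown (trim, lowercase, collapse whitespace) is an assumption, and the field names in the returned candidates are hypothetical.

```python
import hashlib
import re
from collections import defaultdict
from typing import Dict, List


def pattern_key(cause: str) -> str:
    """Normalize whitespace and case, then hash the cause text with MD5."""
    normalized = re.sub(r"\s+", " ", cause.strip().lower())
    return hashlib.md5(normalized.encode("utf-8")).hexdigest()


def find_promotable(entries: List[Dict[str, str]], min_count: int = 3) -> List[dict]:
    """Group entries by pattern key and keep groups seen at least min_count times."""
    groups = defaultdict(list)
    for entry in entries:
        groups[pattern_key(entry["cause"])].append(entry)
    return [
        {
            "pattern_key": key,
            "count": len(items),
            "type": items[0]["type"],
            "lesson": items[-1]["lesson"],  # most recent lesson wins
        }
        for key, items in groups.items()
        if len(items) >= min_count
    ]
```

Because the key is a hash of the normalized text, "Selector not found" and "selector  NOT found" fall into one group, while a paraphrase such as "element missing" forms a separate group.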

Layer 3: Pattern Detection + Rule Promotion (Automatic)

auto-promote.py reads promotable.json and automatically inserts rules into the target file.

Promotion condition: Same pattern repeated 3+ times (configurable)

Target files (configurable):

- AGENTS.md: OpenClaw, general-purpose
- CLAUDE.md: Claude Code
- .cursorrules: Cursor IDE
- Custom file: --target option

Deduplication: Previously promoted pattern keys are recorded in .learnings/promoted.json to prevent duplicate insertion
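A minimal sketch of this deduplication step, assuming promoted.json holds a plain JSON list of pattern keys and that each candidate carries a `pattern_key` field (both are assumptions; the real file layout may differ). The function name `claim_new_patterns` is illustrative.

```python
import json
from pathlib import Path
from typing import List


def claim_new_patterns(candidates: List[dict], promoted_path: Path,
                       force: bool = False) -> List[dict]:
    """Return candidates not yet promoted and record their keys as promoted."""
    promoted = set()
    if promoted_path.exists():
        promoted = set(json.loads(promoted_path.read_text(encoding="utf-8")))
    # With force=True every candidate passes, mirroring the --force option.
    fresh = [c for c in candidates if force or c["pattern_key"] not in promoted]
    promoted.update(c["pattern_key"] for c in fresh)
    promoted_path.parent.mkdir(parents=True, exist_ok=True)
    promoted_path.write_text(json.dumps(sorted(promoted), indent=2), encoding="utf-8")
    return fresh
```

Running the function twice with the same candidates returns them the first time and an empty list the second time, which is the behavior that keeps a rule from being inserted twice.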

Layer 4: Execution Guardrail (Pre-task Lookup)

Query similar failures before starting a new task to provide advance warnings.

Implementation method (agent rule):
CODEBLOCK4

failure-matcher.py behavior:

1. Load INLINECODE16
2. Compare task keywords against past failure titles/causes (simple keyword matching)
3. Output failure records with high similarity
4. Agent reads the output and references the lessons

Note: failure-matcher.py is not provided separately. A simple implementation example is shown in Quick Start below. Using grep on sync-learnings.py output files is a sufficient alternative.
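For orientation, the keyword-matching behavior described above might look like the sketch below. It is an independent illustration, not the Quick Start example: the record fields (`title`, `cause`, `lesson`), the `min_overlap` threshold, and the function name are assumptions.

```python
import re
from typing import Dict, List


def _words(text: str) -> set:
    """Lowercase a string and split it into a set of simple word tokens."""
    return set(re.findall(r"[a-z0-9_]+", text.lower()))


def match_failures(task: str, records: List[Dict[str, str]],
                   min_overlap: int = 2) -> List[Dict[str, str]]:
    """Return past failures sharing at least min_overlap keywords with the task."""
    task_words = _words(task)
    scored = []
    for rec in records:
        overlap = len(task_words & _words(f"{rec.get('title', '')} {rec.get('cause', '')}"))
        if overlap >= min_overlap:
            scored.append((overlap, rec))
    scored.sort(key=lambda pair: pair[0], reverse=True)  # best matches first
    return [rec for _, rec in scored]
```

Plain word overlap is crude (common words such as "failure" can inflate scores), so a real guardrail would likely filter stop words or raise `min_overlap`.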

Failure Type Classification

Four failure types are defined. All failures are classified as one of these.

ERROR — Execution Error

Tool/command/API execution failed.

Field	Description
Code	INLINECODE18
Detection

CORRECTION — User Correction

User corrected the agent's output.

Field	Description
Code	INLINECODE20
Detection	"No", "redo", "that's not it", "do it like this", correction instructions
Example	"Not that file, the one under src/", "The format is wrong"
Importance	Highest value; reflects user's implicit preferences

RETRY_EXCEEDED — Retry Limit Exceeded

Same task attempted 3+ times.

Field	Description
Code	INLINECODE21
Detection

MISUNDERSTAND — Instruction Misunderstanding

Generated output that doesn't match user's intended instruction.

Field	Description
Code	INLINECODE22
Detection

Recording Format

Raw Record (memory/failures/YYYY-MM-DD.md)

CODEBLOCK5

Example: Actual Records

CODEBLOCK6

Structured Analysis Output (.learnings/promotable.json)

CODEBLOCK7

Promotion Conditions and Logic

Promotion Conditions

Condition	Value	Configurable
Minimum repeat count	3 (default)	INLINECODE23 or `AFL_MIN_COUNT` env var
Same pattern determination
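How a script could resolve the minimum repeat count is sketched below. Only the default of 3 and the `AFL_MIN_COUNT` variable come from the table above; the precedence (command-line value first, then environment, then default) and the function name are assumptions.

```python
import os
from typing import Mapping, Optional

DEFAULT_MIN_COUNT = 3  # documented default


def resolve_min_count(cli_value: Optional[int] = None,
                      env: Optional[Mapping[str, str]] = None) -> int:
    """Pick the promotion threshold: CLI value, else AFL_MIN_COUNT, else 3."""
    env = os.environ if env is None else env
    if cli_value is not None:
        value = cli_value
    else:
        raw = env.get("AFL_MIN_COUNT", "").strip()
        value = int(raw) if raw else DEFAULT_MIN_COUNT
    if value < 1:
        raise ValueError("minimum repeat count must be at least 1")
    return value
```

Passing the environment as a parameter keeps the function easy to test without touching the real process environment.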

Promotion Process

CODEBLOCK8

Promotion Format (by Target)

AGENTS.md (agents-md):
CODEBLOCK9

CLAUDE.md (claude-md):
CODEBLOCK10

.cursorrules (cursorrules):
CODEBLOCK11

Generic (plain):

### [ERROR] Playwright selector failure
- **Rule:** Must verify selector existence with DOM dump before use
- **Count:** 3x
- **Promoted:** 2026-03-25

Installation

Zero-Config Installation (30 Seconds)

CODEBLOCK13

Add Agent Rules

Add the following rules to AGENTS.md (or CLAUDE.md, .cursorrules):

CODEBLOCK14

Environment Variables (Optional)

Variable	Default	Description
INLINECODE29	INLINECODE30	Failure records directory
INLINECODE31

Quick Start (5 Minutes)

Step 1: Install (30 Seconds)

CODEBLOCK15

Step 2: Generate Test Data (1 Minute)

CODEBLOCK16

Step 3: Run Analysis (30 Seconds)

CODEBLOCK17

Expected Output:
CODEBLOCK18

Step 4: Auto-Promote (30 Seconds)

CODEBLOCK19

Expected Output:
CODEBLOCK20

Step 5: Verify (30 Seconds)

CODEBLOCK21

Done in 5 minutes! Now when the agent makes the same mistake 3 times, a rule is automatically created.

Cron Integration

daily-reflection Cron Example

Runs automatically at 23:00 daily to analyze the day's failures and promote rules.

OpenClaw Cron Configuration:

CODEBLOCK22

Standard crontab Configuration:

CODEBLOCK23

weekly-skill-review Integration

CODEBLOCK24

Real-time + Batch Hybrid

Real-time (agent directly):

- Layer 0 → Layer 1: Record in failures/ immediately upon failure detection
- When the same type reaches 3 occurrences, immediately add a rule to AGENTS.md

Batch (cron):

- Layer 1 → Layer 2: Structured analysis via sync-learnings.py
- Layer 2 → Layer 3: Handle missed promotions via auto-promote.py
- Catches patterns missed by the agent's real-time detection as a second safety net

Before/After Demo

Before: Without Rules

CODEBLOCK25

Problem: Same mistake every time. No learning. User frustration ↑

After: With agent-failure-loop Applied

CODEBLOCK26

Result: No recurrence of the same mistake from Day 4 onward. Auto-learning complete.

Actual Auto-Promotion Simulation

CODEBLOCK27

Comparison with Competing Skills

Feature	agent-failure-loop	self-improving-agent	actual-self-improvement
Auto failure detection	✅ 4-type classification	❌ Conversation patterns only	⚠️ Manual trigger
Immediate recording

Why agent-failure-loop?

1. End-to-end: Covers the entire pipeline from detection to promotion
2. Enforcement: Promoted rules go into AGENTS.md/CLAUDE.md, which the agent must read
3. Automation: Combined with cron, the self-improvement loop runs without human intervention
4. Cross-platform: Not tied to any specific platform
5. Transparency: All failures and promotion processes are preserved as markdown files for auditing

Cross-Platform Configuration

Platform-Specific Configuration Examples

OpenClaw:
CODEBLOCK28

Claude Code:
CODEBLOCK29

Cursor IDE:
CODEBLOCK30

Codex / Others:
CODEBLOCK31

Custom Configuration File

Place .failure-loop.json at the project root to manage configuration via file instead of environment variables:

CODEBLOCK32

Note: The current version of scripts supports environment variables and CLI arguments. .failure-loop.json support is planned for a future version.

AGENTS.md Format Independence

This skill does not depend on a specific AGENTS.md format:

- --format agents-md: Adds rows to the "self-improvement rules" table in AGENTS.md. If the table doesn't exist, appends to the end of the file.
- INLINECODE45: Can append to any markdown file
- INLINECODE46: Can specify any file
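The "add a row to the table, otherwise append" behavior can be sketched as below. This assumes the table is a pipe-delimited markdown table sitting under a heading whose text contains a given keyword; the function name and the row layout are hypothetical and do not describe auto-promote.py's actual code.

```python
from typing import List


def insert_rule_row(lines: List[str], heading_keyword: str, row: str) -> List[str]:
    """Insert row after the last table row of the matching section, else append."""
    in_section = False
    last_row = None
    for i, line in enumerate(lines):
        stripped = line.strip()
        if stripped.startswith("#"):
            if in_section:
                break  # the next heading closes the section
            in_section = heading_keyword.lower() in stripped.lower()
        elif in_section and stripped.startswith("|"):
            last_row = i  # remember the most recent table line
    if last_row is None:
        return lines + [row]  # no table found: append to end of file
    return lines[: last_row + 1] + [row] + lines[last_row + 1:]
```

Working on a list of lines and returning a new list keeps the original file content untouched until the caller decides to write the result back.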

Script Reference

sync-learnings.py

Parses raw failure records from the failures/ directory and generates structured analysis results in .learnings/.

Usage:
CODEBLOCK33

Options:

Option	Default	Description
INLINECODE49	INLINECODE50	Failure records directory
INLINECODE51

Output Files:

- summary.json: Overall statistics
- repeated-patterns.md: Repeated pattern analysis
- promotable.json: Promotion candidate list
- by-type/*.md: Details by type

Dependencies: Python 3.8+ standard library only (hashlib, json, re, pathlib, etc.)

auto-promote.py

Reads promotion candidates from .learnings/promotable.json and automatically inserts rules into the target file.

Usage:
CODEBLOCK34

Options:

Option	Default	Description
INLINECODE60	INLINECODE61	Analysis results directory
INLINECODE62

Dependencies: Python 3.8+ standard library only

FAQ

Q: What if the agent doesn't record failures?

You need to add failure recording rules to AGENTS.md/CLAUDE.md. See "Add Agent Rules" in the Installation section. With the rules in place, the agent will automatically record upon failure detection. If the agent ignores the rules... that itself will be recorded as a CORRECTION.

Q: Won't the same rule be promoted twice?

.learnings/promoted.json records already-promoted pattern keys to prevent duplication. Forced re-promotion is possible with the --force option.

Q: If cause text is slightly different, will it be recognized as a different pattern?

The current version distinguishes patterns using MD5 hash of the cause text. Whitespace and case are normalized, but semantically identical causes with different wording will be recognized as separate patterns. It's recommended to specify in the rules that the agent should record causes with consistent wording.

Future improvement: Semantic similarity (embedding comparison) support planned.

Q: Can it be used in environments without Python?

Even without scripts, the agent can directly perform Layer 1 (recording) and Layer 3 (promotion). Scripts serve as a double safety net for batch analysis (Layer 2) and auto-promotion. The basic loop works with the agent's real-time detection alone.

Q: What if failure records accumulate too much?

Since sync-learnings.py generates summaries after analysis, raw records can be archived:
CODEBLOCK35

Q: Can it be shared across a team?

Committing the .learnings/ directory to git allows the entire team to share learning results. promoted.json prevents duplicate promotions, so it's safe for multiple people to use simultaneously.

Q: How do I use it with Claude Code without OpenClaw?

1. Add failure recording rules to CLAUDE.md (see Installation)
2. Use --target CLAUDE.md --format claude-md when running scripts
3. Use manual execution or OS crontab instead of OpenClaw cron

Q: Is it compatible with existing AGENTS.md self-improvement rules?

Fully compatible. auto-promote.py finds the "self-improvement rules" table in AGENTS.md and adds rows. If the table doesn't exist, it appends to the end of the file.

Q: Can I write failure records directly to .learnings/?

No. failures/ is raw data, .learnings/ is analysis results. The agent records only in failures/, and .learnings/ is auto-generated by sync-learnings.py. This separation ensures data integrity.

Q: What if a promoted rule is wrong?

Manually delete the rule from AGENTS.md. The pattern key remains in promoted.json, so the same rule won't be promoted again. To re-promote, use the --force option or delete the key from promoted.json.

Full Directory Structure

CODEBLOCK36

Production Usage Evidence

This skill has been validated in a real production environment. A significant number of the 20+ rules in AGENTS.md's "self-improvement rules" table were auto-generated through this pipeline:

- Must check environment before writing scripts: auto-promoted from [ERROR] 3 consecutive failures
- INLINECODE88: auto-promoted from [RETRY_EXCEEDED] 6 attempts
- INLINECODE89: auto-promoted from [ERROR] multiple repeats

After these rules were promoted, the recurrence rate of the same failure types decreased significantly.

License

MIT License. Free to use, modify, and distribute.

agent-failure-loop v1.0.0 — Building agents that learn from failure.

agent-failure-loop

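If the same mistake repeats three times, a rule is automatically created.

Agents lose their memory when a session ends. They make the same mistake yesterday, today, and tomorrow.
This skill builds an end-to-end self-improvement loop that automatically detects → classifies → tracks → promotes failures into rules.

Why Do Agents Repeat the Same Mistakes?

Structural limitations of AI agents:

Problem	Cause	Result
No memory between sessions	Context window is limited to in-session	Yesterday's failure is repeated today
No failure records

Problems with existing approaches:

- Manual rule addition: Humans manually write rules in AGENTS.md → tedious and frequently missed
- Conversation learning (self-improving-agent): Analyzes only conversation patterns → cannot detect execution failures → no enforcement
- Python self-improvement (actual-self-improvement): Implementation exists, but → no cron integration → no auto-promotion → ultimately still manual

agent-failure-loop bridges all these gaps:

Failure occurs → Immediate record → Batch analysis → 3-repeat detection → Auto rule promotion → Pre-task lookup
      ↑                                                                                                │
      └──────────────────────────────── Guardrail prevents recurrence ─────────────────────────────────┘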

Architecture

┌─────────────────────────────────────────────────────────────────────────────┐
│                       agent-failure-loop Architecture                       │
├─────────────────────────────────────────────────────────────────────────────┤
│                                                                             │
│  Layer 4: Guardrail ──────────────────────────────────────────────────────  │
│  ┌─────────────────────────────────────────────────────────────────────┐    │
│  │  Before starting a task → failure-matcher.py → query                │    │
│  │  similar failure records                                            │    │
│  │  This task failed 3 times before. Cause: X. Lesson: Y               │    │
│  └──────────────────────────────┬──────────────────────────────────────┘    │
│                                 │ query                                     │
│  Layer 3: Pattern + Promotion ──┼─────────────────────────────────────────  │
│  ┌──────────────────────────────┴──────────────────────────────────────┐    │
│  │  auto-promote.py                                                    │    │
│  │  .learnings/promotable.json → 3+ repeats → AGENTS.md                │    │
│  │  ┌──────────┐   ┌───────────┐   ┌────────────────────────┐          │    │
│  │  │ Pattern  │──▶│ ≥3 check  │──▶│ Insert rule into target│          │    │
│  │  │ detection│   │           │   │ (AGENTS/CLAUDE/cursor) │          │    │
│  │  └──────────┘   └───────────┘   └────────────────────────┘          │    │
│  └──────────────────────────────┬──────────────────────────────────────┘    │
│                                 │ input                                     │
│  Layer 2: Structured Analysis ──┼─────────────────────────────────────────  │
│  ┌──────────────────────────────┴──────────────────────────────────────┐    │
│  │  sync-learnings.py                                                  │    │
│  │  failures/*.md → parse → group → .learnings/                        │    │
│  │  ┌──────────┐   ┌──────────┐   ┌─────────────────────────┐          │    │
│  │  │ Parse    │──▶│ Group    │──▶│ summary.json            │          │    │
│  │  │ entries  │   │ patterns │   │ repeated-patterns.md    │          │    │
│  │  └──────────┘   └──────────┘   │ by-type/*.md            │          │    │
│  │                                │ promotable.json         │          │    │
│  │                                └─────────────────────────┘          │    │
│  └──────────────────────────────┬──────────────────────────────────────┘    │
│                                 │ input                                     │
│  Layer 1: Raw Recording ────────┼─────────────────────────────────────────  │
│  ┌──────────────────────────────┴──────────────────────────────────────┐    │
│  │  Agent records immediately on failure detection                     │    │
│  │  (real-time)                                                        │    │
│  │                                                                     │    │
│  │  memory/failures/                                                   │    │
│  │  ├── 2026-03-24.md  ← raw data, one file per date                   │    │
│  │  ├── 2026-03-25.md                                                  │    │
│  │  └── 2026-03-26.md                                                  │    │
│  └──────────────────────────────┬──────────────────────────────────────┘    │
│                                 │ trigger                                   │
│  Layer 0: Events ───────────────┴─────────────────────────────────────────  │
│  ┌─────────────────────────────────────────────────────────────────────┐    │
│  │  Failure event occurs                                               │    │
│  │                                                                     │    │
│  │  ┌────────────┐ ┌────────────┐ ┌────────────────┐ ┌───────────────┐ │    │
│  │  │ ERROR      │ │ CORRECTION │ │ RETRY_EXCEEDED │ │ MISUNDERSTAND │ │    │
│  │  │ exec error │ │ user fix   │ │ retry limit    │ │ misread       │ │    │
│  │  └────────────┘ └────────────┘ └────────────────┘ └───────────────┘ │    │
│  └─────────────────────────────────────────────────────────────────────┘    │
│                                                                             │
└─────────────────────────────────────────────────────────────────────────────┘

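5-Layer Pipeline

Layer 0: Event Occurrence

Failure events naturally occur during the agent's normal workflow.

Detection Criteria:

Event	Detection Method	Example
Tool execution failure	exit code ≠ 0, error message	npm install failure, API 4xx/5xx
User correction

What the agent should do: Record immediately to Layer 1 when the above events are detected. This behavior should be specified as a rule in AGENTS.md/CLAUDE.md.

Layer 1: Raw Recording (Immediate, Real-time)

Record in memory/failures/YYYY-MM-DD.md immediately upon failure detection.

Key Principles:

- Record immediately after detection (not batched)
- Agent records directly (no script needed)
- One file per day, accumulated chronologically
- Structured format (see Recording Format)

Directory Structure:

memory/failures/
├── 2026-03-24.md
├── 2026-03-25.md
└── 2026-03-26.md

Layer 2: Structured Analysis (Batch)

sync-learnings.py parses failures/ and generates structured analysis results in .learnings/.

Execution timing: Cron (daily-reflection) or manual execution

Input: memory/failures/*.md
Output:

.learnings/
├── summary.json ← Overall statistics (machine-readable)
├── repeated-patterns.md ← Repeated pattern analysis (human-readable)
├── promotable.json ← Promotion candidate list (input to auto-promotion)
└── by-type/
    ├── error.md
    ├── correction.md
    ├── retry_exceeded.md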
