agent-failure-loop
If the same mistake repeats three times, a rule is automatically created.
Agents lose their memory when a session ends. They make the same mistake yesterday, today, and tomorrow.
This skill builds an end-to-end self-improvement loop that automatically detects → classifies → tracks → promotes failures into rules.
Table of Contents
- Why Do Agents Repeat the Same Mistakes?
- Architecture
- 5-Layer Pipeline
- Failure Type Classification
- Recording Format
- Promotion Conditions and Logic
- Installation
- Quick Start (5 Minutes)
- Cron Integration
- Before/After Demo
- Comparison with Competing Skills
- Cross-Platform Configuration
- Script Reference
- FAQ
Why Do Agents Repeat the Same Mistakes?
Structural limitations of AI agents:
| Problem | Cause | Result |
|---|---|---|
| No memory between sessions | Context window is limited to the current session | Yesterday's failure is repeated today |
| No failure records | Logs accumulate without distinguishing success from failure | Pattern detection is impossible |
| No learning feedback | Humans repeat the same corrections | Human fatigue increases |
| No guardrails | Past lessons are not referenced on retry | Agent falls into the same trap again |
Problems with existing approaches:
- Manual rule addition: Humans manually write rules in AGENTS.md → tedious and frequently missed
- Conversation learning (self-improving-agent): Analyzes only conversation patterns → cannot detect execution failures → no enforcement
- Python self-improvement (actual-self-improvement): Implementation exists but → no cron integration → no auto-promotion → ultimately manual
agent-failure-loop bridges all these gaps:
CODEBLOCK0
Architecture
CODEBLOCK1
5-Layer Pipeline
Layer 0: Event Occurrence
Failure events naturally occur during the agent's normal workflow.
Detection Criteria:
| Event | Detection Method | Example |
|---|---|---|
| Tool execution failure | exit code ≠ 0, error message | INLINECODE0 failure, API 4xx/5xx |
| User correction | "No", "redo", "that's not it", etc. | "Not that file, the one under src/" |
| Retry exceeded | Same task attempted 3+ times | Tried the same selector 3 times |
| Misunderstanding | Output doesn't match request | Generated full translation when asked for "summarize" |
What the agent should do: Record immediately to Layer 1 when the above events are detected. This behavior should be specified as a rule in AGENTS.md/CLAUDE.md.
Layer 1: Raw Recording (Immediate, Real-time)
Record in memory/failures/YYYY-MM-DD.md immediately upon failure detection.
Key Principles:
- Record immediately after detection (not batched)
- Agent records directly (no script needed)
- One file per day, accumulated chronologically
- Structured format (see Recording Format below)
Directory Structure:
CODEBLOCK2
Layer 2: Structured Analysis (Batch)
INLINECODE2 parses failures/ and generates structured analysis results in .learnings/.
Execution timing: Cron (daily-reflection) or manual execution
Input: memory/failures/*.md
Output:
CODEBLOCK3
Processing Steps:
1. Parse all .md files in failures/ in date order
2. Extract type, cause, and lesson from each entry
3. Normalize cause text to generate pattern keys (MD5 hash)
4. Group by identical pattern keys
5. Patterns with 3+ repetitions → registered as promotion candidates in INLINECODE8
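The normalization and grouping steps can be sketched as follows. This is a minimal illustration, not the code in sync-learnings.py: the helper names `pattern_key` and `find_promotable`, the entry dictionaries, and the `type:cause` hash input are assumptions made for the example.

```python
import hashlib
import re
from collections import defaultdict


def pattern_key(failure_type: str, cause: str) -> str:
    """Hypothetical helper: normalize whitespace and case, then hash type + cause."""
    normalized = re.sub(r"\s+", " ", cause).strip().lower()
    # The "type:cause" layout of the hash input is an assumption for this sketch.
    return hashlib.md5(f"{failure_type}:{normalized}".encode("utf-8")).hexdigest()


def find_promotable(entries, min_count=3):
    """Group entries by pattern key and keep groups with at least min_count members."""
    groups = defaultdict(list)
    for entry in entries:
        groups[pattern_key(entry["type"], entry["cause"])].append(entry)
    return {key: items for key, items in groups.items() if len(items) >= min_count}


# Illustrative entries; real entries come from parsing memory/failures/*.md.
entries = [
    {"type": "ERROR", "cause": "Selector not found", "lesson": "Dump the DOM first"},
    {"type": "ERROR", "cause": "selector  not found ", "lesson": "Dump the DOM first"},
    {"type": "ERROR", "cause": "SELECTOR NOT FOUND", "lesson": "Dump the DOM first"},
    {"type": "CORRECTION", "cause": "Wrong file edited", "lesson": "Confirm the path"},
]
promotable = find_promotable(entries)
```

Because the key is an exact hash of the normalized text, the three ERROR entries collapse into one pattern, while a cause worded differently would form a separate group.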
Layer 3: Pattern Detection + Rule Promotion (Automatic)
INLINECODE9 reads promotable.json and automatically inserts rules into the target file.
Promotion condition: Same pattern repeated 3+ times (configurable)
Target files (configurable):
- AGENTS.md: OpenClaw, general-purpose
- INLINECODE12: Claude Code
- INLINECODE13: Cursor IDE
- Custom file: --target option
Deduplication: Previously promoted pattern keys are recorded in .learnings/promoted.json to prevent duplicate insertion
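A rough sketch of how such deduplication might work is shown below. The on-disk layout of promoted.json is not specified here, so a flat JSON list of pattern keys is assumed, and `load_promoted` and `promote_new` are hypothetical names.

```python
import json
import tempfile
from pathlib import Path


def load_promoted(path: Path) -> set:
    """Return previously promoted pattern keys (assumed format: a JSON list)."""
    if not path.exists():
        return set()
    return set(json.loads(path.read_text(encoding="utf-8")))


def promote_new(candidates: dict, path: Path, force: bool = False) -> list:
    """Return the keys promoted in this run and persist them to block repeats."""
    promoted = load_promoted(path)
    new_keys = [key for key in candidates if force or key not in promoted]
    path.parent.mkdir(parents=True, exist_ok=True)
    path.write_text(json.dumps(sorted(promoted | set(new_keys)), indent=2), encoding="utf-8")
    return new_keys


# Demo in a temporary directory so no real project files are touched.
with tempfile.TemporaryDirectory() as tmp:
    store = Path(tmp) / ".learnings" / "promoted.json"
    candidates = {"abc123": "Verify selectors before use"}
    first_run = promote_new(candidates, store)
    second_run = promote_new(candidates, store)
    forced_run = promote_new(candidates, store, force=True)
```

The first run promotes the candidate, the second run skips it, and `force=True` mirrors the effect of the --force option.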
Layer 4: Execution Guardrail (Pre-task Lookup)
Query similar failures before starting a new task to provide advance warnings.
Implementation method (agent rule):
CODEBLOCK4
failure-matcher.py behavior:
1. Load INLINECODE16
2. Compare task keywords against past failure titles/causes (simple keyword matching)
3. Output failure records with high similarity
4. Agent reads the output and references the lessons
Note: failure-matcher.py is not provided separately. A simple implementation example is shown in Quick Start below. Using grep on sync-learnings.py output files is a sufficient alternative.
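As one possible shape for such a matcher, the sketch below scores past failures by keyword overlap with the task description. The record fields (`title`, `cause`, `lesson`), the function names, and the `min_overlap` threshold are all assumptions for illustration; the actual structure of the analysis output may differ.

```python
import re


def keywords(text: str) -> set:
    """Lowercased word tokens of three or more characters."""
    return {word for word in re.findall(r"[a-z0-9_]+", text.lower()) if len(word) >= 3}


def match_failures(task: str, failures: list, min_overlap: int = 2) -> list:
    """Return past failures sharing at least min_overlap keywords with the task."""
    task_words = keywords(task)
    scored = []
    for failure in failures:
        overlap = task_words & keywords(failure["title"] + " " + failure["cause"])
        if len(overlap) >= min_overlap:
            scored.append((len(overlap), failure))
    # Sort on the score only, so dictionaries are never compared with each other.
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [failure for _, failure in scored]


# Illustrative records, not real analysis output.
failures = [
    {"title": "Playwright selector failure", "cause": "selector not found in DOM",
     "lesson": "Verify selector existence with a DOM dump before use"},
    {"title": "npm install failure", "cause": "missing lockfile",
     "lesson": "Check package-lock.json first"},
]
hits = match_failures("Click the login button with a Playwright selector", failures)
```

Plain keyword overlap is crude, but it needs no dependencies and is easy for the agent to audit.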
Failure Type Classification
Four failure types are defined. All failures are classified as one of these.
ERROR — Execution Error
Tool/command/API execution failed.
| Field | Description |
|---|---|
| Code | INLINECODE18 |
| Detection | exit code ≠ 0, error message, exception thrown |
| Example | npm install failure, API 404, file not found, permission error |
| Frequency | Most common |
CORRECTION — User Correction
User corrected the agent's output.
| Field | Description |
|---|---|
| Code | INLINECODE20 |
| Detection | "No", "redo", "that's not it", "do it like this", correction instructions |
| Example | "Not that file, the one under src/", "The format is wrong" |
| Importance | Highest value: reflects the user's implicit preferences |
RETRY_EXCEEDED — Retry Limit Exceeded
Same task attempted 3+ times.
| Field | Description |
|---|---|
| Code | INLINECODE21 |
| Detection | Same/similar command executed 3+ times |
| Example | Tried same CSS selector 3 times, called same API endpoint 3 times |
| Meaning | Pattern of blindly retrying without addressing the root cause |
MISUNDERSTAND — Instruction Misunderstanding
Generated output that doesn't match user's intended instruction.
| Field | Description |
|---|---|
| Code | INLINECODE22 |
| Detection | Mismatch between output and request, "that's not what I meant" |
| Example | Generated "translation" when asked for "summary", confused target file |
| Root Cause | Ambiguity in instructions or lack of context |
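The four types and their detection signals could be modeled roughly as below. This is only a sketch: in practice the agent classifies failures itself, and the enum values (taken from the section names), the phrase patterns, and the precedence order among the checks are assumptions.

```python
import re
from enum import Enum
from typing import Optional


class FailureType(Enum):
    # Values mirror the section names; the real codes may be spelled differently.
    ERROR = "ERROR"
    CORRECTION = "CORRECTION"
    RETRY_EXCEEDED = "RETRY_EXCEEDED"
    MISUNDERSTAND = "MISUNDERSTAND"


# Phrases come from the detection rows above; matching them by regex is an assumption.
MISUNDERSTAND_PATTERN = re.compile(r"that's not what i meant")
CORRECTION_PATTERN = re.compile(r"\b(no|redo)\b|that's not it|do it like this")


def classify(exit_code: int = 0, attempts: int = 1, user_message: str = "") -> Optional[FailureType]:
    """Map raw signals to a failure type, or None when nothing looks like a failure."""
    message = user_message.lower()
    if attempts >= 3:  # same task attempted 3+ times
        return FailureType.RETRY_EXCEEDED
    if exit_code != 0:  # tool or command failed
        return FailureType.ERROR
    if MISUNDERSTAND_PATTERN.search(message):
        return FailureType.MISUNDERSTAND
    if CORRECTION_PATTERN.search(message):
        return FailureType.CORRECTION
    return None
```

For example, `classify(exit_code=1)` yields `FailureType.ERROR`, while `classify(user_message="No, redo it under src/")` yields `FailureType.CORRECTION`.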
Recording Format
Raw Record (memory/failures/YYYY-MM-DD.md)
CODEBLOCK5
Example: Actual Records
CODEBLOCK6
Structured Analysis Output (.learnings/promotable.json)
CODEBLOCK7
Promotion Conditions and Logic
Promotion Conditions
| Condition | Value | Configurable |
|---|---|---|
| Minimum repeat count | 3 (default) | INLINECODE23 or AFL_MIN_COUNT env var |
| Same pattern determination | Type + cause text MD5 hash | Automatic |
| Deduplication | Recorded in .learnings/promoted.json | Automatic |
| Target file | AGENTS.md (default) | --target or AFL_TARGET_FILE env var |
Promotion Process
CODEBLOCK8
Promotion Format (by Target)
AGENTS.md (agents-md):
CODEBLOCK9
CLAUDE.md (claude-md):
CODEBLOCK10
.cursorrules (cursorrules):
CODEBLOCK11
Generic (plain):
### [ERROR] Playwright selector failure
- **Rule:** Must verify selector existence with DOM dump before use
- **Count:** 3x
- **Promoted:** 2026-03-25
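For illustration, the plain format above could be rendered by a small helper like the one below. `render_plain` is a hypothetical function, not part of auto-promote.py; the field layout simply follows the example shown.

```python
from datetime import date


def render_plain(failure_type: str, title: str, rule: str, count: int, promoted_on: date) -> str:
    """Render one promoted rule in the generic (plain) markdown layout."""
    return "\n".join([
        f"### [{failure_type}] {title}",
        f"- **Rule:** {rule}",
        f"- **Count:** {count}x",
        f"- **Promoted:** {promoted_on.isoformat()}",
    ])


block = render_plain(
    "ERROR",
    "Playwright selector failure",
    "Must verify selector existence with DOM dump before use",
    3,
    date(2026, 3, 25),
)
```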
Installation
Zero-Config Installation (30 Seconds)
CODEBLOCK13
Add Agent Rules
Add the following rules to AGENTS.md (or CLAUDE.md, .cursorrules):
CODEBLOCK14
Environment Variables (Optional)
| Variable | Default | Description |
|---|---|---|
| INLINECODE29 | INLINECODE30 | Failure records directory |
| INLINECODE31 | .learnings | Analysis results directory |
| AFL_TARGET_FILE | AGENTS.md | Rule promotion target file |
| AFL_FORMAT | agents-md | Promotion format |
| AFL_MIN_COUNT | 3 | Minimum repeat count |
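The sketch below shows one way the promotion-related variables might be resolved with their documented defaults. It covers only AFL_TARGET_FILE, AFL_FORMAT, and AFL_MIN_COUNT; the `PromoteConfig` class, `load_config` function, and validation behavior are assumptions, not the scripts' actual code.

```python
import os
from dataclasses import dataclass

# Format names as listed for the --format option.
VALID_FORMATS = ("agents-md", "claude-md", "cursorrules", "plain")


@dataclass
class PromoteConfig:
    target_file: str = "AGENTS.md"
    fmt: str = "agents-md"
    min_count: int = 3


def load_config(env=None) -> PromoteConfig:
    """Read settings from a mapping (os.environ by default), applying defaults."""
    env = os.environ if env is None else env
    fmt = env.get("AFL_FORMAT", "agents-md")
    if fmt not in VALID_FORMATS:
        raise ValueError(f"Unknown AFL_FORMAT: {fmt!r}")
    min_count = int(env.get("AFL_MIN_COUNT", "3"))
    if min_count < 1:
        raise ValueError("AFL_MIN_COUNT must be at least 1")
    return PromoteConfig(
        target_file=env.get("AFL_TARGET_FILE", "AGENTS.md"),
        fmt=fmt,
        min_count=min_count,
    )


default_cfg = load_config({})
claude_cfg = load_config(
    {"AFL_TARGET_FILE": "CLAUDE.md", "AFL_FORMAT": "claude-md", "AFL_MIN_COUNT": "5"}
)
```

Passing a plain dictionary instead of `os.environ` keeps the example testable without touching the real environment.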
Quick Start (5 Minutes)
Step 1: Install (30 Seconds)
CODEBLOCK15
Step 2: Generate Test Data (1 Minute)
CODEBLOCK16
Step 3: Run Analysis (30 Seconds)
CODEBLOCK17
Expected Output:
CODEBLOCK18
Step 4: Auto-Promote (30 Seconds)
CODEBLOCK19
Expected Output:
CODEBLOCK20
Step 5: Verify (30 Seconds)
CODEBLOCK21
Done in 5 minutes! Now when the agent makes the same mistake 3 times, a rule is automatically created.
Cron Integration
daily-reflection Cron Example
Runs automatically at 23:00 daily to analyze the day's failures and promote rules.
OpenClaw Cron Configuration:
CODEBLOCK22
Standard crontab Configuration:
CODEBLOCK23
weekly-skill-review Integration
Register repeated tasks as skill candidates during weekly review:
CODEBLOCK24
Real-time + Batch Hybrid
Real-time (agent directly):
- Layer 0 → Layer 1: Record in failures/ immediately upon failure detection
- When the same type reaches 3 occurrences, immediately add a rule to AGENTS.md
Batch (cron):
- Layer 1 → Layer 2: Structured analysis via INLINECODE40
- Layer 2 → Layer 3: Handle missed promotions via INLINECODE41
- Catches patterns missed by the agent's real-time detection as a second safety net
Before/After Demo
Before: Without Rules
CODEBLOCK25
Problem: Same mistake every time. No learning. User frustration ↑
After: With agent-failure-loop Applied
CODEBLOCK26
Result: No recurrence of the same mistake from Day 4 onward. Auto-learning complete.
Actual Auto-Promotion Simulation
CODEBLOCK27
Comparison with Competing Skills
| Feature | agent-failure-loop | self-improving-agent | actual-self-improvement |
|---|---|---|---|
| Auto failure detection | ✅ 4-type classification | ❌ Conversation patterns only | ⚠️ Manual trigger |
| Immediate recording | ✅ Layer 1 real-time | ❌ After session ends | ❌ Manual |
| Structured analysis | ✅ sync-learnings.py | ❌ | ⚠️ Python available |
| Auto promotion | ✅ 3x repeat → automatic | ❌ Manual rule addition | ❌ |
| Cron integration | ✅ daily-reflection | ❌ | ❌ |
| Guardrails | ✅ Pre-task lookup | ❌ | ❌ |
| Multi-platform | ✅ OpenClaw/Claude/Codex/Cursor | ⚠️ ChatGPT-centric | ⚠️ Python only |
| Target file config | ✅ AGENTS/CLAUDE/cursorrules/custom | ❌ Fixed | ❌ |
| Deduplication | ✅ promoted.json | ❌ | ❌ |
| Zero-config | ✅ Python 3.8+ stdlib only | ⚠️ npm required | ⚠️ pip required |
| Enforcement | ✅ Rule promotion = agent behavior change | ❌ Suggestions only | ❌ |
| Skill extraction integration | ✅ Repeated patterns → skill candidates | ❌ | ❌ |
Why agent-failure-loop?
1. End-to-end: Covers the entire pipeline from detection to promotion
2. Enforcement: Promoted rules go into AGENTS.md/CLAUDE.md, which the agent must read
3. Automation: Combined with cron, the self-improvement loop runs without human intervention
4. Cross-platform: Not tied to any specific platform
5. Transparency: All failures and promotion processes are preserved as markdown files for auditing
Cross-Platform Configuration
Platform-Specific Configuration Examples
OpenClaw:
CODEBLOCK28
Claude Code:
CODEBLOCK29
Cursor IDE:
CODEBLOCK30
Codex / Others:
CODEBLOCK31
Custom Configuration File
Place .failure-loop.json at the project root to manage configuration via file instead of environment variables:
CODEBLOCK32
Note: The current scripts support only environment variables and CLI arguments. They do not read .failure-loop.json yet; support is planned for a future version.
AGENTS.md Format Independence
This skill does not depend on a specific AGENTS.md format:
- --format agents-md: Adds rows to the "self-improvement rules" table in AGENTS.md. If the table doesn't exist, appends to the end of the file.
- INLINECODE45: Can append to any markdown file
- INLINECODE46: Can specify any file
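The "add a row to the table, otherwise append" behavior of the agents-md format can be sketched as follows. This is an assumed approach, not the logic in auto-promote.py: `insert_rule` is a hypothetical helper, the table is located by a heading containing "self-improvement rules", and the two-column layout in the usage example is invented for the demo.

```python
def insert_rule(markdown: str, row: str, heading_keyword: str = "self-improvement rules") -> str:
    """Insert row after the last line of the rules table, or append if none is found."""
    lines = markdown.splitlines()
    start = next(
        (i for i, line in enumerate(lines) if heading_keyword in line.lower()), None
    )
    last_table_line = None
    if start is not None:
        for i in range(start + 1, len(lines)):
            if lines[i].lstrip().startswith("|"):
                last_table_line = i
            elif last_table_line is not None:
                break  # first non-table line after the table ends the search
    if last_table_line is None:
        # No heading or no table under it: fall back to appending at end of file.
        return markdown.rstrip("\n") + "\n\n" + row + "\n"
    lines.insert(last_table_line + 1, row)
    return "\n".join(lines) + "\n"


# Hypothetical AGENTS.md content and row layout for the demo.
doc = (
    "# Agent\n\n## Self-improvement rules\n\n"
    "| Rule | Source |\n|---|---|\n| Check env | ERROR x3 |\n\n## Other\n"
)
new_row = "| Verify selectors | ERROR x3 |"
updated = insert_rule(doc, new_row)
appended = insert_rule("# Notes\n", new_row)
```

The sketch does not stop at the next heading while looking for the table, so a file with several tables may need a stricter search.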
Script Reference
sync-learnings.py
Parses raw failure records from the failures/ directory and generates structured analysis results in .learnings/.
Usage:
CODEBLOCK33
Options:
| Option | Default | Description |
|---|---|---|
| INLINECODE49 | INLINECODE50 | Failure records directory |
| INLINECODE51 | .learnings | Analysis results output directory |
| --dry-run | - | Preview without writing files |
| --json | - | Output summary in JSON format |
Output Files:
- summary.json: Overall statistics
- INLINECODE56: Repeated pattern analysis
- INLINECODE57: Promotion candidate list
- INLINECODE58: Details by type
Dependencies: Python 3.8+ standard library only (hashlib, json, re, pathlib, etc.)
auto-promote.py
Reads promotion candidates from .learnings/promotable.json and automatically inserts rules into the target file.
Usage:
CODEBLOCK34
Options:
| Option | Default | Description |
|---|---|---|
| INLINECODE60 | INLINECODE61 | Analysis results directory |
| INLINECODE62 | AGENTS.md | Promotion target file |
| --format | agents-md | Output format (agents-md/claude-md/cursorrules/plain) |
| --min-count | 3 | Minimum repeat count |
| --dry-run | - | Preview without modifying files |
| --force | - | Re-promote already promoted patterns |
Dependencies: Python 3.8+ standard library only
FAQ
Q: What if the agent doesn't record failures?
You need to add failure recording rules to AGENTS.md/CLAUDE.md. See "Add Agent Rules" in the Installation section. With the rules in place, the agent will automatically record upon failure detection. If the agent ignores the rules... that itself will be recorded as a CORRECTION.
Q: Won't the same rule be promoted twice?
INLINECODE71 records already-promoted pattern keys to prevent duplication. Forced re-promotion is possible with the --force option.
Q: If cause text is slightly different, will it be recognized as a different pattern?
The current version distinguishes patterns using MD5 hash of the cause text. Whitespace and case are normalized, but semantically identical causes with different wording will be recognized as separate patterns. It's recommended to specify in the rules that the agent should record causes with consistent wording.
Future improvement: Semantic similarity (embedding comparison) support planned.
Q: Can it be used in environments without Python?
Even without scripts, the agent can directly perform Layer 1 (recording) and Layer 3 (promotion). Scripts serve as a double safety net for batch analysis (Layer 2) and auto-promotion. The basic loop works with the agent's real-time detection alone.
Q: What if failure records accumulate too much?
Since sync-learnings.py generates summaries after analysis, raw records can be archived:
CODEBLOCK35
Q: Can it be shared across a team?
Committing the .learnings/ directory to git allows the entire team to share learning results. promoted.json prevents duplicate promotions, so it's safe for multiple people to use simultaneously.
Q: How do I use it with Claude Code without OpenClaw?
1. Add failure recording rules to CLAUDE.md (see Installation)
2. Use --target CLAUDE.md --format claude-md when running scripts
3. Use manual execution or OS crontab instead of OpenClaw cron
Q: Is it compatible with existing AGENTS.md self-improvement rules?
Fully compatible. auto-promote.py finds the "self-improvement rules" table in AGENTS.md and adds rows. If the table doesn't exist, it appends to the end of the file.
Q: Can I write failure records directly to .learnings/?
No. failures/ is raw data, .learnings/ is analysis results. The agent records only in failures/, and .learnings/ is auto-generated by sync-learnings.py. This separation ensures data integrity.
Q: What if a promoted rule is wrong?
Manually delete the rule from AGENTS.md. The pattern key remains in promoted.json, so the same rule won't be promoted again. To re-promote, use the --force option or delete the key from promoted.json.
Full Directory Structure
CODEBLOCK36
Production Usage Evidence
This skill has been validated in a real production environment. A significant number of the 20+ rules in AGENTS.md's "self-improvement rules" table were auto-generated through this pipeline:
- Must check environment before writing scripts: Auto-promoted from [ERROR] 3 consecutive failures
- INLINECODE88: Auto-promoted from [RETRY_EXCEEDED] 6 attempts
- INLINECODE89: Auto-promoted from [ERROR] multiple repeats
After these rules were promoted, the recurrence rate of the same failure types decreased significantly.
License
MIT License. Free to use, modify, and distribute.
agent-failure-loop v1.0.0 — Building agents that learn from failure.