Reflexion
Closed-loop learning for AI coding agents. Inspired by Reflexion: Language Agents with Verbal Reinforcement Learning.
"Reflexion agents verbally reflect on task feedback signals, then maintain their own reflective text in an episodic memory buffer to induce better decision-making in subsequent trials."
— Shinn et al., 2023
The problem: AI agents repeat the same mistakes across sessions. They don't learn from errors, don't remember corrections, and every new conversation starts from zero.
The solution: A capture-recall-promote loop that closes the feedback gap.
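```
Error/correction occurs
        |
   [CAPTURE] -----> .reflexion/entries/
        |                  |
 Next similar task     [INDEX] keywords
        |                  |
   [RECALL] <--- keywords match prompt
        |
 Inject past solutions into context
        |
   [VERIFY] Did it work?
      |         |
     Yes        No ---> update entry, flag for review
      |
 Occurrences >= 3?
      |         |
     Yes        No ---> increment counter
      |
   [PROMOTE] Append rule to CLAUDE.md
```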
Quick Reference
| Situation | What Happens |
|---|---|
| Command fails | capture.sh auto-logs error + context to .reflexion/entries/ |
| User corrects agent | Agent calls capture.sh with correction details |
| Similar prompt later | recall.sh finds matching entries, injects solutions into context |
| Pattern seen 3+ times | promote.sh auto-appends a concise rule to CLAUDE.md |
| Want to see stats | Run ./scripts/status.sh for learning dashboard |
Install
Claude Code (recommended)
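```bash
# Clone into your project or global skills directory
git clone https://github.com/user/reflexion.git .claude/skills/reflexion
# Or copy into an existing skills directory
cp -r reflexion/ ~/.claude/skills/reflexion
```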
Add hooks to .claude/settings.json:
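```json
{
  "hooks": {
    "PostToolUse": [
      {
        "matcher": "Bash",
        "hooks": [
          {
            "type": "command",
            "command": "./.claude/skills/reflexion/scripts/capture.sh"
          }
        ]
      }
    ],
    "UserPromptSubmit": [
      {
        "matcher": "",
        "hooks": [
          {
            "type": "command",
            "command": "./.claude/skills/reflexion/scripts/recall.sh"
          }
        ]
      }
    ]
  }
}
```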
First run
The scripts auto-initialize on first use. No setup needed. To manually initialize:
```bash
./scripts/init.sh
```
How It Works
1. Capture (automatic)
The capture.sh hook fires after every Bash tool use. It reads the tool output from stdin (JSON), detects errors via pattern matching, and stores structured entries:
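```json
{
  "id": "RFX-20260331-a7f",
  "type": "error",
  "trigger": "npm ERR! Missing script: \"build\"",
  "context": "npm run build",
  "resolution": "",
  "keywords": ["npm", "build", "missing", "script"],
  "occurrences": 1,
  "first_seen": "2026-03-31",
  "last_seen": "2026-03-31",
  "promoted": false,
  "cwd": "/home/user/project"
}
```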
When the agent (or user) resolves the error, the agent should update the entry:
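```
Update .reflexion/entries/RFX-20260331-a7f.json with resolution:
"Use pnpm run build - this project uses pnpm, not npm"
```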
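The pattern-matching detection described in this step could look roughly like the sketch below. It is an illustration only, not the actual capture.sh: the function name `looks_like_error` is hypothetical, and the pattern list is an assumption seeded with the error examples this document mentions.

```bash
# Hypothetical helper: returns 0 if the text on stdin looks like a command failure.
# The pattern list is an assumption taken from the examples in this document.
looks_like_error() {
  grep -Eq 'npm ERR!|Permission denied|ModuleNotFoundError'
}

# Usage: pipe command output into the check.
if printf '%s\n' 'npm ERR! Missing script: "build"' | looks_like_error; then
  echo "error detected"
fi
```

A fixed pattern list keeps the check dependency-free (grep only), at the cost of missing failures whose output matches none of the patterns.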
2. Recall (automatic)
The recall.sh hook fires before every user prompt. It extracts keywords from the prompt, searches the entry index, and injects relevant past learnings:
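```xml
<reflexion-recall>
Past learning [RFX-20260331-a7f] (seen 2x):
  Trigger: npm ERR! Missing script: "build"
  Resolution: Use pnpm run build - this project uses pnpm, not npm
  Keywords: npm, build, missing, script
</reflexion-recall>
```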
This costs ~50-80 tokens when matches exist, zero when they don't.
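As a rough sketch of the keyword lookup, assuming index lines of the form `keyword:ID1,ID2` (the format of .reflexion/index.txt), the matching step might be written as follows. The function name `lookup_ids` and the word-splitting rule are assumptions, not the behavior of the real recall.sh.

```bash
# Hypothetical helper: print the entry IDs indexed under any word of the prompt.
# Assumes index lines of the form "keyword:ID1,ID2".
lookup_ids() {
  local index_file=$1 prompt=$2 word
  # Lowercase the prompt and split it into alphanumeric words, one per line.
  printf '%s\n' "$prompt" | tr '[:upper:]' '[:lower:]' | tr -cs '[:alnum:]' '\n' |
  while read -r word; do
    [ -n "$word" ] || continue
    # Match the keyword at the start of a line, then split its ID list.
    grep -E "^${word}:" "$index_file" | cut -d: -f2 | tr ',' '\n' || true
  done | sort -u
}

# Usage (paths are illustrative):
#   lookup_ids .reflexion/index.txt "npm run build fails"
```

When no keyword matches, the function prints nothing, which is consistent with the zero-token cost claimed for prompts without matches.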
3. Promote (automatic)
When an entry hits 3+ occurrences, promote.sh appends a concise rule to CLAUDE.md:
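```markdown
## Reflexion: Learned Rules
- This project uses pnpm, not npm. Always use pnpm run commands. (seen 3x, source: RFX-20260331-a7f)
```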
Promoted entries are marked "promoted": true and stop being injected via recall (the rule is now in CLAUDE.md permanently).
4. Verify (agent-driven)
After the agent applies a recalled solution, it should verify and update:
- Worked: Increment occurrences, update last_seen
- Failed: Add a note to the entry, flag it for review, decrement confidence
This step is agent-driven (via prompt instruction), not hook-automated, to avoid false positives.
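For the "Worked" case, a minimal sketch of the update, assuming each entry stores one field per line as in the JSON examples in this document. The helper name `bump_occurrences` is hypothetical. It writes to a temporary file and renames it over the original so a reader never sees a half-written entry.

```bash
# Hypothetical helper: record a successful reuse of an entry.
# Assumes one field per line, e.g.  "occurrences": 1,
bump_occurrences() {
  local file=$1 today=$2 count tmp
  # Extract the current counter (the digits after "occurrences":).
  count=$(sed -n 's/.*"occurrences": *\([0-9][0-9]*\).*/\1/p' "$file")
  [ -n "$count" ] || return 1
  tmp=$(mktemp "${file}.XXXXXX") || return 1
  # Rewrite both fields into a temp file, then rename it over the original.
  sed -e "s/\"occurrences\": *[0-9][0-9]*/\"occurrences\": $((count + 1))/" \
      -e "s/\"last_seen\": *\"[^\"]*\"/\"last_seen\": \"$today\"/" \
      "$file" > "$tmp" && mv "$tmp" "$file"
}

# Usage (the entry path and date are illustrative):
#   bump_occurrences .reflexion/entries/RFX-20260331-a7f.json "$(date +%Y-%m-%d)"
```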
Entry Types
| Type | Trigger | Example |
|---|---|---|
| error | Command failure detected by hook | npm ERR!, Permission denied, ModuleNotFoundError |
| correction | User says "no", "actually", "wrong" | "Actually use pnpm, not npm" |
| insight | Non-obvious solution discovered | "Must run codegen after API changes" |
| pattern | Recurring approach that works | "Always check auth status before git push" |
Data Format
Entries live in .reflexion/entries/ as individual JSON files (one per learning). This enables:
- Fast grep-based search (no parsing a giant markdown file)
- Atomic writes (no corruption from concurrent access)
- Easy manual editing
- Git-friendly diffs
The keyword index at .reflexion/index.txt maps keywords to entry IDs for fast recall:
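```
npm:RFX-20260331-a7f,RFX-20260401-b2c
build:RFX-20260331-a7f
pnpm:RFX-20260331-a7f,RFX-20260401-b2c
docker:RFX-20260402-c1d
```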
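An index in this format could be regenerated from the entry files along the following lines. This is a sketch under two assumptions: each entry has a single-line `"keywords": [...]` array, and the file name minus `.json` is the entry ID. The function name `rebuild_index` is hypothetical and does not describe what rebuild-index.sh actually does.

```bash
# Hypothetical sketch: regenerate a keyword index from entry files.
# Assumes a single-line  "keywords": ["a", "b"]  array per entry and
# that the file name (minus .json) is the entry ID.
rebuild_index() {
  local entries_dir=$1 index_file=$2 f id kw
  for f in "$entries_dir"/*.json; do
    [ -e "$f" ] || continue
    id=$(basename "$f" .json)
    # Pull out the array contents, strip quotes and spaces, emit "keyword ID" pairs.
    sed -n 's/.*"keywords": *\[\(.*\)\].*/\1/p' "$f" |
      tr -d '" ' | tr ',' '\n' |
      while read -r kw; do
        if [ -n "$kw" ]; then printf '%s %s\n' "$kw" "$id"; fi
      done
  done | sort -u |
  # Group IDs by keyword into "keyword:ID1,ID2" lines.
  awk '{ if ($1 in ids) ids[$1] = ids[$1] "," $2; else ids[$1] = $2 }
       END { for (k in ids) print k ":" ids[k] }' | sort > "$index_file"
}

# Usage (paths are illustrative):
#   rebuild_index .reflexion/entries .reflexion/index.txt
```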
Promotion Rules
An entry is auto-promoted to CLAUDE.md when ALL conditions are met:
1. occurrences >= 3
2. resolution is non-empty (the fix is known)
3. promoted is false (not already promoted)
4. Entry is older than 1 day (not a flurry of the same error in one session)
Promoted rules are written as short, actionable directives, not incident reports.
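The four conditions can be expressed as a single check. The sketch below is an assumption about how such a test might look, not the logic of promote.sh itself. The function name `is_promotable` and its argument order are hypothetical, and "older than 1 day" is approximated as "first seen on an earlier calendar date".

```bash
# Hypothetical check mirroring the four promotion conditions.
# Arguments: occurrences, resolution, promoted flag, first_seen, today (YYYY-MM-DD).
is_promotable() {
  local occurrences=$1 resolution=$2 promoted=$3 first_seen=$4 today=$5
  [ "$occurrences" -ge 3 ] || return 1      # seen at least 3 times
  [ -n "$resolution" ] || return 1          # the fix is known
  [ "$promoted" = "false" ] || return 1     # not already promoted
  # ISO dates compare correctly as strings; this approximates the age rule
  # as "first seen on an earlier calendar date than today".
  [[ "$first_seen" < "$today" ]] || return 1
}

# Usage (values are illustrative):
#   is_promotable 3 "use pnpm, not npm" false 2026-03-31 2026-04-02 && echo promote
```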
Agent Instructions
When this skill is active, follow these behaviors:
On Error
1. Check if capture.sh already logged it (it runs automatically on Bash errors)
2. If you resolve the error, update the entry's resolution field
3. If the error matches a recalled learning, say so and apply the known fix
On User Correction
Log a correction entry manually:
```bash
# Generate one ID and reuse it for the filename and the "id" field
id="RFX-$(date +%Y%m%d)-$(head -c3 /dev/urandom | xxd -p | head -c3)"
cat > ".reflexion/entries/${id}.json" << ENTRY
{
  "id": "${id}",
  "type": "correction",
  "trigger": "user said: actually use pnpm",
  "context": "attempted npm install",
  "resolution": "this project uses pnpm, not npm",
  "keywords": ["npm", "pnpm", "install", "package-manager"],
  "occurrences": 1,
  "first_seen": "2026-03-31",
  "last_seen": "2026-03-31",
  "promoted": false
}
ENTRY
```
Then rebuild the index: ./scripts/rebuild-index.sh
On Recall
When <reflexion-recall> context appears in the prompt:
1. Read the recalled learnings
2. Apply the known resolution if relevant
3. If the resolution works, increment occurrences
4. If it doesn't apply, ignore it (no penalty)
Before Major Tasks
Run ./scripts/status.sh to see if there are relevant learnings for the area you're about to work in.
Security
- Never log secrets, tokens, API keys, or credentials in entries
- The capture.sh script redacts common secret patterns (Bearer tokens, API keys, passwords)
- .reflexion/ should be in .gitignore for private projects
- For team projects, committing .reflexion/ creates shared learning (opt-in)
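To illustrate the kind of redaction mentioned above, here is a small filter sketch. The function name `redact_secrets` and both patterns are illustrative assumptions and are not the patterns capture.sh actually uses. The match is case-sensitive and will miss secrets in other shapes.

```bash
# Hypothetical redaction filter; the patterns are illustrative assumptions.
redact_secrets() {
  sed -E \
    -e 's#(Bearer )[A-Za-z0-9._~+/-]+=*#\1[REDACTED]#g' \
    -e 's#((api[_-]?key|password|token)[=:] *)[^[:space:]"]+#\1[REDACTED]#g'
}

# Usage: filter text before it is written to an entry.
printf '%s\n' 'Authorization: Bearer abc.DEF-123' | redact_secrets
```

Pattern-based redaction is a safety net rather than a guarantee, so the rule about never logging credentials still applies.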
Comparison
| Feature | self-improving-agent | OMC auto-learner | reflexion |
|---|---|---|---|
| Auto-capture errors | Hook reminder only | Pattern detection | Hook + auto-parse + store |
| Structured storage | Markdown append | Content hash dedup | JSON entries + keyword index |
| Cross-session recall | None | None | Auto keyword match + inject |
| Auto-promote to CLAUDE.md | Manual | Manual | Auto at 3 occurrences |
| Token overhead | ~70 tokens always | Variable | 0 tokens when no match, ~60 on match |
| Correction capture | Reminder to log | Confidence scoring | Structured entry with resolution |
| Works offline | Yes | Yes | Yes |
| Dependencies | bash | TypeScript + npm | bash + grep (zero deps) |
File Structure
CODEBLOCK10
Citation
This skill implements the core feedback loop from:
Shinn et al. (2023). Reflexion: Language Agents with Verbal Reinforcement Learning.
The paper showed that language agents reflecting on past failures in an episodic memory buffer significantly outperform base agents — achieving 91% pass@1 on HumanEval vs GPT-4's 80%. This skill adapts that principle for AI coding agents: instead of weight updates, it stores verbal reflections (error entries with resolutions) and retrieves them when similar situations arise.
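Reflexion
Closed-loop learning for AI coding agents. Inspired by Reflexion: Language Agents with Verbal Reinforcement Learning.
"Reflexion agents verbally reflect on task feedback signals, then maintain their own reflective text in an episodic memory buffer to induce better decision-making in subsequent trials."
- Shinn et al., 2023
The problem: AI agents repeat the same mistakes across sessions. They don't learn from errors, don't remember corrections, and every new conversation starts from zero.
The solution: A capture-recall-promote loop that closes the feedback gap.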
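```
Error/correction occurs
        |
   [CAPTURE] -----> .reflexion/entries/
        |                  |
 Next similar task     [INDEX] keywords
        |                  |
   [RECALL] <--- keywords match prompt
        |
 Inject past solutions into context
        |
   [VERIFY] Did it work?
      |         |
     Yes        No ---> update entry, flag for review
      |
 Occurrences >= 3?
      |         |
     Yes        No ---> increment counter
      |
   [PROMOTE] Append rule to CLAUDE.md
```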
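Quick Reference
| Situation | What Happens |
|---|---|
| Command fails | capture.sh auto-logs error + context to .reflexion/entries/ |
| User corrects agent | Agent calls capture.sh with correction details |
| Similar prompt later | recall.sh finds matching entries, injects solutions into context |
| Pattern seen 3+ times | promote.sh auto-appends a concise rule to CLAUDE.md |
| Want to see stats | Run ./scripts/status.sh for learning dashboard |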
Install
Claude Code (recommended)
```bash
# Clone into your project or global skills directory
git clone https://github.com/user/reflexion.git .claude/skills/reflexion
# Or copy into an existing skills directory
cp -r reflexion/ ~/.claude/skills/reflexion
```
Add hooks to .claude/settings.json:
```json
{
  "hooks": {
    "PostToolUse": [
      {
        "matcher": "Bash",
        "hooks": [
          {
            "type": "command",
            "command": "./.claude/skills/reflexion/scripts/capture.sh"
          }
        ]
      }
    ],
    "UserPromptSubmit": [
      {
        "matcher": "",
        "hooks": [
          {
            "type": "command",
            "command": "./.claude/skills/reflexion/scripts/recall.sh"
          }
        ]
      }
    ]
  }
}
```
First run
The scripts auto-initialize on first use. No setup needed. To manually initialize:
```bash
./scripts/init.sh
```
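How It Works
1. Capture (automatic)
The capture.sh hook fires after every Bash tool use. It reads the tool output from stdin (JSON), detects errors via pattern matching, and stores structured entries:
```json
{
  "id": "RFX-20260331-a7f",
  "type": "error",
  "trigger": "npm ERR! Missing script: \"build\"",
  "context": "npm run build",
  "resolution": "",
  "keywords": ["npm", "build", "missing", "script"],
  "occurrences": 1,
  "first_seen": "2026-03-31",
  "last_seen": "2026-03-31",
  "promoted": false,
  "cwd": "/home/user/project"
}
```
When the agent (or user) resolves the error, the agent should update the entry:
```
Update .reflexion/entries/RFX-20260331-a7f.json with resolution:
"Use pnpm run build - this project uses pnpm, not npm"
```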
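2. Recall (automatic)
The recall.sh hook fires before every user prompt. It extracts keywords from the prompt, searches the entry index, and injects relevant past learnings:
```xml
<reflexion-recall>
Past learning [RFX-20260331-a7f] (seen 2x):
  Trigger: npm ERR! Missing script: "build"
  Resolution: Use pnpm run build - this project uses pnpm, not npm
  Keywords: npm, build, missing, script
</reflexion-recall>
```
This costs ~50-80 tokens when matches exist, zero when they don't.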
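3. Promote (automatic)
When an entry hits 3+ occurrences, promote.sh appends a concise rule to CLAUDE.md:
```markdown
## Reflexion: Learned Rules
- This project uses pnpm, not npm. Always use pnpm run commands. (seen 3x, source: RFX-20260331-a7f)
```
Promoted entries are marked "promoted": true and stop being injected via recall (the rule is now in CLAUDE.md permanently).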
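4. Verify (agent-driven)
After the agent applies a recalled solution, it should verify and update:
- Worked: Increment occurrences, update last_seen
- Failed: Add a note to the entry, flag it for review, decrement confidence
This step is agent-driven (via prompt instruction), not hook-automated, to avoid false positives.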
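Entry Types
| Type | Trigger | Example |
|---|---|---|
| error | Command failure detected by hook | npm ERR!, Permission denied, ModuleNotFoundError |
| correction | User says "no", "actually", "wrong" | "Actually use pnpm, not npm" |
| insight | Non-obvious solution discovered | "Must run codegen after API changes" |
| pattern | Recurring approach that works | "Always check auth status before git push" |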
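Data Format
Entries live in .reflexion/entries/ as individual JSON files (one per learning). This enables:
- Fast grep-based search (no parsing a giant markdown file)
- Atomic writes (no corruption from concurrent access)
- Easy manual editing
- Git-friendly diffs
The keyword index at .reflexion/index.txt maps keywords to entry IDs for fast recall:
```
npm:RFX-20260331-a7f,RFX-20260401-b2c
build:RFX-20260331-a7f
pnpm:RFX-20260331-a7f,RFX-20260401-b2c
docker:RFX-20260402-c1d
```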
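Promotion Rules
An entry is auto-promoted to CLAUDE.md when ALL conditions are met:
1. occurrences >= 3
2. resolution is non-empty (the fix is known)
3. promoted is false (not already promoted)
4. Entry is older than 1 day (not a flurry of the same error in one session)
Promoted rules are written as short, actionable directives, not incident reports.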
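Agent Instructions
When this skill is active, follow these behaviors:
On Error
1. Check if capture.sh already logged it (it runs automatically on Bash errors)
2. If you resolve the error, update the entry's resolution field
3. If the error matches a recalled learning, say so and apply the known fix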
On User Correction
Log a correction entry manually:
```bash
# Generate one ID and reuse it for the filename and the "id" field
id="RFX-$(date +%Y%m%d)-$(head -c3 /dev/urandom | xxd -p | head -c3)"
cat > ".reflexion/entries/${id}.json" << ENTRY
{
  "id": "${id}",
  "type": "correction",
  "trigger": "user said: actually use pnpm",
  "context": "attempted npm install",
  "resolution": "this project uses pnpm, not npm",
  "keywords": ["npm", "pnpm", "install", "package-manager"],
  "occurrences": 1,
  "first_seen": "2026-03-31",
  "last_seen": "2026-03-31",
  "promoted": false
}
ENTRY
```
Then rebuild the index: ./scripts/rebuild-index.sh
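On Recall
When <reflexion-recall> context appears in the prompt:
1. Read the recalled learnings
2. Apply the known resolution if relevant
3. If the resolution works, increment occurrences
4. If it doesn't apply, ignore it (no penalty)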
Before Major Tasks
Run ./scripts/status.sh to see if there are relevant learnings for the area you're about to work in.
Security