Reflexion

Closed-loop learning for AI coding agents. Inspired by Reflexion: Language Agents with Verbal Reinforcement Learning.

"Reflexion agents verbally reflect on task feedback signals, then maintain their own reflective text in an episodic memory buffer to induce better decision-making in subsequent trials."
— Shinn et al., 2023

The problem: AI agents repeat the same mistakes across sessions. They don't learn from errors, don't remember corrections, and every new conversation starts from zero.

The solution: A capture-recall-promote loop that closes the feedback gap.

CODEBLOCK0

Quick Reference

Situation	What Happens
Command fails	`capture.sh` auto-logs error + context to `.reflexion/entries/`
User corrects agent

Install

Claude Code (recommended)

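# Clone into your project or global skills directory
git clone https://github.com/user/reflexion.git .claude/skills/reflexion

# Or copy into an existing skills directory
cp -r reflexion/ ~/.claude/skills/reflexion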

Add hooks to .claude/settings.json:

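{
  "hooks": {
    "PostToolUse": [
      {
        "matcher": "Bash",
        "hooks": [
          {
            "type": "command",
            "command": "./.claude/skills/reflexion/scripts/capture.sh"
          }
        ]
      }
    ],
    "UserPromptSubmit": [
      {
        "matcher": "",
        "hooks": [
          {
            "type": "command",
            "command": "./.claude/skills/reflexion/scripts/recall.sh"
          }
        ]
      }
    ]
  }
}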

First run

The scripts auto-initialize on first use. No setup needed. To manually initialize:

./scripts/init.sh

How It Works

1. Capture (automatic)

The capture.sh hook fires after every Bash tool use. It reads the tool output from stdin (JSON), detects errors via pattern matching, and stores structured entries:

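{
  "id": "RFX-20260331-a7f",
  "type": "error",
  "trigger": "npm ERR! Missing script: \"build\"",
  "context": "npm run build",
  "resolution": "",
  "keywords": ["npm", "build", "missing", "script"],
  "occurrences": 1,
  "first_seen": "2026-03-31",
  "last_seen": "2026-03-31",
  "promoted": false,
  "cwd": "/home/user/project"
}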

When the agent (or user) resolves the error, the agent should update the entry:

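Update .reflexion/entries/RFX-20260331-a7f.json with resolution:
"Use pnpm run build - this project uses pnpm, not npm"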
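The error-detection step of capture can be sketched as a single pattern match over the tool output. This is a minimal illustration, not the actual contents of capture.sh: the function name `detect_error` and the `ERROR_PATTERNS` list are assumptions, seeded with the error strings this README mentions plus two common shell errors.

```shell
#!/usr/bin/env bash
# Hypothetical sketch of the error-detection step; the pattern list and
# function name are assumptions, not the real capture.sh.

# Alternation of error markers (extended regex).
ERROR_PATTERNS='npm ERR!|Permission denied|ModuleNotFoundError|command not found|No such file or directory'

# detect_error TEXT
# Prints the first line of TEXT that matches a known error pattern and
# returns 0; returns 1 when no line matches.
detect_error() {
  local match
  match=$(printf '%s\n' "$1" | grep -E -m1 -- "$ERROR_PATTERNS") || return 1
  printf '%s\n' "$match"
}
```

A hook built on this would call `detect_error "$tool_output"` and write an entry only when it returns 0, using the printed line as the entry's trigger.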

2. Recall (automatic)

The recall.sh hook fires before every user prompt. It extracts keywords from the prompt, searches the entry index, and injects relevant past learnings:

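<reflexion-recall>
Past learning [RFX-20260331-a7f] (seen 2x):
  Trigger: npm ERR! Missing script: build
  Resolution: Use pnpm run build - this project uses pnpm, not npm
  Keywords: npm, build, missing, script
</reflexion-recall>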

This costs ~50-80 tokens when matches exist, zero when they don't.
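The keyword extraction and index search described above could look roughly like the following. It is a sketch under assumptions, not the actual recall.sh: the function names are hypothetical, the three-character minimum keyword length is an arbitrary choice, and the index is assumed to use one `keyword:ID1,ID2` line per keyword.

```shell
#!/usr/bin/env bash
# Hypothetical sketch of keyword recall; not the real recall.sh.
# Assumes an index file with lines of the form  keyword:ID1,ID2

# extract_keywords PROMPT
# Lowercases the prompt, splits it on non-alphanumeric characters, drops
# words shorter than 3 characters, and removes duplicates.
extract_keywords() {
  printf '%s\n' "$1" \
    | tr '[:upper:]' '[:lower:]' \
    | tr -cs '[:alnum:]' '\n' \
    | awk 'length($0) >= 3 && !seen[$0]++'
}

# lookup_entries INDEX_FILE PROMPT
# Prints each entry ID whose keyword appears in the prompt, once per ID.
lookup_entries() {
  local index_file=$1 prompt=$2 kw
  extract_keywords "$prompt" | while IFS= read -r kw; do
    # Exact match on the keyword column, so "npm" does not match "pnpm".
    awk -F: -v k="$kw" '$1 == k { print $2 }' "$index_file"
  done | tr ',' '\n' | awk 'NF && !seen[$0]++'
}
```

When no keyword matches, `lookup_entries` prints nothing, which corresponds to the zero-token case: the hook has nothing to inject.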

3. Promote (automatic)

When an entry reaches 3 or more occurrences and meets the other conditions listed under Promotion Rules, promote.sh appends a concise rule to CLAUDE.md:

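## Reflexion: Learned Rules

- This project uses pnpm, not npm. Always use pnpm run commands. (seen 3x, source: RFX-20260331-a7f)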

Promoted entries are marked "promoted": true and stop being injected via recall (the rule is now in CLAUDE.md permanently).

4. Verify (agent-driven)

After the agent applies a recalled solution, it should verify and update:

- Worked: Increment occurrences, update `last_seen`
- Failed: Add note to entry, flag for review, decrement confidence

This step is agent-driven (via prompt instruction), not hook-automated, to avoid false positives.
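For the "worked" case, the agent needs to bump two fields in the entry file. One way to do that with standard tools is sketched below. The helper name `bump_entry` is hypothetical, and the sketch assumes the entry JSON keeps one field per line; a JSON-aware tool would be more robust if one is available.

```shell
#!/usr/bin/env bash
# Hypothetical helper for the "worked" case; not part of the skill's scripts.
# Assumes the entry file has one JSON field per line.

# bump_entry FILE [TODAY]
# Increments "occurrences" and sets "last_seen" to TODAY (default: today).
bump_entry() {
  local file=$1 today=${2:-$(date +%Y-%m-%d)} tmp
  # Write to a temp file next to the entry, then rename over the original.
  tmp=$(mktemp "${file}.XXXXXX") || return 1
  awk -v today="$today" '
    /"occurrences":/ {
      n = $0
      gsub(/[^0-9]/, "", n)     # keep only the digits of the current count
      sub(/[0-9]+/, n + 1)      # replace the count with count + 1
    }
    /"last_seen":/ {
      sub(/"last_seen": *"[^"]*"/, "\"last_seen\": \"" today "\"")
    }
    { print }
  ' "$file" > "$tmp" && mv "$tmp" "$file"
}
```

For example, `bump_entry .reflexion/entries/RFX-20260331-a7f.json` would record one more successful use of that learning.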

Entry Types

Type	Trigger	Example
`error`	Command failure detected by hook	`npm ERR!`, `Permission denied`, `ModuleNotFoundError`
`correction`

Data Format

Entries live in .reflexion/entries/ as individual JSON files (one per learning). This enables:

- Fast grep-based search (no parsing a giant markdown file)
- Atomic writes (no corruption from concurrent access)
- Easy manual editing
- Git-friendly diffs

The keyword index at .reflexion/index.txt maps keywords to entry IDs for fast recall:

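npm:RFX-20260331-a7f,RFX-20260401-b2c
build:RFX-20260331-a7f
pnpm:RFX-20260331-a7f,RFX-20260401-b2c
docker:RFX-20260402-c1d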
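An index in this `keyword:ID,ID` shape can be derived from the entry files alone. The sketch below shows one way to do it. It is not the actual rebuild-index.sh: the function name is hypothetical, and it assumes each entry stores its `"keywords"` array on a single line and that the file name (minus `.json`) is the entry ID.

```shell
#!/usr/bin/env bash
# Hypothetical sketch of an index rebuild; not the real rebuild-index.sh.
# Assumes "keywords": [...] sits on one line and file names are entry IDs.

# rebuild_index ENTRIES_DIR
# Prints one "keyword:ID1,ID2" line per keyword found in the entry files.
rebuild_index() {
  local entries_dir=$1 file id
  for file in "$entries_dir"/*.json; do
    [ -e "$file" ] || continue            # directory may be empty
    id=$(basename "$file" .json)
    # Strip everything outside [...], split on commas, drop the quotes.
    grep '"keywords"' "$file" \
      | sed -E 's/^[^[]*\[//; s/\].*$//' \
      | tr ',' '\n' \
      | sed -E 's/^[[:space:]]*"//; s/"[[:space:]]*$//' \
      | awk -v id="$id" 'NF { print $0 ":" id }'
  done | LC_ALL=C sort | awk -F: '
    # Input is sorted "keyword:ID" pairs; merge runs of the same keyword.
    $1 != prev { if (NR > 1) printf "\n"; printf "%s:%s", $1, $2; prev = $1; next }
    { printf ",%s", $2 }
    END { if (NR > 0) printf "\n" }'
}
```

Redirecting the output, as in `rebuild_index .reflexion/entries > .reflexion/index.txt`, would regenerate the index file in one pass.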

Promotion Rules

An entry is auto-promoted to CLAUDE.md when ALL conditions are met:

1. `occurrences >= 3`
2. `resolution` is non-empty (the fix is known)
3. `promoted` is false (not already promoted)
4. Entry is older than 1 day (not a flurry of the same error in one session)

Promoted rules are written as short, actionable directives. Not incident reports.
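The four conditions translate directly into a guard function. The sketch below is illustrative rather than the actual promote.sh: `should_promote` is a hypothetical name, the entry fields are passed as plain arguments, and "older than 1 day" is approximated as "first seen on an earlier calendar day" so that no date arithmetic is needed.

```shell
#!/usr/bin/env bash
# Hypothetical sketch of the promotion check; not the real promote.sh.

# should_promote OCCURRENCES RESOLUTION PROMOTED FIRST_SEEN [TODAY]
# Returns 0 only when all four promotion conditions hold.
should_promote() {
  local occurrences=$1 resolution=$2 promoted=$3 first_seen=$4
  local today=${5:-$(date +%Y-%m-%d)}

  [ "$occurrences" -ge 3 ] || return 1     # seen at least 3 times
  [ -n "$resolution" ] || return 1         # the fix is known
  [ "$promoted" = "false" ] || return 1    # not already promoted
  # Assumption: "older than 1 day" means first seen on an earlier date.
  # ISO dates (YYYY-MM-DD) order correctly under string comparison.
  [[ "$first_seen" < "$today" ]] || return 1
}
```

A call such as `should_promote 3 "use pnpm" false 2026-03-31 2026-04-02` succeeds, while the same entry with `first_seen` equal to today is held back until the next day.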

Agent Instructions

When this skill is active, follow these behaviors:

On Error

1. Check if capture.sh already logged it (it runs automatically on Bash errors)
2. If you resolve the error, update the entry's resolution field
3. If the error matches a recalled learning, say so and apply the known fix

On User Correction

Log a correction entry manually:

# Generate the ID once so the file name and the "id" field match
id="RFX-$(date +%Y%m%d)-$(head -c3 /dev/urandom | xxd -p | head -c3)"
cat > ".reflexion/entries/${id}.json" << ENTRY
{
  "id": "${id}",
  "type": "correction",
  "trigger": "user said: actually use pnpm",
  "context": "attempted npm install",
  "resolution": "this project uses pnpm, not npm",
  "keywords": ["npm", "pnpm", "install", "package-manager"],
  "occurrences": 1,
  "first_seen": "2026-03-31",
  "last_seen": "2026-03-31",
  "promoted": false
}
ENTRY

Then rebuild the index: `./scripts/rebuild-index.sh`

On Recall

When <reflexion-recall> context appears in the prompt:

1. Read the recalled learnings
2. Apply the known resolution if relevant
3. If the resolution works, increment occurrences
4. If it doesn't apply, ignore it (no penalty)

Before Major Tasks

Run ./scripts/status.sh to see if there are relevant learnings for the area you're about to work in.

Security

- Never log secrets, tokens, API keys, or credentials in entries
- The capture.sh script redacts common secret patterns (Bearer tokens, API keys, passwords)
- `.reflexion/` should be in .gitignore for private projects
- For team projects, committing .reflexion/ creates shared learning (opt-in)
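Redaction of this kind is typically a stream filter applied before anything is written to disk. The sketch below shows the idea with `sed`. The patterns and the `redact` function name are assumptions, not the actual rules in capture.sh, and they are far from exhaustive, so the first rule in the list above still applies.

```shell
#!/usr/bin/env bash
# Hypothetical sketch of secret redaction; the patterns are assumptions,
# not the real capture.sh rules, and they do not cover every secret format.

# redact
# Reads text on stdin, masks common secret shapes, writes to stdout.
redact() {
  sed -E \
    -e 's#(Bearer )[A-Za-z0-9._~+/=-]+#\1[REDACTED]#g' \
    -e 's#((api[_-]?key|API[_-]?KEY|token|TOKEN|password|PASSWORD)[=:][[:space:]]*)[^[:space:]"]+#\1[REDACTED]#g'
}
```

A capture hook could pipe both the command and its output through the filter, for example `printf '%s\n' "$tool_output" | redact`, before building the entry.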

Comparison

Feature	self-improving-agent	OMC auto-learner	reflexion
Auto-capture errors	Hook reminder only	Pattern detection	Hook + auto-parse + store
Structured storage

File Structure

CODEBLOCK10

Citation

This skill implements the core feedback loop from:

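Shinn, N., Cassano, F., Berman, E., Gopinath, A., Narasimhan, K., & Yao, S. (2023). Reflexion: Language Agents with Verbal Reinforcement Learning. arXiv:2303.11366.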

The paper showed that language agents that reflect on past failures, and keep those reflections in an episodic memory buffer, significantly outperform base agents: Reflexion reached 91% pass@1 on HumanEval versus 80% for GPT-4. This skill adapts that principle for AI coding agents: instead of weight updates, it stores verbal reflections (error entries with resolutions) and retrieves them when similar situations arise.
