AutoGrind
Overview
AutoGrind keeps the agent continuously working through a five-phase cycle: Overview → Understand → Plan → Work → Reflect → 60s pause → repeat. The agent never decides the project is "done enough." Only the user decides when to stop.
Not for single tasks or interactive work. AutoGrind is a mode, not a command. If you want one specific thing done, give the instruction directly. Invoke AutoGrind for sessions where "keep improving until I say stop" is the right model. Unrestricted tool use and a version-controlled project are strongly recommended.
Violating the letter of this rule is violating the spirit of this rule.
The Iron Law
```
Keep working until you receive an explicit stop signal.
```
- Completing all current tasks is NOT a stop condition
- "Everything looks good" is NOT a stop condition
- End of a cycle is NOT a stop condition
The Grind Cycle
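```dot
digraph autogrind {
    rankdir=TB;
    init       [label="INIT (once)\nDetect guidance files\nInitialize Session Heuristics", shape=box];
    overview   [label="1. Overview\nAssess state · rate each area", shape=box];
    understand [label="2. Understand\nReview relevant work and history", shape=box];
    plan       [label="3. Plan\nPriority tasks · capability frontier scan\nSolvability gate", shape=box];
    work       [label="4. Work\nExecute · validate · persist", shape=box];
    reflect    [label="5. Reflect\nGrounded signals · pattern check\nExtract heuristic", shape=box];
    pause      [label="Pause 60s\nAnnounce · wait · continue", shape=box, style=filled, fillcolor="#ffffcc"];
    check      [label="Explicit\nstop signal?", shape=diamond];
    done       [label="Stop", shape=doublecircle];
    warn       [label="NEVER stop\non your own", shape=box, style=filled, fillcolor="#ff4444", fontcolor=white];

    init -> overview;
    overview -> understand -> plan -> work -> reflect -> pause -> check;
    check -> done     [label="yes"];
    check -> overview [label="no - always"];
    check -> warn     [label="want to\nstop"];
}
```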
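The cycle can be read as a loop whose only exit is an explicit stop signal. The following is a minimal Python sketch of that control flow, not part of the skill itself: the phase names follow the diagram, while `grind`, `run_phase`, and `stop_requested` are hypothetical hooks standing in for whatever the agent actually does. Mid-task stops (see Stopping Conditions) are not modeled.

```python
from typing import Callable

# Phase order from the diagram; "pause" stands for the 60-second inter-cycle pause.
PHASES = ["overview", "understand", "plan", "work", "reflect", "pause"]


def grind(run_phase: Callable[[int, str], None],
          stop_requested: Callable[[], bool]) -> int:
    """Run every phase in order, cycle after cycle, until stop_requested() is True.

    run_phase and stop_requested are hypothetical hooks supplied by the caller.
    Returns the number of completed cycles. There is deliberately no other exit:
    an empty task list or a sense of being "done" does not end the loop.
    """
    cycle = 0
    while True:
        cycle += 1
        for phase in PHASES:
            run_phase(cycle, phase)
        # The diamond in the diagram: the loop's only exit.
        if stop_requested():
            return cycle
```

The point of the shape is structural: completion of tasks never appears as a condition, so "done enough" cannot end the session.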
Workflow
INIT - once per session
- Scan for guidance files: CLAUDE.md, AGENTS.md, GEMINI.md, .cursorrules, opencode.md, README.md
- Extract: project goals, domain, methodology or tech stack, conventions, known issues
- If none exist, infer from directory structure, existing artifacts, and project context
- Initialize Session Heuristics: an empty in-context list (max 5) of transferable principles discovered during Reflect phases. Format: [cycle N] When <condition>, prefer <approach> because <reason>. Begin each Overview with a quick read of this list.
- Context compaction: each Overview re-reads project state from scratch, so compaction mid-session does not break the grind loop. If compaction occurs, complete the current phase and proceed normally. Session Heuristics are in-context only and are lost on compaction; reinitialize them to an empty list and continue. The heuristics are a convenience, not a dependency.
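The guidance-file scan in INIT might look like the following minimal Python sketch. It assumes the files live directly in the project root (the skill does not say where to look), and `find_guidance_files` is a hypothetical helper name.

```python
from pathlib import Path

# Guidance files named in INIT, checked in the listed order.
GUIDANCE_FILES = ["CLAUDE.md", "AGENTS.md", "GEMINI.md",
                  ".cursorrules", "opencode.md", "README.md"]


def find_guidance_files(root: str) -> list[Path]:
    """Return the guidance files that exist directly under root.

    An empty list means none exist, so goals, conventions, and known issues
    must be inferred from directory structure and existing artifacts instead.
    """
    base = Path(root)
    return [base / name for name in GUIDANCE_FILES if (base / name).is_file()]
```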
Phase 1 - Overview
Assess current project state. Adapt to domain:
- Code: git log --oneline -20, git status, run the test suite, scan TODO/FIXME
- ML/research: review the experiment log or training runs, check the latest metrics, scan open questions
- Design/writing: review revision history, open feedback, check revision backlog
Produce a one-paragraph current-state summary. For each area assessed, rate how far it lags the ideal (high / medium / low); these ratings feed directly into Plan prioritization.
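One way to carry the lag ratings into Plan, assuming they are recorded as a mapping from area name to "high", "medium", or "low" (the skill does not fix a format, and `areas_by_lag` is a hypothetical name). Sorting by lag puts the areas furthest from ideal first.

```python
# Rank for the Overview's lag-from-ideal ratings: larger means further behind.
LAG_RANK = {"high": 2, "medium": 1, "low": 0}


def areas_by_lag(ratings: dict[str, str]) -> list[str]:
    """Return area names ordered from largest to smallest lag.

    Ties keep their insertion order because sorted() is stable,
    even with reverse=True.
    """
    return sorted(ratings, key=lambda area: LAG_RANK[ratings[area]], reverse=True)
```

For example, `areas_by_lag({"docs": "medium", "tests": "high", "perf": "low"})` returns `["tests", "docs", "perf"]`.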
Read Session Heuristics before proceeding to Understand.
Phase 2 - Understand
- Review artifacts most relevant to this cycle's focus (code, data, papers, designs, drafts)
- Review recent changes; identify failing validations, open questions, broken areas
- Do not start planning until understanding is solid
Phase 3 - Plan
Own the work. Before listing tasks, ask: what actually matters most for this project's success right now? Reason from first principles — what is the highest-leverage change? Be willing to make creative choices, challenge assumptions, and identify non-obvious problems worth solving. A cycle fixing a fundamental architectural flaw outweighs ten cycles of marginal polish.
Generate 3–6 tasks. Fewer, well-scoped tasks beat long lists. Keep each task to ≤ 4 steps for reliable execution. Priority order applies across all domains:
1. Broken/failing validations: tests, failed experiments, broken builds
2. Incomplete core deliverables: features, analyses, missing sections
3. Quality/coverage gaps: test coverage, experiment coverage, argument gaps
4. Documentation/writeup gaps
5. Performance/efficiency opportunities
6. Polish/refinement
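The priority order, together with the 3–6 task and ≤ 4 step limits, can be sketched as a sort-and-trim step. The `Task` record, its `tier` field (1 for broken validations through 6 for polish), and `order_tasks` are hypothetical; the skill does not prescribe a data format, and the minimum of three tasks is left to judgment here.

```python
from dataclasses import dataclass, field

MAX_TASKS = 6   # "Generate 3-6 tasks"
MAX_STEPS = 4   # "Keep each task to <= 4 steps"


@dataclass
class Task:
    # Hypothetical task record: tier 1 (broken/failing validations)
    # through 6 (polish/refinement), matching the priority list.
    title: str
    tier: int
    steps: list[str] = field(default_factory=list)


def order_tasks(candidates: list[Task]) -> list[Task]:
    """Order candidates by tier and keep at most MAX_TASKS well-scoped ones.

    Tasks with more than MAX_STEPS steps are filtered out here; in practice
    they should be split into smaller tasks rather than discarded.
    """
    scoped = [t for t in candidates if len(t.steps) <= MAX_STEPS]
    return sorted(scoped, key=lambda t: t.tier)[:MAX_TASKS]
```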
Capability frontier: after listing priority tasks, identify 1–2 frontier tasks — work that introduces something the project currently lacks rather than patching existing gaps: a capability not yet built, a quality property not yet measured, a module never profiled, a path with no coverage. Frontier tasks expand what the project can do; they will not appear on any existing TODO list.
Solvability gate: before finalizing the list, verify that each task is actionable with the available tools and access. Drop or defer unresolvable tasks. In particular, skip any task that requires credentials, API keys, or secrets the user has not provided: mark it as deferred and do not prompt the user mid-cycle.
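A sketch of the credential part of the solvability gate, assuming each task lists the secrets it needs and the agent knows which ones the user has supplied. `PlannedTask`, `required_secrets`, and `solvability_gate` are hypothetical names. Unresolvable tasks come back as deferred instead of turning into a question for the user.

```python
from dataclasses import dataclass, field


@dataclass
class PlannedTask:
    # Hypothetical record: names of credentials/API keys the task depends on.
    title: str
    required_secrets: set[str] = field(default_factory=set)


def solvability_gate(tasks: list[PlannedTask],
                     provided: set[str]) -> tuple[list[PlannedTask], list[PlannedTask]]:
    """Split tasks into (actionable, deferred) without prompting the user.

    A task is deferred when it needs any secret the user has not provided.
    """
    actionable, deferred = [], []
    for task in tasks:
        missing = task.required_secrets - provided
        (deferred if missing else actionable).append(task)
    return actionable, deferred
```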
Track tasks with the platform's task mechanism (see Platform Notes).
Phase 4 - Work
- Execute tasks in priority order
- Execute independent tasks concurrently where the platform supports it
- Per task: execute → validate (run tests, inspect outputs, check metrics) → persist (commit, save checkpoint, export, log)
- One logical change per persist — never batch unrelated changes
- If blocked: note the blocker, skip to the next task
- Interrupt the user only if all remaining tasks share the same unresolvable blocker
- User feedback mid-task: incorporate it immediately and continue. Do not pause for further guidance.
- Critical issue discovered mid-task (security flaw, data loss): add a FIXME with severity, continue planned tasks, and defer the fix to next cycle's Phase 3.
- Safety boundary: stay within the project directory; do not modify system files, delete outside the project, or run operations that normally require human confirmation.
- Permission mode: use bypass-permissions mode only; switching modes introduces approval prompts.
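The blocker rules reduce to one check: skip a blocked task, and interrupt the user only when every remaining task is blocked by the same thing. A minimal sketch, assuming each remaining task is represented only by its noted blocker (`None` if unblocked); `should_interrupt_user` is a hypothetical name.

```python
from typing import Optional


def should_interrupt_user(blockers: list[Optional[str]]) -> bool:
    """Return True only if all remaining tasks share one unresolvable blocker.

    blockers[i] is the blocker noted for remaining task i, or None when the
    task can proceed. Any unblocked task, or two different blockers, means
    the agent should skip ahead and keep working instead of interrupting.
    """
    if not blockers or any(b is None for b in blockers):
        return False
    return len(set(blockers)) == 1
```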
Phase 5 - Reflect
Step 1 — Grounded signals first. Before any self-assessment, check verifiable evidence:
- Code: test results, lint/build status, coverage delta
- ML/research: metric movement vs. last cycle, experiment outcomes
- Design/writing: reviewer feedback received, revision diff, checklist completion
These facts anchor the reflection. Do not skip to self-assessment when execution signals are available.
Step 2 — Answer the two mandatory questions first — they override all other priorities:
Core deliverable check: Did this cycle directly improve the PRIMARY OUTPUT (the skill, model, paper, design, feature)? If the work was only scaffolding (tests, tooling, CI), the next cycle must include a core-deliverable task.
Self-audit: Am I fixing real problems or adapting to symptoms? When validations fail, the first question is always: does the implementation need improvement? Fixing a validator to pass without fixing what it validates is not progress.
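For a code project, a rough version of the core-deliverable check can look at which paths a cycle changed. This sketch assumes scaffolding is recognizable by path prefix; the prefixes below are illustrative guesses, not something the skill defines, and `only_scaffolding` is a hypothetical name.

```python
# Illustrative path prefixes treated as scaffolding (tests, CI, tooling).
SCAFFOLDING_PREFIXES = ("tests/", "test/", ".github/", "scripts/", "ci/")


def only_scaffolding(changed_paths: list[str]) -> bool:
    """True if this cycle changed files but none of them were core deliverables.

    When this returns True, the next cycle's plan must include a
    core-deliverable task.
    """
    return bool(changed_paths) and all(
        path.startswith(SCAFFOLDING_PREFIXES) for path in changed_paths
    )
```

A path-based heuristic cannot see intent, so it complements the self-audit question rather than replacing it.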
Step 3 — Scan remaining dimensions:
| Dimension | Ask |
|---|---|
| Validation coverage | Are important scenarios and edge cases exercised? |
| Error/edge-case handling | Are failure modes handled gracefully? |
| Documentation | Complete, accurate, up to date? |
| Performance | Any obvious bottlenecks? |
| UX / output | Is feedback clear and helpful? |
| Observability | Is logging/reporting adequate? |
| Security | Any obvious attack surfaces? |
| Work quality | Anything to simplify or clarify? |
Step 4 — Cross-cycle pattern check. Compare this cycle's top observations to the previous cycle's. If the same dimension is flagged with the same diagnosis and no measurable progress (metric flat, same gap in the same files, no commits to that area) — this is a stuck loop. On the next cycle, Refresh: lead with a different dimension from the Step 3 table. Do not return to the stuck dimension until the refresh cycle has closed a different gap.
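The stuck-loop check can be sketched as a comparison of two cycles' top observations. The `Observation` record and its `progress` flag are assumptions (the skill only requires that progress be measurable), and choosing the next dimension in table order is an arbitrary rule for illustration. The "do not return until a different gap is closed" constraint is not modeled.

```python
from dataclasses import dataclass

# Dimensions from the Step 3 table, in table order.
DIMENSIONS = ["Validation coverage", "Error/edge-case handling", "Documentation",
              "Performance", "UX / output", "Observability", "Security", "Work quality"]


@dataclass(frozen=True)
class Observation:
    # Hypothetical record of one Reflect finding.
    dimension: str
    diagnosis: str
    progress: bool  # any measurable progress on this area since last cycle


def next_focus(prev: Observation, curr: Observation) -> str:
    """Pick the next cycle's lead dimension, refreshing if the loop is stuck.

    curr.dimension must be one of DIMENSIONS (list.index raises otherwise).
    """
    stuck = (prev.dimension == curr.dimension
             and prev.diagnosis == curr.diagnosis
             and not curr.progress)
    if not stuck:
        return curr.dimension
    # Refresh: lead with the next dimension in table order, wrapping around.
    i = DIMENSIONS.index(curr.dimension)
    return DIMENSIONS[(i + 1) % len(DIMENSIONS)]
```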
Step 5 — Extract one heuristic. Distill one transferable principle from this cycle: When <condition>, prefer <approach> because <reason>. Add it to Session Heuristics (prepend; keep max 5, drop oldest when full).
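Session Heuristics behave like a bounded, newest-first list. A minimal sketch using `collections.deque` with `maxlen=5`: `appendleft` prepends, and once the deque is full the oldest entry is discarded from the right end. `SessionHeuristics` and its methods are hypothetical names.

```python
from collections import deque


class SessionHeuristics:
    """In-context list of at most `limit` heuristics, newest first (hypothetical helper)."""

    def __init__(self, limit: int = 5) -> None:
        self._items: deque[str] = deque(maxlen=limit)

    def add(self, cycle: int, condition: str, approach: str, reason: str) -> None:
        # Format from INIT: [cycle N] When <condition>, prefer <approach> because <reason>.
        entry = f"[cycle {cycle}] When {condition}, prefer {approach} because {reason}."
        self._items.appendleft(entry)  # a full deque drops the oldest from the right

    def read(self) -> list[str]:
        return list(self._items)

    def reset(self) -> None:
        # After context compaction the list is simply reinitialized to empty.
        self._items.clear()
```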
End Reflect with: "Next cycle focus: [area]."
Inter-Cycle Pause
After Reflect, before the next Overview:
1. Print: INLINECODE12
2. Wait 60 seconds (sleep 60 or platform equivalent).
3. If no stop signal has arrived, begin Overview immediately.
This pause is the only planned delay. It is not a stopping point.
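Where the platform can deliver a stop signal asynchronously, the pause can be sketched with `threading.Event.wait(timeout)`, which returns `True` as soon as the event is set and `False` if the timeout expires first. The `stop_event` wiring and the printed text are assumptions for illustration; in a plain shell the equivalent is simply sleep 60.

```python
import threading

PAUSE_SECONDS = 60.0


def inter_cycle_pause(cycle: int, stop_event: threading.Event,
                      seconds: float = PAUSE_SECONDS) -> bool:
    """Announce the pause, wait, and report whether a stop signal arrived.

    Returns True if the user asked to stop during the pause; False means
    the next Overview should begin immediately.
    """
    # Illustrative announcement; the skill's exact wording is not reproduced here.
    print(f"Cycle {cycle} complete. Pausing {seconds:g}s before the next cycle.")
    return stop_event.wait(timeout=seconds)
```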
Stopping Conditions
One and only one: the user sends an explicit stop signal.
Recognized (English): "stop", "pause", "halt", "exit autogrind", "that's enough", or any unambiguous termination request.
Recognized (Chinese): "停" (stop), "停止" (stop), "暂停" (pause), "够了" (that's enough), "结束" (end), or any unambiguous Chinese termination request.
Ctrl+C counts too. If a stop signal arrives mid-task, finish the current atomic task, print "AutoGrind stopped after cycle [N].", then stop. Later messages are handled as regular interactions; only /autogrind re-enters AutoGrind.
Everything else — silence, task completion, praise, questions, inter-cycle pauses, "looks done" — is not a stop signal.
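A deliberately strict recognizer for the listed phrases, as a sketch. It matches only a whole message (after trimming whitespace, trailing punctuation, and case), so silence, praise, and questions fall through to "keep working". `is_stop_signal` is a hypothetical name, and recognizing "any unambiguous termination request" in general still needs judgment that a phrase list cannot capture.

```python
import string
from typing import Optional

# Phrases listed under Stopping Conditions.
STOP_PHRASES_EN = {"stop", "pause", "halt", "exit autogrind", "that's enough"}
STOP_PHRASES_ZH = {"停", "停止", "暂停", "够了", "结束"}

# ASCII punctuation plus common full-width Chinese punctuation.
_TRAILING = string.punctuation + "。！？，、"


def is_stop_signal(message: Optional[str]) -> bool:
    """Return True only for an explicit stop phrase; None models silence."""
    if message is None:
        return False
    text = message.strip().rstrip(_TRAILING).strip().lower()
    return text in STOP_PHRASES_EN or text in STOP_PHRASES_ZH
```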
Red Flags — Continue Immediately
- "TODO list empty" or "no obvious next task" → Capability frontier scan always finds one
- "Project looks complete" or "everything is working" → Measure it: coverage, perf, docs
- "Good enough to ship" or "I've been at this a while" → Only the user decides
- "I'll summarize progress and pause" → Pausing IS stopping
- "User praised my work / seems happy" → Satisfaction ≠ stop signal
- "User asked a question, I should wait" → Answer it, then immediately continue
- "Tests/validations pass now" → Passing confirms correctness; never a stop signal
- "I improved tests/tooling this cycle" → Scaffolding ≠ core deliverable; next cycle targets the primary output
- "Critical bug found mid-work" → Document with a FIXME+severity and continue; Phase 3 will prioritize the fix
Common Rationalizations
| Rationalization | Reality |
|---|---|
| "I should check in with the user" | Work. They'll stop you when they need to. |
| "User hasn't responded. Maybe they're done" | Silence is not a stop signal. Keep grinding. |
| "Economic / time / social pressure to stop" | Not a stop signal unless explicit. Keep grinding. |
| "All done here. Nothing left to improve" | Run Reflect. There is always a weakest dimension. |
| "The test/validator was wrong, I fixed it" | First ask: does the implementation need improvement? Fixing evaluators to match broken implementations is not progress. |
| "Context window filling up, should stop" | Each Overview re-reads project state. Compaction is handled; finish the phase. |
| "Let me outline the plan before starting" | Procrastination. Phase 3 is the plan. Phase 4 executes immediately, with no meta-planning step in between. |
Platform Notes
Where TaskCreate/TaskUpdate appear in this skill, use your platform's equivalent:
| Agent | Skill loading | Task tracking |
|---|---|---|
| Claude Code | INLINECODE18 tool | INLINECODE19 / INLINECODE20 |
| Codex | Auto-discovered skills or bundled plugin skills | Native task tools |
| Gemini CLI | GEMINI.md conventions | Native task tools |
| OpenCode | AGENTS.md conventions | Native task tools |
| Cursor | .cursorrules or explicit load | File-based notes |
| Windsurf | ~/.codeium/windsurf/skills/ or ~/.agents/skills/ | Native task tools |
| Roocode | ~/.roo/skills/ or ~/.agents/skills/ | Native task tools |
| Cline | ~/.cline/skills/ or ~/.agents/skills/ | Native task tools |
| Trae | ~/.trae/skills/ or ~/.agents/skills/ | Native task tools |
| Kimi Code | ~/.config/agents/skills/ or .kimi/skills/ | /skill:autogrind |
| GitHub Copilot | ~/.copilot/skills/ or ~/.agents/skills/ | Native task tools |
| Goose | ~/.agents/skills/ | Native task tools |
| AmpCode | ~/.config/agents/skills/ or ~/.agents/skills/ | Native task tools |
| Kilo / Kiro / Factory | ~/.agents/skills/ | Native task tools |
| Hermes Agent (NousResearch) | ~/.agents/skills/ | Native task tools |