ClawSergeant: Boosting OpenClaw Agents from AI Feedback
ClawSergeant trains OpenClaw agents through a structured, LLM-driven pipeline. A Trainer LLM designs curriculum, generates training tasks, and adapts its teaching dynamically based on the agent's responses. A separate Evaluator LLM objectively scores each response, creating a feedback loop that drives iterative improvement.
Architecture Overview
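```
User intent ──────────────────────→ LLM (curriculum designer)
    ↓
Curriculum JSON (stages, tasks, criteria)
    ↓
Training session loop:
  Trainer LLM → compose message → openclaw CLI → Claw agent → reply
    ↓
  Evaluator LLM → score + feedback
    ↓
  Log to .claw_sergeant_accumulated_lessons/ ←──┘
    ↓
  (on failure) → Trainer LLM retries using the feedback
    ↓
  (on stage pass) → stage summary for memory consolidation
    ↓
  [Curriculum mode] → log to .claw_sergeant_accumulated_lessons/
```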
Training Pipeline
Phase 1: Curriculum Design
The user's training intent is passed directly as input. The LLM generates a multi-stage curriculum as structured JSON based on this intent. The user reviews and approves the curriculum before training begins.
Each curriculum contains:
- Title and overview of the training program
- Target persona describing the ideal agent after training
- 3–5 stages, each with:
  - Name, description, and learning objectives
  - 2–4 training tasks with scenario descriptions and expected behaviors
  - Evaluation criteria with passing standards
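The structure above can be checked mechanically before the user is asked to approve it. The following is a minimal sketch, not the project's actual data model: the JSON key names (`stages`, `tasks`, `scenario`, `expected_behavior`, `objectives`, `evaluation_criteria`) and the `parse_curriculum` helper are assumptions made for illustration. Only the 3–5 stage and 2–4 task bounds come from the description above.

```python
import json
from dataclasses import dataclass, field


@dataclass
class Task:
    scenario: str
    expected_behavior: str


@dataclass
class Stage:
    name: str
    description: str
    objectives: list[str]
    tasks: list[Task]
    evaluation_criteria: str


@dataclass
class Curriculum:
    title: str
    overview: str
    target_persona: str
    stages: list[Stage] = field(default_factory=list)


def parse_curriculum(raw: str) -> Curriculum:
    """Parse curriculum JSON (hypothetical key names) and enforce the documented bounds."""
    try:
        data = json.loads(raw)
    except json.JSONDecodeError as exc:
        raise ValueError(f"curriculum is not valid JSON: {exc}") from exc
    if not isinstance(data, dict):
        raise ValueError("curriculum JSON must be an object")

    stages = []
    for s in data.get("stages", []):
        tasks = [Task(t["scenario"], t["expected_behavior"]) for t in s.get("tasks", [])]
        if not 2 <= len(tasks) <= 4:
            raise ValueError(f"stage {s.get('name')!r} has {len(tasks)} tasks, expected 2-4")
        stages.append(
            Stage(s["name"], s["description"], s["objectives"], tasks, s["evaluation_criteria"])
        )
    if not 3 <= len(stages) <= 5:
        raise ValueError(f"curriculum has {len(stages)} stages, expected 3-5")
    return Curriculum(data["title"], data["overview"], data["target_persona"], stages)
```

Rejecting an out-of-range curriculum early gives the user a clear error instead of a training session that silently runs with zero stages.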
Phase 2: Training Execution
For each stage and task, the system runs a dialogue loop:
1. The Trainer LLM generates a task message tailored to the agent (the agent never sees hardcoded prompts; every message is composed dynamically)
2. The message is sent to the Claw Agent via the `openclaw agent` CLI
3. The agent's reply is captured and fed back into the Trainer's conversation context
4. The Evaluator LLM scores the reply (1–10) and reports strengths, weaknesses, and improvement suggestions
5. If the task is not passed and retries remain, the Trainer generates a follow-up message incorporating the evaluation feedback
6. After a stage passes, the agent receives a summary prompt to internalize lessons learned
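The per-task part of this loop can be sketched as follows. This is an illustration under stated assumptions, not the project's `trainer.py`: the three callables stand in for the Trainer LLM, the `openclaw agent` CLI, and the Evaluator LLM. `PASS_SCORE` is a hypothetical threshold, since the passing score is not stated here. `MAX_ATTEMPTS_PER_TASK` is treated as the total number of attempts, which is one possible reading of the configuration.

```python
from dataclasses import dataclass
from typing import Callable, Optional

MAX_ATTEMPTS_PER_TASK = 2  # default value listed in config.py
PASS_SCORE = 7  # assumption: the passing score is not documented


@dataclass
class Evaluation:
    score: int  # 1-10, as reported by the Evaluator LLM
    feedback: str


def run_task(
    task: str,
    compose: Callable[[str, Optional[Evaluation]], str],  # stand-in for the Trainer LLM
    send: Callable[[str], str],  # stand-in for the openclaw agent CLI
    evaluate: Callable[[str, str], Evaluation],  # stand-in for the Evaluator LLM
) -> tuple[bool, list[Evaluation]]:
    """Run one task, retrying with evaluator feedback until it passes or attempts run out."""
    history: list[Evaluation] = []
    last: Optional[Evaluation] = None
    for _ in range(MAX_ATTEMPTS_PER_TASK):
        message = compose(task, last)  # the first attempt has no feedback yet
        reply = send(message)
        last = evaluate(task, reply)
        history.append(last)
        if last.score >= PASS_SCORE:
            return True, history
    return False, history
```

Passing the previous evaluation into `compose` is what lets a follow-up message target the specific weakness the evaluator reported.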
Environment Setup
Create a .env file in the project root with:
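```
LLM_API_KEY=                             # Required: API key for the LLM
LLM_BASE_URL=https://api.openai.com/v1   # Optional: OpenAI-compatible endpoint
LLM_MODEL=gpt-4o                         # Optional: model identifier
CLAW_RECIPIENT=+15555550123              # Required: address of the target agent
```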
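A startup check for these settings might look like the sketch below. It assumes the variables are named `LLM_API_KEY`, `LLM_BASE_URL`, `LLM_MODEL`, and `CLAW_RECIPIENT`, and that the two optional ones fall back to the OpenAI endpoint and `gpt-4o`. The `Settings` class and `load_settings` function are hypothetical names, not the project's own. The sketch reads from a plain mapping so it has no third-party dependency. In a real run, calling `load_dotenv()` from python-dotenv beforehand would copy the .env values into `os.environ`.

```python
import os
from dataclasses import dataclass
from typing import Mapping


@dataclass(frozen=True)
class Settings:
    api_key: str
    base_url: str
    model: str
    recipient: str


def load_settings(env: Mapping[str, str] = os.environ) -> Settings:
    """Build Settings from environment variables, failing fast on missing required keys."""
    missing = [key for key in ("LLM_API_KEY", "CLAW_RECIPIENT") if not env.get(key)]
    if missing:
        raise RuntimeError(f"missing required settings: {', '.join(missing)}")
    return Settings(
        api_key=env["LLM_API_KEY"],
        # Assumed fallbacks for the two optional variables.
        base_url=env.get("LLM_BASE_URL", "https://api.openai.com/v1"),
        model=env.get("LLM_MODEL", "gpt-4o"),
        recipient=env["CLAW_RECIPIENT"],
    )
```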
Running the Training
Full Training Session
```bash
python main.py "An efficient, rigorous programming assistant"
```
The training intent is passed as a command-line argument. ClawSergeant designs a curriculum, presents it for approval, and runs the training session automatically. Results are saved to training_results.json.
Phase-by-Phase Testing
Use test_phases.py to verify each component independently before running a full session:
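```bash
python test_phases.py 1    # Verify the LLM API connection
python test_phases.py 2    # Test curriculum generation
python test_phases.py 3    # Test Claw agent communication
python test_phases.py 4    # Run a single-task training round
python test_phases.py all  # Run all phases in order
```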
Always start with phase 1 to confirm the LLM connection works, then progress through subsequent phases.
Configuration
All training parameters are centralized in config.py:
| Parameter | Default | Purpose |
|---|---|---|
| `STAGE_COUNT_MIN` / `MAX` | 3 / 5 | Number of training stages |
| `TASKS_PER_STAGE_MIN` / `MAX` | 2 / 4 | Tasks per stage |
| `CURRICULUM_TEMPERATURE` | 0.4 | LLM temperature for curriculum design |
| `TRAINER_TEMPERATURE` | 0.7 | LLM temperature for training messages |
| `EVALUATOR_TEMPERATURE` | 0.2 | LLM temperature for evaluation (low = strict) |
| `MAX_ATTEMPTS_PER_TASK` | 2 | Retries per task before moving on |
| `STAGE_PASS_THRESHOLD` | 0.6 | Fraction of tasks needed to pass a stage |
Raise `STAGE_PASS_THRESHOLD` (e.g., to 0.8) for stricter training, or lower the temperature values for more deterministic evaluations.
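To make the effect of the threshold concrete, the sketch below computes a stage verdict from per-task pass flags. It assumes the comparison is inclusive (a stage passes when the fraction is at least the threshold) and that an empty stage fails. The function name `stage_passed` is hypothetical.

```python
STAGE_PASS_THRESHOLD = 0.6  # default value listed in config.py


def stage_passed(task_results: list[bool], threshold: float = STAGE_PASS_THRESHOLD) -> bool:
    """Return True when the fraction of passed tasks reaches the threshold."""
    if not task_results:
        return False  # assumption: a stage with no tasks does not pass
    return sum(task_results) / len(task_results) >= threshold
```

With the default of 0.6, a three-task stage passes with two successes (2/3 ≈ 0.67), while a four-task stage needs three (2/4 = 0.5 falls short). At 0.8, a three-task stage needs all three.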
Key Components
| File | Role |
|---|---|
| `main.py` | Entry point: orchestrates curriculum design → approval → training execution |
| `trainer.py` | Training session controller: manages the dialogue loop and captures per-task/stage learnings |
| `curriculum.py` | Curriculum data model and LLM-based generation |
| `claw_agent.py` | Wraps the `openclaw agent` CLI for agent communication |
| `llm_handler.py` | Async LLM client with conversation history management |
| `learning_logger.py` | Structured experience logger: records training insights and writes to OpenClaw MEMORY.md |
| `config.py` | Centralized training parameters |
| `test_phases.py` | Step-by-step pipeline verification |
Training Results
After a session completes, training_results.json contains:
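```json
{
  "curriculum": {
    "title": "...",
    "overview": "...",
    "target_persona": "...",
    "stages_total": 4,
    "stages_passed": 3
  },
  "stage_reports": [
    {
      "stage_id": 1,
      "stage_name": "...",
      "passed": true,
      "overall_feedback": "...",
      "tasks": [
        {
          "task_id": "1.1",
          "passed": true,
          "score": 8,
          "strengths": ["..."],
          "weaknesses": ["..."],
          "feedback": "..."
        }
      ]
    }
  ]
}
```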
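A results file with this shape is easy to condense into a short report. The sketch below assumes the top-level keys `curriculum` and `stage_reports` and the per-task `score` field. The `summarize` helper is a hypothetical name and not part of the project.

```python
def summarize(results: dict) -> str:
    """Condense a training_results.json payload into one line per stage."""
    cur = results["curriculum"]
    lines = [f"{cur['title']}: {cur['stages_passed']}/{cur['stages_total']} stages passed"]
    for stage in results["stage_reports"]:
        scores = [task["score"] for task in stage["tasks"]]
        mean = sum(scores) / len(scores) if scores else 0.0
        status = "PASS" if stage["passed"] else "FAIL"
        lines.append(f"  [{status}] {stage['stage_name']} (mean task score {mean:.1f})")
    return "\n".join(lines)
```

To use it on a real session, load the file with `json.load(open("training_results.json"))` and print the result of `summarize`.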
Experience Recording
Training experiences are automatically recorded throughout the session. Every task evaluation, stage result, and infrastructure error is logged to .claw_sergeant_accumulated_lessons/ as structured markdown entries for future reference.
After the session completes, a summary is written to ~/.openclaw/workspace/MEMORY.md containing the training timestamp, curriculum details, stage pass/fail results, and a pointer to the full logs. This allows the Claw agent to reference its training history in future sessions. If the OpenClaw workspace is not found, this step is silently skipped.
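The skip-if-missing behavior can be sketched as below. The entry layout and the `append_memory_summary` name are assumptions for illustration. Only the MEMORY.md target, the kinds of fields recorded, and the silent skip come from the description above.

```python
from datetime import datetime, timezone
from pathlib import Path


def append_memory_summary(
    workspace: Path, title: str, stage_results: dict[str, bool], log_dir: str
) -> bool:
    """Append a training summary to MEMORY.md; return False if the workspace is absent."""
    if not workspace.is_dir():
        return False  # silently skip when the OpenClaw workspace is not found
    stamp = datetime.now(timezone.utc).strftime("%Y-%m-%d %H:%M UTC")
    lines = [f"\n## ClawSergeant training ({stamp})", f"- Curriculum: {title}"]
    lines += [f"- {name}: {'passed' if ok else 'failed'}" for name, ok in stage_results.items()]
    lines.append(f"- Full logs: {log_dir}")
    # Append rather than overwrite so earlier memory entries are preserved.
    with (workspace / "MEMORY.md").open("a", encoding="utf-8") as fh:
        fh.write("\n".join(lines) + "\n")
    return True
```

A caller would pass `Path.home() / ".openclaw" / "workspace"` as the workspace.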
Troubleshooting
- LLM connection fails: Run `python test_phases.py 1` to verify the API key and endpoint. Check that `LLM_BASE_URL` points to a valid OpenAI-compatible API.
- Claw agent timeout: The default timeout is 120 seconds. If the agent is slow to respond, check network connectivity and the `openclaw` CLI installation.
- Curriculum has no stages: The LLM may have returned malformed JSON. Try lowering `CURRICULUM_TEMPERATURE` or switching to a more capable model.
- All tasks fail: Review the evaluation criteria; they may be too strict. Lower `STAGE_PASS_THRESHOLD` or increase `MAX_ATTEMPTS_PER_TASK` in config.py.
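When diagnosing timeouts, it helps to separate "the CLI is not installed" from "the CLI did not answer in time". The sketch below is a generic subprocess wrapper with the 120-second default. It is not the project's `claw_agent.py`, and it takes the full argument list from the caller because the exact `openclaw agent` flags are not documented here. `run_cli` is a hypothetical name.

```python
import subprocess
import sys

AGENT_TIMEOUT_SECONDS = 120  # default timeout mentioned above


def run_cli(argv: list[str], timeout: float = AGENT_TIMEOUT_SECONDS) -> str:
    """Run a command and return its stdout, mapping common failures to clear errors."""
    try:
        done = subprocess.run(
            argv, capture_output=True, text=True, timeout=timeout, check=True
        )
    except subprocess.TimeoutExpired as exc:
        raise TimeoutError(f"command did not reply within {timeout} seconds") from exc
    except FileNotFoundError as exc:
        raise RuntimeError(f"{argv[0]!r} not found in PATH") from exc
    except subprocess.CalledProcessError as exc:
        raise RuntimeError(
            f"command exited with status {exc.returncode}: {exc.stderr.strip()}"
        ) from exc
    return done.stdout.strip()


if __name__ == "__main__":
    # Demonstration with the Python interpreter itself rather than a guessed openclaw invocation.
    print(run_cli([sys.executable, "-c", "print('pong')"]))
```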
Dependencies
- Python 3.11+
- `httpx`: async HTTP client for LLM API calls
- `loguru`: structured logging
- `python-dotenv`: environment variable management
- `openclaw` CLI: must be installed and accessible in PATH
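ClawSergeant: Boosting OpenClaw Agents from AI Feedback
ClawSergeant trains OpenClaw agents through a structured, LLM-driven pipeline. A Trainer LLM designs the curriculum, generates training tasks, and adapts its teaching dynamically based on the agent's responses. A separate Evaluator LLM objectively scores each response, forming a feedback loop that drives iterative improvement.
Architecture Overview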
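```
User intent ──────────────────────→ LLM (curriculum designer)
    ↓
Curriculum JSON (stages, tasks, criteria)
    ↓
Training session loop:
  Trainer LLM → compose message → openclaw CLI → Claw agent → reply
    ↓
  Evaluator LLM → score + feedback
    ↓
  Log to .claw_sergeant_accumulated_lessons/ ←──┘
    ↓
  (on failure) → Trainer LLM retries using the feedback
    ↓
  (on stage pass) → stage summary for memory consolidation
    ↓
  [Curriculum mode] → log to .claw_sergeant_accumulated_lessons/
```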
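Training Pipeline
Phase 1: Curriculum Design
The user's training intent is passed directly as input. The LLM generates a multi-stage curriculum as structured JSON based on this intent. The user reviews and approves the curriculum before training begins.
Each curriculum contains:
- Title and overview of the training program
- Target persona describing the ideal agent after training
- 3–5 stages, each with:
  - Name, description, and learning objectives
  - 2–4 training tasks with scenario descriptions and expected behaviors
  - Evaluation criteria with passing standards
Phase 2: Training Execution
For each stage and task, the system runs a dialogue loop:
1. The Trainer LLM generates a task message tailored to the agent (the agent never sees hardcoded prompts; every message is composed dynamically)
2. The message is sent to the Claw agent via the `openclaw agent` CLI
3. The agent's reply is captured and fed back into the Trainer's conversation context
4. The Evaluator LLM scores the reply (1–10) and reports strengths, weaknesses, and improvement suggestions
5. If the task is not passed and retries remain, the Trainer generates a follow-up message incorporating the evaluation feedback
6. After a stage passes, the agent receives a summary prompt to internalize lessons learned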
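Environment Setup
Create a .env file in the project root with:
```
LLM_API_KEY=                             # Required: API key for the LLM
LLM_BASE_URL=https://api.openai.com/v1   # Optional: OpenAI-compatible endpoint
LLM_MODEL=gpt-4o                         # Optional: model identifier
CLAW_RECIPIENT=+15555550123              # Required: address of the target agent
```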
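Running the Training
Full Training Session
```bash
python main.py "An efficient, rigorous programming assistant"
```
The training intent is passed as a command-line argument. ClawSergeant designs a curriculum, presents it for approval, and runs the training session automatically. Results are saved to training_results.json.
Phase-by-Phase Testing
Use test_phases.py to verify each component independently before running a full session:
```bash
python test_phases.py 1    # Verify the LLM API connection
python test_phases.py 2    # Test curriculum generation
python test_phases.py 3    # Test Claw agent communication
python test_phases.py 4    # Run a single-task training round
python test_phases.py all  # Run all phases in order
```
Always start with phase 1 to confirm the LLM connection works, then progress through the subsequent phases.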
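Configuration
All training parameters are centralized in config.py:
| Parameter | Default | Purpose |
|---|---|---|
| `STAGE_COUNT_MIN` / `MAX` | 3 / 5 | Number of training stages |
| `TASKS_PER_STAGE_MIN` / `MAX` | 2 / 4 | Tasks per stage |
| `CURRICULUM_TEMPERATURE` | 0.4 | LLM temperature for curriculum design |
| `TRAINER_TEMPERATURE` | 0.7 | LLM temperature for training messages |
| `EVALUATOR_TEMPERATURE` | 0.2 | LLM temperature for evaluation (low = strict) |
| `MAX_ATTEMPTS_PER_TASK` | 2 | Retries per task before moving on |
| `STAGE_PASS_THRESHOLD` | 0.6 | Fraction of tasks needed to pass a stage |
Raise `STAGE_PASS_THRESHOLD` (e.g., to 0.8) for stricter training, or lower the temperature values for more deterministic evaluations.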
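Key Components
| File | Role |
|---|---|
| `main.py` | Entry point: orchestrates curriculum design → approval → training execution |
| `trainer.py` | Training session controller: manages the dialogue loop and captures per-task/stage learnings |
| `curriculum.py` | Curriculum data model and LLM-based generation |
| `claw_agent.py` | Wraps the `openclaw agent` CLI for agent communication |
| `llm_handler.py` | Async LLM client with conversation history management |
| `learning_logger.py` | Structured experience logger: records training insights and writes to OpenClaw MEMORY.md |
| `config.py` | Centralized training parameters |
| `test_phases.py` | Step-by-step pipeline verification |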
Training Results
After a session completes, training_results.json contains:
```json
{
  "curriculum": {
    "title": "...",
    "overview": "...",
    "target_persona": "...",
    "stages_total": 4,
    "stages_passed": 3
  },
  "stage_reports": [
    {
      "stage_id": 1,
      "stage_name": "...",
      "passed": true,
      "overall_feedback": "...",
      "tasks": [
        {
          "task_id": "1.1",
          "passed": true,
          "score": 8,
          "strengths": ["..."],
          "weaknesses": ["..."],
          "feedback": "..."
        }
      ]
    }
  ]
}
```
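Experience Recording
Training experiences are recorded automatically throughout the session. Every task evaluation, stage result, and infrastructure error is logged to .claw_sergeant_accumulated_lessons/ as structured markdown entries for future reference.
After the session completes, a summary is written to ~/.openclaw/workspace/MEMORY.md containing the training timestamp, curriculum details, stage pass/fail results, and a pointer to the full logs. This allows the Claw agent to reference its training history in future sessions. If the OpenClaw workspace is not found, this step is silently skipped.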
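Troubleshooting
- LLM connection fails: Run `python test_phases.py 1` to verify the API key and endpoint. Check that `LLM_BASE_URL` points to a valid OpenAI-compatible API.
- Claw agent timeout: The default timeout is 120 seconds. If the agent is slow to respond, check network connectivity and the `openclaw` CLI installation.
- Curriculum has no stages: The LLM may have returned malformed JSON. Try lowering `CURRICULUM_TEMPERATURE` or switching to a more capable model.
- All tasks fail: Review the evaluation criteria; they may be too strict. Lower `STAGE_PASS_THRESHOLD` or increase `MAX_ATTEMPTS_PER_TASK` in config.py.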
Dependencies
- Python 3.11+
- `httpx`: async HTTP client for LLM API calls
- `loguru`: structured logging
- `python-dotenv`: environment variable management
- `openclaw` CLI: must be installed and accessible in PATH