ClawSergeant: Boosting OpenClaw Agents from AI Feedback

ClawSergeant trains OpenClaw agents through a structured, LLM-driven pipeline. A Trainer LLM designs a curriculum, generates training tasks, and adapts its teaching to the agent's responses. A separate Evaluator LLM scores each response, and its feedback drives the next training iteration.

Architecture Overview

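    User intent ──────────────────────→ LLM (curriculum designer)
            ↓
    Curriculum JSON (stages, tasks, criteria)
            ↓
    Training session loop:
      Trainer LLM → compose message → openclaw CLI → Claw agent → reply
            ↓
      Evaluator LLM → score + feedback
            ↓
      Record to .claw_sergeant_accumulated_lessons/ ←──┘
            ↓
      (if failed) → Trainer LLM retries based on the feedback
            ↓
      (if stage passed) → stage summary for memory consolidation
            ↓
    [Curriculum mode] → record to .claw_sergeant_accumulated_lessons/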

Training Pipeline

Phase 1: Curriculum Design

The user's training intent is passed directly as input. The LLM generates a multi-stage curriculum as structured JSON based on this intent. The user reviews and approves the curriculum before training begins.

Each curriculum contains:

- Title and overview of the training program
- Target persona describing the ideal agent after training
- 3–5 stages, each with:
  - Name, description, and learning objectives
  - 2–4 training tasks with scenario descriptions and expected behaviors
  - Evaluation criteria with passing standards
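
The curriculum shape described above can be sketched as a few dataclasses plus a shape check. This is only an illustration: the class and field names below are hypothetical, and the actual JSON keys produced by ClawSergeant may differ. Only the 3–5 stage and 2–4 task ranges come from the description above.

```python
from dataclasses import dataclass, field


@dataclass
class Task:
    scenario: str            # hypothetical field name
    expected_behavior: str   # hypothetical field name


@dataclass
class Stage:
    name: str
    description: str
    objectives: list[str]
    tasks: list[Task]
    evaluation_criteria: list[str]


@dataclass
class Curriculum:
    title: str
    overview: str
    target_persona: str
    stages: list[Stage] = field(default_factory=list)


def validate_curriculum(c: Curriculum) -> list[str]:
    """Return a list of problems; an empty list means the shape is acceptable."""
    problems = []
    if not 3 <= len(c.stages) <= 5:
        problems.append(f"expected 3-5 stages, got {len(c.stages)}")
    for i, stage in enumerate(c.stages, start=1):
        if not 2 <= len(stage.tasks) <= 4:
            problems.append(f"stage {i}: expected 2-4 tasks, got {len(stage.tasks)}")
        if not stage.evaluation_criteria:
            problems.append(f"stage {i}: missing evaluation criteria")
    return problems
```

A check like this could run before the curriculum is shown for approval, so that a malformed LLM response is caught early.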

Phase 2: Training Execution

For each stage and task, the system runs a dialogue loop:

1. The Trainer LLM generates a task message tailored to the agent (the agent never sees hardcoded prompts; every message is composed dynamically)
2. The message is sent to the Claw Agent via the openclaw agent CLI
3. The agent's reply is captured and fed back into the Trainer's conversation context
4. The Evaluator LLM scores the reply (1–10) and reports strengths, weaknesses, and improvement suggestions
5. If the task is not passed and retries remain, the Trainer generates a follow-up message incorporating the evaluation feedback
6. After a stage passes, the agent receives a summary prompt to internalize lessons learned
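
The per-task part of the loop above (everything except the stage summary) can be sketched as follows. This is a minimal illustration, not the project's implementation: `run_task`, its callables, and the default values for `pass_score` and `max_attempts` are all assumptions made for the example.

```python
from typing import Callable, Optional


def run_task(
    task: str,
    trainer: Callable[[str, Optional[str]], str],   # (task, feedback) -> message
    agent: Callable[[str], str],                    # message -> reply
    evaluator: Callable[[str, str], dict],          # (task, reply) -> {"score", "feedback"}
    pass_score: int = 7,      # assumed passing score on the 1-10 scale
    max_attempts: int = 3,    # assumed retry budget
) -> dict:
    """Run one task until it passes or the retry budget is exhausted."""
    feedback: Optional[str] = None
    history = []
    for attempt in range(1, max_attempts + 1):
        message = trainer(task, feedback)       # compose the message dynamically
        reply = agent(message)                  # send to the agent, capture the reply
        evaluation = evaluator(task, reply)     # score the reply
        history.append({"attempt": attempt, "score": evaluation["score"]})
        if evaluation["score"] >= pass_score:
            return {"passed": True, "attempts": attempt, "history": history}
        feedback = evaluation["feedback"]       # feed the critique into the retry
    return {"passed": False, "attempts": max_attempts, "history": history}
```

Passing the trainer, agent, and evaluator in as callables keeps the loop testable without an LLM or the openclaw CLI.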

Environment Setup

Create a .env file in the project root with:

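    LLM_API_KEY=                              # Required: API key for the LLM
    LLM_BASE_URL=https://api.openai.com/v1    # Optional: OpenAI-compatible endpoint
    LLM_MODEL=gpt-4o                          # Optional: model identifier
    CLAW_RECIPIENT=+15555550123               # Required: address of the target agent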
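
As a rough sketch of how these settings might be checked at startup, the helper below reads the four variables from a mapping and fails fast when a required one is missing. The `Settings` class and `load_settings` function are hypothetical, and treating the example endpoint and model as fallback defaults is an assumption. In the real project, python-dotenv's `load_dotenv()` would presumably populate the process environment from `.env` first.

```python
import os
from dataclasses import dataclass
from typing import Mapping


@dataclass(frozen=True)
class Settings:
    api_key: str
    base_url: str
    model: str
    recipient: str


def load_settings(env: Mapping[str, str] = os.environ) -> Settings:
    """Build Settings from environment variables, rejecting missing required ones."""
    required = ("LLM_API_KEY", "CLAW_RECIPIENT")
    missing = [key for key in required if not env.get(key)]
    if missing:
        raise RuntimeError("missing required settings: " + ", ".join(missing))
    return Settings(
        api_key=env["LLM_API_KEY"],
        # Assumed fallbacks for the two optional variables.
        base_url=env.get("LLM_BASE_URL", "https://api.openai.com/v1"),
        model=env.get("LLM_MODEL", "gpt-4o"),
        recipient=env["CLAW_RECIPIENT"],
    )
```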

Running the Training

Full Training Session

    python main.py "An efficient, rigorous programming assistant"

The training intent is passed as a command-line argument. ClawSergeant designs a curriculum, presents it for approval, and runs the training session automatically. Results are saved to training_results.json.

Phase-by-Phase Testing

Use test_phases.py to verify each component independently before running a full session:

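    python test_phases.py 1      # Verify the LLM API connection
    python test_phases.py 2      # Test curriculum generation
    python test_phases.py 3      # Test communication with the Claw agent
    python test_phases.py 4      # Run a single-task training round
    python test_phases.py all    # Run all phases in order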

Always start with phase 1 to confirm the LLM connection works, then progress through subsequent phases.

Configuration

All training parameters are centralized in config.py:

Parameter	Default	Purpose
STAGE_COUNT_MIN / MAX	3 / 5	Number of training stages
TASKS_PER_STAGE_MIN / MAX

Raise STAGE_PASS_THRESHOLD (e.g., to 0.8) for stricter training, or lower the temperature settings for more deterministic evaluations.
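
To make the effect of the threshold concrete, here is a small sketch under one loudly stated assumption: that STAGE_PASS_THRESHOLD is compared against the fraction of tasks passed in a stage. The README does not say how the threshold is applied, and `stage_passed` is a hypothetical helper, not a function from config.py.

```python
def stage_passed(task_passed: list[bool], threshold: float) -> bool:
    """Assumed rule: a stage passes when the share of passed tasks reaches the threshold."""
    if not task_passed:
        return False  # a stage with no tasks cannot pass
    return sum(task_passed) / len(task_passed) >= threshold


# Three of four tasks passed, i.e. a pass rate of 0.75.
results = [True, True, True, False]
print(stage_passed(results, 0.7))  # passes under a looser threshold
print(stage_passed(results, 0.8))  # fails under a stricter threshold
```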

Key Components

File	Role
main.py	Entry point: orchestrates curriculum design → approval → training execution
trainer.py

Training Results

After a session completes, training_results.json contains:

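    {
      "curriculum": {
        "title": "...",
        "overview": "...",
        "target_persona": "...",
        "stages_total": 4,
        "stages_passed": 3
      },
      "stage_reports": [
        {
          "stage_id": 1,
          "stage_name": "...",
          "passed": true,
          "overall_feedback": "...",
          "tasks": [
            {
              "task_id": "1.1",
              "passed": true,
              "score": 8,
              "strengths": ["..."],
              "weaknesses": ["..."],
              "feedback": "..."
            }
          ]
        }
      ]
    }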
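
A short script along these lines could turn the results file into a per-stage summary. It is a sketch that assumes the keys `curriculum`, `stage_reports`, `tasks`, `passed`, and `score` appear as in the results structure; `summarize` is a hypothetical helper, not part of ClawSergeant.

```python
import json
from pathlib import Path


def summarize(results: dict) -> list[str]:
    """Return one headline line plus one line per stage."""
    cur = results["curriculum"]
    lines = [f"{cur['title']}: {cur['stages_passed']}/{cur['stages_total']} stages passed"]
    for stage in results["stage_reports"]:
        scores = [task["score"] for task in stage["tasks"]]
        avg = sum(scores) / len(scores) if scores else 0.0
        status = "PASS" if stage["passed"] else "FAIL"
        lines.append(f"  stage {stage['stage_id']} [{status}] avg score {avg:.1f}")
    return lines


if __name__ == "__main__":
    path = Path("training_results.json")
    if path.exists():
        for line in summarize(json.loads(path.read_text(encoding="utf-8"))):
            print(line)
```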

Experience Recording

Training experiences are automatically recorded throughout the session. Every task evaluation, stage result, and infrastructure error is logged to .claw_sergeant_accumulated_lessons/ as structured markdown entries for future reference.

After the session completes, a summary is written to ~/.openclaw/workspace/MEMORY.md containing the training timestamp, curriculum details, stage pass/fail results, and a pointer to the full logs. This allows the Claw agent to reference its training history in future sessions. If the OpenClaw workspace is not found, this step is silently skipped.
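
The "skip silently when the workspace is missing" behavior can be sketched as below. The function name, its parameters, and the markdown layout are hypothetical, and appending (rather than overwriting) MEMORY.md is an assumption; only the file name and the listed contents come from the description above.

```python
import tempfile
from datetime import datetime, timezone
from pathlib import Path


def write_memory_summary(workspace: Path, title: str, stages: dict[str, bool], log_dir: str) -> bool:
    """Append a session summary to MEMORY.md; return False if the workspace is absent."""
    if not workspace.is_dir():
        return False  # workspace not found: skip without raising
    stamp = datetime.now(timezone.utc).strftime("%Y-%m-%d %H:%M UTC")
    lines = [f"## Training session {stamp}", f"- Curriculum: {title}"]
    for name, passed in stages.items():
        lines.append(f"- Stage {name}: {'passed' if passed else 'failed'}")
    lines.append(f"- Full logs: {log_dir}")
    with (workspace / "MEMORY.md").open("a", encoding="utf-8") as fh:
        fh.write("\n".join(lines) + "\n\n")
    return True


if __name__ == "__main__":
    # Demonstrate against a throwaway directory instead of ~/.openclaw/workspace.
    with tempfile.TemporaryDirectory() as tmp:
        ok = write_memory_summary(Path(tmp), "Demo", {"Basics": True}, ".claw_sergeant_accumulated_lessons/")
        print(ok, (Path(tmp) / "MEMORY.md").read_text(encoding="utf-8"))
```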

Troubleshooting

- LLM connection fails: Run python test_phases.py 1 to verify the API key and endpoint. Check that LLM_BASE_URL points to a valid OpenAI-compatible API.
- Claw agent timeout: The default timeout is 120 seconds. If the agent is slow to respond, check network connectivity and the openclaw CLI installation.
- Curriculum has no stages: The LLM may have returned malformed JSON. Try lowering CURRICULUM_TEMPERATURE or switching to a more capable model.
- All tasks fail: Review the evaluation criteria, which may be too strict. Lower STAGE_PASS_THRESHOLD or increase MAX_ATTEMPTS_PER_TASK in config.py.
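
When diagnosing timeouts, it can help to tell a slow agent apart from a missing CLI. The sketch below wraps an arbitrary command with a timeout and reports which case occurred. `run_with_timeout` is a hypothetical helper; the command is passed in by the caller because the exact openclaw arguments are not documented here, and only the 120-second default comes from the note above.

```python
import subprocess
import sys
from typing import Sequence


def run_with_timeout(cmd: Sequence[str], timeout: float = 120.0) -> tuple[str, str]:
    """Run a command; return (status, text) with status ok/timeout/not_found/error."""
    try:
        done = subprocess.run(list(cmd), capture_output=True, text=True, timeout=timeout)
    except subprocess.TimeoutExpired:
        return ("timeout", "")      # the process did not finish in time
    except FileNotFoundError:
        return ("not_found", "")    # the executable is not on PATH
    if done.returncode != 0:
        return ("error", done.stderr.strip())
    return ("ok", done.stdout.strip())


if __name__ == "__main__":
    # Stand-in command; replace with the real CLI invocation when testing locally.
    print(run_with_timeout([sys.executable, "-c", "print('pong')"], timeout=10))
```

A "not_found" status points at the CLI installation, while "timeout" points at the agent or the network.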

Dependencies

- Python 3.11+
- httpx: async HTTP client for LLM API calls
- loguru: structured logging
- python-dotenv: environment variable management
- openclaw CLI: must be installed and accessible in PATH
