Agent Lightning ⚡
Microsoft Research's agent training framework. Turn your AI agents into optimizable beasts with (almost) zero code changes.
Core Features
- 🔌 Universal Compatibility: Works with LangChain, OpenAI Agent SDK, AutoGen, CrewAI, Microsoft Agent Framework, or plain Python OpenAI
- 🎯 Selective Optimization: Optimize one or more agents in a multi-agent system
- 🧠 Multiple Algorithms: Reinforcement Learning (RL), Automatic Prompt Optimization (APO), Supervised Fine-tuning (SFT)
- ⚡ Zero Code Change: Add agl.emit_xxx() helpers or use the tracer; your agent keeps running as usual
Installation
CODEBLOCK0
For latest nightly build:
CODEBLOCK1
Quick Start
1. Instrument Your Agent
Option A: Add emit helpers (recommended)
CODEBLOCK2
Option B: Use tracer (zero code change)
CODEBLOCK3
2. Create Training Config
CODEBLOCK4
3. Run Training
CODEBLOCK5
Algorithms
| Algorithm | Use Case | Description |
|---|---|---|
| GRPO | General RL | Group Relative Policy Optimization: stable, works well for most agents |
| APO | Prompt Tuning | Automatic Prompt Optimization: improves system prompts |
| SFT | Supervised Fine-tuning | Supervised Fine-tuning with preference data |
| RLOO | Long-horizon | RLOO for tasks with sparse rewards |
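The GRPO and RLOO rows differ mainly in how they turn the raw rewards of several rollouts into advantages. The sketch below is not Agent Lightning's implementation; it is a plain-Python illustration of the two textbook baselines, and the function names `grpo_advantages` and `rloo_advantages` are hypothetical. It assumes the population standard deviation for GRPO, which real implementations may not share.

```python
from statistics import mean, pstdev


def grpo_advantages(rewards, eps=1e-8):
    """Group-relative advantages: standardize each reward against its own group."""
    mu = mean(rewards)
    sigma = pstdev(rewards)  # assumption: population std; some codebases use sample std
    return [(r - mu) / (sigma + eps) for r in rewards]


def rloo_advantages(rewards):
    """Leave-one-out advantages: each reward minus the mean of the other samples."""
    n = len(rewards)
    if n < 2:
        raise ValueError("RLOO needs at least two samples per task")
    total = sum(rewards)
    return [r - (total - r) / (n - 1) for r in rewards]


# Four rollouts of the same task; only the third one earned a reward.
group = [0.0, 0.0, 1.0, 0.0]
print(grpo_advantages(group))
print(rloo_advantages(group))
```

In both cases the single successful rollout gets a positive advantage and the failures get negative ones, which is what lets a sparse reward still produce a learning signal.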
Usage Commands
agent-lightning train
Train your agent with configured algorithm.
agent-lightning eval
Evaluate agent on benchmark tasks.
agent-lightning export
Export trained model/prompts for deployment.
agent-lightning serve
Launch serving endpoint for trained agent.
Example: SQL Agent Training
See full example: Train SQL Agent with RL
CODEBLOCK6
Integration with Clawdbot
Environment Variables
CODEBLOCK7
Python API
CODEBLOCK8
Monitoring Training
CODEBLOCK9
Best Practices
- Start Small: Begin with 10-50 episodes to verify setup
- Define Clear Rewards: Design reward functions that match your goal
- Use Evaluation Tasks: Always eval on held-out tasks
- Checkpoint Frequently: Save model every N episodes
- Monitor Convergence: Watch loss curves in dashboard
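To make "Define Clear Rewards" concrete, here is a minimal sketch of a reward function for a SQL task, using only the standard-library `sqlite3` module. The name `sql_reward`, the binary 1.0/0.0 scoring, and the toy `users` table are all assumptions for illustration; nothing here is part of Agent Lightning's API.

```python
import sqlite3


def sql_reward(conn, predicted_sql, reference_sql):
    """Return 1.0 if both queries yield the same rows (order-insensitive), else 0.0."""
    try:
        predicted = conn.execute(predicted_sql).fetchall()
    except sqlite3.Error:
        return 0.0  # invalid SQL earns no reward
    expected = conn.execute(reference_sql).fetchall()
    # key=repr keeps sorting safe when rows contain NULLs or mixed types
    same = sorted(predicted, key=repr) == sorted(expected, key=repr)
    return 1.0 if same else 0.0


# Toy database standing in for the task environment.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (id INTEGER, name TEXT)")
conn.executemany("INSERT INTO users VALUES (?, ?)", [(1, "ada"), (2, "bob")])

print(sql_reward(conn, "SELECT name FROM users WHERE id = 1",
                 "SELECT name FROM users WHERE id < 2"))  # 1.0
print(sql_reward(conn, "SELECT nme FROM users",
                 "SELECT name FROM users"))  # 0.0
```

The reward compares query results rather than query text, so two differently written but equivalent queries score the same. A real setup would also want to run untrusted SQL against a read-only or disposable database.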
Resources
Citation
If you use Agent Lightning in research:
CODEBLOCK10
Agent Lightning ⚡
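Microsoft Research's agent training framework. Turn your AI agents into optimizable beasts with (almost) zero code changes.
Core Features
- 🔌 Universal Compatibility: Works with LangChain, OpenAI Agent SDK, AutoGen, CrewAI, Microsoft Agent Framework, or plain Python OpenAI
- 🎯 Selective Optimization: Optimize one or more agents in a multi-agent system
- 🧠 Multiple Algorithms: Reinforcement Learning (RL), Automatic Prompt Optimization (APO), Supervised Fine-tuning (SFT)
- ⚡ Zero Code Change: Add agl.emit_xxx() helpers or use the tracer; your agent keeps running as usual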
Installation
```bash
pip install agentlightning
```
For the latest nightly build:
```bash
pip install --upgrade --index-url https://test.pypi.org/simple/ --extra-index-url https://pypi.org/simple/ --pre agentlightning
```
Quick Start
1. Instrument Your Agent
Option A: Add emit helpers (recommended)
```python
import agentlightning as agl

# Inside your agent's tool call
response = agl.emit_tool_call(
    model=model,
    messages=messages,
    tools=tools,
    context={"task": "search"},
)
```
Option B: Use tracer (zero code change)
```python
from agentlightning import tracer

# Wrap your agent with the tracer
with tracer.trace("my-agent", input_data):
    result = your_agent.run(user_query)
```
2. Create Training Config
```yaml
# config.yaml
agent:
  name: my-agent
  type: openai  # openai, langchain, autogen, crewai
training:
  algorithm: grpo  # grpo, apo, sft, rloo
  episodes: 100
  batch_size: 16
environment:
  eval_tasks:
    - math
    - coding
    - reasoning
```
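A config like the one above is easy to get subtly wrong, so it can help to sanity-check it before a long run. The following is a minimal sketch that validates an already-parsed config dictionary against the values listed in the comments of config.yaml. The `validate_config` helper is hypothetical and is not something the agent-lightning CLI is known to provide.

```python
ALLOWED_AGENT_TYPES = {"openai", "langchain", "autogen", "crewai"}
ALLOWED_ALGORITHMS = {"grpo", "apo", "sft", "rloo"}


def validate_config(config):
    """Return a list of problems; an empty list means the config looks sane."""
    problems = []
    agent = config.get("agent", {})
    training = config.get("training", {})
    if agent.get("type") not in ALLOWED_AGENT_TYPES:
        problems.append(f"agent.type must be one of {sorted(ALLOWED_AGENT_TYPES)}")
    if training.get("algorithm") not in ALLOWED_ALGORITHMS:
        problems.append(f"training.algorithm must be one of {sorted(ALLOWED_ALGORITHMS)}")
    for key in ("episodes", "batch_size"):
        value = training.get(key)
        # bool is a subclass of int, so reject it explicitly
        if not isinstance(value, int) or isinstance(value, bool) or value <= 0:
            problems.append(f"training.{key} must be a positive integer")
    return problems


# Same content as config.yaml, written as the dict a YAML parser would return.
config = {
    "agent": {"name": "my-agent", "type": "openai"},
    "training": {"algorithm": "grpo", "episodes": 100, "batch_size": 16},
    "environment": {"eval_tasks": ["math", "coding", "reasoning"]},
}
print(validate_config(config))  # []
```

Returning a list of problems rather than raising on the first one lets you fix every mistake in a single pass.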
3. Run Training
```bash
agent-lightning train --config config.yaml
```
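Algorithms
| Algorithm | Use Case | Description |
|---|---|---|
| GRPO | General RL | Group Relative Policy Optimization: stable, works well for most agents |
| APO | Prompt Tuning | Automatic Prompt Optimization: improves system prompts |
| SFT | Supervised Fine-tuning | Supervised Fine-tuning with preference data |
| RLOO | Long-horizon | RLOO for tasks with sparse rewards |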
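Usage Commands
agent-lightning train
Train your agent with the configured algorithm.
agent-lightning eval
Evaluate the agent on benchmark tasks.
agent-lightning export
Export the trained model/prompts for deployment.
agent-lightning serve
Launch a serving endpoint for the trained agent.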
Example: SQL Agent Training
See full example: Train SQL Agent with RL
```python
from agentlightning import Agent, RLConfig, GRPOTrainer

# 1. Define your agent
sql_agent = Agent(
    name="sql-agent",
    system_prompt="You are a SQL expert...",
    tools=[execute_sql, query_schema],
)

# 2. Configure RL training
config = RLConfig(
    algorithm="grpo",
    episodes=500,
    learning_rate=1e-4,
)

# 3. Train
trainer = GRPOTrainer(config=config)
trainer.train(sql_agent, eval_tasks=["sql-generation"])
```
Integration with Clawdbot
Environment Variables
```bash
# Required for training
export OPENAI_API_KEY=sk-...

# Optional: for remote storage
export AGL_STORAGE=s3://my-bucket/agent-lightning/
```
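A missing API key usually surfaces only after a run has started, so a small pre-flight check can save time. This is a sketch using only the standard library; the helper name `missing_env` is hypothetical, and the variable names are the ones listed above.

```python
import os


def missing_env(required=("OPENAI_API_KEY",), environ=None):
    """Return the names of required variables that are unset or empty."""
    environ = os.environ if environ is None else environ
    return [name for name in required if not environ.get(name)]


missing = missing_env()
if missing:
    print("Set these variables before training:", ", ".join(missing))

# AGL_STORAGE is optional, so it is reported rather than required.
print("Remote storage:", os.environ.get("AGL_STORAGE", "(not configured)"))
```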
Python API
```python
from agentlightning import LightningStore, GRPOTrainer

# LightningStore keeps tasks, resources, and traces in sync
store = LightningStore()

# Read traces, learn, and update prompts
trainer = GRPOTrainer(store=store)
trainer.train(agent=my_agent)
```
Monitoring Training
```bash
# Launch the dashboard
agent-lightning dashboard --port 8080

# View logs
tail -f ~/.agent-lightning/logs/training.log
```
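Best Practices
- Start Small: Begin with 10-50 episodes to verify setup
- Define Clear Rewards: Design reward functions that match your goal
- Use Evaluation Tasks: Always eval on held-out tasks
- Checkpoint Frequently: Save model every N episodes
- Monitor Convergence: Watch loss curves in dashboard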
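Two of the practices above, holding out evaluation tasks and checkpointing every N episodes, come down to a few lines of bookkeeping. The sketch below shows one way to do both in plain Python. The helpers `split_tasks` and `should_checkpoint`, the 20% hold-out fraction, and the interval of 10 are illustrative assumptions, not Agent Lightning defaults.

```python
import random


def split_tasks(tasks, eval_fraction=0.2, seed=0):
    """Shuffle deterministically and carve off a held-out evaluation set."""
    shuffled = list(tasks)
    random.Random(seed).shuffle(shuffled)
    n_eval = max(1, int(len(shuffled) * eval_fraction))
    return shuffled[n_eval:], shuffled[:n_eval]


def should_checkpoint(episode, every=10):
    """True on every `every`-th episode, counting episodes from 1."""
    return episode > 0 and episode % every == 0


train_tasks, held_out = split_tasks([f"task-{i}" for i in range(50)])
assert not set(train_tasks) & set(held_out)  # no leakage between the two sets

for episode in range(1, 51):
    if should_checkpoint(episode):
        print(f"episode {episode}: save checkpoint, then evaluate on {len(held_out)} held-out tasks")
```

Fixing the seed keeps the split stable across runs, so evaluation numbers from different checkpoints stay comparable.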
Resources
Citation
If you use Agent Lightning in research:
```bibtex
@misc{luo2025agentlightningtrainai,
  title={Agent Lightning: Train ANY AI Agents with Reinforcement Learning},
  author={Xufang Luo and Yuge Zhang and Zhiyuan He and Zilong Wang and Siyun Zhao and Dongsheng Li and Luna K. Qiu and Yuqing Yang},
  year={2025},
  eprint={2508.03680},
  archivePrefix={arXiv},
  primaryClass={cs.AI}
}
```