LLM Router
An intelligent proxy that classifies incoming requests by complexity and routes them to appropriate LLM models. Use cheaper/faster models for simple tasks and reserve expensive models for complex ones.
Works with OpenClaw to reduce token usage and API costs by routing simple requests to smaller models.
Status: Tested with Anthropic, OpenAI, Google Gemini, Kimi/Moonshot, and Ollama.
Quick Start
Prerequisites
- 1. Python 3.10+ with pip
- Ollama (optional - only if using local classification)
- Anthropic API key or Claude Code OAuth token (or other provider key)
Setup
CODEBLOCK0
Verify Installation
CODEBLOCK1
Start the Server
CODEBLOCK2
Options:
- -
--port PORT - Port to listen on (default: 4001) - INLINECODE1 - Host to bind (default: 127.0.0.1)
- INLINECODE2 - Config file path (default: config.yaml)
- INLINECODE3 - Enable verbose logging
- INLINECODE4 - Enable OpenClaw compatibility (rewrites model name in system prompt)
Configuration
Edit config.yaml to customize:
Model Routing
CODEBLOCK3
Note: Reasoning models are auto-detected and use correct API params.
Classifier
Three options for classifying request complexity:
Local (default) - Free, requires Ollama:
CODEBLOCK4
Anthropic - Uses Haiku, fast and cheap:
CODEBLOCK5
OpenAI - Uses GPT-4o-mini:
CODEBLOCK6
Google - Uses Gemini:
CODEBLOCK7
Kimi - Uses Moonshot:
CODEBLOCK8
Use remote (anthropic/openai/google/kimi) if your machine can't run local models.
Supported Providers
- -
anthropic:claude-* - Anthropic Claude models (tested) - INLINECODE7 ,
openai:o1-*, openai:o3-* - OpenAI models (tested) - INLINECODE10 - Google Gemini models (tested)
- INLINECODE11 ,
kimi:moonshot-* - Kimi/Moonshot models (tested) - INLINECODE13 - Local Ollama models (tested)
Complexity Levels
| Level | Use Case | Default Model |
|---|
| super_easy | Greetings, acknowledgments | Haiku |
| easy |
Simple Q&A, reminders | Haiku |
| medium | Coding, emails, research | Sonnet |
| hard | Complex reasoning, debugging | Opus |
| super_hard | System architecture, proofs | Opus |
Customizing Classification
Edit ROUTES.md to tune how messages are classified. The classifier reads the table in this file to determine complexity levels.
API Usage
The router exposes an OpenAI-compatible API:
CODEBLOCK9
Testing Classification
CODEBLOCK10
Running as macOS Service
Create ~/Library/LaunchAgents/com.llmrouter.plist:
CODEBLOCK11
Important: Replace /path/to/llmrouter with your actual install path. Must use the venv python, not system python.
CODEBLOCK12
OpenClaw Configuration
Add the router as a provider in ~/.openclaw/openclaw.json:
CODEBLOCK13
Note: Cost is set to 0 because actual costs depend on which model the router selects. The router logs which model handled each request.
Set as Default Model (Optional)
To use the router for all agents by default, add:
CODEBLOCK14
Using with OAuth Tokens
If your config.yaml uses an Anthropic OAuth token from OpenClaw's ~/.openclaw/auth-profiles.json, the router automatically handles Claude Code identity headers.
OpenClaw Compatibility Mode (Required)
If using with OpenClaw, you MUST start the server with --openclaw:
CODEBLOCK15
This flag enables compatibility features required for OpenClaw:
- - Rewrites model names in responses so OpenClaw shows the actual model being used
- Handles tool name and ID remapping for proper tool call routing
Without this flag, you may encounter errors when using the router with OpenClaw.
Common Tasks
- - Check server status: INLINECODE21
- View current config: INLINECODE22
- Test a classification: INLINECODE23
- Run classification tests: INLINECODE24
- Restart server: Stop and run
python server.py again - View logs (if running as service): INLINECODE26
Troubleshooting
"externally-managed-environment" error
Python 3.11+ requires virtual environments. Create one:
CODEBLOCK16
"Connection refused" on port 4001
Server isn't running. Start it:
CODEBLOCK17
Classification returns wrong complexity
Edit
ROUTES.md to tune classification rules. The classifier reads this file to determine complexity levels.
Ollama errors / "model not found"
Ensure Ollama is running and the model is pulled:
CODEBLOCK18
OAuth token not working
Ensure your token in
config.yaml starts with
sk-ant-oat. The router auto-detects OAuth tokens and adds required identity headers.
LaunchAgent not starting
Check logs and ensure paths are absolute:
CODEBLOCK19
LLM Router
一个智能代理,能够按复杂度对传入请求进行分类,并将其路由到合适的LLM模型。对于简单任务使用更便宜/更快的模型,将昂贵模型保留给复杂任务。
与 OpenClaw 配合使用,通过将简单请求路由到较小模型来减少Token用量和API成本。
状态: 已在Anthropic、OpenAI、Google Gemini、Kimi/Moonshot和Ollama上测试通过。
快速开始
前置条件
- 1. Python 3.10+ 及 pip
- Ollama(可选 - 仅在使用本地分类时需要)
- Anthropic API密钥 或 Claude Code OAuth令牌(或其他提供商密钥)
安装
bash
如果尚未克隆,请先克隆
git clone https://github.com/alexrudloff/llmrouter.git
cd llmrouter
创建虚拟环境(现代Python必需)
python3 -m venv venv
source venv/bin/activate
安装依赖
pip install -r requirements.txt
拉取分类模型(如果使用本地分类)
ollama pull qwen2.5:3b
复制并自定义配置
cp config.yaml.example config.yaml
编辑config.yaml,填入你的API密钥和模型偏好
验证安装
bash
启动服务器
source venv/bin/activate
python server.py
在另一个终端中,测试健康检查端点
curl http://localhost:4001/health
应返回:{status: ok, ...}
启动服务器
bash
python server.py
选项:
- - --port PORT - 监听端口(默认:4001)
- --host HOST - 绑定主机(默认:127.0.0.1)
- --config PATH - 配置文件路径(默认:config.yaml)
- --log - 启用详细日志
- --openclaw - 启用OpenClaw兼容性(重写系统提示中的模型名称)
配置
编辑 config.yaml 进行自定义:
模型路由
yaml
Anthropic路由
models:
super_easy: anthropic:claude-haiku-4-5-20251001
easy: anthropic:claude-haiku-4-5-20251001
medium: anthropic:claude-sonnet-4-20250514
hard: anthropic:claude-opus-4-20250514
super_hard: anthropic:claude-opus-4-20250514
OpenAI路由
models:
super_easy: openai:gpt-4o-mini
easy: openai:gpt-4o-mini
medium: openai:gpt-4o
hard: openai:o3-mini
super_hard: openai:o3
Google Gemini路由
models:
super_easy: google:gemini-2.0-flash
easy: google:gemini-2.0-flash
medium: google:gemini-2.0-flash
hard: google:gemini-2.0-flash
super_hard: google:gemini-2.0-flash
注意: 推理模型会被自动检测并使用正确的API参数。
分类器
三种请求复杂度分类选项:
本地(默认) - 免费,需要Ollama:
yaml
classifier:
provider: local
model: qwen2.5:3b
Anthropic - 使用Haiku,快速且便宜:
yaml
classifier:
provider: anthropic
model: claude-haiku-4-5-20251001
OpenAI - 使用GPT-4o-mini:
yaml
classifier:
provider: openai
model: gpt-4o-mini
Google - 使用Gemini:
yaml
classifier:
provider: google
model: gemini-2.0-flash
Kimi - 使用Moonshot:
yaml
classifier:
provider: kimi
model: moonshot-v1-8k
如果您的机器无法运行本地模型,请使用远程(anthropic/openai/google/kimi)分类器。
支持的提供商
- - anthropic:claude- - Anthropic Claude模型(已测试)
- openai:gpt-、openai:o1-、openai:o3- - OpenAI模型(已测试)
- google:gemini- - Google Gemini模型(已测试)
- kimi:kimi-k2.5、kimi:moonshot- - Kimi/Moonshot模型(已测试)
- local:model-name - 本地Ollama模型(已测试)
复杂度级别
| 级别 | 使用场景 | 默认模型 |
|---|
| super_easy | 问候、确认 | Haiku |
| easy |
简单问答、提醒 | Haiku |
| medium | 编程、邮件、研究 | Sonnet |
| hard | 复杂推理、调试 | Opus |
| super_hard | 系统架构、证明 | Opus |
自定义分类
编辑 ROUTES.md 来调整消息的分类方式。分类器读取此文件中的表格来确定复杂度级别。
API使用
路由器暴露了一个兼容OpenAI的API:
bash
curl http://localhost:4001/v1/chat/completions \
-H Authorization: Bearer $ANTHROPICAPIKEY \
-H Content-Type: application/json \
-d {
model: llm-router,
messages: [{role: user, content: Hello!}]
}
测试分类
bash
python classifier.py Write a Python sort function
输出:medium
python classifier.py --test
运行测试套件
作为macOS服务运行
创建 ~/Library/LaunchAgents/com.llmrouter.plist:
xml
Label
com.llmrouter
ProgramArguments
/path/to/llmrouter/venv/bin/python
/path/to/llmrouter/server.py
--openclaw
RunAtLoad
KeepAlive
WorkingDirectory
/path/to/llmrouter
StandardOutPath
/path/to/llmrouter/logs/stdout.log
StandardErrorPath
/path/to/llmrouter/logs/stderr.log
重要: 将 /path/to/llmrouter 替换为您的实际安装路径。必须使用venv中的python,而不是系统python。
bash
创建日志目录
mkdir -p ~/path/to/llmrouter/logs
加载服务
launchctl load ~/Library/LaunchAgents/com.llmrouter.plist
验证服务正在运行
curl http://localhost:4001/health
停止/重启
launchctl unload ~/Library/LaunchAgents/com.llmrouter.plist
launchctl load ~/Library/LaunchAgents/com.llmrouter.plist
OpenClaw配置
在 ~/.openclaw/openclaw.json 中将路由器添加为提供商:
json
{
models: {
providers: {
localrouter: {
baseUrl: http://localhost:4001/v1,
apiKey: via-router,
api: openai-completions,
models: [
{
id: llm-router,
name: LLM Router (按复杂度自动路由),
reasoning: false,
input: [text, image],
cost: {
input: 0,
output: 0,
cacheRead: 0,
cacheWrite: 0
},
contextWindow: 200000,