Pura — cut your agent's LLM costs 40-60%
Your OpenClaw agent probably sends every request to GPT-4o. Most of those requests are simple enough for a model that costs 10x less. Pura sits between your agent and the LLM providers, scores each request's complexity, and routes it to the cheapest model that can handle it.
Cascade routing: tries Groq first ($0.00059/1K tokens). If the response looks weak (too short, hedging language, refusal), it escalates to Gemini, then OpenAI, then Anthropic. Your agent gets a good answer. You pay the minimum.
Drop-in OpenAI-compatible. One env var change.
What you get
- - 4 providers (Groq, Gemini, OpenAI, Anthropic) behind one endpoint
- Automatic complexity scoring — no manual model selection
- Cascade routing — cheapest sufficient tier per request
- Per-request cost headers so your agent tracks spend
- Daily cost reports and income statements
- Free tier: 5,000 requests, no credit card
Setup
Get an API key:
CODEBLOCK0
Save the returned key (starts with pura_):
CODEBLOCK1
Sending requests
Pura is OpenAI SDK-compatible. Change your base URL and you're done:
CODEBLOCK2
Or with curl:
CODEBLOCK3
Response headers
Every response includes routing metadata:
| Header | Value |
|---|
| INLINECODE1 | Which provider handled it (openai, anthropic, groq, gemini) |
| INLINECODE2 |
Specific model used |
|
X-Pura-Cost | Estimated cost in USD |
|
X-Pura-Budget-Remaining | Remaining daily budget in USD |
|
X-Pura-Tier | Complexity tier (cheap, mid, premium) |
|
X-Pura-Quality | Provider quality score (0-1) |
Cost reports
CODEBLOCK4
Lightning wallet
5,000 free requests included. After that, fund a Lightning wallet:
CODEBLOCK5
Model behavior
Set model="auto" or omit the model field entirely — both trigger cascade routing, where Pura scores your request's complexity and picks the cheapest provider that can handle it.
If you specify a model name directly (e.g. gpt-4o, claude-sonnet-4-20250514), Pura skips the cascade and routes straight to that provider.
Explicit model routing
Override auto-routing by specifying a model:
CODEBLOCK6
Supported models: gpt-4o, claude-sonnet-4-20250514, llama-3.3-70b-versatile, gemini-2.0-flash.
Routing hints
Influence provider selection without forcing a specific model:
CODEBLOCK7
How routing works
Pura scores each request's complexity based on message length, code blocks, reasoning triggers, and conversation depth. Simple tasks go to Groq or Gemini. Complex reasoning goes to Anthropic or OpenAI. Quality scores (derived from recent success rate and latency) weight the selection so underperforming providers get fewer requests until they recover.
Cascade routing adds a second layer: if the cheapest provider's response looks bad (too short, hedging, refusal, incomplete), Pura automatically retries at the next tier up. You only pay premium prices when the cheap answer was genuinely insufficient.
Typical cost savings
| Request type | Direct GPT-4o cost | Pura cascade cost | Savings |
|---|
| Simple Q&A ("What is X?") | $0.005 | $0.00059 (Groq) | 88% |
| Code explanation |
$0.005 | $0.00125 (Gemini) | 75% |
| Complex reasoning | $0.005 | $0.005 (GPT-4o) | 0% |
| Long-context analysis | $0.005 | $0.003 (Claude) | 40% |
In practice, 70-80% of agent requests are simple enough for the cheapest tier.
Security note
Your API keys for OpenAI, Anthropic, Groq, and Gemini stay on the Pura server. They never touch your agent or the OpenClaw runtime. If you are running an OpenClaw agent with plugins from untrusted sources, routing through Pura means those plugins cannot access your provider API keys.
Links
- - Gateway:
- Website:
- Compare cascade routing:
- Docs:
- GitHub:
Pura — 将你的智能体LLM成本降低40-60%
你的OpenClaw智能体可能将所有请求都发送给GPT-4o。其中大多数请求简单到只需成本低10倍的模型即可处理。Pura位于你的智能体和LLM提供商之间,对每个请求的复杂度进行评分,并将其路由到能够处理该请求的最便宜的模型。
级联路由:首先尝试Groq($0.00059/1K tokens)。如果响应看起来较弱(过短、含糊语言、拒绝回答),则升级到Gemini,然后是OpenAI,最后是Anthropic。你的智能体获得优质答案。你支付最低费用。
即插即用,兼容OpenAI。只需更改一个环境变量。
你将获得
- - 4个提供商(Groq、Gemini、OpenAI、Anthropic)统一在一个端点下
- 自动复杂度评分——无需手动选择模型
- 级联路由——每个请求使用最便宜的足够层级
- 每个请求的成本标头,让你的智能体追踪支出
- 每日成本报告和损益表
- 免费层:5,000个请求,无需信用卡
设置
获取API密钥:
bash
curl -s -X POST https://api.pura.xyz/api/keys \
-H Content-Type: application/json \
-d {label:my-agent} | python3 -m json.tool
保存返回的密钥(以pura_开头):
bash
export PURAAPIKEY=purayourkey_here
发送请求
Pura兼容OpenAI SDK。更改你的基础URL即可:
python
from openai import OpenAI
import os
client = OpenAI(
base_url=https://api.pura.xyz/v1,
apikey=os.environ[PURAAPI_KEY]
)
response = client.chat.completions.create(
model=auto, # Pura选择最便宜且能胜任的模型
messages=[{role: user, content: Hello}]
)
或者使用curl:
bash
curl -s -X POST https://api.pura.xyz/v1/chat/completions \
-H Authorization: Bearer $PURAAPIKEY \
-H Content-Type: application/json \
-d {messages: [{role: user, content: Hello}], stream: false}
响应标头
每个响应都包含路由元数据:
| 标头 | 值 |
|---|
| X-Pura-Provider | 处理请求的提供商(openai, anthropic, groq, gemini) |
| X-Pura-Model |
使用的具体模型 |
| X-Pura-Cost | 预估成本(美元) |
| X-Pura-Budget-Remaining | 剩余每日预算(美元) |
| X-Pura-Tier | 复杂度层级(cheap, mid, premium) |
| X-Pura-Quality | 提供商质量评分(0-1) |
成本报告
bash
24小时支出明细
curl -s https://api.pura.xyz/api/report \
-H Authorization: Bearer $PURA
APIKEY | python3 -m json.tool
格式化损益表
curl -s https://api.pura.xyz/api/income \
-H Authorization: Bearer $PURA
APIKEY | python3 -m json.tool
闪电钱包
包含5,000个免费请求。之后,充值闪电钱包:
bash
获取充值发票
curl -s -X POST https://api.pura.xyz/api/wallet/fund \
-H Authorization: Bearer $PURA
APIKEY \
-H Content-Type: application/json \
-d {amount: 10000} | python3 -m json.tool
查询余额
curl -s https://api.pura.xyz/api/wallet/balance \
-H Authorization: Bearer $PURA
APIKEY | python3 -m json.tool
模型行为
设置model=auto或完全省略模型字段——两者都会触发级联路由,Pura会对你的请求复杂度进行评分,并选择最便宜且能胜任的提供商。
如果你直接指定模型名称(例如gpt-4o、claude-sonnet-4-20250514),Pura会跳过级联路由,直接路由到该提供商。
显式模型路由
通过指定模型覆盖自动路由:
bash
强制使用GPT-4o
curl -s -X POST https://api.pura.xyz/v1/chat/completions \
-H Authorization: Bearer $PURA
APIKEY \
-H Content-Type: application/json \
-d {model: gpt-4o, messages: [{role:user,content:Hello}], stream: false}
支持的模型:gpt-4o、claude-sonnet-4-20250514、llama-3.3-70b-versatile、gemini-2.0-flash。
路由提示
在不强制指定具体模型的情况下影响提供商选择:
json
{
messages: [{role: user, content: ...}],
routing: {
quality: high,
prefer: anthropic,
maxCost: 0.01,
maxLatency: 5000
}
}
路由工作原理
Pura根据消息长度、代码块、推理触发词和对话深度对每个请求的复杂度进行评分。简单任务分配给Groq或Gemini。复杂推理分配给Anthropic或OpenAI。质量评分(基于近期成功率和延迟)会加权选择,表现不佳的提供商将获得更少的请求,直到其恢复。
级联路由增加了第二层:如果最便宜提供商的响应看起来不佳(过短、含糊、拒绝、不完整),Pura会自动在下一层级重试。只有当廉价答案确实不足时,你才需要支付高级价格。
典型成本节省
| 请求类型 | 直接GPT-4o成本 | Pura级联成本 | 节省 |
|---|
| 简单问答(什么是X?) | $0.005 | $0.00059(Groq) | 88% |
| 代码解释 |
$0.005 | $0.00125(Gemini) | 75% |
| 复杂推理 | $0.005 | $0.005(GPT-4o) | 0% |
| 长上下文分析 | $0.005 | $0.003(Claude) | 40% |
实际应用中,70-80%的智能体请求简单到可以使用最便宜的层级。
安全说明
你用于OpenAI、Anthropic、Groq和Gemini的API密钥保留在Pura服务器上。它们永远不会触及你的智能体或OpenClaw运行时。如果你正在运行带有来自不可信来源插件的OpenClaw智能体,通过Pura路由意味着这些插件无法访问你的提供商API密钥。
链接
- - 网关:
- 网站:
- 比较级联路由:
- 文档:
- GitHub: