Pura — cut your agent's LLM costs 40-60%

Your OpenClaw agent probably sends every request to GPT-4o. Most of those requests are simple enough for a model that costs 10x less. Pura sits between your agent and the LLM providers, scores each request's complexity, and routes it to the cheapest model that can handle it.

Cascade routing: tries Groq first ($0.00059/1K tokens). If the response looks weak (too short, hedging language, refusal), it escalates to Gemini, then OpenAI, then Anthropic. Your agent gets a good answer. You pay the minimum.

Drop-in OpenAI-compatible. One env var change.

What you get

- 4 providers (Groq, Gemini, OpenAI, Anthropic) behind one endpoint
Automatic complexity scoring — no manual model selection
Cascade routing — cheapest sufficient tier per request
Per-request cost headers so your agent tracks spend
Daily cost reports and income statements
Free tier: 5,000 requests, no credit card

Setup

Get an API key:

CODEBLOCK0

Save the returned key (starts with pura_):

CODEBLOCK1

Sending requests

Pura is OpenAI SDK-compatible. Change your base URL and you're done:

CODEBLOCK2

Or with curl:

CODEBLOCK3

Response headers

Every response includes routing metadata:

Header	Value
INLINECODE1	Which provider handled it (openai, anthropic, groq, gemini)
INLINECODE2

Cost reports

CODEBLOCK4

Lightning wallet

5,000 free requests included. After that, fund a Lightning wallet:

CODEBLOCK5

Model behavior

Set model="auto" or omit the model field entirely — both trigger cascade routing, where Pura scores your request's complexity and picks the cheapest provider that can handle it.

If you specify a model name directly (e.g. gpt-4o, claude-sonnet-4-20250514), Pura skips the cascade and routes straight to that provider.

Explicit model routing

Override auto-routing by specifying a model:

CODEBLOCK6

Supported models: gpt-4o, claude-sonnet-4-20250514, llama-3.3-70b-versatile, gemini-2.0-flash.

Routing hints

Influence provider selection without forcing a specific model:

CODEBLOCK7

How routing works

Pura scores each request's complexity based on message length, code blocks, reasoning triggers, and conversation depth. Simple tasks go to Groq or Gemini. Complex reasoning goes to Anthropic or OpenAI. Quality scores (derived from recent success rate and latency) weight the selection so underperforming providers get fewer requests until they recover.

Cascade routing adds a second layer: if the cheapest provider's response looks bad (too short, hedging, refusal, incomplete), Pura automatically retries at the next tier up. You only pay premium prices when the cheap answer was genuinely insufficient.

Typical cost savings

Request type	Direct GPT-4o cost	Pura cascade cost	Savings
Simple Q&A ("What is X?")	$0.005	$0.00059 (Groq)	88%
Code explanation

$0.005 | $0.00125 (Gemini) | 75% | | Complex reasoning | $0.005 | $0.005 (GPT-4o) | 0% | | Long-context analysis | $0.005 | $0.003 (Claude) | 40% |

In practice, 70-80% of agent requests are simple enough for the cheapest tier.

Security note

Your API keys for OpenAI, Anthropic, Groq, and Gemini stay on the Pura server. They never touch your agent or the OpenClaw runtime. If you are running an OpenClaw agent with plugins from untrusted sources, routing through Pura means those plugins cannot access your provider API keys.

Pura — 将你的智能体LLM成本降低40-60%

你的OpenClaw智能体可能将所有请求都发送给GPT-4o。其中大多数请求简单到只需成本低10倍的模型即可处理。Pura位于你的智能体和LLM提供商之间，对每个请求的复杂度进行评分，并将其路由到能够处理该请求的最便宜的模型。

级联路由：首先尝试Groq（$0.00059/1K tokens）。如果响应看起来较弱（过短、含糊语言、拒绝回答），则升级到Gemini，然后是OpenAI，最后是Anthropic。你的智能体获得优质答案。你支付最低费用。

即插即用，兼容OpenAI。只需更改一个环境变量。

你将获得

- 4个提供商（Groq、Gemini、OpenAI、Anthropic）统一在一个端点下
自动复杂度评分——无需手动选择模型
级联路由——每个请求使用最便宜的足够层级
每个请求的成本标头，让你的智能体追踪支出
每日成本报告和损益表
免费层：5,000个请求，无需信用卡

设置

获取API密钥：

bash
curl -s -X POST https://api.pura.xyz/api/keys \
-H Content-Type: application/json \
-d {label:my-agent} | python3 -m json.tool

保存返回的密钥（以pura_开头）：

bash
export PURAAPIKEY=purayourkey_here

发送请求

Pura兼容OpenAI SDK。更改你的基础URL即可：

python
from openai import OpenAI
import os

client = OpenAI(
base_url=https://api.pura.xyz/v1,
apikey=os.environ[PURAAPI_KEY]
)

response = client.chat.completions.create(
model=auto, # Pura选择最便宜且能胜任的模型
messages=[{role: user, content: Hello}]
)

或者使用curl：

bash
curl -s -X POST https://api.pura.xyz/v1/chat/completions \
-H Authorization: Bearer $PURAAPIKEY \
-H Content-Type: application/json \
-d {messages: [{role: user, content: Hello}], stream: false}

响应标头

每个响应都包含路由元数据：

标头	值
X-Pura-Provider	处理请求的提供商（openai, anthropic, groq, gemini）
X-Pura-Model

成本报告

bash

24小时支出明细

curl -s https://api.pura.xyz/api/report \
-H Authorization: Bearer $PURAAPIKEY | python3 -m json.tool

格式化损益表

curl -s https://api.pura.xyz/api/income \ -H Authorization: Bearer $PURAAPIKEY | python3 -m json.tool

闪电钱包

包含5,000个免费请求。之后，充值闪电钱包：

bash

获取充值发票

curl -s -X POST https://api.pura.xyz/api/wallet/fund \
-H Authorization: Bearer $PURAAPIKEY \
-H Content-Type: application/json \
-d {amount: 10000} | python3 -m json.tool

查询余额

curl -s https://api.pura.xyz/api/wallet/balance \ -H Authorization: Bearer $PURAAPIKEY | python3 -m json.tool

模型行为

设置model=auto或完全省略模型字段——两者都会触发级联路由，Pura会对你的请求复杂度进行评分，并选择最便宜且能胜任的提供商。

如果你直接指定模型名称（例如gpt-4o、claude-sonnet-4-20250514），Pura会跳过级联路由，直接路由到该提供商。

显式模型路由

通过指定模型覆盖自动路由：

bash

强制使用GPT-4o

curl -s -X POST https://api.pura.xyz/v1/chat/completions \
-H Authorization: Bearer $PURAAPIKEY \
-H Content-Type: application/json \
-d {model: gpt-4o, messages: [{role:user,content:Hello}], stream: false}

支持的模型：gpt-4o、claude-sonnet-4-20250514、llama-3.3-70b-versatile、gemini-2.0-flash。

路由提示

在不强制指定具体模型的情况下影响提供商选择：

json
{
messages: [{role: user, content: ...}],
routing: {
quality: high,
prefer: anthropic,
maxCost: 0.01,
maxLatency: 5000
}
}

路由工作原理

Pura根据消息长度、代码块、推理触发词和对话深度对每个请求的复杂度进行评分。简单任务分配给Groq或Gemini。复杂推理分配给Anthropic或OpenAI。质量评分（基于近期成功率和延迟）会加权选择，表现不佳的提供商将获得更少的请求，直到其恢复。

级联路由增加了第二层：如果最便宜提供商的响应看起来不佳（过短、含糊、拒绝、不完整），Pura会自动在下一层级重试。只有当廉价答案确实不足时，你才需要支付高级价格。

典型成本节省

请求类型	直接GPT-4o成本	Pura级联成本	节省
简单问答（什么是X？）	$0.005	$0.00059（Groq）	88%
代码解释

$0.005 | $0.00125（Gemini） | 75% | | 复杂推理 | $0.005 | $0.005（GPT-4o） | 0% | | 长上下文分析 | $0.005 | $0.003（Claude） | 40% |

实际应用中，70-80%的智能体请求简单到可以使用最便宜的层级。

安全说明

你用于OpenAI、Anthropic、Groq和Gemini的API密钥保留在Pura服务器上。它们永远不会触及你的智能体或OpenClaw运行时。如果你正在运行带有来自不可信来源插件的OpenClaw智能体，通过Pura路由意味着这些插件无法访问你的提供商API密钥。

链接

- 网关：
网站：
比较级联路由：
文档：
GitHub：

pura降低LLM成本40-60%

pura

Pura — cut your agent's LLM costs 40-60%

What you get

Setup

Sending requests

Response headers

Cost reports

Lightning wallet

Model behavior

Explicit model routing

Routing hints

How routing works

Typical cost savings

Security note

Links

Pura — 将你的智能体LLM成本降低40-60%

你将获得

设置

发送请求

响应标头

成本报告

24小时支出明细

格式化损益表

闪电钱包

获取充值发票

查询余额

模型行为

显式模型路由

强制使用GPT-4o

路由提示

路由工作原理

典型成本节省

安全说明

链接

标签

通过对话安装

方式一：安装 SkillHub 和技能

方式二：设置 SkillHub 为优先技能安装源

通过命令行安装

下载

相关推荐

self-improvement

self-improvement

self-improvement

self-improvement