🦞 Gradient AI — Serverless Inference
⚠️ This is an unofficial community skill, not maintained by DigitalOcean. Use at your own risk.
"Why manage GPUs when the ocean provides?" — ancient lobster proverb
Use DigitalOcean's Gradient Serverless Inference to call large language models without managing infrastructure. The API is OpenAI-compatible, so standard SDKs and patterns work — just point at https://inference.do-ai.run/v1 and swim.
Authentication
All requests need a Model Access Key in the Authorization: Bearer header.
CODEBLOCK0
Where to get one: DigitalOcean Console → Gradient AI → Model Access Keys → Create Key.
📖 Full auth docs
Tools
🔍 List Available Models
Window-shop for LLMs before you swipe the card.
CODEBLOCK1
Use this before hardcoding model IDs — models are added and deprecated over time.
Direct API call:
CODEBLOCK2
📖 Models reference
💬 Chat Completions
The classic. Send structured messages (system/user/assistant roles), get a response. OpenAI-compatible, so you probably already know how this works.
CODEBLOCK3
Direct API call:
CODEBLOCK4
📖 Chat Completions docs
⚡ Responses API (Recommended)
DigitalOcean's recommended endpoint for new integrations. Simpler request format and supports prompt caching — a.k.a. "stop paying twice for the same context."
CODEBLOCK5
Direct API call:
CODEBLOCK6
When to use which:
| Chat Completions | Responses API |
|---|
| Request format | Array of messages with roles | Single input string |
| Prompt caching |
❌ | ✅ via
store: true |
|
Multi-step tool use | Manual | Built-in |
|
Best for | Structured conversations | Simple queries, cost savings |
📖 Responses API docs
🖼️ Generate Images
Turn text prompts into images. Because sometimes a chart isn't enough.
CODEBLOCK7
Direct API call:
CODEBLOCK8
📖 Image generation docs
🧠 Model Selection Guide
Not all models are created equal. Choose wisely, young crustacean:
| Model | Best For | Speed | Quality | Context |
|---|
| INLINECODE4 | Complex reasoning, analysis, writing | Medium | ★★★★★ | 128K |
| INLINECODE5 |
General tasks, instruction following | Fast | ★★★★ | 128K |
|
deepseek-r1-distill-llama-70b | Math, code, step-by-step reasoning | Slow | ★★★★★ | 128K |
|
qwen3-32b | Quick triage, short tasks | Fastest | ★★★ | 32K |
🦞 Pro tip: Cost-aware routing. Use a fast model (e.g., qwen3-32b) to score or triage, then only escalate to a strong model (e.g., openai-gpt-oss-120b) when depth is needed. Enable prompt caching for repeated context.
Always run python3 gradient_models.py to check what's currently available — the menu changes.
📖 Available models
💰 Model Pricing Lookup
Check what models cost before you rack up a bill. Scrapes the official DigitalOcean pricing page — no API key needed.
CODEBLOCK9
How it works:
- - Fetches live pricing from DigitalOcean's docs (public page, no auth)
- Caches results for 24 hours in INLINECODE11
- Falls back to a bundled snapshot if the live fetch fails
🦞 Pro tip: Run python3 gradient_pricing.py --model "gpt-oss" before choosing a model to see the cost difference between gpt-oss-120b ($0.10/$0.70) and gpt-oss-20b ($0.05/$0.45) per 1M tokens.
📖 Pricing docs
CLI Reference
All scripts accept --json for machine-readable output.
CODEBLOCK10
External Endpoints
| Endpoint | Purpose |
|---|
| INLINECODE16 | List available models |
| INLINECODE17 |
Chat Completions API |
|
https://inference.do-ai.run/v1/responses | Responses API (recommended) |
|
https://inference.do-ai.run/v1/images/generations | Image generation |
|
https://docs.digitalocean.com/.../pricing/ | Pricing page (scraped, public) |
Security & Privacy
- - All requests go to
inference.do-ai.run — DigitalOcean's own endpoint - Your
GRADIENT_API_KEY is sent as a Bearer token in the Authorization header - No other credentials or local data leave the machine
- Model Access Keys are scoped to inference only — they can't manage your DO account
- Prompt caching entries are scoped to your account and automatically expire
Trust Statement
By using this skill, prompts and data are sent to DigitalOcean's Gradient Inference API.
Only install if you trust DigitalOcean with the content you send to their LLMs.
Important Notes
- - Run
python3 gradient_models.py before assuming a model exists — they rotate - All scripts exit with code 1 and print errors to stderr on failure
🦞 Gradient AI — 无服务器推理
⚠️ 这是一个非官方社区技能,并非由 DigitalOcean 维护。使用风险自负。
当海洋已经提供时,何必还要管理 GPU?——古代龙虾谚语
使用 DigitalOcean 的 Gradient 无服务器推理 调用大型语言模型,无需管理基础设施。该 API 兼容 OpenAI,因此标准 SDK 和模式均可使用——只需指向 https://inference.do-ai.run/v1 即可畅游。
身份验证
所有请求都需要在 Authorization: Bearer 标头中包含一个模型访问密钥。
bash
export GRADIENTAPIKEY=your-model-access-key
获取方式: DigitalOcean 控制台 → Gradient AI → 模型访问密钥 → 创建密钥。
📖 完整身份验证文档
工具
🔍 列出可用模型
在刷卡前先浏览一下 LLM 的橱窗。
bash
python3 gradient_models.py # 美观表格
python3 gradient_models.py --json # 机器可读
python3 gradient_models.py --filter llama # 按名称搜索
在硬编码模型 ID 之前使用此命令——模型会随时间添加和弃用。
直接 API 调用:
bash
curl -s https://inference.do-ai.run/v1/models \
-H Authorization: Bearer $GRADIENTAPIKEY | python3 -m json.tool
📖 模型参考
💬 聊天补全
经典功能。发送结构化消息(系统/用户/助手角色),获取响应。兼容 OpenAI,因此你可能已经知道如何使用。
bash
python3 gradient_chat.py \
--model openai-gpt-oss-120b \
--system 你是一个有用的助手。 \
--prompt 用一段话解释无服务器推理。
不同模型
python3 gradient_chat.py \
--model llama3.3-70b-instruct \
--prompt 写一首关于云计算的诗。
直接 API 调用:
bash
curl -s https://inference.do-ai.run/v1/chat/completions \
-H Authorization: Bearer $GRADIENTAPIKEY \
-H Content-Type: application/json \
-d {
model: openai-gpt-oss-120b,
messages: [
{role: system, content: 你是一个有用的助手。},
{role: user, content: 你好!}
],
temperature: 0.7,
max_tokens: 1000
}
📖 聊天补全文档
⚡ 响应 API(推荐)
DigitalOcean 为新集成推荐的端点。请求格式更简单,并支持提示缓存——也就是不必为相同上下文支付两次费用。
bash
基本用法
python3 gradient_chat.py \
--model openai-gpt-oss-120b \
--prompt 总结这份财报。 \
--responses-api
使用提示缓存(节省后续查询成本)
python3 gradient_chat.py \
--model openai-gpt-oss-120b \
--prompt 现在将其与上一季度进行比较。 \
--responses-api --cache
直接 API 调用:
bash
curl -s https://inference.do-ai.run/v1/responses \
-H Authorization: Bearer $GRADIENTAPIKEY \
-H Content-Type: application/json \
-d {
model: openai-gpt-oss-120b,
input: 解释提示缓存。,
store: true
}
何时使用哪种:
| 聊天补全 | 响应 API |
|---|
| 请求格式 | 带角色的消息数组 | 单个 input 字符串 |
| 提示缓存 |
❌ | ✅ 通过 store: true |
|
多步骤工具使用 | 手动 | 内置 |
|
最佳用途 | 结构化对话 | 简单查询,节省成本 |
📖 响应 API 文档
🖼️ 生成图像
将文本提示转换为图像。因为有时图表还不够。
bash
python3 gradient_image.py --prompt 一只在华尔街交易股票的大龙虾
python3 gradient_image.py --prompt 纽约证券交易所的日落 --output sunset.png
python3 gradient_image.py --prompt 金融科技标志 --json
直接 API 调用:
bash
curl -s https://inference.do-ai.run/v1/images/generations \
-H Authorization: Bearer $GRADIENTAPIKEY \
-H Content-Type: application/json \
-d {
model: dall-e-3,
prompt: 一只分析蜡烛图的大龙虾,
n: 1
}
📖 图像生成文档
🧠 模型选择指南
并非所有模型都生而平等。明智选择,年轻的甲壳类动物:
| 模型 | 最佳用途 | 速度 | 质量 | 上下文 |
|---|
| openai-gpt-oss-120b | 复杂推理、分析、写作 | 中等 | ★★★★★ | 128K |
| llama3.3-70b-instruct |
通用任务、指令遵循 | 快速 | ★★★★ | 128K |
| deepseek-r1-distill-llama-70b | 数学、代码、逐步推理 | 慢速 | ★★★★★ | 128K |
| qwen3-32b | 快速分类、短任务 | 最快 | ★★★ | 32K |
🦞 专业提示:成本感知路由。 使用快速模型(例如 qwen3-32b)进行评分或分类,仅在需要深度时升级到强模型(例如 openai-gpt-oss-120b)。对重复上下文启用提示缓存。
始终运行 python3 gradient_models.py 检查当前可用的模型——菜单会变化。
📖 可用模型
💰 模型定价查询
在产生账单之前检查模型的成本。抓取官方 DigitalOcean 定价页面——无需 API 密钥。
bash
python3 gradient_pricing.py # 美观表格
python3 gradient_pricing.py --json # 机器可读
python3 gradient_pricing.py --model llama # 按模型名称筛选
python3 gradient_pricing.py --no-cache # 跳过缓存,实时获取
工作原理:
- - 从 DigitalOcean 文档(公共页面,无需身份验证)获取实时定价
- 将结果缓存 24 小时到 /tmp/gradientpricingcache.json
- 如果实时获取失败,则回退到捆绑的快照
🦞 专业提示: 在选择模型之前运行 python3 gradient_pricing.py --model gpt-oss,查看 gpt-oss-120b($0.10/$0.70)和 gpt-oss-20b($0.05/$0.45)每 100 万个 token 的成本差异。
📖 定价文档
CLI 参考
所有脚本都接受 --json 参数以输出机器可读格式。
gradient_models.py [--json] [--filter QUERY]
gradient_chat.py --prompt TEXT [--model ID] [--system TEXT]
[--responses-api] [--cache] [--temperature F]
[--max-tokens N] [--json]
gradient_image.py --prompt TEXT [--model ID] [