🦞 Gradient AI — Serverless Inference

⚠️ This is an unofficial community skill, not maintained by DigitalOcean. Use at your own risk.

"Why manage GPUs when the ocean provides?" — ancient lobster proverb

Use DigitalOcean's Gradient Serverless Inference to call large language models without managing infrastructure. The API is OpenAI-compatible, so standard SDKs and patterns work — just point at https://inference.do-ai.run/v1 and swim.

Authentication

All requests need a Model Access Key in the Authorization: Bearer header.

CODEBLOCK0

Where to get one: DigitalOcean Console → Gradient AI → Model Access Keys → Create Key.

📖 Full auth docs

Tools

🔍 List Available Models

Window-shop for LLMs before you swipe the card.

CODEBLOCK1

Use this before hardcoding model IDs — models are added and deprecated over time.

Direct API call:
CODEBLOCK2

📖 Models reference

💬 Chat Completions

The classic. Send structured messages (system/user/assistant roles), get a response. OpenAI-compatible, so you probably already know how this works.

CODEBLOCK3

Direct API call:
CODEBLOCK4

📖 Chat Completions docs

⚡ Responses API (Recommended)

DigitalOcean's recommended endpoint for new integrations. Simpler request format and supports prompt caching — a.k.a. "stop paying twice for the same context."

CODEBLOCK5

Direct API call:
CODEBLOCK6

When to use which:

	Chat Completions	Responses API
Request format	Array of messages with roles	Single `input` string
Prompt caching

📖 Responses API docs

🖼️ Generate Images

Turn text prompts into images. Because sometimes a chart isn't enough.

CODEBLOCK7

Direct API call:
CODEBLOCK8

📖 Image generation docs

🧠 Model Selection Guide

Not all models are created equal. Choose wisely, young crustacean:

Model	Best For	Speed	Quality	Context
INLINECODE4	Complex reasoning, analysis, writing	Medium	★★★★★	128K
INLINECODE5

General tasks, instruction following | Fast | ★★★★ | 128K |
| deepseek-r1-distill-llama-70b | Math, code, step-by-step reasoning | Slow | ★★★★★ | 128K |
| qwen3-32b | Quick triage, short tasks | Fastest | ★★★ | 32K |

🦞 Pro tip: Cost-aware routing. Use a fast model (e.g., qwen3-32b) to score or triage, then only escalate to a strong model (e.g., openai-gpt-oss-120b) when depth is needed. Enable prompt caching for repeated context.

Always run python3 gradient_models.py to check what's currently available — the menu changes.

📖 Available models

💰 Model Pricing Lookup

Check what models cost before you rack up a bill. Scrapes the official DigitalOcean pricing page — no API key needed.

CODEBLOCK9

How it works:

- Fetches live pricing from DigitalOcean's docs (public page, no auth)
Caches results for 24 hours in INLINECODE11
Falls back to a bundled snapshot if the live fetch fails

🦞 Pro tip: Run python3 gradient_pricing.py --model "gpt-oss" before choosing a model to see the cost difference between gpt-oss-120b ($0.10/$0.70) and gpt-oss-20b ($0.05/$0.45) per 1M tokens.

📖 Pricing docs

CLI Reference

All scripts accept --json for machine-readable output.

CODEBLOCK10

External Endpoints

Endpoint	Purpose
INLINECODE16	List available models
INLINECODE17

Security & Privacy

- All requests go to inference.do-ai.run — DigitalOcean's own endpoint
Your GRADIENT_API_KEY is sent as a Bearer token in the Authorization header
No other credentials or local data leave the machine
Model Access Keys are scoped to inference only — they can't manage your DO account
Prompt caching entries are scoped to your account and automatically expire

Trust Statement

By using this skill, prompts and data are sent to DigitalOcean's Gradient Inference API.
Only install if you trust DigitalOcean with the content you send to their LLMs.

Important Notes

- Run python3 gradient_models.py before assuming a model exists — they rotate
All scripts exit with code 1 and print errors to stderr on failure

🦞 Gradient AI — 无服务器推理

⚠️ 这是一个非官方社区技能，并非由 DigitalOcean 维护。使用风险自负。

当海洋已经提供时，何必还要管理 GPU？——古代龙虾谚语

使用 DigitalOcean 的 Gradient 无服务器推理调用大型语言模型，无需管理基础设施。该 API 兼容 OpenAI，因此标准 SDK 和模式均可使用——只需指向 https://inference.do-ai.run/v1 即可畅游。

身份验证

所有请求都需要在 Authorization: Bearer 标头中包含一个模型访问密钥。

bash
export GRADIENTAPIKEY=your-model-access-key

获取方式： DigitalOcean 控制台 → Gradient AI → 模型访问密钥 → 创建密钥。

📖 完整身份验证文档

工具

🔍 列出可用模型

在刷卡前先浏览一下 LLM 的橱窗。

bash
python3 gradient_models.py # 美观表格
python3 gradient_models.py --json # 机器可读
python3 gradient_models.py --filter llama # 按名称搜索

在硬编码模型 ID 之前使用此命令——模型会随时间添加和弃用。

直接 API 调用：
bash
curl -s https://inference.do-ai.run/v1/models \
-H Authorization: Bearer $GRADIENTAPIKEY | python3 -m json.tool

📖 模型参考

💬 聊天补全

经典功能。发送结构化消息（系统/用户/助手角色），获取响应。兼容 OpenAI，因此你可能已经知道如何使用。

bash
python3 gradient_chat.py \
--model openai-gpt-oss-120b \
--system 你是一个有用的助手。 \
--prompt 用一段话解释无服务器推理。

不同模型

python3 gradient_chat.py \ --model llama3.3-70b-instruct \ --prompt 写一首关于云计算的诗。

直接 API 调用：
bash
curl -s https://inference.do-ai.run/v1/chat/completions \
-H Authorization: Bearer $GRADIENTAPIKEY \
-H Content-Type: application/json \
-d {
model: openai-gpt-oss-120b,
messages: [
{role: system, content: 你是一个有用的助手。},
{role: user, content: 你好！}
],
temperature: 0.7,
max_tokens: 1000
}

📖 聊天补全文档

⚡ 响应 API（推荐）

DigitalOcean 为新集成推荐的端点。请求格式更简单，并支持提示缓存——也就是不必为相同上下文支付两次费用。

bash

基本用法

python3 gradient_chat.py \
--model openai-gpt-oss-120b \
--prompt 总结这份财报。 \
--responses-api

使用提示缓存（节省后续查询成本）

python3 gradient_chat.py \ --model openai-gpt-oss-120b \ --prompt 现在将其与上一季度进行比较。 \ --responses-api --cache

直接 API 调用：
bash
curl -s https://inference.do-ai.run/v1/responses \
-H Authorization: Bearer $GRADIENTAPIKEY \
-H Content-Type: application/json \
-d {
model: openai-gpt-oss-120b,
input: 解释提示缓存。,
store: true
}

何时使用哪种：

	聊天补全	响应 API
请求格式	带角色的消息数组	单个 input 字符串
提示缓存

📖 响应 API 文档

🖼️ 生成图像

将文本提示转换为图像。因为有时图表还不够。

bash
python3 gradient_image.py --prompt 一只在华尔街交易股票的大龙虾
python3 gradient_image.py --prompt 纽约证券交易所的日落 --output sunset.png
python3 gradient_image.py --prompt 金融科技标志 --json

直接 API 调用：
bash
curl -s https://inference.do-ai.run/v1/images/generations \
-H Authorization: Bearer $GRADIENTAPIKEY \
-H Content-Type: application/json \
-d {
model: dall-e-3,
prompt: 一只分析蜡烛图的大龙虾,
n: 1
}

📖 图像生成文档

🧠 模型选择指南

并非所有模型都生而平等。明智选择，年轻的甲壳类动物：

模型	最佳用途	速度	质量	上下文
openai-gpt-oss-120b	复杂推理、分析、写作	中等	★★★★★	128K
llama3.3-70b-instruct

通用任务、指令遵循 | 快速 | ★★★★ | 128K |
| deepseek-r1-distill-llama-70b | 数学、代码、逐步推理 | 慢速 | ★★★★★ | 128K |
| qwen3-32b | 快速分类、短任务 | 最快 | ★★★ | 32K |

🦞 专业提示：成本感知路由。 使用快速模型（例如 qwen3-32b）进行评分或分类，仅在需要深度时升级到强模型（例如 openai-gpt-oss-120b）。对重复上下文启用提示缓存。

始终运行 python3 gradient_models.py 检查当前可用的模型——菜单会变化。

📖 可用模型

💰 模型定价查询

在产生账单之前检查模型的成本。抓取官方 DigitalOcean 定价页面——无需 API 密钥。

bash
python3 gradient_pricing.py # 美观表格
python3 gradient_pricing.py --json # 机器可读
python3 gradient_pricing.py --model llama # 按模型名称筛选
python3 gradient_pricing.py --no-cache # 跳过缓存，实时获取

工作原理：

- 从 DigitalOcean 文档（公共页面，无需身份验证）获取实时定价
将结果缓存 24 小时到 /tmp/gradientpricingcache.json
如果实时获取失败，则回退到捆绑的快照

🦞 专业提示： 在选择模型之前运行 python3 gradient_pricing.py --model gpt-oss，查看 gpt-oss-120b（$0.10/$0.70）和 gpt-oss-20b（$0.05/$0.45）每 100 万个 token 的成本差异。

📖 定价文档

CLI 参考

所有脚本都接受 --json 参数以输出机器可读格式。

gradient_models.py [--json] [--filter QUERY]
gradient_chat.py --prompt TEXT [--model ID] [--system TEXT]
[--responses-api] [--cache] [--temperature F]
[--max-tokens N] [--json]
gradient_image.py --prompt TEXT [--model ID] [

gradient-inference梯度推理

gradient-inference

🦞 Gradient AI — Serverless Inference

Authentication