Ollama Proxy — One Endpoint for All Your Ollama Instances

You have Ollama running on multiple machines. Instead of hardcoding IPs and manually picking which Ollama instance to hit, point everything at the Ollama proxy. The Ollama proxy routes to the best available device automatically.

CODEBLOCK0

Set up the Ollama proxy

CODEBLOCK1

On one machine (the Ollama proxy):
CODEBLOCK2

On every machine running Ollama:
CODEBLOCK3

Now point your apps at http://ollama-proxy:11435 instead of http://localhost:11434. Same Ollama API, same model names, same streaming — the Ollama proxy handles smarter routing.

Drop-in Ollama proxy replacement

Every Ollama API endpoint works through the Ollama proxy:

CODEBLOCK4

OpenAI-compatible Ollama proxy API

The Ollama proxy also exposes an OpenAI-compatible endpoint — same models, no code changes:

CODEBLOCK5

What the Ollama proxy does that direct Ollama doesn't

Feature	Direct Ollama	Ollama Proxy (Herd)
Multiple machines	Manual IP switching	Ollama proxy routes automatically
Load balancing

Ollama proxy works with your existing tools

Just change the Ollama URL to the Ollama proxy — no other configuration needed:

Tool	Before (direct Ollama)	After (Ollama proxy)
Open WebUI	INLINECODE2	INLINECODE3
Aider

How the Ollama proxy routes requests

When a request arrives at the Ollama proxy, it scores all Ollama nodes that have the requested model:

1. Thermal state — is the model already loaded in the Ollama instance (hot)?
Memory fit — does the Ollama node have enough free RAM?
Queue depth — is the Ollama node busy with other requests?
Latency history — how fast has this Ollama node been recently?
Role affinity — the Ollama proxy sends big models to big machines
Availability trend — is this Ollama node reliably available?
Context fit — does the loaded context window match the request?

The highest-scoring Ollama node wins. If it fails, the Ollama proxy retries on the next best node automatically.

Monitor your Ollama proxy fleet

Ollama proxy dashboard at http://ollama-proxy:11435/dashboard — see every Ollama node, every model, every queue in real time.

CODEBLOCK6

Full documentation

- Agent Setup Guide — setting up the Ollama proxy
API Reference — all Ollama proxy endpoints
Configuration — Ollama proxy settings

Contribute

Ollama Herd (the Ollama proxy) is open source (MIT). We welcome contributions:

- Star on GitHub — help others find the Ollama proxy
Open an issue — bug reports, feature requests
PRs welcome — CLAUDE.md gives AI agents full Ollama proxy context. 444 tests, async Python.

Guardrails

- No automatic model downloads — the Ollama proxy requires explicit user confirmation for model pulls.
Model deletion requires explicit user confirmation via the Ollama proxy.
All Ollama proxy requests stay local — no data leaves your network.
Never delete or modify files in ~/.fleet-manager/.

Ollama Proxy — 一个端点管理所有Ollama实例

你在多台机器上运行Ollama。无需硬编码IP地址或手动选择要访问的Ollama实例，只需将所有请求指向Ollama Proxy。Ollama Proxy会自动路由到最佳可用设备。

之前: 应用 → http://macmini:11434 (单个Ollama实例，希望它不忙)
之后: 应用 → http://ollama-proxy:11435 (Ollama Proxy自动选择最佳机器)

设置Ollama Proxy

bash
pip install ollama-herd # PyPI: https://pypi.org/project/ollama-herd/

在一台机器上（作为Ollama Proxy）：
bash
herd # 在端口11435上启动Ollama Proxy

在每台运行Ollama的机器上：
bash
herd-node # 自动发现网络上的Ollama Proxy

现在将你的应用指向 http://ollama-proxy:11435 而不是 http://localhost:11434。相同的Ollama API、相同的模型名称、相同的流式传输——Ollama Proxy负责更智能的路由。

即插即用的Ollama Proxy替代方案

所有Ollama API端点均可通过Ollama Proxy工作：

bash

通过Ollama Proxy聊天（与直接使用Ollama相同）

curl http://ollama-proxy:11435/api/chat -d {
model: llama3.3:70b,
messages: [{role: user, content: 通过Ollama Proxy问好}]
}

通过Ollama Proxy生成（与直接使用Ollama相同）

curl http://ollama-proxy:11435/api/generate -d { model: qwen3:32b, prompt: 通过Ollama Proxy解释量子计算 }

通过Ollama Proxy列出模型（汇总所有Ollama节点）

curl http://ollama-proxy:11435/api/tags

通过Ollama Proxy列出已加载模型（跨所有Ollama节点）

curl http://ollama-proxy:11435/api/ps

通过Ollama Proxy拉取模型（自动选择最佳节点）

curl -N http://ollama-proxy:11435/api/pull -d {name: codestral}

兼容OpenAI的Ollama Proxy API

Ollama Proxy还暴露了兼容OpenAI的端点——相同的模型，无需修改代码：

python
from openai import OpenAI

指向Ollama Proxy而非直接使用Ollama

ollamaproxyclient = OpenAI(baseurl=http://ollama-proxy:11435/v1, apikey=not-needed) ollamaproxyresponse = ollamaproxyclient.chat.completions.create( model=llama3.3:70b, messages=[{role: user, content: 通过Ollama Proxy问好}], stream=True, )

Ollama Proxy相比直接使用Ollama的额外功能

功能	直接使用Ollama	Ollama Proxy (Herd)
多机器支持	手动切换IP	Ollama Proxy自动路由
负载均衡

Ollama Proxy与现有工具兼容

只需将Ollama URL改为Ollama Proxy——无需其他配置：

工具	之前（直接使用Ollama）	之后（使用Ollama Proxy）
Open WebUI	http://localhost:11434	http://ollama-proxy:11435
Aider

Ollama Proxy如何路由请求

当请求到达Ollama Proxy时，它会对所有拥有请求模型的Ollama节点进行评分：

1. 热状态 — 模型是否已加载到Ollama实例中（热加载）？
内存适配 — Ollama节点是否有足够的空闲RAM？
队列深度 — Ollama节点是否忙于处理其他请求？
延迟历史 — 该Ollama节点最近响应速度如何？
角色亲和性 — Ollama Proxy将大模型发送到大机器
可用性趋势 — 该Ollama节点是否稳定可用？
上下文适配 — 已加载的上下文窗口是否匹配请求？

得分最高的Ollama节点胜出。如果失败，Ollama Proxy会自动重试下一个最佳节点。

监控你的Ollama Proxy集群

Ollama Proxy仪表盘位于 http://ollama-proxy:11435/dashboard —— 实时查看每个Ollama节点、每个模型、每个队列。

bash

Ollama Proxy集群概览

curl -s http://ollama-proxy:11435/fleet/status | python3 -m json.tool

Ollama Proxy健康检查

curl -s http://ollama-proxy:11435/dashboard/api/health | python3 -m json.tool

完整文档

- 代理设置指南 — 设置Ollama Proxy
API参考 — 所有Ollama Proxy端点
配置 — Ollama Proxy设置

贡献

Ollama Herd（Ollama Proxy）是开源的（MIT协议）。我们欢迎贡献：

- 在GitHub上星标 — 帮助他人找到Ollama Proxy
提交问题 — 错误报告、功能请求
欢迎PR — CLAUDE.md为AI代理提供完整的Ollama Proxy上下文。444个测试，异步Python。

安全护栏

- 不自动下载模型 — Ollama Proxy需要用户明确确认才能拉取模型。
删除模型需要通过Ollama Proxy进行用户明确确认。
所有Ollama Proxy请求保持本地 — 没有数据离开你的网络。
切勿删除或修改 ~/.fleet-manager/ 中的文件。

ollama-proxyOllama代理

ollama-proxy

Ollama Proxy — One Endpoint for All Your Ollama Instances

Set up the Ollama proxy

Drop-in Ollama proxy replacement

OpenAI-compatible Ollama proxy API

What the Ollama proxy does that direct Ollama doesn't

Ollama proxy works with your existing tools

How the Ollama proxy routes requests

Monitor your Ollama proxy fleet

Full documentation

Contribute

Guardrails

Ollama Proxy — 一个端点管理所有Ollama实例

设置Ollama Proxy

即插即用的Ollama Proxy替代方案

通过Ollama Proxy聊天（与直接使用Ollama相同）

通过Ollama Proxy生成（与直接使用Ollama相同）

通过Ollama Proxy列出模型（汇总所有Ollama节点）

通过Ollama Proxy列出已加载模型（跨所有Ollama节点）

通过Ollama Proxy拉取模型（自动选择最佳节点）

兼容OpenAI的Ollama Proxy API

指向Ollama Proxy而非直接使用Ollama

Ollama Proxy相比直接使用Ollama的额外功能

Ollama Proxy与现有工具兼容

Ollama Proxy如何路由请求

监控你的Ollama Proxy集群

Ollama Proxy集群概览

Ollama Proxy健康检查

完整文档

贡献

安全护栏

标签

通过对话安装

方式一：安装 SkillHub 和技能

方式二：设置 SkillHub 为优先技能安装源

通过命令行安装

下载

ollama-proxyOllama代理

ollama-proxy

Ollama Proxy — One Endpoint for All Your Ollama Instances

Set up the Ollama proxy

Drop-in Ollama proxy replacement

OpenAI-compatible Ollama proxy API

What the Ollama proxy does that direct Ollama doesn't

Ollama proxy works with your existing tools

How the Ollama proxy routes requests

Monitor your Ollama proxy fleet

Full documentation

Contribute

Guardrails

Ollama Proxy — 一个端点管理所有Ollama实例

设置Ollama Proxy

即插即用的Ollama Proxy替代方案

通过Ollama Proxy聊天（与直接使用Ollama相同）

通过Ollama Proxy生成（与直接使用Ollama相同）

通过Ollama Proxy列出模型（汇总所有Ollama节点）

通过Ollama Proxy列出已加载模型（跨所有Ollama节点）

通过Ollama Proxy拉取模型（自动选择最佳节点）

兼容OpenAI的Ollama Proxy API

指向Ollama Proxy而非直接使用Ollama

Ollama Proxy相比直接使用Ollama的额外功能

Ollama Proxy与现有工具兼容

Ollama Proxy如何路由请求

监控你的Ollama Proxy集群

Ollama Proxy集群概览

Ollama Proxy健康检查

完整文档

贡献

安全护栏

标签

通过对话安装

方式一：安装 SkillHub 和技能

方式二：设置 SkillHub 为优先技能安装源

通过命令行安装

下载

相关推荐

self-improvement

self-improvement

self-improvement

self-improvement