Self-Hosted AI — Own Your Entire AI Stack

Stop paying per token. Stop sending data to cloud APIs. Run self-hosted LLMs, self-hosted image generation, self-hosted speech-to-text, and self-hosted embeddings on your own hardware. One self-hosted router makes all your devices act like one system.

What self-hosted AI replaces

Cloud service	Self-hosted replacement	How
OpenAI API	Self-hosted Llama 3.3, Qwen 3.5, DeepSeek-R1 via Ollama	Same OpenAI SDK, swap the base URL
DALL-E / Midjourney

Same APIs. Same quality. Zero per-request costs. All data stays on your self-hosted machines.

Self-Hosted Setup

CODEBLOCK0

No Docker. No Kubernetes. No config files. Self-hosted devices find each other automatically on your local network.

Self-Hosted LLM Inference

Drop-in self-hosted replacement for the OpenAI SDK:

CODEBLOCK1

Self-hosted Ollama API

CODEBLOCK2

Self-Hosted Image Generation

Self-hosted replacement for DALL-E and Midjourney:

CODEBLOCK3

Self-Hosted Speech-to-Text

Self-hosted replacement for Whisper API:

CODEBLOCK4

All self-hosted transcription stays on your network. No audio data sent to cloud services.

Self-Hosted Embeddings

Self-hosted replacement for OpenAI's embedding API:

CODEBLOCK5

Self-Hosted Cost Comparison

Service	Cloud cost	Self-hosted cost
GPT-4o (1M tokens/month)	~$15-30/month	$0 (self-hosted hardware you own)
DALL-E (1000 images/month)

After hardware investment, every self-hosted request is free forever. No rate limits, no usage caps, no surprise bills.

Self-Hosted Advantages

- Self-hosted data sovereignty — prompts, images, audio, and documents never leave your network
Self-hosted throughput — your hardware, no rate limits
Self-hosted uptime — cloud API outages don't affect your self-hosted fleet
Self-hosted flexibility — switch models instantly, no vendor lock-in
Self-hosted compliance — HIPAA, GDPR, SOC2 — no third-party data processors
Self-hosted predictability — hardware depreciates, but never surprises you with a bill

Self-Hosted Fleet Routing

The self-hosted router scores each device on 7 signals and picks the best one for every request. Multiple self-hosted machines share the load automatically.

CODEBLOCK6

Self-hosted dashboard at http://localhost:11435/dashboard for visual monitoring of your entire self-hosted fleet.

Full self-hosted documentation

- Agent Setup Guide — all 4 self-hosted model types
Image Generation Guide — 3 self-hosted image backends
API Reference
Configuration

Contribute

Ollama Herd is open source (MIT). Self-hosted AI for everyone:

- Star on GitHub — help others discover self-hosted AI
Open an issue — share your self-hosted setup
PRs welcome from humans and AI agents. CLAUDE.md gives full self-hosted context. 444 tests.

Self-Hosted Guardrails

- No automatic downloads — all self-hosted model pulls require explicit user confirmation.
Self-hosted model deletion requires explicit user confirmation.
All self-hosted requests stay local — no data leaves your network. No telemetry, no analytics, no cloud callbacks.
Never delete or modify self-hosted files in ~/.fleet-manager/.
Your self-hosted fleet has zero cloud dependencies — works fully offline after initial model downloads.

自托管AI — 拥有你的完整AI栈

停止按token付费。停止向云端API发送数据。在你的自有硬件上运行自托管LLM、自托管图像生成、自托管语音转文本和自托管嵌入。一个自托管路由器让你的所有设备如同一个系统般协同工作。

自托管AI替代方案

云服务	自托管替代方案	实现方式
OpenAI API	通过Ollama自托管Llama 3.3、Qwen 3.5、DeepSeek-R1	相同OpenAI SDK，更换基础URL
DALL-E / Midjourney

相同API。相同质量。零按次请求成本。所有数据保留在你的自托管机器上。

自托管设置

bash
pip install ollama-herd # 从PyPI安装自托管AI路由器
herd # 启动自托管路由器
herd-node # 在每台自托管机器上运行 — 自动发现路由器

无需Docker。无需Kubernetes。无需配置文件。自托管设备在本地网络上自动相互发现。

自托管LLM推理

OpenAI SDK的直接自托管替代方案：

python
from openai import OpenAI

自托管推理客户端 — 替代OpenAI云端

selfhostedclient = OpenAI(baseurl=http://localhost:11435/v1, apikey=not-needed)

selfhostedresponse = selfhostedclient.chat.completions.create(
model=llama3.3:70b, # 自托管模型，无云端依赖
messages=[{role: user, content: 分析这份合同的风险}],
stream=True,
)
for chunk in selfhostedresponse:
print(chunk.choices[0].delta.content or , end=)

自托管Ollama API

bash
curl http://localhost:11435/api/chat -d {
model: deepseek-r1:70b,
messages: [{role: user, content: 解释自托管AI相比云端API的优势}],
stream: false
}

自托管图像生成

DALL-E和Midjourney的自托管替代方案：

bash

在任何节点上安装自托管图像后端

uv tool install mflux # 自托管Flux模型（约7秒）
uv tool install diffusionkit # 自托管Stable Diffusion 3/3.5

在你的自托管集群上生成图像

curl -o selfhostedoutput.png http://localhost:11435/api/generate-image \ -H Content-Type: application/json \ -d {model: z-image-turbo, prompt: 自托管AI生成产品模型, width: 1024, height: 1024}

自托管语音转文本

Whisper API的自托管替代方案：

bash
curl http://localhost:11435/api/transcribe \
-F file=@selfhostedmeeting.wav \
-F model=qwen3-asr

所有自托管转录保留在你的网络中。无音频数据发送到云端服务。

自托管嵌入

OpenAI嵌入API的自托管替代方案：

bash
curl http://localhost:11435/api/embed \
-d {model: nomic-embed-text, input: 用于私有RAG管道的自托管文档嵌入}

自托管成本对比

服务	云端成本	自托管成本
GPT-4o（每月100万token）	约15-30美元/月	0美元（你拥有的自托管硬件）
DALL-E（每月1000张图像）

硬件投资后，每个自托管请求永久免费。无速率限制，无使用上限，无意外账单。

自托管优势

- 自托管数据主权 — 提示词、图像、音频和文档永不离开你的网络
自托管吞吐量 — 你的硬件，无速率限制
自托管正常运行时间 — 云端API中断不影响你的自托管集群
自托管灵活性 — 即时切换模型，无供应商锁定
自托管合规性 — HIPAA、GDPR、SOC2 — 无第三方数据处理者
自托管可预测性 — 硬件折旧，但永不给你意外账单

自托管集群路由

自托管路由器根据7个信号对每台设备评分，并为每个请求选择最佳设备。多台自托管机器自动分担负载。

bash

自托管集群概览

curl -s http://localhost:11435/fleet/status | python3 -m json.tool

自托管健康检查

curl -s http://localhost:11435/dashboard/api/health | python3 -m json.tool

针对你硬件的自托管模型推荐

curl -s http://localhost:11435/dashboard/api/recommendations | python3 -m json.tool

自托管仪表板位于 http://localhost:11435/dashboard，用于可视化监控你的整个自托管集群。

完整自托管文档

- Agent设置指南 — 全部4种自托管模型类型
图像生成指南 — 3个自托管图像后端
API参考
配置

贡献

Ollama Herd是开源项目（MIT）。为所有人提供自托管AI：

- 在GitHub上标星 — 帮助他人发现自托管AI
提交Issue — 分享你的自托管设置
欢迎PR，来自人类和AI Agent。CLAUDE.md提供完整的自托管上下文。444个测试。

自托管安全护栏

- 无自动下载 — 所有自托管模型拉取都需要明确的用户确认。
自托管模型删除需要明确的用户确认。
所有自托管请求保持本地 — 无数据离开你的网络。无遥测，无分析，无云端回调。
切勿删除或修改 ~/.fleet-manager/ 中的自托管文件。
你的自托管集群零云端依赖 — 初始模型下载后可完全离线工作。

self-hosted-ai自托管AI平台

self-hosted-ai

Self-Hosted AI — Own Your Entire AI Stack

What self-hosted AI replaces

Self-Hosted Setup

Self-Hosted LLM Inference

Self-hosted Ollama API

Self-Hosted Image Generation

Self-Hosted Speech-to-Text

Self-Hosted Embeddings

Self-Hosted Cost Comparison

Self-Hosted Advantages

Self-Hosted Fleet Routing

Full self-hosted documentation

Contribute

Self-Hosted Guardrails

自托管AI — 拥有你的完整AI栈

自托管AI替代方案

自托管设置

自托管LLM推理

自托管推理客户端 — 替代OpenAI云端

自托管Ollama API

自托管图像生成

在任何节点上安装自托管图像后端

在你的自托管集群上生成图像

自托管语音转文本

自托管嵌入

自托管成本对比

自托管优势

自托管集群路由

自托管集群概览

自托管健康检查

针对你硬件的自托管模型推荐

完整自托管文档

贡献

自托管安全护栏

标签

通过对话安装

方式一：安装 SkillHub 和技能

方式二：设置 SkillHub 为优先技能安装源

通过命令行安装

下载

相关推荐

self-improvement

self-improvement

self-improvement

self-improvement