DeepSeek — Run DeepSeek Models Across Your Local Fleet

Run DeepSeek-V3, DeepSeek-R1, and DeepSeek-Coder on your own hardware. The fleet router picks the best device for every request — no cloud API needed, zero per-token costs, all data stays on your machines.

Supported DeepSeek models

Model	Parameters	Ollama name	Best for
DeepSeek-V3	671B MoE (37B active)	INLINECODE0	General — matches GPT-4o on most benchmarks
DeepSeek-V3.1

Setup

CODEBLOCK0

Package: ollama-herd | Repo: github.com/geeks-accelerator/ollama-herd

Models are pulled on demand — the router auto-pulls when a request arrives for a model not yet on any node, or you can pull manually via the dashboard. No models are downloaded during installation.

Use DeepSeek through the fleet

OpenAI SDK

CODEBLOCK1

DeepSeek-Coder for code

CODEBLOCK2

Ollama API

CODEBLOCK3

Hardware recommendations (optional — choose models that fit your RAM)

Cross-platform: These are example configurations. Any device (Mac, Linux, Windows) with equivalent RAM works. The fleet router runs on all platforms.

DeepSeek offers models at every size. Pick the one that fits your available memory — smaller models work great for most tasks:

Model	Min RAM	Recommended hardware
INLINECODE6	4GB	Any Mac
INLINECODE7

The fleet router automatically sends requests to the machine where the model is loaded — no manual routing needed.

Why run DeepSeek locally

- Zero cost — DeepSeek API charges per token. Local is free after hardware.
Privacy — code and business data never leave your network.
No rate limits — DeepSeek API throttles during peak hours. Local has no throttle.
Availability — DeepSeek API has had outages. Your hardware doesn't depend on their servers.
Fleet routing — multiple machines share the load. One busy? Request goes to the next.

Fleet features

- 7-signal scoring — picks the optimal node for every request
Auto-retry — fails over to next best node transparently
VRAM-aware fallback — routes to a loaded model in the same category instead of cold-loading
Context protection — prevents expensive model reloads from num_ctx changes
Request tagging — track per-project DeepSeek usage

Also available on this fleet

Other LLM models

Llama 3.3, Qwen 3.5, Phi 4, Mistral, Gemma 3 — any Ollama model routes through the same endpoint.

Image generation

CODEBLOCK4

Speech-to-text

CODEBLOCK5

Embeddings

CODEBLOCK6

Dashboard

INLINECODE14 — monitor DeepSeek requests alongside all other models. Per-model latency, token throughput, health checks.

Full documentation

Agent Setup Guide

Guardrails

- Model downloads require explicit user confirmation — DeepSeek models range from 1GB (1.5B) to 400GB+ (671B). Always confirm before pulling.
Model deletion requires explicit user confirmation — never remove models without asking.
Never delete or modify files in ~/.fleet-manager/.
If a DeepSeek model is too large for available memory, suggest a smaller variant (e.g., deepseek-r1:7b instead of :70b).
No models are downloaded automatically — all pulls are user-initiated or require opt-in via the auto_pull setting.

DeepSeek — 在本地设备群中运行DeepSeek模型

在您自己的硬件上运行DeepSeek-V3、DeepSeek-R1和DeepSeek-Coder。设备群路由器为每个请求选择最佳设备——无需云API，零按token计费，所有数据保留在您的机器上。

支持的DeepSeek模型

模型	参数	Ollama名称	最佳用途
DeepSeek-V3	671B MoE（37B活跃）	deepseek-v3	通用——在大多数基准测试中与GPT-4o相当
DeepSeek-V3.1

安装设置

bash
pip install ollama-herd
herd # 启动路由器（端口11435）
herd-node # 在每台机器上运行

软件包：ollama-herd | 仓库：github.com/geeks-accelerator/ollama-herd

模型按需拉取——当请求到达时，如果模型尚未在任何节点上，路由器会自动拉取；或者您可以通过仪表盘手动拉取。安装过程中不会下载任何模型。

通过设备群使用DeepSeek

OpenAI SDK

python
from openai import OpenAI

client = OpenAI(baseurl=http://localhost:11435/v1, apikey=not-needed)

DeepSeek-R1用于推理

response = client.chat.completions.create( model=deepseek-r1:70b, messages=[{role: user, content: 证明存在无穷多个素数}], stream=True, ) for chunk in response: print(chunk.choices[0].delta.content or , end=)

DeepSeek-Coder用于代码

python
response = client.chat.completions.create(
model=deepseek-coder-v2:16b,
messages=[{role: user, content: 用Python编写一个Redis缓存装饰器}],
)
print(response.choices[0].message.content)

Ollama API

bash

DeepSeek-V3通用对话

curl http://localhost:11435/api/chat -d {
model: deepseek-v3,
messages: [{role: user, content: 解释量子计算}],
stream: false
}

DeepSeek-R1推理

curl http://localhost:11435/api/chat -d { model: deepseek-r1:70b, messages: [{role: user, content: 逐步解决这个问题：...}], stream: false }

硬件建议（可选——选择适合您内存的模型）

跨平台： 这些是示例配置。任何具有等效内存的设备（Mac、Linux、Windows）均可使用。设备群路由器支持所有平台。

DeepSeek提供各种尺寸的模型。选择适合您可用内存的模型——较小的模型在大多数任务中表现良好：

模型	最小内存	推荐硬件
deepseek-r1:1.5b	4GB	任意Mac
deepseek-r1:7b

设备群路由器自动将请求发送到加载了模型的机器——无需手动路由。

为什么在本地运行DeepSeek

- 零成本——DeepSeek API按token收费。本地运行在硬件投入后完全免费。
隐私——代码和业务数据永远不会离开您的网络。
无速率限制——DeepSeek API在高峰时段会限流。本地运行无限制。
可用性——DeepSeek API曾出现过宕机。您的硬件不依赖于他们的服务器。
设备群路由——多台机器分担负载。一台繁忙？请求自动转到下一台。

设备群功能

- 7信号评分——为每个请求选择最优节点
自动重试——透明地故障转移到下一个最佳节点
VRAM感知回退——路由到同一类别中已加载的模型，而不是冷加载
上下文保护——防止因num_ctx变化导致昂贵的模型重新加载
请求标记——跟踪每个项目的DeepSeek使用情况

该设备群还支持

其他LLM模型

Llama 3.3、Qwen 3.5、Phi 4、Mistral、Gemma 3——任何Ollama模型都通过同一端点路由。

图像生成

bash
curl -o image.png http://localhost:11435/api/generate-image \
-H Content-Type: application/json \
-d {model:z-image-turbo,prompt:日落,width:1024,height:1024,steps:4}

语音转文字

bash
curl http://localhost:11435/api/transcribe -F audio=@recording.wav

嵌入向量

bash
curl http://localhost:11435/api/embeddings -d {model:nomic-embed-text,prompt:查询}

仪表盘

http://localhost:11435/dashboard——监控DeepSeek请求以及所有其他模型。每个模型的延迟、token吞吐量、健康检查。

完整文档

智能体设置指南

安全护栏

- 模型下载需要明确的用户确认——DeepSeek模型范围从1GB（1.5B）到400GB+（671B）。拉取前务必确认。
模型删除需要明确的用户确认——未经询问绝不删除模型。
绝不删除或修改~/.fleet-manager/中的文件。
如果DeepSeek模型对于可用内存来说过大，建议使用较小的变体（例如，使用deepseek-r1:7b代替:70b）。
不会自动下载任何模型——所有拉取均由用户发起，或需要通过auto_pull设置选择加入。

deepseek-deepseek-coderDeepSeek编码器

deepseek-deepseek-coder

DeepSeek — Run DeepSeek Models Across Your Local Fleet

Supported DeepSeek models

Setup

Use DeepSeek through the fleet

OpenAI SDK

DeepSeek-Coder for code

Ollama API

Hardware recommendations (optional — choose models that fit your RAM)

Why run DeepSeek locally

Fleet features

Also available on this fleet

Other LLM models

Image generation

Speech-to-text

Embeddings

Dashboard

Full documentation

Guardrails

DeepSeek — 在本地设备群中运行DeepSeek模型

支持的DeepSeek模型

安装设置

通过设备群使用DeepSeek

OpenAI SDK

DeepSeek-R1用于推理

DeepSeek-Coder用于代码

Ollama API

DeepSeek-V3通用对话

DeepSeek-R1推理

硬件建议（可选——选择适合您内存的模型）

为什么在本地运行DeepSeek

设备群功能

该设备群还支持

其他LLM模型

图像生成

语音转文字

嵌入向量

仪表盘

完整文档

安全护栏

标签

通过对话安装

方式一：安装 SkillHub 和技能

方式二：设置 SkillHub 为优先技能安装源

通过命令行安装

下载

相关推荐

self-improvement

self-improvement

self-improvement

self-improvement