Local Coding Assistant — Code Models Across Your Fleet

Run the best open-source coding models on your own hardware. DeepSeek-Coder, Codestral, StarCoder, and Qwen-Coder routed across your devices — the fleet picks the best machine for every code generation request.

Your code never leaves your network. No GitHub Copilot subscription, no cloud API costs.

Coding models available

Model	Parameters	Ollama name	Strengths
Codestral	22B	codestral	80+ languages, fill-in-the-middle, Mistral's code specialist
DeepSeek-Coder-V2

Quick start

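```bash
pip install ollama-herd   # PyPI: https://pypi.org/project/ollama-herd/
herd                      # start the router (port 11435)
herd-node                 # run on every device; auto-discovers the router
```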

No models are downloaded during installation. All pulls require user confirmation.

Code generation

Write new code

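```python
from openai import OpenAI

# The router speaks the OpenAI protocol; the API key is ignored but must be non-empty.
client = OpenAI(base_url="http://localhost:11435/v1", api_key="not-needed")

response = client.chat.completions.create(
    model="codestral",
    messages=[{"role": "user", "content": "Write a thread-safe LRU cache with TTL support in Python"}],
)
print(response.choices[0].message.content)
```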

Code review

```bash
curl http://localhost:11435/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
    "model": "deepseek-coder-v2:16b",
    "messages": [{"role": "user", "content": "Review this code for bugs and security issues:\n\n```python\ndef processpayment(amount, cardnumber):\n ...\n```"}]
  }'
```

Code refactoring

```bash
curl http://localhost:11435/api/chat -d '{
  "model": "qwen2.5-coder:32b",
  "messages": [{"role": "user", "content": "Refactor this to use async/await: ..."}],
  "stream": false
}'
```

Check what's running

```bash
# Models loaded in memory
curl -s http://localhost:11435/api/ps | python3 -m json.tool

# All available models
curl -s http://localhost:11435/api/tags | python3 -m json.tool

# Recent coding request traces
curl -s "http://localhost:11435/dashboard/api/traces?limit=5" | python3 -m json.tool
```

Image generation

```bash
curl http://localhost:11435/api/generate-image \
  -d '{"model": "z-image-turbo", "prompt": "developer workspace illustration", "width": 512, "height": 512}'
```

Speech-to-text

```bash
curl http://localhost:11435/api/transcribe -F "file=@standup.wav" -F "model=qwen3-asr"
```

Full documentation

- [Agent Setup Guide](https://github.com/geeks-accelerator/ollama-herd/blob/main/docs/guides/agent-setup-guide.md): all 4 model types
- [API Reference](https://github.com/geeks-accelerator/ollama-herd/blob/main/docs/api-reference.md): complete endpoint docs

Guardrails

- **Model downloads require explicit user confirmation**: coding models range from 2GB to 130GB+. Always confirm before pulling.
- **Model deletion requires explicit user confirmation.**
- Never delete or modify files in `~/.fleet-manager/`.

- No models are downloaded automatically — all pulls are user-initiated or require opt-in.
- Your code stays local: no prompts or generated code leave your network.

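Local Coding Assistant: Code Models Across Your Fleet

Run the best open-source coding models on your own hardware. DeepSeek-Coder, Codestral, StarCoder, and Qwen-Coder are routed across your devices, and the fleet picks the best machine for every code generation request.

Your code never leaves your network. No GitHub Copilot subscription, no cloud API costs.

Coding models available

Model	Parameters	Ollama name	Strengths
Codestral	22B	codestral	80+ languages, fill-in-the-middle, Mistral's code specialist
DeepSeek-Coder-V2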

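Quick start

```bash
pip install ollama-herd   # PyPI: https://pypi.org/project/ollama-herd/
herd                      # start the router (port 11435)
herd-node                 # run on every device; auto-discovers the router
```

No models are downloaded during installation. All pulls require user confirmation.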

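Code generation

Write new code

```python
from openai import OpenAI

# The router speaks the OpenAI protocol; the API key is ignored but must be non-empty.
client = OpenAI(base_url="http://localhost:11435/v1", api_key="not-needed")

response = client.chat.completions.create(
    model="codestral",
    messages=[{"role": "user", "content": "Write a thread-safe LRU cache with TTL support in Python"}],
)
print(response.choices[0].message.content)
```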

Code review

```bash
curl http://localhost:11435/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
    "model": "deepseek-coder-v2:16b",
    "messages": [{"role": "user", "content": "Review this code for bugs and security issues:\n\n```python\ndef processpayment(amount, cardnumber):\n ...\n```"}]
  }'
```
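Hand-writing the JSON for a review request gets error-prone once the prompt itself contains a fenced code block, because quotes, newlines, and backticks all need escaping. The following is a minimal sketch in Python, assuming the same `/v1/chat/completions` payload shape as the curl example above. `build_review_payload` is a hypothetical helper name for illustration, not part of ollama-herd.

```python
import json

FENCE = "`" * 3  # literal triple backtick, built this way so it cannot clash with an outer fence


def build_review_payload(code: str, model: str = "deepseek-coder-v2:16b") -> str:
    """Return a JSON request body asking the model to review `code`."""
    prompt = (
        "Review this code for bugs and security issues:\n\n"
        f"{FENCE}python\n{code}\n{FENCE}"
    )
    body = {"model": model, "messages": [{"role": "user", "content": prompt}]}
    # json.dumps escapes quotes and newlines, so the snippet survives intact.
    return json.dumps(body)


if __name__ == "__main__":
    snippet = 'def processpayment(amount, cardnumber):\n    print("charging", amount)'
    print(build_review_payload(snippet))
```

The printed body can be saved to a file and sent with `curl -d @payload.json`, which avoids shell quoting entirely.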

Code refactoring

```bash
curl http://localhost:11435/api/chat -d '{
  "model": "qwen2.5-coder:32b",
  "messages": [{"role": "user", "content": "Refactor this to use async/await: ..."}],
  "stream": false
}'
```
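The same refactoring request can be issued from Python without third-party packages. This is a sketch under two assumptions: the router is reachable at the address used throughout this document, and its `/api/chat` reply follows Ollama's shape (`{"message": {"content": ...}}`). The names `build_chat_request` and `send` are illustrative only.

```python
import json
import urllib.request

ROUTER_URL = "http://localhost:11435"  # router address used throughout this document


def build_chat_request(prompt: str, model: str = "qwen2.5-coder:32b") -> urllib.request.Request:
    """Build (but do not send) a non-streaming POST to /api/chat."""
    body = {
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
        "stream": False,  # ask for one complete JSON reply instead of chunks
    }
    return urllib.request.Request(
        f"{ROUTER_URL}/api/chat",
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def send(request: urllib.request.Request) -> str:
    """Send the request and return the assistant text.

    Assumes the reply follows Ollama's /api/chat shape.
    """
    with urllib.request.urlopen(request) as resp:
        return json.load(resp)["message"]["content"]
```

With a router running, `print(send(build_chat_request("Refactor this to use async/await: ...")))` prints the model's reply. Separating request construction from sending keeps the payload easy to inspect before anything goes over the wire.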

Use with your IDE tools

The fleet exposes an OpenAI-compatible API at http://localhost:11435/v1. Point any coding tool at it:

Tool	Configuration
Aider	aider --openai-api-base http://localhost:11435/v1 --model codestral
Continue.dev

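Pick the right model for your RAM

Cross-platform: these are example configurations. Any device (Mac, Linux, Windows) with equivalent RAM works.

Device	RAM	Best coding model
MacBook Air (8GB)	8GB	starcoder2:3b or deepseek-coder:6.7b
Mac Mini (16GB)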
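For devices not listed, a rough estimate can indicate whether a model is likely to fit. The sketch below uses a common rule of thumb: weights take about `parameters × bits / 8` bytes, plus some headroom for context and runtime. The 4-bit default, the 20% overhead factor, and the 2 GB reserve are assumptions chosen for illustration, not values measured or used by ollama-herd.

```python
def estimate_model_memory_gb(params_billion: float, bits_per_weight: int = 4,
                             overhead: float = 1.2) -> float:
    """Rough memory footprint in GB for a quantized model (assumed rule of thumb)."""
    weight_gb = params_billion * bits_per_weight / 8  # 1B params at 8 bits is about 1 GB
    return round(weight_gb * overhead, 1)


def fits(params_billion: float, ram_gb: float, reserve_gb: float = 2.0) -> bool:
    """True if the estimate leaves `reserve_gb` free for the OS and other apps."""
    return estimate_model_memory_gb(params_billion) <= ram_gb - reserve_gb


if __name__ == "__main__":
    for name, size in [("starcoder2:3b", 3), ("deepseek-coder:6.7b", 6.7), ("codestral", 22)]:
        print(name, estimate_model_memory_gb(size), "GB, fits in 8GB:", fits(size, 8))
```

Under these assumptions the two small models fit on an 8GB machine and Codestral does not, which agrees with the first table row. Actual usage depends on the quantization and context length in use, so treat the numbers as a starting point.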

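Check what's running

```bash
# Models loaded in memory
curl -s http://localhost:11435/api/ps | python3 -m json.tool

# All available models
curl -s http://localhost:11435/api/tags | python3 -m json.tool

# Recent coding request traces (URL quoted so the shell does not interpret "?")
curl -s "http://localhost:11435/dashboard/api/traces?limit=5" | python3 -m json.tool
```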
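For scripted checks, the loaded-model listing can be condensed into one line per model. This sketch assumes the router's `/api/ps` reply follows Ollama's shape, a `models` array whose entries carry `name` and a `size` in bytes. The function names are hypothetical.

```python
import json
import urllib.request

ROUTER_URL = "http://localhost:11435"  # router address used throughout this document


def summarize_loaded_models(payload: dict) -> list[str]:
    """Turn an /api/ps style payload into 'name: X.X GB' lines."""
    lines = []
    for model in payload.get("models", []):
        size_gb = model.get("size", 0) / 1e9  # size is assumed to be reported in bytes
        lines.append(f"{model.get('name', '?')}: {size_gb:.1f} GB")
    return lines


def fetch_loaded_models() -> list[str]:
    """Query the router and summarize what is loaded right now."""
    with urllib.request.urlopen(f"{ROUTER_URL}/api/ps") as resp:
        return summarize_loaded_models(json.load(resp))
```

Keeping the parsing separate from the HTTP call means the summary logic can be exercised with a saved response when no router is running.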

Also available on this fleet

General-purpose LLMs

Llama 3.3, Qwen 3.5, DeepSeek-R1, Mistral Large: use the same endpoint for non-coding tasks.

Image generation

```bash
curl http://localhost:11435/api/generate-image \
  -d '{"model": "z-image-turbo", "prompt": "developer workspace illustration", "width": 512, "height": 512}'
```

Speech-to-text

```bash
curl http://localhost:11435/api/transcribe -F "file=@standup.wav" -F "model=qwen3-asr"
```

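Full documentation

- [Agent Setup Guide](https://github.com/geeks-accelerator/ollama-herd/blob/main/docs/guides/agent-setup-guide.md): all 4 model types
- [API Reference](https://github.com/geeks-accelerator/ollama-herd/blob/main/docs/api-reference.md): complete endpoint docs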

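Guardrails

- Model downloads require explicit user confirmation: coding models range from 2GB to 130GB+. Always confirm before pulling.
- Model deletion requires explicit user confirmation.
- Never delete or modify files in ~/.fleet-manager/.
- No models are downloaded automatically: all pulls are user-initiated or require opt-in.
- Your code stays local: no prompts or generated code leave your network.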
