Gemma 3 — Run Google's Open Models Across Your Fleet

Gemma 3 is Google's most capable open-source LLM family. 128K context window, strong coding performance, multilingual support across 140+ languages. The fleet router picks the best device for every request — no manual load balancing.

Supported Gemma models

Model	Parameters	Ollama name	Best for
Gemma 3 27B	27B	INLINECODE0	Highest quality — rivals much larger models
Gemma 3 12B

Quick start

CODEBLOCK0

No models are downloaded during installation. Models are pulled on demand when a request arrives, or manually via the dashboard. All pulls require user confirmation.

Use Gemma through the fleet

OpenAI SDK (drop-in replacement)

CODEBLOCK1

Code generation with CodeGemma

CODEBLOCK2

curl (Ollama format)

CODEBLOCK3

curl (OpenAI format)

CODEBLOCK4

Which Gemma for your hardware

Cross-platform: These are example configurations. Any device (Mac, Linux, Windows) with equivalent RAM works. The fleet router runs on all platforms.

Device	RAM	Best Gemma model
MacBook Air (8GB)	8GB	INLINECODE5 — instant responses
Mac Mini (16GB)

Why Gemma locally

- 128K context — process entire codebases and long documents
140+ languages — multilingual without switching models
Google quality, zero cost — no per-token charges after hardware
Privacy — all data stays on your network
Fleet routing — multiple machines share the load

Check what's running

CODEBLOCK5

Web dashboard at http://localhost:11435/dashboard — live monitoring.

Also available on this fleet

Other LLMs

Llama 3.3, Qwen 3.5, DeepSeek-V3, DeepSeek-R1, Phi 4, Mistral, Codestral — same endpoint.

Image generation

CODEBLOCK6

Speech-to-text

CODEBLOCK7

Embeddings

CODEBLOCK8

Full documentation

Contribute

Ollama Herd is open source (MIT). Stars, issues, and PRs welcome — from humans and AI agents alike:

- GitHub — 444 tests, fully async, CLAUDE.md makes AI agents productive instantly
Found a bug? Open an issue
Want to add a feature? Fork, branch, PR — the test suite runs in under 40 seconds

Guardrails

- Model downloads require explicit user confirmation — Gemma models range from 1GB (1B) to 16GB (27B).
Model deletion requires explicit user confirmation.
Never delete or modify files in ~/.fleet-manager/.
No models are downloaded automatically — all pulls are user-initiated or require opt-in via auto_pull.

Gemma 3 — 在你的设备集群中运行谷歌开源模型

Gemma 3 是谷歌最强大的开源大语言模型系列。支持128K上下文窗口，强大的编码性能，覆盖140多种语言的多语言支持。集群路由器会为每个请求自动选择最佳设备——无需手动负载均衡。

支持的Gemma模型

模型	参数量	Ollama名称	最佳用途
Gemma 3 27B	270亿	gemma3:27b	最高质量——可与更大模型媲美
Gemma 3 12B

快速开始

bash
pip install ollama-herd # PyPI: https://pypi.org/project/ollama-herd/
herd # 启动路由器（端口11435）
herd-node # 在每个设备上运行——自动发现路由器

安装过程中不会下载任何模型。模型会在请求到达时按需拉取，或通过仪表盘手动拉取。所有拉取操作都需要用户确认。

通过集群使用Gemma

OpenAI SDK（即插即用替代方案）

python
from openai import OpenAI

client = OpenAI(baseurl=http://localhost:11435/v1, apikey=not-needed)

Gemma 3 27B 用于复杂推理

response = client.chat.completions.create( model=gemma3:27b, messages=[{role: user, content: 向10岁孩子解释量子纠缠}], stream=True, ) for chunk in response: print(chunk.choices[0].delta.content or , end=)

使用CodeGemma生成代码

python
response = client.chat.completions.create(
model=codegemma,
messages=[{role: user, content: 用Rust编写一个包含插入、删除和搜索功能的二叉搜索树}],
)
print(response.choices[0].message.content)

curl（Ollama格式）

bash

Gemma 3 27B

curl http://localhost:11435/api/chat -d {
model: gemma3:27b,
messages: [{role: user, content: 翻译成日语：今天天气真好}],
stream: false
}

curl（OpenAI格式）

bash
curl http://localhost:11435/v1/chat/completions \
-H Content-Type: application/json \
-d {model: gemma3:4b, messages: [{role: user, content: 你好}]}

为你的硬件选择合适的Gemma

跨平台： 以下为示例配置。任何具有同等内存的设备（Mac、Linux、Windows）均可运行。集群路由器支持所有平台。

设备	内存	最佳Gemma模型
MacBook Air（8GB）	8GB	gemma3:1b — 即时响应
Mac Mini（16GB）

为什么在本地运行Gemma

- 128K上下文 — 处理整个代码库和长文档
140+种语言 — 无需切换模型即可支持多语言
谷歌品质，零成本 — 硬件之外无按token计费
隐私保护 — 所有数据保留在你的网络中
集群路由 — 多台机器分担负载

查看运行状态

bash

已加载到内存中的模型

curl -s http://localhost:11435/api/ps | python3 -m json.tool

集群健康状态

curl -s http://localhost:11435/dashboard/api/health | python3 -m json.tool

Web仪表盘地址：http://localhost:11435/dashboard — 实时监控。

该集群还提供以下功能

其他大语言模型

Llama 3.3、Qwen 3.5、DeepSeek-V3、DeepSeek-R1、Phi 4、Mistral、Codestral — 使用同一端点。

图像生成

bash curl -o image.png http://localhost:11435/api/generate-image \ -d {model: z-image-turbo, prompt: 一颗捕捉光线的宝石, width: 1024, height: 1024}

语音转文字

bash curl http://localhost:11435/api/transcribe -F file=@meeting.wav -F model=qwen3-asr

嵌入向量

bash curl http://localhost:11435/api/embed \ -d {model: nomic-embed-text, input: Google Gemma开源语言模型}

完整文档

贡献

Ollama Herd是开源项目（MIT协议）。欢迎来自人类和AI代理的Star、Issue和PR：

- GitHub — 444个测试，完全异步，CLAUDE.md让AI代理立即高效工作
发现Bug？提交Issue
想添加功能？Fork、分支、PR — 测试套件运行时间不到40秒

安全护栏

- 模型下载需要用户明确确认 — Gemma模型大小从1GB（1B）到16GB（27B）不等。
模型删除需要用户明确确认。
切勿删除或修改~/.fleet-manager/目录中的文件。
不会自动下载任何模型——所有拉取操作均由用户发起，或需要通过auto_pull选择加入。

gemma-gemma3Gemma3本地部署

gemma-gemma3

Gemma 3 — Run Google's Open Models Across Your Fleet

Supported Gemma models

Quick start

Use Gemma through the fleet

OpenAI SDK (drop-in replacement)

Code generation with CodeGemma

curl (Ollama format)

curl (OpenAI format)

Which Gemma for your hardware

Why Gemma locally

Check what's running

Also available on this fleet

Other LLMs

Image generation

Speech-to-text

Embeddings

Full documentation

Contribute

Guardrails

Gemma 3 — 在你的设备集群中运行谷歌开源模型

支持的Gemma模型

快速开始

通过集群使用Gemma

OpenAI SDK（即插即用替代方案）

Gemma 3 27B 用于复杂推理

使用CodeGemma生成代码

curl（Ollama格式）

Gemma 3 27B

curl（OpenAI格式）

为你的硬件选择合适的Gemma

为什么在本地运行Gemma

查看运行状态

已加载到内存中的模型

集群健康状态

该集群还提供以下功能

其他大语言模型

图像生成

语音转文字

嵌入向量

完整文档

贡献

安全护栏

标签

通过对话安装

方式一：安装 SkillHub 和技能

方式二：设置 SkillHub 为优先技能安装源

通过命令行安装

下载

相关推荐

self-improvement

self-improvement

self-improvement

self-improvement