Qwen 3.5 — Alibaba's Latest LLM on Your Local Fleet

Qwen 3.5 is the newest and most capable model in the Qwen family. It rivals GPT-4o and Claude 3.5 Sonnet on reasoning, coding, and multilingual benchmarks — and you can run it locally on your own hardware for free.

Supported Qwen models

Model	Parameters	Ollama name	Best for
Qwen 3.5	72B	INLINECODE0	Frontier reasoning — rivals GPT-4o
Qwen 3.5

Quick start

CODEBLOCK0

No models are downloaded during installation. Models are pulled on demand. All pulls require user confirmation.

Use Qwen 3.5 through the fleet

OpenAI SDK

CODEBLOCK1

Qwen3-Coder for code

CODEBLOCK2

Ollama API

CODEBLOCK3

Qwen3-ASR speech-to-text

CODEBLOCK4

Hardware recommendations

Cross-platform: These are example configurations. Any device (Mac, Linux, Windows) with equivalent RAM works. The fleet router runs on all platforms.

Device	RAM	Best Qwen model
Mac Mini (16GB)	16GB	INLINECODE7
Mac Mini (32GB)

Why Qwen 3.5 locally

- GPT-4o quality — Qwen 3.5 72B matches GPT-4o on MMLU, HumanEval, and MT-Bench
Zero cost — no per-token charges after hardware
Privacy — all data stays on your network
No rate limits — Qwen's cloud API throttles during peak hours. Your hardware doesn't.
Fleet routing — multiple machines share the load

Also available on this fleet

Other LLMs

Llama 3.3, DeepSeek-V3, DeepSeek-R1, Phi 4, Mistral, Gemma 3, Codestral — same endpoint.

Image generation

CODEBLOCK5

Embeddings

CODEBLOCK6

Monitor

CODEBLOCK7

Dashboard at http://localhost:11435/dashboard.

Full documentation

Contribute

Ollama Herd is open source (MIT):

- Star on GitHub — help others run Qwen locally
Open an issue — share your Qwen setup, report bugs
PRs welcome — CLAUDE.md gives AI agents full context. 444 tests, async Python.

Guardrails

- Model downloads require explicit user confirmation — Qwen models range from 4GB (7B) to 42GB (72B).
Model deletion requires explicit user confirmation.
Never delete or modify files in ~/.fleet-manager/.
No models are downloaded automatically — all pulls are user-initiated or require opt-in.

Qwen 3.5 — 阿里巴巴最新大语言模型，部署于您的本地集群

Qwen 3.5 是 Qwen 系列中最新、能力最强的模型。它在推理、编程和多语言基准测试中可与 GPT-4o 和 Claude 3.5 Sonnet 相媲美——并且您可以免费在本地硬件上运行它。

支持的 Qwen 模型

模型	参数量	Ollama 名称	最佳用途
Qwen 3.5	72B	qwen3.5	前沿推理——媲美 GPT-4o
Qwen 3.5

快速开始

bash
pip install ollama-herd # PyPI: https://pypi.org/project/ollama-herd/
herd # 启动路由器（端口 11435）
herd-node # 在每个设备上运行——自动发现路由器

安装过程中不会下载任何模型。模型按需拉取。所有拉取操作均需用户确认。

通过集群使用 Qwen 3.5

OpenAI SDK

python
from openai import OpenAI

client = OpenAI(baseurl=http://localhost:11435/v1, apikey=not-needed)

Qwen 3.5 用于复杂推理

response = client.chat.completions.create( model=qwen3.5, messages=[{role: user, content: 比较微服务与单体架构}], stream=True, ) for chunk in response: print(chunk.choices[0].delta.content or , end=)

Qwen3-Coder 用于代码生成

python
response = client.chat.completions.create(
model=qwen3-coder:32b,
messages=[{role: user, content: 用 Go 语言编写一个线程安全的连接池}],
)
print(response.choices[0].message.content)

Ollama API

bash

Qwen 3.5 对话

curl http://localhost:11435/api/chat -d {
model: qwen3.5,
messages: [{role: user, content: 解释注意力机制}],
stream: false
}

Qwen3-ASR 语音转文字

bash
curl http://localhost:11435/api/transcribe \
-F file=@meeting.wav \
-F model=qwen3-asr

硬件推荐

跨平台： 以下为示例配置。任何具有同等内存的设备（Mac、Linux、Windows）均可使用。集群路由器支持所有平台。

设备	内存	最佳 Qwen 模型
Mac Mini（16GB）	16GB	qwen3.5:7b
Mac Mini（32GB）

为何选择本地 Qwen 3.5

- GPT-4o 质量 — Qwen 3.5 72B 在 MMLU、HumanEval 和 MT-Bench 上媲美 GPT-4o
零成本 — 硬件投入后无需按 token 付费
隐私保护 — 所有数据保留在您的网络中
无速率限制 — Qwen 云端 API 在高峰时段会限流，您的硬件不会
集群路由 — 多台机器分担负载

本集群还提供

其他大语言模型

Llama 3.3、DeepSeek-V3、DeepSeek-R1、Phi 4、Mistral、Gemma 3、Codestral——同一端点。

图像生成

bash curl -o image.png http://localhost:11435/api/generate-image \ -d {model: z-image-turbo, prompt: 一个帮助编写代码的 AI 助手, width: 1024, height: 1024}

嵌入向量

bash curl http://localhost:11435/api/embed \ -d {model: nomic-embed-text, input: Qwen 3.5 大语言模型}

监控

bash
curl -s http://localhost:11435/fleet/status | python3 -m json.tool
curl -s http://localhost:11435/dashboard/api/health | python3 -m json.tool

仪表板地址：http://localhost:11435/dashboard。

完整文档

贡献

Ollama Herd 是开源项目（MIT 许可）：

- 在 GitHub 上星标 — 帮助他人在本地运行 Qwen
提交 issue — 分享您的 Qwen 配置、报告 bug
欢迎提交 PR — CLAUDE.md 为 AI 代理提供完整上下文。444 个测试，异步 Python。

安全护栏

- 模型下载需要用户明确确认 — Qwen 模型大小从 4GB（7B）到 42GB（72B）不等。
模型删除需要用户明确确认。
切勿删除或修改 ~/.fleet-manager/ 中的文件。
不会自动下载任何模型——所有拉取操作均由用户发起或需要用户选择加入。

qwen-qwen3-5通义千问3.5

qwen-qwen3-5

Qwen 3.5 — Alibaba's Latest LLM on Your Local Fleet

Supported Qwen models

Quick start

Use Qwen 3.5 through the fleet

OpenAI SDK

Qwen3-Coder for code

Ollama API

Qwen3-ASR speech-to-text

Hardware recommendations

Why Qwen 3.5 locally

Also available on this fleet

Other LLMs

Image generation

Embeddings

Monitor

Full documentation

Contribute

Guardrails

Qwen 3.5 — 阿里巴巴最新大语言模型，部署于您的本地集群

支持的 Qwen 模型

快速开始

通过集群使用 Qwen 3.5

OpenAI SDK

Qwen 3.5 用于复杂推理

Qwen3-Coder 用于代码生成

Ollama API

Qwen 3.5 对话

Qwen3-ASR 语音转文字

硬件推荐

为何选择本地 Qwen 3.5

本集群还提供

其他大语言模型

图像生成

嵌入向量

监控

完整文档

贡献

安全护栏

标签

通过对话安装

方式一：安装 SkillHub 和技能

方式二：设置 SkillHub 为优先技能安装源

通过命令行安装

下载

相关推荐

self-improvement

self-improvement

self-improvement

self-improvement