Phi 4 — Microsoft's Small Models, Big Results

Phi models prove you don't need 70B parameters for great results. Phi-4 matches much larger models on reasoning benchmarks while running on hardware as modest as an 8GB MacBook Air. Route them across your fleet for even better throughput.

Supported Phi models

Model	Parameters	Ollama name	RAM needed	Best for
Phi-4	14B	INLINECODE0	10GB	Reasoning, math, code — punches way above its weight
Phi-4-mini

3.8B | phi4-mini | 4GB | Ultra-fast on any device, even 8GB Macs | | Phi-3.5-mini | 3.8B | phi3.5 | 4GB | Proven lightweight model | | Phi-3-medium | 14B | phi3:14b | 10GB | Balanced quality and speed |

Quick start

CODEBLOCK0

No models are downloaded during installation. All pulls require user confirmation.

Why Phi for small devices

A Mac Mini with 16GB RAM can run Phi-4 (14B) with room to spare. A MacBook Air with 8GB runs Phi-4-mini comfortably. These models start in seconds and respond fast — ideal for devices that can't load a 70B model.

CODEBLOCK1

Phi-4-mini — fastest response times

CODEBLOCK2

OpenAI-compatible API

CODEBLOCK3

Ideal hardware pairings

Cross-platform: These are example configurations. Any device (Mac, Linux, Windows) with equivalent RAM works. The fleet router runs on all platforms.

Your device	RAM	Best Phi model	Why
MacBook Air (8GB)	8GB	INLINECODE4	Fits with room for other apps
Mac Mini (16GB)

16GB | phi4 | Full Phi-4 with headroom | | Mac Mini (24GB) | 24GB | phi4 | Can run Phi-4 + an embedding model simultaneously | | MacBook Pro (36GB) | 36GB | phi4 + phi4-mini | Both loaded, router picks based on task |

Monitor your fleet

CODEBLOCK4

Web dashboard at http://localhost:11435/dashboard — live view of nodes, queues, and performance.

Also available on this fleet

Larger LLMs (when you need more power)

Llama 3.3 (70B), Qwen 3.5, DeepSeek-R1, Mistral Large — route to a bigger machine in the fleet.

Image generation

CODEBLOCK5

Speech-to-text

CODEBLOCK6

Embeddings

CODEBLOCK7

Full documentation

- Agent Setup Guide — all 4 model types
API Reference — complete endpoint docs

Guardrails

- Model downloads require explicit user confirmation — Phi models are small (2-8GB) but still require confirmation.
Model deletion requires explicit user confirmation.
Never delete or modify files in ~/.fleet-manager/.
No models are downloaded automatically — all pulls are user-initiated or require opt-in.

Phi 4 — 微软的小模型，大成果

Phi 模型证明，你不需要 70B 参数也能获得出色成果。Phi-4 在推理基准测试中与更大的模型不相上下，同时能在低至 8GB MacBook Air 这样的硬件上运行。将它们部署到你的设备集群中，还能获得更高的吞吐量。

支持的 Phi 模型

模型	参数	Ollama 名称	所需内存	最佳用途
Phi-4	14B	phi4	10GB	推理、数学、代码 —— 以小博大
Phi-4-mini

3.8B | phi4-mini | 4GB | 在任何设备上超快运行，甚至 8GB Mac 也能胜任 | | Phi-3.5-mini | 3.8B | phi3.5 | 4GB | 久经考验的轻量级模型 | | Phi-3-medium | 14B | phi3:14b | 10GB | 质量与速度的平衡之选 |

快速开始

bash
pip install ollama-herd # PyPI: https://pypi.org/project/ollama-herd/
herd # 启动路由器（端口 11435）
herd-node # 在每个设备上运行 —— 自动发现路由器

安装过程中不会下载任何模型。所有拉取操作都需要用户确认。

为什么选择 Phi 用于小型设备

一台 16GB 内存的 Mac Mini 可以运行 Phi-4（14B）且仍有富余。一台 8GB 内存的 MacBook Air 可以流畅运行 Phi-4-mini。这些模型秒级启动且响应迅速 —— 非常适合无法加载 70B 模型的设备。

python
from openai import OpenAI

client = OpenAI(baseurl=http://localhost:11435/v1, apikey=not-needed)

Phi-4 用于推理

response = client.chat.completions.create( model=phi4, messages=[{role: user, content: 求解：如果 3x + 7 = 22，x 等于多少？}], ) print(response.choices[0].message.content)

Phi-4-mini —— 最快的响应时间

bash
curl http://localhost:11435/api/chat -d {
model: phi4-mini,
messages: [{role: user, content: 用 3 个要点总结以下内容：...}],
stream: false
}

兼容 OpenAI 的 API

bash
curl http://localhost:11435/v1/chat/completions \
-H Content-Type: application/json \
-d {model: phi4, messages: [{role: user, content: 为登录函数编写一个单元测试}]}

理想的硬件搭配

跨平台： 以下为示例配置。任何具有同等内存的设备（Mac、Linux、Windows）均可使用。设备集群路由器支持所有平台。

你的设备	内存	最佳 Phi 模型	原因
MacBook Air（8GB）	8GB	phi4-mini	运行后仍有空间运行其他应用
Mac Mini（16GB）

16GB | phi4 | 完整 Phi-4 且有余量 | | Mac Mini（24GB） | 24GB | phi4 | 可同时运行 Phi-4 和嵌入模型 | | MacBook Pro（36GB） | 36GB | phi4 + phi4-mini | 两个模型均加载，路由器根据任务选择 |

监控你的设备集群

bash

查看已加载的模型及其位置

curl -s http://localhost:11435/api/ps | python3 -m json.tool

设备集群健康概览

curl -s http://localhost:11435/dashboard/api/health | python3 -m json.tool

基于你的硬件的模型推荐

curl -s http://localhost:11435/dashboard/api/recommendations | python3 -m json.tool

Web 仪表盘位于 http://localhost:11435/dashboard —— 实时查看节点、队列和性能。

该设备集群上还可用的其他模型

更大的 LLM（当你需要更强算力时）

Llama 3.3（70B）、Qwen 3.5、DeepSeek-R1、Mistral Large —— 路由到集群中性能更强的机器。

图像生成

bash curl http://localhost:11435/api/generate-image \ -d {model: z-image-turbo, prompt: 极简电路板艺术, width: 512, height: 512}

语音转文字

bash curl http://localhost:11435/api/transcribe -F file=@meeting.wav -F model=qwen3-asr

嵌入

bash curl http://localhost:11435/api/embed \ -d {model: nomic-embed-text, input: 微软 Phi 小型语言模型}

完整文档

- 代理设置指南 —— 所有 4 种模型类型
API 参考 —— 完整端点文档

安全护栏

- 模型下载需要用户明确确认 —— Phi 模型虽小（2-8GB），但仍需确认。
模型删除需要用户明确确认。
切勿删除或修改 ~/.fleet-manager/ 中的文件。
不会自动下载任何模型 —— 所有拉取操作均由用户发起或需要用户选择加入。

phi-phi4微软Phi-4

phi-phi4

Phi 4 — Microsoft's Small Models, Big Results

Supported Phi models

Quick start

Why Phi for small devices

Phi-4-mini — fastest response times

OpenAI-compatible API

Ideal hardware pairings

Monitor your fleet

Also available on this fleet

Larger LLMs (when you need more power)

Image generation

Speech-to-text

Embeddings

Full documentation

Guardrails

Phi 4 — 微软的小模型，大成果

支持的 Phi 模型

快速开始

为什么选择 Phi 用于小型设备

Phi-4 用于推理

Phi-4-mini —— 最快的响应时间

兼容 OpenAI 的 API

理想的硬件搭配

监控你的设备集群

查看已加载的模型及其位置

设备集群健康概览

基于你的硬件的模型推荐

该设备集群上还可用的其他模型

更大的 LLM（当你需要更强算力时）

图像生成

语音转文字

嵌入

完整文档

安全护栏

标签

通过对话安装

方式一：安装 SkillHub 和技能

方式二：设置 SkillHub 为优先技能安装源

通过命令行安装

下载

相关推荐

self-improvement

self-improvement

self-improvement

self-improvement