Mac Studio AI — The Most Powerful Local AI Machine
The Mac Studio is the best hardware for local AI. Mac Studio M4 Ultra with 256GB of unified memory runs 120B+ parameter models. Mac Studio M3 Ultra with 512GB loads frontier models that need 4-8 NVIDIA A100s elsewhere. The Mac Studio runs everything in one memory pool — no PCIe bottleneck.
One Mac Studio is a powerhouse. Multiple Mac Studios become a fleet.
Mac Studio configurations for AI
| Mac Studio Config | Chip | Memory | GPU Cores | Mac Studio LLM Sweet Spot |
|---|
| Mac Studio M4 Max | M4 Max | 128GB | 40 | 70B models on Mac Studio |
| Mac Studio M4 Ultra |
M4 Ultra | 256GB | 80 | 120B+ models on Mac Studio |
| Mac Studio M3 Ultra | M3 Ultra | 192-512GB | 76 | 236B models on Mac Studio |
| Mac Studio M2 Ultra | M2 Ultra | 192GB | 76 | 70B-120B on Mac Studio |
Setup your Mac Studio
CODEBLOCK0
Mac Studios discover each other automatically on your local network.
Add Mac Studio image generation
CODEBLOCK1
Use your Mac Studio for AI inference
Mac Studio LLM inference — run the biggest models
CODEBLOCK2
Mac Studio image generation
CODEBLOCK3
Mac Studio speech-to-text
CODEBLOCK4
Mac Studio embeddings
CODEBLOCK5
Recommended models for Mac Studio
| Mac Studio Config | Models for this Mac Studio |
|---|
| Mac Studio M4 Max (128GB) | INLINECODE0 , qwen3:72b, deepseek-r1:70b, INLINECODE3 |
| Mac Studio M4 Ultra (256GB) |
gpt-oss:120b,
qwen3:110b, two 70B models simultaneously |
| Mac Studio M3 Ultra (512GB) |
deepseek-v3:236b (quantized), multiple 70B models at once |
Ask the Mac Studio for recommendations: INLINECODE7
Multiple Mac Studios as a fleet
CODEBLOCK6
The Mac Studio router scores each device on 7 signals. Big models route to the Mac Studio with the most memory.
Monitor your Mac Studio
Mac Studio dashboard at http://mac-studio:11435/dashboard — models loaded on each Mac Studio, queue depths, thermal state, memory.
CODEBLOCK7
Example Mac Studio fleet status response:
CODEBLOCK8
Full documentation
Contribute
Ollama Herd is open source (MIT). Built by Mac Studio owners for Mac Studio owners:
- - Star on GitHub — help other Mac Studio users find us
- Open an issue — share your Mac Studio AI setup
- PRs welcome —
CLAUDE.md gives AI agents full context. 444 tests, async Python.
Guardrails
- - No automatic downloads — Mac Studio model pulls require explicit user confirmation.
- Model deletion requires explicit user confirmation.
- All Mac Studio requests stay local — no data leaves your network.
- Never delete or modify files in
~/.fleet-manager/.
Mac Studio AI — 最强大的本地AI机器
Mac Studio是本地AI的最佳硬件。配备256GB统一内存的Mac Studio M4 Ultra可运行120B+参数模型。配备512GB内存的Mac Studio M3 Ultra可加载在其他设备上需要4-8块NVIDIA A100的前沿模型。Mac Studio在单一内存池中运行一切——无PCIe瓶颈。
一台Mac Studio就是一台性能猛兽。多台Mac Studio组成一个集群。
Mac Studio的AI配置
| Mac Studio配置 | 芯片 | 内存 | GPU核心数 | Mac Studio LLM最佳适配 |
|---|
| Mac Studio M4 Max | M4 Max | 128GB | 40 | Mac Studio上的70B模型 |
| Mac Studio M4 Ultra |
M4 Ultra | 256GB | 80 | Mac Studio上的120B+模型 |
| Mac Studio M3 Ultra | M3 Ultra | 192-512GB | 76 | Mac Studio上的236B模型 |
| Mac Studio M2 Ultra | M2 Ultra | 192GB | 76 | Mac Studio上的70B-120B模型 |
配置你的Mac Studio
bash
pip install ollama-herd # 在你的Mac Studio上安装
herd # 启动Mac Studio作为路由器(端口11435)
herd-node # 连接其他Mac Studio或其他设备
Mac Studio会在你的本地网络中自动发现彼此。
添加Mac Studio图像生成功能
bash
uv tool install mflux # Flux模型(Mac Studio M4 Ultra上512px约5秒)
uv tool install diffusionkit # Mac Studio上的Stable Diffusion 3/3.5
使用你的Mac Studio进行AI推理
Mac Studio LLM推理——运行最大的模型
python
from openai import OpenAI
连接到运行Ollama Herd的Mac Studio
mac
studio = OpenAI(baseurl=http://mac-studio:11435/v1, api_key=not-needed)
120B模型——在Mac Studio M4 Ultra(256GB统一内存)上流畅运行
response = mac_studio.chat.completions.create(
model=gpt-oss:120b, # 完全加载在Mac Studio统一内存中
messages=[{role: user, content: Mac Studio如何处理大型AI模型?}],
stream=True,
)
for chunk in response:
print(chunk.choices[0].delta.content or , end=)
Mac Studio图像生成
bash
通过mflux使用Flux——Mac Studio M4 Ultra上约5秒
curl -o mac
studioart.png http://mac-studio:11435/api/generate-image \
-H Content-Type: application/json \
-d {model: z-image-turbo, prompt: 极简桌面上的Mac Studio,带有全息AI显示屏, width: 1024, height: 1024}
Mac Studio上的Stable Diffusion 3——约9秒
curl -o mac
studiosd3.png http://mac-studio:11435/api/generate-image \
-H Content-Type: application/json \
-d {model: sd3-medium, prompt: Mac Studio M4 Ultra渲染AI艺术, width: 1024, height: 1024, steps: 20}
Mac Studio语音转文字
bash
通过Qwen3-ASR在Mac Studio上转写
curl http://mac-studio:11435/api/transcribe \
-F file=@mac
studiomeeting.wav \
-F model=qwen3-asr
Mac Studio嵌入向量
bash
在Mac Studio上生成嵌入向量
curl http://mac-studio:11435/api/embed \
-d {model: nomic-embed-text, input: Mac Studio M4 Ultra统一内存AI推理}
Mac Studio推荐模型
| Mac Studio配置 | 此Mac Studio适用的模型 |
|---|
| Mac Studio M4 Max (128GB) | llama3.3:70b, qwen3:72b, deepseek-r1:70b, codestral |
| Mac Studio M4 Ultra (256GB) |
gpt-oss:120b, qwen3:110b, 同时运行两个70B模型 |
| Mac Studio M3 Ultra (512GB) | deepseek-v3:236b(量化版),同时运行多个70B模型 |
向Mac Studio询问推荐:GET http://mac-studio:11435/dashboard/api/recommendations
多台Mac Studio组成集群
Mac Studio #1 (M4 Ultra, 256GB) ─┐
Mac Studio #2 (M4 Max, 128GB) ├──→ Mac Studio路由器 (:11435) ←── 你的应用
Mac Mini (32GB) ─┘
Mac Studio路由器根据7个信号对每台设备进行评分。大型模型会路由到内存最大的Mac Studio。
监控你的Mac Studio
Mac Studio仪表盘位于http://mac-studio:11435/dashboard——显示每台Mac Studio上加载的模型、队列深度、热状态和内存。
bash
Mac Studio集群状态
curl -s http://mac-studio:11435/fleet/status | python3 -m json.tool
Mac Studio健康检查
curl -s http://mac-studio:11435/dashboard/api/health | python3 -m json.tool
Mac Studio集群状态响应示例:
json
{
fleet: {nodesonline: 2, nodestotal: 2},
nodes: [
{nodeid: Mac-Studio-Ultra, memory: {totalgb: 256, used_gb: 120}},
{nodeid: Mac-Studio-Max, memory: {totalgb: 128, used_gb: 85}}
]
}
完整文档
贡献
Ollama Herd是开源项目(MIT协议)。由Mac Studio用户为Mac Studio用户构建:
- - 在GitHub上标星 — 帮助其他Mac Studio用户找到我们
- 提交问题 — 分享你的Mac Studio AI设置
- 欢迎提交PR — CLAUDE.md为AI代理提供完整上下文。444个测试,异步Python。
安全护栏
- - 无自动下载 — Mac Studio模型拉取需要明确的用户确认。
- 模型删除需要明确的用户确认。
- 所有Mac Studio请求保持本地 — 无数据离开你的网络。
- 切勿删除或修改~/.fleet-manager/中的文件。