Apple Silicon AI — Your Macs Are the Cluster

Turn your Mac Studio, Mac Mini, MacBook Pro, or Mac Pro into a local Apple Silicon AI fleet. One endpoint routes LLM inference, image generation, speech-to-text, and embeddings across every Apple Silicon device on your network.

No cloud APIs. No GPU rentals. No Docker. Your Apple Silicon M1/M2/M3/M4 chips with unified memory are already better inference hardware than most cloud instances — you just need software that treats them as an Apple Silicon fleet.

Why Apple Silicon for AI

Apple Silicon unified memory keeps the entire model in one address space — no PCIe bottleneck, no CPU-GPU transfer overhead. A Mac Studio with M4 Ultra and 256GB runs 120B parameter models that would need multiple NVIDIA A100s. That is the Apple Silicon advantage.

Apple Silicon Chip	Unified Memory	LLM Sweet Spot	Apple Silicon Image Gen	Notes
M1 (8GB)	8GB	7B models	Slow	Entry-level Apple Silicon
M1 Pro/Max (32-64GB)

32-64GB | 14B-32B | Capable | Apple Silicon MacBook Pro |
| M2 Ultra (192GB) | 192GB | 70B-120B | Fast | Apple Silicon Mac Studio/Pro |
| M3 Max (128GB) | 128GB | 70B | Fast | Latest Apple Silicon MacBook Pro |
| M4 Max (128GB) | 128GB | 70B | Fast | Apple Silicon Mac Studio, newest gen |
| M4 Ultra (256GB) | 256GB | 120B+ | Very fast | Apple Silicon Mac Studio/Pro, largest models |

Apple Silicon Fleet Setup

1. Install on every Apple Silicon Mac

CODEBLOCK0

2. Start the Apple Silicon router (pick one Mac)

CODEBLOCK1

3. Start the Apple Silicon node agent on every Mac

CODEBLOCK2

That's it. Apple Silicon nodes discover the router automatically on your local network. No IP addresses to configure, no config files. For explicit connection, use herd-node --router-url http://<router-ip>:11435.

How Apple Silicon routing works

CODEBLOCK3

The Apple Silicon router scores each device on 7 signals and routes every request to the best available Mac — thermal state, memory fit, queue depth, and more.

Apple Silicon LLM Inference

Run Llama, Qwen, DeepSeek, Phi, Mistral, Gemma, and any Ollama model across your Apple Silicon fleet.

OpenAI-compatible API (Apple Silicon backend)

CODEBLOCK4

Ollama-compatible API

CODEBLOCK5

Apple Silicon Python Client

CODEBLOCK6

Apple Silicon Image Generation (mflux)

Generate images using MLX-native Flux models. Runs natively on Apple Silicon — no CUDA, no cloud.

CODEBLOCK7

Apple Silicon image generation performance:

- Mac Studio M4 Ultra: ~5s at 512px, ~14s at 1024px
MacBook Pro M3 Max: ~7s at 512px, ~18s at 1024px
Mac Mini M4: ~12s at 512px, ~30s at 1024px

Apple Silicon Speech-to-Text (Qwen ASR)

Transcribe audio locally on Apple Silicon using Qwen3-ASR via MLX. Meetings, voice notes, podcasts — no cloud, no Whisper API costs.

CODEBLOCK8

Supports WAV, MP3, M4A, FLAC. ~2s for a 30-second clip on Apple Silicon M4 Ultra.

Apple Silicon Embeddings

Embed documents across your Apple Silicon fleet using Ollama embedding models (nomic-embed-text, mxbai-embed-large, snowflake-arctic-embed).

CODEBLOCK9

Batch thousands of documents across Apple Silicon nodes instead of bottlenecking on one Mac.

Apple Silicon Fleet Monitoring

Dashboard

Open http://localhost:11435/dashboard — see every Apple Silicon Mac in your fleet: models loaded, queue depth, thermal state, memory usage, and health status.

Apple Silicon Fleet Status API

CODEBLOCK10

Returns every Apple Silicon node with hardware specs, loaded models, image/STT capabilities, and health metrics.

Apple Silicon Health Checks

CODEBLOCK11

15 automated checks: offline Apple Silicon nodes, memory pressure, thermal throttling, VRAM fallbacks, error rates, and more.

Recommended Models by Apple Silicon Hardware

Your Apple Silicon Mac	RAM	Recommended models
Mac Mini (16GB)	16GB	llama3.2:3b, phi4-mini, nomic-embed-text
Mac Mini (32GB)

The Apple Silicon router's model recommender analyzes your fleet hardware and suggests the optimal model mix: GET /dashboard/api/model-recommendations.

Full documentation

- Agent Setup Guide — complete Apple Silicon setup for all 4 model types
Configuration Reference — all 44+ environment variables
API Reference — all endpoints with request/response schemas
Troubleshooting — common Apple Silicon issues and fixes

Guardrails

- No automatic downloads: Apple Silicon model pulls are always user-initiated and require explicit confirmation. Downloads range from 2GB to 70GB+ depending on model size.
Model deletion requires confirmation: Never remove models from Apple Silicon nodes without explicit user approval.
All Apple Silicon requests stay local: No data leaves your local network — all inference happens on your Apple Silicon Macs.
No API keys: No accounts, no tokens, no cloud dependencies for your Apple Silicon fleet.
No external network access: The Apple Silicon router and nodes communicate only on your local network. No telemetry, no cloud callbacks.
Read-only local state: The only local files created are ~/.fleet-manager/latency.db (Apple Silicon routing metrics) and ~/.fleet-manager/logs/herd.jsonl (structured logs). Never delete or modify these files without user confirmation.

Apple Silicon AI — 你的Mac就是集群

将你的Mac Studio、Mac Mini、MacBook Pro或Mac Pro转变为一个本地Apple Silicon AI集群。一个端点即可将LLM推理、图像生成、语音转文本和嵌入任务路由到网络中的每一台Apple Silicon设备。

无需云API。无需租用GPU。无需Docker。你的Apple Silicon M1/M2/M3/M4芯片搭配统一内存，其推理硬件性能已超越大多数云实例——你只需要一款能将它们视为Apple Silicon集群的软件。

为什么选择Apple Silicon做AI

Apple Silicon统一内存将整个模型保存在一个地址空间中——没有PCIe瓶颈，没有CPU-GPU传输开销。搭载M4 Ultra和256GB内存的Mac Studio可以运行需要多块NVIDIA A100才能运行的120B参数模型。这就是Apple Silicon的优势。

Apple Silicon芯片	统一内存	LLM最佳适配	Apple Silicon图像生成	备注
M1 (8GB)	8GB	7B模型	慢	入门级Apple Silicon
M1 Pro/Max (32-64GB)

32-64GB | 14B-32B | 可用 | Apple Silicon MacBook Pro |
| M2 Ultra (192GB) | 192GB | 70B-120B | 快 | Apple Silicon Mac Studio/Pro |
| M3 Max (128GB) | 128GB | 70B | 快 | 最新Apple Silicon MacBook Pro |
| M4 Max (128GB) | 128GB | 70B | 快 | Apple Silicon Mac Studio，最新一代 |
| M4 Ultra (256GB) | 256GB | 120B+ | 非常快 | Apple Silicon Mac Studio/Pro，最大模型 |

Apple Silicon集群设置

1. 在每台Apple Silicon Mac上安装

bash
pip install ollama-herd # Apple Silicon优化推理路由器

2. 启动Apple Silicon路由器（选择一台Mac）

bash
herd # 在端口11435上启动Apple Silicon路由器

3. 在每台Mac上启动Apple Silicon节点代理

bash
herd-node # Apple Silicon节点自动发现路由器

就这样。Apple Silicon节点会在本地网络上自动发现路由器。无需配置IP地址，无需配置文件。如需显式连接，请使用herd-node --router-url http://:11435。

Apple Silicon路由工作原理

MacBook Pro (M3 Max, 64GB) ─┐
Mac Mini (M4, 32GB) ├──→ Apple Silicon路由器 (:11435) ←── 你的应用
Mac Studio (M4 Ultra, 256GB) ─┘

Apple Silicon路由器根据7个信号对每台设备进行评分，并将每个请求路由到最佳可用Mac——热状态、内存适配度、队列深度等。

Apple Silicon LLM推理

在你的Apple Silicon集群上运行Llama、Qwen、DeepSeek、Phi、Mistral、Gemma以及任何Ollama模型。

OpenAI兼容API（Apple Silicon后端）

bash
curl http://localhost:11435/v1/chat/completions \
-H Content-Type: application/json \
-d {
model: llama3.3:70b,
messages: [{role: user, content: 解释Apple Silicon统一内存架构}]
}

Ollama兼容API

bash
curl http://localhost:11435/api/chat \
-d {model: qwen3:32b, messages: [{role: user, content: 比较Apple Silicon M4与M3在AI推理方面的表现}]}

Apple Silicon Python客户端

python
from openai import OpenAI

Apple Silicon推理客户端

applesiliconclient = OpenAI(baseurl=http://localhost:11435/v1, apikey=unused)
applesiliconresponse = applesiliconclient.chat.completions.create(
model=deepseek-r1:70b,
messages=[{role: user, content: 为Apple Silicon优化此函数}]
)

Apple Silicon图像生成（mflux）

使用MLX原生Flux模型生成图像。原生运行于Apple Silicon——无需CUDA，无需云端。

bash
curl http://localhost:11435/api/generate-image \
-d {prompt: Apple Silicon Mac Studio渲染AI艺术，照片级真实感, model: z-image-turbo, width: 512, height: 512}

Apple Silicon图像生成性能：

- Mac Studio M4 Ultra：512px约5秒，1024px约14秒
MacBook Pro M3 Max：512px约7秒，1024px约18秒
Mac Mini M4：512px约12秒，1024px约30秒

Apple Silicon语音转文本（Qwen ASR）

使用通过MLX运行的Qwen3-ASR在Apple Silicon上本地转录音频。会议、语音笔记、播客——无需云端，无需Whisper API费用。

bash
curl http://localhost:11435/api/transcribe \
-F file=@applesiliconmeeting.wav \
-F model=qwen3-asr

支持WAV、MP3、M4A、FLAC格式。在Apple Silicon M4 Ultra上，30秒片段约需2秒。

Apple Silicon嵌入

使用Ollama嵌入模型（nomic-embed-text、mxbai-embed-large、snowflake-arctic-embed）在你的Apple Silicon集群上嵌入文档。

bash
curl http://localhost:11435/api/embed \
-d {model: nomic-embed-text, input: Apple Silicon统一内存架构用于AI推理}

跨Apple Silicon节点批量处理数千个文档，而不是在单台Mac上形成瓶颈。

Apple Silicon集群监控

仪表盘

打开http://localhost:11435/dashboard——查看集群中每台Apple Silicon Mac：加载的模型、队列深度、热状态、内存使用情况和健康状态。

Apple Silicon集群状态API

bash
curl http://localhost:11435/fleet/status

返回每个Apple Silicon节点的硬件规格、加载的模型、图像/STT能力和健康指标。

Apple Silicon健康检查

bash
curl http://localhost:11435/dashboard/api/health

15项自动检查：离线Apple Silicon节点、内存压力、热节流、VRAM回退、错误率等。

按Apple Silicon硬件推荐的模型

你的Apple Silicon Mac	内存	推荐模型
Mac Mini (16GB)	16GB	llama3.2:3b, phi4-mini, nomic-embed-text
Mac Mini (32GB)

Apple Silicon路由器的模型推荐器会分析你的集群硬件并建议最佳模型组合：GET /dashboard/api/model-recommendations。

完整文档

- 代理设置指南 — 所有4种模型类型的完整Apple Silicon设置
配置参考 — 所有44+个环境变量
API参考 — 所有端点及请求/响应模式
故障排除 — 常见Apple Silicon问题及修复

安全护栏

- 无自动下载：Apple Silicon模型拉取始终由用户发起，需要明确确认。下载大小从2GB到70GB+不等，取决于模型大小。
删除模型需要确认：未经用户明确批准，绝不从Apple Silicon节点移除模型。
所有Apple Silicon请求保持本地：无数据离开你的本地网络——所有推理都在你的Apple Silicon Mac上完成。
无需API密钥：你的Apple Silicon集群无需账户、无需令牌、无需云依赖。
无外部网络访问：Apple Silicon路由器和节点仅在你的本地网络上通信。无遥测、无云回调

apple-silicon-ai苹果芯片AI集群

apple-silicon-ai

Apple Silicon AI — Your Macs Are the Cluster

Why Apple Silicon for AI

Apple Silicon Fleet Setup

1. Install on every Apple Silicon Mac

2. Start the Apple Silicon router (pick one Mac)

3. Start the Apple Silicon node agent on every Mac

How Apple Silicon routing works

Apple Silicon LLM Inference

OpenAI-compatible API (Apple Silicon backend)

Ollama-compatible API

Apple Silicon Python Client

Apple Silicon Image Generation (mflux)

Apple Silicon Speech-to-Text (Qwen ASR)

Apple Silicon Embeddings

Apple Silicon Fleet Monitoring

Dashboard

Apple Silicon Fleet Status API

Apple Silicon Health Checks

Recommended Models by Apple Silicon Hardware

Full documentation

Guardrails

Apple Silicon AI — 你的Mac就是集群

为什么选择Apple Silicon做AI

Apple Silicon集群设置

1. 在每台Apple Silicon Mac上安装

2. 启动Apple Silicon路由器（选择一台Mac）

3. 在每台Mac上启动Apple Silicon节点代理

Apple Silicon路由工作原理

Apple Silicon LLM推理

OpenAI兼容API（Apple Silicon后端）

Ollama兼容API

Apple Silicon Python客户端

Apple Silicon推理客户端

Apple Silicon图像生成（mflux）

Apple Silicon语音转文本（Qwen ASR）

Apple Silicon嵌入

Apple Silicon集群监控

仪表盘

Apple Silicon集群状态API

Apple Silicon健康检查

按Apple Silicon硬件推荐的模型

完整文档

安全护栏

标签

通过对话安装

方式一：安装 SkillHub 和技能

方式二：设置 SkillHub 为优先技能安装源

通过命令行安装

下载

相关推荐

self-improvement

self-improvement

self-improvement

self-improvement