Ollama Manager
You're helping someone wrangle their Ollama models. They've got Ollama models scattered across machines — some Ollama models loaded, some sitting cold on disk, some they forgot they pulled via Ollama six months ago. This skill gives you the tools to see every Ollama model, clean up the mess, and figure out what Ollama models they actually need.
The Ollama problem
Ollama makes it too easy to pull models. ollama pull this, ollama pull that — suddenly you've got 200GB of Ollama models across three machines and no idea which Ollama models you actually use. No way to see Ollama disk usage across machines. No way to compare which Ollama model is faster on which hardware. No "hey, you haven't touched this 40GB Ollama model in two weeks, maybe delete it?"
That's what Ollama Manager is for.
Get started with Ollama Manager
CODEBLOCK0
Package: ollama-herd | Repo: github.com/geeks-accelerator/ollama-herd
Connect to your Ollama fleet
The Ollama manager talks to an Ollama Herd router at http://localhost:11435. This router already knows about all your Ollama machines — it tracks heartbeats, loaded Ollama models, disk usage, and Ollama performance history.
See what Ollama models you've got
Every Ollama model available across all machines
CODEBLOCK1
Shows every Ollama model on every machine with sizes and which nodes have them.
What Ollama models are actually loaded in GPU memory right now
CODEBLOCK2
These are the "hot" Ollama models — ready to serve instantly. Everything else is cold on disk and needs Ollama loading time.
Per-machine Ollama breakdown with disk usage
CODEBLOCK3
The real picture: Ollama model sizes, last-used timestamps, which machines have which Ollama models, and how much disk each is eating.
Figure out what Ollama models to keep
Which Ollama models actually get used?
CODEBLOCK4
Which Ollama models haven't been touched?
CODEBLOCK5
If an Ollama model's last request was weeks ago, it's a candidate for deletion.
How much disk is each Ollama model using?
CODEBLOCK6
What Ollama models are fast and what's slow?
CODEBLOCK7
Get Ollama recommendations
What Ollama models should I be running?
CODEBLOCK8
AI-powered Ollama recommendations based on your actual hardware — RAM, cores, GPU memory. Tells you which Ollama models fit, which are too big, and the optimal Ollama model mix for your machines. Includes estimated RAM requirements and Ollama benchmark data.
Pull and delete Ollama models
Pull an Ollama model to a specific machine
CODEBLOCK9
The Ollama router picks the machine with the most free disk and memory if you're not sure which node to target.
Delete an Ollama model from a machine
CODEBLOCK10
Ollama Auto-pull (when enabled)
If a client requests an Ollama model that doesn't exist anywhere, the Ollama router can automatically pull it to the best machine. Toggle this:
CODEBLOCK11
Check Ollama fleet health
CODEBLOCK12
Automated Ollama checks for: Ollama model thrashing (models loading/unloading frequently — sign of memory pressure), disk pressure, and underutilized Ollama nodes that could take more models.
Ollama Dashboard
Open http://localhost:11435/dashboard and go to the Recommendations tab for a visual Ollama model management interface. One-click pull for recommended Ollama models. The Fleet Overview tab shows which Ollama models are loaded where in real time.
Ollama Guardrails
- Never delete Ollama models without explicit user confirmation. Always show which Ollama model will be deleted and how much disk it frees.
- Never pull Ollama models without user confirmation. Ollama downloads can be 10-100+ GB.
- Never modify files in ~/.fleet-manager/ (contains Ollama data).
- If the Ollama router isn't running, suggest herd or uv run herd to start it.
Ollama Manager
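You're helping someone wrangle their Ollama models. They've got Ollama models scattered across machines: some Ollama models loaded, some sitting cold on disk, and some they forgot they pulled via Ollama six months ago. This skill gives you the tools to see every Ollama model, clean up the mess, and figure out which Ollama models they actually need.
The Ollama problem
Ollama makes it too easy to pull models. ollama pull this, ollama pull that, and suddenly you've got 200GB of Ollama models across three machines and no idea which Ollama models you actually use. There is no way to see Ollama disk usage across machines, no way to compare which Ollama model is faster on which hardware, and no reminder along the lines of "hey, you haven't touched this 40GB Ollama model in two weeks, maybe delete it?"
That's what Ollama Manager is for.
Get started with Ollama Manager
```bash
pip install ollama-herd    # install the Ollama management toolkit
herd                       # start the Ollama router (tracks all your Ollama machines)
herd-node                  # run on each Ollama machine you want to manage
```
Package: ollama-herd | Repo: github.com/geeks-accelerator/ollama-herd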
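Connect to your Ollama fleet
The Ollama manager talks to an Ollama Herd router at http://localhost:11435. This router already knows about all your Ollama machines: it tracks heartbeats, loaded Ollama models, disk usage, and Ollama performance history.
See what Ollama models you've got
Every Ollama model available across all machines
```bash
# ollama_all_models: list every Ollama model on every node
curl -s http://localhost:11435/api/tags | python3 -m json.tool
```
Shows every Ollama model on every machine, with sizes and which nodes have them.
What Ollama models are actually loaded in GPU memory right now
```bash
# ollama_hot_models: Ollama models ready to serve immediately
curl -s http://localhost:11435/api/ps | python3 -m json.tool
```
These are the "hot" Ollama models, ready to serve instantly. Everything else is cold on disk and needs Ollama loading time.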
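To make the hot/cold distinction concrete, here is a minimal sketch that compares the two responses. It assumes both payloads have the shape `{"models": [{"name": ...}, ...]}`, which is what Ollama itself returns for `/api/tags` and `/api/ps`; the Herd router's aggregated responses may carry extra fields. The function name and sample data are hypothetical.

```python
def split_hot_cold(tags_payload, ps_payload):
    """Return (hot, cold) as sorted lists of model names.

    Assumption: both payloads look like {"models": [{"name": ...}, ...]}.
    """
    available = {m["name"] for m in tags_payload.get("models", [])}
    loaded = {m["name"] for m in ps_payload.get("models", [])}
    hot = sorted(loaded)                # currently loaded, ready to serve
    cold = sorted(available - loaded)   # on disk only, needs loading time
    return hot, cold


# Hypothetical sample payloads, for illustration only.
tags = {"models": [{"name": "llama3.3:70b"}, {"name": "old-model:7b"}]}
ps = {"models": [{"name": "llama3.3:70b"}]}

hot, cold = split_hot_cold(tags, ps)
print("hot:", hot)
print("cold:", cold)
```

In practice the two arguments would be the parsed JSON from the two curl commands above.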
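Per-machine Ollama breakdown with disk usage
```bash
# ollama_disk_usage: Ollama model sizes per node
curl -s http://localhost:11435/dashboard/api/model-management | python3 -m json.tool
```
The real picture: Ollama model sizes, last-used timestamps, which machines have which Ollama models, and how much disk each is eating.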
Figure out what Ollama models to keep
Which Ollama models actually get used?
```bash
sqlite3 ~/.fleet-manager/latency.db "SELECT model, COUNT(*) as requests, SUM(COALESCE(completion_tokens,0)) as tokens_generated, ROUND(AVG(latency_ms)/1000.0, 1) as avg_secs FROM request_traces WHERE status='completed' GROUP BY model ORDER BY requests DESC"
```
Which Ollama models haven't been touched?
```bash
sqlite3 ~/.fleet-manager/latency.db "SELECT model, MAX(datetime(timestamp, 'unixepoch', 'localtime')) as last_used, COUNT(*) as total_requests FROM request_traces GROUP BY model ORDER BY last_used ASC"
```
If an Ollama model's last request was weeks ago, it's a candidate for deletion.
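The "weeks ago" rule can be turned into an explicit cutoff. The sketch below is one way to do that, assuming `request_traces.timestamp` stores Unix epoch seconds (as the `unixepoch` modifier in the query above suggests). The 14-day threshold, the function name, and the demo rows are illustrative assumptions, and the demo runs against an in-memory table rather than the real `latency.db`.

```python
import sqlite3
import time


def stale_models(conn, max_idle_days=14, now=None):
    """Return [(model, idle_days)] for models idle longer than max_idle_days.

    Assumption: request_traces.timestamp holds Unix epoch seconds.
    """
    now = time.time() if now is None else now
    cutoff = now - max_idle_days * 86400
    rows = conn.execute(
        "SELECT model, MAX(timestamp) AS last_used "
        "FROM request_traces GROUP BY model "
        "HAVING last_used < ? ORDER BY last_used ASC",
        (cutoff,),
    ).fetchall()
    return [(model, int((now - last_used) // 86400)) for model, last_used in rows]


# Demo with made-up rows in an in-memory table, not real fleet data.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE request_traces (model TEXT, timestamp REAL)")
NOW = 1_700_000_000
conn.executemany(
    "INSERT INTO request_traces VALUES (?, ?)",
    [("llama3.3:70b", NOW - 3600), ("old-model:7b", NOW - 30 * 86400)],
)
print(stale_models(conn, max_idle_days=14, now=NOW))
```

The result is only a list of candidates; any actual deletion still needs the user's confirmation.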
How much disk is each Ollama model using?
```bash
curl -s http://localhost:11435/dashboard/api/model-management | python3 -c "
import sys, json
data = json.load(sys.stdin)
for node in data:
    print(f\"\\n{node['node_id']}:\")
    ollama_total = 0
    for m in node.get('models', []):
        size = m.get('size_gb', 0)
        ollama_total += size
        print(f\"  {m['name']:40s} {size:6.1f} GB\")
    print(f\"  {'OLLAMA TOTAL':40s} {ollama_total:6.1f} GB\")
"
```
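The listing above is grouped by node. When deciding what to delete, it can also help to see how much disk one model costs across the whole fleet. The sketch below assumes the same payload shape as the snippet above (a list of nodes, each with `node_id` and a `models` list of `name` / `size_gb` entries); the function name and sample data are hypothetical.

```python
from collections import defaultdict


def disk_by_model(nodes):
    """Sum size_gb per model name across all nodes, largest first."""
    totals = defaultdict(float)
    for node in nodes:
        for m in node.get("models", []):
            totals[m["name"]] += m.get("size_gb", 0)
    return sorted(totals.items(), key=lambda kv: kv[1], reverse=True)


# Made-up sample shaped like the model-management response used above.
sample = [
    {"node_id": "mac-studio", "models": [
        {"name": "llama3.3:70b", "size_gb": 40.0},
        {"name": "old-model:7b", "size_gb": 4.0},
    ]},
    {"node_id": "laptop", "models": [
        {"name": "old-model:7b", "size_gb": 4.0},
    ]},
]

for name, total in disk_by_model(sample):
    print(f"{name:40s} {total:6.1f} GB")
```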
What Ollama models are fast and what's slow?
```bash
sqlite3 ~/.fleet-manager/latency.db "SELECT model, node_id, ROUND(AVG(latency_ms)/1000.0, 1) as avg_secs, COUNT(*) as n FROM request_traces WHERE status='completed' GROUP BY model, node_id HAVING n > 5 ORDER BY avg_secs"
```
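The query above returns one row per (model, node) pair. A small post-processing step can reduce that to the fastest node for each model. This is a sketch under the assumption that rows arrive as `(model, node_id, avg_secs)` tuples; the function name and sample rows are invented for illustration.

```python
def fastest_node_per_model(rows):
    """rows: iterable of (model, node_id, avg_secs).

    Returns {model: (node_id, avg_secs)} keeping the lowest average latency.
    """
    best = {}
    for model, node_id, avg_secs in rows:
        if model not in best or avg_secs < best[model][1]:
            best[model] = (node_id, avg_secs)
    return best


# Made-up rows in the shape the latency query would produce.
rows = [
    ("llama3.3:70b", "mac-studio", 12.4),
    ("llama3.3:70b", "laptop", 48.9),
    ("old-model:7b", "laptop", 2.1),
]

for model, (node_id, secs) in fastest_node_per_model(rows).items():
    print(f"{model}: fastest on {node_id} ({secs}s avg)")
```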
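Get Ollama recommendations
What Ollama models should I be running?
```bash
# ollama_recommendations: optimal Ollama model mix per node
curl -s http://localhost:11435/dashboard/api/recommendations | python3 -m json.tool
```
AI-powered Ollama recommendations based on your actual hardware (RAM, cores, GPU memory). Tells you which Ollama models fit, which are too big, and the optimal Ollama model mix for your machines. Includes estimated RAM requirements and Ollama benchmark data.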
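Pull and delete Ollama models
Pull an Ollama model to a specific machine
```bash
# ollama_pull: download an Ollama model to a node
curl -s -X POST http://localhost:11435/dashboard/api/pull \
  -H "Content-Type: application/json" \
  -d '{"model": "llama3.3:70b", "node_id": "mac-studio"}'
```
If you're not sure which node to target, the Ollama router picks the machine with the most free disk and memory.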
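The router's actual scoring is not described here, so the following is only one plausible reading of "most free disk and memory": drop nodes that cannot hold the model, then prefer the node with the most free memory, using free disk as a tie-breaker. The field names `free_disk_gb` and `free_mem_gb`, the function name, and the sample nodes are all hypothetical.

```python
def pick_target_node(nodes, model_size_gb):
    """Pick a node_id for a pull, or None if no node has room.

    Assumption: each node dict carries free_disk_gb and free_mem_gb.
    This is an illustrative heuristic, not the router's real algorithm.
    """
    candidates = [n for n in nodes if n["free_disk_gb"] >= model_size_gb]
    if not candidates:
        return None
    best = max(candidates, key=lambda n: (n["free_mem_gb"], n["free_disk_gb"]))
    return best["node_id"]


# Made-up fleet snapshot for illustration.
fleet = [
    {"node_id": "mac-studio", "free_disk_gb": 500.0, "free_mem_gb": 96.0},
    {"node_id": "laptop", "free_disk_gb": 30.0, "free_mem_gb": 8.0},
]

print(pick_target_node(fleet, model_size_gb=40.0))
```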
Delete an Ollama model from a machine
```bash
# ollama_delete: remove an Ollama model from a node
curl -s -X POST http://localhost:11435/dashboard/api/delete \
  -H "Content-Type: application/json" \
  -d '{"model": "old-model:7b", "node_id": "mac-studio"}'
```
Ollama Auto-pull (when enabled)
If a client requests an Ollama model that doesn't exist anywhere, the Ollama router can automatically pull it to the best machine. Toggle this:
```bash
# Check current Ollama settings
curl -s http://localhost:11435/dashboard/api/settings | python3 -c "import sys,json; print(json.load(sys.stdin)['config']['toggles'])"

# Turn off Ollama auto-pull
curl -s -X POST http://localhost:11435/dashboard/api/settings \
  -H "Content-Type: application/json" \
  -d '{"auto_pull": false}'
```
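Check Ollama fleet health
```bash
curl -s http://localhost:11435/dashboard/api/health | python3 -m json.tool
```
Automated Ollama checks for: Ollama model thrashing (models loading and unloading frequently, a sign of memory pressure), disk pressure, and underutilized Ollama nodes that could take more models.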
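To illustrate what "thrashing" means in practice, here is a rough sketch that flags models loaded several times within a short window. How the health endpoint actually detects thrashing is not specified in this document; the event format `(epoch_secs, model)`, the one-hour window, and the threshold of three loads are assumptions chosen for the example.

```python
from collections import Counter


def thrashing_models(load_events, now, window_secs=3600, min_loads=3):
    """Return models loaded at least min_loads times in the last window_secs.

    load_events: iterable of (epoch_secs, model), one entry per model load.
    """
    recent = Counter(
        model for ts, model in load_events if now - ts <= window_secs
    )
    return sorted(model for model, count in recent.items() if count >= min_loads)


# Made-up events: old-model:7b is reloaded three times within the hour.
NOW = 1_700_000_000
events = [
    (NOW - 3000, "old-model:7b"),
    (NOW - 2000, "llama3.3:70b"),
    (NOW - 1500, "old-model:7b"),
    (NOW - 200, "old-model:7b"),
    (NOW - 90000, "llama3.3:70b"),
]

print(thrashing_models(events, now=NOW))
```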
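Ollama Dashboard
Open http://localhost:11435/dashboard and go to the Recommendations tab for a visual Ollama model management interface with one-click pull for recommended Ollama models. The Fleet Overview tab shows which Ollama models are loaded where in real time.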
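Ollama Guardrails
- Never delete Ollama models without explicit user confirmation. Always show which Ollama model will be deleted and how much disk it frees.
- Never pull Ollama models without user confirmation. Ollama downloads can be 10-100+ GB.
- Never modify files in ~/.fleet-manager/ (contains Ollama data).
- If the Ollama router isn't running, suggest running herd or uv run herd to start it.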
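The confirmation rule for deletes can be enforced in code rather than left to memory. The sketch below builds the JSON body for the delete endpoint only after a confirmation callback approves a prompt that names the model, the node, and the disk space freed. The function name and the callback design are assumptions for illustration; the body fields `model` and `node_id` match the delete request shown earlier.

```python
import json


def build_delete_request(model, node_id, size_gb, confirm):
    """Return the JSON body for the delete endpoint, or None if declined.

    confirm: callable that receives the prompt text and returns True/False,
    so the caller decides how to ask (CLI prompt, chat message, and so on).
    """
    prompt = f"Delete {model} from {node_id}? This frees about {size_gb:.1f} GB."
    if not confirm(prompt):
        return None
    return json.dumps({"model": model, "node_id": node_id})


# Example: record the prompt and decline, so nothing is deleted.
seen = []
declined = build_delete_request(
    "old-model:7b", "mac-studio", 4.0,
    confirm=lambda text: seen.append(text) or False,
)
print(seen[0])
print(declined)
```

Passing the confirmation step in as a callable keeps the guardrail testable and leaves the choice of user interface to the caller.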