Ollama Manager

You're helping someone wrangle their Ollama models. They've got models scattered across machines: some loaded, some sitting cold on disk, some they pulled six months ago and forgot about. This skill gives you the tools to see every model, clean up the mess, and figure out which ones they actually need.

The Ollama problem

Ollama makes it too easy to pull models. ollama pull this, ollama pull that, and suddenly you've got 200GB of models across three machines and no idea which ones you actually use. There's no way to see disk usage across machines, no way to compare which model is faster on which hardware, and no "hey, you haven't touched this 40GB model in two weeks, maybe delete it?"

That's what Ollama Manager is for.

Get started with Ollama Manager

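    pip install ollama-herd    # install the Ollama management toolkit
    herd                       # start the Ollama router (tracks all your Ollama machines)
    herd-node                  # run on each Ollama machine you want to manage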

Package: ollama-herd | Repo: github.com/geeks-accelerator/ollama-herd

Connect to your Ollama fleet

The Ollama manager talks to an Ollama Herd router at http://localhost:11435. This router already knows about all your machines: it tracks heartbeats, loaded models, disk usage, and performance history.
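For scripting against the router from Python instead of curl, a small helper along these lines may be convenient. It is only a sketch: `ROUTER_BASE`, `build_url`, and `fetch_json` are names invented here, and the only detail taken from this skill is the router address.

```python
import json
import urllib.request

ROUTER_BASE = "http://localhost:11435"  # router address used throughout this skill


def build_url(path, base=ROUTER_BASE):
    """Join the router base URL and an endpoint path with exactly one slash."""
    return base.rstrip("/") + "/" + path.lstrip("/")


def fetch_json(path, base=ROUTER_BASE, timeout=5.0):
    """GET an endpoint and decode the JSON body.

    Returns None when the router cannot be reached or the body is not JSON,
    so the caller can suggest starting the router instead of crashing.
    """
    try:
        with urllib.request.urlopen(build_url(path, base), timeout=timeout) as resp:
            return json.load(resp)
    except (OSError, ValueError):
        # URLError and socket timeouts are OSError subclasses;
        # json.JSONDecodeError is a ValueError subclass.
        return None
```

A `None` result from `fetch_json("/api/tags")` is a reasonable cue to suggest starting the router.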

See what Ollama models you've got

Every Ollama model available across all machines

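    # ollama_all_models: list all Ollama models on every node
    curl -s http://localhost:11435/api/tags | python3 -m json.tool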

Shows every Ollama model on every machine with sizes and which nodes have them.
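To turn the raw /api/tags JSON into a quick size ranking, something like the following could work. It assumes the response follows the stock Ollama shape (a `models` list whose entries carry `name` and a `size` in bytes); the router may add per-node fields that this sketch ignores. The function name and the sample sizes are illustrative only.

```python
def summarize_tags(payload):
    """Return (model_name, size_in_gb) pairs, largest first.

    Assumes the stock Ollama /api/tags shape: {"models": [{"name", "size"}]},
    with "size" in bytes. Extra fields in the response are ignored.
    """
    rows = [
        (m["name"], round(m.get("size", 0) / 1e9, 1))
        for m in payload.get("models", [])
    ]
    return sorted(rows, key=lambda row: row[1], reverse=True)


# Illustrative payload; the sizes are made up for the example.
sample = {
    "models": [
        {"name": "llama3.2:3b", "size": 2_000_000_000},
        {"name": "llama3.3:70b", "size": 42_500_000_000},
    ]
}
for name, size_gb in summarize_tags(sample):
    print(f"{name:30s} {size_gb:6.1f} GB")
```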

What Ollama models are actually loaded in GPU memory right now

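    # ollama_hot_models: Ollama models ready to serve instantly
    curl -s http://localhost:11435/api/ps | python3 -m json.tool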

These are the "hot" models, ready to serve instantly. Everything else is cold on disk and has to be loaded into memory before it can answer.
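The hot/cold split can be computed by subtracting the loaded set from the on-disk set. The sketch below assumes both /api/tags and /api/ps return the stock Ollama shape, a `models` list with a `name` per entry; `cold_models` is a name made up for this example.

```python
def cold_models(tags_payload, ps_payload):
    """Names present on disk (/api/tags) but not loaded (/api/ps), sorted."""
    on_disk = {m["name"] for m in tags_payload.get("models", [])}
    loaded = {m["name"] for m in ps_payload.get("models", [])}
    return sorted(on_disk - loaded)


# Illustrative payloads, not real router output.
tags = {"models": [{"name": "llama3.3:70b"}, {"name": "old-model:7b"}]}
ps = {"models": [{"name": "llama3.3:70b"}]}
print(cold_models(tags, ps))
```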

Per-machine Ollama breakdown with disk usage

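    # ollama_disk_usage: Ollama model sizes per node
    curl -s http://localhost:11435/dashboard/api/model-management | python3 -m json.tool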

The real picture: model sizes, last-used timestamps, which machines have which models, and how much disk each one is eating.

Figure out what Ollama models to keep

Which Ollama models actually get used?

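    sqlite3 ~/.fleet-manager/latency.db "
      SELECT model,
             COUNT(*) AS requests,
             SUM(COALESCE(completion_tokens, 0)) AS tokens_generated,
             ROUND(AVG(latency_ms) / 1000.0, 1) AS avg_secs
      FROM request_traces
      WHERE status = 'completed'
      GROUP BY model
      ORDER BY requests DESC"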

Which Ollama models haven't been touched?

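    sqlite3 ~/.fleet-manager/latency.db "
      SELECT model,
             MAX(datetime(timestamp, 'unixepoch', 'localtime')) AS last_used,
             COUNT(*) AS total_requests
      FROM request_traces
      GROUP BY model
      ORDER BY last_used ASC"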

If an Ollama model's last request was weeks ago, it's a candidate for deletion.
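One way to turn "last request was weeks ago" into a concrete shortlist is sketched below. The 14-day default is an assumption (borrowed from the "two weeks" example earlier), and `stale_models` and its input format are hypothetical: a plain mapping from model name to the Unix timestamp of its latest request.

```python
import time

SECONDS_PER_DAY = 86_400


def stale_models(last_used, max_idle_days=14, now=None):
    """Return (model, idle_days) for models idle longer than max_idle_days.

    last_used maps model name -> Unix timestamp of its most recent request.
    Results are ordered from most idle to least idle.
    """
    now = time.time() if now is None else now
    idle = {
        model: (now - ts) / SECONDS_PER_DAY for model, ts in last_used.items()
    }
    stale = [(m, round(d, 1)) for m, d in idle.items() if d > max_idle_days]
    return sorted(stale, key=lambda item: item[1], reverse=True)
```

The result is only a list of deletion candidates; nothing should be removed until the user confirms.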

How much disk is each Ollama model using?

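    curl -s http://localhost:11435/dashboard/api/model-management | python3 -c "
    import sys, json
    data = json.load(sys.stdin)
    for node in data:
        print(f\"\\n{node['node_id']}:\")
        ollama_total = 0
        for m in node.get('models', []):
            size = m.get('size_gb', 0)
            ollama_total += size
            print(f\"  {m['name']:40s} {size:6.1f} GB\")
        print(f\"  {'OLLAMA TOTAL':40s} {ollama_total:6.1f} GB\")
    "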

What Ollama models are fast and what's slow?

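    sqlite3 ~/.fleet-manager/latency.db "
      SELECT model, node_id,
             ROUND(AVG(latency_ms) / 1000.0, 1) AS avg_secs,
             COUNT(*) AS n
      FROM request_traces
      WHERE status = 'completed'
      GROUP BY model, node_id
      HAVING n > 5
      ORDER BY avg_secs"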
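To answer "which node is fastest for each model" directly, the same aggregation can be run from Python. This sketch assumes the `request_traces` table and its `model`, `node_id`, `latency_ms`, and `status` columns as they appear in this skill's latency.db queries; `fastest_node_per_model` is a hypothetical helper. Open the database read-only in practice, since the files under ~/.fleet-manager/ must not be modified.

```python
import sqlite3


def fastest_node_per_model(conn, min_requests=6):
    """Map each model to (node_id, avg_secs) for its lowest-latency node.

    Only completed requests count, and a (model, node) pair needs at least
    min_requests samples before it is considered.
    """
    rows = conn.execute(
        """
        SELECT model, node_id, AVG(latency_ms) / 1000.0 AS avg_secs
        FROM request_traces
        WHERE status = 'completed'
        GROUP BY model, node_id
        HAVING COUNT(*) >= ?
        ORDER BY avg_secs
        """,
        (min_requests,),
    ).fetchall()
    best = {}
    for model, node_id, avg_secs in rows:
        # Rows arrive fastest-first, so the first hit per model wins.
        best.setdefault(model, (node_id, round(avg_secs, 1)))
    return best
```

A read-only connection can be opened with `sqlite3.connect("file:/path/to/latency.db?mode=ro", uri=True)`, where the path is a placeholder for the expanded database location.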

Get Ollama recommendations

What Ollama models should I be running?

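    # ollama_recommendations: best Ollama model mix per node
    curl -s http://localhost:11435/dashboard/api/recommendations | python3 -m json.tool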

AI-powered recommendations based on your actual hardware (RAM, cores, GPU memory). The response tells you which models fit, which are too big, and the best model mix for your machines, along with estimated RAM requirements and benchmark data.

Pull and delete Ollama models

Pull an Ollama model to a specific machine

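    # ollama_pull: download an Ollama model to a node
    curl -s -X POST http://localhost:11435/dashboard/api/pull \
      -H "Content-Type: application/json" \
      -d '{"model": "llama3.3:70b", "node_id": "mac-studio"}'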

If you're not sure which node to target, the router picks the machine with the most free disk and memory.

Delete an Ollama model from a machine

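    # ollama_delete: remove an Ollama model from a node
    curl -s -X POST http://localhost:11435/dashboard/api/delete \
      -H "Content-Type: application/json" \
      -d '{"model": "old-model:7b", "node_id": "mac-studio"}'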

Ollama Auto-pull (when enabled)

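If a client requests a model that doesn't exist anywhere in the fleet, the router can automatically pull it to the best machine. To check or toggle this:

    # Check current Ollama settings
    curl -s http://localhost:11435/dashboard/api/settings | python3 -c "import sys,json; print(json.load(sys.stdin)['config']['toggles'])"

    # Turn off Ollama auto-pull
    curl -s -X POST http://localhost:11435/dashboard/api/settings \
      -H "Content-Type: application/json" \
      -d '{"auto_pull": false}'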

Check Ollama fleet health

    curl -s http://localhost:11435/dashboard/api/health | python3 -m json.tool

Automated checks cover model thrashing (models loading and unloading frequently, a sign of memory pressure), disk pressure, and underutilized nodes that could take more models.
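To make "thrashing" concrete, here is one way such a check could be expressed. It is an illustration rather than the router's actual logic: the event format (timestamp, model name), the 10-minute window, and the threshold of 3 loads are all assumptions chosen for the example.

```python
from collections import Counter


def thrashing_models(load_events, window_secs=600, max_loads=3, now=None):
    """Return models loaded more than max_loads times in the last window_secs.

    load_events is an iterable of (unix_timestamp, model_name) pairs, one per
    model load. `now` defaults to the newest event's timestamp.
    """
    events = list(load_events)
    if not events:
        return []
    now = max(ts for ts, _ in events) if now is None else now
    recent = Counter(model for ts, model in events if now - ts <= window_secs)
    return sorted(model for model, count in recent.items() if count > max_loads)
```

A model that shows up here is being evicted and reloaded repeatedly, which usually means the node is trying to serve more models than fit in memory.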

Ollama Dashboard

Open http://localhost:11435/dashboard and go to the Recommendations tab for a visual Ollama model management interface. One-click pull for recommended Ollama models. The Fleet Overview tab shows which Ollama models are loaded where in real time.

Ollama Guardrails

- Never delete Ollama models without explicit user confirmation. Always show what Ollama model will be deleted and how much disk it frees.
- Never pull Ollama models without user confirmation. Ollama downloads can be 10-100+ GB.
- Never modify files in ~/.fleet-manager/ (contains Ollama data).
- If the Ollama router isn't running, suggest herd or uv run herd to start it.
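The confirmation rule for deletes can be enforced in code rather than left to memory. The sketch below builds, but does not send, a request for the router's /dashboard/api/delete endpoint and refuses unless confirmation was given. The helper names are hypothetical, and the size shown to the user is assumed to come from the model-management data.

```python
import json
import urllib.request

ROUTER_BASE = "http://localhost:11435"


def deletion_summary(model, node_id, size_gb):
    """Text to show the user before asking for confirmation."""
    return f"Delete {model} from {node_id}? This frees about {size_gb:.1f} GB."


def build_delete_request(model, node_id, confirmed):
    """Build (but do not send) the POST for the router's delete endpoint.

    Raises PermissionError unless the user has explicitly confirmed.
    """
    if not confirmed:
        raise PermissionError("refusing to delete without user confirmation")
    body = json.dumps({"model": model, "node_id": node_id}).encode("utf-8")
    return urllib.request.Request(
        ROUTER_BASE + "/dashboard/api/delete",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )
```

Passing the returned object to `urllib.request.urlopen` would send it; the same gate fits pulls, which can be just as expensive in the other direction.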

