llmfit-advisor

Hardware-aware local LLM advisor. Detects your system specs (RAM, CPU, GPU/VRAM) and recommends models that actually fit, with optimal quantization and speed estimates.

When to use (trigger phrases)

Use this skill immediately when the user asks any of:

- "what local models can I run?"
- "which LLMs fit my hardware?"
- "recommend a local model"
- "what's the best model for my GPU?"
- "can I run Llama 70B locally?"
- "configure local models"
- "set up Ollama models"
- "what models fit my VRAM?"
- "help me pick a local model for coding"

Also use this skill when:

- The user wants to configure `models.providers.ollama` or `models.providers.lmstudio`
- The user mentions running models locally and you need to know what fits
- A model recommendation is needed and the user has local inference capability (Ollama, vLLM, LM Studio)

Quick start

Detect hardware

```bash
llmfit --json system
```

Returns JSON with CPU, RAM, GPU name, VRAM, multi-GPU info, and whether memory is unified (Apple Silicon).

Get top recommendations

```bash
llmfit recommend --json --limit 5
```

Returns the top 5 models ranked by a composite score (quality, speed, fit, context) with optimal quantization for the detected hardware.

Filter by use case

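```bash
llmfit recommend --json --use-case coding --limit 3
llmfit recommend --json --use-case reasoning --limit 3
llmfit recommend --json --use-case chat --limit 3
```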

Valid use cases: general, coding, reasoning, chat, multimodal, embedding.

Filter by minimum fit level

```bash
llmfit recommend --json --min-fit good --limit 10
```

Valid fit levels (best to worst): perfect, good, marginal.
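
The filters can be combined when the command is assembled programmatically. The following is a minimal sketch, assuming only the flags this skill documents (`--json`, `--use-case`, `--min-fit`, `--limit`); the helper name `build_recommend_cmd` is hypothetical and not part of llmfit.

```python
from typing import List, Optional

# Values taken from the lists of valid use cases and fit levels in this section.
VALID_USE_CASES = {"general", "coding", "reasoning", "chat", "multimodal", "embedding"}
VALID_MIN_FIT = ("perfect", "good", "marginal")  # best to worst


def build_recommend_cmd(
    use_case: Optional[str] = None,
    min_fit: Optional[str] = None,
    limit: int = 5,
) -> List[str]:
    """Build an argv list for `llmfit recommend`, rejecting unknown filter values."""
    if use_case is not None and use_case not in VALID_USE_CASES:
        raise ValueError(f"unknown use case: {use_case!r}")
    if min_fit is not None and min_fit not in VALID_MIN_FIT:
        raise ValueError(f"unknown fit level: {min_fit!r}")
    if limit < 1:
        raise ValueError("limit must be at least 1")

    cmd = ["llmfit", "recommend", "--json"]
    if use_case is not None:
        cmd += ["--use-case", use_case]
    if min_fit is not None:
        cmd += ["--min-fit", min_fit]
    cmd += ["--limit", str(limit)]
    return cmd
```

Validating before invoking the CLI keeps a typo such as `--use-case code` from reaching the tool at all.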

Understanding the output

System JSON

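```json
{
  "system": {
    "cpu_name": "Apple M2 Max",
    "cpu_cores": 12,
    "total_ram_gb": 32.0,
    "available_ram_gb": 24.5,
    "has_gpu": true,
    "gpu_name": "Apple M2 Max",
    "gpu_vram_gb": 32.0,
    "gpu_count": 1,
    "backend": "Metal",
    "unified_memory": true
  }
}
```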
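
To present the hardware summary to a user, the system JSON can be condensed into one line. This is a sketch only: it assumes a top-level `system` object with the fields `cpu_name`, `cpu_cores`, `total_ram_gb`, `available_ram_gb`, `has_gpu`, `gpu_name`, `gpu_vram_gb`, `gpu_count`, `backend`, and `unified_memory`, and the function name `summarize_system` is hypothetical.

```python
import json


def summarize_system(raw: str) -> str:
    """Turn the output of `llmfit --json system` into a one-line summary."""
    system = json.loads(raw)["system"]
    parts = [
        f"{system['cpu_name']} ({system['cpu_cores']} cores)",
        f"{system['available_ram_gb']:g}/{system['total_ram_gb']:g} GB RAM free",
    ]
    if system.get("has_gpu"):
        # On Apple Silicon the GPU shares system memory, so label it accordingly.
        memory_kind = "unified memory" if system.get("unified_memory") else "VRAM"
        parts.append(
            f"{system['gpu_count']}x {system['gpu_name']} "
            f"with {system['gpu_vram_gb']:g} GB {memory_kind} ({system['backend']})"
        )
    else:
        parts.append("no GPU detected")
    return ", ".join(parts)
```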

Recommendation JSON

Each model in the `models` array includes:

| Field | Meaning |
| --- | --- |
| `name` | HuggingFace model ID (e.g. `meta-llama/Llama-3.1-8B-Instruct`) |
| `provider` | |

Fit levels explained

- Perfect: Model fits comfortably with room to spare. Ideal choice.
- Good: Model fits but uses most available memory. Will work well.
- Marginal: Model barely fits. May work but expect slower performance or reduced context.
- TooTight: Model does not fit. Do not recommend.
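
The fit levels translate directly into a filter-and-sort step. The sketch below assumes each model carries a `fit_level` string spelled as in the list above; matching is case-insensitive as a precaution, and the helper name `recommendable` is hypothetical.

```python
from typing import Any, Dict, List

# Assumed spellings of the fit levels, best first; "TooTight" is deliberately absent.
FIT_ORDER = ["Perfect", "Good", "Marginal"]


def recommendable(models: List[Dict[str, Any]]) -> List[Dict[str, Any]]:
    """Drop models that do not fit and order the rest from best fit to worst."""
    rank = {level.lower(): i for i, level in enumerate(FIT_ORDER)}
    kept = [m for m in models if str(m.get("fit_level", "")).lower() in rank]
    # sorted() is stable, so models with the same fit level keep their original ranking.
    return sorted(kept, key=lambda m: rank[str(m["fit_level"]).lower()])
```

Models with a missing or unrecognized fit level are dropped along with `TooTight`, which errs on the side of not recommending something that may not run.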

Run modes explained

- GPU: Full GPU inference. Fastest. Model weights loaded entirely into VRAM.
- CPU+GPU Offload: Some layers on GPU, rest in system RAM. Slower than pure GPU.
- CPU Only: All inference on CPU using system RAM. Slowest but works without GPU.

Configuring OpenClaw with results

After getting recommendations, configure the user's local model provider.

For Ollama

Map the HuggingFace model name to its Ollama tag. Common mappings:

| llmfit name | Ollama tag |
| --- | --- |
| `meta-llama/Llama-3.1-8B-Instruct` | `llama3.1:8b` |
| `meta-llama/Llama-3.3-70B-Instruct` | |
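
A lookup table is one simple way to apply such mappings. The sketch below includes only the one complete pair this document gives (`meta-llama/Llama-3.1-8B-Instruct` to `llama3.1:8b`); the names `OLLAMA_TAGS` and `to_ollama_tag` are hypothetical, and further entries should be verified against the Ollama library rather than guessed.

```python
from typing import Optional

# Only the mapping documented in this skill is included; extend this dict as needed.
OLLAMA_TAGS = {
    "meta-llama/Llama-3.1-8B-Instruct": "llama3.1:8b",
}


def to_ollama_tag(hf_name: str) -> Optional[str]:
    """Return the Ollama tag for a HuggingFace model ID, or None if it is unmapped."""
    return OLLAMA_TAGS.get(hf_name)
```

Returning `None` for an unknown model lets the caller ask the user or look the tag up instead of inventing one.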

Then update openclaw.json:

```json
{
  "models": {
    "providers": {
      "ollama": {
        "models": ["ollama/<ollama-tag>"]
      }
    }
  }
}
```

And optionally set as default:

```json
{
  "agents": {
    "defaults": {
      "model": {
        "primary": "ollama/<ollama-tag>"
      }
    }
  }
}
```
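
Both updates can be applied to an already-parsed `openclaw.json` in one step. This is a sketch under stated assumptions: it takes Ollama models to live in a list at `models.providers.ollama.models` and the default at `agents.defaults.model.primary`, which should be checked against the actual OpenClaw schema. The function name `add_ollama_model` is hypothetical.

```python
import copy
from typing import Any, Dict


def add_ollama_model(config: Dict[str, Any], tag: str, make_default: bool = False) -> Dict[str, Any]:
    """Return a copy of an openclaw.json dict with `ollama/<tag>` registered."""
    updated = copy.deepcopy(config)  # leave the caller's dict untouched
    model_id = f"ollama/{tag}"

    # Assumed key layout: models.providers.ollama.models is a list of model IDs.
    ollama = updated.setdefault("models", {}).setdefault("providers", {}).setdefault("ollama", {})
    models = ollama.setdefault("models", [])
    if model_id not in models:  # avoid duplicate entries on repeated runs
        models.append(model_id)

    if make_default:
        # Assumed key layout: agents.defaults.model.primary holds the default model ID.
        defaults = updated.setdefault("agents", {}).setdefault("defaults", {})
        defaults.setdefault("model", {})["primary"] = model_id
    return updated
```

Working on a copy makes it easy to show the user the proposed change before anything is written back to disk.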

For vLLM / LM Studio

Use the HuggingFace model name directly as the model identifier with the appropriate provider prefix (vllm/ or lmstudio/).
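
As a small illustration of the prefix rule, the sketch below builds the identifier from a provider key and a HuggingFace model ID. The provider keys `vllm` and `lmstudio` and the helper name `provider_model_id` are assumptions made for this example.

```python
# Prefixes as described above; the dictionary keys are hypothetical provider names.
PREFIXES = {"vllm": "vllm/", "lmstudio": "lmstudio/"}


def provider_model_id(provider: str, hf_name: str) -> str:
    """Prefix a HuggingFace model ID for vLLM or LM Studio."""
    try:
        return PREFIXES[provider] + hf_name
    except KeyError:
        raise ValueError(f"unsupported provider: {provider!r}") from None
```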

Workflow example

When a user asks "what local models can I run?":

1. Run `llmfit --json system` to show hardware summary
2. Run `llmfit recommend --json --limit 5` to get top picks
3. Present the recommendations with scores and fit levels
4. If the user wants to configure one, map it to the appropriate Ollama/vLLM/LM Studio tag
5. Offer to update `openclaw.json` with the chosen model

When a user asks for a specific use case like "recommend a coding model":

1. Run `llmfit recommend --json --use-case coding --limit 3`
2. Present the coding-specific recommendations
3. Offer to pull via Ollama and configure
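
The first three steps of the general workflow can be sketched as a short script. It assumes `llmfit` is on the `PATH`, that the system output has a top-level `system` object, and that the recommendation output has a top-level `models` array whose entries carry `fit_level`. The names `run_llmfit` and `advise` are hypothetical, and the `runner` parameter exists only so the logic can be exercised without the CLI installed.

```python
import json
import subprocess
from typing import Any, Callable, Dict, List

Runner = Callable[[List[str]], str]


def run_llmfit(argv: List[str]) -> str:
    """Run llmfit and return stdout; raises CalledProcessError on a non-zero exit."""
    return subprocess.run(argv, check=True, capture_output=True, text=True).stdout


def advise(limit: int = 5, runner: Runner = run_llmfit) -> Dict[str, Any]:
    """Detect hardware, fetch the top picks, and keep only models that fit."""
    system = json.loads(runner(["llmfit", "--json", "system"]))["system"]
    picks = json.loads(runner(["llmfit", "recommend", "--json", "--limit", str(limit)]))
    # Never surface models that do not fit (fit level spelling assumed to be "TooTight").
    models = [m for m in picks.get("models", []) if m.get("fit_level") != "TooTight"]
    return {"system": system, "models": models}
```

Mapping a chosen model to a provider tag and editing `openclaw.json` are left out on purpose, since both depend on the user's confirmation.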

Notes

- llmfit detects NVIDIA GPUs (via `nvidia-smi`), AMD GPUs (via `rocm-smi`), and Apple Silicon (unified memory).
- Multi-GPU setups aggregate VRAM across cards automatically.
- The `best_quant` field tells you the optimal quantization: a higher quant (Q6_K, Q8_0) means better quality if VRAM allows.
- Speed estimates (`estimated_tps`) are approximate and vary by hardware and quantization.
- Models with `fit_level: "TooTight"` should never be recommended to users.
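
When presenting results, it helps to label the speed figure as an estimate. The following sketch formats one model per line, assuming the fields `name`, `fit_level`, `best_quant`, and `estimated_tps` named in this document; the function name `format_recommendation` is hypothetical.

```python
from typing import Any, Dict


def format_recommendation(model: Dict[str, Any]) -> str:
    """Format one model for display, marking the speed figure as approximate."""
    line = f"{model['name']} [{model.get('fit_level', '?')}]"
    if "best_quant" in model:
        line += f" quant={model['best_quant']}"
    if "estimated_tps" in model:
        # Rounded on purpose: the estimate is not precise enough for decimals.
        line += f" ~{model['estimated_tps']:.0f} tok/s (estimate)"
    return line
```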
