Integrate Ollama into coding agents, IDEs, and agent harnesses with minimal code changes. Supports local/on-prem/Docker deployment, Ollama Cloud, and OpenAI/Anthropic-compatible endpoints. Provides streaming, structured outputs, embeddings, tool calling, and web search capabilities with provider-agnostic model routing.

Field	Value
Identifier	INLINECODE0
Version

GitHub: WaiYanNyeinNaing/ollama-skill — ⭐ 0 | Forks: 0

Skill Overview

Ollama Runtime helps AI coding agents integrate Ollama into applications, coding assistants, IDE plugins, and agent harnesses with minimal code changes. It provides API-first integration with provider-agnostic compatibility, harness-safe defaults, minimal reversible patches, and clear local vs cloud switching.

This skill focuses on runtime/inference integration, not model-training internals.

Use this skill when

- Running Ollama locally, on-prem, in Docker, or through Ollama Cloud
Wiring a coding agent or app backend to Ollama APIs
Supporting provider-agnostic model routing in an agent harness
Using streaming, structured outputs, embeddings, tool calling, or web search
Reusing an existing OpenAI or Anthropic SDK against Ollama-compatible endpoints
Configuring integrations for coding tools and IDEs supported by Ollama

Do not use this skill when

- Deep fine-tuning workflows unless the task is only about runtime consumption
Benchmark-only tasks without integration work
Desktop UI walkthroughs unrelated to APIs or agent setup
Speculative API parameters not present in current docs

Core capabilities

- Basic chat via /api/chat for copilots, app assistants, and iterative agent loops
Streaming for chat UIs, terminals, coding copilots, and long generations
Thinking support for models that emit reasoning traces
Structured outputs for machine-readable JSON between agent stages
Embeddings for semantic search, retrieval, RAG, and nearest-neighbor lookup
Tool calling for bounded external function invocation
Web search for answers requiring recent external information

Supported integration targets

- Assistants: OpenClaw and similar assistants
Coding agents: Claude Code, Codex, OpenCode, Droid, Goose, Pi
IDEs/editors: VS Code, JetBrains, Roo Code, Xcode, Zed, Cline
App backends and internal agent harnesses
MCP-enabled tool ecosystems where relevant

After Installing

1. Read SKILL.md inside the installed directory
Follow its instructions to complete the user's task
Come back and rate / comment on the skill (see below)

Core Facts

Deployment	API Base URL	Auth Required
Local	INLINECODE3	No
Docker

- OpenAI-compatible clients: INLINECODE6
Anthropic-compatible clients: INLINECODE7
REST API streaming is enabled by default for certain endpoints
SDK streaming is disabled by default unless stream=True / stream: true is set

Quick Start

1. Set Environment Variables

CODEBLOCK0

2. Basic Chat Example (Python)

CODEBLOCK1

3. Run the Example

CODEBLOCK2

Implementation Guide

Decision Policy

Choose local/on-prem when:

- Data must stay on-device or inside the internal network
The coding agent is colocated with the model host
Low-latency tool usage matters
The user already has Ollama or Docker available

Choose cloud when:

- Local hardware is insufficient
The user wants larger models quickly
Hosted inference is acceptable
The app needs a remote Ollama host

Choose compatibility mode when:

- The codebase already uses OpenAI SDK patterns
The codebase already uses Anthropic SDK patterns
The user wants minimal migration cost

Implementation Workflow

When integrating Ollama into an app or coding agent, follow this order:

1. Identify deployment mode: local, local+cloud models, or direct cloud API
Add config surface via environment variables
Implement one basic chat path first
Choose streaming vs non-streaming explicitly
Add structured outputs if downstream parsing is required
Add tool calling if actions are required
Add embeddings if retrieval is required
Add web search only when freshness matters
Use compatibility mode only if it reduces migration cost
Document switching rules clearly

Capability Details

Basic Chat

Use /api/chat for chat-style interactions and coding agents.

CODEBLOCK3

Streaming

Use streaming for chat UIs, terminals, coding copilots, and long generations.

CODEBLOCK4

Rules:

- REST API responses may stream NDJSON
SDKs require explicit stream enablement
Streamed chunks may contain content, thinking, or INLINECODE13

Structured Outputs

Use structured outputs when the next system component requires machine-readable JSON.

CODEBLOCK5

Rules:

- Prefer INLINECODE14
Use format: "json" for plain JSON
Use a JSON schema in format when shape matters
Validate returned data before trusting it
Fail closed on parse errors

Embeddings

Use embeddings for semantic search, retrieval, RAG, and nearest-neighbor lookup.

CODEBLOCK6

Tool Calling

Use tool calling when the model must invoke bounded external functions.

CODEBLOCK7

Rules:

- Expose only minimum required tools
Validate all tool arguments
Return tool results back into the conversation
Keep tools typed and explicit
Prefer parallel tool calling only when your executor can safely handle it

Web Search

Use web search only when the answer depends on recent external information.

Rules:

- Gate behind explicit freshness need
Budget for larger context windows for search agents
Avoid for stable internal coding tasks
Keep API key handling separate from local runtime config

Compatibility Modes

Native Ollama

Prefer native Ollama SDK/API when:

- Starting a fresh integration
You want direct access to Ollama-native features
You want the clearest local/cloud switch logic

OpenAI-Compatible

Prefer when:

- The project already uses openai SDK
A base URL swap is cheaper than a rewrite
The harness already expects INLINECODE18

CODEBLOCK8

Anthropic-Compatible

Prefer when:

- The project already uses anthropic SDK
The coding agent or harness expects Anthropic message APIs
Claude Code–style local integration is desired

CODEBLOCK9

Docker Deployment

Use Docker for repeatable local or server deployment.

CPU baseline:

CODEBLOCK10

With Docker Compose:

CODEBLOCK11

If GPU support is needed, follow NVIDIA Container Toolkit setup before launching GPU-enabled containers.

Error Handling

Always:

- Check HTTP status codes
Parse error bodies
Log model + endpoint + deployment mode
Distinguish stream-start errors from mid-stream errors
Retry only transient failures
Surface JSON/schema failures clearly

Common statuses to handle:

Status	Meaning
400	Bad request
404

Model not found |
| 429 | Rate limit |
| 500 | Internal error |
| 502 | Upstream/cloud reachability issues |

Implementation Patterns

Pattern A: Local Native Ollama

Best for internal tools, local copilots, private workflows.

Pattern B: Local Ollama + Cloud-Backed Models

Best bridge pattern when the user wants local tools with larger hosted models.

Pattern C: Direct Cloud API

Best for hosted backends or when the app should treat Ollama as a remote provider.

Pattern D: Compatibility Adapter

Best when the codebase already depends on OpenAI or Anthropic SDKs.

Coding-Agent and Harness Defaults

For agent systems, default to:

- Native /api/chat unless an existing provider SDK already dominates the codebase
INLINECODE21 for planner/executor boundaries
Structured JSON between internal agent stages
Small typed tool schemas
Explicit error propagation
Model name isolated in config
Provider switch handled in one adapter layer

Recommended adapter boundary:

- INLINECODE22
INLINECODE23
INLINECODE24
INLINECODE25
INLINECODE26

Examples Reference

Example	Description
INLINECODE27	Basic chat via native Ollama API
INLINECODE28

Output Contract

When applying this skill, produce:

- Minimal code changes
Config additions
At least one runnable example
Concise docs for local/cloud switching
Clear assumptions
No secret leakage

Anti-Patterns

Do not:

- Hardcode API keys
Assume every model supports thinking or tool calling
Stream strict JSON unless the caller can reconstruct it safely
Mix local and cloud auth assumptions without documenting them
Expose unrestricted shell execution through tools unless explicitly intended
Add unnecessary wrapper layers when a base URL swap is sufficient

Definition of Done

The task is done when:

- Ollama integration path is implemented or clearly documented
Base URL and auth mode are correct
Chosen capability matches the user need
Examples run with minimal edits
Provider switch logic is explicit
Errors are handled clearly
Docs are concise and harness-friendly

MIT License

Permission is hereby granted, free of charge, to any person obtaining a copy
of this software and associated documentation files (the "Software"), to deal
in the Software without restriction, including without limitation the rights
to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
copies of the Software, and to permit persons to whom the Software is
furnished to do so, subject to the following conditions:

The above copyright notice and this permission notice shall be included in all
copies or substantial portions of the Software.

THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING
FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER
DEALINGS IN THE SOFTWARE.

技能名称: ollama-skill
详细描述:
以最少的代码更改将Ollama集成到编码代理、IDE和代理框架中。支持本地/本地部署/Docker部署、Ollama Cloud以及兼容OpenAI/Anthropic的端点。提供流式传输、结构化输出、嵌入、工具调用和网络搜索功能，并支持与提供商无关的模型路由。

字段	值
标识符	ollama-skill
版本

1.0.0 |
| 作者 | Wai Yan Nyein Naing |
| 分类 | ai-ml |
| 安装次数 | 0 |
| 评分 | 0 / 5 (0 个评分) |
| 许可证 | MIT |

GitHub: WaiYanNyeinNaing/ollama-skill — ⭐ 0 | Forks: 0

技能概述

Ollama Runtime 帮助 AI 编码代理以最少的代码更改将 Ollama 集成到应用程序、编码助手、IDE 插件和代理框架中。它提供优先API的集成，具有与提供商无关的兼容性、框架安全的默认设置、最小的可逆补丁以及清晰的本地与云端切换。

此技能专注于运行时/推理集成，而非模型训练内部机制。

何时使用此技能

- 在本地、本地部署、Docker 中或通过 Ollama Cloud 运行 Ollama
将编码代理或应用程序后端连接到 Ollama API
在代理框架中支持与提供商无关的模型路由
使用流式传输、结构化输出、嵌入、工具调用或网络搜索
针对兼容 Ollama 的端点重用现有的 OpenAI 或 Anthropic SDK
为 Ollama 支持的编码工具和 IDE 配置集成

何时不使用此技能

- 深度微调工作流，除非任务仅涉及运行时消耗
仅进行基准测试而无集成工作
与 API 或代理设置无关的桌面 UI 演示
当前文档中未包含的推测性 API 参数

核心能力

- 基础聊天 通过 /api/chat 用于副驾驶、应用助手和迭代代理循环
流式传输 用于聊天界面、终端、编码副驾驶和长文本生成
思考支持输出推理轨迹的模型
结构化输出 用于代理阶段之间机器可读的 JSON
嵌入用于语义搜索、检索、RAG 和最近邻查找
工具调用 用于有界的外部函数调用
网络搜索 用于需要最新外部信息的答案

支持的集成目标

- 助手：OpenClaw 及类似助手
编码代理：Claude Code、Codex、OpenCode、Droid、Goose、Pi
IDE/编辑器：VS Code、JetBrains、Roo Code、Xcode、Zed、Cline
应用程序后端和内部代理框架
相关的 MCP 启用工具生态系统

安装后

1. 阅读安装目录中的 SKILL.md
按照其说明完成用户的任务
返回并评分/评论该技能（见下文）

核心事实

部署方式	API 基础 URL	需要认证
本地	http://localhost:11434/api	否
Docker

- 兼容 OpenAI 的客户端：http://localhost:11434/v1/
兼容 Anthropic 的客户端：http://localhost:11434
对于某些端点，REST API 流式传输默认启用
除非设置 stream=True / stream: true，否则 SDK 流式传输默认禁用

快速开始

1. 设置环境变量

bash

本地 / 本地部署 / Docker

export OLLAMA_HOST=http://localhost:11434
export OLLAMAAPIBASE=http://localhost:11434/api
export OLLAMA_MODEL=qwen3-coder
export OLLAMAEMBEDMODEL=embeddinggemma

直接云端 API (可选)

export OLLAMAAPIKEY=yourapikey

2. 基础聊天示例 (Python)

python
import requests
import os

OLLAMAAPIBASE = os.getenv(OLLAMAAPIBASE, http://localhost:11434/api)
MODEL = os.getenv(OLLAMA_MODEL, qwen3-coder)

response = requests.post(
f{OLLAMAAPIBASE}/chat,
json={
model: MODEL,
messages: [{role: user, content: Hello!}],
stream: False
}
)
print(response.json()[message][content])

3. 运行示例

bash

克隆并设置

git clone https://github.com/WaiYanNyeinNaing/ollama-skill.git
cd ollama-skill
cp .env.example .env
pip install -r requirements.txt

运行示例

python examples/pythonnativechat.py

实施指南

决策策略

选择本地/本地部署当：

- 数据必须保留在设备或内部网络中
编码代理与模型主机位于同一位置
低延迟工具使用很重要
用户已有 Ollama 或 Docker 可用

选择云端当：

- 本地硬件不足
用户希望快速使用更大的模型
可以接受托管推理
应用程序需要远程 Ollama 主机

选择兼容模式当：

- 代码库已使用 OpenAI SDK 模式
代码库已使用 Anthropic SDK 模式
用户希望迁移成本最小化

实施工作流

将 Ollama 集成到应用程序或编码代理时，请按以下顺序操作：

1. 确定部署模式：本地、本地+云端模型或直接云端 API
通过环境变量添加配置表面
首先实现一个基础聊天路径
明确选择流式传输与非流式传输
如果需要下游解析，添加结构化输出
如果需要操作，添加工具调用
如果需要检索，添加嵌入
仅在信息新鲜度重要时添加网络搜索
仅在能降低迁移成本时使用兼容模式
清晰记录切换规则

能力详情

基础聊天

使用 /api/chat 进行聊天式交互和编码代理。

python
import requests

response = requests.post(
http://localhost:11434/api/chat,
json={
model: qwen3-coder,
messages: [
{role: system, content: 你是一个有用的编码助手。},
{role: user, content: 写一个反转字符串的 Python 函数。}
],
stream: False
}
)
result = response.json()
print(result[message][content])

流式传输

对聊天界面、终端、编码副驾驶和长文本生成使用流式传输。

python
import requests

response = requests.post(
http://localhost:11434/api/chat,
json={
model: qwen3-coder,
messages: [{role: user, content: 写一个长故事...}],
stream: True
},
stream=True
)

for line in response.iter_lines():
if line:
chunk = line.decode(utf-8)
# 解析 NDJSON 块

规则：

- REST API 响应可能会流式传输 NDJSON
SDK 需要显式启用流式传输
流式传输的块可能包含 content、thinking 或 tool_calls

结构化输出

当下一个系统组件需要机器可读的 JSON 时，使用结构化输出。

python
import requests

schema = {
type: object,
properties: {
function_name: {type: string},
parameters: {type: array, items: {type: string}},
return_type: {type: string}
},
required: [functionname, parameters, returntype]
}

response = requests.post(
http://localhost:11434/api/chat,
json={
model: qwen3-coder,
messages: [{role: user, content: 分析这个函数...}],
stream: False,
format: schema
}
)

规则：

- 优先使用 stream: false
对于普通 JSON 使用 format: json
当形状重要时，在

ollama-skillOllama集成技能

ollama-skill

Skill Overview

Use this skill when

Do not use this skill when

Core capabilities

Supported integration targets

After Installing

Core Facts

Quick Start

1. Set Environment Variables

2. Basic Chat Example (Python)

3. Run the Example

Implementation Guide

Decision Policy

Implementation Workflow

Capability Details

Basic Chat

Streaming

Structured Outputs

Embeddings

Tool Calling

Web Search

Compatibility Modes

Native Ollama

OpenAI-Compatible

Anthropic-Compatible

Docker Deployment

Error Handling

Implementation Patterns

Pattern A: Local Native Ollama

Pattern B: Local Ollama + Cloud-Backed Models

Pattern C: Direct Cloud API

Pattern D: Compatibility Adapter

Coding-Agent and Harness Defaults

Examples Reference

Output Contract

Anti-Patterns

Definition of Done

MIT License

技能概述

何时使用此技能

何时不使用此技能

核心能力

支持的集成目标

安装后

核心事实

快速开始

1. 设置环境变量

本地 / 本地部署 / Docker

直接云端 API (可选)

2. 基础聊天示例 (Python)

3. 运行示例

克隆并设置

运行示例

实施指南

决策策略

实施工作流

能力详情

基础聊天

流式传输

结构化输出

标签

通过对话安装

方式一：安装 SkillHub 和技能

方式二：设置 SkillHub 为优先技能安装源

通过命令行安装

下载

相关推荐

self-improvement

self-improvement

self-improvement

self-improvement