Ollama OCR Skill
Use this skill to recognize text in images with Ollama's local vision/OCR models. Once a model has been downloaded, no internet connection is required: OCR runs fully offline.
When to Use
- The user sends an image and wants its text extracted
- The user asks to recognize text in a screenshot or picture
- Local, offline OCR is needed with no dependency on a cloud API
- The images are sensitive and shouldn't be sent to third parties
Models Available
| Model | Best For | Size |
|---|---|---|
| `glm-ocr:latest` | Chinese text OCR | ~2.2GB |
| `llava:7b` | General image understanding | ~4.7GB |
| `moondream` | Lightweight vision model | ~1.5GB |
| `llama3.2-vision:latest` | Large vision model | ~7GB+ |
Ollama Endpoint
Default: http://172.17.0.2:11434 (Docker container to host gateway)
Note: The endpoint is preconfigured for OpenClaw running in Docker and accessing Ollama on the host. If your setup differs, adjust `OLLAMA_HOST` in `ollama_ocr.py`.
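If you are unsure whether the endpoint is right for your setup, a quick reachability check can help before running OCR. The sketch below is not part of `ollama_ocr.py`; the helper names `resolve_endpoint` and `is_reachable` are hypothetical, and reading `OLLAMA_HOST` from the environment is an assumption (the script may only use a constant).

```python
import os
import urllib.request

DEFAULT_ENDPOINT = "http://172.17.0.2:11434"  # default from this document


def resolve_endpoint(override=None):
    """Return a normalized base URL for the Ollama server.

    Precedence: explicit override, then the OLLAMA_HOST environment variable
    (an assumption, see above), then the documented default.
    """
    raw = (override or os.environ.get("OLLAMA_HOST") or DEFAULT_ENDPOINT).strip()
    if "://" not in raw:
        raw = "http://" + raw  # allow bare "host:port" values
    return raw.rstrip("/")


def is_reachable(base_url, timeout=3.0):
    """Return True if an HTTP server answers at base_url within the timeout."""
    try:
        with urllib.request.urlopen(base_url, timeout=timeout) as resp:
            return resp.status == 200
    except OSError:  # covers URLError, refused connections, and timeouts
        return False
```

Calling `is_reachable(resolve_endpoint())` should return `True` when Ollama is listening at the configured address. If it returns `False` inside Docker, the host address is the first thing to check.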
Usage
Command Line
```bash
python3 ollama_ocr.py /path/to/image.jpg [model_name]
```
Examples:
```bash
python3 ollama_ocr.py receipt.png glm-ocr:latest
python3 ollama_ocr.py screenshot.jpg llava:7b
```
Python API
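```python
from ollama_ocr import ollama_ocr

# Basic OCR with the default model (glm-ocr)
result = ollama_ocr("/path/to/image.jpg")

# Specify a model explicitly
result = ollama_ocr("/path/to/image.jpg", "glm-ocr:latest")
print(result)
```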
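The source of `ollama_ocr.py` is not shown here, so the following is only a sketch of how such a function could talk to Ollama, assuming it uses the `/api/generate` endpoint with a base64-encoded image. The names `build_payload` and `ocr_via_ollama`, the prompt text, and the 120-second timeout are all illustrative choices, not the skill's actual implementation.

```python
import base64
import json
import urllib.request

OLLAMA_HOST = "http://172.17.0.2:11434"  # default endpoint from this document


def build_payload(image_bytes, model="glm-ocr:latest",
                  prompt="Extract all text from this image."):
    """Build the JSON body for Ollama's /api/generate endpoint."""
    return {
        "model": model,
        "prompt": prompt,  # hypothetical prompt; tune it per model
        # Ollama expects images as base64-encoded strings.
        "images": [base64.b64encode(image_bytes).decode("ascii")],
        "stream": False,  # ask for one JSON object instead of a stream
    }


def ocr_via_ollama(image_path, model="glm-ocr:latest", host=OLLAMA_HOST,
                   timeout=120.0):
    """Send one image to Ollama and return the generated text."""
    with open(image_path, "rb") as f:
        payload = build_payload(f.read(), model=model)
    request = urllib.request.Request(
        host.rstrip("/") + "/api/generate",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(request, timeout=timeout) as resp:
        body = json.loads(resp.read().decode("utf-8"))
    return body.get("response", "").strip()
```

Keeping payload construction separate from the network call makes the request easy to inspect or log without contacting the server.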
Example Prompts to Activate This Skill
- "识别这张图片里的文字" ("Recognize the text in this image")
- "帮我 OCR 一下这个截图" ("Run OCR on this screenshot for me")
- "Extract text from this image"
- "What text is in this screenshot?"
Notes
- The image path must be absolute or relative to the script's location
- For large images, consider resizing first to avoid timeouts
- `glm-ocr` works best for Chinese text
- Some models have output quirks (e.g., `glm-ocr` occasionally repeats itself)
- The first call may be slow if the model is not yet loaded into memory
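Two of the notes above lend themselves to small helpers. The sketch below resolves a relative image path against a base directory and collapses back-to-back duplicate lines in model output. Both function names are hypothetical, and the second assumes the repetition shows up as identical consecutive lines, which may not match every failure mode.

```python
from pathlib import Path


def resolve_image_path(image_path, base_dir):
    """Return an absolute path; relative paths are resolved against base_dir."""
    path = Path(image_path).expanduser()
    if not path.is_absolute():
        path = Path(base_dir) / path
    return path.resolve()


def collapse_repeated_lines(text):
    """Drop non-empty lines identical to the line immediately before them."""
    kept = []
    for line in text.splitlines():
        if kept and line.strip() and line.strip() == kept[-1].strip():
            continue  # same non-empty line repeated back to back
        kept.append(line)
    return "\n".join(kept)
```

Passing `Path(__file__).parent` as `base_dir` would mirror the "relative to script location" rule. Note that `collapse_repeated_lines` also removes lines that repeat legitimately in the image (for example, two identical rows on a receipt), so apply it only when the output looks degenerate.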
Requirements
- Ollama installed and running
- At least one vision/OCR model downloaded (e.g., `ollama pull glm-ocr:latest`)
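To confirm the second requirement programmatically, one option is to ask the running server which models it has through Ollama's `/api/tags` endpoint. This is a minimal sketch; the helper names are hypothetical, and the default host is simply the endpoint given in this document.

```python
import json
import urllib.request


def installed_model_names(tags_payload):
    """Extract model names from a parsed /api/tags response."""
    return [m["name"] for m in tags_payload.get("models", []) if "name" in m]


def has_model(tags_payload, wanted):
    """Return True if `wanted` is installed; a missing tag means ':latest'."""
    if ":" not in wanted:
        wanted += ":latest"
    return wanted in installed_model_names(tags_payload)


def fetch_tags(host="http://172.17.0.2:11434", timeout=5.0):
    """Ask a running Ollama server which models it has downloaded."""
    with urllib.request.urlopen(host.rstrip("/") + "/api/tags",
                                timeout=timeout) as resp:
        return json.loads(resp.read().decode("utf-8"))
```

For example, `has_model(fetch_tags(), "glm-ocr")` indicates whether `ollama pull glm-ocr:latest` still needs to be run.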