vlm-image-helper: Visual Image Helper

Visual inspection helper for VLM and OCR workflows. Use when an agent needs to help a vision model see an image more clearly before re-analysis: rotate misaligned or sideways text, crop to a relevant region, zoom in on small details, enhance readability, or convert an image for re-input. Trigger especially when the model cannot confidently read text, cannot tell similar characters apart (such as O/0 or I/l/1), says the image is unclear, needs to inspect only one area of the image, or would benefit from a s

Author: admin | Source: ClawHub

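VLM Image Helper

Treat this skill as a visual aid for the model, not as a general-purpose image editor.

Use scripts/image_helper.py to create a clearer intermediate image, then run the analysis again on that result.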

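Core Workflow

1. Start from the original image path, a raw base64 string, or a data URI.
2. Apply the smallest transformation most likely to resolve the ambiguity.
3. Prefer semantic crop presets over manual coordinates unless the exact region is already known.
4. Return the processed image as a file or as base64, then re-read that result.
5. If the image is still unclear, run one more iteration with a tighter crop or a stronger zoom instead of stacking many edits in a single pass.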
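The loop described by these steps can be sketched roughly as follows. This is a minimal sketch, not the skill's own implementation: `refine_until_clear` and the `is_clear` callback are hypothetical names, with `is_clear` standing in for whatever re-analysis the vision model performs. The sketch assumes each pass restarts from the original image with a stronger setting, which is one reading of "do not stack edits". The flags and preset names are the ones that appear in this document.

```python
import subprocess
import sys
from typing import Callable, Sequence

SCRIPT = "scripts/image_helper.py"  # script path as given in this document


def refine_until_clear(
    source: str,
    passes: Sequence[Sequence[str]],
    is_clear: Callable[[str], bool],
    run: Callable[..., object] = subprocess.run,
) -> str:
    """Try one small transformation per pass and stop at the first clear result.

    `is_clear` is a hypothetical callback: it receives an image path and
    reports whether the model could read the image confidently.
    """
    if is_clear(source):
        return source  # no transformation needed
    result = source
    for index, flags in enumerate(passes, start=1):
        result = f"pass{index}.png"
        # Each pass starts again from the original image, so edits never stack.
        run([sys.executable, SCRIPT, source, *flags, "-o", result], check=True)
        if is_clear(result):
            break
    return result
```

A caller might pass `[["--crop-preset", "bottom-right", "--scale-preset", "x2"], ["--crop-preset", "bottom-right", "--scale-preset", "x3"]]` as `passes`, so the second pass only strengthens the zoom on the same region.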

Quick Commands

Rotate sideways text:

python scripts/image_helper.py image.png --rotate 90 -o rotated.png

Crop a likely region and zoom in:

python scripts/image_helper.py image.png --crop-preset bottom-right --scale-preset x3 -o detail.png

Improve low-contrast text:

python scripts/image_helper.py image.png --auto-enhance -o enhanced.png

Convert an existing file path directly to base64:

python scripts/image_helper.py image.png --base64
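When an agent drives the script from Python instead of a shell, the same commands can be assembled as argument lists. The sketch below is illustrative only: `build_command` and its keyword names are hypothetical, and it covers just the flags shown in the quick commands above.

```python
import sys
from typing import List, Optional

SCRIPT = "scripts/image_helper.py"  # script path as given in this document


def build_command(
    image: str,
    *,
    rotate: Optional[int] = None,
    crop_preset: Optional[str] = None,
    scale_preset: Optional[str] = None,
    auto_enhance: bool = False,
    output: Optional[str] = None,
    as_base64: bool = False,
) -> List[str]:
    """Assemble an argument list using only flags shown in the quick commands."""
    cmd = [sys.executable, SCRIPT, image]
    if rotate is not None:
        cmd += ["--rotate", str(rotate)]
    if crop_preset:
        cmd += ["--crop-preset", crop_preset]
    if scale_preset:
        cmd += ["--scale-preset", scale_preset]
    if auto_enhance:
        cmd.append("--auto-enhance")
    if output:
        cmd += ["-o", output]
    if as_base64:
        cmd.append("--base64")
    return cmd
```

Passing the resulting list to `subprocess.run(cmd, check=True)` avoids shell quoting problems with paths that contain spaces, and `check=True` raises an error if the script exits with a non-zero status.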

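Choose the Next Action

- Text is sideways or upside down: use --rotate.
- Only one region matters: use --crop-preset first, then add --scale-preset.
- Small text or icons are hard to make out: use --scale-preset x2 or x3.
- Contrast is weak or edges are blurry: use --auto-enhance, or tune --contrast and --sharpness manually.
- Another tool needs inline image data rather than a file path: add --base64.
- The source image arrives as raw base64 or a data URI: use --input-mode auto, or force --input-mode base64 / data-uri.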
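The decision list above amounts to a lookup from an observed symptom to a small set of flags. Here is one possible sketch of that lookup. The symptom labels and the `next_action` function are hypothetical names invented for illustration, and the rotation angle of 90 is only an example, since the right angle depends on how the text is oriented.

```python
from typing import Dict, List, Optional

# Symptom labels are hypothetical names chosen for this sketch.
ACTIONS: Dict[str, List[str]] = {
    "sideways_text": ["--rotate", "90"],
    "small_detail": ["--scale-preset", "x2"],
    "low_contrast": ["--auto-enhance"],
    "needs_inline_data": ["--base64"],
}


def next_action(symptom: str, region: Optional[str] = None) -> List[str]:
    """Return the flags for one symptom; `region` is a crop preset name."""
    if symptom == "one_region":
        if region is None:
            raise ValueError("one_region needs a crop preset such as 'bottom-right'")
        # Crop first, then scale, matching the order recommended in the list.
        return ["--crop-preset", region, "--scale-preset", "x2"]
    try:
        return list(ACTIONS[symptom])
    except KeyError:
        raise ValueError(f"unknown symptom: {symptom!r}") from None
```

Returning one small flag set per symptom keeps each pass to a single transformation, in line with the core workflow.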

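Input and Output Rules

- Accepts a file path, a raw base64 string, or a data URI as input.
- Use -o to return a file, or --base64 to return inline base64.
- Pass-through output with no edits is allowed when the only goal is format conversion or path-to-base64 conversion.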
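A caller that wants to force the input mode rather than rely on `--input-mode auto` could classify the input first. The following is a rough sketch under stated assumptions and does not describe how the script itself detects input types: `input_mode_flags` is a hypothetical helper, and it returns no flag for file paths because the quick commands pass paths without `--input-mode`.

```python
import base64
import binascii
import os
from typing import List


def input_mode_flags(value: str) -> List[str]:
    """Guess which --input-mode flag, if any, fits the given input string."""
    value = value.strip()
    if value.startswith("data:"):
        return ["--input-mode", "data-uri"]
    if os.path.exists(value):
        return []  # an existing file path needs no explicit mode
    try:
        decoded = base64.b64decode(value, validate=True)
    except (binascii.Error, ValueError):
        return []  # not valid base64; treat it as a path and let the script report errors
    return ["--input-mode", "base64"] if decoded else []
```

This heuristic is imperfect: a short extension-less file name such as `abcd` is also valid base64, so checking for an existing path first matters.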

References

- Full CLI reference: references/cli-reference.md
- Crop and scale preset tables: references/presets.md

Prerequisite

If Pillow is missing, install it:

pip install Pillow

or

uv pip install Pillow
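To check for the dependency before invoking the script, an agent could run a small test such as the one below. This is a sketch only, and `pillow_available` is a hypothetical helper name. Pillow installs under the import name `PIL`, which is what the check looks for.

```python
import importlib.util


def pillow_available() -> bool:
    """Report whether the PIL package (installed by Pillow) can be imported."""
    return importlib.util.find_spec("PIL") is not None


if not pillow_available():
    print("Pillow is missing; install it with: pip install Pillow")
```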
