GLM-V Caption Skill

Generate captions for images, videos, and documents using the Zhipu GLM-V multimodal model.

When to Use

- Describe, caption, summarize, or interpret image/video/document content
- User mentions "describe this image", "caption", "summarize this video", "图片描述", "视频摘要", "文档解读", "看图说话"
- Extract visual or textual information from media files
- Compare multiple images
- User provides an image/video/file and asks what's in it

Supported Input Types

Type	Formats	Max Size	Max Count	Base64
Image	jpg, png, jpeg	5MB / 6000×6000px	50	✅
Video	mp4, mkv, mov	200MB	—	❌
File	pdf, docx, txt, xlsx, pptx, jsonl	—	50	❌

⚠️ file_url cannot be mixed with image_url or video_url in the same request.
⚠️ Videos and files must be passed as URLs. Local paths and base64 are supported for images only.
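
The limits above can be checked before a request is sent. The sketch below is a hypothetical helper, not part of the skill's script: `check_inputs` and `LIMITS` are illustrative names, and reading 5MB as 5 × 1024 × 1024 bytes is an assumption. Pixel dimensions are not checked.

```python
from pathlib import Path
from urllib.parse import urlparse

# Limits copied from the Supported Input Types table; the structure is illustrative.
LIMITS = {
    "image": {"exts": {"jpg", "jpeg", "png"}, "max_bytes": 5 * 1024 * 1024,
              "max_count": 50, "local_ok": True},
    "video": {"exts": {"mp4", "mkv", "mov"}, "max_bytes": 200 * 1024 * 1024,
              "max_count": None, "local_ok": False},
    "file": {"exts": {"pdf", "docx", "txt", "xlsx", "pptx", "jsonl"}, "max_bytes": None,
             "max_count": 50, "local_ok": False},
}


def is_url(item: str) -> bool:
    """True for http(s) URLs; anything else is treated as a local path."""
    return urlparse(item).scheme in ("http", "https")


def check_inputs(kind: str, items: list[str]) -> list[str]:
    """Return a list of problems; an empty list means the inputs look acceptable."""
    rules = LIMITS[kind]
    problems = []
    if rules["max_count"] is not None and len(items) > rules["max_count"]:
        problems.append(f"too many {kind} inputs: {len(items)} > {rules['max_count']}")
    for item in items:
        url = is_url(item)
        if not url and not rules["local_ok"]:
            problems.append(f"{item}: {kind} inputs must be URLs")
            continue
        name = urlparse(item).path if url else item
        ext = Path(name).suffix.lower().lstrip(".")
        if ext not in rules["exts"]:
            problems.append(f"{item}: unsupported {kind} format '{ext}'")
        # File size can only be checked cheaply for local files that exist.
        if not url and rules["max_bytes"] and Path(item).is_file():
            if Path(item).stat().st_size > rules["max_bytes"]:
                problems.append(f"{item}: larger than {rules['max_bytes']} bytes")
    return problems
```

Calling `check_inputs("video", ["clip.mp4"])` reports that video inputs must be URLs, which matches the second warning above.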

Resource Links

Resource	Link
Get API Key	https://bigmodel.cn/usercenter/proj-mgmt/apikeys
API Docs	Chat Completions

Prerequisites

API Key Setup (Required)

The script reads the key from the ZHIPU_API_KEY environment variable; the same key is shared with other Zhipu skills.

Get a key: visit the Zhipu Open Platform API Keys page (https://bigmodel.cn/usercenter/proj-mgmt/apikeys) to create or copy your key.

Setup options (choose one):

1. OpenClaw config (recommended): set the key in openclaw.json under skills.entries.glmv-caption.env:

    "glmv-caption": { "enabled": true, "env": { "ZHIPU_API_KEY": "your-key" } }

2. Shell environment variable: add to ~/.zshrc:

    export ZHIPU_API_KEY="your-key"

3. .env file: create .env in this skill directory:

    ZHIPU_API_KEY=your-key
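
How the script resolves the key internally is not shown here. As a rough sketch of the idea, assuming the environment variable takes precedence over a simple KEY=VALUE .env file (an assumption), a hypothetical `load_api_key` helper could look like this:

```python
import os
from pathlib import Path


def load_api_key(env_file: str = ".env", environ=os.environ) -> str:
    """Return ZHIPU_API_KEY from the environment, else from a KEY=VALUE .env file."""
    key = environ.get("ZHIPU_API_KEY", "").strip()
    if key:
        return key
    path = Path(env_file)
    if path.is_file():
        for line in path.read_text(encoding="utf-8").splitlines():
            line = line.strip()
            # Skip comments and lines that are not assignments.
            if line.startswith("#") or "=" not in line:
                continue
            name, _, value = line.partition("=")
            if name.strip() == "ZHIPU_API_KEY":
                value = value.strip().strip("\"'")  # tolerate optional quotes
                if value:
                    return value
    raise RuntimeError("ZHIPU_API_KEY is not configured")
```

Failing loudly when no key is found keeps the behavior consistent with the rule that errors are shown to the user rather than worked around.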

⛔ MANDATORY RESTRICTIONS - DO NOT VIOLATE ⛔

1. ONLY use the GLM-V API: execute the script python scripts/glmv_caption.py
2. NEVER caption media yourself: do NOT try to describe content using built-in vision or any other method
3. NEVER offer alternatives: do NOT suggest "I can try to describe it" or similar
4. IF the API fails: display the error message and STOP immediately
5. NO fallback methods: do NOT attempt captioning any other way

📋 Output Display Rules (MANDATORY)

After running the script, you must show the full raw output to the user exactly as returned. Do not summarize, truncate, or only say "generated". Users need the original model output to evaluate quality.

- Image captioning: show the full caption text
- Multiple images: show each image result
- Video/files: show the full understanding result
- If token usage is included, you may optionally display it
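
One way to honor both the restrictions and the display rules is a thin wrapper that runs the script and passes its output through untouched. This is a minimal sketch, assuming the script signals failure through a non-zero exit code (not confirmed by this document); `run_caption` is a hypothetical helper name.

```python
import subprocess
import sys


def run_caption(cmd: list[str]) -> str:
    """Run the caption command and return its stdout unmodified."""
    proc = subprocess.run(cmd, capture_output=True, text=True)
    if proc.returncode != 0:
        # Surface the error and stop; no fallback captioning is attempted.
        raise SystemExit(proc.stderr or proc.stdout or f"exit code {proc.returncode}")
    return proc.stdout
```

For example, `run_caption([sys.executable, "scripts/glmv_caption.py", "--images", "photo.jpg"])` would return the script's output, which is then printed as-is rather than summarized.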

How to Use

Caption an Image

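    python scripts/glmv_caption.py --images https://example.com/photo.jpg
    python scripts/glmv_caption.py --images /path/to/photo.png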

Caption Multiple Images

    python scripts/glmv_caption.py --images img1.jpg img2.png https://example.com/img3.jpg
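
This document does not show how a mix of local paths and URLs becomes a request. The sketch below illustrates one plausible shape, assuming an OpenAI-style content list with `image_url` parts and raw base64 for local files; the part layout, `image_part`, and `build_content` are all assumptions, not the script's confirmed behavior.

```python
import base64
from pathlib import Path


def image_part(item: str) -> dict:
    """Build one content part: URLs pass through, local files are base64-encoded."""
    if item.startswith(("http://", "https://")):
        url = item
    else:
        url = base64.b64encode(Path(item).read_bytes()).decode("ascii")
    return {"type": "image_url", "image_url": {"url": url}}


def build_content(images: list[str], prompt: str) -> list[dict]:
    """Return the image parts followed by a single text part carrying the prompt."""
    parts = [image_part(i) for i in images]
    parts.append({"type": "text", "text": prompt})
    return parts
```

Putting every image in one content list is what lets the model compare images within a single request.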

Caption a Video

    python scripts/glmv_caption.py --videos https://example.com/clip.mp4

Caption a Document

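    python scripts/glmv_caption.py --files https://example.com/report.pdf
    python scripts/glmv_caption.py --files https://example.com/doc1.docx https://example.com/doc2.txt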

Custom Prompt

    python scripts/glmv_caption.py --images photo.jpg --prompt "Describe the architectural style in detail"

Save Result

    python scripts/glmv_caption.py --images photo.jpg --output result.json

Thinking Mode

    python scripts/glmv_caption.py --images photo.jpg --thinking

CLI Reference

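    python {baseDir}/scripts/glmv_caption.py (--images IMG [IMG...] | --videos VID [VID...] | --files FILE [FILE...]) [OPTIONS]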

Parameter	Required	Description
--images, -i	One of three	Image paths or URLs (multiple allowed; base64 supported)
--videos, -v

Note: --images, --videos, and --files are mutually exclusive per API limits.
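
The mutual exclusion can be expressed directly in an argument parser. The sketch below uses Python's `argparse` with a required mutually exclusive group; it mirrors the options that appear in this document (`--images`/`-i`, `--videos`/`-v`, `--files`, `--prompt`, `--output`, `--thinking`) but is an illustration, not the script's actual parser.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    parser = argparse.ArgumentParser(prog="glmv_caption.py")
    # Exactly one media type per request, mirroring the mutual-exclusion note.
    media = parser.add_mutually_exclusive_group(required=True)
    media.add_argument("--images", "-i", nargs="+", metavar="IMG")
    media.add_argument("--videos", "-v", nargs="+", metavar="VID")
    media.add_argument("--files", nargs="+", metavar="FILE")
    parser.add_argument("--prompt")
    parser.add_argument("--output")
    parser.add_argument("--thinking", action="store_true")
    return parser
```

With this setup, passing both `--images` and `--videos` makes `argparse` print a usage error and exit before any API call is made.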

Response Format

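    {
      "success": true,
      "caption": "A photo showing a mountain landscape at sunset...",
      "usage": {
        "prompt_tokens": 128,
        "completion_tokens": 256,
        "total_tokens": 384
      }
    }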

Key fields:

- success: whether the request succeeded
- caption: the generated caption text
- usage: token usage statistics
- warning: present when content was blocked by safety review
- error: error details on failure
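
A caller can branch on these fields without altering the output shown to the user. This is a minimal sketch, assuming the response is a single JSON object with the fields named success, caption, usage, warning, and error, and that a warning should be reported even when success is true; `classify_result` is a hypothetical helper.

```python
import json


def classify_result(raw: str) -> str:
    """Return 'filtered', 'failed', or 'ok' for one JSON response string."""
    result = json.loads(raw)
    if "warning" in result:
        # Content was blocked by safety review.
        return "filtered"
    if not result.get("success"):
        return "failed"
    return "ok"
```

The function only inspects the response; the raw text is still what gets displayed.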

Error Handling

API key not configured:

    ZHIPU_API_KEY is not configured. Get an API key at: https://bigmodel.cn/usercenter/proj-mgmt/apikeys

→ Show the exact error to the user and guide them through configuration

Authentication failed (401/403): API key invalid/expired → reconfigure

Rate limit (429): Quota exhausted → inform user to wait

File not found: Local file missing → check path

Content filtered: warning field present → content blocked by safety review
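
The status-code cases above amount to a small lookup table. The sketch below paraphrases them; `STATUS_ADVICE` and `advice_for` are hypothetical names, and the default for unlisted codes follows the rule to show the error and stop.

```python
from typing import Optional

# Guidance strings paraphrase the cases listed in this section.
STATUS_ADVICE = {
    401: "API key invalid or expired: reconfigure ZHIPU_API_KEY",
    403: "API key invalid or expired: reconfigure ZHIPU_API_KEY",
    429: "Quota exhausted: wait before retrying",
}


def advice_for(status: Optional[int]) -> str:
    """Map an HTTP status code to the guidance to give the user."""
    return STATUS_ADVICE.get(status, "Show the error message to the user and stop")
```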

