Agent setup: If your agent doesn't auto-load skills (e.g. Claude Code),
see agent-compatibility.md once per session.

Qwen Vision (Image & Video Understanding)

Analyze images and videos using Qwen VL and QVQ models.
This skill is part of qwencloud/qwencloud-ai.

Skill directory

Use the files in this skill's directory to run tasks and look up details. Load the reference files on demand: only when the default path fails or you need more detail.

Location	Purpose
INLINECODE0	Image/video understanding, multi-image, thinking mode
INLINECODE1

Security

NEVER output any API key or credential in plaintext. Always use variable references ($DASHSCOPE_API_KEY in shell, os.environ["DASHSCOPE_API_KEY"] in Python). Any check or detection of credentials must be non-plaintext: report only status (e.g. "set" / "not set", "valid" / "invalid"), never the value. Never display contents of .env or config files that may contain secrets.

When the API key is not configured, NEVER ask the user to provide it directly. Instead, help create a .env file with a placeholder (DASHSCOPE_API_KEY=sk-your-key-here) and instruct the user to replace it with their actual key from the QwenCloud Console. Only write the actual key value if the user explicitly requests it.
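A minimal sketch of the placeholder step, assuming the .env file uses simple KEY=value lines. The helper name is hypothetical and it is meant to be called only after the key has been reported as "not set". It matches existing entries by key name only and never prints or returns a value.

```python
from pathlib import Path

PLACEHOLDER = "DASHSCOPE_API_KEY=sk-your-key-here"


def ensure_env_placeholder(env_path: Path) -> bool:
    """Append the placeholder line unless a DASHSCOPE_API_KEY entry already exists.

    Returns True if the placeholder was written. Existing entries are matched
    by key name only; their values are never printed or returned.
    """
    existing = env_path.read_text(encoding="utf-8") if env_path.exists() else ""
    for line in existing.splitlines():
        if line.strip().startswith("DASHSCOPE_API_KEY="):
            return False
    # Start on a fresh line so the placeholder cannot merge with a previous entry.
    prefix = "" if not existing or existing.endswith("\n") else "\n"
    with env_path.open("a", encoding="utf-8") as fh:
        fh.write(prefix + PLACEHOLDER + "\n")
    return True
```

The user then replaces sk-your-key-here with the real key from the QwenCloud Console; the agent never writes the real value unless explicitly asked.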

Key Compatibility

Scripts require a standard QwenCloud API key (sk-...). Coding Plan keys (sk-sp-...) cannot be used for direct API calls and do not support dedicated vision models (qwen3-vl-plus, qvq-max, etc.). The scripts detect sk-sp- keys at startup and print a warning. If qwencloud-ops-auth is installed, see its references/codingplan.md for full details.
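The prefix rule can be sketched as below. This is a hypothetical helper, not the scripts' actual detection code. The one subtlety is that every sk-sp- key also starts with sk-, so the more specific prefix has to be tested first. Only the category is reported, never the key.

```python
def classify_key(key: str) -> str:
    """Classify an API key by prefix without exposing its value."""
    if key.startswith("sk-sp-"):
        # Coding Plan key: no direct API calls, no dedicated vision models.
        return "coding-plan"
    if key.startswith("sk-"):
        return "standard"
    return "unknown"
```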

Model Selection

Model	Use Case
qwen3.5-plus	Preferred — unified multimodal (text+image+video). Thinking on by default.
qwen3.5-flash

1. User specified a model → use it directly.
2. Model choice depends on requirements, scenario, or pricing → consult the qwencloud-model-selector skill.
3. No signal, clear task → qwen3.5-plus. Use qwen3-vl-plus for precise localization or 3D detection.
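The default selection rules above reduce to a small decision function. This is a hypothetical sketch: the hand-off to qwencloud-model-selector is a separate skill and is not modelled here, and the model IDs are the point-in-time names from the table.

```python
from typing import Optional


def pick_model(user_model: Optional[str] = None,
               needs_localization_or_3d: bool = False) -> str:
    """Apply the default model-selection rules (hypothetical helper)."""
    if user_model:                  # an explicit user choice always wins
        return user_model
    if needs_localization_or_3d:    # precise localization or 3D detection
        return "qwen3-vl-plus"
    return "qwen3.5-plus"           # no signal, clear task
```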

⚠️ Important: The model list above is a point-in-time snapshot and may be outdated. Model availability changes frequently. Always check the official model list for the authoritative, up-to-date catalog before making model decisions.

Execution

Prerequisites

- API Key: Check that DASHSCOPE_API_KEY (or QWEN_API_KEY) is set using a non-plaintext check only (e.g. in shell: [ -n "$DASHSCOPE_API_KEY" ]; report only "set" or "not set", never the key value). If not set: run the qwencloud-ops-auth skill if available; otherwise guide the user to obtain a key from the QwenCloud Console and set it via a .env file (echo 'DASHSCOPE_API_KEY=sk-your-key-here' >> .env in the project root or current directory) or an environment variable. The script searches for .env in the current working directory and the project root. Skills may be installed independently, so do not assume qwencloud-ops-auth is present.

- Python 3.9+ (stdlib only, no pip install needed)
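For agents working from Python rather than shell, an equivalent of the [ -n ... ] check might look like this minimal sketch (function name hypothetical). It only consults the process environment; it does not reproduce the scripts' own .env lookup, whose exact search logic is not documented here.

```python
import os
from typing import Mapping, Optional

KEY_VARS = ("DASHSCOPE_API_KEY", "QWEN_API_KEY")


def api_key_status(env: Optional[Mapping[str, str]] = None) -> str:
    """Return "set" or "not set"; the key value is never returned or printed."""
    env = os.environ if env is None else env
    return "set" if any(env.get(name, "").strip() for name in KEY_VARS) else "not set"
```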

Environment Check

Before first execution, verify Python is available:

CODEBLOCK0

If python3 is not found, try python --version or py -3 --version. If Python is unavailable or below 3.9, skip to Path 2 (curl) in execution-guide.md.
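The fallback order above can be automated with a small probe. This is a sketch assuming each candidate prints a version string of the form "Python X.Y.Z"; find_python and parse_version are illustrative names, not part of the skill's scripts.

```python
import shutil
import subprocess
from typing import List, Optional, Tuple

# Candidate commands, in the order suggested above.
CANDIDATES: List[List[str]] = [["python3"], ["python"], ["py", "-3"]]


def parse_version(text: str) -> Tuple[int, int]:
    """Parse output such as "Python 3.11.4" into (3, 11)."""
    parts = text.strip().split()[-1].split(".")
    return int(parts[0]), int(parts[1])


def find_python(minimum: Tuple[int, int] = (3, 9)) -> Optional[List[str]]:
    """Return the first candidate command meeting `minimum`, else None."""
    for cmd in CANDIDATES:
        if shutil.which(cmd[0]) is None:
            continue
        result = subprocess.run(cmd + ["--version"], capture_output=True, text=True)
        # Python 3.4+ prints the version to stdout; Python 2 used stderr.
        output = result.stdout or result.stderr
        try:
            ok = result.returncode == 0 and parse_version(output) >= minimum
        except (IndexError, ValueError):
            ok = False
        if ok:
            return cmd
    return None
```

If find_python() returns None, fall back to the curl path as described above.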

Default: Run Script

Script path: Scripts are in the scripts/ subdirectory of this skill's directory (the directory containing this SKILL.md). You MUST first locate this skill's installation directory, then ALWAYS use the full absolute path to execute scripts. Do NOT assume scripts are in the current working directory. Do NOT use cd to switch directories before execution. Shared infrastructure lives in scripts/vision_lib.py.

Execution note: Run all scripts in the foreground and wait for them to exit so you can read stdout and stderr; do not run them in the background.

Discovery: Run python3 <this-skill-dir>/scripts/analyze.py --help (or reason.py, ocr.py) first to see all available arguments.
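The path and foreground rules can be sketched as an argv builder plus a blocking call. This is a minimal sketch: the helper names and the example skill directory are hypothetical, and the flags used (--request, --output, --upload-files, --print-response) are the ones this document mentions; run --help to confirm the full set.

```python
import json
import subprocess
from pathlib import Path
from typing import List, Optional


def build_command(skill_dir: str, script: str, request: dict,
                  output: Optional[str] = None, upload_files: bool = False) -> List[str]:
    """Build an argv list that calls a skill script by absolute path (no cd needed)."""
    script_path = (Path(skill_dir) / "scripts" / script).resolve()
    argv = ["python3", str(script_path), "--request", json.dumps(request, ensure_ascii=False)]
    if output:
        argv += ["--output", output]
    if upload_files:
        argv.append("--upload-files")
    argv.append("--print-response")
    return argv


def run_foreground(argv: List[str]) -> subprocess.CompletedProcess:
    # subprocess.run blocks until the script exits, so nothing runs in the background.
    return subprocess.run(argv, capture_output=True, text=True)
```

Passing argv as a list also avoids shell-quoting problems with the JSON argument.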

Script	Purpose	Default Model
INLINECODE37	Image understanding, multi-image, video, thinking mode, high-res	INLINECODE38
INLINECODE39

Input type fields (use exactly one in --request JSON):

Field	Use for	Example
INLINECODE44	Single image (URL or local path)	INLINECODE45
INLINECODE46

⚠️ Common mistake: Do NOT use "image" for video files — use "video" instead.

CODEBLOCK1

Argument	Description
INLINECODE54	JSON request body
INLINECODE55
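The exactly-one-field rule and the image/video distinction can be enforced when building the --request body. A minimal sketch, assuming the input fields are image, images (a list), and video, and using the video formats listed under Input Limits to detect videos; build_request is a hypothetical name. json.dumps guarantees correctly quoted JSON.

```python
import json
import os
from typing import List, Union

# Video container formats listed under Input Limits.
VIDEO_EXTS = {".mp4", ".avi", ".mkv", ".mov", ".flv", ".wmv"}


def build_request(prompt: str, media: Union[str, List[str]]) -> str:
    """Return a --request JSON string with exactly one input-type field."""
    if isinstance(media, list):
        field, value = "images", media
    elif os.path.splitext(media.split("?")[0])[1].lower() in VIDEO_EXTS:
        field, value = "video", media   # never send a video under "image"
    else:
        field, value = "image", media
    return json.dumps({"prompt": prompt, field: value}, ensure_ascii=False)
```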

Verify Result

- Exit code 0 + output contains valid JSON with a choices field → success
- Non-zero exit, HTTP error, empty response, or JSON with a "code"/"message" error → fail
- Post-execution check: when --output is used, verify that the response JSON file exists and contains the expected content
- MANDATORY stderr signal check: after confirming the result, scan the command's stderr for [ACTION_REQUIRED] or [UPDATE_AVAILABLE]. If either signal is present, you MUST follow the instructions in Update Check below before responding to the user.
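The success/failure rules can be sketched as a single check. This assumes stdout (or the --output file contents) is exactly the JSON response body; verify_result is a hypothetical helper, and the stderr signal scan is a separate step.

```python
import json


def verify_result(returncode: int, stdout: str) -> bool:
    """Exit code 0 and a JSON object with "choices" and no error "code" → success."""
    if returncode != 0 or not stdout.strip():
        return False
    try:
        body = json.loads(stdout)
    except json.JSONDecodeError:
        return False
    # An error payload carries "code"/"message" instead of "choices".
    return isinstance(body, dict) and "choices" in body and not body.get("code")
```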

On Failure

If scripts fail, match the error output against the diagnostic table below to determine the resolution. If no match, read execution-guide.md for alternative paths: curl commands (Path 2), code generation (Path 3), and autonomous resolution (Path 5).

If Python is not available at all → skip directly to Path 2 (curl) in execution-guide.md.

Error Pattern	Diagnosis	Resolution
INLINECODE68	Python not on PATH	Try `python` or `py -3`; install Python 3.9+ if missing
INLINECODE71

File Input

The API accepts HTTP/HTTPS URLs, Base64 data URIs, and oss:// URLs. It does not accept local file paths, but the scripts convert them automatically: pass local paths as-is, with no manual upload step.

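Large file rule: If the local file is >= 7 MB, always add --upload-files. Base64 encoding inflates size by about 33% (4 output bytes for every 3 input bytes), so a file above 7.5 MB would exceed the 10 MB API limit once encoded; the 7 MB threshold leaves some headroom below that limit. Small files (including short video clips under 7 MB) can use the default base64 path.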
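The arithmetic behind the rule, as a small sketch. It assumes "MB" means MiB (1024 × 1024 bytes); the ratio works out the same with decimal megabytes. The helper names are hypothetical.

```python
import os

MB = 1024 * 1024            # assumption: "MB" read as MiB
UPLOAD_THRESHOLD = 7 * MB   # add --upload-files at or above this size
API_LIMIT = 10 * MB


def base64_size(n_bytes: int) -> int:
    """Length of the standard (padded) Base64 encoding: 4 chars per 3-byte group."""
    return 4 * ((n_bytes + 2) // 3)


def needs_upload(path: str) -> bool:
    """True if the local file should be sent with --upload-files instead of base64."""
    return os.path.getsize(path) >= UPLOAD_THRESHOLD
```

A 7.5 MB file encodes to exactly 10 MB, so anything larger cannot fit in a base64 request.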

Method	When to use	How
Online URL	File already hosted	Pass URL directly — preferred for large files
Base64 (default)

Production: The default temporary storage has a 48h TTL and a 100 QPS upload limit, so it is not suitable for production, high-concurrency, or load-testing workloads. To use your own OSS bucket, set QWEN_TMP_OSS_BUCKET and QWEN_TMP_OSS_REGION in .env, install the SDK with pip install alibabacloud-oss-v2, and provide credentials via QWEN_TMP_OSS_AK_ID / QWEN_TMP_OSS_AK_SECRET or the standard OSS_ACCESS_KEY_ID / OSS_ACCESS_KEY_SECRET. Use a least-privilege RAM user (oss:PutObject + oss:GetObject on the target bucket only). The --upload-files flag is still required for vision scripts to trigger the upload. If qwencloud-ops-auth is installed, see its references/custom-oss.md for the full setup guide.
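A status-only check of the custom-OSS settings, in the same non-plaintext spirit as the API key check. This is a hypothetical helper that inspects a given mapping (the process environment by default), not the scripts' .env loading; either credential pair counts as configured.

```python
import os
from typing import Dict, Mapping, Optional


def oss_config_status(env: Optional[Mapping[str, str]] = None) -> Dict[str, str]:
    """Report custom-OSS settings as "set"/"not set" only, never their values."""
    env = os.environ if env is None else env

    def has(name: str) -> bool:
        return bool(env.get(name, "").strip())

    creds = (has("QWEN_TMP_OSS_AK_ID") and has("QWEN_TMP_OSS_AK_SECRET")) or (
        has("OSS_ACCESS_KEY_ID") and has("OSS_ACCESS_KEY_SECRET"))
    return {
        "bucket": "set" if has("QWEN_TMP_OSS_BUCKET") else "not set",
        "region": "set" if has("QWEN_TMP_OSS_REGION") else "not set",
        "credentials": "set" if creds else "not set",
    }
```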

Input from Other Skills

When the input file comes from another skill's output (e.g., image-gen, video-gen):

- Pass the URL directly (e.g., "image": "<image_url from image-gen>"); do NOT download the URL first
- Downloading and re-passing it as a local path wastes bandwidth and triggers unnecessary base64 encoding or OSS upload
- All URL types are supported: https://, oss://, INLINECODE107
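Deciding whether a value should be forwarded untouched can be done by URL scheme. A minimal sketch; the scheme set is taken from the formats named under File Input (HTTP/HTTPS, oss://, Base64 data URIs) and is an assumption, not the scripts' own list.

```python
from urllib.parse import urlparse

# Schemes the API accepts directly, per the File Input section.
REMOTE_SCHEMES = {"http", "https", "oss", "data"}


def is_remote(value: str) -> bool:
    """True if value should be passed through as-is rather than downloaded."""
    return urlparse(value).scheme.lower() in REMOTE_SCHEMES
```

Local paths (including Windows drive paths such as C:\photos\a.jpg) fall through to the scripts' base64 or upload handling.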

Thinking Mode

Model	Thinking Default	Notes
INLINECODE108 / INLINECODE109	On	Disable with `enable_thinking: false` for simple tasks.
INLINECODE111 / INLINECODE112

See visual-reasoning.md for details.
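A sketch of turning thinking off for a simple task. It assumes the scripts forward an enable_thinking key from --request to the API, as the table suggests; confirm with --help before relying on it. The helper name is hypothetical.

```python
import json


def request_with_thinking(prompt: str, image: str, simple_task: bool) -> str:
    """Build a --request body, disabling thinking for simple tasks (assumed key)."""
    body = {"prompt": prompt, "image": image}
    if simple_task:
        body["enable_thinking"] = False   # thinking is on by default for qwen3.5 models
    return json.dumps(body, ensure_ascii=False)
```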

OCR (qwen-vl-ocr)

Optimized for text extraction. Supports multiple languages, skewed images, tables, and formulas. See ocr.md for parameters and examples.

Input Limits

Images: BMP/JPEG/PNG/TIFF/WEBP/HEIC. Each side must be at least 10 px, and the aspect ratio must not exceed 200:1. Max 20 MB (URL, Qwen3.5) / 10 MB (others).

Videos: MP4/AVI/MKV/MOV/FLV/WMV. Duration 2s–2h (Qwen3.5) / 2s–10min (others). Max 2 GB (URL) / 10 MB (base64). fps range [0.1, 10], default 2.0.
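The dimension, duration, and fps limits can be pre-checked before a call. A minimal sketch of those rules only (byte-size limits are covered by the large-file rule); treating jpg and tif as aliases of JPEG and TIFF is an assumption, and the helper names are hypothetical.

```python
from typing import List

IMAGE_FORMATS = {"bmp", "jpeg", "jpg", "png", "tiff", "tif", "webp", "heic"}


def image_problems(width: int, height: int, fmt: str) -> List[str]:
    """List the documented image limits that (width, height, fmt) violates."""
    problems = []
    if fmt.lower() not in IMAGE_FORMATS:
        problems.append(f"unsupported format: {fmt}")
    if min(width, height) < 10:
        problems.append("each side must be at least 10 px")
    elif max(width, height) / min(width, height) > 200:
        problems.append("aspect ratio exceeds 200:1")
    return problems


def video_problems(duration_s: float, fps: float, qwen35: bool) -> List[str]:
    """List the documented video limits violated (duration 2 s to 2 h or 10 min, fps 0.1 to 10)."""
    problems = []
    max_duration = 2 * 3600 if qwen35 else 10 * 60
    if not 2 <= duration_s <= max_duration:
        problems.append(f"duration must be between 2 s and {max_duration} s")
    if not 0.1 <= fps <= 10:
        problems.append("fps must be in [0.1, 10]")
    return problems
```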

Error Handling

HTTP	Meaning	Action
401	Invalid or missing API key	Run qwencloud-ops-auth if available; else prompt user to set key (non-plaintext check only)
400

Output Location

Prefer the current working directory. Default subdirectory: ./output/qwencloud-vision/.

Write prohibition: NEVER write output files into this skill's installation directory or any skills/ hierarchy. All generated content must go to output/ under the current working directory or a user-specified path.
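A guard for the output rules might look like this sketch (hypothetical helper): it resolves the default ./output/qwencloud-vision/ path and refuses any target inside the skill's own directory or any directory named skills. Creating the directory is left to the caller.

```python
from pathlib import Path
from typing import Optional

DEFAULT_SUBDIR = Path("output") / "qwencloud-vision"


def resolve_output(filename: str, skill_dir: Path, cwd: Optional[Path] = None) -> Path:
    """Return an output path under ./output/qwencloud-vision/, refusing forbidden locations."""
    cwd = (cwd or Path.cwd()).resolve()
    target = (cwd / DEFAULT_SUBDIR / filename).resolve()
    skill_dir = skill_dir.resolve()
    # Forbidden: the skill's installation directory or any "skills" hierarchy.
    if target == skill_dir or skill_dir in target.parents or "skills" in target.parts:
        raise ValueError(f"refusing to write into a skills directory: {target}")
    return target
```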

Update Check (MANDATORY Post-Execution)

After script execution, check stderr for action signals. The script automatically detects update status and emits structured signals. You MUST act on them. This section also applies when the user explicitly asks to check for updates.

Signal: `[ACTION_REQUIRED] qwencloud-update-check skill is not installed`

When stderr contains this signal, you MUST present the following options to the user and wait for their response before proceeding. Do NOT silently skip this.

The update-check skill is not installed. It keeps you informed about new qwencloud/qwencloud-ai features and bug fixes.
1) Install now — I'll run the install command for you
2) Skip for now — do nothing this time
3) Never remind me — permanently suppress this prompt

After the user responds:

- User chooses 1 → run: npx skills add QwenCloud/qwencloud-ai --skill qwencloud-update-check -y, then run: INLINECODE120
- User chooses 2 → run: python3 <this-skill-scripts-dir>/gossamer.py --dismiss, then continue normally
- User chooses 3 → run: python3 <this-skill-scripts-dir>/gossamer.py --never-install, then continue normally

Signal: `[UPDATE_AVAILABLE]`

When stderr contains this signal, you MUST append a brief update notice to your response, including the version info and the update command shown in the stderr output.

No signal in stderr

If stderr contains neither [ACTION_REQUIRED] nor [UPDATE_AVAILABLE], no action is needed — the skill is installed and up to date (or cached within 24h).
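The three stderr cases map onto a simple scan. A minimal sketch with a hypothetical helper name; the comments restate the required action for each signal.

```python
from typing import List


def update_signals(stderr: str) -> List[str]:
    """Return the update-check signals present in a script's stderr."""
    signals = []
    if "[ACTION_REQUIRED]" in stderr:
        signals.append("ACTION_REQUIRED")   # present the options and wait for the user
    if "[UPDATE_AVAILABLE]" in stderr:
        signals.append("UPDATE_AVAILABLE")  # append a brief update notice to the reply
    return signals                          # empty list: no action needed
```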

Explicit user request

When the user explicitly asks to check for updates (e.g. "check for updates", "check version"):

1. Look for qwencloud-update-check/SKILL.md in sibling skill directories.
2. If found, run: python3 <qwencloud-update-check-dir>/scripts/check_update.py --print-response and report the result.
3. If not found, present the install options above.
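The sibling lookup can be sketched as follows, assuming skills are installed side by side under a common parent directory; the helper name is hypothetical. It returns the update-check skill's directory, whose scripts/check_update.py is then run.

```python
from pathlib import Path
from typing import Optional


def find_update_check_skill(this_skill_dir: Path) -> Optional[Path]:
    """Return the sibling qwencloud-update-check directory if its SKILL.md exists."""
    candidate = this_skill_dir.resolve().parent / "qwencloud-update-check" / "SKILL.md"
    return candidate.parent if candidate.is_file() else None
```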

References

- execution-guide.md: Fallback paths (curl, code generation, autonomous)
- curl-examples.md: Curl templates (base64, multi-image, video, OCR)
- api-guide.md: API supplementary guide
- visual-reasoning.md: QVQ visual reasoning guide
- ocr.md: Qwen-VL-OCR text extraction guide
- sources.md: Official documentation URLs
