Category: provider

Model Studio Qwen VL (Image Understanding)

Validation

CODEBLOCK0

Pass criteria: command exits 0 and output/aliyun-qwen-vl/validate.txt is generated.

Output And Evidence

- Save raw model responses and normalized extraction results to output/aliyun-qwen-vl/.
Include input image reference and prompt for traceability.

Use Qwen VL models for image input + text output understanding tasks via DashScope compatible-mode API.

Prerequisites

- Install dependencies (recommended in a venv):

CODEBLOCK1

- Set DASHSCOPE_API_KEY in environment, or add dashscope_api_key to ~/.alibabacloud/credentials.

Critical model names

Prefer the Qwen3 VL family:

- INLINECODE5
INLINECODE6

When you need explicit "latest" routing or reproducible snapshots, use supported aliases/snapshots from the official model list, such as:

- INLINECODE7
INLINECODE8
INLINECODE9
INLINECODE10

Legacy names still seen in some workloads:

- INLINECODE11
INLINECODE12

For OCR-specialized extraction, prefer skills/ai/multimodal/aliyun-qwen-ocr/ instead of using the general VL skill.

Normalized interface (multimodal.chat)

Request

- prompt (string, required): user question/instruction about image.
INLINECODE15 (string, required): HTTPS URL, local path, or data: URL.
INLINECODE17 (string, optional): default qwen3-vl-plus.
INLINECODE19 (int, optional): default 512.
INLINECODE21 (float, optional): default 0.2.
INLINECODE23 (string, optional): auto/low/high, default auto.
INLINECODE28 (bool, optional): return JSON-only response when possible.
INLINECODE29 (object, optional): JSON Schema for structured extraction.
INLINECODE30 (int, optional): retry count for 429/5xx, default 2.
INLINECODE33 (float, optional): exponential backoff base seconds, default 1.5.

Response

- text (string): primary model answer.
INLINECODE36 (string): model actually used.
INLINECODE37 (object): token usage if returned by backend.

Quickstart

CODEBLOCK2

Using local image:

CODEBLOCK3

Structured extraction (JSON mode):

CODEBLOCK4

Structured extraction (JSON Schema):

CODEBLOCK5

cURL (compatible mode)

CODEBLOCK6

Output location

- If --output is set, JSON response is saved to that file.
Default output dir convention: output/aliyun-qwen-vl/.

Smoke test

CODEBLOCK7

Error handling

Error	Likely cause	Action
401/403	Missing or invalid key	Check `DASHSCOPE_API_KEY` and account permissions.
400

Operational guidance

- For stable production behavior, pin snapshot model IDs instead of pure -latest.
Compress very large images before upload to reduce latency and cost.
Add explicit extraction constraints in prompt (fields, JSON shape, language).
For OCR-like output, ask for confidence notes and unresolved text markers.

Workflow

1) Confirm user intent, region, identifiers, and whether the operation is read-only or mutating.
2) Run one minimal read-only query first to verify connectivity and permissions.
3) Execute the target operation with explicit parameters and bounded scope.
4) Verify results and save output/evidence files.

References

- Source list: INLINECODE43
API notes: INLINECODE44

技能名称: aliyun-qwen-vl
详细描述:
分类: 提供者

Model Studio Qwen VL（图像理解）

验证

bash
mkdir -p output/aliyun-qwen-vl
python -m pycompile skills/ai/multimodal/aliyun-qwen-vl/scripts/analyzeimage.py && echo pycompileok > output/aliyun-qwen-vl/validate.txt

通过标准：命令退出码为 0 且已生成 output/aliyun-qwen-vl/validate.txt。

输出与证据

- 将原始模型响应和标准化提取结果保存到 output/aliyun-qwen-vl/。
包含输入图像引用和提示词，以便追溯。

通过 DashScope 兼容模式 API，使用 Qwen VL 模型执行图像输入 + 文本输出的理解任务。

前提条件

- 安装依赖（建议在虚拟环境中进行）：

bash
python3 -m venv .venv
. .venv/bin/activate
python -m pip install requests

- 在环境中设置 DASHSCOPEAPIKEY，或将 dashscopeapikey 添加到 ~/.alibabacloud/credentials。

关键模型名称

优先使用 Qwen3 VL 系列：

- qwen3-vl-plus
qwen3-vl-flash

当需要明确的“最新”路由或可复现的快照时，请使用官方模型列表中支持的别名/快照，例如：

- qwen3-vl-plus-latest
qwen3-vl-plus-2025-12-19
qwen3-vl-flash-2026-01-22
qwen3-vl-flash-latest

某些工作负载中仍可见的旧名称：

- qwen-vl-max-latest
qwen-vl-plus-latest

对于 OCR 专用提取，建议使用 skills/ai/multimodal/aliyun-qwen-ocr/，而非通用 VL 技能。

标准化接口（multimodal.chat）

请求

- prompt（字符串，必填）：用户关于图像的提问/指令。
image（字符串，必填）：HTTPS URL、本地路径或 data: URL。
model（字符串，可选）：默认为 qwen3-vl-plus。
maxtokens（整数，可选）：默认为 512。
temperature（浮点数，可选）：默认为 0.2。
detail（字符串，可选）：auto/low/high，默认为 auto。
jsonmode（布尔值，可选）：尽可能返回纯 JSON 响应。
schema（对象，可选）：用于结构化提取的 JSON Schema。
maxretries（整数，可选）：针对 429/5xx 的重试次数，默认为 2。
retrybackoff_s（浮点数，可选）：指数退避的基础秒数，默认为 1.5。

响应

- text（字符串）：模型的主要回答。
model（字符串）：实际使用的模型。
usage（对象）：如果后端返回，则为令牌使用情况。

快速开始

bash
python skills/ai/multimodal/aliyun-qwen-vl/scripts/analyze_image.py \
--request {prompt:总结此图像中的主要内容,image:https://example.com/demo.jpg} \
--print-response

使用本地图像：

bash
python skills/ai/multimodal/aliyun-qwen-vl/scripts/analyze_image.py \
--request {prompt:从图像中提取关键信息,image:./samples/invoice.png,model:qwen3-vl-plus} \
--print-response

结构化提取（JSON 模式）：

bash
python skills/ai/multimodal/aliyun-qwen-vl/scripts/analyze_image.py \
--request {prompt:提取字段：标题、金额、日期,image:./samples/invoice.png} \
--json-mode \
--print-response

结构化提取（JSON Schema）：

bash
python skills/ai/multimodal/aliyun-qwen-vl/scripts/analyze_image.py \
--request {prompt:提取发票字段,image:./samples/invoice.png} \
--schema skills/ai/multimodal/aliyun-qwen-vl/references/examples/invoice.schema.json \
--print-response

cURL（兼容模式）

bash
curl -sS https://dashscope.aliyuncs.com/compatible-mode/v1/chat/completions \
-H Authorization: Bearer $DASHSCOPEAPIKEY \
-H Content-Type: application/json \
-d {
model:qwen3-vl-plus,
messages:[
{
role:user,
content:[
{type:imageurl,imageurl:{url:https://example.com/demo.jpg}},
{type:text,text:描述此图像并列出可执行的操作}
]
}
],
max_tokens:512,
temperature:0.2
}

输出位置

- 如果设置了 --output，JSON 响应将保存到该文件。
默认输出目录约定：output/aliyun-qwen-vl/。

冒烟测试

bash
python tests/ai/multimodal/aliyun-qwen-vl-test/scripts/smoketestqwen_vl.py \
--image ./tmp/vltestcat.png

错误处理

错误	可能原因	操作
401/403	密钥缺失或无效	检查 DASHSCOPEAPIKEY 和账户权限。
400

操作指南

- 对于稳定的生产行为，请固定使用快照模型 ID，而非纯 -latest。
在上传前压缩非常大的图像，以减少延迟和成本。
在提示词中添加明确的提取约束（字段、JSON 形状、语言）。
对于类似 OCR 的输出，要求提供置信度说明和未解析文本标记。

工作流程

1) 确认用户意图、区域、标识符，以及操作是只读还是变更。
2) 首先运行一个最小的只读查询，以验证连接性和权限。
3) 使用明确的参数和有限的范围执行目标操作。
4) 验证结果并保存输出/证据文件。

参考资料

- 来源列表：references/sources.md
API 说明：references/api_reference.md

aliyun-qwen-vl阿里云Qwen图像理解

aliyun-qwen-vl

Model Studio Qwen VL (Image Understanding)

Validation

Output And Evidence

Prerequisites

Critical model names

Normalized interface (multimodal.chat)

Request

Response

Quickstart

cURL (compatible mode)

Output location

Smoke test

Error handling

Operational guidance

Workflow

References

Model Studio Qwen VL（图像理解）

验证

输出与证据

前提条件

关键模型名称

标准化接口（multimodal.chat）

请求

响应

快速开始

cURL（兼容模式）

输出位置

冒烟测试

错误处理

操作指南

工作流程

参考资料

标签

通过对话安装

方式一：安装 SkillHub 和技能

方式二：设置 SkillHub 为优先技能安装源

通过命令行安装

下载

aliyun-qwen-vl阿里云Qwen图像理解

aliyun-qwen-vl

Model Studio Qwen VL (Image Understanding)

Validation

Output And Evidence

Prerequisites

Critical model names

Normalized interface (multimodal.chat)

Request

Response

Quickstart

cURL (compatible mode)

Output location

Smoke test

Error handling

Operational guidance

Workflow

References

Model Studio Qwen VL（图像理解）

验证

输出与证据

前提条件

关键模型名称

标准化接口（multimodal.chat）

请求

响应

快速开始

cURL（兼容模式）

输出位置

冒烟测试

错误处理

操作指南

工作流程

参考资料

标签

通过对话安装

方式一：安装 SkillHub 和技能

方式二：设置 SkillHub 为优先技能安装源

通过命令行安装

下载

相关推荐

self-improvement

self-improvement

self-improvement

self-improvement