Category: provider

Model Studio Qwen OCR

Validation

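```bash
mkdir -p output/aliyun-qwen-ocr
python -m py_compile skills/ai/multimodal/aliyun-qwen-ocr/scripts/prepare_ocr_request.py && echo py_compile_ok > output/aliyun-qwen-ocr/validate.txt
```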

Pass criteria: command exits 0 and output/aliyun-qwen-ocr/validate.txt is generated.

Output And Evidence

- Save request payloads, the selected OCR task name, and normalized output expectations under output/aliyun-qwen-ocr/.
- Keep the exact model, image source, and task configuration with each saved run.
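
The evidence rule above can be sketched as a small helper that writes one JSON record per run. This is a minimal sketch, not part of the skill's scripts: the function name `save_run_evidence`, the record keys, and the timestamped file name are assumptions chosen for illustration.

```python
import json
import time
from pathlib import Path


def save_run_evidence(model, image, task=None, task_config=None, prompt=None,
                      base_dir="output/aliyun-qwen-ocr"):
    """Write one JSON evidence record for a run and return the file path."""
    out_dir = Path(base_dir)
    out_dir.mkdir(parents=True, exist_ok=True)
    record = {
        "model": model,              # exact model string used for the run
        "image": image,              # image source (URL, local path, or data: URL)
        "task": task,                # built-in OCR task name, if any
        "task_config": task_config,  # task configuration, if any
        "prompt": prompt,            # custom prompt, if any
        "saved_at": time.strftime("%Y-%m-%dT%H:%M:%SZ", time.gmtime()),
    }
    # Hypothetical naming scheme: a nanosecond timestamp keeps runs from colliding.
    path = out_dir / f"run-{time.time_ns()}.json"
    path.write_text(json.dumps(record, ensure_ascii=False, indent=2), encoding="utf-8")
    return path
```

Keeping the model, image source, and task configuration in the same file as the payload means a run can be replayed later without guessing which snapshot produced it.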

Use Qwen OCR when the task is primarily text extraction or document structure parsing rather than broad visual reasoning.

Critical model names

Use one of these exact model strings:

- qwen-vl-ocr
- qwen-vl-ocr-latest
- qwen-vl-ocr-2025-11-20
- qwen-vl-ocr-2025-08-28
- qwen-vl-ocr-2025-04-13
- qwen-vl-ocr-2024-10-28

Selection guidance:

- Use qwen-vl-ocr for the stable channel.
- Use qwen-vl-ocr-latest only when you explicitly want the newest OCR behavior.
- Pin qwen-vl-ocr-2025-11-20 when you need reproducible document parsing based on the Qwen3-VL OCR upgrade.
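
The selection guidance can be expressed as a small lookup. This is a sketch only: `choose_model` and `check_model` are hypothetical helper names, and giving `reproducible` priority over `newest` is an assumption.

```python
# Exact model strings listed in this document.
ALLOWED_MODELS = (
    "qwen-vl-ocr",
    "qwen-vl-ocr-latest",
    "qwen-vl-ocr-2025-11-20",
    "qwen-vl-ocr-2025-08-28",
    "qwen-vl-ocr-2025-04-13",
    "qwen-vl-ocr-2024-10-28",
)


def choose_model(reproducible=False, newest=False):
    """Map the selection guidance to a model string."""
    if reproducible:
        return "qwen-vl-ocr-2025-11-20"  # pinned snapshot
    if newest:
        return "qwen-vl-ocr-latest"      # newest OCR behavior
    return "qwen-vl-ocr"                 # stable channel


def check_model(name):
    """Reject any string that is not one of the exact model names."""
    if name not in ALLOWED_MODELS:
        raise ValueError(f"unknown model {name!r}")
    return name
```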

Prerequisites

- Install dependencies (recommended in a venv):

```bash
python3 -m venv .venv
. .venv/bin/activate
python -m pip install requests
```

- Set DASHSCOPE_API_KEY in your environment, or add dashscope_api_key to ~/.alibabacloud/credentials.
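
One way to resolve the key in that order is sketched below. The layout of the credentials file is not specified here, so the sketch assumes an INI-style file with at least one section header (for example `[default]`). The function name `load_api_key` is hypothetical.

```python
import configparser
import os
from pathlib import Path


def load_api_key(env=None, credentials_path=None):
    """Return the API key from the environment, else from the credentials file."""
    env = os.environ if env is None else env
    key = env.get("DASHSCOPE_API_KEY")
    if key:
        return key

    path = (Path(credentials_path) if credentials_path
            else Path.home() / ".alibabacloud" / "credentials")
    parser = configparser.ConfigParser()
    parser.read(path, encoding="utf-8")  # a missing file is silently skipped
    # Assumption: the key may live in any section of an INI-style file.
    for section in parser.sections():
        if parser.has_option(section, "dashscope_api_key"):
            return parser.get(section, "dashscope_api_key")
    raise RuntimeError(
        "Set DASHSCOPE_API_KEY or add dashscope_api_key to ~/.alibabacloud/credentials"
    )
```

Checking the environment first lets a CI job override a developer's local credentials file without editing it.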

Normalized interface (ocr.extract)

Request

- image (string, required): HTTPS URL, local path, or data: URL.
- model (string, optional): default qwen-vl-ocr.
- prompt (string, optional): use when you want custom extraction instructions.
- task (string, optional): built-in OCR task.
- task_config (object, optional): configuration for the built-in task, such as extraction fields.
- enable_rotate (bool, optional): default false.
- min_pixels (int, optional)
- max_pixels (int, optional)
- max_tokens (int, optional)
- temperature (float, optional): keep at the default or a low value.

Response

- text (string): extracted text, or structured Markdown/HTML-style output.
- model (string)
- usage (object, optional)
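
The request shape above can be assembled with a plain function. This is a minimal sketch of the normalized interface, not the implementation in the skill's script: `build_request` is a hypothetical name, and omitting unset optional fields is a design assumption.

```python
def build_request(image, model="qwen-vl-ocr", prompt=None, task=None,
                  task_config=None, enable_rotate=False, min_pixels=None,
                  max_pixels=None, max_tokens=None, temperature=None):
    """Build a normalized ocr.extract request dict with the documented defaults."""
    if not isinstance(image, str) or not image.strip():
        raise ValueError("image is required (HTTPS URL, local path, or data: URL)")

    request = {
        "image": image,
        "model": model,                        # defaults to qwen-vl-ocr
        "enable_rotate": bool(enable_rotate),  # defaults to false
    }
    optional = {
        "prompt": prompt,
        "task": task,
        "task_config": task_config,
        "min_pixels": min_pixels,
        "max_pixels": max_pixels,
        "max_tokens": max_tokens,
        "temperature": temperature,
    }
    # Assumption: fields the caller did not set are left out of the payload.
    request.update({k: v for k, v in optional.items() if v is not None})
    return request
```

For example, `build_request("https://example.com/table.png", task="table_parsing")` yields a dict with only image, model, enable_rotate, and task.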

Built-in OCR tasks

Use one of these values in task:

- text_recognition
- key_information_extraction
- document_parsing
- table_parsing
- formula_recognition
- multi_lan
- advanced_recognition
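
Task names are easy to mistype (a dropped underscore is enough), so a client-side check before sending the request may help. The sketch below is illustrative; `check_task` is a hypothetical helper, and the suggestion uses the standard library's `difflib`.

```python
import difflib

# Built-in task values listed in this document.
BUILT_IN_TASKS = (
    "text_recognition",
    "key_information_extraction",
    "document_parsing",
    "table_parsing",
    "formula_recognition",
    "multi_lan",
    "advanced_recognition",
)


def check_task(task):
    """Return the task unchanged if it is built in, else raise with a suggestion."""
    if task in BUILT_IN_TASKS:
        return task
    hint = difflib.get_close_matches(task, BUILT_IN_TASKS, n=1)
    suffix = f"; did you mean {hint[0]!r}?" if hint else ""
    raise ValueError(f"unknown OCR task {task!r}{suffix}")
```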

Quick start

Custom prompt:

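```bash
python skills/ai/multimodal/aliyun-qwen-ocr/scripts/prepare_ocr_request.py \
  --image https://example.com/invoice.png \
  --prompt "Extract the seller name, invoice date, amount, and tax number in JSON format."
```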

Built-in task:

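```bash
python skills/ai/multimodal/aliyun-qwen-ocr/scripts/prepare_ocr_request.py \
  --image https://example.com/table.png \
  --task table_parsing \
  --model qwen-vl-ocr-2025-11-20
```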

Operational guidance

- Prefer built-in OCR tasks for standard parsing jobs because they use the official task prompts.
- For critical business fields, add downstream validation rules after OCR.
- qwen-vl-ocr and older snapshots default to 4096 max output tokens unless Alibaba Cloud approves a higher limit; qwen-vl-ocr-2025-11-20 follows the model maximum.
- Increase max_pixels only when small text is missed; doing so raises token cost.
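
A downstream validation rule for critical fields might look like the following. This is a sketch under stated assumptions: the OCR output is expected to be a JSON object, and the field names (`seller_name`, `invoice_date`, `amount`, `tax_number`) and the `YYYY-MM-DD` date format are hypothetical choices modeled on the invoice prompt in Quick start.

```python
import json
from datetime import datetime
from decimal import Decimal, InvalidOperation

REQUIRED_FIELDS = ("seller_name", "invoice_date", "amount", "tax_number")


def validate_invoice(text):
    """Return a list of problems found in the OCR text; an empty list means it passed."""
    try:
        data = json.loads(text)
    except json.JSONDecodeError as exc:
        return [f"not valid JSON: {exc.msg}"]
    if not isinstance(data, dict):
        return ["expected a JSON object"]

    problems = []
    for field in REQUIRED_FIELDS:
        value = data.get(field)
        if value is None or not str(value).strip():
            problems.append(f"missing {field}")

    date = data.get("invoice_date")
    if date:
        try:
            datetime.strptime(str(date), "%Y-%m-%d")
        except ValueError:
            problems.append("invoice_date is not YYYY-MM-DD")

    amount = data.get("amount")
    if amount not in (None, ""):
        try:
            if Decimal(str(amount)) <= 0:
                problems.append("amount must be positive")
        except InvalidOperation:
            problems.append("amount is not a number")
    return problems
```

Returning a list of problems rather than raising on the first one lets a reviewer see every failed field in a single pass.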

Output location

- Default output: output/aliyun-qwen-ocr/request.json
- Override the base directory with OUTPUT_DIR.
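
The override could be resolved as sketched below. The text does not say which path component OUTPUT_DIR replaces, so this sketch assumes it replaces the leading `output` directory and keeps the `aliyun-qwen-ocr/request.json` suffix. `request_path` is a hypothetical helper name.

```python
import os
from pathlib import Path


def request_path(env=None):
    """Return the request.json path, honoring OUTPUT_DIR when it is set."""
    env = os.environ if env is None else env
    # Assumption: OUTPUT_DIR replaces the top-level "output" directory.
    base = Path(env.get("OUTPUT_DIR") or "output")
    return base / "aliyun-qwen-ocr" / "request.json"
```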

References

- references/api_reference.md
- references/sources.md
