Category: provider
Model Studio Qwen OCR
Validation
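```bash
mkdir -p output/aliyun-qwen-ocr
python -m py_compile skills/ai/multimodal/aliyun-qwen-ocr/scripts/prepare_ocr_request.py && echo py_compile_ok > output/aliyun-qwen-ocr/validate.txt
```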
Pass criteria: command exits 0 and output/aliyun-qwen-ocr/validate.txt is generated.
Output And Evidence
- Save request payloads, the selected OCR task name, and normalized output expectations under `output/aliyun-qwen-ocr/`.
- Keep the exact model, image source, and task configuration with each saved run.
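One way to keep the model, image source, and task configuration together with each run is to write them into a single JSON record. The sketch below is only an illustration: the function name `save_run_record` and the file name `run_record.json` are hypothetical and are not part of the skill's scripts.

```python
import json
import tempfile
from pathlib import Path


def save_run_record(out_dir, model, image, task=None, task_config=None, payload=None):
    """Write one JSON record holding the settings used for an OCR run."""
    out_path = Path(out_dir)
    out_path.mkdir(parents=True, exist_ok=True)
    record = {
        "model": model,              # exact model string, e.g. a dated snapshot
        "image": image,              # image source as passed to the request
        "task": task,                # built-in OCR task name, if any
        "task_config": task_config,  # configuration for the built-in task
        "payload": payload,          # request payload as sent
    }
    target = out_path / "run_record.json"  # hypothetical file name
    target.write_text(json.dumps(record, ensure_ascii=False, indent=2), encoding="utf-8")
    return target


if __name__ == "__main__":
    # Demo in a temporary directory so nothing is left behind.
    with tempfile.TemporaryDirectory() as tmp:
        path = save_run_record(
            Path(tmp) / "output" / "aliyun-qwen-ocr",
            model="qwen-vl-ocr-2025-11-20",
            image="https://example.com/table.png",
            task="table_parsing",
        )
        print(path.read_text(encoding="utf-8"))
```

Storing the dated model string rather than an alias makes it possible to tell later which snapshot produced a given output.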
Use Qwen OCR when the task is primarily text extraction or document structure parsing rather than broad visual reasoning.
Critical model names
Use one of these exact model strings:
- `qwen-vl-ocr`
- `qwen-vl-ocr-latest`
- `qwen-vl-ocr-2025-11-20`
- `qwen-vl-ocr-2025-08-28`
- `qwen-vl-ocr-2025-04-13`
- `qwen-vl-ocr-2024-10-28`
Selection guidance:
- Use `qwen-vl-ocr` for the stable channel.
- Use `qwen-vl-ocr-latest` only when you explicitly want the newest OCR behavior.
- Pin `qwen-vl-ocr-2025-11-20` when you need reproducible document parsing based on the Qwen3-VL OCR upgrade.
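The selection guidance can be captured as a small lookup so that callers never type a model string by hand. This is a sketch only: the channel labels `stable`, `latest`, and `pinned` and the helper `choose_model` are hypothetical names introduced for illustration.

```python
# Channel labels are illustrative; the model strings come from the guidance above.
MODEL_BY_CHANNEL = {
    "stable": "qwen-vl-ocr",
    "latest": "qwen-vl-ocr-latest",
    "pinned": "qwen-vl-ocr-2025-11-20",
}


def choose_model(channel="stable"):
    """Return the exact model string for a channel label, or raise ValueError."""
    try:
        return MODEL_BY_CHANNEL[channel]
    except KeyError:
        known = ", ".join(sorted(MODEL_BY_CHANNEL))
        raise ValueError(f"unknown channel {channel!r}; expected one of: {known}") from None
```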
Prerequisites
- Install dependencies (recommended in a venv):
```bash
python3 -m venv .venv
. .venv/bin/activate
python -m pip install requests
```
- Set `DASHSCOPE_API_KEY` in the environment, or add `dashscope_api_key` to `~/.alibabacloud/credentials`.
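The lookup order described above (environment variable first, credentials file second) might be sketched as follows. The layout of `~/.alibabacloud/credentials` is not specified here, so the code assumes an INI file with a `[default]` section; that layout and the helper name `resolve_api_key` are assumptions, not the skill's documented behavior.

```python
import configparser
import os
from pathlib import Path


def resolve_api_key(credentials_path=None, environ=None):
    """Return the DashScope API key from the environment or a credentials file."""
    environ = os.environ if environ is None else environ
    key = environ.get("DASHSCOPE_API_KEY")
    if key:
        return key

    path = Path(credentials_path) if credentials_path else Path.home() / ".alibabacloud" / "credentials"
    parser = configparser.ConfigParser()
    parser.read(path, encoding="utf-8")  # a missing file simply leaves the parser empty
    # ASSUMPTION: the file is INI-formatted and the key sits in a [default] section.
    key = parser.get("default", "dashscope_api_key", fallback=None)
    if not key:
        raise RuntimeError("set DASHSCOPE_API_KEY or add dashscope_api_key to the credentials file")
    return key
```

Passing `environ` and `credentials_path` explicitly keeps the function easy to test without touching the real environment.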
Normalized interface (ocr.extract)
Request
- `image` (string, required): HTTPS URL, local path, or `data:` URL.
- `model` (string, optional): default `qwen-vl-ocr`.
- `prompt` (string, optional): use when you want custom extraction instructions.
- `task` (string, optional): built-in OCR task.
- `task_config` (object, optional): configuration for the built-in task, such as extraction fields.
- `enable_rotate` (bool, optional): default `false`.
- `min_pixels` (int, optional)
- `max_pixels` (int, optional)
- `max_tokens` (int, optional)
- `temperature` (float, optional): keep at the default or another low value.
Response
- `text` (string): extracted text or structured Markdown/HTML-style output.
- `model` (string)
- `usage` (object, optional)
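The request side of `ocr.extract` can be mirrored in a small dataclass, which documents the defaults in code and drops unset optional fields before serialization. This is a sketch of the normalized interface only: the class name `OcrRequest` and its `to_dict` method are hypothetical, and nothing here calls the service.

```python
from dataclasses import asdict, dataclass
from typing import Optional


@dataclass
class OcrRequest:
    """Normalized ocr.extract request; only `image` is required."""
    image: str                          # HTTPS URL, local path, or data: URL
    model: str = "qwen-vl-ocr"
    prompt: Optional[str] = None        # custom extraction instructions
    task: Optional[str] = None          # built-in OCR task name
    task_config: Optional[dict] = None  # configuration for the built-in task
    enable_rotate: bool = False
    min_pixels: Optional[int] = None
    max_pixels: Optional[int] = None
    max_tokens: Optional[int] = None
    temperature: Optional[float] = None

    def to_dict(self):
        """Return the request as a dict, omitting optional fields left unset."""
        if not self.image:
            raise ValueError("image is required")
        return {name: value for name, value in asdict(self).items() if value is not None}
```

For example, `OcrRequest(image="https://example.com/table.png", task="table_parsing").to_dict()` keeps `image`, `model`, `task`, and `enable_rotate` and leaves out every field that was not set.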
Built-in OCR tasks
Use one of these values in task:
- `text_recognition`
- `key_information_extraction`
- `document_parsing`
- `table_parsing`
- `formula_recognition`
- `multi_lan`
- `advanced_recognition`
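Because `task` must be one of a fixed set of strings, a check before sending the request can catch typos early. The following is a minimal sketch; the helper name `validate_task` is hypothetical, and the "did you mean" hint is a convenience added for illustration.

```python
import difflib

# The seven built-in task names accepted in `task`.
BUILT_IN_TASKS = (
    "text_recognition",
    "key_information_extraction",
    "document_parsing",
    "table_parsing",
    "formula_recognition",
    "multi_lan",
    "advanced_recognition",
)


def validate_task(task):
    """Return `task` unchanged if it is a built-in task name, else raise ValueError."""
    if not isinstance(task, str):
        raise ValueError(f"task must be a string, got {type(task).__name__}")
    if task in BUILT_IN_TASKS:
        return task
    # Suggest the closest known name, if any is reasonably similar.
    hints = difflib.get_close_matches(task, BUILT_IN_TASKS, n=1)
    suffix = f" (did you mean {hints[0]!r}?)" if hints else ""
    raise ValueError(f"unknown OCR task {task!r}{suffix}")
```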
Quick start
Custom prompt:
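```bash
python skills/ai/multimodal/aliyun-qwen-ocr/scripts/prepare_ocr_request.py \
  --image https://example.com/invoice.png \
  --prompt "Extract the seller name, invoice date, amount, and tax number in JSON format."
```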
Built-in task:
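```bash
python skills/ai/multimodal/aliyun-qwen-ocr/scripts/prepare_ocr_request.py \
  --image https://example.com/table.png \
  --task table_parsing \
  --model qwen-vl-ocr-2025-11-20
```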
Operational guidance
- Prefer built-in OCR tasks for standard parsing jobs because they use official task prompts.
- For critical business fields, add downstream validation rules after OCR.
- `qwen-vl-ocr` and older snapshots default to a maximum of 4096 output tokens unless Alibaba Cloud approves a higher limit; `qwen-vl-ocr-2025-11-20` follows the model maximum.
- Increase `max_pixels` only when small text is missed; doing so raises token cost.
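As an illustration of downstream validation for critical business fields, the sketch below checks OCR output that is expected to be a JSON object describing an invoice. The key names `seller_name`, `invoice_date`, and `amount`, the `YYYY-MM-DD` date format, and the function name are all assumptions made for this example; real rules depend on the prompt or task configuration in use.

```python
import json
from datetime import datetime
from decimal import Decimal, InvalidOperation


def validate_invoice(ocr_text):
    """Return a list of problems found in OCR output expected to be a JSON object."""
    try:
        data = json.loads(ocr_text)
    except json.JSONDecodeError as exc:
        return [f"not valid JSON: {exc.msg}"]
    if not isinstance(data, dict):
        return ["expected a JSON object"]

    problems = []
    # Hypothetical field names; adjust to whatever the extraction prompt requests.
    if not str(data.get("seller_name", "")).strip():
        problems.append("seller_name is missing")
    try:
        datetime.strptime(str(data.get("invoice_date", "")), "%Y-%m-%d")
    except ValueError:
        problems.append("invoice_date is not YYYY-MM-DD")
    try:
        if Decimal(str(data.get("amount", ""))) <= 0:
            problems.append("amount must be positive")
    except InvalidOperation:
        problems.append("amount is not a number")
    return problems
```

An empty list means every rule passed; anything else can be routed to manual review instead of being written straight into business systems.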
Output location
- Default output: `output/aliyun-qwen-ocr/request.json`
- Override the base directory with `OUTPUT_DIR`.
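How the override composes with the default path is not spelled out here. The sketch below assumes that `OUTPUT_DIR` replaces only the leading `output` component and that the `aliyun-qwen-ocr/request.json` suffix stays fixed; both the assumption and the helper name `request_output_path` are illustrative.

```python
import os
from pathlib import Path


def request_output_path(environ=None):
    """Return the request.json path, honoring OUTPUT_DIR when it is set."""
    environ = os.environ if environ is None else environ
    # ASSUMPTION: OUTPUT_DIR swaps out the default "output" base directory only.
    base = Path(environ.get("OUTPUT_DIR") or "output")
    return base / "aliyun-qwen-ocr" / "request.json"
```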
References
- `references/api_reference.md`
- `references/sources.md`
Skill name: aliyun-qwen-ocr
Detailed description:
Category: provider
Model Studio Qwen OCR
Validation
```bash
mkdir -p output/aliyun-qwen-ocr
python -m py_compile skills/ai/multimodal/aliyun-qwen-ocr/scripts/prepare_ocr_request.py && echo py_compile_ok > output/aliyun-qwen-ocr/validate.txt
```
Pass criteria: command exits 0 and output/aliyun-qwen-ocr/validate.txt is generated.
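Output And Evidence
- Save request payloads, the selected OCR task name, and normalized output expectations under `output/aliyun-qwen-ocr/`.
- Keep the exact model, image source, and task configuration with each saved run.
Use Qwen OCR when the task is primarily text extraction or document structure parsing rather than broad visual reasoning.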
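Critical model names
Use one of these exact model strings:
- `qwen-vl-ocr`
- `qwen-vl-ocr-latest`
- `qwen-vl-ocr-2025-11-20`
- `qwen-vl-ocr-2025-08-28`
- `qwen-vl-ocr-2025-04-13`
- `qwen-vl-ocr-2024-10-28`
Selection guidance:
- Use `qwen-vl-ocr` for the stable channel.
- Use `qwen-vl-ocr-latest` only when you explicitly want the newest OCR behavior.
- Pin `qwen-vl-ocr-2025-11-20` when you need reproducible document parsing based on the Qwen3-VL OCR upgrade.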
Prerequisites
- Install dependencies (recommended in a venv):
```bash
python3 -m venv .venv
. .venv/bin/activate
python -m pip install requests
```
- Set `DASHSCOPE_API_KEY` in the environment, or add `dashscope_api_key` to `~/.alibabacloud/credentials`.
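Normalized interface (ocr.extract)
Request
- `image` (string, required): HTTPS URL, local path, or `data:` URL.
- `model` (string, optional): default `qwen-vl-ocr`.
- `prompt` (string, optional): use when you want custom extraction instructions.
- `task` (string, optional): built-in OCR task.
- `task_config` (object, optional): configuration for the built-in task, such as extraction fields.
- `enable_rotate` (bool, optional): default `false`.
- `min_pixels` (int, optional)
- `max_pixels` (int, optional)
- `max_tokens` (int, optional)
- `temperature` (float, optional): keep at the default or another low value.
Response
- `text` (string): extracted text or structured Markdown/HTML-style output.
- `model` (string)
- `usage` (object, optional)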
Built-in OCR tasks
Use one of these values in `task`:
- `text_recognition`
- `key_information_extraction`
- `document_parsing`
- `table_parsing`
- `formula_recognition`
- `multi_lan`
- `advanced_recognition`
Quick start
Custom prompt:
```bash
python skills/ai/multimodal/aliyun-qwen-ocr/scripts/prepare_ocr_request.py \
  --image https://example.com/invoice.png \
  --prompt "Extract the seller name, invoice date, amount, and tax number in JSON format."
```
Built-in task:
```bash
python skills/ai/multimodal/aliyun-qwen-ocr/scripts/prepare_ocr_request.py \
  --image https://example.com/table.png \
  --task table_parsing \
  --model qwen-vl-ocr-2025-11-20
```
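Operational guidance
- Prefer built-in OCR tasks for standard parsing jobs because they use official task prompts.
- For critical business fields, add downstream validation rules after OCR.
- `qwen-vl-ocr` and older snapshots default to a maximum of 4096 output tokens unless Alibaba Cloud approves a higher limit; `qwen-vl-ocr-2025-11-20` follows the model maximum.
- Increase `max_pixels` only when small text is missed; doing so raises token cost.
Output location
- Default output: `output/aliyun-qwen-ocr/request.json`
- Override the base directory with `OUTPUT_DIR`.
References
- `references/api_reference.md`
- `references/sources.md`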