Category: provider
Model Studio Qwen ASR (Non-Realtime)
Validation
CODEBLOCK0
Pass criteria: command exits 0 and output/aliyun-qwen-asr/validate.txt is generated.
Output And Evidence
- - Store transcripts and API responses under
output/aliyun-qwen-asr/. - Keep one command log or sample response per run.
Use Qwen ASR for recorded audio transcription (non-realtime), including short audio sync calls and long audio async jobs.
Critical model names
Use one of these exact model strings:
- - INLINECODE2
- INLINECODE3
- INLINECODE4
- INLINECODE5
- INLINECODE6
Selection guidance:
- - Use
qwen3-asr-flash, qwen3-asr-flash-2026-02-10, or qwen-audio-asr for short/normal recordings (sync). - Use
qwen3-asr-flash-filetrans or qwen3-asr-flash-filetrans-2025-11-17 for long-file transcription (async task workflow).
Prerequisites
- - Install SDK dependencies (script uses Python stdlib only):
CODEBLOCK1
- - Set
DASHSCOPE_API_KEY in environment, or add dashscope_api_key to ~/.alibabacloud/credentials.
Normalized interface (asr.transcribe)
Request
- -
audio (string, required): public URL or local file path. - INLINECODE16 (string, optional): default
qwen3-asr-flash. - INLINECODE18 (array, optional): e.g.
zh, en. - INLINECODE21 (number, optional)
- INLINECODE22 (string, optional)
- INLINECODE23 (bool, optional)
- INLINECODE24 (array, optional): e.g.
sentence. - INLINECODE26 (bool, optional): default false for sync models, true for
qwen3-asr-flash-filetrans.
Response
- -
text (string): normalized transcript text. - INLINECODE29 (string, optional): present for async submission.
- INLINECODE30 (string):
SUCCEEDED or submission status. - INLINECODE32 (object): original API response.
Quick start (official HTTP API)
Sync transcription (OpenAI-compatible protocol):
CODEBLOCK2
Async long-file transcription (DashScope protocol):
CODEBLOCK3
Poll task result:
CODEBLOCK4
Local helper script
Use the bundled script for URL/local-file input and optional async polling:
CODEBLOCK5
Long-file mode:
CODEBLOCK6
Operational guidance
- - For local files, use
input_audio.data (data URI) when direct URL is unavailable. - Keep
language_hints minimal to reduce recognition ambiguity. - For async tasks, use 5-20s polling interval with max retry guard.
- Save normalized outputs under
output/aliyun-qwen-asr/transcripts/.
Output location
- - Default output: INLINECODE36
- Override base dir with
OUTPUT_DIR.
Workflow
1) Confirm user intent, region, identifiers, and whether the operation is read-only or mutating.
2) Run one minimal read-only query first to verify connectivity and permissions.
3) Execute the target operation with explicit parameters and bounded scope.
4) Verify results and save output/evidence files.
References
- - INLINECODE38
- INLINECODE39
- Realtime synthesis is provided by
skills/ai/audio/aliyun-qwen-tts-realtime/.
技能名称: aliyun-qwen-asr
详细描述:
类别: 提供者
Model Studio Qwen ASR(非实时)
验证
bash
mkdir -p output/aliyun-qwen-asr
python -m pycompile skills/ai/audio/aliyun-qwen-asr/scripts/transcribeaudio.py && echo pycompileok > output/aliyun-qwen-asr/validate.txt
通过标准:命令退出码为0,且生成了 output/aliyun-qwen-asr/validate.txt 文件。
输出与证据
- - 将转录文本和API响应存储在 output/aliyun-qwen-asr/ 目录下。
- 每次运行保留一个命令日志或示例响应。
使用Qwen ASR对录制的音频进行转录(非实时),包括短音频同步调用和长音频异步任务。
关键模型名称
使用以下精确的模型字符串之一:
- - qwen3-asr-flash
- qwen3-asr-flash-2026-02-10
- qwen-audio-asr
- qwen3-asr-flash-filetrans
- qwen3-asr-flash-filetrans-2025-11-17
选择指南:
- - 对于短/普通录音(同步),使用 qwen3-asr-flash、qwen3-asr-flash-2026-02-10 或 qwen-audio-asr。
- 对于长文件转录(异步任务工作流),使用 qwen3-asr-flash-filetrans 或 qwen3-asr-flash-filetrans-2025-11-17。
前提条件
- - 安装SDK依赖(脚本仅使用Python标准库):
bash
python3 -m venv .venv
. .venv/bin/activate
- - 在环境中设置 DASHSCOPEAPIKEY,或将 dashscopeapikey 添加到 ~/.alibabacloud/credentials 文件中。
标准化接口(asr.transcribe)
请求
- - audio(字符串,必填):公共URL或本地文件路径。
- model(字符串,可选):默认为 qwen3-asr-flash。
- languagehints(字符串数组,可选):例如 zh、en。
- samplerate(数字,可选)
- vocabularyid(字符串,可选)
- disfluencyremovalenabled(布尔值,可选)
- timestampgranularities(字符串数组,可选):例如 sentence。
- async(布尔值,可选):同步模型默认为false,qwen3-asr-flash-filetrans 默认为true。
响应
- - text(字符串):标准化后的转录文本。
- task_id(字符串,可选):异步提交时存在。
- status(字符串):SUCCEEDED 或提交状态。
- raw(对象):原始API响应。
快速开始(官方HTTP API)
同步转录(兼容OpenAI协议):
bash
curl -sS --location https://dashscope.aliyuncs.com/compatible-mode/v1/chat/completions \
--header Authorization: Bearer $DASHSCOPEAPIKEY \
--header Content-Type: application/json \
--data {
model: qwen3-asr-flash,
messages: [
{
role: user,
content: [
{
type: input_audio,
input_audio: {
data: https://dashscope.oss-cn-beijing.aliyuncs.com/audios/welcome.mp3
}
}
]
}
],
stream: false,
asr_options: {
enable_itn: false
}
}
异步长文件转录(DashScope协议):
bash
curl -sS --location https://dashscope.aliyuncs.com/api/v1/services/audio/asr/transcription \
--header Authorization: Bearer $DASHSCOPEAPIKEY \
--header X-DashScope-Async: enable \
--header Content-Type: application/json \
--data {
model: qwen3-asr-flash-filetrans,
input: {
file_url: https://dashscope.oss-cn-beijing.aliyuncs.com/audios/welcome.mp3
}
}
轮询任务结果:
bash
curl -sS --location https://dashscope.aliyuncs.com/api/v1/tasks/ \
--header Authorization: Bearer $DASHSCOPEAPIKEY
本地辅助脚本
使用捆绑脚本处理URL/本地文件输入,并可选择异步轮询:
bash
python skills/ai/audio/aliyun-qwen-asr/scripts/transcribe_audio.py \
--audio https://dashscope.oss-cn-beijing.aliyuncs.com/audios/welcome.mp3 \
--model qwen3-asr-flash \
--language-hints zh,en \
--print-response
长文件模式:
bash
python skills/ai/audio/aliyun-qwen-asr/scripts/transcribe_audio.py \
--audio https://dashscope.oss-cn-beijing.aliyuncs.com/audios/welcome.mp3 \
--model qwen3-asr-flash-filetrans \
--async \
--wait
操作指南
- - 对于本地文件,当无法直接使用URL时,请使用 inputaudio.data(数据URI)。
- 保持 languagehints 尽可能少,以减少识别歧义。
- 对于异步任务,使用5-20秒的轮询间隔,并设置最大重试保护。
- 将标准化输出保存在 output/aliyun-qwen-asr/transcripts/ 目录下。
输出位置
- - 默认输出:output/aliyun-qwen-asr/transcripts/
- 通过 OUTPUT_DIR 覆盖基础目录。
工作流程
1) 确认用户意图、区域、标识符,以及操作是只读还是修改。
2) 首先运行一个最小的只读查询,以验证连接和权限。
3) 使用明确的参数和有限的范围执行目标操作。
4) 验证结果并保存输出/证据文件。
参考资料
- - references/api_reference.md
- references/sources.md
- 实时合成由 skills/ai/audio/aliyun-qwen-tts-realtime/ 提供。