Skill name: aliyun-qwen-tts
Description:
Category: provider
Model Studio Qwen TTS
Validation
```bash
mkdir -p output/aliyun-qwen-tts
python -m py_compile skills/ai/audio/aliyun-qwen-tts/scripts/generate_tts.py && echo py_compile_ok > output/aliyun-qwen-tts/validate.txt
```
Pass criteria: the command exits 0 and output/aliyun-qwen-tts/validate.txt is generated.
Output And Evidence
- Save generated audio links, sample audio files, and request payloads to output/aliyun-qwen-tts/.
- Keep one validation log per execution.
Critical model names
Use one of the recommended models:
- qwen3-tts-flash
- qwen3-tts-instruct-flash
- qwen3-tts-instruct-flash-2026-01-26
Prerequisites
- Install the SDK (recommended in a venv to avoid PEP 668 limits):
```bash
python3 -m venv .venv
. .venv/bin/activate
python -m pip install dashscope
```
- Set DASHSCOPE_API_KEY in your environment, or add dashscope_api_key to ~/.alibabacloud/credentials (the environment variable takes precedence).
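The lookup order described above can be sketched as follows. This is a minimal illustration, not the skill's own loader: the helper name `resolve_api_key` is hypothetical, and it assumes the credentials file is INI-style with the key stored under a `[default]` section.

```python
import configparser
import os
from pathlib import Path
from typing import Optional


def resolve_api_key(credentials_path: Optional[Path] = None) -> Optional[str]:
    """Return the API key from the environment, else from the credentials file."""
    # The environment variable takes precedence over the file.
    env_key = os.environ.get("DASHSCOPE_API_KEY")
    if env_key:
        return env_key

    path = credentials_path or Path.home() / ".alibabacloud" / "credentials"
    if not path.is_file():
        return None

    # Assumption: INI format with a [default] section holding dashscope_api_key.
    parser = configparser.ConfigParser()
    parser.read(path, encoding="utf-8")
    return parser.get("default", "dashscope_api_key", fallback=None)
```

The function returns None when neither source provides a key, so the caller can fail early with a clear message instead of sending an unauthenticated request.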
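Normalized interface (tts.generate)
Request
- text (string, required)
- voice (string, required)
- language_type (string, optional; default Auto)
- instruction (string, optional; recommended for instruct models)
- stream (bool, optional; default false)
Response
- audio_url (string, when stream=false)
- audio_base64_pcm (string, when stream=true)
- sample_rate (int, 24000)
- format (string, wav or pcm depending on mode)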
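One way to make the request shape above concrete is a small dataclass that applies the documented defaults and rejects missing required fields. The class name `TtsRequest` and the method `to_call_kwargs` are hypothetical names for this sketch and are not part of the DashScope SDK.

```python
from dataclasses import dataclass
from typing import Any, Dict, Optional


@dataclass(frozen=True)
class TtsRequest:
    """Hypothetical container mirroring the tts.generate request fields."""

    text: str
    voice: str
    language_type: str = "Auto"
    instruction: Optional[str] = None
    stream: bool = False

    def __post_init__(self) -> None:
        # text and voice are the only required fields.
        if not self.text.strip():
            raise ValueError("text is required")
        if not self.voice.strip():
            raise ValueError("voice is required")

    def to_call_kwargs(self) -> Dict[str, Any]:
        """Build keyword arguments, omitting instruction when it is unset."""
        kwargs: Dict[str, Any] = {
            "text": self.text,
            "voice": self.voice,
            "language_type": self.language_type,
            "stream": self.stream,
        }
        if self.instruction:
            kwargs["instruction"] = self.instruction
        return kwargs
```

Leaving `instruction` out of the keyword arguments when it is empty keeps non-instruct calls free of a parameter they do not need.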
Quick start (Python + DashScope SDK)
```python
import os
import dashscope

# Prefer env var for auth: export DASHSCOPE_API_KEY=...
# Or use ~/.alibabacloud/credentials with dashscope_api_key under [default].
# Beijing region; for Singapore use: https://dashscope-intl.aliyuncs.com/api/v1
dashscope.base_http_api_url = "https://dashscope.aliyuncs.com/api/v1"

text = "Hello, this is a short voice line."
response = dashscope.MultiModalConversation.call(
    model="qwen3-tts-instruct-flash",
    api_key=os.getenv("DASHSCOPE_API_KEY"),
    text=text,
    voice="Cherry",
    language_type="English",
    instruction="Warm and calm tone, slightly slower pace.",
    stream=False,
)

audio_url = response.output.audio.url
print(audio_url)
```
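Streaming notes
- stream=True returns Base64-encoded PCM chunks at 24 kHz.
- Decode the chunks and play them, or concatenate them into a PCM buffer.
- The response contains finish_reason == "stop" when the stream ends.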
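The decode-and-concatenate step can be sketched with the standard library alone. The helper name `pcm_chunks_to_wav` is hypothetical, and the sketch assumes 16-bit mono samples, which the notes above do not state; only the 24 kHz rate comes from this document. How the Base64 strings are pulled out of each streamed response is left to references/api_reference.md.

```python
import base64
import io
import wave
from typing import Iterable

SAMPLE_RATE = 24000  # stated in the streaming notes


def pcm_chunks_to_wav(chunks_b64: Iterable[str], sample_rate: int = SAMPLE_RATE) -> bytes:
    """Decode Base64 PCM chunks, concatenate them, and wrap them in a WAV container."""
    pcm = bytearray()
    for chunk in chunks_b64:
        pcm.extend(base64.b64decode(chunk))

    buffer = io.BytesIO()
    with wave.open(buffer, "wb") as wav_file:
        wav_file.setnchannels(1)   # assumption: mono
        wav_file.setsampwidth(2)   # assumption: 16-bit samples
        wav_file.setframerate(sample_rate)
        wav_file.writeframes(bytes(pcm))
    return buffer.getvalue()
```

Wrapping the buffer in a WAV header is optional; a player that accepts raw PCM can consume the concatenated bytes directly.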
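Operational guidance
- Keep requests concise; split long text into multiple calls if you hit size or timeout errors.
- Use a language_type consistent with the text to improve pronunciation.
- Use instruction only when you need explicit style/tone control.
- Cache by (text, voice, language_type) to avoid repeat costs.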
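Two of the points above, caching and splitting, lend themselves to a short sketch. Both helper names are hypothetical, and the default `max_chars=300` is an arbitrary placeholder, not a documented service limit; tune it to whatever size stops triggering errors in practice.

```python
import hashlib
import re
from typing import List


def cache_key(text: str, voice: str, language_type: str = "Auto") -> str:
    """Derive a stable key from (text, voice, language_type)."""
    # A separator that cannot appear in normal text keeps the fields from colliding.
    payload = "\x1f".join((text, voice, language_type))
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()


def split_text(text: str, max_chars: int = 300) -> List[str]:
    """Greedily pack sentences into chunks of at most max_chars characters."""
    if max_chars <= 0:
        raise ValueError("max_chars must be positive")
    sentences = re.split(r"(?<=[.!?])\s+", text.strip())
    chunks: List[str] = []
    current = ""
    for sentence in sentences:
        if not sentence:
            continue
        candidate = f"{current} {sentence}" if current else sentence
        if len(candidate) <= max_chars:
            current = candidate
            continue
        if current:
            chunks.append(current)
        # A single sentence longer than the limit is cut at the limit.
        while len(sentence) > max_chars:
            chunks.append(sentence[:max_chars])
            sentence = sentence[max_chars:]
        current = sentence
    if current:
        chunks.append(current)
    return chunks
```

The key can name a cached audio file, so a repeated (text, voice, language_type) combination is served from disk instead of triggering a new call. Note that the splitter only recognizes sentence ends followed by whitespace, so text without spaces falls through to the hard cut.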
Output location
- Default output: output/aliyun-qwen-tts/audio/
- Override the base dir with OUTPUT_DIR.
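A minimal sketch of resolving that location follows. It assumes OUTPUT_DIR replaces only the leading `output` directory and that the `aliyun-qwen-tts/audio` suffix is kept; the text above does not spell this out, so check it against the skill's scripts. The function name `audio_output_dir` is hypothetical.

```python
import os
from pathlib import Path


def audio_output_dir(create: bool = False) -> Path:
    """Resolve the audio output directory, honoring the OUTPUT_DIR override."""
    # Assumption: OUTPUT_DIR swaps out the "output" base and the suffix stays fixed.
    base = Path(os.environ.get("OUTPUT_DIR") or "output")
    target = base / "aliyun-qwen-tts" / "audio"
    if create:
        target.mkdir(parents=True, exist_ok=True)
    return target
```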
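Workflow
1) Confirm user intent, region, identifiers, and whether the operation is read-only or mutating.
2) Run one minimal read-only query first to verify connectivity and permissions.
3) Execute the target operation with explicit parameters and bounded scope.
4) Verify results and save output/evidence files.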
References
- See references/api_reference.md for parameter mapping and a streaming example.
- Realtime mode is provided by skills/ai/audio/aliyun-qwen-tts-realtime/.
- Voice cloning and voice design are provided by skills/ai/audio/aliyun-qwen-tts-voice-clone/ and skills/ai/audio/aliyun-qwen-tts-voice-design/ respectively.
- Source list: references/sources.md