Qwen Audio Lab
Use this skill for text-to-speech on macOS or with Aliyun Qwen.
Choose the backend
- Use mac-say for fast local playback, notifications, and low-friction speech on a Mac.
- Use qwen-tts when the user wants better naturalness, reusable output files, custom voices, or voice cloning.
- If DASHSCOPE_API_KEY is missing, fall back to mac-say for local playback.
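The selection rules above can be sketched in a few lines of Python. This is an illustration only, not the skill's actual code: the function name choose_backend and the wants_qwen_features flag are hypothetical, and only the DASHSCOPE_API_KEY check comes from the rules themselves.

```python
import os


def choose_backend(wants_qwen_features, env=None):
    """Return "qwen-tts" or "mac-say" following the backend rules.

    wants_qwen_features is a hypothetical flag meaning the user asked for
    better naturalness, reusable files, custom voices, or voice cloning.
    """
    env = os.environ if env is None else env
    has_key = bool(env.get("DASHSCOPE_API_KEY", "").strip())
    # Qwen is chosen only when its features are wanted and the key is set;
    # in every other case fall back to local playback.
    if wants_qwen_features and has_key:
        return "qwen-tts"
    return "mac-say"
```

Passing env explicitly keeps the check easy to try without touching the real environment.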
Environment
- DASHSCOPE_API_KEY: required for Qwen synthesis and voice cloning.
- QWEN_AUDIO_REGION: optional, cn (default) or intl.
- QWEN_AUDIO_OUTPUT_DIR: optional directory for generated audio files. Defaults to ~/.openclaw/data/qwen-audio-lab/output.
- QWEN_AUDIO_STATE_DIR: optional directory for local state such as remembered voices. Defaults to ~/.openclaw/data/qwen-audio-lab/state.
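As a rough illustration of how these settings and their defaults fit together, the sketch below resolves them from an environment mapping. It is not the skill's implementation: resolve_settings is a hypothetical helper, and the underscored spellings QWEN_AUDIO_REGION, QWEN_AUDIO_OUTPUT_DIR, and QWEN_AUDIO_STATE_DIR are assumed from the names used in this document.

```python
import os
from pathlib import Path

# Default data root stated in this document.
DEFAULT_ROOT = Path("~/.openclaw/data/qwen-audio-lab")


def resolve_settings(env=None):
    """Resolve region and directories, applying the documented defaults."""
    env = os.environ if env is None else env
    region = env.get("QWEN_AUDIO_REGION") or "cn"
    if region not in ("cn", "intl"):
        raise ValueError(f"unsupported region: {region!r}")
    output_dir = Path(env.get("QWEN_AUDIO_OUTPUT_DIR") or DEFAULT_ROOT / "output")
    state_dir = Path(env.get("QWEN_AUDIO_STATE_DIR") or DEFAULT_ROOT / "state")
    return {
        "api_key": env.get("DASHSCOPE_API_KEY"),  # None means Qwen is unavailable
        "region": region,
        "output_dir": output_dir.expanduser(),
        "state_dir": state_dir.expanduser(),
    }
```

Rejecting unknown regions early is a design choice of this sketch; the real script may behave differently.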
Commands
Run all commands through:
bash
python3 ~/.openclaw/skills/qwen-audio-lab/scripts/qwen_audio.py [...]
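When the script is driven from another Python program rather than typed in a shell, the invocation could be assembled as an argument list. The helper below is a hypothetical sketch: build_command is not part of the skill, and the script path is the one used in the commands in this document.

```python
import shlex
from pathlib import Path

# Script location as used by the commands in this document.
SCRIPT = Path("~/.openclaw/skills/qwen-audio-lab/scripts/qwen_audio.py")


def build_command(subcommand, *args):
    """Return an argv list: python3 <script> <subcommand> [args...]."""
    return ["python3", str(SCRIPT.expanduser()), subcommand, *args]


cmd = build_command("narrate-text", "--text", "hello world")
# shlex.join quotes arguments so the line can be pasted into a shell.
print(shlex.join(cmd))
```

An argument list like this can be handed to subprocess.run(cmd, check=True), which avoids shell quoting problems with text that contains spaces.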
Preferred high-level commands
Use these first for most user-facing narration tasks:
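bash
python3 ~/.openclaw/skills/qwen-audio-lab/scripts/qwen_audio.py narrate-text --text "这是要转成语音的正文"
python3 ~/.openclaw/skills/qwen-audio-lab/scripts/qwen_audio.py narrate-file --text-file /path/to/script.txt
python3 ~/.openclaw/skills/qwen-audio-lab/scripts/qwen_audio.py narrate-ppt --ppt /path/to/file.pptx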
Use the older commands only when you specifically want the legacy workflow names.
Generated audio and remembered voice state now default to ~/.openclaw/data/qwen-audio-lab/ instead of the skill folder.
Local macOS speech
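bash
python3 ~/.openclaw/skills/qwen-audio-lab/scripts/qwen_audio.py mac-say \
  --text "开会了,别忘了带电脑" \
  --voice Tingting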
Qwen TTS from inline text
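bash
python3 ~/.openclaw/skills/qwen-audio-lab/scripts/qwen_audio.py qwen-tts \
  --text "你好,我是你的语音助手。" \
  --voice Cherry \
  --model qwen3-tts-flash \
  --language-type Chinese \
  --download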
Qwen TTS from a text file
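bash
python3 ~/.openclaw/skills/qwen-audio-lab/scripts/qwen_audio.py qwen-tts \
  --text-file /path/to/script.txt \
  --voice Cherry \
  --download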
Qwen TTS from stdin
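bash
cat /path/to/script.txt | python3 ~/.openclaw/skills/qwen-audio-lab/scripts/qwen_audio.py qwen-tts \
  --stdin \
  --voice Cherry \
  --download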
Clone a voice
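bash
python3 ~/.openclaw/skills/qwen-audio-lab/scripts/qwen_audio.py clone-voice \
  --audio /path/to/reference.mp3 \
  --name claw-voice-01 \
  --target-model qwen3-tts-vc-2026-01-22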
- Keep the cloning target-model aligned with the synthesis model family.
- Use a clean speech sample with minimal background noise.
- Ask before cloning a third party's voice when consent is unclear.
Design a voice from a text prompt
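bash
python3 ~/.openclaw/skills/qwen-audio-lab/scripts/qwen_audio.py design-voice \
  --prompt "沉稳的中年男性播音员,音色低沉浑厚,适合纪录片旁白。" \
  --name doc-voice-01 \
  --target-model qwen3-tts-vd-2026-01-26 \
  --preview-format wav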
Legacy command: reuse the latest cloned voice
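bash
python3 ~/.openclaw/skills/qwen-audio-lab/scripts/qwen_audio.py speak-last-cloned \
  --text "你好,这是我的声音测试。" \
  --download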
High-level narration from any text source
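bash
python3 ~/.openclaw/skills/qwen-audio-lab/scripts/qwen_audio.py narrate-text \
  --text "这是要转成语音的正文" \
  --output narration.wav
python3 ~/.openclaw/skills/qwen-audio-lab/scripts/qwen_audio.py narrate-file \
  --text-file /path/to/script.txt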
- Default voice source is last-cloned.
- Use --voice-source last-designed to use the latest designed voice instead.
- Use --voice and optionally --model to force a specific voice id and synthesis model.
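The precedence among these options can be pictured as follows. This is a sketch under assumptions, not the script's real logic: resolve_voice and the shape of the state mapping are hypothetical, and it assumes an explicit --voice always overrides --voice-source.

```python
VOICE_SOURCES = ("last-cloned", "last-designed")


def resolve_voice(state, voice=None, voice_source="last-cloned"):
    """Pick the voice id to synthesize with.

    state is a hypothetical mapping of remembered voices, for example
    {"last-cloned": "claw-voice-01", "last-designed": "doc-voice-01"}.
    """
    if voice:
        # An explicit --voice wins over any remembered voice.
        return voice
    if voice_source not in VOICE_SOURCES:
        raise ValueError(f"unknown voice source: {voice_source!r}")
    remembered = state.get(voice_source)
    if remembered is None:
        raise LookupError(f"no remembered voice for {voice_source}")
    return remembered
```

Raising LookupError when nothing is remembered gives the caller a clear point at which to ask the user for a reference recording or a design prompt.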
Legacy command: narrate PPT speaker notes with the latest cloned voice
bash
python3 ~/.openclaw/skills/qwen-audio-lab/scripts/qwen_audio.py ppt-own-voice --ppt /path/to/file.pptx
High-level PPT narration
bash
python3 ~/.openclaw/skills/qwen-audio-lab/scripts/qwen_audio.py narrate-ppt --ppt /path/to/file.pptx
- Default voice source is last-cloned.
- Use --voice-source last-designed to switch to the latest designed voice.
- Use --voice and optionally --model to force a specific voice id and synthesis model.
- Keep ppt-own-voice as the backward-compatible alias for the original workflow.
Inspect or manage remembered voices
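bash
python3 ~/.openclaw/skills/qwen-audio-lab/scripts/qwen_audio.py list-voices
python3 ~/.openclaw/skills/qwen-audio-lab/scripts/qwen_audio.py show-last-voice --kind cloned
python3 ~/.openclaw/skills/qwen-audio-lab/scripts/qwen_audio.py delete-voice --voice claw-voice-01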
Workflow rules
- Reuse an existing cloned voice before asking for a new sample.
- Ask for a reference recording if the user wants their own voice and no cloned voice exists yet.
- Prefer the narrate-* commands as the primary high-level interface for narration tasks.
- Keep speak-last-cloned and ppt-own-voice for backward compatibility with older workflows.
- Keep only final outputs by default after segmented synthesis unless the user explicitly asks to keep fragments.
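Two of these rules are simple decisions that can be written down directly. The sketch below is illustrative only: both function names are hypothetical, and cloned_voices is assumed to be a list of remembered voice names ordered from oldest to newest.

```python
def plan_own_voice(cloned_voices):
    """Decide the next step when the user wants narration in their own voice.

    Returns an (action, voice) pair. cloned_voices is assumed to be ordered
    oldest to newest, so the last entry is the most recent clone.
    """
    if cloned_voices:
        return ("reuse", cloned_voices[-1])
    # No clone exists yet, so a reference recording is needed first.
    return ("ask-for-reference-recording", None)


def fragments_to_delete(fragments, keep_fragments=False):
    """List the segment files to remove after segmented synthesis."""
    # Fragments are discarded unless the user explicitly asked to keep them.
    return [] if keep_fragments else list(fragments)
```

For example, plan_own_voice(["claw-voice-01"]) suggests reusing claw-voice-01, while an empty list leads to a request for a recording.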
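Qwen Audio Lab
Use this skill for text-to-speech on macOS or with Aliyun Qwen.
Choose the backend
- Use mac-say for fast local playback, notifications, and low-friction speech on a Mac.
- Use qwen-tts when the user wants better naturalness, reusable output files, custom voices, or voice cloning.
- If DASHSCOPE_API_KEY is missing, fall back to mac-say for local playback.
Environment
- DASHSCOPE_API_KEY: required for Qwen synthesis and voice cloning.
- QWEN_AUDIO_REGION: optional, cn (default) or intl.
- QWEN_AUDIO_OUTPUT_DIR: optional directory for generated audio files. Defaults to ~/.openclaw/data/qwen-audio-lab/output.
- QWEN_AUDIO_STATE_DIR: optional directory for local state such as remembered voices. Defaults to ~/.openclaw/data/qwen-audio-lab/state.
Commands
Run all commands through: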
bash
python3 ~/.openclaw/skills/qwen-audio-lab/scripts/qwen_audio.py [...]
Preferred high-level commands
Use these first for most user-facing narration tasks:
bash
python3 ~/.openclaw/skills/qwen-audio-lab/scripts/qwen_audio.py narrate-text --text "这是要转成语音的正文"
python3 ~/.openclaw/skills/qwen-audio-lab/scripts/qwen_audio.py narrate-file --text-file /path/to/script.txt
python3 ~/.openclaw/skills/qwen-audio-lab/scripts/qwen_audio.py narrate-ppt --ppt /path/to/file.pptx
Use the older commands only when you specifically want the legacy workflow names.
Generated audio and remembered voice state now default to ~/.openclaw/data/qwen-audio-lab/ instead of the skill folder.
Local macOS speech
bash
python3 ~/.openclaw/skills/qwen-audio-lab/scripts/qwen_audio.py mac-say \
  --text "开会了,别忘了带电脑" \
  --voice Tingting
Qwen TTS from inline text
bash
python3 ~/.openclaw/skills/qwen-audio-lab/scripts/qwen_audio.py qwen-tts \
  --text "你好,我是你的语音助手。" \
  --voice Cherry \
  --model qwen3-tts-flash \
  --language-type Chinese \
  --download
Qwen TTS from a text file
bash
python3 ~/.openclaw/skills/qwen-audio-lab/scripts/qwen_audio.py qwen-tts \
--text-file /path/to/script.txt \
--voice Cherry \
--download
Qwen TTS from stdin
bash
cat /path/to/script.txt | python3 ~/.openclaw/skills/qwen-audio-lab/scripts/qwen_audio.py qwen-tts \
--stdin \
--voice Cherry \
--download
Clone a voice
bash
python3 ~/.openclaw/skills/qwen-audio-lab/scripts/qwen_audio.py clone-voice \
  --audio /path/to/reference.mp3 \
  --name claw-voice-01 \
  --target-model qwen3-tts-vc-2026-01-22
- Keep the cloning target-model aligned with the synthesis model family.
- Use a clean speech sample with minimal background noise.
- Ask before cloning a third party's voice when consent is unclear.
Design a voice from a text prompt
bash
python3 ~/.openclaw/skills/qwen-audio-lab/scripts/qwen_audio.py design-voice \
  --prompt "沉稳的中年男性播音员,音色低沉浑厚,适合纪录片旁白。" \
  --name doc-voice-01 \
  --target-model qwen3-tts-vd-2026-01-26 \
  --preview-format wav
Legacy command: reuse the latest cloned voice
bash
python3 ~/.openclaw/skills/qwen-audio-lab/scripts/qwen_audio.py speak-last-cloned \
  --text "你好,这是我的声音测试。" \
  --download
High-level narration from any text source
bash
python3 ~/.openclaw/skills/qwen-audio-lab/scripts/qwen_audio.py narrate-text \
  --text "这是要转成语音的正文" \
  --output narration.wav
python3 ~/.openclaw/skills/qwen-audio-lab/scripts/qwen_audio.py narrate-file \
  --text-file /path/to/script.txt
- Default voice source is last-cloned.
- Use --voice-source last-designed to switch to the latest designed voice.
- Use --voice and optionally --model to force a specific voice id and synthesis model.
Legacy command: narrate PPT speaker notes with the latest cloned voice
bash
python3 ~/.openclaw/skills/qwen-audio-lab/scripts/qwen_audio.py ppt-own-voice --ppt /path/to/file.pptx
High-level PPT narration
bash
python3 ~/.openclaw/skills/qwen-audio-lab/scripts/qwen_audio.py narrate-ppt --ppt /path/to/file.pptx
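- Default voice source is last-cloned.
- Use --voice-source last-designed to switch to the latest designed voice.
- Use --voice and optionally --model to force a specific voice id and synthesis model.
- Keep ppt-own-voice as the backward-compatible alias for the original workflow.
Inspect or manage remembered voices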
bash
python3 ~/.openclaw/skills/qwen-audio-lab/scripts/qwen_audio.py list-voices
python3 ~/.openclaw/skills/qwen-audio-lab/scripts/qwen_audio.py show-last-voice --kind cloned
python3 ~/.openclaw/skills/qwen-audio-lab/scripts/qwen_audio.py delete-voice --voice claw-voice-01
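Workflow rules
- Reuse an existing cloned voice before asking for a new sample.
- Ask for a reference recording if the user wants their own voice and no cloned voice exists yet.
- Prefer the narrate-* commands as the primary high-level interface for narration tasks.
- Keep speak-last-cloned and ppt-own-voice for backward compatibility with older workflows.
- Keep only final outputs by default after segmented synthesis unless the user explicitly asks to keep fragments.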