Baidu Intelligent Cloud Speech Synthesis Skill

Triggers

Use this skill when the user mentions:

- "Convert this dialogue to audio using Baidu TTS"
- "Generate a male-female dialogue, with Duxiaoyao for the male voice and Duxiaomei for the female voice"
- "Batch process all dialogues in dialogue.txt"
- "Adjust speech rate to 7, pitch to 6"
- "Show the available voice list"
- "baidu tts", "dialogue to audio", "multi-speaker speech synthesis"
- "baidu speech synthesis", "multi-speaker dialogue", "Baidu TTS"

Chinese triggers (for Chinese users):

- "用百度TTS把这段对话转成音频"
- "生成男女对话，男声用度逍遥，女声用度小美"
- "批量处理 dialogue.txt 里的所有对话"
- "调整语速到7，音调到6"
- "查看可用的音色列表"

Overview

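This skill calls the Baidu Intelligent Cloud Speech Synthesis API to synthesize multi-speaker dialogue. The API's SSML supports only one voice per request, so multi-voice dialogue is synthesized segment by segment and then merged; single-voice SSML remains available for individual characters. The skill offers a choice of voices and adjustable speech rate, pitch, and volume. It can automatically convert a text dialogue into an audio file with a distinct voice for each character.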

Installing Dependencies

CODEBLOCK0

Environment Variables Setup

Choose one of three authentication methods:

Method 1: API Key + Secret Key (auto-token)

CODEBLOCK1
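With Method 1, the key pair is exchanged for an access token. Below is a minimal sketch of that exchange. It assumes Baidu's standard OAuth 2.0 client-credentials endpoint (`https://aip.baidubce.com/oauth/2.0/token`). `build_token_url` and `fetch_access_token` are hypothetical helpers, and the skill's own script may implement this differently (for example with `requests` rather than the standard library used here).

```python
import json
import urllib.parse
import urllib.request

# Assumption: Baidu's OAuth 2.0 endpoint for exchanging an API Key + Secret Key.
TOKEN_ENDPOINT = "https://aip.baidubce.com/oauth/2.0/token"


def build_token_url(api_key: str, secret_key: str) -> str:
    """Build the client-credentials token URL (hypothetical helper)."""
    query = urllib.parse.urlencode({
        "grant_type": "client_credentials",
        "client_id": api_key,
        "client_secret": secret_key,
    })
    return f"{TOKEN_ENDPOINT}?{query}"


def fetch_access_token(api_key: str, secret_key: str, timeout: float = 10.0) -> str:
    """Exchange the key pair for an access token.

    Raises urllib.error.HTTPError for HTTP failures and RuntimeError when the
    JSON reply carries no access_token field.
    """
    request = urllib.request.Request(
        build_token_url(api_key, secret_key), data=b"", method="POST"
    )
    with urllib.request.urlopen(request, timeout=timeout) as response:
        payload = json.loads(response.read().decode("utf-8"))
    token = payload.get("access_token")
    if not token:
        raise RuntimeError(f"token request failed: {payload}")
    return token
```

Method 2 skips this exchange, because a key starting with `1.` is already an access token.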

Method 2: Direct access_token (starts with `1.`)

CODEBLOCK2

Method 3: IAM Key (starts with `bce-v3/`)

CODEBLOCK3

Required Environment Variables

`BAIDU_API_KEY` must always be set. Whether `BAIDU_SECRET_KEY` is also needed depends on the authentication method:

Method 1: API Key + Secret Key (auto-token)

CODEBLOCK4

Method 2: Direct access_token (starts with `1.`)

CODEBLOCK5

Method 3: IAM Key (starts with `bce-v3/`)

CODEBLOCK6

The skill scripts detect the key format automatically and choose the matching authentication method. If `BAIDU_API_KEY` is not set, the skill prompts the user for it.
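The three methods are distinguished only by the key's prefix, so the detection step can be sketched in a few lines. `detect_auth_method` and its return labels are illustrative and are not taken from the skill's scripts. This version raises an error in the cases where the real script would prompt the user.

```python
import os
from typing import Optional


def detect_auth_method(api_key: Optional[str], secret_key: Optional[str] = None) -> str:
    """Classify credentials by key prefix (hypothetical helper)."""
    if not api_key:
        raise ValueError("BAIDU_API_KEY is not set")
    if api_key.startswith("bce-v3/"):
        return "iam_key"            # Method 3: no secret key needed
    if api_key.startswith("1."):
        return "access_token"       # Method 2: the key is already a token
    if not secret_key:
        raise ValueError("BAIDU_SECRET_KEY is required for API Key + Secret Key auth")
    return "api_key_secret"         # Method 1: exchange the pair for a token


def detect_from_env() -> str:
    """Read both variables from the environment and classify them."""
    return detect_auth_method(
        os.environ.get("BAIDU_API_KEY"), os.environ.get("BAIDU_SECRET_KEY")
    )
```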

Usage

1. Direct script invocation (command line)

CODEBLOCK7
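The speed, pitch, volume, format, and voice options correspond to request fields of Baidu's short-text TTS API. The following is a sketch under several assumptions, all of which should be verified against Baidu's current API reference:

- The fields are named `per`/`spd`/`pit`/`vol`/`aue`.
- `spd`, `pit`, and `vol` accept 0-15 and default to 5.
- `aue` uses the codes 3 (mp3), 4 (16 kHz PCM), 5 (8 kHz PCM), and 6 (wav).

`build_tts_params` is a hypothetical helper, not the script's actual function.

```python
import uuid

# Assumptions (verify against Baidu's API reference): the aue output codes,
# and a 0-15 range with default 5 for spd/pit/vol.
AUE_FORMATS = {3: "mp3", 4: "pcm-16k", 5: "pcm-8k", 6: "wav"}


def _check_range(name: str, value: int, low: int = 0, high: int = 15) -> int:
    """Reject values outside the assumed low-high range."""
    if not low <= value <= high:
        raise ValueError(f"{name}={value} is outside {low}-{high}")
    return value


def build_tts_params(text: str, token: str, per: int = 0, spd: int = 5,
                     pit: int = 5, vol: int = 5, aue: int = 3) -> dict:
    """Assemble the form fields for one text2audio request (hypothetical helper)."""
    if aue not in AUE_FORMATS:
        raise ValueError(f"unsupported aue={aue}; expected one of {sorted(AUE_FORMATS)}")
    return {
        "tex": text,
        "tok": token,
        "cuid": str(uuid.getnode()),  # a stable per-machine client identifier
        "ctp": 1,                     # client type; Baidu's docs give the fixed value 1
        "lan": "zh",
        "per": per,                   # voice code, e.g. 3 = Duxiaoyao
        "spd": _check_range("spd", spd),
        "pit": _check_range("pit", pit),
        "vol": _check_range("vol", vol),
        "aue": aue,
    }
```

For the request "Adjust speech rate to 7, pitch to 6", the call would be `build_tts_params(text, token, spd=7, pit=6)`.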

2. Usage in OpenClaw sessions

When the user says one of the trigger phrases above, the skill will:

1. Check the environment variable configuration.
2. Ask for the input text or file, or identify it automatically.
3. Generate SSML using the default or a user-specified voice assignment.
4. Call the Baidu API and return the audio file (which can be played automatically or saved).
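The final step, calling the Baidu API, can be sketched with the standard library (the skill itself installs `requests`). The sketch makes three assumptions:

- The endpoint is `https://tsn.baidu.com/text2audio`.
- A successful reply has a `Content-Type` starting with `audio`, and failures return JSON with `err_no`/`err_msg`.
- The 1024-byte limit is counted in GBK bytes.

Following Baidu's published sample code, `tex` is URL-encoded twice; this is also worth re-checking. All function names are hypothetical.

```python
import json
import urllib.parse
import urllib.request

TTS_ENDPOINT = "https://tsn.baidu.com/text2audio"  # assumption: short-text TTS endpoint
MAX_TEX_BYTES = 1024  # limit stated in this document; counting GBK bytes is an assumption


def check_text_length(text: str) -> int:
    """Return the text's GBK byte length, or raise if it exceeds the limit."""
    # Characters outside GBK are replaced by "?" and so count as one byte here.
    size = len(text.encode("gbk", errors="replace"))
    if size > MAX_TEX_BYTES:
        raise ValueError(f"text is {size} bytes; split it into shorter segments")
    return size


def encode_form(params: dict) -> bytes:
    """URL-encode the form; tex is quoted an extra time, as in Baidu's sample code."""
    fields = dict(params)
    fields["tex"] = urllib.parse.quote_plus(fields["tex"])
    return urllib.parse.urlencode(fields).encode("ascii")


def parse_response(content_type: str, body: bytes) -> bytes:
    """Return audio bytes, or raise with err_no/err_msg from a JSON error body."""
    if content_type.startswith("audio"):
        return body
    error = json.loads(body.decode("utf-8"))
    raise RuntimeError(f"TTS error {error.get('err_no')}: {error.get('err_msg')}")


def synthesize(params: dict, timeout: float = 30.0) -> bytes:
    """POST one single-voice request and return the audio bytes."""
    check_text_length(params["tex"])
    request = urllib.request.Request(TTS_ENDPOINT, data=encode_form(params), method="POST")
    with urllib.request.urlopen(request, timeout=timeout) as response:
        content_type = response.headers.get("Content-Type", "")
        return parse_response(content_type, response.read())
```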

File Structure

CODEBLOCK8

Technical Points

- Automatic Mode Selection: detects whether the input needs more than one voice. Because the Baidu API supports only single-voice SSML, multi-voice input defaults to segment synthesis mode.
- Segment Synthesis: splits a multi-character dialogue into single-voice segments, synthesizes each segment separately, and merges the results with ffmpeg. This works around the API's single-voice limitation. It also avoids pydub, which depends on the `audioop` module removed in Python 3.13.
- Single-Voice SSML: supports single-voice SSML (tex_type=3) for finer control over an individual character's delivery.
- Automatic Voice Assignment: the default mapping is "老王" (Lao Wang) → Duxiaoyao (3), "张经理" (Manager Zhang) → Duxiaoyu (1), "小李" (Xiao Li) → Duyaya (4). Override it with --map.
- Error Handling: clear messages for network timeouts, exhausted quota, audio merge failures, and similar problems.
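The segmenting step and the `--map` override can be sketched as follows. The sketch assumes two things:

- Each dialogue line has the form `Speaker: text` or `Speaker：text`.
- `--map` entries look like `name:code`.

Speakers without a mapping fall back to voice 0; this is also an assumption. `parse_map_args` and `split_segments` are hypothetical names. Consecutive lines that use the same voice are merged into one segment, so each resulting segment needs exactly one single-voice request.

```python
import re
from typing import Dict, List, Tuple

# Default mapping stated in this document: character name -> Baidu voice code.
DEFAULT_VOICE_MAP: Dict[str, int] = {"老王": 3, "张经理": 1, "小李": 4}
FALLBACK_VOICE = 0  # assumption: unmapped speakers use voice 0

# Assumption: dialogue lines look like "Speaker: text" (ASCII or full-width colon).
LINE_PATTERN = re.compile(r"^\s*([^:：]+?)\s*[:：]\s*(.+?)\s*$")


def parse_map_args(pairs: List[str]) -> Dict[str, int]:
    """Turn ["小明:1", "小红:0"] into {"小明": 1, "小红": 0}."""
    mapping: Dict[str, int] = {}
    for pair in pairs:
        name, _, code = pair.rpartition(":")
        if not name or not code.isdigit():
            raise ValueError(f"bad --map entry: {pair!r}")
        mapping[name] = int(code)
    return mapping


def split_segments(dialogue: str, voice_map: Dict[str, int]) -> List[Tuple[int, str]]:
    """Group consecutive lines that use the same voice into (voice, text) segments."""
    segments: List[Tuple[int, str]] = []
    for line in dialogue.splitlines():
        match = LINE_PATTERN.match(line)
        if not match:
            continue  # skip blank lines and lines without a speaker label
        speaker, text = match.groups()
        voice = voice_map.get(speaker, FALLBACK_VOICE)
        if segments and segments[-1][0] == voice:
            segments[-1] = (voice, segments[-1][1] + " " + text)
        else:
            segments.append((voice, text))
    return segments
```

Usage: `split_segments(text, {**DEFAULT_VOICE_MAP, **parse_map_args(["小明:1"])})` lets user-supplied entries override the defaults.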

Notes

- Free Quota: according to Baidu's 2026 policy, Speech Synthesis includes 5 million free characters per month; usage beyond that is billed pay-as-you-go.
- Authentication Methods: three methods are supported (API Key + Secret Key, access_token, IAM Key), and the skill detects which one applies automatically.
- SSML Limitations: SSML text is limited to 1024 bytes. A Chinese character takes more than one byte, so far fewer than 1024 Chinese characters fit. Keep each sentence under 120 characters.
- Dependencies: segment merging requires ffmpeg (the skill checks for it and prompts if it is missing). pydub is not required.
- Voice Expressiveness: Baidu's basic voices sound fairly flat. Make dialogue more expressive through the text itself, for example by adding modal particles (语气词) and emotional descriptions.
- Key Security: never hardcode API keys; always use environment variables or a .env file.
- Error Handling: authentication failures come with detailed guidance; see references/api_setup.md for help.
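The merge step (check for ffmpeg, then join the per-segment files) can be sketched with ffmpeg's concat demuxer. This is a hypothetical version, not the code in `audio_merger.py`. Stream copy (`-c copy`) assumes every segment shares the same codec and sample rate; if they differ, re-encode instead of copying.

```python
import shutil
import subprocess
import tempfile
from pathlib import Path
from typing import List


def find_ffmpeg() -> str:
    """Return the ffmpeg executable path, or raise with an installation hint."""
    path = shutil.which("ffmpeg")
    if path is None:
        raise RuntimeError("ffmpeg not found; install it, e.g. `sudo apt install ffmpeg` or `brew install ffmpeg`")
    return path


def concat_list(segment_files: List[Path]) -> str:
    """Build a concat-demuxer list file, escaping single quotes in paths."""
    lines = []
    for segment in segment_files:
        escaped = str(segment.resolve()).replace("'", "'\\''")
        lines.append(f"file '{escaped}'")
    return "\n".join(lines) + "\n"


def merge_segments(segment_files: List[Path], output: Path) -> None:
    """Concatenate segments without re-encoding; raises CalledProcessError on failure."""
    ffmpeg = find_ffmpeg()
    with tempfile.NamedTemporaryFile("w", suffix=".txt", delete=False, encoding="utf-8") as handle:
        handle.write(concat_list(segment_files))
        list_path = handle.name
    try:
        subprocess.run(
            [ffmpeg, "-y", "-f", "concat", "-safe", "0", "-i", list_path,
             "-c", "copy", str(output)],
            check=True, capture_output=True,  # stderr is kept on the exception
        )
    finally:
        Path(list_path).unlink(missing_ok=True)
```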

Changelog

- 2026‑03‑31 (v1.2.3): Fixed bare `except:` clauses in audio_merger.py, replacing them with proper exception handling to improve debugging and error visibility.
- 2026‑03‑26 (v1.2.2): Added an MIT LICENSE file; updated metadata to declare the ffmpeg dependency, addressing ClawHub security warnings.
- 2026‑03‑26 (v1.2.1): Completed the English translation of the skill documentation; improved bilingual triggers for English and Chinese users.
- 2026‑03‑26 (v1.2): Switched from pydub to ffmpeg, fixing Python 3.13 compatibility issues; corrected the description of Baidu API limitations (only single-voice SSML is supported); improved the documentation and the default voice mapping.
- 2026‑03‑26 (v1.1): Enhanced authentication support by adding IAM Key and direct access_token authentication; updated the free-quota information; improved error guidance.
- 2026‑03‑26 (v1.0): Initial release, supporting multi-speaker dialogue synthesis with SSML and segment-merge modes.

