deAPI Audio
Text-to-speech, voice cloning, voice design, and audio transcription via deAPI decentralized GPU network.
Scripts
| Script | Use when... |
|---|---|
| `scripts/text-to-speech.sh` | User wants to convert text to spoken audio |
| `scripts/voice-clone.sh` | User wants to clone/replicate a voice from a sample audio file |
| `scripts/voice-design.sh` | User wants to generate speech with a voice described in natural language |
| `scripts/speech-to-text.sh` | User wants to transcribe an audio file (AAC, MP3, OGG, WAV, WebM, FLAC, max 10MB) |
Your config
!`cat ${CLAUDE_SKILL_DIR}/config.json 2>/dev/null || echo "NOT_CONFIGURED"`
If the config above is `NOT_CONFIGURED`, ask the user:
- What is your deAPI API key? (get one at https://deapi.ai, free $5 credit)
Then write the answer to `${CLAUDE_SKILL_DIR}/config.json` as `{ "api_key": "their_key" }`.
Alternatively, the user can set the DEAPI_API_KEY environment variable directly, which takes priority over config.json.
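The lookup order described above (environment variable first, then `config.json`) can be sketched as below. This is a minimal illustration, not the code the bundled scripts use: `resolve_api_key` is a hypothetical helper name, and the `sed` expression assumes the simple one-key JSON layout shown above.

```bash
#!/usr/bin/env bash
# Hypothetical helper (not part of the skill's scripts): print the deAPI key,
# preferring the DEAPI_API_KEY environment variable over config.json.
resolve_api_key() {
  local config_file="$1"
  local key

  # The environment variable takes priority when it is set and non-empty.
  if [ -n "${DEAPI_API_KEY:-}" ]; then
    printf '%s\n' "$DEAPI_API_KEY"
    return 0
  fi

  # Otherwise read the "api_key" field from config.json.
  # Assumption: the file is the small JSON object { "api_key": "..." }.
  if [ -f "$config_file" ]; then
    key=$(sed -n 's/.*"api_key"[[:space:]]*:[[:space:]]*"\([^"]*\)".*/\1/p' "$config_file" | head -n 1)
    if [ -n "$key" ]; then
      printf '%s\n' "$key"
      return 0
    fi
  fi

  echo "NOT_CONFIGURED" >&2
  return 1
}
```

A call such as `resolve_api_key "${CLAUDE_SKILL_DIR}/config.json"` prints the key on success and returns a non-zero status when neither source provides one.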
Gotchas
- For YouTube/video transcription, use the `deapi-video` skill instead. This skill handles audio-only files (.mp3, .wav, .m4a, .flac, .ogg).
- Three TTS models: Kokoro (default), Chatterbox, Qwen3. Use `--model Chatterbox` or `--model Qwen3` to switch.
- Kokoro: Voice ID format is `{lang}{gender}_{name}`. Language is auto-detected from the voice prefix if `--lang` is omitted.
- Chatterbox: voice is always `default`, speed is fixed at 1, supports 22 languages. Text limit 10-2000 chars.
- Kokoro: text limit 3-10001 chars. Long text may time out; split it into segments and generate each one separately.
- TTS output format defaults to mp3. WAV files are much larger but lossless.
- Kokoro: `speed` range is 0.5-2.0. Values outside this range cause errors.
- Qwen3 Voice Clone (`voice-clone.sh`): ref audio must be 5-15 seconds. Too short or too long degrades quality. Formats: MP3, WAV, FLAC, OGG, M4A. URLs are downloaded automatically.
- Qwen3 Voice Design (`voice-design.sh`): quality depends on the `--instruct` description. Encourage specific details: gender, age, accent, speaking style, emotion.
- Qwen3 models use full language names (`English`, `French`, etc.), NOT language codes. 10 supported languages: English, Italian, Spanish, Portuguese, Russian, French, German, Korean, Japanese, Chinese.
- Qwen3 TTS (`--model Qwen3`): 9 voices available, default `Vivian`. The Chinese language lacks the `Ryan` voice.
- Qwen3 text limit is 10-5000 chars. Speed is fixed at 1. Voice Clone and Voice Design use voice=`default`.
- Audio transcription accepts a local file path or URL (`--audio`). Formats: AAC, MP3, OGG, WAV, WebM, FLAC. Max 10 MB.
- Result URLs expire in 24 hours. Download promptly.
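The advice to split long text into segments can be sketched with the standard `fold` utility. This is an illustration under stated assumptions: `split_text` is a hypothetical helper, not something the skill ships, and `fold` counts columns, which equals the character count only for single-byte text such as plain ASCII.

```bash
#!/usr/bin/env bash
# Hypothetical helper: read text on stdin and emit one segment per line,
# each at most max_chars columns wide, breaking at spaces where possible.
split_text() {
  local max_chars="$1"
  fold -s -w "$max_chars"
}

# Example use (commented out because it calls the skill's script):
#   printf '%s\n' "$long_text" | split_text 2000 |
#   while IFS= read -r segment; do
#     bash scripts/text-to-speech.sh --model Chatterbox --text "$segment" </dev/null
#   done
```

The sketch enforces only the upper limit. A final segment can still fall below a model's minimum length (for example 10 chars for Chatterbox), so very short remainders may need to be merged into the previous segment.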
Quick examples
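```bash
# Basic TTS
bash scripts/text-to-speech.sh --text "Hello world"
# British voice
bash scripts/text-to-speech.sh --text "Good morning" --voice bf_emma
# Chatterbox model (multilingual)
bash scripts/text-to-speech.sh --model Chatterbox --text "Bonjour le monde" --lang fr
# Qwen3 model
bash scripts/text-to-speech.sh --model Qwen3 --text "Hello world" --voice Serena --lang English
# Clone a voice from a sample
bash scripts/voice-clone.sh --text "Hello, this is my cloned voice" --ref-audio /path/to/sample.mp3
# Improve clone accuracy with a reference transcript
bash scripts/voice-clone.sh --text "Welcome to the show" --ref-audio /path/to/sample.wav --ref-text "This is the original transcript"
# Design a custom voice from a description
bash scripts/voice-design.sh --text "Good morning everyone" --instruct "A warm, deep male voice with a slight British accent"
# Voice design in another language
bash scripts/voice-design.sh --text "Bonjour tout le monde" --instruct "A cheerful young female voice" --lang French
# Transcribe an audio file (local or URL)
bash scripts/speech-to-text.sh --audio /path/to/recording.mp3
bash scripts/speech-to-text.sh --audio https://example.com/podcast.mp3
```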
For the full voice list and language codes, see references/voices.md.
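deAPI Audio
Text-to-speech, voice cloning, voice design, and audio transcription via the deAPI decentralized GPU network.
Scripts
| Script | Use when... |
|---|---|
| `scripts/text-to-speech.sh` | User wants to convert text to spoken audio |
| `scripts/voice-clone.sh` | User wants to clone/replicate a voice from a sample audio file |
| `scripts/voice-design.sh` | User wants to generate speech with a voice described in natural language |
| `scripts/speech-to-text.sh` | User wants to transcribe an audio file (AAC, MP3, OGG, WAV, WebM, FLAC, max 10MB) |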
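Your config
!`cat ${CLAUDE_SKILL_DIR}/config.json 2>/dev/null || echo "NOT_CONFIGURED"`
If the config above is `NOT_CONFIGURED`, ask the user:
- What is your deAPI API key? (get one at https://deapi.ai, free $5 credit)
Then write the answer to `${CLAUDE_SKILL_DIR}/config.json` as `{ "api_key": "their_key" }`.
Alternatively, the user can set the `DEAPI_API_KEY` environment variable directly, which takes priority over config.json.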
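Gotchas
- For YouTube/video transcription, use the `deapi-video` skill. This skill handles audio-only files (.mp3, .wav, .m4a, .flac, .ogg).
- Three TTS models: Kokoro (default), Chatterbox, Qwen3. Use `--model Chatterbox` or `--model Qwen3` to switch.
- Kokoro: Voice ID format is `{lang}{gender}_{name}`. Language is auto-detected from the voice prefix if `--lang` is omitted.
- Chatterbox: voice is always `default`, speed is fixed at 1, supports 22 languages. Text limit 10-2000 chars.
- Kokoro: text limit 3-10001 chars. Long text may time out; generate it in segments.
- TTS output format defaults to mp3. WAV files are larger but lossless.
- Kokoro: `speed` range is 0.5-2.0. Values outside this range cause errors.
- Qwen3 Voice Clone (`voice-clone.sh`): ref audio must be 5-15 seconds. Too short or too long degrades quality. Formats: MP3, WAV, FLAC, OGG, M4A. URLs are downloaded automatically.
- Qwen3 Voice Design (`voice-design.sh`): quality depends on the `--instruct` description. Encourage specific details: gender, age, accent, speaking style, emotion.
- Qwen3 models use full language names (`English`, `French`, etc.), not language codes. 10 supported languages: English, Italian, Spanish, Portuguese, Russian, French, German, Korean, Japanese, Chinese.
- Qwen3 TTS (`--model Qwen3`): 9 voices available, default `Vivian`. The Chinese language lacks the `Ryan` voice.
- Qwen3 text limit is 10-5000 chars. Speed is fixed at 1. Voice Clone and Voice Design use voice=`default`.
- Audio transcription accepts a local file path or URL (`--audio`). Formats: AAC, MP3, OGG, WAV, WebM, FLAC. Max 10 MB.
- Result URLs expire in 24 hours. Download promptly.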
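The transcription limits listed above (accepted formats, 10 MB maximum) can be checked locally before a file is uploaded. The sketch below is an assumption-laden illustration: `check_audio_file` and `MAX_BYTES` are hypothetical names, "10 MB" is read as 10 * 1024 * 1024 bytes, and the format is judged from the file extension alone rather than from the file contents.

```bash
#!/usr/bin/env bash
# Hypothetical pre-flight check for a local file passed to --audio.
# Assumption: "10 MB" means 10 * 1024 * 1024 bytes.
MAX_BYTES=$((10 * 1024 * 1024))

check_audio_file() {
  local path="$1"
  local ext size

  [ -f "$path" ] || { echo "not a file: $path" >&2; return 1; }

  # Lowercase the extension so "MP3" and "mp3" are treated alike.
  ext=$(printf '%s' "${path##*.}" | tr '[:upper:]' '[:lower:]')
  case "$ext" in
    aac|mp3|ogg|wav|webm|flac) ;;
    *) echo "unsupported format: .$ext" >&2; return 1 ;;
  esac

  # wc -c reports the size in bytes; the arithmetic expansion strips padding.
  size=$(( $(wc -c < "$path") ))
  if [ "$size" -gt "$MAX_BYTES" ]; then
    echo "file too large: $size bytes (limit $MAX_BYTES)" >&2
    return 1
  fi
  return 0
}
```

For example, `check_audio_file /path/to/recording.mp3 && bash scripts/speech-to-text.sh --audio /path/to/recording.mp3` runs the transcription only when the local check passes. URLs are not covered by this sketch.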
Quick examples
```bash
# Basic TTS
bash scripts/text-to-speech.sh --text "Hello world"
# British voice
bash scripts/text-to-speech.sh --text "Good morning" --voice bf_emma
# Chatterbox model (multilingual)
bash scripts/text-to-speech.sh --model Chatterbox --text "Bonjour le monde" --lang fr
# Qwen3 model
bash scripts/text-to-speech.sh --model Qwen3 --text "Hello world" --voice Serena --lang English
# Clone a voice from a sample
bash scripts/voice-clone.sh --text "Hello, this is my cloned voice" --ref-audio /path/to/sample.mp3
# Improve clone accuracy with a reference transcript
bash scripts/voice-clone.sh --text "Welcome to the show" --ref-audio /path/to/sample.wav --ref-text "This is the original transcript"
# Design a custom voice from a description
bash scripts/voice-design.sh --text "Good morning everyone" --instruct "A warm, deep male voice with a slight British accent"
# Voice design in another language
bash scripts/voice-design.sh --text "Bonjour tout le monde" --instruct "A cheerful young female voice" --lang French
# Transcribe an audio file (local or URL)
bash scripts/speech-to-text.sh --audio /path/to/recording.mp3
bash scripts/speech-to-text.sh --audio https://example.com/podcast.mp3
```
For the full voice list and language codes, see references/voices.md.