deAPI Audio

Text-to-speech, voice cloning, voice design, and audio transcription via the deAPI decentralized GPU network.

Scripts

Script	Use when...
scripts/text-to-speech.sh	User wants to convert text to spoken audio
scripts/voice-clone.sh	User wants to clone/replicate a voice from a sample audio file
scripts/voice-design.sh	User wants to generate speech with a voice described in natural language
scripts/speech-to-text.sh	User wants to transcribe an audio file (AAC, MP3, OGG, WAV, WebM, FLAC, max 10 MB)
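Before handing a local file to scripts/speech-to-text.sh, a caller could reject files the service will not accept. The sketch below is a hypothetical helper (check_audio_for_transcription is not part of this skill) that checks the extension against the transcription formats listed above and enforces the 10 MB cap. It assumes the file extension reflects the real format and conservatively reads 10 MB as 10,000,000 bytes; URLs are passed through unchecked.

```bash
#!/usr/bin/env bash
# Hypothetical pre-flight check for scripts/speech-to-text.sh inputs.
# Assumptions: the file extension reflects the real format, and "10 MB"
# is read conservatively as 10,000,000 bytes.

MAX_AUDIO_BYTES=10000000

check_audio_for_transcription() {
  local path="$1" ext size
  # URLs are passed through; only local files are checked here.
  case "$path" in
    http://*|https://*) return 0 ;;
  esac
  if [ ! -f "$path" ]; then
    echo "not a file: $path" >&2
    return 1
  fi
  # Compare the extension case-insensitively against the supported formats.
  ext=$(printf '%s' "${path##*.}" | tr '[:upper:]' '[:lower:]')
  case "$ext" in
    aac|mp3|ogg|wav|webm|flac) ;;
    *) echo "unsupported format: $path" >&2; return 1 ;;
  esac
  # Arithmetic expansion strips the padding some wc versions emit.
  size=$(( $(wc -c < "$path") ))
  if [ "$size" -gt "$MAX_AUDIO_BYTES" ]; then
    echo "file too large: $size bytes (limit $MAX_AUDIO_BYTES)" >&2
    return 1
  fi
}
```

For example: check_audio_for_transcription recording.mp3 && bash scripts/speech-to-text.sh --audio recording.mp3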

Your config

! cat ${CLAUDESKILLDIR}/config.json 2>/dev/null || echo "NOT_CONFIGURED"

If the config above is NOT_CONFIGURED, ask the user:

- What is your deAPI API key? (get one at https://deapi.ai, free $5 credit)

Then write the answer to ${CLAUDESKILLDIR}/config.json as { "api_key": "their_key" }.

Alternatively, the user can set the DEAPI_API_KEY environment variable directly, which takes priority over config.json.
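The lookup order above (environment variable first, then config.json) can be sketched in bash as follows. This is an illustration, not the skill's own implementation: save_deapi_key and resolve_deapi_key are hypothetical names, and the sed expression only parses the flat one-line { "api_key": "..." } format described above. If jq is available, it is a more robust way to read the file.

```bash
#!/usr/bin/env bash
# Hypothetical helpers; the sed parsing assumes the flat one-line
# { "api_key": "..." } format written by save_deapi_key.

# Write the key to <dir>/config.json and restrict it to the owner.
save_deapi_key() {
  local dir="$1" key="$2"
  mkdir -p "$dir"
  printf '{ "api_key": "%s" }\n' "$key" > "$dir/config.json"
  chmod 600 "$dir/config.json"  # the key is a secret
}

# Print the key: DEAPI_API_KEY first, then <dir>/config.json.
resolve_deapi_key() {
  local dir="$1" cfg key
  # 1. The environment variable takes priority.
  if [ -n "${DEAPI_API_KEY:-}" ]; then
    printf '%s\n' "$DEAPI_API_KEY"
    return 0
  fi
  # 2. Fall back to the "api_key" field in config.json.
  cfg="$dir/config.json"
  if [ -f "$cfg" ]; then
    key=$(sed -n 's/.*"api_key"[[:space:]]*:[[:space:]]*"\([^"]*\)".*/\1/p' "$cfg")
    if [ -n "$key" ]; then
      printf '%s\n' "$key"
      return 0
    fi
  fi
  echo "NOT_CONFIGURED" >&2
  return 1
}
```

Pass the skill directory as the argument; a non-zero exit with NOT_CONFIGURED on stderr signals that the user still needs to be asked for a key.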

Gotchas

- For YouTube/video transcription, use the deapi-video skill instead. This skill handles audio-only files (.mp3, .wav, .m4a, .flac, .ogg).
- Three TTS models are available: Kokoro (default), Chatterbox, and Qwen3. Use --model Chatterbox or --model Qwen3 to switch.
- Kokoro: voice IDs follow the format {lang}{gender}_{name}. If --lang is omitted, the language is auto-detected from the voice prefix.
- Chatterbox: voice is always default, speed is fixed at 1, and 22 languages are supported. Text limit: 10-2000 chars.
- Kokoro: text limit is 3-10001 chars. Long text may time out; split it into segments and generate each one separately.
- TTS output format defaults to mp3. WAV files are much larger but lossless.
- Kokoro: speed range is 0.5-2.0. Values outside this range cause errors.
- Qwen3 Voice Clone (voice-clone.sh): reference audio must be 5-15 seconds long; shorter or longer clips degrade quality. Formats: MP3, WAV, FLAC, OGG, M4A. URLs are downloaded automatically.
- Qwen3 Voice Design (voice-design.sh): quality depends on the --instruct description. Encourage specific details: gender, age, accent, speaking style, emotion.
- Qwen3 models use full language names (English, French, etc.), NOT language codes. 10 supported languages: English, Italian, Spanish, Portuguese, Russian, French, German, Korean, Japanese, Chinese.
- Qwen3 TTS (--model Qwen3): 9 voices are available; the default is Vivian. The Ryan voice is not available for Chinese.
- Qwen3 text limit is 10-5000 chars. Speed is fixed at 1. Voice Clone and Voice Design use voice=default.
- Audio transcription accepts a local file path or URL (--audio). Formats: AAC, MP3, OGG, WAV, WebM, FLAC. Max 10 MB.
- Result URLs expire after 24 hours. Download results promptly.
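Several of the limits above can be checked locally before spending credit on a request. The helpers below are hypothetical (none of them ship with this skill) and hard-code the numbers from this list, so they would need updating if deAPI changes its limits. Text length uses ${#text}, which counts characters only under a UTF-8 locale; the reference-clip helper assumes ffprobe from FFmpeg is installed.

```bash
#!/usr/bin/env bash
# Hypothetical pre-flight checks; limits are copied from the Gotchas list.

# Print "min max" allowed text length (in characters) for a TTS model.
tts_text_limits() {
  case "$1" in
    Kokoro)     echo "3 10001" ;;
    Chatterbox) echo "10 2000" ;;
    Qwen3)      echo "10 5000" ;;
    *)          return 1 ;;
  esac
}

# Succeed if the text length fits the model's limits.
check_tts_text() {
  local model="$1" text="$2" limits min max len
  if ! limits=$(tts_text_limits "$model"); then
    echo "unknown model: $model" >&2
    return 1
  fi
  min=${limits% *}
  max=${limits#* }
  len=${#text}  # characters under a UTF-8 locale, bytes otherwise
  if [ "$len" -lt "$min" ] || [ "$len" -gt "$max" ]; then
    echo "$model text must be $min-$max chars, got $len" >&2
    return 1
  fi
}

# Kokoro speed must be a plain decimal within 0.5-2.0.
check_kokoro_speed() {
  [[ $1 =~ ^[0-9]+(\.[0-9]+)?$ ]] || return 1
  awk -v s="$1" 'BEGIN { exit !(s >= 0.5 && s <= 2.0) }'
}

# Qwen3 expects full language names, not codes such as "fr".
check_qwen3_lang() {
  case "$1" in
    English|Italian|Spanish|Portuguese|Russian|French|German|Korean|Japanese|Chinese) return 0 ;;
    *) echo "Qwen3 expects a full language name, got: $1" >&2; return 1 ;;
  esac
}

# Voice-clone reference clips should last 5-15 seconds.
check_ref_duration() {
  awk -v d="$1" 'BEGIN { exit !(d >= 5 && d <= 15) }'
}

# Print a clip's duration in seconds; assumes ffprobe (FFmpeg) is installed.
ref_audio_duration() {
  ffprobe -v error -show_entries format=duration \
    -of default=noprint_wrappers=1:nokey=1 "$1"
}
```

For a voice clone, the two clip helpers combine as check_ref_duration "$(ref_audio_duration sample.wav)".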

Quick examples

```bash
# Basic TTS
bash scripts/text-to-speech.sh --text "Hello world"

# British voice
bash scripts/text-to-speech.sh --text "Good morning" --voice bf_emma

# Chatterbox model (multilingual)
bash scripts/text-to-speech.sh --model Chatterbox --text "Bonjour le monde" --lang fr

# Qwen3 model
bash scripts/text-to-speech.sh --model Qwen3 --text "Hello world" --voice Serena --lang English

# Clone a voice from a sample
bash scripts/voice-clone.sh --text "Hello, this is my cloned voice" --ref-audio /path/to/sample.mp3

# Use a reference transcript for more accurate cloning
bash scripts/voice-clone.sh --text "Welcome to the show" --ref-audio /path/to/sample.wav --ref-text "This is the original transcript"

# Design a custom voice from a description
bash scripts/voice-design.sh --text "Good morning everyone" --instruct "A warm, deep male voice with a slight British accent"

# Voice design in another language
bash scripts/voice-design.sh --text "Bonjour tout le monde" --instruct "A cheerful young female voice" --lang French

# Transcribe an audio file (local or URL)
bash scripts/speech-to-text.sh --audio /path/to/recording.mp3
bash scripts/speech-to-text.sh --audio https://example.com/podcast.mp3
```

For the full voice list and language codes, see references/voices.md.
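For Kokoro text long enough to risk a timeout, one way to split it into segments is fold -s, which breaks lines at spaces so words stay intact. This is a sketch under assumptions: split_text and tts_long_text are hypothetical names, the 2000-character default is an arbitrary choice, fold measures width in bytes or columns rather than characters (so non-ASCII text yields shorter chunks), and each chunk is sent as a separate request using only the script's documented flags and default output handling.

```bash
#!/usr/bin/env bash
# Hypothetical helpers for splitting long text before TTS.

# Print the text as lines of at most MAX columns, breaking at spaces.
split_text() {
  local text="$1" max="${2:-2000}"  # 2000 is an arbitrary default
  printf '%s\n' "$text" | fold -s -w "$max"
}

# Synthesize each chunk with a separate call to the skill's TTS script.
# Arguments after TEXT and MAX (e.g. --voice bf_emma) go to every call.
tts_long_text() {
  local text="$1" max="${2:-2000}" chunk
  shift $(( $# < 2 ? $# : 2 ))
  while IFS= read -r chunk; do
    [ -n "$chunk" ] || continue  # skip blank lines from paragraph breaks
    # </dev/null keeps the script from consuming the loop's input.
    bash scripts/text-to-speech.sh --text "$chunk" "$@" < /dev/null
  done < <(split_text "$text" "$max")
}
```

For example: tts_long_text "$(cat chapter.txt)" 2000 --voice bf_emma. A very short final chunk could fall below the model's minimum text length; merging it into the previous chunk is one way to avoid that.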
