KittenTTS WhatsApp Voice
Generates WhatsApp-compatible voice notes from text using KittenTTS + ffmpeg. Specifically solves the format mismatch that causes silent failures: KittenTTS outputs 24kHz WAV → converted to 16kHz OGG Opus via ffmpeg → sent as WhatsApp voice note.
⚠️ Read before installing. This skill installs system packages and downloads large ML models. See Setup below.
System Dependencies
| Dependency | Install command | Size | Notes |
|---|---|---|---|
| `ffmpeg` | `apt-get install -y ffmpeg` | ~30MB | Available in most distro repos |
| `kittentts` | `pip3 install kittentts --break-system-packages` | Pulls ~25-80MB from Hugging Face on first run | Python package |
| `libopus` | Bundled with ffmpeg | n/a | OGG encoding support |
| `soundfile` | Pulled in by kittentts | n/a | Python package |
Network Calls
- First run: downloads the TTS model (~25-80MB, depending on the model size chosen) from `huggingface.co/KittenML`
- No API keys required; works fully offline after the model download
- Set the `HF_TOKEN` env var to avoid unauthenticated rate limits on the model download
Model Options
| Model | Parameters | Size | Hugging Face ID |
|---|---|---|---|
| nano (int8) | 15M | 25MB | `KittenML/kitten-tts-nano-0.8-int8` |
| nano | 15M | 56MB | `KittenML/kitten-tts-nano-0.8-fp32` |
| micro | 40M | 41MB | `KittenML/kitten-tts-micro-0.8` |
| mini | 80M | 80MB | `KittenML/kitten-tts-mini-0.8` |
Default: kitten-tts-mini-0.8 (best quality). Change in scripts/tts_walkie.sh.
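The contents of `scripts/tts_walkie.sh` are not shown here, so how it selects a model is unknown. As a minimal sketch, assuming a short alias is mapped to a Hugging Face ID, the table above could be encoded like this. The function name `model_id_for` and the alias spellings are hypothetical.

```bash
# Hypothetical helper: map a short alias to a Hugging Face model ID.
# The IDs come from the model table; the aliases are illustrative only.
model_id_for() {
  case "$1" in
    nano-int8) echo "KittenML/kitten-tts-nano-0.8-int8" ;;
    nano)      echo "KittenML/kitten-tts-nano-0.8-fp32" ;;
    micro)     echo "KittenML/kitten-tts-micro-0.8" ;;
    mini|"")   echo "KittenML/kitten-tts-mini-0.8" ;;  # documented default
    *)
      echo "unknown model alias: $1" >&2
      return 1
      ;;
  esac
}
```

Calling `model_id_for nano-int8` prints the int8 nano ID, and an empty argument falls back to the documented default. Rejecting unknown aliases up front avoids a failed download later on.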
Setup
Run these manually before the skill is used:
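```bash
# 1. System packages (requires root/privileged access)
apt-get install -y ffmpeg

# 2. Python packages
pip3 install kittentts --break-system-packages

# 3. Optional: set a Hugging Face token to avoid rate limits
echo 'export HF_TOKEN=hf_your_token_here' >> ~/.bashrc
```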
Restart OpenClaw after installing dependencies so the new packages are in PATH.
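To confirm the setup before first use, a preflight check along the following lines may help. It is a sketch and not part of the skill: the function names `missing_commands` and `preflight` are hypothetical, and it assumes the Python package is importable as `kittentts`.

```bash
# Print the name of every command in the argument list that is not on PATH.
missing_commands() {
  local cmd
  for cmd in "$@"; do
    command -v "$cmd" >/dev/null 2>&1 || printf '%s\n' "$cmd"
  done
}

# Hypothetical preflight: check the tools this skill depends on.
preflight() {
  local missing
  missing=$(missing_commands ffmpeg ffprobe python3)
  if [ -n "$missing" ]; then
    # $missing is left unquoted on purpose so the names print on one line.
    echo "Missing commands:" $missing >&2
    return 1
  fi
  if ! python3 -c 'import kittentts' >/dev/null 2>&1; then
    echo "Python package kittentts is not importable" >&2
    return 1
  fi
  if [ -z "${HF_TOKEN:-}" ]; then
    echo "Note: HF_TOKEN is unset; the first model download may be rate limited" >&2
  fi
  return 0
}
```

Running `preflight` after the restart returns non-zero and names the missing piece if anything is absent.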
Usage
TTS only (no transcription)
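```bash
bash scripts/tts_walkie.sh "Your message here" Bella
# Output: /tmp/walkie_reply.ogg (16kHz OGG Opus, WhatsApp-ready)
```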
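The conversion step inside `scripts/tts_walkie.sh` is not reproduced in this document. A minimal sketch of a 24kHz WAV to 16kHz mono OGG Opus conversion with ffmpeg might look like the following. The function name, the `DRY_RUN` switch, and the 24k bitrate are assumptions for illustration, not the script's actual settings.

```bash
# Hypothetical helper: convert a WAV file to a 16 kHz mono OGG Opus file.
# With DRY_RUN=1 it prints the ffmpeg command instead of running it.
convert_to_voice_note() {
  local in_wav="$1" out_ogg="$2"
  # Build the command in the positional parameters so paths with spaces
  # survive intact.
  set -- ffmpeg -y -i "$in_wav" -ar 16000 -ac 1 -c:a libopus -b:a 24k "$out_ogg"
  if [ "${DRY_RUN:-0}" = "1" ]; then
    echo "$*"
    return 0
  fi
  if ! "$@"; then
    echo "ffmpeg conversion failed for $in_wav" >&2
    rm -f "$out_ogg"   # do not leave a partial file to be sent
    return 1
  fi
}
```

For example, `DRY_RUN=1 convert_to_voice_note reply.wav reply.ogg` prints the command for review. Removing the output file on failure means a broken conversion surfaces as an error rather than as a silent voice note.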
Transcription only (optional — requires whisper)
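```bash
# Install whisper (one-time, ~140MB-1.4GB depending on model size)
pip3 install openai-whisper --break-system-packages

bash scripts/transcribe.sh /path/to/audio.ogg [model]
# Models: tiny | base | small | medium | large (default: base)
```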
Voices
Available: Bella, Jasper, Luna, Bruno, Rosie, Hugo, Kiki, Leo
Default: `Bella`
Security Notes
- Audio files are written to a private `/tmp/kittentts-walkie/` directory (mode 700), so only the running user can read them.
- WAV intermediates are cleaned up immediately after conversion; only the OGG is kept for sending.
- Set the `VOICE_SPEED` env var to adjust speech rate (default: 1.0).
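The private-directory and cleanup behaviour described above can be sketched as follows. Both function names are hypothetical, and the real script may implement this differently.

```bash
# Create a directory that only the current user can access, then print its path.
make_private_dir() {
  local dir="$1"
  mkdir -p "$dir" || return 1
  chmod 700 "$dir" || return 1
  printf '%s\n' "$dir"
}

# Run a command (everything after the WAV path) and delete the WAV
# intermediate afterwards, whether or not the command succeeded.
with_wav_cleanup() {
  local wav="$1" rc
  shift
  "$@"
  rc=$?
  rm -f "$wav"
  return "$rc"
}
```

A caller could run `make_private_dir /tmp/kittentts-walkie` once, then wrap the conversion command in `with_wav_cleanup` so the WAV never outlives the conversion step.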
Files
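```
kittentts-whatsapp/
├── SKILL.md
└── scripts/
    ├── tts_walkie.sh   # TTS + ffmpeg conversion (now uses the speed setting)
    └── transcribe.sh   # whisper transcription (optional)
```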
⚠️ Privileged Install Warning
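The dependency install commands use `--break-system-packages` and `apt-get install -y`. `apt-get install` requires root privileges, and `--break-system-packages` tells pip to install into the system Python environment despite the distribution's externally-managed marker. Both modify system packages. Review before running if you are on a managed system.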
Troubleshooting
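Audio sends but is silent or rejected by WhatsApp:
→ Run `ffprobe -v quiet -print_format json -show_streams /tmp/walkie_reply.ogg`
→ The output must show `"codec_name": "opus"` and a `sample_rate` of 48000 or 16000 (ffprobe typically reports Opus streams as 48000 even when they were encoded from 16kHz input). If not, the ffmpeg chain failed.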
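The same check can be scripted. The sketch below asks ffprobe for just the two fields of interest in `key=value` form and validates them. The function names are hypothetical; the accepted sample rates are the two values listed above.

```bash
# Return 0 if key=value stream info describes an Opus stream at an
# accepted sample rate (48000 or 16000).
stream_info_ok() {
  local info="$1" codec rate
  codec=$(printf '%s\n' "$info" | sed -n 's/^codec_name=//p' | head -n 1)
  rate=$(printf '%s\n' "$info" | sed -n 's/^sample_rate=//p' | head -n 1)
  [ "$codec" = "opus" ] || return 1
  case "$rate" in
    48000|16000) return 0 ;;
    *) return 1 ;;
  esac
}

# Probe the first audio stream of a file and validate it.
check_voice_note() {
  local file="$1" info
  info=$(ffprobe -v error -select_streams a:0 \
    -show_entries stream=codec_name,sample_rate \
    -of default=noprint_wrappers=1 "$file") || return 1
  stream_info_ok "$info"
}
```

`check_voice_note /tmp/walkie_reply.ogg` exits non-zero when the file is missing, is not Opus, or has an unexpected sample rate, which makes it usable as a guard before sending.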
TTS generation is slow:
→ Switch to a smaller model (nano instead of mini) in scripts/tts_walkie.sh.
Hugging Face download rate limit:
→ Set HF_TOKEN in your environment. Free accounts get lower rate limits.
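KittenTTS WhatsApp Voice
Generates WhatsApp-compatible voice notes from text using KittenTTS + ffmpeg. Specifically solves the silent failures caused by a format mismatch: KittenTTS outputs 24kHz WAV → converted to 16kHz OGG Opus via ffmpeg → sent as a WhatsApp voice note.
⚠️ Read before installing. This skill installs system packages and downloads large machine learning models. See the Setup section below.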
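System Dependencies
| Dependency | Install command | Size | Notes |
|---|---|---|---|
| `ffmpeg` | `apt-get install -y ffmpeg` | ~30MB | Available in most distro repos |
| `kittentts` | `pip3 install kittentts --break-system-packages` | Pulls ~25-80MB from Hugging Face on first run | Python package |
| `libopus` | Bundled with ffmpeg | n/a | OGG encoding support |
| `soundfile` | Pulled in by kittentts | n/a | Python package |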
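Network Calls
- First run: downloads the TTS model (~25-80MB, depending on the model size chosen) from `huggingface.co/KittenML`
- No API keys required; works fully offline after the model download
- Set the `HF_TOKEN` env var to avoid unauthenticated rate limits on the model download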
Model Options
| Model | Parameters | Size | Hugging Face ID |
|---|---|---|---|
| nano (int8) | 15M | 25MB | `KittenML/kitten-tts-nano-0.8-int8` |
| nano | 15M | 56MB | `KittenML/kitten-tts-nano-0.8-fp32` |
| micro | 40M | 41MB | `KittenML/kitten-tts-micro-0.8` |
| mini | 80M | 80MB | `KittenML/kitten-tts-mini-0.8` |

Default: `kitten-tts-mini-0.8` (best quality). Change in `scripts/tts_walkie.sh`.
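Setup
Run these commands manually before the skill is used:
```bash
# 1. System packages (requires root/privileged access)
apt-get install -y ffmpeg

# 2. Python packages
pip3 install kittentts --break-system-packages

# 3. Optional: set a Hugging Face token to avoid rate limits
echo 'export HF_TOKEN=hf_your_token_here' >> ~/.bashrc
```
Restart OpenClaw after installing dependencies so the new packages are in PATH.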
Usage
TTS only (no transcription)
```bash
bash scripts/tts_walkie.sh "Your message here" Bella
# Output: /tmp/walkie_reply.ogg (16kHz OGG Opus, WhatsApp-ready)
```
Transcription only (optional; requires whisper)
```bash
# Install whisper (one-time, ~140MB-1.4GB depending on model size)
pip3 install openai-whisper --break-system-packages

bash scripts/transcribe.sh /path/to/audio.ogg [model]
# Models: tiny | base | small | medium | large (default: base)
```
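Voices
Available: Bella, Jasper, Luna, Bruno, Rosie, Hugo, Kiki, Leo
Default: `Bella`
Security Notes
- Audio files are written to a private `/tmp/kittentts-walkie/` directory (mode 700), so only the running user can read them.
- WAV intermediates are cleaned up immediately after conversion; only the OGG is kept for sending.
- Set the `VOICE_SPEED` env var to adjust speech rate (default: 1.0).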
Files
```
kittentts-whatsapp/
├── SKILL.md
└── scripts/
    ├── tts_walkie.sh   # TTS + ffmpeg conversion (now uses the speed setting)
    └── transcribe.sh   # whisper transcription (optional)
```
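⚠️ Privileged Install Warning
The dependency install commands use `--break-system-packages` and `apt-get install -y`. These require root privileges and modify system packages. Review before running if you are on a managed system.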
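Troubleshooting
Audio sends but is silent or rejected by WhatsApp:
→ Run `ffprobe -v quiet -print_format json -show_streams /tmp/walkie_reply.ogg`
→ Must show `codec_name: opus` and `sample_rate: 48000` (or 16000). If not, the ffmpeg chain failed.
TTS generation is slow:
→ Switch to a smaller model (nano instead of mini) in `scripts/tts_walkie.sh`.
Hugging Face download rate limit:
→ Set `HF_TOKEN` in your environment. Free accounts get lower rate limits.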