KittenTTS WhatsApp Voice

Generates WhatsApp-compatible voice notes from text using KittenTTS + ffmpeg. Specifically solves the format mismatch that causes silent failures: KittenTTS outputs 24kHz WAV → converted to 16kHz OGG Opus via ffmpeg → sent as WhatsApp voice note.
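
The conversion step described above can be sketched as a single ffmpeg call. This is a minimal sketch under stated assumptions, not the contents of scripts/tts_walkie.sh: the function name `wav_to_voice_note`, the mono downmix, and the 32k bitrate are illustrative choices; only the 16kHz OGG Opus target comes from the description.

```bash
#!/usr/bin/env bash
# Hypothetical helper: build the ffmpeg command that turns a KittenTTS WAV
# into a 16 kHz mono OGG Opus file. Set DRY_RUN=1 to print the command
# instead of running it.
wav_to_voice_note() {
  local in_wav="$1" out_ogg="$2"
  # -ar 16000: resample to 16 kHz; -ac 1: mono (assumption);
  # -c:a libopus: Opus encoder; -b:a 32k: assumed speech bitrate
  local cmd=(ffmpeg -y -i "$in_wav" -ar 16000 -ac 1 -c:a libopus -b:a 32k "$out_ogg")
  if [ "${DRY_RUN:-0}" = "1" ]; then
    printf '%s\n' "${cmd[*]}"
  else
    "${cmd[@]}"
  fi
}
```

Running it with `DRY_RUN=1` first makes it easy to inspect the exact command before any audio is touched.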

⚠️ Read before installing. This skill installs system packages and downloads large ML models. See Setup below.

System Dependencies

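| Dependency | Install command | Size | Notes |
|---|---|---|---|
| ffmpeg | apt-get install -y ffmpeg | ~30MB | Available in most distro repos |
| kittentts | pip3 install kittentts --break-system-packages | | |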

Network Calls

- First run: downloads the TTS model (~25-80MB, depending on the model size chosen) from huggingface.co/KittenML.
- No API keys required; fully offline-capable after the model download.
- Set the HF_TOKEN env var to avoid unauthenticated rate limits on the model download.
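
The three cases above (offline, authenticated, anonymous) can be told apart before the first run with a small check. This is a sketch that assumes the model is fetched through the huggingface_hub library, which honors `HF_TOKEN` and `HF_HUB_OFFLINE`; whether KittenTTS goes through that library is an assumption, and `hf_download_mode` is a hypothetical helper name.

```bash
#!/usr/bin/env bash
# Hypothetical helper: report how a model download would be attempted.
# Assumes huggingface_hub semantics: HF_HUB_OFFLINE=1 disables network
# access, HF_TOKEN authenticates requests.
hf_download_mode() {
  if [ "${HF_HUB_OFFLINE:-0}" = "1" ]; then
    echo "offline"          # only works once the model is already cached
  elif [ -n "${HF_TOKEN:-}" ]; then
    echo "authenticated"    # download uses the token
  else
    echo "anonymous"        # subject to unauthenticated rate limits
  fi
}
```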

Model Options

| Model | Parameters | Size | Hugging Face ID |
|---|---|---|---|
| nano (int8) | 15M | 25MB | KittenML/kitten-tts-nano-0.8-int8 |
| nano | 15M | 56MB | KittenML/kitten-tts-nano-0.8-fp32 |
| micro | 40M | 41MB | KittenML/kitten-tts-micro-0.8 |
| mini | 80M | 80MB | KittenML/kitten-tts-mini-0.8 |

Default: kitten-tts-mini-0.8 (best quality). Change in scripts/tts_walkie.sh.
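
When switching models, a lookup from short name to Hugging Face ID avoids typos in the ID. The sketch below is illustrative only: `model_id_for` and the short names are hypothetical, and how scripts/tts_walkie.sh actually stores the model ID is not shown in this document.

```bash
#!/usr/bin/env bash
# Hypothetical helper: map a short model name to its Hugging Face ID.
model_id_for() {
  case "$1" in
    nano-int8) echo "KittenML/kitten-tts-nano-0.8-int8" ;;
    nano)      echo "KittenML/kitten-tts-nano-0.8-fp32" ;;
    micro)     echo "KittenML/kitten-tts-micro-0.8" ;;
    mini)      echo "KittenML/kitten-tts-mini-0.8" ;;
    *)         echo "unknown model: $1" >&2; return 1 ;;
  esac
}
```

An unknown name fails with a non-zero exit status rather than silently falling back to the default.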

Setup

Run these manually before the skill is used:

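```bash
# 1. System packages (requires root/privileged access)
apt-get install -y ffmpeg

# 2. Python packages
pip3 install kittentts --break-system-packages

# 3. Optional: set a Hugging Face token to avoid rate limits
echo 'export HF_TOKEN=hf_your_token_here' >> ~/.bashrc
```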

Restart OpenClaw after installing dependencies so the new packages are in PATH.

Usage

TTS only (no transcription)

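```bash
bash scripts/tts_walkie.sh "Your message here" Bella
# Output: /tmp/walkie_reply.ogg (16kHz OGG Opus, WhatsApp-ready)
```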

Transcription only (optional — requires whisper)

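```bash
# Install whisper (one-time, ~140MB-1.4GB depending on model size)
# Note: OpenAI's Whisper is published on PyPI as openai-whisper; confirm
# which package scripts/transcribe.sh expects before installing.
pip3 install whisper --break-system-packages

bash scripts/transcribe.sh /path/to/audio.ogg [model]
# Models: tiny | base | small | medium | large (default: base)
```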

Voices

Available: Bella, Jasper, Luna, Bruno, Rosie, Hugo, Kiki, Leo

Default: Bella
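
A wrapper can validate the requested voice against the list above before invoking the TTS step. This is a sketch with a hypothetical `pick_voice` helper; falling back to Bella on an unknown name is an assumption, not documented behavior of scripts/tts_walkie.sh.

```bash
#!/usr/bin/env bash
# Voices listed in this document.
VOICES="Bella Jasper Luna Bruno Rosie Hugo Kiki Leo"

# Hypothetical helper: echo the requested voice if it is known,
# otherwise warn on stderr and fall back to Bella.
pick_voice() {
  local requested="${1:-Bella}" v
  for v in $VOICES; do
    if [ "$v" = "$requested" ]; then
      echo "$v"
      return 0
    fi
  done
  echo "unknown voice '$requested', falling back to Bella" >&2
  echo "Bella"
}
```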

Security Notes

- Audio files are written to a private /tmp/kittentts-walkie/ directory (mode 700); only the running user can read them.
- WAV intermediates are cleaned up immediately after conversion; only the OGG is kept for sending.
- Set the VOICE_SPEED env var to adjust speech rate (default: 1.0).
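
The notes above can be sketched in shell as follows. The helper names are hypothetical and the real script may do this differently; the directory path, the 700 mode, and the 1.0 default are taken from the notes.

```bash
#!/usr/bin/env bash
# Hypothetical helper: create the working directory and restrict it
# to the current user (mode 700).
make_private_workdir() {
  local dir="${1:-/tmp/kittentts-walkie}"
  mkdir -p "$dir" && chmod 700 "$dir" && echo "$dir"
}

# Hypothetical helper: remove the intermediate WAV once the OGG exists.
cleanup_wav() {
  rm -f -- "$1"
}

# Hypothetical helper: resolve the speech rate, defaulting to 1.0
# when VOICE_SPEED is unset or empty.
voice_speed() {
  echo "${VOICE_SPEED:-1.0}"
}
```

Calling `chmod` explicitly after `mkdir -p` matters because `mkdir -p` leaves the permissions of an already existing directory unchanged.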

Files

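```
kittentts-whatsapp/
├── SKILL.md
└── scripts/
    ├── tts_walkie.sh   # TTS + ffmpeg conversion (now uses the speed setting)
    └── transcribe.sh   # whisper transcription (optional)
```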

⚠️ Privileged Install Warning

The dependency install commands use --break-system-packages and apt-get install -y. These require root privileges and modify system packages. Review before running if you are on a managed system.

Troubleshooting

Audio sends but is silent or rejected by WhatsApp:
→ Run ffprobe -v quiet -print_format json -show_streams /tmp/walkie_reply.ogg
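→ The output must show codec_name: opus and sample_rate: 48000 (or 16000); ffprobe commonly reports Opus streams at 48000 regardless of the rate they were encoded from. If it shows anything else, the ffmpeg chain failed.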
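
The same check can be scripted. In this sketch `is_whatsapp_ready` and `probe_voice_note` are hypothetical helpers; the accepted codec and sample rates are the ones named above.

```bash
#!/usr/bin/env bash
# Hypothetical helper: succeed only for the codec/sample-rate pairs
# this document treats as valid.
is_whatsapp_ready() {
  local codec="$1" rate="$2"
  [ "$codec" = "opus" ] && { [ "$rate" = "48000" ] || [ "$rate" = "16000" ]; }
}

# Hypothetical helper: read codec and sample rate of the first audio
# stream with ffprobe, then validate them.
probe_voice_note() {
  local file="$1" codec rate
  codec="$(ffprobe -v error -select_streams a:0 \
    -show_entries stream=codec_name \
    -of default=noprint_wrappers=1:nokey=1 "$file")"
  rate="$(ffprobe -v error -select_streams a:0 \
    -show_entries stream=sample_rate \
    -of default=noprint_wrappers=1:nokey=1 "$file")"
  is_whatsapp_ready "$codec" "$rate"
}
```

For example, `probe_voice_note /tmp/walkie_reply.ogg && echo ok` prints `ok` only when the file passes the check.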

TTS generation is slow:
→ Switch to a smaller model (nano instead of mini) in scripts/tts_walkie.sh.

Hugging Face download rate limit:
→ Set HF_TOKEN in your environment. Free accounts get lower rate limits.

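KittenTTS WhatsApp Voice

Generates WhatsApp-compatible voice notes from text using KittenTTS + ffmpeg. Specifically solves the format mismatch that causes silent failures: KittenTTS outputs 24kHz WAV → converted to 16kHz OGG Opus via ffmpeg → sent as WhatsApp voice note.

⚠️ Read before installing. This skill installs system packages and downloads large machine-learning models. See the Setup section below.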

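System Dependencies

| Dependency | Install command | Size | Notes |
|---|---|---|---|
| ffmpeg | apt-get install -y ffmpeg | ~30MB | Available in most distro repos |
| kittentts | pip3 install kittentts --break-system-packages | | |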

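Network Calls

- First run: downloads the TTS model (~25-80MB, depending on the model size chosen) from huggingface.co/KittenML.
- No API keys required; fully offline-capable after the model download.
- Set the HF_TOKEN env var to avoid unauthenticated rate limits on the model download.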

Model Options

| Model | Parameters | Size | Hugging Face ID |
|---|---|---|---|
| nano (int8) | 15M | 25MB | KittenML/kitten-tts-nano-0.8-int8 |
| nano | 15M | 56MB | KittenML/kitten-tts-nano-0.8-fp32 |
| micro | 40M | 41MB | KittenML/kitten-tts-micro-0.8 |
| mini | 80M | 80MB | KittenML/kitten-tts-mini-0.8 |

Default: kitten-tts-mini-0.8 (best quality). Change in scripts/tts_walkie.sh.

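Setup

Run these commands manually before the skill is used:

```bash
# 1. System packages (requires root/privileged access)
apt-get install -y ffmpeg

# 2. Python packages
pip3 install kittentts --break-system-packages

# 3. Optional: set a Hugging Face token to avoid rate limits
echo 'export HF_TOKEN=hf_your_token_here' >> ~/.bashrc
```

Restart OpenClaw after installing dependencies so the new packages are in PATH.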

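Usage

TTS only (no transcription)

```bash
bash scripts/tts_walkie.sh "Your message here" Bella
# Output: /tmp/walkie_reply.ogg (16kHz OGG Opus, WhatsApp-ready)
```

Transcription only (optional; requires whisper)

```bash
# Install whisper (one-time, ~140MB-1.4GB depending on model size)
# Note: OpenAI's Whisper is published on PyPI as openai-whisper; confirm
# which package scripts/transcribe.sh expects before installing.
pip3 install whisper --break-system-packages

bash scripts/transcribe.sh /path/to/audio.ogg [model]
# Models: tiny | base | small | medium | large (default: base)
```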

Voices

Available: Bella, Jasper, Luna, Bruno, Rosie, Hugo, Kiki, Leo

Default: Bella

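Security Notes

- Audio files are written to a private /tmp/kittentts-walkie/ directory (mode 700); only the running user can read them.
- WAV intermediates are cleaned up immediately after conversion; only the OGG is kept for sending.
- Set the VOICE_SPEED env var to adjust speech rate (default: 1.0).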

Files

```
kittentts-whatsapp/
├── SKILL.md
└── scripts/
    ├── tts_walkie.sh   # TTS + ffmpeg conversion (now uses the speed setting)
    └── transcribe.sh   # whisper transcription (optional)
```

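⚠️ Privileged Install Warning

The dependency install commands use --break-system-packages and apt-get install -y. These require root privileges and modify system packages. Review them before running if you are on a managed system.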

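Troubleshooting

Audio sends but is silent or rejected by WhatsApp:
→ Run ffprobe -v quiet -print_format json -show_streams /tmp/walkie_reply.ogg
→ The output must show codec_name: opus and sample_rate: 48000 (or 16000). If it does not, the ffmpeg chain failed.

TTS generation is slow:
→ Switch to a smaller model (nano instead of mini) in scripts/tts_walkie.sh.

Hugging Face download rate limit:
→ Set HF_TOKEN in your environment. Free accounts get lower rate limits.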
