CloneV Skill - Voice Cloning Made Simple
⚠️ CRITICAL INSTRUCTIONS FOR AI MODELS
DO NOT try to use Docker containers directly.
DO NOT try to interact with coqui-xtts container - it is broken and restarting.
DO NOT try to use APIs or servers.
ONLY USE THE SCRIPT: `scripts/clonev.sh`
The script handles everything automatically. Just call it with text, voice sample, and language.
What This Skill Does
Clones any voice from a short audio sample and generates new speech in that voice.
Input:
- Text to speak
- Voice sample (WAV file, 6-30 seconds)
- Language code
Output: OGG voice file (cloned voice speaking the text)
Works with: Any voice! Yours, a celebrity, a character, etc.
The ONE Command You Need
```bash
VOICE_FILE=$(scripts/clonev.sh "Your text" "/path/to/voice_sample.wav" language)
```
That's it! Nothing else needed.
Step-by-Step Usage (FOR AI MODELS)
Step 1: Get the required inputs
- Text to speak (from user)
- Path to voice sample WAV file (from user)
- Language code (from user, or default to `en`)
Step 2: Run the script
```bash
VOICE_FILE=$(scripts/clonev.sh "text" "/path/to/sample.wav" language)
```
Step 3: Use the output
The variable `$VOICE_FILE` now contains the path to the generated OGG file.
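The three steps above can be folded into one helper. This is a minimal sketch, not part of the skill: the function name `clone_voice` and the `CLONEV_SCRIPT` variable are hypothetical, and it assumes the script prints the path of the generated file on stdout, as described above.

```bash
#!/bin/bash
# Hypothetical wrapper; CLONEV_SCRIPT and clone_voice are illustrative names.
CLONEV_SCRIPT="${CLONEV_SCRIPT:-scripts/clonev.sh}"

clone_voice() {
    local text="$1"
    local sample="$2"
    local lang="${3:-en}"   # Step 1: fall back to en when no language is given

    if [ -z "$text" ] || [ -z "$sample" ]; then
        echo "usage: clone_voice TEXT SAMPLE_WAV [LANG]" >&2
        return 2
    fi

    # Step 2: run the script and capture the path it prints
    local voice_file
    voice_file=$("$CLONEV_SCRIPT" "$text" "$sample" "$lang") || return 1

    # Step 3: hand the path back only if the file really exists
    if [ -f "$voice_file" ]; then
        printf '%s\n' "$voice_file"
    else
        echo "Error: voice file not created" >&2
        return 1
    fi
}
```

After sourcing this, a call such as `VOICE_FILE=$(clone_voice "Hello" "/path/to/sample.wav")` would use the default language `en`.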
Complete Working Examples
Example 1: Clone voice and send to Telegram
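```bash
# Generate the cloned voice
VOICE=$(/home/bernie/clawd/skills/clonev/scripts/clonev.sh "Hello, this is my cloned voice!" "/mnt/c/TEMP/Recording 25.wav" en)
# Send to Telegram (as a voice message)
message action=send channel=telegram asVoice=true filePath=$VOICE
```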
Example 2: Clone voice in Czech
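```bash
# Generate Czech speech
VOICE=$(/home/bernie/clawd/skills/clonev/scripts/clonev.sh "Ahoj, tohle je můj hlas" "/mnt/c/TEMP/Recording 25.wav" cs)
# Send
message action=send channel=telegram asVoice=true filePath=$VOICE
```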
Example 3: Full workflow with check
```bash
#!/bin/bash
# Generate voice
VOICE=$(/home/bernie/clawd/skills/clonev/scripts/clonev.sh "Task completed!" "/path/to/sample.wav" en)
# Verify file was created
if [ -f "$VOICE" ]; then
    echo "Success! Voice file: $VOICE"
    ls -lh "$VOICE"
else
    echo "Error: Voice file not created"
fi
```
Common Language Codes
| Code | Language | Example Usage |
|---|---|---|
| `en` | English | `scripts/clonev.sh "Hello" sample.wav en` |
| `cs` | Czech | `scripts/clonev.sh "Ahoj" sample.wav cs` |
| `de` | German | `scripts/clonev.sh "Hallo" sample.wav de` |
| `fr` | French | `scripts/clonev.sh "Bonjour" sample.wav fr` |
| `es` | Spanish | `scripts/clonev.sh "Hola" sample.wav es` |
Full list: en, cs, de, fr, es, it, pl, pt, tr, ru, nl, ar, zh, ja, hu, ko
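A caller may want to reject an unsupported code before spending time on a run. The sketch below checks a code against the full list above; the helper name `is_supported_lang` is hypothetical, and whether `clonev.sh` does its own validation is not stated in this document.

```bash
#!/bin/bash
# Language codes copied from the "Full list" line above.
SUPPORTED_LANGS="en cs de fr es it pl pt tr ru nl ar zh ja hu ko"

# is_supported_lang is a hypothetical helper name, not part of clonev.sh.
# Returns 0 when the code is in the list, 1 otherwise.
is_supported_lang() {
    local code="$1"
    # Reject empty input and anything that is not plain lowercase letters
    case "$code" in
        ""|*[!a-z]*) return 1 ;;
    esac
    # Match the code as a whole word inside the space-separated list
    case " $SUPPORTED_LANGS " in
        *" $code "*) return 0 ;;
        *) return 1 ;;
    esac
}
```

For example, `is_supported_lang cs || echo "unsupported language"` prints nothing, because `cs` is in the list.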
Voice Sample Requirements
- Format: WAV file
- Length: 6-30 seconds (optimal: 10-15 seconds)
- Quality: Clear audio, no background noise
- Content: Any speech (the actual words don't matter)
Good samples:
- ✅ Recording of someone speaking clearly
- ✅ No music or noise in background
- ✅ Consistent volume
Bad samples:
- ❌ Music or songs
- ❌ Heavy background noise
- ❌ Very short (< 6 seconds)
- ❌ Very long (> 30 seconds)
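The length limits above can be checked mechanically once the sample's duration is known. The following sketch only classifies a duration given in seconds; the helper name `classify_sample_length` is hypothetical, and obtaining the duration (for instance with `soxi -D sample.wav`, assuming SoX is installed) is left to the caller.

```bash
#!/bin/bash
# classify_sample_length is a hypothetical helper. It takes a duration in
# seconds and prints one of: invalid, too-short, too-long, optimal, ok.
# The thresholds (6, 30, and 10-15 seconds) come from the requirements above.
classify_sample_length() {
    awk -v d="$1" 'BEGIN {
        # Accept only plain non-negative numbers such as 12 or 12.5
        if (d !~ /^[0-9]+(\.[0-9]+)?$/) { print "invalid"; exit }
        d = d + 0
        if (d < 6)                   print "too-short"
        else if (d > 30)             print "too-long"
        else if (d >= 10 && d <= 15) print "optimal"
        else                         print "ok"
    }'
}
```

awk is used here because bash arithmetic does not handle fractional seconds.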
⚠️ Important Notes
Model Download
- First use downloads ~1.87GB model (one-time)
- Model is stored at: `/mnt/c/TEMP/Docker-containers/coqui-tts/models-xtts/`
- Status: ✅ Already downloaded
Processing Time
- Takes 20-40 seconds depending on text length
- This is normal - voice cloning is computationally intensive
Troubleshooting
"Command not found"
Make sure you're in the skill directory or use full path:
```bash
/home/bernie/clawd/skills/clonev/scripts/clonev.sh "text" sample.wav en
```
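One way to avoid the "Command not found" case is to probe the relative and the full path before calling the script. This is a sketch under that assumption; `find_clonev` is a hypothetical helper name, and the candidate paths are simply the two forms used in this document.

```bash
#!/bin/bash
# find_clonev is a hypothetical helper: it prints the first candidate path
# that is an executable regular file, or fails if none qualifies.
find_clonev() {
    local candidate
    for candidate in "$@"; do
        if [ -f "$candidate" ] && [ -x "$candidate" ]; then
            printf '%s\n' "$candidate"
            return 0
        fi
    done
    echo "clonev.sh not found or not executable" >&2
    return 1
}
```

A call might look like `SCRIPT=$(find_clonev scripts/clonev.sh /home/bernie/clawd/skills/clonev/scripts/clonev.sh)`.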
"Voice sample not found"
- Check the path to the WAV file
- Use absolute paths (starting with `/`)
- Ensure file exists: `ls -la /path/to/sample.wav`
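The checks in this list can be done in one step before the script is called. A minimal sketch, assuming only standard shell tools; `resolve_sample` is a hypothetical helper name, and the `.wav` extension test is a rough proxy for the format requirement rather than a real format check.

```bash
#!/bin/bash
# resolve_sample is a hypothetical helper: it prints the absolute path of an
# existing .wav sample, or fails with a message on stderr.
resolve_sample() {
    local sample="$1"
    if [ ! -f "$sample" ]; then
        echo "Voice sample not found: $sample" >&2
        return 1
    fi
    case "$sample" in
        *.wav|*.WAV) ;;
        *) echo "Not a .wav file: $sample" >&2; return 1 ;;
    esac
    # Turn a relative path into an absolute one via the containing directory
    local dir
    dir=$(cd "$(dirname "$sample")" && pwd -P) || return 1
    printf '%s/%s\n' "${dir%/}" "$(basename "$sample")"
}
```

Paths with spaces, such as `/mnt/c/TEMP/Recording 25.wav`, work as long as the argument is quoted.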
"Model not found"
The model should auto-download. If not:
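```bash
cd /mnt/c/TEMP/Docker-containers/coqui-tts
docker run --rm --entrypoint "" \
  -v "$(pwd)/models-xtts:/root/.local/share/tts" \
  ghcr.io/coqui-ai/tts:latest \
  python3 -c "from TTS.api import TTS; TTS('tts_models/multilingual/multi-dataset/xtts_v2')"
```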
Poor voice quality
- Use clearer voice sample
- Ensure no background noise
- Try different sample (some voices clone better)
Quick Reference Card (FOR AI MODELS)
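```
User: "Clone my voice and say hello"
→ Get: sample path, text="hello", language=en
→ Run: VOICE=$(/home/bernie/clawd/skills/clonev/scripts/clonev.sh "hello" "/path/to/sample.wav" en)
→ Result: $VOICE contains the path to the OGG file
→ Send: message action=send channel=telegram asVoice=true filePath=$VOICE
```
```
User: "Make me speak Czech"
→ Get: sample path, text="Ahoj", language=cs
→ Run: VOICE=$(/home/bernie/clawd/skills/clonev/scripts/clonev.sh "Ahoj" "/path/to/sample.wav" cs)
→ Send: message action=send channel=telegram asVoice=true filePath=$VOICE
```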
Output Location
Generated files are saved to:
```
/mnt/c/TEMP/Docker-containers/coqui-tts/output/clonev_output.ogg
```
The script returns this path, so you can use it directly.
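Because the output path is a single fixed file name, a later run may replace an earlier result. That is an assumption drawn from the path, since the document does not say so. If earlier results need to be kept, a copy can be made right after each run. In the sketch below, `keep_copy` and the destination directory are hypothetical.

```bash
#!/bin/bash
# keep_copy is a hypothetical helper: it copies a generated .ogg file to a
# timestamped name in a destination directory and prints the new path.
keep_copy() {
    local src="$1"
    local dest_dir="${2:-.}"
    if [ ! -f "$src" ]; then
        echo "No such file: $src" >&2
        return 1
    fi
    mkdir -p "$dest_dir" || return 1

    local base stamp dest n=0
    base=$(basename "$src" .ogg)
    stamp=$(date +%Y%m%d_%H%M%S)
    dest="$dest_dir/${base}_${stamp}.ogg"
    # Add a counter if a copy with the same timestamp already exists
    while [ -e "$dest" ]; do
        n=$((n + 1))
        dest="$dest_dir/${base}_${stamp}_${n}.ogg"
    done

    cp "$src" "$dest" && printf '%s\n' "$dest"
}
```

A typical call would be `SAVED=$(keep_copy "$VOICE" "$HOME/voices")`, where `$HOME/voices` is only an example location.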
Summary
1. ONLY use the script: `scripts/clonev.sh`
2. NEVER try to use Docker containers directly
3. NEVER try to interact with the `coqui-xtts` container
4. Script handles everything automatically
5. Returns path to OGG file ready to send
Simple. Just use the script.
Clone any voice. Speak any language. Just use the script.
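CloneV Skill - Voice Cloning Made Simple
⚠️ CRITICAL INSTRUCTIONS FOR AI MODELS
DO NOT try to use Docker containers directly.
DO NOT try to interact with the coqui-xtts container - it is broken and restarting.
DO NOT try to use APIs or servers.
ONLY USE THE SCRIPT: `scripts/clonev.sh`
The script handles everything automatically. Just call it with text, voice sample, and language.
What This Skill Does
Clones any voice from a short audio sample and generates new speech in that voice.
Input:
- Text to speak
- Voice sample (WAV file, 6-30 seconds)
- Language code
Output: OGG voice file (cloned voice speaking the text)
Works with: Any voice! Yours, a celebrity, a character, etc.
The ONE Command You Need
```bash
VOICE_FILE=$(scripts/clonev.sh "Your text" "/path/to/voice_sample.wav" language)
```
That's it! Nothing else needed.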
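Step-by-Step Usage (FOR AI MODELS)
Step 1: Get the required inputs
- Text to speak (from user)
- Path to voice sample WAV file (from user)
- Language code (from user, or default to `en`)
Step 2: Run the script
```bash
VOICE_FILE=$(scripts/clonev.sh "text" "/path/to/sample.wav" language)
```
Step 3: Use the output
The variable `$VOICE_FILE` now contains the path to the generated OGG file.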
Complete Working Examples
Example 1: Clone voice and send to Telegram
```bash
# Generate the cloned voice
VOICE=$(/home/bernie/clawd/skills/clonev/scripts/clonev.sh "Hello, this is my cloned voice!" "/mnt/c/TEMP/Recording 25.wav" en)
# Send to Telegram (as a voice message)
message action=send channel=telegram asVoice=true filePath=$VOICE
```
Example 2: Clone voice in Czech
```bash
# Generate Czech speech
VOICE=$(/home/bernie/clawd/skills/clonev/scripts/clonev.sh "Ahoj, tohle je můj hlas" "/mnt/c/TEMP/Recording 25.wav" cs)
# Send
message action=send channel=telegram asVoice=true filePath=$VOICE
```
Example 3: Full workflow with check
```bash
#!/bin/bash
# Generate voice
VOICE=$(/home/bernie/clawd/skills/clonev/scripts/clonev.sh "Task completed!" "/path/to/sample.wav" en)
# Verify file was created
if [ -f "$VOICE" ]; then
    echo "Success! Voice file: $VOICE"
    ls -lh "$VOICE"
else
    echo "Error: Voice file not created"
fi
```
Common Language Codes
| Code | Language | Example Usage |
|---|---|---|
| `en` | English | `scripts/clonev.sh "Hello" sample.wav en` |
| `cs` | Czech | `scripts/clonev.sh "Ahoj" sample.wav cs` |
| `de` | German | `scripts/clonev.sh "Hallo" sample.wav de` |
| `fr` | French | `scripts/clonev.sh "Bonjour" sample.wav fr` |
| `es` | Spanish | `scripts/clonev.sh "Hola" sample.wav es` |
Full list: en, cs, de, fr, es, it, pl, pt, tr, ru, nl, ar, zh, ja, hu, ko
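Voice Sample Requirements
- Format: WAV file
- Length: 6-30 seconds (optimal: 10-15 seconds)
- Quality: Clear audio, no background noise
- Content: Any speech (the actual words don't matter)
Good samples:
- ✅ Recording of someone speaking clearly
- ✅ No music or noise in background
- ✅ Consistent volume
Bad samples:
- ❌ Music or songs
- ❌ Heavy background noise
- ❌ Very short (< 6 seconds)
- ❌ Very long (> 30 seconds)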
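⚠️ Important Notes
Model Download
- First use downloads ~1.87GB model (one-time)
- Model is stored at: `/mnt/c/TEMP/Docker-containers/coqui-tts/models-xtts/`
- Status: ✅ Already downloaded
Processing Time
- Takes 20-40 seconds depending on text length
- This is normal - voice cloning is computationally intensive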
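Troubleshooting
"Command not found"
Make sure you're in the skill directory or use the full path:
```bash
/home/bernie/clawd/skills/clonev/scripts/clonev.sh "text" sample.wav en
```
"Voice sample not found"
- Check the path to the WAV file
- Use absolute paths (starting with `/`)
- Ensure file exists: `ls -la /path/to/sample.wav`
"Model not found"
The model should auto-download. If not:
```bash
cd /mnt/c/TEMP/Docker-containers/coqui-tts
docker run --rm --entrypoint "" \
  -v "$(pwd)/models-xtts:/root/.local/share/tts" \
  ghcr.io/coqui-ai/tts:latest \
  python3 -c "from TTS.api import TTS; TTS('tts_models/multilingual/multi-dataset/xtts_v2')"
```
Poor voice quality
- Use clearer voice sample
- Ensure no background noise
- Try different sample (some voices clone better)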
Quick Reference Card (FOR AI MODELS)
```
User: "Clone my voice and say hello"
→ Get: sample path, text="hello", language=en
→ Run: VOICE=$(/home/bernie/clawd/skills/clonev/scripts/clonev.sh "hello" "/path/to/sample.wav" en)
→ Result: $VOICE contains the path to the OGG file
→ Send: message action=send channel=telegram asVoice=true filePath=$VOICE
```
```
User: "Make me speak Czech"
→ Get: sample path, text="Ahoj", language=cs
→ Run: VOICE=$(/home/bernie/clawd/skills/clonev/scripts/clonev.sh "Ahoj" "/path/to/sample.wav" cs)
→ Send: message action=send channel=telegram asVoice=true filePath=$VOICE
```
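Output Location
Generated files are saved to:
```
/mnt/c/TEMP/Docker-containers/coqui-tts/output/clonev_output.ogg
```
The script returns this path, so you can use it directly.
Summary
1. ONLY use the script: `scripts/clonev.sh`
2. NEVER try to use Docker containers directly
3. NEVER try to interact with the `coqui-xtts` container
4. Script handles everything automatically
5. Returns path to OGG file ready to send
Simple. Just use the script.
Clone any voice. Speak any language. Just use the script.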