Video Audio Replace
Replace a video's original audio with TTS-generated voice while maintaining precise timing alignment. Also supports generating subtitles from video using Whisper.
Full Workflow
Step 1: Generate subtitles from video (optional)
If you don't have an SRT file, generate one from the video using the included script:
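```bash
# Generate subtitles from the video (uses faster-whisper; free, runs locally)
generate_subtitles.py video.mp4 -o subtitles.srt -l zh
```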
Or manually with Python:
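```bash
# Using faster-whisper (recommended; free, runs locally)
pip install faster-whisper srt
python3 << 'EOF'
from datetime import timedelta

import srt
from faster_whisper import WhisperModel

# Load the small "base" model on CPU with int8 quantization
model = WhisperModel("base", device="cpu", compute_type="int8")
segments, info = model.transcribe("input_video.mp4", language="zh")

# Convert each transcribed segment into an SRT entry
subtitles = [
    srt.Subtitle(
        index=i,
        start=timedelta(seconds=seg.start),
        end=timedelta(seconds=seg.end),
        content=seg.text.strip(),
    )
    for i, seg in enumerate(segments, 1)
]

with open("subtitles.srt", "w", encoding="utf-8") as f:
    f.write(srt.compose(subtitles))
EOF
```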
Step 2: Replace audio with TTS
Use the generated SRT to create a new video with TTS voice.
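The SRT file supplies the start and end time of every line of speech. As a rough sketch of how those timings might be read, the snippet below parses SRT text with the standard library only. The helper name `parse_srt` is hypothetical and is not taken from this tool; the actual script may read the file differently.

```python
import re
from datetime import timedelta

# Matches an SRT timing line such as "00:00:01,500 --> 00:00:04,000"
TIMING = re.compile(
    r"(\d+):(\d{2}):(\d{2})[,.](\d{3})\s*-->\s*(\d+):(\d{2}):(\d{2})[,.](\d{3})"
)


def parse_srt(text):
    """Return a list of (start, end, content) tuples, one per subtitle cue.

    Hypothetical helper for illustration; not part of video-audio-replace.
    """
    cues = []
    # Cues are separated by blank lines
    for block in re.split(r"\n\s*\n", text.strip()):
        lines = block.strip().splitlines()
        for pos, line in enumerate(lines):
            match = TIMING.search(line)
            if match:
                h1, m1, s1, ms1, h2, m2, s2, ms2 = map(int, match.groups())
                start = timedelta(hours=h1, minutes=m1, seconds=s1, milliseconds=ms1)
                end = timedelta(hours=h2, minutes=m2, seconds=s2, milliseconds=ms2)
                # Everything after the timing line is the spoken text
                content = " ".join(part.strip() for part in lines[pos + 1:])
                cues.append((start, end, content))
                break
    return cues


sample = """1
00:00:01,000 --> 00:00:03,500
Hello and welcome.

2
00:00:04,000 --> 00:00:06,250
Let's get started.
"""

for start, end, content in parse_srt(sample):
    print(f"{start.total_seconds():.2f}s to {end.total_seconds():.2f}s: {content}")
```

Each tuple gives the time slot that the corresponding TTS clip has to fit into.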
When to use
- Dubbing videos with AI-generated voice
- Converting subtitle files to voice-over
- Creating multilingual video versions
Requirements
API Keys (choose one)
- ElevenLabs: Set the `ELEVENLABS_API_KEY` environment variable
- Edge TTS (free, no key needed): Use `--engine edge`
System dependencies
- ffmpeg
- sox (optional, for advanced processing)
Usage
Basic usage (ElevenLabs)
```bash
video-audio-replace --video input.mp4 --srt subtitles.srt --output output.mp4 --voice Liam
```
Using Edge TTS (free, no API key)
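```bash
video-audio-replace --video input.mp4 --srt subtitles.srt --output output.mp4 --engine edge --voice zh-CN-YunxiNeural
```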
Options
| Option | Description | Default |
|---|---|---|
| `--video` | Input video file | Required |
| `--srt` | SRT subtitle file | Required |
| `--output` | Output video file | `input_tts.mp4` |
| `--voice` | Voice ID or name | `Liam` (ElevenLabs) |
| `--engine` | TTS engine: `elevenlabs` or `edge` | `elevenlabs` |
| `--speed-range` | Speed adjustment range | `0.85-1.15` |
Examples
English voice (ElevenLabs)
```bash
video-audio-replace --video 2028.mp4 --srt 2028.srt --output 2028_final.mp4 --voice Liam
```
Chinese voice (Edge TTS)
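```bash
video-audio-replace --video video.mp4 --srt subs.srt --output result.mp4 --engine edge --voice zh-CN-YunxiNeural
```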
How it works
1. Extract the original audio from the video
2. Split the audio into segments based on subtitle timestamps
3. Generate TTS audio for each subtitle segment
4. Adjust the TTS speed (within 0.85-1.15x) to match the original segment duration
5. Add silence padding to fill any remaining time gap
6. Merge all segments, preserving the original timing gaps
7. Replace the video's audio with the aligned TTS audio
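The speed adjustment and silence padding steps come down to a little arithmetic per subtitle slot. The sketch below is one plausible way to compute them, not the tool's actual code: `plan_segment` and `atempo_command` are hypothetical names, and the use of ffmpeg's `atempo` filter for the tempo change is an assumption.

```python
def plan_segment(tts_duration, slot_duration, min_speed=0.85, max_speed=1.15):
    """Return (speed, padding, overflow) for one subtitle slot, durations in seconds.

    Hypothetical helper. speed > 1 plays the TTS clip faster (shorter),
    speed < 1 plays it slower (longer).
    """
    if tts_duration <= 0 or slot_duration <= 0:
        raise ValueError("durations must be positive")
    # Speed that would make the clip fill the slot exactly
    ideal = tts_duration / slot_duration
    # Clamp to the allowed range so the speed never leaves 0.85-1.15x
    speed = min(max(ideal, min_speed), max_speed)
    adjusted = tts_duration / speed
    padding = max(0.0, slot_duration - adjusted)   # silence appended after the clip
    overflow = max(0.0, adjusted - slot_duration)  # clip is still longer than the slot
    return speed, padding, overflow


def atempo_command(src, dst, speed):
    """Build an ffmpeg command that changes tempo while keeping pitch.

    Assumption: the tempo change is done with ffmpeg's atempo audio filter.
    """
    return ["ffmpeg", "-y", "-i", src, "-filter:a", f"atempo={speed:.3f}", dst]


# A 2.0 s clip in a 4.0 s slot: slowed to the 0.85x limit, the rest is silence
speed, padding, overflow = plan_segment(2.0, 4.0)
print(f"speed={speed:.2f} padding={padding:.2f}s overflow={overflow:.2f}s")
print(" ".join(atempo_command("seg_001.mp3", "seg_001_fit.mp3", speed)))
```

When a clip is still too long at the maximum speed, `overflow` is positive and the caller has to decide what to do with the excess; how this tool handles that case is not stated here.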
Available Voices
ElevenLabs (require API key)
- `Liam` - Energetic male (recommended)
- `Sarah` - Professional female
- `Brian` - Deep resonant male
- Run `curl` with your API key to list all voices
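For scripting, the same voice listing can be done from Python instead of curl. This is a sketch under assumptions: it uses the ElevenLabs `GET /v1/voices` endpoint with the `xi-api-key` header and expects a `voices` array of objects with `name` and `voice_id` fields, so check the current API reference before relying on it. The function names are hypothetical.

```python
import json
import os
import urllib.request

# Assumed endpoint for listing voices; verify against the ElevenLabs API reference
VOICES_URL = "https://api.elevenlabs.io/v1/voices"


def build_voices_request(api_key):
    """Build a GET request for the voice list, authenticated with the API key."""
    return urllib.request.Request(VOICES_URL, headers={"xi-api-key": api_key})


def list_voices(api_key):
    """Return (name, voice_id) pairs parsed from the JSON response."""
    with urllib.request.urlopen(build_voices_request(api_key), timeout=30) as resp:
        payload = json.load(resp)
    return [(voice["name"], voice["voice_id"]) for voice in payload.get("voices", [])]


if __name__ == "__main__":
    key = os.environ.get("ELEVENLABS_API_KEY")
    if key:
        for name, voice_id in list_voices(key):
            print(f"{name}: {voice_id}")
    else:
        print("Set ELEVENLABS_API_KEY to list voices")
```

Either the printed name or the ID can then be passed to `--voice`, which accepts a voice ID or name.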
Edge TTS (free)
- Chinese: `zh-CN-XiaoxiaoNeural`, `zh-CN-YunxiNeural`, `zh-CN-YunyangNeural`
- English: `en-US-JennyNeural`, `en-US-GuyNeural`
- Many more languages available
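Video Audio Replace (Chinese version, translated)
Replace a video's original audio with TTS-generated voice while maintaining precise timing alignment. Also supports generating subtitles from video using Whisper.
Full Workflow
Step 1: Generate subtitles from video (optional)
If you don't have an SRT file, generate one from the video using the included script:
```bash
# Generate subtitles from the video (uses faster-whisper; free, runs locally)
generate_subtitles.py video.mp4 -o subtitles.srt -l zh
```
Or manually with Python: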
```bash
# Using faster-whisper (recommended; free, runs locally)
pip install faster-whisper srt
python3 << 'EOF'
from datetime import timedelta

import srt
from faster_whisper import WhisperModel

# Load the small "base" model on CPU with int8 quantization
model = WhisperModel("base", device="cpu", compute_type="int8")
segments, info = model.transcribe("input_video.mp4", language="zh")

# Convert each transcribed segment into an SRT entry
subtitles = [
    srt.Subtitle(
        index=i,
        start=timedelta(seconds=seg.start),
        end=timedelta(seconds=seg.end),
        content=seg.text.strip(),
    )
    for i, seg in enumerate(segments, 1)
]

with open("subtitles.srt", "w", encoding="utf-8") as f:
    f.write(srt.compose(subtitles))
EOF
```
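Step 2: Replace audio with TTS
Use the generated SRT file to create a new video with TTS voice.
When to use
- Dubbing videos with AI-generated voice
- Converting subtitle files to voice-over
- Creating multilingual video versions
Requirements
API Keys (choose one)
- ElevenLabs: Set the `ELEVENLABS_API_KEY` environment variable
- Edge TTS (free, no key needed): Use `--engine edge`
System dependencies
- ffmpeg
- sox (optional, for advanced processing)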
Usage
Basic usage (ElevenLabs)
```bash
video-audio-replace --video input.mp4 --srt subtitles.srt --output output.mp4 --voice Liam
```
Using Edge TTS (free, no API key)
```bash
video-audio-replace --video input.mp4 --srt subtitles.srt --output output.mp4 --engine edge --voice zh-CN-YunxiNeural
```
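Options
| Option | Description | Default |
|---|---|---|
| `--video` | Input video file | Required |
| `--srt` | SRT subtitle file | Required |
| `--output` | Output video file | `input_tts.mp4` |
| `--voice` | Voice ID or name | `Liam` (ElevenLabs) |
| `--engine` | TTS engine: `elevenlabs` or `edge` | `elevenlabs` |
| `--speed-range` | Speed adjustment range | `0.85-1.15` |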
Examples
English voice (ElevenLabs)
```bash
video-audio-replace --video 2028.mp4 --srt 2028.srt --output 2028_final.mp4 --voice Liam
```
Chinese voice (Edge TTS)
```bash
video-audio-replace --video video.mp4 --srt subs.srt --output result.mp4 --engine edge --voice zh-CN-YunxiNeural
```
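How it works
1. Extract the original audio from the video
2. Split the audio into segments based on subtitle timestamps
3. Generate TTS audio for each subtitle segment
4. Adjust the TTS speed (within 0.85-1.15x) to match the original segment duration
5. Add silence padding to fill any remaining time gap
6. Merge all segments, preserving the original timing gaps
7. Replace the video's audio with the aligned TTS audio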
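Available Voices
ElevenLabs (requires API key)
- `Liam` - Energetic male (recommended)
- `Sarah` - Professional female
- `Brian` - Deep resonant male
- Run `curl` with your API key to list all voices
Edge TTS (free)
- Chinese: `zh-CN-XiaoxiaoNeural`, `zh-CN-YunxiNeural`, `zh-CN-YunyangNeural`
- English: `en-US-JennyNeural`, `en-US-GuyNeural`
- Many more languages available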