Dub YouTube with Voice.ai
This skill follows the Agent Skills specification.
Turn any script into a YouTube-ready voiceover — complete with numbered segments, a stitched master, chapter timestamps, SRT captions, and a review page. Drop the voiceover onto an existing video to dub it in one command.
Built for YouTube creators who want studio-quality narration without the studio. Powered by Voice.ai.
When to use this skill
| Scenario | Why it fits |
|---|---|
| YouTube long-form | Full narration with chapter markers and captions |
| YouTube Shorts | Quick hooks with punchy delivery |
| Course content | Professional narration for educational videos |
| Screen recordings | Dub a screencast with clean AI voiceover |
| Quick iteration | Smart caching: edit one section, only that segment re-renders |
| Batch production | Same voice, consistent quality across every video |
The one-command workflow
Have a script and a video? Dub it in one shot:
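```bash
node voiceai-vo.cjs build \
  --input my-script.md \
  --voice oliver \
  --title "My YouTube Video" \
  --video ./my-recording.mp4 \
  --mux \
  --template youtube
```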
This renders the voiceover, stitches the master audio, and drops it onto your video — all in one command. Output:
- `out/my-youtube-video/muxed.mp4`: your video dubbed with the AI voiceover
- `out/my-youtube-video/master.wav`: the standalone audio
- `out/my-youtube-video/review.html`: listen and review each segment
- `out/my-youtube-video/chapters.txt`: paste directly into your YouTube description
- `out/my-youtube-video/captions.srt`: upload to YouTube as subtitles
- `out/my-youtube-video/description.txt`: ready-made YouTube description with chapters
Use --sync pad if the audio is shorter than the video, or --sync trim to cut it to match.
Requirements
- Node.js 20+: runtime (no npm install needed; the CLI is a single bundled file)
- `VOICE_AI_API_KEY`: set as an environment variable (see Configuration). Get a key at voice.ai/dashboard.
- ffmpeg (optional): needed for master stitching, MP3 encoding, loudness normalization, and video dubbing. The pipeline still produces individual segments, the review page, chapters, and captions without it.
Configuration
Set VOICE_AI_API_KEY as an environment variable before running:
```bash
export VOICE_AI_API_KEY=your-key-here
```
The skill does not read .env files or access any files for credentials — only the environment variable.
Use --mock on any command to run the full pipeline without an API key (produces placeholder audio).
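As a rough illustration of the configuration rules above, a wrapper script could check the environment before calling the CLI. The helper name `preflight_flags` is hypothetical and not part of the skill; the sketch only assumes what this document states, namely that the key is read from `VOICE_AI_API_KEY`, that `--mock` works without a key, and that ffmpeg is optional.

```bash
#!/usr/bin/env bash
# Hypothetical helper (not part of voiceai-vo): prints "--mock" when no API key
# is configured, and warns on stderr when ffmpeg is not on the PATH.
preflight_flags() {
  if ! command -v ffmpeg >/dev/null 2>&1; then
    echo "ffmpeg not found: master stitching and video dubbing will be skipped" >&2
  fi
  if [ -z "${VOICE_AI_API_KEY:-}" ]; then
    echo "VOICE_AI_API_KEY is not set: falling back to --mock" >&2
    echo "--mock"
  fi
}
```

One possible use is to append its output to a build call, for example `node voiceai-vo.cjs build --input my-script.md --voice oliver $(preflight_flags)`, so a missing key produces placeholder audio instead of a failed run.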
Commands
build — Generate a YouTube voiceover from a script
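```bash
node voiceai-vo.cjs build \
  --input <path> \
  --voice <id> \
  --title "My YouTube Video" \
  [--template youtube] \
  [--video input.mp4 --mux --sync shortest] \
  [--force] [--mock]
```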
What it does:
1. Reads the script and splits it into segments (by `##` headings for `.md`, or by sentence boundaries for `.txt`)
2. Optionally prepends/appends YouTube intro/outro segments
3. Renders each segment via Voice.ai TTS
4. Stitches a master audio file (if ffmpeg is available)
5. Generates YouTube chapters, SRT captions, a review page, and a ready-made description
6. Optionally dubs your video with the voiceover
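To preview how a Markdown script might be segmented before spending credits, the heading-based split in step 1 can be approximated with a few lines of awk. This is a sketch under assumptions, not the CLI's actual splitter: `list_heading_segments` is a hypothetical helper, it treats every line starting with `## ` as a segment boundary, and it ignores any text that appears before the first heading.

```bash
#!/usr/bin/env bash
# Hypothetical helper (not part of voiceai-vo): lists the segments a Markdown
# script would yield if split on "## " headings.
# Output columns (tab-separated): segment number, heading title, body character count.
list_heading_segments() {
  awk '
    function emit() { if (n > 0) printf "%03d\t%s\t%d\n", n, title, chars }
    /^## / { emit(); n++; title = substr($0, 4); chars = 0; next }
    n > 0  { chars += length($0) }   # count body characters of the current segment
    END    { emit() }
  ' "$1"
}
```

Running `list_heading_segments my-script.md` gives a quick view of segment order and size, which can help spot sections that are much longer than the rest.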
Full options:
| Option | Description |
|---|---|
| `-i, --input <path>` | Script file (`.txt` or `.md`), required |
| `-v, --voice <id>` | Voice alias or UUID, required |
| `-t, --title <title>` | Video title (defaults to filename) |
| `--template youtube` | Auto-inject YouTube intro/outro |
| `--mode <mode>` | `headings` or `auto` (default: `headings` for `.md`) |
| `--max-chars <n>` | Max characters per auto-chunk (default: 1500) |
| `--language <code>` | Language code (default: `en`) |
| `--video <path>` | Input video to dub |
| `--mux` | Enable video dubbing (requires `--video`) |
| `--sync <policy>` | `shortest`, `pad`, or `trim` (default: `shortest`) |
| `--force` | Re-render all segments (ignore cache) |
| `--mock` | Mock mode: no API calls, placeholder audio |
| `-o, --out <dir>` | Custom output directory |
replace-audio — Dub an existing video
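```bash
node voiceai-vo.cjs replace-audio \
  --video ./my-video.mp4 \
  --audio ./out/my-video/master.wav \
  [--out ./out/my-video/dubbed.mp4] \
  [--sync shortest|pad|trim]
```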
Requires ffmpeg. If not installed, generates helper shell/PowerShell scripts instead.
| Sync policy | Behavior |
|---|---|
| `shortest` (default) | Output ends when the shorter track ends |
| `pad` | Pad audio with silence to match video duration |
| `trim` | Trim audio to match video duration |
Video stream is copied without re-encoding (-c:v copy). Audio is encoded as AAC for YouTube compatibility.
Privacy: Video processing is entirely local. Only script text is sent to Voice.ai for TTS. Your video files never leave your machine.
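For readers who want to see what the three sync policies could look like as plain ffmpeg invocations, here is a minimal sketch. It is not necessarily the command the CLI runs: `mux_command` is a hypothetical helper that only prints a command line, and the mapping of `pad` to the `apad` filter and of `trim` to an output `-t` limit is an assumption based on common ffmpeg usage. Only `-c:v copy` and AAC audio come from this document.

```bash
#!/usr/bin/env bash
# Hypothetical helper (not part of voiceai-vo): prints an ffmpeg command that
# replaces a video's audio track under one of the three sync policies.
# Args: policy video audio output [video_duration_seconds, used by "trim"]
mux_command() {
  policy=$1; video=$2; audio=$3; out=$4; duration=${5:-}
  # Take video from the first input and audio from the second;
  # copy the video stream untouched and encode the audio as AAC.
  set -- ffmpeg -i "$video" -i "$audio" -map 0:v:0 -map 1:a:0 -c:v copy -c:a aac
  case $policy in
    shortest) set -- "$@" -shortest ;;            # stop at the shorter track
    pad)      set -- "$@" -af apad -shortest ;;   # pad audio with silence up to the video length
    trim)     set -- "$@" -t "$duration" ;;       # cap output at the video duration
    *) echo "unknown sync policy: $policy" >&2; return 1 ;;
  esac
  echo "$* $out"
}
```

Printing the command rather than running it keeps the sketch safe to experiment with; paths containing spaces would need quoting before the printed line is pasted into a shell.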
voices — List available voices
```bash
node voiceai-vo.cjs voices [--limit 20] [--query deep] [--mock]
```
Available voices
Use short aliases or full UUIDs with --voice:
| Alias | Voice | Gender | Best for YouTube |
|---|---|---|---|
| `ellie` | Ellie | F | Vlogs, lifestyle, social content |
| `oliver` | Oliver | M | Tutorials, narration, explainers |
| `lilith` | Lilith | F | ASMR, calm walkthroughs |
| `smooth` | Smooth Calm Voice | M | Documentaries, long-form essays |
| `corpse` | Corpse Husband | M | Gaming, entertainment |
| `skadi` | Skadi | F | Anime, character content |
| `zhongli` | Zhongli | M | Gaming, dramatic intros |
| `flora` | Flora | F | Kids content, upbeat videos |
| `chief` | Master Chief | M | Gaming, action trailers |
The voices command also returns any additional voices available on the API. Voice list is cached for 10 minutes.
Build outputs
After a build, the output directory contains everything you need to publish on YouTube:
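```
out/<title-slug>/
  segments/          # Numbered WAV files (001-intro.wav, 002-section.wav, …)
  master.wav         # Stitched voiceover (requires ffmpeg)
  master.mp3         # MP3 for upload (requires ffmpeg)
  muxed.mp4          # Dubbed video (if --video --mux used)
  chapters.txt       # Paste into YouTube description
  captions.srt       # Upload as YouTube subtitles
  description.txt    # Ready-made YouTube description with chapters
  review.html        # Interactive review page with audio players
  manifest.json      # Build metadata: voice, template, segment list
  timeline.json      # Segment durations and start times
```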
YouTube workflow
1. Run the build command
2. Upload `muxed.mp4` (or your original video + `master.mp3` as audio)
3. Paste `chapters.txt` content into your YouTube description
4. Upload `captions.srt` as subtitles in YouTube Studio
5. Done: professional narration, chapters, and captions in minutes
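To make the chapter step concrete, the sketch below shows how chapter lines can be derived from segment durations by keeping a running start time. It is illustrative only: `chapters_from_durations` is a hypothetical helper, and its tab-separated input (duration in seconds, then title) is an assumed format, since the schema of `timeline.json` is not described here.

```bash
#!/usr/bin/env bash
# Hypothetical helper (not part of voiceai-vo): turns "duration<TAB>title" lines
# into "M:SS Title" chapter lines, where each chapter starts when the previous
# segments end. The first chapter therefore starts at 0:00.
chapters_from_durations() {
  awk -F '\t' '
    {
      printf "%d:%02d %s\n", int(start / 60), int(start) % 60, $2
      start += $1   # next chapter begins after this segment
    }
  '
}
```

For example, feeding it three segments of 12.5, 65, and 30.2 seconds yields chapters starting at 0:00, 0:12, and 1:17.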
YouTube template
Use --template youtube to auto-inject a branded intro and outro:
| Segment | Source file |
|---|---|
| Intro (prepended) | `templates/youtube_intro.txt` |
| Outro (appended) | `templates/youtube_outro.txt` |
Edit the files in templates/ to customize your channel's branding.
Caching
Segments are cached by a hash of: text content + voice ID + language.
- Unchanged segments are skipped on rebuild, which keeps iteration fast
- Modified segments are re-rendered automatically
- Use `--force` to re-render everything
- Cache manifest is stored in INLINECODE61
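The idea behind such a cache key can be sketched in a few lines of shell. This is an assumption-laden illustration, not the CLI's implementation: the document only says the key is a hash of text content, voice ID, and language, so the choice of SHA-256, the separator byte, and the helper name `cache_key` are all hypothetical. It also assumes `sha256sum` is available (on macOS, `shasum -a 256` is the usual substitute).

```bash
#!/usr/bin/env bash
# Hypothetical helper (not part of voiceai-vo): derives a cache key from the
# three inputs the document lists. Args: text voice_id language
cache_key() {
  # Join the fields with a unit-separator byte so that ("ab", "c") and
  # ("a", "bc") cannot collapse into the same hashed string.
  printf '%s\037%s\037%s' "$1" "$2" "$3" | sha256sum | cut -d ' ' -f 1
}
```

Because the key depends only on those three inputs, editing one section changes one key and leaves the others valid, which is why only the edited segment needs to be rendered again.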
Multilingual dubbing
Voice.ai supports 11 languages — dub your YouTube videos for global audiences:
`en`, `es`, `fr`, `de`, `it`, `pt`, `pl`, `ru`, `nl`, `sv`, INLINECODE72
CODEBLOCK6
The pipeline auto-selects the multilingual TTS model for non-English languages.
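When dubbing the same video into several languages, it can help to generate the build commands in a loop and review them before running anything. The sketch below is a dry run that only prints commands. `print_builds` is a hypothetical helper, and the `script.<code>.md` and `out/my-video-<code>` names are illustrative assumptions (it presumes you already have a translated script per language); the flags themselves are the ones listed in the options table.

```bash
#!/usr/bin/env bash
# Hypothetical helper (not part of voiceai-vo): prints one build command per
# language code. Nothing is executed. Args: voice code [code ...]
print_builds() {
  voice=$1
  shift
  for code in "$@"; do
    echo "node voiceai-vo.cjs build --input script.$code.md --voice $voice --language $code --out out/my-video-$code"
  done
}
```

Calling `print_builds oliver es fr de` prints three commands, each writing to its own output directory so the per-language builds do not overwrite one another.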
Troubleshooting
| Issue | Solution |
|---|---|
| ffmpeg missing | Pipeline still works: you get segments, review page, chapters, captions. Install ffmpeg for stitching and video dubbing. |
| Rate limits (429) | Segments render sequentially, which stays under most limits. Wait and retry. |
| Insufficient credits (402) | Top up at voice.ai/dashboard. Cached segments won't re-use credits on retry. |
| Long scripts | Caching makes rebuilds fast. Text over 490 chars per segment is automatically split across API calls. |
| Windows paths | Wrap paths with spaces in quotes: `--input "C:\My Scripts\script.md"` |
See references/TROUBLESHOOTING.md for more.
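The "wait and retry" advice for rate limits can be automated with a small wrapper. This is a generic sketch rather than a feature of the skill: `retry` is a hypothetical helper, and it assumes the CLI exits with a non-zero status when a build fails. Since cached segments are skipped on rebuild, a retried build should only render what is still missing.

```bash
#!/usr/bin/env bash
# Hypothetical helper (not part of voiceai-vo): runs a command up to N times,
# doubling the wait between attempts.
# Args: max_attempts initial_delay_seconds command [args ...]
retry() {
  max_attempts=$1
  delay=$2
  shift 2
  attempt=1
  while :; do
    "$@" && return 0                      # success: stop retrying
    if [ "$attempt" -ge "$max_attempts" ]; then
      return 1                            # out of attempts: report failure
    fi
    sleep "$delay"
    delay=$((delay * 2))                  # exponential backoff
    attempt=$((attempt + 1))
  done
}
```

A call such as `retry 4 30 node voiceai-vo.cjs build --input my-script.md --voice oliver` would try up to four times, waiting 30, 60, and then 120 seconds between attempts.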
References