Dub YouTube with Voice.ai

This skill follows the Agent Skills specification.

Turn any script into a YouTube-ready voiceover — complete with numbered segments, a stitched master, chapter timestamps, SRT captions, and a review page. Drop the voiceover onto an existing video to dub it in one command.

Built for YouTube creators who want studio-quality narration without the studio. Powered by Voice.ai.

When to use this skill

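| Scenario | Why it fits |
|---|---|
| YouTube long-form | Full narration with chapter markers and captions |
| YouTube Shorts | Punchy delivery that grabs attention fast |
| Course content | Professional narration for educational videos |
| Screen recordings | Dub screen recordings with a clean AI voiceover |
| Quick iteration | Smart caching: edit one section and only that segment re-renders |
| Batch production | The same voice across every video for consistent quality |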

The one-command workflow

Have a script and a video? Dub it in one shot:

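    node voiceai-vo.cjs build \
      --input my-script.md \
      --voice oliver \
      --title "My YouTube Video" \
      --video ./my-recording.mp4 \
      --mux \
      --template youtube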

This renders the voiceover, stitches the master audio, and drops it onto your video — all in one command. Output:

- `out/my-youtube-video/muxed.mp4`: your video dubbed with the AI voiceover
- `out/my-youtube-video/master.wav`: the standalone audio
- `out/my-youtube-video/review.html`: listen and review each segment
- `out/my-youtube-video/chapters.txt`: paste directly into your YouTube description
- `out/my-youtube-video/captions.srt`: upload to YouTube as subtitles
- `out/my-youtube-video/description.txt`: ready-made YouTube description with chapters

Use --sync pad if the audio is shorter than the video, or --sync trim to cut it to match.
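As a rough illustration of that choice, the sketch below picks a `--sync` value from two durations in seconds. `choose_sync` is a hypothetical helper, not part of the CLI, and the durations are assumed to be measured separately (for example with `ffprobe`).

```bash
# Hypothetical helper (not part of the CLI): pick a --sync value
# from the audio and video durations, both given in seconds.
choose_sync() {
  local audio_secs="$1" video_secs="$2"
  # awk handles fractional seconds, which bash arithmetic cannot.
  awk -v a="$audio_secs" -v v="$video_secs" 'BEGIN {
    if (a < v)      print "pad"       # audio too short: add trailing silence
    else if (a > v) print "trim"      # audio too long: cut it to the video
    else            print "shortest"  # equal lengths: the default is fine
  }'
}

choose_sync 92.5 120   # prints: pad
```

The result can be passed straight to the build command, for example `--sync "$(choose_sync 92.5 120)"`.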

Requirements

- Node.js 20+: runtime (no npm install needed; the CLI is a single bundled file)
- VOICE_AI_API_KEY: set as an environment variable. Get a key at voice.ai/dashboard.
- ffmpeg (optional): needed for master stitching, MP3 encoding, loudness normalization, and video dubbing. Without it, the pipeline still produces individual segments, the review page, chapters, and captions.

Configuration

Set VOICE_AI_API_KEY as an environment variable before running:

    export VOICE_AI_API_KEY=your-key-here

The skill does not read .env files or access any files for credentials — only the environment variable.

Use --mock on any command to run the full pipeline without an API key (produces placeholder audio).
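A small pre-flight check can combine the two rules above: use the real API when the key is set and fall back to `--mock` otherwise. This is only a sketch; `mock_flag` is a hypothetical helper name, and the final line prints the command rather than running it.

```bash
# Hypothetical pre-flight helper: prints "--mock" when no key is set,
# and prints nothing when VOICE_AI_API_KEY is available.
mock_flag() {
  if [ -z "${VOICE_AI_API_KEY:-}" ]; then
    echo "--mock"
  fi
}

# Dry run: show the command that would be executed.
echo node voiceai-vo.cjs voices $(mock_flag)
```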

Commands

build — Generate a YouTube voiceover from a script

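    node voiceai-vo.cjs build \
      --input <path> \
      --voice <id> \
      --title "My YouTube Video" \
      [--template youtube] \
      [--video input.mp4 --mux --sync shortest] \
      [--force] [--mock]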

What it does:

1. Reads the script and splits it into segments (by ## headings for .md, or by sentence boundaries for .txt)
2. Optionally prepends/appends YouTube intro/outro segments
3. Renders each segment via Voice.ai TTS
4. Stitches a master audio file (if ffmpeg is available)
5. Generates YouTube chapters, SRT captions, a review page, and a ready-made description
6. Optionally dubs your video with the voiceover
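To preview how a Markdown script might be segmented in step 1, the sketch below lists each `## ` heading as a numbered segment. It only approximates the behavior described above; `list_segments` is a hypothetical helper and the CLI's actual parsing rules may differ.

```bash
# Approximation of heading-based splitting: list each "## " heading
# as a numbered segment. The real CLI's parsing rules may differ.
list_segments() {
  awk '/^## / { n++; sub(/^## +/, ""); printf "%03d %s\n", n, $0 }' "$1"
}

# Build a tiny sample script and preview its segments.
script="$(mktemp)"
printf '%s\n' '## Intro' 'Welcome back.' '## Setup' 'Install the tool.' > "$script"
list_segments "$script"
# 001 Intro
# 002 Setup
rm -f "$script"
```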

Full options:

| Option | Description |
|---|---|
| `-i, --input <path>` | Script file (.txt or .md), required |
| `-v, --voice <id>` | Voice alias or UUID, required |
| `-t, --title <title>` | Video title (defaults to filename) |
| `--template youtube` | Auto-inject YouTube intro/outro |
| `--mode <mode>` | headings or auto (default: headings for .md) |
| `--max-chars <n>` | Max characters per auto-chunk (default: 1500) |
| `--language <code>` | Language code (default: en) |
| `--video <path>` | Input video to dub |
| `--mux` | Enable video dubbing (requires --video) |
| `--sync <policy>` | shortest, pad, or trim (default: shortest) |
| `--force` | Re-render all segments (ignore cache) |
| `--mock` | Mock mode: no API calls, placeholder audio |
| `-o, --out <dir>` | Custom output directory |

replace-audio — Dub an existing video

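    node voiceai-vo.cjs replace-audio \
      --video ./my-video.mp4 \
      --audio ./out/my-video/master.wav \
      [--out ./out/my-video/dubbed.mp4] \
      [--sync shortest|pad|trim]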

Requires ffmpeg. If not installed, generates helper shell/PowerShell scripts instead.

| Sync policy | Behavior |
|---|---|
| `shortest` (default) | Output ends when the shorter track ends |
| `pad` | Pad audio with silence to match video duration |
| `trim` | Trim audio to match video duration |

Video stream is copied without re-encoding (-c:v copy). Audio is encoded as AAC for YouTube compatibility.
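For readers who want to see what such a mux could look like in plain ffmpeg, the sketch below prints one plausible invocation per sync policy. It uses standard ffmpeg options (`-map`, `-c:v copy`, `-c:a aac`, `-shortest`, the `apad` filter, `-t`), but it is an assumption about the approach, not the CLI's confirmed internals. `mux_command` is a hypothetical helper and assumes paths without spaces.

```bash
# Sketch only: print a plausible ffmpeg command for each sync policy.
# Usage: mux_command <video> <audio> <out> [sync] [video_seconds]
mux_command() {
  local video="$1" audio="$2" out="$3" sync="${4:-shortest}" video_secs="${5:-}"
  # Take video from the first input and audio from the second;
  # copy the video stream and encode the audio as AAC.
  local base="ffmpeg -i $video -i $audio -map 0:v:0 -map 1:a:0 -c:v copy -c:a aac"
  case "$sync" in
    shortest) echo "$base -shortest $out" ;;
    # apad appends silence; -shortest then stops at the end of the video.
    pad)      echo "$base -af apad -shortest $out" ;;
    # -t caps the output at the video duration, supplied by the caller.
    trim)
      [ -n "$video_secs" ] || { echo "trim needs the video duration in seconds" >&2; return 1; }
      echo "$base -t $video_secs $out" ;;
    *)        echo "unknown sync policy: $sync" >&2; return 1 ;;
  esac
}

mux_command my-video.mp4 master.wav dubbed.mp4 pad
```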

Privacy: Video processing is entirely local. Only script text is sent to Voice.ai for TTS. Your video files never leave your machine.

voices — List available voices

    node voiceai-vo.cjs voices [--limit 20] [--query deep] [--mock]

Available voices

Use short aliases or full UUIDs with --voice:

| Alias | Voice | Gender | Best for YouTube |
|---|---|---|---|
| `ellie` | Ellie | F | Vlogs, lifestyle, social content |
| `oliver` | Oliver | M | Tutorials, narration, explainers |
| `lilith` | Lilith | F | ASMR, calm walkthroughs |
| `smooth` | Smooth Calm Voice | M | Documentaries, long-form essays |
| `corpse` | Corpse Husband | M | Gaming, entertainment |
| `skadi` | Skadi | F | Anime, character content |
| `zhongli` | Zhongli | M | Gaming, dramatic intros |
| `flora` | Flora | F | Kids content, upbeat videos |
| `chief` | Master Chief | M | Gaming, action trailers |

The voices command also returns any additional voices available on the API. Voice list is cached for 10 minutes.

Build outputs

After a build, the output directory contains everything you need to publish on YouTube:

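    out/<title-slug>/
      segments/          # Numbered WAV files (001-intro.wav, 002-section.wav, …)
      master.wav         # Stitched voiceover (requires ffmpeg)
      master.mp3         # MP3 for upload (requires ffmpeg)
      muxed.mp4          # Dubbed video (if --video --mux was used)
      chapters.txt       # Paste into the YouTube description
      captions.srt       # Upload as YouTube subtitles
      description.txt    # Ready-made YouTube description with chapters
      review.html        # Interactive review page with audio players
      manifest.json      # Build metadata: voice, template, segment list
      timeline.json      # Segment durations and start times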

YouTube workflow

1. Run the build command
2. Upload muxed.mp4 (or your original video + master.mp3 as audio)
3. Paste chapters.txt content into your YouTube description
4. Upload captions.srt as subtitles in YouTube Studio
5. Done: professional narration, chapters, and captions in minutes
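When checking chapters.txt before pasting it, it helps to know the timestamp shape YouTube expects: the list starts at 0:00 and each line is a timestamp followed by a title. The sketch below formats whole seconds that way. `chapter_ts` and the sample start times are hypothetical and are not read from the CLI's timeline.json.

```bash
# Format whole seconds as a YouTube chapter timestamp (M:SS or H:MM:SS).
chapter_ts() {
  local s="$1"
  if [ "$s" -ge 3600 ]; then
    printf '%d:%02d:%02d\n' $((s / 3600)) $((s % 3600 / 60)) $((s % 60))
  else
    printf '%d:%02d\n' $((s / 60)) $((s % 60))
  fi
}

# Hypothetical segment start times and titles.
printf '%s %s\n' \
  "$(chapter_ts 0)" "Intro" \
  "$(chapter_ts 75)" "Setup" \
  "$(chapter_ts 3725)" "Wrap-up"
# 0:00 Intro
# 1:15 Setup
# 1:02:05 Wrap-up
```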

YouTube template

Use --template youtube to auto-inject a branded intro and outro:

| Segment | Source file |
|---|---|
| Intro (prepended) | `templates/youtube_intro.txt` |
| Outro (appended) | `templates/youtube_outro.txt` |

Edit the files in templates/ to customize your channel's branding.

Caching

Segments are cached by a hash of: text content + voice ID + language.

- Unchanged segments are skipped on rebuild, which keeps iteration fast
- Modified segments are re-rendered automatically
- Use --force to re-render everything
- Cache manifest is stored in INLINECODE61
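The cache key described above can be pictured as a single hash over the three inputs. The sketch below uses SHA-256 via `sha256sum` (GNU coreutils; on macOS, `shasum -a 256` is the usual substitute). The hash function, field order, and separator are assumptions for illustration, and `cache_key` is a hypothetical helper.

```bash
# Illustrative cache key: SHA-256 over text, voice ID, and language.
# The CLI's real hash function and field order are not documented here.
cache_key() {
  local text="$1" voice="$2" lang="$3"
  # A separator byte keeps ("ab", "c") distinct from ("a", "bc").
  printf '%s\037%s\037%s' "$text" "$voice" "$lang" | sha256sum | cut -d' ' -f1
}

a="$(cache_key 'Welcome back.' oliver en)"
b="$(cache_key 'Welcome back!' oliver en)"
[ "$a" != "$b" ] && echo "edited text: segment is re-rendered"
```

Any change to the text, the voice, or the language yields a different key, which is why only edited segments are rendered again.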

Multilingual dubbing

Voice.ai supports 11 languages — dub your YouTube videos for global audiences:

INLINECODE62, es, fr, de, it, pt, pl, ru, nl, sv, INLINECODE72

CODEBLOCK6

The pipeline auto-selects the multilingual TTS model for non-English languages.
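One way to produce several dubs is to loop over language codes and run one build per language. The sketch below only prints the commands. It assumes you already have a translated script per language, and the file names (script.es.md and so on) and the per-language output directories are hypothetical.

```bash
# Dry run: print one build command per language instead of executing it.
# The per-language script names (script.es.md, ...) are hypothetical.
dub_commands() {
  local voice="$1"; shift
  local lang
  for lang in "$@"; do
    echo "node voiceai-vo.cjs build --input script.$lang.md --voice $voice --language $lang --out out/$lang"
  done
}

dub_commands oliver es fr de
```

Giving each language its own `--out` directory keeps the segments, captions, and chapters of one language from overwriting another.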

Troubleshooting

| Issue | Solution |
|---|---|
| ffmpeg missing | The pipeline still works: you get segments, the review page, chapters, and captions. Install ffmpeg for stitching and video dubbing. |
| Rate limits (429) | Segments render sequentially, which stays under most limits. Wait and retry. |
| Insufficient credits (402) | Top up at voice.ai/dashboard. Cached segments won't re-use credits on retry. |
| Long scripts | Caching makes rebuilds fast. Text over 490 chars per segment is automatically split across API calls. |
| Windows paths | Wrap paths with spaces in quotes: `--input "C:\My Scripts\script.md"` |

See references/TROUBLESHOOTING.md for more.
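For the "wait and retry" advice on rate limits, a generic retry wrapper with a doubling delay is one option. This sketch assumes the CLI exits with a non-zero status when a build fails, which is not confirmed here. `retry` is a hypothetical helper.

```bash
# Generic retry wrapper with a doubling delay between attempts.
# Usage: retry <max_attempts> <initial_delay_seconds> <command...>
retry() {
  local max="$1" delay="$2" attempt=1
  shift 2
  until "$@"; do
    if [ "$attempt" -ge "$max" ]; then
      echo "giving up after $attempt attempts" >&2
      return 1
    fi
    sleep "$delay"
    delay=$((delay * 2))
    attempt=$((attempt + 1))
  done
}

# Example (not executed here):
# retry 4 30 node voiceai-vo.cjs build --input my-script.md --voice oliver
```

Because segments are cached, each new attempt should only render the segments that are still missing.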

References

- Agent Skills Specification
- Voice.ai
- references/VOICEAI_API.md: API endpoints, audio formats, models
- references/TROUBLESHOOTING.md: Common issues and fixes
