Getting Started
Welcome! This skill automatically generates accurate, time-synced subtitles for your videos — just upload your file and tell me how you'd like the captions styled or formatted. Ready to caption your first video?
Try saying:
- "Add auto-subtitles to my interview video and burn them into the footage with white text and a dark background"
- "Generate subtitles for my Spanish tutorial video and export them as an SRT file"
- "Create captions for this 10-minute webinar recording with large text positioned at the bottom of the screen"
Setup: This skill connects to the NemoVideo API at mega-api-prod.nemovideo.ai. Set the NEMO_TOKEN environment variable to authenticate. New users can get 100 free credits at nemovideo.ai.
Caption Every Video Without the Manual Grind
Subtitling a video by hand is one of the most time-consuming parts of post-production. You scrub through the timeline, type out every word, nudge timestamps, and repeat — sometimes for hours. The auto-subtitles skill eliminates that entire process by listening to your video's audio and generating accurate, time-coded captions automatically.
Whether you're publishing a tutorial on YouTube, creating training content for your team, or making social media clips accessible to a broader audience, subtitles are no longer optional. Studies consistently show that captions increase viewer retention, improve comprehension for non-native speakers, and make content usable in sound-off environments like office lobbies or public transport.
With this skill, you simply upload your video — in any common format including mp4, mov, avi, webm, or mkv — and the captions come back ready to use. You can choose to burn them directly into the video or receive them as a separate subtitle file. Adjust the font, size, and positioning to match your brand, or let the defaults handle it. It's the fastest path from raw footage to fully captioned, publish-ready video.
Routing Caption Requests Intelligently
Each subtitle request is parsed for language, video source, and caption style, then dispatched to the appropriate NemoVideo transcription pipeline automatically.
| User says... | Action | Skip SSE? |
|---|---|---|
| "export" / "导出" / "download" / "send me the video" | → §3.5 Export | ✅ |
| "credits" / "积分" / "balance" / "余额" | → §3.3 Credits | ✅ |
| "status" / "状态" / "show tracks" | → §3.4 State | ✅ |
| "upload" / "上传" / user sends file | → §3.2 Upload | ✅ |
| Everything else (generate, edit, add BGM…) | → §3.1 SSE | ❌ |
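The table above can be read as an ordered keyword lookup. The following is a minimal sketch of that idea, assuming case-insensitive substring matching where the first matching row wins. The function name `route_request`, the `has_attachment` flag, and the short action labels are illustrative and not part of the NemoVideo API.

```python
from typing import Tuple

# Hypothetical keyword router mirroring the table above.
# Each entry is (keywords, action label); rows are checked in order.
ROUTES = [
    (("export", "导出", "download", "send me the video"), "export"),
    (("credits", "积分", "balance", "余额"), "credits"),
    (("status", "状态", "show tracks"), "state"),
    (("upload", "上传"), "upload"),
]


def route_request(message: str, has_attachment: bool = False) -> Tuple[str, bool]:
    """Return (action, skip_sse) for a user message."""
    # A file sent by the user counts as an upload regardless of the text.
    if has_attachment:
        return "upload", True
    text = message.lower()
    for keywords, action in ROUTES:
        if any(keyword in text for keyword in keywords):
            return action, True
    # Everything else (generate, edit, add BGM, ...) goes through SSE.
    return "sse", False


print(route_request("What's my balance?"))
print(route_request("Add subtitles to my interview"))
```

Substring matching is deliberately naive: a request such as "generate subtitles and export them as SRT" contains "export" and would be routed to the export action here, so a real router would likely need some form of intent detection on top of this.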
NemoVideo Subtitle API Reference
The NemoVideo backend uses frame-accurate speech recognition combined with forced alignment to sync captions to the exact millisecond. Subtitle burns, SRT exports, and multi-language tracks are all handled server-side without manual timecoding.
Include on every request: Authorization: Bearer $NEMO_TOKEN, X-Skill-Source, X-Skill-Version, X-Skill-Platform.
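A minimal sketch of assembling these headers in Python. The header names and the NEMO_TOKEN variable come from the text above; the function name `build_headers` and the argument values shown in the usage line are placeholders, since this document does not say what the three X-Skill-* headers should contain.

```python
import os


def build_headers(source: str, version: str, platform: str) -> dict:
    """Build the headers listed above. Raises if NEMO_TOKEN is unset."""
    token = os.environ.get("NEMO_TOKEN")
    if not token:
        raise RuntimeError("NEMO_TOKEN is not set")
    return {
        "Authorization": f"Bearer {token}",
        # The values for these three headers are supplied by the caller;
        # their expected contents are not specified in this document.
        "X-Skill-Source": source,
        "X-Skill-Version": version,
        "X-Skill-Platform": platform,
    }
```

With the token exported in the environment, a call such as `build_headers("example-source", "0.0.0", "example-platform")` returns a dictionary that can be passed to whichever HTTP client you use.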
Workflow:
- Create a session at /api/tasks/me/with-session/nemo_agent.
- Send user messages via SSE at /run_sse.
- Upload media to /api/upload-video/nemo_agent/me/{sid}.
- Check project state at /api/state/nemo_agent/me/{sid}/latest.
- Export the final video at /api/render/proxy/lambda (export is free).

Supported formats: video (mp4, mov, avi, webm, mkv), image (jpg, png, gif, webp), and audio (mp3, wav, m4a, aac).
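The paths above can be collected into a few small URL builders. This sketch assumes the API is served over HTTPS at the host named in the Setup note and that {sid} is the session ID obtained when a session is created. HTTP methods, request bodies, and response shapes are not described in this document, so the code only builds URLs and checks file extensions; it sends nothing.

```python
import os
from urllib.parse import quote

# Host taken from the Setup note; the https scheme is an assumption.
BASE_URL = "https://mega-api-prod.nemovideo.ai"

# Extensions listed under "Supported formats" above.
SUPPORTED_FORMATS = {
    "mp4", "mov", "avi", "webm", "mkv",
    "jpg", "png", "gif", "webp",
    "mp3", "wav", "m4a", "aac",
}


def session_url() -> str:
    return f"{BASE_URL}/api/tasks/me/with-session/nemo_agent"


def sse_url() -> str:
    return f"{BASE_URL}/run_sse"


def upload_url(sid: str) -> str:
    # Percent-encode the session ID so it is safe as a path segment.
    return f"{BASE_URL}/api/upload-video/nemo_agent/me/{quote(sid, safe='')}"


def state_url(sid: str) -> str:
    return f"{BASE_URL}/api/state/nemo_agent/me/{quote(sid, safe='')}/latest"


def export_url() -> str:
    return f"{BASE_URL}/api/render/proxy/lambda"


def is_supported(filename: str) -> bool:
    """Check a file name against the supported extensions, ignoring case."""
    extension = os.path.splitext(filename)[1].lower().lstrip(".")
    return extension in SUPPORTED_FORMATS
```

Checking the extension locally before uploading avoids spending a round trip on a file the service would not accept.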
Troubleshooting
If your token expires mid-session, simply re-authenticate through the skill prompt to resume without losing your subtitle job queue. A 'session not found' error means your context was cleared — start a fresh session and re-upload your video source. Out of credits? Head to nemovideo.ai to register or top up your account before resubmitting.
Quick Start Guide
Getting subtitles onto your video takes just a few steps. Start by uploading your video file — mp4, mov, avi, webm, and mkv are all supported. Then tell the skill what you need: do you want the subtitles burned directly into the video, or would you prefer a separate SRT or VTT caption file you can upload to YouTube, Vimeo, or your LMS?
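For readers who have not opened an SRT file before, the sketch below shows how the format lays out timed cues: a running index, a start and end timestamp in HH:MM:SS,mmm form, and the caption text, with a blank line between cues. The cue values are made up for illustration, and this is not the service's own exporter.

```python
def srt_timestamp(ms: int) -> str:
    """Format a millisecond offset as an SRT timestamp, HH:MM:SS,mmm."""
    hours, rem = divmod(ms, 3_600_000)
    minutes, rem = divmod(rem, 60_000)
    seconds, millis = divmod(rem, 1000)
    return f"{hours:02d}:{minutes:02d}:{seconds:02d},{millis:03d}"


def to_srt(cues) -> str:
    """Render (start_ms, end_ms, text) tuples as SRT text."""
    blocks = []
    for index, (start, end, text) in enumerate(cues, start=1):
        blocks.append(
            f"{index}\n{srt_timestamp(start)} --> {srt_timestamp(end)}\n{text}\n"
        )
    # Cues are separated by a single blank line.
    return "\n".join(blocks)


# Illustrative cues only.
print(to_srt([(0, 1500, "Hello"), (1500, 3000, "World")]))
```

WebVTT is close to this layout but starts with a `WEBVTT` line and uses a period rather than a comma before the milliseconds.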
If your video is in a language other than English, mention the language upfront so the transcription is tuned correctly from the start. You can also specify styling preferences at this stage — font size, text color, background opacity, and vertical positioning.
Once processing is complete, review the generated captions. If any words were misheard or names were transcribed incorrectly, you can request targeted corrections without regenerating the entire subtitle track. For videos with heavy background noise or strong accents, providing a rough transcript or key terminology in your prompt will noticeably improve accuracy.
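The idea of a targeted correction, fixing the text of affected cues while leaving every timestamp alone, can be illustrated locally. This is only a sketch of the concept under the assumption that cues are (start_ms, end_ms, text) tuples; it does not describe how the service applies corrections, and `apply_corrections` is a hypothetical helper.

```python
import re


def apply_corrections(cues, corrections):
    """Return new cues with whole-word, case-insensitive replacements
    applied to the text only. Timestamps are left untouched."""
    fixed = []
    for start, end, text in cues:
        for wrong, right in corrections.items():
            pattern = rf"\b{re.escape(wrong)}\b"
            # A function replacement avoids backslash handling in `right`.
            text = re.sub(pattern, lambda _m, r=right: r, text, flags=re.IGNORECASE)
        fixed.append((start, end, text))
    return fixed


# Illustrative cues with a misheard product name.
cues = [(0, 1000, "Welcome to Nemo video"), (1000, 2000, "nemo video makes captions")]
print(apply_corrections(cues, {"Nemo video": "NemoVideo"}))
```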
Tips and Tricks
For the cleanest results, use video files with clear, uncompressed audio. If your original recording has significant background noise, consider running it through a noise-reduction tool before uploading — this directly improves transcription accuracy.
When captioning content for social media, request shorter line lengths and larger font sizes. Most viewers on Instagram Reels or TikTok are watching on small screens, and captions that wrap awkwardly or sit too small get ignored entirely.
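To make "shorter line lengths" concrete, the sketch below splits a sentence into caption cues with a cap on characters per line and lines per cue. The defaults of 24 characters and 2 lines are illustrative assumptions, not values used by the skill, and `wrap_caption` is a hypothetical helper.

```python
import textwrap


def wrap_caption(text: str, max_chars: int = 24, max_lines: int = 2) -> list:
    """Split caption text into cues of at most max_lines lines,
    each line at most max_chars characters wide."""
    lines = textwrap.wrap(text, width=max_chars)
    return [
        "\n".join(lines[i:i + max_lines])
        for i in range(0, len(lines), max_lines)
    ]


for cue in wrap_caption(
    "Captions that wrap awkwardly get ignored on small screens",
    max_chars=20,
):
    print(cue)
    print("---")
```

Tightening `max_chars` produces more, shorter cues, which is the trade-off to consider for vertical video on small screens.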
If you're producing multilingual content, you can request subtitle generation in the spoken language and then ask for a translated version in a second language in the same session — great for reaching international audiences without a separate workflow.
Finally, if you need captions for a video series, mention that upfront. Consistent styling across episodes — same font, same positioning, same color scheme — reinforces your brand and saves you from specifying preferences every single time.