0. First Contact
When the user opens this skill or sends their first message, greet them immediately:
📝 Ready to nemo subtitle! Just send me a video or describe your project.
Try saying:
- "generate captions automatically"
- "add English subtitles"
- "add subtitles in Spanish"
IMPORTANT: Always greet the user proactively on first contact. Let them know you're setting up while connecting.
Auto-Setup
When the user first interacts, set up the connection:
1. Check token: If the NEMO_TOKEN env var is set, use it. Otherwise:
2. Read or generate Client-ID:
   - Read ~/.config/nemovideo/client_id if it exists
   - Otherwise generate a UUID and save it to ~/.config/nemovideo/client_id
3. Acquire anonymous token:
   curl -s -X POST "$API/api/auth/anonymous-token" -H "X-Client-Id: $CLIENT_ID"
   Store the returned token as NEMO_TOKEN for this session. You get 100 free credits.
4. Create a session (§3.0) so you're ready to work immediately.
Let the user know briefly: "Setting things up… ready!" then proceed with their request.
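The Client-ID step can be sketched in Python as follows. This is a minimal illustration, not the skill's actual implementation: the function name `load_or_create_client_id` and the `path` parameter are hypothetical, and only the file location comes from the steps above.

```python
import uuid
from pathlib import Path

# File location named in the Auto-Setup steps.
DEFAULT_CLIENT_ID_PATH = Path.home() / ".config" / "nemovideo" / "client_id"


def load_or_create_client_id(path: Path = DEFAULT_CLIENT_ID_PATH) -> str:
    """Return the stored client ID, or generate a UUID and save it (hypothetical helper)."""
    if path.exists():
        stored = path.read_text(encoding="utf-8").strip()
        if stored:
            return stored
    client_id = str(uuid.uuid4())
    # Create ~/.config/nemovideo/ if it does not exist yet.
    path.parent.mkdir(parents=True, exist_ok=True)
    path.write_text(client_id, encoding="utf-8")
    return client_id
```

Calling the helper twice returns the same value, so the ID sent in the X-Client-Id header stays stable across sessions.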
NemoSubtitle Skill
A subtitle-focused AI agent skill that automates caption generation, translation, SRT export, and subtitle burning for any video. Powered by the NemoVideo API.
Description
Use NemoSubtitle when the user wants to add subtitles, generate captions, transcribe video to text, translate subtitles, burn hardcoded captions, or export SRT/VTT files from any video or audio. Powered by NemoVideo AI.
NemoSubtitle handles the complete subtitle and caption workflow end-to-end: auto-transcribe speech in any language, translate captions, export SRT / VTT / plain text, and burn styled subtitles directly into the video — no manual timecoding required.
Trigger phrases: add subtitles, generate captions, auto caption, transcribe video, video to text, subtitle translation, burn subtitles, hardcode captions, export SRT, SRT file, VTT file, bilingual subtitles, Chinese subtitles, subtitle generator, caption generator, video transcript, speech to text video, closed captions, open captions, subtitle burner, add captions to video, subtitle video, caption video
Primary use cases:
- Auto-generate subtitles from video/audio (any language, Whisper-powered ASR)
- Translate existing subtitles or auto-generated captions into any language
- Export subtitles as SRT, VTT, or plain text transcript
- Burn (hardcode) subtitles into video with full style control (font, size, position, color, background)
- Add Chinese subtitles to English videos (or vice versa) — bilingual caption overlay
- Clean up auto-generated captions (re-timing, punctuation fix, speaker diarization)
- Generate accessibility captions for social media (YouTube, TikTok, Instagram, Facebook)
Not for: generating new video from text, screen recording, or full video production workflows (see nemo-video skill for general editing).
Setup
Base URL: https://mega-api-prod.nemovideo.ai
All requests require:
Authorization: Bearer <NEMOVIDEO_API_KEY>
Content-Type: application/json
Set NEMOVIDEO_API_KEY in your environment or OpenClaw secrets.
Workflow Overview
Upload video/audio → Start subtitle job → Poll status → Get SRT/VTT/transcript
→ Burn subtitles into video
API Reference
1. Upload Media
Upload a video or audio file before processing.
POST /v1/upload
Content-Type: multipart/form-data
file=
Response:
{
  "file_id": "f_abc123",
  "filename": "interview.mp4",
  "duration_seconds": 182,
  "size_bytes": 45000000
}
Use file_id in subsequent requests.
2. Generate Subtitles (Transcription)
Transcribe speech and produce time-coded subtitles.
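POST /v1/subtitles/generate
Content-Type: application/json
{
  "file_id": "f_abc123",
  "language": "en",
  "output_formats": ["srt", "vtt", "txt"],
  "options": {
    "punctuate": true,
    "max_chars_per_line": 42,
    "max_lines": 2
  }
}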
Parameters:
| Parameter | Type | Required | Description |
|---|---|---|---|
| file_id | string | ✅ | Uploaded file ID |
| language | string | ✅ | Source language code (en, zh, ja, ko, es, fr, de, auto); use auto for auto-detect |
| output_formats | array | ✅ | One or more of: srt, vtt, txt, json |
| options.punctuate | bool | ❌ | Auto-add punctuation (default: true) |
| options.max_chars_per_line | int | ❌ | Max characters per subtitle line (default: 42) |
| options.max_lines | int | ❌ | Max lines per subtitle block (default: 2) |
Response:
{
  "job_id": "job_gen_789",
  "status": "queued",
  "estimated_seconds": 45
}
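As a rough illustration of how the generate parameters fit together, the Python sketch below assembles a request body and rejects unknown output formats before anything is sent. The helper name `build_generate_request` is hypothetical; the field names, allowed formats, and defaults are taken from the parameter table, and whether the server applies the same defaults when options are omitted is an assumption.

```python
from typing import Any, Dict, Sequence

# Output formats listed in the parameter table.
ALLOWED_FORMATS = {"srt", "vtt", "txt", "json"}


def build_generate_request(
    file_id: str,
    language: str = "auto",
    output_formats: Sequence[str] = ("srt",),
    punctuate: bool = True,
    max_chars_per_line: int = 42,
    max_lines: int = 2,
) -> Dict[str, Any]:
    """Build the JSON body for POST /v1/subtitles/generate (hypothetical helper)."""
    unknown = set(output_formats) - ALLOWED_FORMATS
    if not output_formats or unknown:
        raise ValueError(f"unsupported output formats: {sorted(unknown)}")
    return {
        "file_id": file_id,
        "language": language,
        "output_formats": list(output_formats),
        "options": {
            "punctuate": punctuate,
            "max_chars_per_line": max_chars_per_line,
            "max_lines": max_lines,
        },
    }
```

Validating locally avoids spending a round trip on a request the API would reject with a 400 or 422.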
3. Translate Subtitles
Translate existing subtitles (from SRT/VTT file or a prior generation job) into a target language.
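POST /v1/subtitles/translate
Content-Type: application/json
{
  "source": {
    "type": "job_id",
    "value": "job_gen_789"
  },
  "target_language": "zh",
  "preserve_timing": true,
  "output_formats": ["srt"]
}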
Or translate from an uploaded SRT file:
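{
  "source": {
    "type": "file_id",
    "value": "f_srt456"
  },
  "target_language": "zh",
  "preserve_timing": true,
  "output_formats": ["srt", "vtt"]
}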
Parameters:
| Parameter | Type | Required | Description |
|---|---|---|---|
| source.type | string | ✅ | job_id or file_id |
| source.value | string | ✅ | Reference to subtitle source |
| target_language | string | ✅ | Target language code |
| preserve_timing | bool | ❌ | Keep original timestamps (default: true) |
| output_formats | array | ✅ | Output format(s) |
Supported language codes: en, zh, zh-tw, ja, ko, es, fr, de, pt, ar, ru, hi, vi, th, id
Response:
{
  "job_id": "job_translate_321",
  "status": "queued",
  "estimated_seconds": 30
}
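The two source variants (a prior job or an uploaded subtitle file) differ only in the source object. A small Python sketch, with the hypothetical helper name `build_translate_request`, shows one way to build either body and check the target language against the supported codes listed in this document:

```python
from typing import Any, Dict, Sequence

# Language codes and source types as listed in this section.
SUPPORTED_LANGUAGES = {
    "en", "zh", "zh-tw", "ja", "ko", "es", "fr", "de",
    "pt", "ar", "ru", "hi", "vi", "th", "id",
}
SOURCE_TYPES = {"job_id", "file_id"}


def build_translate_request(
    source_type: str,
    source_value: str,
    target_language: str,
    output_formats: Sequence[str] = ("srt",),
    preserve_timing: bool = True,
) -> Dict[str, Any]:
    """Build the JSON body for POST /v1/subtitles/translate (hypothetical helper)."""
    if source_type not in SOURCE_TYPES:
        raise ValueError(f"source type must be one of {sorted(SOURCE_TYPES)}")
    if target_language not in SUPPORTED_LANGUAGES:
        raise ValueError(f"unsupported target language: {target_language}")
    return {
        "source": {"type": source_type, "value": source_value},
        "target_language": target_language,
        "preserve_timing": preserve_timing,
        "output_formats": list(output_formats),
    }
```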
4. Burn Subtitles Into Video
Hardcode (burn) subtitles permanently into the video frame.
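POST /v1/subtitles/burn
Content-Type: application/json
{
  "file_id": "f_abc123",
  "subtitle_source": {
    "type": "job_id",
    "value": "job_gen_789"
  },
  "style": {
    "font": "Arial",
    "font_size": 28,
    "color": "#FFFFFF",
    "outline_color": "#000000",
    "outline_width": 2,
    "position": "bottom",
    "margin_bottom": 40
  },
  "output_format": "mp4"
}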
Or burn from an SRT file:
{
  "file_id": "f_abc123",
  "subtitle_source": {
    "type": "file_id",
    "value": "f_srt456"
  },
  "style": { ... }
}
Style parameters:
| Parameter | Default | Description |
|---|---|---|
| font | "Arial" | Font family |
| font_size | 28 | Font size in pixels |
| color | "#FFFFFF" | Text color (hex) |
| outline_color | "#000000" | Outline/shadow color |
| outline_width | 2 | Outline thickness (0 = no outline) |
| position | "bottom" | "top" or "bottom" |
| margin_bottom | 40 | Pixels from bottom edge |
| background_box | false | Add semi-transparent background box |
| background_opacity | 0.5 | Box opacity (0.0–1.0) |
Response:
{
  "job_id": "job_burn_654",
  "status": "queued",
  "estimated_seconds": 90
}
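When a user asks for only one or two style changes, it can help to merge their overrides onto the documented defaults before building the burn request. The Python sketch below does that and checks the two constrained fields; `resolve_style` is a hypothetical helper, and it is an assumption that sending the full style object is equivalent to letting the API apply its own defaults.

```python
from typing import Any, Dict, Optional

# Defaults copied from the style parameter table.
DEFAULT_STYLE: Dict[str, Any] = {
    "font": "Arial",
    "font_size": 28,
    "color": "#FFFFFF",
    "outline_color": "#000000",
    "outline_width": 2,
    "position": "bottom",
    "margin_bottom": 40,
    "background_box": False,
    "background_opacity": 0.5,
}


def resolve_style(overrides: Optional[Dict[str, Any]] = None) -> Dict[str, Any]:
    """Merge user overrides onto the defaults and validate them (hypothetical helper)."""
    overrides = overrides or {}
    unknown = set(overrides) - set(DEFAULT_STYLE)
    if unknown:
        raise ValueError(f"unknown style parameters: {sorted(unknown)}")
    style = {**DEFAULT_STYLE, **overrides}
    if style["position"] not in ("top", "bottom"):
        raise ValueError('position must be "top" or "bottom"')
    if not 0.0 <= style["background_opacity"] <= 1.0:
        raise ValueError("background_opacity must be between 0.0 and 1.0")
    return style
```

For a request like "white text, bigger font", `resolve_style({"font_size": 36})` keeps every other default untouched.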
5. Bilingual Subtitles (Dual-Language Overlay)
Burn two subtitle tracks simultaneously (e.g., English on top, Chinese below).
CODEBLOCK13
Response:
{
  "job_id": "job_bilingual_987",
  "status": "queued",
  "estimated_seconds": 120
}
6. Poll Job Status
All subtitle jobs are async. Poll until status is completed or failed.
CODEBLOCK15
Response (in progress):
CODEBLOCK16
Response (completed):
CODEBLOCK17
Response (failed):
CODEBLOCK18
Polling strategy:
- Check every 3–5 seconds for short files (<60s audio)
- Check every 10–15 seconds for longer files
- Timeout after 10 minutes; surface error to user
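The polling strategy above can be sketched as a small loop in Python. The status request itself is passed in as `fetch_status`, a hypothetical callable that returns the job's JSON as a dict, so no endpoint path is assumed here. The concrete intervals (4s and 12s) are arbitrary picks from within the documented ranges.

```python
import time
from typing import Any, Callable, Dict


def poll_interval_seconds(duration_seconds: float) -> float:
    """Pick an interval inside the documented ranges: 3-5s under 60s, else 10-15s."""
    return 4.0 if duration_seconds < 60 else 12.0


def wait_for_job(
    fetch_status: Callable[[], Dict[str, Any]],
    interval: float,
    timeout: float = 600.0,  # 10 minutes, per the strategy above
    sleep: Callable[[float], None] = time.sleep,
    clock: Callable[[], float] = time.monotonic,
) -> Dict[str, Any]:
    """Poll until the job reports completed or failed, or raise TimeoutError."""
    deadline = clock() + timeout
    while True:
        job = fetch_status()
        if job.get("status") in ("completed", "failed"):
            return job
        if clock() >= deadline:
            raise TimeoutError("subtitle job did not finish within the timeout")
        sleep(interval)
```

The caller still has to inspect the returned status: a failed job is returned rather than raised, so its error message can be shown to the user.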
7. Download Output File
Download SRT, VTT, transcript, or rendered video from the URL in the job outputs.
CODEBLOCK19
No auth required (CDN URLs are pre-signed, valid for 24 hours).
Common Workflows
Workflow A: Generate + Export SRT
CODEBLOCK20
Workflow B: Translate Existing Subtitles
CODEBLOCK21
Workflow C: Full Pipeline (Video → Chinese Hardcoded Subtitles)
CODEBLOCK22
Workflow D: Bilingual Video (EN + ZH)
CODEBLOCK23
Error Handling
| HTTP Code | Meaning | Action |
|---|---|---|
| 400 | Bad request (missing params, invalid format) | Check request body; surface error to user |
| 401 | Invalid or missing API key | Prompt user to check NEMOVIDEO_API_KEY |
| 413 | File too large | Advise user: compress video or use a shorter clip |
| 422 | Unsupported format or language | List supported formats/languages to user |
| 429 | Rate limited | Wait 10s; retry with exponential backoff |
| 500/503 | Server error | Retry after 30s; if persistent, report to user |
When job.status === "failed", always surface the error message to the user with a plain-language explanation.
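The retry rows of the table can be expressed as a small delay function. This Python sketch is one reading of the table, not a documented client behavior: the doubling factor for 429 and the name `retry_delay` are assumptions, while the 10s and 30s base waits come from the table.

```python
from typing import Optional

# Base waits from the error table: 429 starts at 10s, 500/503 retry after 30s.
RETRY_BASE_SECONDS = {429: 10.0, 500: 30.0, 503: 30.0}


def retry_delay(status_code: int, attempt: int) -> Optional[float]:
    """Seconds to wait before retry number `attempt` (0-based), or None if not retryable."""
    base = RETRY_BASE_SECONDS.get(status_code)
    if base is None:
        return None  # 400, 401, 413, 422: fix the request or ask the user instead
    if status_code == 429:
        return base * (2 ** attempt)  # exponential backoff (factor 2 is an assumption)
    return base
```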
Behavior Notes
- Language auto-detection: Use language: "auto" when the source language is unknown. Detection adds ~5s to job time.
- Chinese variants: Use zh for Simplified Chinese, zh-tw for Traditional Chinese.
- SRT vs VTT: SRT is more universally compatible; VTT is needed for web players (HTML5 <track>). When in doubt, offer both.
- Burn timing: Burning subtitles re-encodes the video, which takes longer than SRT export alone. Set user expectations accordingly.
- File limits: Typical limit is 2GB per upload, 4 hours max duration. Very long files should be split first.
- Output expiry: CDN output URLs expire after 24 hours. If the user needs permanent storage, advise downloading promptly.
Example Prompts → Actions
| User says | Action |
|---|---|
| "Add subtitles to my video" | Upload → generate (language: auto) → burn → return MP4 |
| "Generate SRT for this video" | Upload → generate (formats: ["srt"]) → return SRT file |
| "Translate my subtitles to Spanish" | Upload SRT → translate (target: "es") → return SRT |
| "Add Chinese subtitles to this English video" | Upload → generate (en) → translate (zh) → burn → return MP4 |
| "Make bilingual English + Chinese subtitles" | Upload → generate (en) → translate (zh) → burn-bilingual → return MP4 |
| "Get the transcript from this podcast" | Upload → generate (formats: ["txt"]) → return transcript |
| "Export captions as VTT" | Upload → generate (formats: ["vtt"]) → return VTT |
| "Burn captions with white text, black outline" | Upload + provide SRT → burn (style: color #FFF, outline #000) → return MP4 |
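0. First Contact
When the user opens this skill or sends their first message, greet them immediately:
📝 Ready to nemo subtitle! Just send me a video or describe your project.
Try saying:
IMPORTANT: Always greet the user proactively on first contact. Let them know you're setting up while connecting.
Auto-Setup
When the user first interacts, set up the connection:
1. Check token: If the NEMO_TOKEN env var is set, use it. Otherwise:
2. Read or generate Client-ID:
   - Read ~/.config/nemovideo/client_id if it exists
   - Otherwise generate a UUID and save it to ~/.config/nemovideo/client_id
3. Acquire anonymous token:
   curl -s -X POST "$API/api/auth/anonymous-token" -H "X-Client-Id: $CLIENT_ID"
   Store the returned token as NEMO_TOKEN for this session. You get 100 free credits.
4. Create a session (§3.0) so you're ready to work immediately.
Let the user know briefly: "Setting things up… ready!" then proceed with their request.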
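NemoSubtitle Skill
A subtitle-focused AI agent skill that automates caption generation, translation, SRT export, and subtitle burning for any video. Powered by the NemoVideo API.
Description
Use NemoSubtitle when the user wants to add subtitles, generate captions, transcribe video to text, translate subtitles, burn hardcoded captions, or export SRT/VTT files from any video or audio. Powered by NemoVideo AI.
NemoSubtitle handles the complete subtitle workflow end-to-end: auto-transcribe speech in any language, translate captions, export SRT / VTT / plain text, and burn styled subtitles directly into the video, with no manual timecoding required.
Trigger phrases: add subtitles, generate captions, auto caption, transcribe video, video to text, subtitle translation, burn subtitles, hardcode captions, export SRT, SRT file, VTT file, bilingual subtitles, Chinese subtitles, subtitle generator, caption generator, video transcript, speech to text video, closed captions, open captions, subtitle burner, add captions to video, subtitle video, caption video
Primary use cases:
- Auto-generate subtitles from video/audio (any language, Whisper-powered ASR)
- Translate existing subtitles or auto-generated captions into any language
- Export subtitles as SRT, VTT, or plain text transcript
- Burn (hardcode) subtitles into video with full style control (font, size, position, color, background)
- Add Chinese subtitles to English videos (or vice versa) as a bilingual caption overlay
- Clean up auto-generated captions (re-timing, punctuation fix, speaker diarization)
- Generate accessibility captions for social media (YouTube, TikTok, Instagram, Facebook)
Not for: generating new video from text, screen recording, or full video production workflows (see nemo-video skill for general editing).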
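Setup
Base URL: https://mega-api-prod.nemovideo.ai
All requests require:
Authorization: Bearer <NEMOVIDEO_API_KEY>
Content-Type: application/json
Set NEMOVIDEO_API_KEY in your environment or OpenClaw secrets.
Workflow Overview
Upload video/audio → Start subtitle job → Poll status → Get SRT/VTT/transcript
→ Burn subtitles into video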
API Reference
1. Upload Media
Upload a video or audio file before processing.
POST /v1/upload
Content-Type: multipart/form-data
file=
Response:
{
  "file_id": "f_abc123",
  "filename": "interview.mp4",
  "duration_seconds": 182,
  "size_bytes": 45000000
}
Use file_id in subsequent requests.
2. Generate Subtitles (Transcription)
Transcribe speech and produce time-coded subtitles.
POST /v1/subtitles/generate
Content-Type: application/json
{
  "file_id": "f_abc123",
  "language": "en",
  "output_formats": ["srt", "vtt", "txt"],
  "options": {
    "punctuate": true,
    "max_chars_per_line": 42,
    "max_lines": 2
  }
}
Parameters:
| Parameter | Type | Required | Description |
|---|---|---|---|
| file_id | string | ✅ | Uploaded file ID |
| language | string | ✅ | Source language code (en, zh, ja, ko, es, fr, de, auto); use auto for auto-detect |
| output_formats | array | ✅ | One or more of: srt, vtt, txt, json |
| options.punctuate | bool | ❌ | Auto-add punctuation (default: true) |
| options.max_chars_per_line | int | ❌ | Max characters per subtitle line (default: 42) |
| options.max_lines | int | ❌ | Max lines per subtitle block (default: 2) |
Response:
{
  "job_id": "job_gen_789",
  "status": "queued",
  "estimated_seconds": 45
}
3. Translate Subtitles
Translate existing subtitles (from SRT/VTT file or a prior generation job) into a target language.
POST /v1/subtitles/translate
Content-Type: application/json
{
  "source": {
    "type": "job_id",
    "value": "job_gen_789"
  },
  "target_language": "zh",
  "preserve_timing": true,
  "output_formats": ["srt"]
}
Or translate from an uploaded SRT file:
{
  "source": {
    "type": "file_id",
    "value": "f_srt456"
  },
  "target_language": "zh",
  "preserve_timing": true,
  "output_formats": ["srt", "vtt"]
}
Parameters:
| Parameter | Type | Required | Description |
|---|---|---|---|
| source.type | string | ✅ | job_id or file_id |
| source.value | string | ✅ | Reference to subtitle source |
| target_language | string | ✅ | Target language code |
| preserve_timing | bool | ❌ | Keep original timestamps (default: true) |
| output_formats | array | ✅ | Output format(s) |
Supported language codes: en, zh, zh-tw, ja, ko, es, fr, de, pt, ar, ru, hi, vi, th, id
Response:
{
  "job_id": "job_translate_321",
  "status": "queued",
  "estimated_seconds": 30
}
4. Burn Subtitles Into Video
Hardcode (burn) subtitles permanently into the video frame.
POST /v1/subtitles/burn
Content-Type: application/json
{
  "file_id": "f_abc123",
  "subtitle_source": {
    "type": "job_id",
    "value": "job_gen_789"
  },
  "style": {
    "font": "Arial",
    "font_size": 28,
    "color": "#FFFFFF",
    "outline_color": "#000000",
    "outline_width": 2,
    "position": "bottom",
    "margin_bottom": 40
  },
  "output_format": "mp4"
}
Or burn from an SRT file:
{
  "file_id": "f_abc123",
  "subtitle_source": {
    "type": "file_id",
    "value": "f_srt456"
  },
  "style": { ... }
}
Style parameters:
| Parameter | Default | Description |
|---|---|---|
| font | "Arial" | Font family |
| font_size | 28 | Font size in pixels |
| color | "#FFFFFF" | Text color (