Getting Started
Welcome! I'm here to help you generate accurate, well-timed subtitles for your movie or video — just share your file and tell me what you need, and let's get your subtitles created right away.
Try saying:
- "Generate subtitles for my short film uploaded in mp4 format and sync them to the dialogue timing"
- "Create English subtitles for this interview video and format them for YouTube upload"
- "Transcribe the spoken audio from my documentary and export the subtitles as an SRT file"
First-Time Connection
When a user first opens this skill, connect to the NemoVideo backend automatically. Briefly let them know (e.g. "Connecting to NemoVideo...").
Authentication: Check if NEMO_TOKEN is set in the environment. If it is, skip to step 3.
1. Client identifier: Look for ~/.config/nemovideo/client_id. If missing, generate a UUID and save it there. This file stores only a UUID for rate-limit tracking.
2. Obtain a free token: POST to https://mega-api-prod.nemovideo.ai/api/auth/anonymous-token with header X-Client-Id set to your identifier. The response data.token is your NEMO_TOKEN (100 free credits, valid 7 days).
3. Create a session: POST to https://mega-api-prod.nemovideo.ai/api/tasks/me/with-session/nemo_agent with Authorization: Bearer <token>, Content-Type: application/json, and body {"task_name":"project","language":"<detected>"}. Store the returned session_id for all subsequent requests.
Keep setup communication brief. Don't display raw API responses or token values to the user.
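The three setup steps above can be sketched in Python roughly as follows. This is a minimal illustration, not the skill's actual implementation: the function names are hypothetical, the requests are only built (not sent), and the anonymous-token POST is assumed to carry an empty body.

```python
import json
import uuid
import urllib.request
from pathlib import Path

API_BASE = "https://mega-api-prod.nemovideo.ai"
DEFAULT_ID_PATH = Path.home() / ".config" / "nemovideo" / "client_id"


def load_or_create_client_id(path: Path = DEFAULT_ID_PATH) -> str:
    """Step 1: return the stored client UUID, creating and saving one if missing."""
    if path.exists():
        stored = path.read_text(encoding="utf-8").strip()
        if stored:
            return stored
    client_id = str(uuid.uuid4())
    path.parent.mkdir(parents=True, exist_ok=True)
    path.write_text(client_id, encoding="utf-8")
    return client_id


def build_token_request(client_id: str) -> urllib.request.Request:
    """Step 2: anonymous-token request carrying the X-Client-Id header.

    The empty body is an assumption; the document only names the header.
    """
    return urllib.request.Request(
        f"{API_BASE}/api/auth/anonymous-token",
        data=b"",
        headers={"X-Client-Id": client_id},
        method="POST",
    )


def build_session_request(token: str, language: str) -> urllib.request.Request:
    """Step 3: create-session request with the documented JSON body."""
    body = json.dumps({"task_name": "project", "language": language}).encode("utf-8")
    return urllib.request.Request(
        f"{API_BASE}/api/tasks/me/with-session/nemo_agent",
        data=body,
        headers={
            "Authorization": f"Bearer {token}",
            "Content-Type": "application/json",
        },
        method="POST",
    )
```

Each request object could then be passed to `urllib.request.urlopen`, reading `data.token` from the first response and `session_id` from the second; keep the token out of anything shown to the user.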
Turn Any Movie Into a Fully Subtitled Experience
Getting subtitles right is one of the most time-consuming parts of post-production — and one of the most important. Whether you're a filmmaker preparing a festival submission, a YouTube creator trying to reach a global audience, or an educator making video content more accessible, subtitles make the difference between content that connects and content that gets skipped.
This skill listens to your video's audio track and generates accurately timed subtitles that follow the natural rhythm of speech. It doesn't just dump text on screen — it breaks dialogue into readable chunks, respects pauses, and keeps lines short enough to read comfortably without missing the action.
You can use it for short films, full-length movies, documentary footage, interviews, online courses, or any video where spoken words need to appear on screen. The result is a clean subtitle track you can embed directly or export as a standalone file, ready to drop into your editing timeline or video platform.
Subtitle Request Routing Logic
Every subtitle request — whether you're generating SRT files, syncing dialogue timecodes, or translating captions into a target language — is parsed and routed to the appropriate NemoVideo processing pipeline based on media type, language pair, and output format.
| User says... | Action | Skip SSE? |
|---|---|---|
| "export" / "导出" / "download" / "send me the video" | → §3.5 Export | ✅ |
| "credits" / "积分" / "balance" / "余额" | → §3.3 Credits | ✅ |
| "status" / "状态" / "show tracks" | → §3.4 State | ✅ |
| "upload" / "上传" / user sends file | → §3.2 Upload | ✅ |
| Everything else (generate, edit, add BGM…) | → §3.1 SSE | ❌ |
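As a rough illustration of the routing table, the sketch below matches keywords by plain substring search. The function name, the action labels, and the matching strategy are all assumptions; substring matching is deliberately naive (a phrase such as "uploaded in mp4 format" would match "upload"), so real intent detection would need to be more careful.

```python
from typing import List, Tuple

# (keywords, action, skip_sse), in the same order as the routing table.
ROUTES: List[Tuple[Tuple[str, ...], str, bool]] = [
    (("export", "导出", "download", "send me the video"), "export", True),
    (("credits", "积分", "balance", "余额"), "credits", True),
    (("status", "状态", "show tracks"), "state", True),
    (("upload", "上传"), "upload", True),
]


def route_request(message: str, has_file: bool = False) -> Tuple[str, bool]:
    """Return (action, skip_sse) for a user message.

    A message that arrives with a file attachment is treated as an upload.
    Anything that matches no keyword falls through to the SSE pipeline.
    """
    if has_file:
        return ("upload", True)
    text = message.lower()
    for keywords, action, skip_sse in ROUTES:
        if any(keyword in text for keyword in keywords):
            return (action, skip_sse)
    return ("sse", False)
```

For example, `route_request("show tracks")` selects the state lookup, while `route_request("add BGM to the intro")` falls through to the SSE route.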
NemoVideo API Reference
The NemoVideo backend handles frame-accurate speech detection, forced alignment, and multi-language subtitle rendering, returning subtitle tracks in SRT, VTT, or ASS formats with precise in/out timecodes. All transcription and translation calls are authenticated via bearer token and processed asynchronously for longer video files.
Skill attribution — read from this file's YAML frontmatter at runtime:
- X-Skill-Source: movie-subtitle-generator
- X-Skill-Version: from frontmatter version
- X-Skill-Platform: detect from install path (~/.clawhub/ → clawhub, ~/.cursor/skills/ → cursor, else unknown)
All requests must include: Authorization: Bearer <NEMO_TOKEN>, X-Skill-Source, X-Skill-Version, X-Skill-Platform. Missing attribution headers will cause export to fail with 402.
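One way to assemble the required headers is sketched below. The helper names are hypothetical, and the platform check assumes the install path contains the `.clawhub/` or `.cursor/skills/` directory somewhere along it; the skill source and version are passed in by the caller rather than parsed from the frontmatter here.

```python
from typing import Dict


def detect_platform(install_path: str) -> str:
    """Map an install path to the X-Skill-Platform value."""
    # Normalize separators and add a trailing slash so a bare directory matches.
    normalized = install_path.replace("\\", "/") + "/"
    if "/.clawhub/" in normalized:
        return "clawhub"
    if "/.cursor/skills/" in normalized:
        return "cursor"
    return "unknown"


def build_headers(
    token: str, skill_source: str, skill_version: str, install_path: str
) -> Dict[str, str]:
    """Return the four headers every request must carry."""
    return {
        "Authorization": f"Bearer {token}",
        "X-Skill-Source": skill_source,
        "X-Skill-Version": skill_version,
        "X-Skill-Platform": detect_platform(install_path),
    }
```

Building the headers in one place makes it harder to forget an attribution header, which the document says leads to a 402 on export.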
API base: https://mega-api-prod.nemovideo.ai
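Create session: POST /api/tasks/me/with-session/nemo_agent with body {"task_name":"project","language":"<lang>"}; returns task_id, session_id. After creating a session, give the user a link: https://nemovideo.com/workspace/claim?token=<token>&task=<task_id>&session=<session_id>&skill_name=movie-subtitle-generator&skill_version=1.0.0&skill_source=<platform>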
Send message (SSE): POST /run_sse — body {"app_name":"nemo_agent","user_id":"me","session_id":"<sid>","new_message":{"parts":[{"text":"<msg>"}]}} with Accept: text/event-stream. Max timeout: 15 minutes.
Upload: POST /api/upload-video/nemo_agent/me/<sid> with either a multipart file (-F "files=@/path") or a URL body: {"urls":["<url>"],"source_type":"url"}
Credits: GET /api/credits/balance/simple; returns available, frozen, total
Session state: GET /api/state/nemo_agent/me/<sid>/latest; key fields: data.state.draft, data.state.video_infos, data.state.generated_media
Export (free, no credits): POST /api/render/proxy/lambda — body {"id":"render_<ts>","sessionId":"<sid>","draft":<json>,"output":{"format":"mp4","quality":"high"}}. Poll GET /api/render/proxy/lambda/<id> every 30s until status = completed. Download URL at output.url.
Supported formats: mp4, mov, avi, webm, mkv, jpg, png, gif, webp, mp3, wav, m4a, aac.
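The export flow (submit a render, then poll every 30 seconds until the status is completed) might look like the sketch below. Several details are assumptions: `<ts>` is taken to be a millisecond timestamp, `status` and `output.url` are read from the top level of the polled JSON, and `max_polls` is a hypothetical safety cap that the document does not specify. The HTTP call is injected as `fetch_status` so the loop stays independent of any particular client.

```python
import time
from typing import Callable, Optional


def build_export_body(session_id: str, draft: dict, now_ms: Optional[int] = None) -> dict:
    """Build the POST /api/render/proxy/lambda body described in the reference."""
    ts = int(time.time() * 1000) if now_ms is None else now_ms
    return {
        "id": f"render_{ts}",  # assumption: <ts> is a millisecond timestamp
        "sessionId": session_id,
        "draft": draft,
        "output": {"format": "mp4", "quality": "high"},
    }


def poll_export(
    fetch_status: Callable[[], dict],
    interval_s: float = 30.0,
    max_polls: int = 60,
    sleep: Callable[[float], None] = time.sleep,
) -> Optional[str]:
    """Poll until status is 'completed'; return the download URL or None.

    fetch_status should perform GET /api/render/proxy/lambda/<id> and
    return the decoded JSON.
    """
    for attempt in range(max_polls):
        result = fetch_status()
        if result.get("status") == "completed":
            return result.get("output", {}).get("url")
        if attempt < max_polls - 1:
            sleep(interval_s)
    return None
```

Injecting `sleep` also makes the loop easy to exercise in tests without waiting for real time to pass.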
SSE Event Handling
| Event | Action |
|---|---|
| Text response | Apply GUI translation (§4), present to user |
| Tool call/result | Process internally, don't forward |
| heartbeat / empty data: | Keep waiting. Every 2 min: "⏳ Still working..." |
| Stream closes | Process final response |
~30% of editing operations return no text in the SSE stream. When this happens: poll session state to verify the edit was applied, then summarize changes to the user.
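A minimal sketch of reading the stream is shown below. It follows the generic Server-Sent Events framing (`data:` lines grouped into events by blank lines, lines starting with `:` treated as comments) and skips heartbeats and empty `data:` payloads. The structure of NemoVideo's payloads is not described here, so the function simply yields the raw data strings; its name is hypothetical.

```python
from typing import Iterable, Iterator, List


def iter_sse_data(lines: Iterable[str]) -> Iterator[str]:
    """Yield the data payload of each non-empty SSE event.

    Comment lines and events whose data is empty are skipped, which is
    how heartbeats are ignored. An event that is not terminated by a
    blank line before the stream ends is dropped.
    """
    buffer: List[str] = []
    for raw in lines:
        line = raw.rstrip("\r\n")
        if line == "":
            # A blank line ends the current event.
            if buffer:
                payload = "\n".join(buffer)
                buffer = []
                if payload.strip():
                    yield payload
            continue
        if line.startswith(":"):
            continue  # comment line, often used as a heartbeat
        if line.startswith("data:"):
            value = line[len("data:"):]
            if value.startswith(" "):
                value = value[1:]
            buffer.append(value)
```

If the generator finishes without yielding any text, that corresponds to the silent-edit case described above, and the caller would poll the session state instead.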
Backend Response Translation
The backend assumes a GUI exists. Translate these into API actions:
| Backend says | You do |
|---|---|
| "click [button]" / "点击" | Execute via API |
| "open [panel]" / "打开" | Query session state |
| "drag/drop" / "拖拽" | Send edit via SSE |
| "preview in timeline" | Show track summary |
| "Export button" / "导出" | Execute export workflow |
Draft field mapping: t=tracks, tt=track type (0=video, 1=audio, 7=text), sg=segments, d=duration(ms), m=metadata.
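Example track summary:
Timeline (3 tracks):
1. Video: city timelapse (0-10s)
2. BGM: Lo-fi (0-10s, 35%)
3. Title: "Urban Dreams" (0-3s)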
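To turn a draft into a user-facing track summary, something like the following could work. The nesting is an assumption: the field mapping names the keys but not their layout, so this sketch supposes `t` is a list of tracks, each with a `tt` type code and an `sg` list of segments, and that each segment carries its own `d` duration in milliseconds.

```python
from typing import List

# Track type codes from the draft field mapping.
TRACK_TYPES = {0: "video", 1: "audio", 7: "text"}


def summarize_draft(draft: dict) -> List[str]:
    """Return one summary line per track in the draft.

    Assumes draft["t"] -> tracks, track["sg"] -> segments, segment["d"] -> ms.
    """
    lines = []
    for index, track in enumerate(draft.get("t", []), start=1):
        kind = TRACK_TYPES.get(track.get("tt"), "unknown")
        segments = track.get("sg", [])
        total_ms = sum(segment.get("d", 0) for segment in segments)
        lines.append(
            f"{index}. {kind}: {len(segments)} segment(s), {total_ms / 1000:.1f}s"
        )
    return lines
```

If the real draft layout differs, only the three lookups inside the loop would need to change.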
Error Handling
| Code | Meaning | Action |
|---|---|---|
| 0 | Success | Continue |
| 1001 | Bad/expired token | Re-auth via anonymous-token (tokens expire after 7 days) |
| 1002 | Session not found | New session §3.0 |
| 2001 | No credits | Anonymous: show registration URL with ?bind=<id> (get <id> from create-session or state response when needed). Registered: "Top up at nemovideo.ai" |
| 4001 | Unsupported file | Show supported formats |
| 4002 | File too large | Suggest compress/trim |
| 400 | Missing X-Client-Id | Generate Client-Id and retry (see §1) |
| 402 | Free plan export blocked | Subscription tier issue, NOT credits. "Register at nemovideo.ai to unlock export." |
| 429 | Rate limit (1 token/client/7 days) | Retry in 30s once |
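The error table can be expressed as a small lookup, sketched below. The action labels and the function name are hypothetical placeholders for whatever the calling code actually does; the only logic beyond the table is that a 429 is retried once and then abandoned.

```python
# Error code -> next step, following the table above (labels are illustrative).
ERROR_ACTIONS = {
    0: "continue",
    1001: "reauthenticate",
    1002: "create_session",
    2001: "prompt_top_up_or_register",
    4001: "show_supported_formats",
    4002: "suggest_compress_or_trim",
    400: "generate_client_id_and_retry",
    402: "prompt_registration",
    429: "retry_after_30s",
}


def next_action(code: int, already_retried: bool = False) -> str:
    """Return the action label for an error code.

    A 429 is retried only once; a second 429 gives up. Codes not in
    the table are reported rather than guessed at.
    """
    if code == 429 and already_retried:
        return "give_up"
    return ERROR_ACTIONS.get(code, "report_unknown_error")
```

Keeping 402 and 2001 as separate entries reflects the table's point that a blocked export is a subscription issue, not a credits issue.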
Quick Start Guide
Getting started with the movie-subtitle-generator is straightforward. First, upload your video file — supported formats include mp4, mov, avi, webm, and mkv. Then describe what you need: the language of the dialogue, your preferred subtitle style (standard, SDH for hearing impaired, or clean captions), and your output format preference such as SRT, VTT, or embedded text.
Once processing is complete, review the generated subtitles for any proper nouns, technical terms, or stylized dialogue that may need a quick manual tweak. Most transcriptions are highly accurate, but names, slang, and accented speech occasionally benefit from a light review pass.
Finally, download your subtitle file and import it into your video editor, streaming platform, or media player. If you're uploading to YouTube or Vimeo, the SRT format works universally. For DCP or broadcast delivery, request the appropriate format in your prompt and the skill will tailor the output accordingly.
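For readers who want to check or hand-edit an exported file, the sketch below formats a single cue in the common SRT layout: a sequence number, a `HH:MM:SS,mmm --> HH:MM:SS,mmm` timing line, the text, and a trailing blank line. The helper names are illustrative, and this is not a description of how the skill produces its output.

```python
def format_timestamp(ms: int) -> str:
    """Convert milliseconds to the SRT timestamp form HH:MM:SS,mmm."""
    hours, rem = divmod(ms, 3_600_000)
    minutes, rem = divmod(rem, 60_000)
    seconds, millis = divmod(rem, 1000)
    return f"{hours:02d}:{minutes:02d}:{seconds:02d},{millis:03d}"


def format_cue(index: int, start_ms: int, end_ms: int, text: str) -> str:
    """Return one SRT cue block, ending with the blank separator line."""
    timing = f"{format_timestamp(start_ms)} --> {format_timestamp(end_ms)}"
    return f"{index}\n{timing}\n{text}\n\n"


print(format_cue(1, 0, 1500, "Hello there."), end="")
```

Concatenating the output of `format_cue` for consecutive cues yields a complete SRT file body.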
Tips and Tricks
For the most accurate subtitle output, make sure your video has clear audio with minimal background noise. If your film has overlapping dialogue or heavy ambient sound, consider providing a clean audio mix alongside the video file — this significantly improves transcription accuracy.
If your movie includes multiple speakers, you can request speaker-labeled subtitles so viewers can tell who is talking, which is especially useful for interview-style documentaries or multi-character scenes.
When working with foreign-language films, specify the source language upfront so the skill can transcribe correctly rather than defaulting to English detection. You can also request translated subtitles in a second language in the same session — just ask for both in one prompt.
For long-form content like feature films, breaking the video into scenes or chapters before uploading can give you more granular control over subtitle timing and formatting per section.
Best Practices
Aim for subtitle lines of no more than 42 characters each, with at most two lines on screen at once. When prompting the skill, you can specify these formatting preferences explicitly to get broadcast-ready results from the start.
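To check text against these limits yourself, a simple approach is to wrap it at 42 characters and group the wrapped lines in pairs. The sketch below uses Python's standard `textwrap` module; the function name is hypothetical, and the splitting is purely by length, so it does not account for speech pauses or punctuation.

```python
import textwrap
from typing import List


def split_into_cues(text: str, max_chars: int = 42, max_lines: int = 2) -> List[List[str]]:
    """Wrap text to max_chars per line and group lines into cues.

    Each cue holds at most max_lines lines. Breaks fall wherever the
    length limit requires, not at natural pauses.
    """
    lines = textwrap.wrap(text, width=max_chars)
    return [lines[i:i + max_lines] for i in range(0, len(lines), max_lines)]


for cue in split_into_cues(
    "The quick brown fox jumps over the lazy dog while the camera "
    "pans slowly across the empty field at dawn"
):
    print(" / ".join(cue))
```

A length-only split like this is a useful sanity check, but cue boundaries chosen at pauses or punctuation will usually read better.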
For narrative films, avoid placing subtitle cuts in the middle of a sentence whenever possible. Ask the skill to align subtitle breaks with natural speech pauses or punctuation — this keeps the reading experience smooth and doesn't pull viewers out of the story.
If your movie contains song lyrics, sound effects, or off-screen narration that need to be captioned differently from standard dialogue, mention this in your prompt. The skill can handle SDH-style formatting that distinguishes spoken words from audio descriptions, which is essential for accessibility compliance.
Finally, run a sync check by playing back your video with the generated subtitles before publishing. Even highly accurate auto-generated subtitles benefit from a human review pass, especially around scene transitions where timing can occasionally drift by a fraction of a second.