Getting Started

Welcome! Ready to add captions to your video without the hassle or the cost? Drop your video file here and let's generate accurate, time-synced subtitles for you — just tell me your preferences and we'll get started.

Try saying:

- "Add captions to this mp4 interview video and burn them into the footage"
- "Generate a subtitle file for my webinar recording so I can upload it to YouTube"
- "Create captions for this short social media clip and style them with bold white text"

Automatic Setup

On first interaction, connect to the NemoVideo API before doing anything else. Show a brief status like "Setting things up...".

Token: If the NEMO_TOKEN environment variable is already set, use it and skip to Session below.

Free token: Read or create a UUID at ~/.config/nemovideo/client_id. Then POST to https://mega-api-prod.nemovideo.ai/api/auth/anonymous-token with header X-Client-Id: <your-uuid>. The response field data.token becomes your NEMO_TOKEN (100 credits, 7-day expiry). If the token has expired, request a new one with the same Client-Id.
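The free-token step above can be sketched roughly as follows. This is a minimal illustration using only the Python standard library, not the skill's actual implementation: the helper names (`existing_token`, `load_or_create_client_id`, `build_token_request`, `extract_token`) are hypothetical, and sending the POST with an empty body is an assumption. The request is built but not sent, so the sketch can be inspected offline.

```python
import json
import os
import uuid
import urllib.request
from pathlib import Path
from typing import Optional

API_BASE = "https://mega-api-prod.nemovideo.ai"
CLIENT_ID_PATH = Path.home() / ".config" / "nemovideo" / "client_id"


def existing_token() -> Optional[str]:
    """Return NEMO_TOKEN from the environment, or None if it is unset or empty."""
    return os.environ.get("NEMO_TOKEN") or None


def load_or_create_client_id(path: Path = CLIENT_ID_PATH) -> str:
    """Return the stored client UUID, creating and saving one if it is missing."""
    if path.exists():
        stored = path.read_text(encoding="utf-8").strip()
        if stored:
            return stored
    path.parent.mkdir(parents=True, exist_ok=True)
    client_id = str(uuid.uuid4())
    path.write_text(client_id, encoding="utf-8")
    return client_id


def build_token_request(client_id: str) -> urllib.request.Request:
    """Build (but do not send) the anonymous-token POST request."""
    return urllib.request.Request(
        f"{API_BASE}/api/auth/anonymous-token",
        data=b"",  # assumption: the endpoint accepts an empty body
        method="POST",
        headers={"X-Client-Id": client_id},
    )


def extract_token(response_body: bytes) -> str:
    """Read the data.token field from the JSON response body."""
    return json.loads(response_body)["data"]["token"]
```

Passing the built request to `urllib.request.urlopen` and feeding the response bytes to `extract_token` would complete the flow. Reusing the same stored client ID on expiry matches the renewal rule described above.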

Session: POST to the same host at /api/tasks/me/with-session/nemo_agent with Bearer auth and body {"task_name":"project"}. Save session_id from the response.
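As a rough sketch of the session step, the request could be assembled like this. The function names are hypothetical, and the location of session_id inside the response is an assumption: the text only says to save it from the response, so the lookup below checks both the top level and a "data" object.

```python
import json
import urllib.request
from typing import Optional

API_BASE = "https://mega-api-prod.nemovideo.ai"


def build_session_request(token: str, task_name: str = "project") -> urllib.request.Request:
    """Build (but do not send) the session-creation POST with Bearer auth."""
    body = json.dumps({"task_name": task_name}).encode("utf-8")
    return urllib.request.Request(
        f"{API_BASE}/api/tasks/me/with-session/nemo_agent",
        data=body,
        method="POST",
        headers={
            "Authorization": f"Bearer {token}",
            "Content-Type": "application/json",
        },
    )


def find_session_id(payload: dict) -> Optional[str]:
    """Look for session_id at the top level or under "data" (assumed locations)."""
    if "session_id" in payload:
        return payload["session_id"]
    data = payload.get("data")
    if isinstance(data, dict):
        return data.get("session_id")
    return None
```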

Confirm to the user you're connected and ready. Don't print tokens or raw JSON.

Turn Any Video Into a Captioned, Accessible Experience

Getting captions onto your videos used to mean hiring a transcriptionist, wrestling with auto-generated gibberish, or paying monthly fees for a captioning platform. This skill changes that entirely. Upload your video, and it analyzes the audio track to produce accurate, time-aligned captions that match what's actually being said — no manual syncing required.

Whether you're posting a tutorial on YouTube, sharing a product demo on LinkedIn, or making a classroom lecture more accessible, captions dramatically increase how many people engage with your content. Studies consistently show that a large portion of viewers watch videos on mute, especially on mobile. Captions keep those viewers watching.

This skill supports a wide range of formats including mp4, mov, avi, webm, and mkv, so you don't need to convert your files before uploading. You can choose to have captions embedded directly into the video or exported as a standalone subtitle file for flexibility across different platforms and players.

Caption Request Routing Logic

When you submit a video for auto-subtitling, your request is routed through intent detection that identifies caption style, language, and burn-in preferences before dispatching to the appropriate transcription pipeline.

| User says... | Action | Skip SSE? |
|---|---|---|
| "export" / "导出" / "download" / "send me the video" | → §3.5 Export | ✅ |
| "credits" / "积分" / "balance" / "余额" | → §3.3 Credits | ✅ |
| "status" / "状态" / "show tracks" | → §3.4 State | ✅ |
| "upload" / "上传" / user sends file | → §3.2 Upload | ✅ |
| Everything else (generate, edit, add BGM…) | → §3.1 SSE | ❌ |
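One way to picture the routing table is as a keyword lookup. The sketch below is illustrative only: the `route` function, the action labels, and the case-insensitive substring matching are assumptions, since the text does not say how intent is actually detected.

```python
from typing import Tuple

# Keyword groups taken from the routing table, checked in table order.
ROUTES = [
    (("export", "导出", "download", "send me the video"), "export"),
    (("credits", "积分", "balance", "余额"), "credits"),
    (("status", "状态", "show tracks"), "state"),
    (("upload", "上传"), "upload"),
]


def route(message: str, has_attachment: bool = False) -> Tuple[str, bool]:
    """Return (action, skip_sse) for a user message."""
    if has_attachment:
        # "user sends file" goes to the upload path.
        return ("upload", True)
    text = message.lower()
    for keywords, action in ROUTES:
        if any(keyword in text for keyword in keywords):
            return (action, True)
    # Everything else (generate, edit, add BGM...) goes through SSE.
    return ("sse", False)
```

Plain substring matching is deliberately naive. A request such as "generate a subtitle file so I can upload it to YouTube" contains "upload" and would be misrouted, so a real router would need stricter matching or a proper intent check.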

NemoVideo API Reference

The NemoVideo backend powers all subtitle generation by running speech-to-text transcription against your video's audio track, then time-stamping the captions and formatting them as SRT, VTT, or hardcoded (burned-in) output. Free-tier requests are processed through a shared queue, so render times may vary based on video length and current load.

Skill attribution — read from this file's YAML frontmatter at runtime:

- X-Skill-Source: video-caption-generator-free
- X-Skill-Version: from frontmatter version
- X-Skill-Platform: detect from install path (~/.clawhub/ → clawhub, ~/.cursor/skills/ → cursor, else unknown)

All requests must include: Authorization: Bearer <NEMO_TOKEN>, X-Skill-Source, X-Skill-Version, X-Skill-Platform. Missing attribution headers will cause export to fail with 402.
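The header requirements above could be assembled along these lines. This is a sketch under assumptions: `detect_platform` and `build_headers` are hypothetical helper names, and the source and version values are passed in as arguments rather than parsed from the frontmatter.

```python
from pathlib import Path
from typing import Dict


def detect_platform(install_path: str) -> str:
    """Map an install path to a platform label, defaulting to "unknown"."""
    path = Path(install_path).expanduser()
    try:
        parts = path.relative_to(Path.home()).parts
    except ValueError:
        # The path is not under the home directory.
        return "unknown"
    if parts[:1] == (".clawhub",):
        return "clawhub"
    if parts[:2] == (".cursor", "skills"):
        return "cursor"
    return "unknown"


def build_headers(token: str, source: str, version: str, platform: str) -> Dict[str, str]:
    """Return the auth and attribution headers that every request must carry."""
    return {
        "Authorization": f"Bearer {token}",
        "X-Skill-Source": source,
        "X-Skill-Version": version,
        "X-Skill-Platform": platform,
    }
```

Building the headers in one place makes it harder to forget an attribution header on the export call, which is where a missing header surfaces as a 402.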

API base: https://mega-api-prod.nemovideo.ai

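Create session: POST /api/tasks/me/with-session/nemo_agent with body {"task_name":"project","language":"<lang>"}; returns task_id, session_id. After creating a session, give the user a link: https://nemovideo.com/workspace/claim?token=$TOKEN&task=<task_id>&session=<session_id>&skill_name=video-caption-generator-free&skill_version=1.0.0&skill_source=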

Send message (SSE): POST /run_sse — body {"app_name":"nemo_agent","user_id":"me","session_id":"<sid>","new_message":{"parts":[{"text":"<msg>"}]}} with Accept: text/event-stream. Max timeout: 15 minutes.
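A minimal sketch of the send-message call might look like this. The helper names are hypothetical, and the shape of each event payload is not described here, so the parser only extracts raw `data:` strings following the standard Server-Sent Events line format and leaves interpretation to the caller.

```python
import json
import urllib.request
from typing import Iterable, Iterator

API_BASE = "https://mega-api-prod.nemovideo.ai"
SSE_TIMEOUT_S = 15 * 60  # the 15-minute maximum mentioned above


def build_run_sse_request(session_id: str, text: str) -> urllib.request.Request:
    """Build (but do not send) the /run_sse POST for one user message."""
    body = {
        "app_name": "nemo_agent",
        "user_id": "me",
        "session_id": session_id,
        "new_message": {"parts": [{"text": text}]},
    }
    return urllib.request.Request(
        f"{API_BASE}/run_sse",
        data=json.dumps(body).encode("utf-8"),
        method="POST",
        headers={"Accept": "text/event-stream", "Content-Type": "application/json"},
    )


def iter_sse_data(lines: Iterable[str]) -> Iterator[str]:
    """Yield the data payload of each event; a blank line ends an event."""
    buffer = []
    for raw in lines:
        line = raw.rstrip("\r\n")
        if line.startswith("data:"):
            value = line[5:]
            buffer.append(value[1:] if value.startswith(" ") else value)
        elif line == "" and buffer:
            yield "\n".join(buffer)
            buffer = []
    if buffer:
        # Lenient choice: also emit a trailing event with no closing blank line.
        yield "\n".join(buffer)
```

The auth and attribution headers are omitted to keep the sketch short. When sending, `SSE_TIMEOUT_S` could be passed as the `timeout` argument of `urllib.request.urlopen`.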

Upload: POST /api/upload-video/nemo_agent/me/<sid> with either a multipart file (-F "files=@/path") or a URL body: {"urls":["<url>"],"source_type":"url"}

Credits: GET /api/credits/balance/simple, which returns available, frozen, total

Session state: GET /api/state/nemo_agent/me/<sid>/latest, with key fields data.state.draft, data.state.video_infos, data.state.generated_media

Export (free, no credits): POST /api/render/proxy/lambda — body {"id":"render_<ts>","sessionId":"<sid>","draft":<json>,"output":{"format":"mp4","quality":"high"}}. Poll GET /api/render/proxy/lambda/<id> every 30s until status = completed. Download URL at output.url.
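The export-and-poll cycle could be sketched as below. Several details are assumptions made for illustration: `<ts>` is treated as a Unix timestamp, status and output.url are read from the top level of the poll response, and `max_polls` is a hypothetical safety cap. The status fetch is injected as a callable so the loop can be exercised without network access.

```python
import time
from typing import Callable, Optional


def build_export_body(session_id: str, draft: dict, ts: Optional[int] = None) -> dict:
    """Assemble the render request body; the draft comes from session state."""
    stamp = int(time.time()) if ts is None else ts
    return {
        "id": f"render_{stamp}",
        "sessionId": session_id,
        "draft": draft,
        "output": {"format": "mp4", "quality": "high"},
    }


def wait_for_render(
    fetch_status: Callable[[], dict],
    interval_s: float = 30.0,
    max_polls: int = 60,  # hypothetical cap, not from the API description
    sleep: Callable[[float], None] = time.sleep,
) -> str:
    """Poll until status is "completed", then return the download URL."""
    for attempt in range(max_polls):
        payload = fetch_status()
        if payload.get("status") == "completed":
            return payload["output"]["url"]
        if attempt < max_polls - 1:
            sleep(interval_s)
    raise TimeoutError(f"render not completed after {max_polls} polls")
```

Here `fetch_status` would wrap the GET on /api/render/proxy/lambda/<id>. Failure statuses are not listed above, so the sketch only recognizes completion and otherwise keeps waiting until the cap is reached.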

Supported formats: mp4, mov, avi, webm, mkv, jpg, png, gif, webp, mp3, wav, m4a, aac.

SSE Event Handling

| Event | Action |
|---|---|
| Text response | Apply GUI translation (§4), present to user |
| Tool call/result | |

~30% of editing operations return no text in the SSE stream. When this happens: poll session state to verify the edit was applied, then summarize changes to the user.

Backend Response Translation

The backend assumes a GUI exists. Translate these into API actions:

| Backend says | You do |
|---|---|
| "click [button]" / "点击" | Execute via API |
| "open [panel]" / "打开" | |

Draft field mapping: t=tracks, tt=track type (0=video, 1=audio, 7=text), sg=segments, d=duration(ms), m=metadata.

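Timeline (3 tracks):
1. Video: city timelapse (0-10s)
2. BGM: Lo-fi (0-10s, 35%)
3. Title: "Urban Dreams" (0-3s)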
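To illustrate how the draft field mapping might be turned into a user-facing summary, here is a small sketch. The nesting is an assumption: it supposes that the draft holds a list of tracks under t, that each track has a type tt and a list of segments sg, and that each segment carries its own duration d in milliseconds. The real draft layout may differ, and `summarize_draft` is a hypothetical name.

```python
from typing import List

# Track type codes from the field mapping above.
TRACK_TYPES = {0: "video", 1: "audio", 7: "text"}


def summarize_draft(draft: dict) -> List[str]:
    """Return one summary line per track: type, segment count, total duration."""
    lines = []
    for index, track in enumerate(draft.get("t", []), start=1):
        kind = TRACK_TYPES.get(track.get("tt"), "unknown")
        segments = track.get("sg", [])
        # Assumed: d is stored per segment, in milliseconds.
        total_ms = sum(segment.get("d", 0) for segment in segments)
        lines.append(f"{index}. {kind}: {len(segments)} segment(s), {total_ms / 1000:g}s")
    return lines
```

A summary of this kind is also useful when an edit returns no text in the SSE stream: polling the session state and summarizing the draft gives the user something concrete to confirm.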

Error Handling

| Code | Meaning | Action |
|---|---|---|
| 0 | Success | Continue |
| 1001 | | |

Use Cases

The video-caption-generator-free skill covers a wide range of real-world needs. Social media creators use it to caption Instagram Reels, TikToks, and YouTube Shorts so their content performs well even when autoplay is muted. Educators and trainers use it to help recorded lectures and onboarding videos meet accessibility requirements such as WCAG and the ADA.

Marketers add captions to product demos and testimonial videos to increase watch time and conversion rates on landing pages. Podcasters who repurpose audio content into video clips rely on this skill to add readable dialogue without a separate editing workflow.

For multilingual teams, the exported subtitle file can also serve as a starting point for translation, making it easier to localize content for international audiences without starting the transcription process from scratch.

Performance Notes

Caption accuracy depends heavily on audio clarity. Videos with a single speaker, minimal background noise, and clear enunciation will produce the most accurate results. If your video has heavy background music, overlapping voices, or strong accents, you may want to review the generated captions and make minor edits before publishing.

Processing time scales with video length. Short clips under five minutes are typically processed quickly, while longer recordings like webinars or full-length courses may take more time. For best results, avoid uploading files with corrupted audio tracks or heavily compressed codecs, as these can reduce transcription quality.

Supported formats include mp4, mov, avi, webm, and mkv. If your file is in a less common format, converting it to mp4 before uploading will generally yield the smoothest experience.
