Getting Started
Welcome! Ready to add captions to your video for free? Upload your mp4, mov, avi, webm, or mkv file and tell me things like subtitle language, font style, or whether you want word-by-word or sentence-by-sentence display — and I'll generate your subtitled video right away.
Try saying:
- "Add English subtitles to this mp4 interview video with white text and a dark background bar for readability"
- "Generate auto subtitles for my Spanish tutorial video and display them sentence by sentence at the bottom of the screen"
- "Transcribe and burn captions onto this webinar recording — keep the font large enough for mobile viewers"
Automatic Setup
On first interaction, connect to the NemoVideo API before doing anything else. Show a brief status like "Setting things up...".
Token: If NEMO_TOKEN environment variable is already set, use it and skip to Session below.
Free token: Read or create a UUID at ~/.config/nemovideo/client_id. Then POST to https://mega-api-prod.nemovideo.ai/api/auth/anonymous-token with header X-Client-Id: <your-uuid>. The response field data.token becomes your NEMO_TOKEN (100 credits, 7-day expiry). If the token has expired, request a new one with the same Client-Id.
Session: POST to the same host at /api/tasks/me/with-session/nemo_agent with Bearer auth and body {"task_name":"project"}. Save session_id from the response.
Confirm to the user you're connected and ready. Don't print tokens or raw JSON.
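The setup steps above can be sketched in Python with only the standard library. This is a minimal sketch, not the skill's actual implementation: the function names are hypothetical, and the requests are built but not sent, so networking, retries, and error codes are left to the caller. The endpoints, the `X-Client-Id` header, the `data.token` field, and the `task_name` body come from this document.

```python
import json
import os
import urllib.request
import uuid
from pathlib import Path
from typing import Optional, Union

API_BASE = "https://mega-api-prod.nemovideo.ai"


def env_token() -> Optional[str]:
    """Return NEMO_TOKEN if already set, so setup can skip straight to the session."""
    return os.environ.get("NEMO_TOKEN") or None


def load_or_create_client_id(path: Optional[Path] = None) -> str:
    """Read the persisted client UUID, or create and save a new one on first run."""
    path = path or Path("~/.config/nemovideo/client_id").expanduser()
    if path.exists():
        existing = path.read_text(encoding="utf-8").strip()
        if existing:
            return existing
    client_id = str(uuid.uuid4())
    path.parent.mkdir(parents=True, exist_ok=True)
    path.write_text(client_id, encoding="utf-8")
    return client_id


def build_anonymous_token_request(client_id: str) -> urllib.request.Request:
    """POST with X-Client-Id; reusing the same id refreshes an expired token."""
    return urllib.request.Request(
        f"{API_BASE}/api/auth/anonymous-token",
        method="POST",
        headers={"X-Client-Id": client_id},
    )


def extract_token(response_body: Union[str, bytes]) -> str:
    """The free token is read from data.token in the JSON response."""
    return json.loads(response_body)["data"]["token"]


def build_session_request(token: str) -> urllib.request.Request:
    """Create the nemo_agent session; session_id is then read from the response."""
    body = json.dumps({"task_name": "project"}).encode("utf-8")
    return urllib.request.Request(
        f"{API_BASE}/api/tasks/me/with-session/nemo_agent",
        data=body,
        method="POST",
        headers={
            "Authorization": f"Bearer {token}",
            "Content-Type": "application/json",
        },
    )
```

With network access, `urllib.request.urlopen(req)` would send either request; the token itself should still never be shown to the user.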
Captions for Every Video, Completely Free
Getting subtitles onto a video used to mean paying for transcription services, wrestling with clunky desktop software, or spending hours typing captions by hand. The auto-subtitle-generator-free skill changes that entirely. Upload your video, and the skill listens to every word spoken, converts it to timed text, and overlays clean, readable subtitles directly onto your footage.
Whether you're posting a tutorial on YouTube, sharing a product demo on LinkedIn, or making a classroom lecture accessible to hearing-impaired students, subtitles make your content reach further. Captioned videos tend to hold viewer attention longer and perform better on social platforms, and now you can get that benefit without spending anything.
This skill supports a wide range of video formats including mp4, mov, avi, webm, and mkv, so you don't need to convert your files before uploading. The result is a polished, subtitle-ready video you can share immediately. No technical setup, no learning curve — just upload, describe your preferences, and let the skill do the work.
Caption Request Routing Explained
When you submit a video for auto-captioning, ClawHub parses your input — whether it's a raw video URL, an uploaded file reference, or a transcription task — and routes it directly to the NemoVideo subtitle engine based on detected media type and language parameters.
| User says... | Action | Skip SSE? |
|---|---|---|
| "export" / "导出" / "download" / "send me the video" | → §3.5 Export | ✅ |
| "credits" / "积分" / "balance" / "余额" | → §3.3 Credits | ✅ |
| "status" / "状态" / "show tracks" | → §3.4 State | ✅ |
| "upload" / "上传" / user sends file | → §3.2 Upload | ✅ |
| Everything else (generate, edit, add BGM…) | → §3.1 SSE | ❌ |
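The routing table can be read as a first-match keyword lookup. The sketch below is a hypothetical illustration of that reading, not the skill's actual router: it uses naive case-insensitive substring matching, so a real implementation would likely need word boundaries to avoid false hits.

```python
from typing import Sequence, Tuple

# (trigger phrases, section) pairs, checked top to bottom like the table rows.
ROUTES: Sequence[Tuple[Tuple[str, ...], str]] = (
    (("export", "导出", "download", "send me the video"), "§3.5 Export"),
    (("credits", "积分", "balance", "余额"), "§3.3 Credits"),
    (("status", "状态", "show tracks"), "§3.4 State"),
    (("upload", "上传"), "§3.2 Upload"),
)


def route(message: str, has_attachment: bool = False) -> Tuple[str, bool]:
    """Return (section, skip_sse) for a user message."""
    if has_attachment:
        # A file sent by the user always goes to the upload flow.
        return "§3.2 Upload", True
    text = message.lower()
    for phrases, section in ROUTES:
        if any(phrase in text for phrase in phrases):
            return section, True
    # Anything unmatched (generate, edit, add BGM...) goes through SSE.
    return "§3.1 SSE", False
```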
NemoVideo API Reference Notes
The NemoVideo backend handles speech-to-text transcription, subtitle timing alignment, and SRT/VTT file generation automatically — no manual timestamp editing required. Free-tier requests are processed through the same ASR pipeline as paid plans, with standard queue priority and a per-session credit allocation.
Skill attribution — read from this file's YAML frontmatter at runtime:
- X-Skill-Source: auto-subtitle-generator-free
- X-Skill-Version: from frontmatter version
- X-Skill-Platform: detect from install path (~/.clawhub/ → clawhub, ~/.cursor/skills/ → cursor, else unknown)
All requests must include: Authorization: Bearer <NEMO_TOKEN>, X-Skill-Source, X-Skill-Version, X-Skill-Platform. Missing attribution headers will cause export to fail with 402.
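A possible way to assemble these headers is sketched below. The frontmatter reader is a deliberately tiny stand-in that only understands flat `key: value` lines between the first two `---` markers; it is an assumption, not a YAML parser, and all function names are hypothetical.

```python
import os
from typing import Dict, Optional

SKILL_SOURCE = "auto-subtitle-generator-free"


def read_frontmatter_field(text: str, field: str) -> Optional[str]:
    """Minimal reader for flat `key: value` lines inside a leading --- block."""
    lines = text.splitlines()
    if not lines or lines[0].strip() != "---":
        return None
    for line in lines[1:]:
        if line.strip() == "---":
            break
        key, sep, value = line.partition(":")
        if sep and key.strip() == field:
            return value.strip().strip("\"'")
    return None


def detect_platform(install_path: str) -> str:
    """Map the skill's install path to the X-Skill-Platform value."""
    path = os.path.expanduser(install_path)
    if path.startswith(os.path.expanduser("~/.clawhub/")):
        return "clawhub"
    if path.startswith(os.path.expanduser("~/.cursor/skills/")):
        return "cursor"
    return "unknown"


def build_headers(token: str, version: str, install_path: str) -> Dict[str, str]:
    """Auth plus the three attribution headers that every request carries."""
    return {
        "Authorization": f"Bearer {token}",
        "X-Skill-Source": SKILL_SOURCE,
        "X-Skill-Version": version,
        "X-Skill-Platform": detect_platform(install_path),
    }
```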
API base: https://mega-api-prod.nemovideo.ai
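Create session: POST /api/tasks/me/with-session/nemo_agent — body {"task_name":"project","language":"<lang>"} — returns task_id, session_id. After creating a session, give the user a link: https://nemovideo.com/workspace/claim?token=$TOKEN&task=<task_id>&session=<session_id>&skill_name=auto-subtitle-generator-free&skill_version=1.0.0&skill_source=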
Send message (SSE): POST /run_sse — body {"app_name":"nemo_agent","user_id":"me","session_id":"<sid>","new_message":{"parts":[{"text":"<msg>"}]}} with Accept: text/event-stream. Max timeout: 15 minutes.
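The SSE call can be prepared as shown below. This sketch assumes `/run_sse` is resolved against the API base like the other endpoints; the function name is hypothetical, and `headers` is expected to already contain the auth and attribution headers.

```python
import json
import urllib.request
from typing import Dict

API_BASE = "https://mega-api-prod.nemovideo.ai"
SSE_TIMEOUT_SECONDS = 15 * 60  # documented maximum wait for one stream


def build_sse_request(headers: Dict[str, str], session_id: str, text: str) -> urllib.request.Request:
    """Build the POST /run_sse request that streams the agent's reply."""
    body = {
        "app_name": "nemo_agent",
        "user_id": "me",
        "session_id": session_id,
        "new_message": {"parts": [{"text": text}]},
    }
    return urllib.request.Request(
        f"{API_BASE}/run_sse",
        data=json.dumps(body).encode("utf-8"),
        method="POST",
        headers={**headers, "Content-Type": "application/json", "Accept": "text/event-stream"},
    )
```

Sending it with `urllib.request.urlopen(req, timeout=SSE_TIMEOUT_SECONDS)` and iterating over the response yields raw lines. Note that `urlopen`'s timeout applies to each blocking socket operation, not to the stream as a whole, so an overall 15-minute cap would need separate tracking.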
Upload: POST /api/upload-video/nemo_agent/me/<sid> — file: multipart -F "files=@/path", or URL: {"urls":["<url>"],"source_type":"url"}
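Before uploading, the file extension can be checked against this reference's supported-formats list, and the URL variant's JSON body can be built directly. The helper names below are hypothetical, and the `source_type` key is taken from this document's upload description.

```python
import json
from pathlib import Path
from typing import Iterable

SUPPORTED_EXTENSIONS = {
    "mp4", "mov", "avi", "webm", "mkv",
    "jpg", "png", "gif", "webp",
    "mp3", "wav", "m4a", "aac",
}


def is_supported(filename: str) -> bool:
    """Case-insensitive extension check against the supported-formats list."""
    return Path(filename).suffix.lower().lstrip(".") in SUPPORTED_EXTENSIONS


def upload_path(session_id: str) -> str:
    """Endpoint path for uploads into a given session."""
    return f"/api/upload-video/nemo_agent/me/{session_id}"


def build_url_upload_body(urls: Iterable[str]) -> bytes:
    """JSON body for uploading by URL instead of multipart file."""
    return json.dumps({"urls": list(urls), "source_type": "url"}).encode("utf-8")
```

Local files go up as multipart with the field name `files`; the standard library has no multipart encoder, so a client such as `requests` (its `files=` argument) or the `curl -F` form above is the simpler route.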
Credits: GET /api/credits/balance/simple — returns available, frozen, total
Session state: GET /api/state/nemo_agent/me/<sid>/latest — key fields: data.state.draft, data.state.video_infos, data.state.generated_media
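A sketch of pulling out the fields named above. The state path `data.state.*` is documented here, but where `available`, `frozen`, and `total` sit inside the credits response is not, so `describe_credits` assumes the caller has already extracted them into a flat dict. Both function names are hypothetical.

```python
from typing import Any, Dict

STATE_KEYS = ("draft", "video_infos", "generated_media")


def extract_state_fields(response: Dict[str, Any]) -> Dict[str, Any]:
    """Return the key session-state fields, with None for any that are missing."""
    state = response.get("data", {}).get("state", {})
    return {key: state.get(key) for key in STATE_KEYS}


def describe_credits(balance: Dict[str, int]) -> str:
    """User-facing one-liner; assumes a flat dict with available/frozen/total."""
    return (
        f"{balance['available']} credits available "
        f"({balance['frozen']} frozen, {balance['total']} total)"
    )
```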
Export (free, no credits): POST /api/render/proxy/lambda — body {"id":"render_<ts>","sessionId":"<sid>","draft":<json>,"output":{"format":"mp4","quality":"high"}}. Poll GET /api/render/proxy/lambda/<id> every 30s until status = completed. Download URL at output.url.
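The export flow, meaning build the render body and then poll every 30 seconds until `status` is `completed`, might look like the sketch below. Assumptions: `<ts>` is a Unix timestamp in seconds, the poll response exposes `status` and `output.url` at the top level, and `max_polls` is a hypothetical safety cap. `fetch_status` and `sleep` are injected so the loop can be exercised without a network.

```python
import time
from typing import Any, Callable, Dict, Optional


def build_export_body(session_id: str, draft: Dict[str, Any], ts: Optional[int] = None) -> Dict[str, Any]:
    """Body for POST /api/render/proxy/lambda."""
    ts = int(time.time()) if ts is None else ts
    return {
        "id": f"render_{ts}",
        "sessionId": session_id,
        "draft": draft,
        "output": {"format": "mp4", "quality": "high"},
    }


def poll_export(
    fetch_status: Callable[[str], Dict[str, Any]],
    render_id: str,
    interval: float = 30.0,
    max_polls: int = 60,
    sleep: Callable[[float], None] = time.sleep,
) -> str:
    """Poll GET /api/render/proxy/lambda/<id> until completed; return output.url."""
    for _ in range(max_polls):
        job = fetch_status(render_id)
        if job.get("status") == "completed":
            return job["output"]["url"]
        sleep(interval)
    raise TimeoutError(f"render {render_id} not completed after {max_polls} polls")
```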
Supported formats: mp4, mov, avi, webm, mkv, jpg, png, gif, webp, mp3, wav, m4a, aac.
SSE Event Handling
| Event | Action |
|---|---|
| Text response | Apply GUI translation (§4), present to user |
| Tool call/result | Process internally, don't forward |
| heartbeat / empty `data:` | Keep waiting. Every 2 min: "⏳ Still working..." |
| Stream closes | Process final response |
~30% of editing operations return no text in the SSE stream. When this happens: poll session state to verify the edit was applied, then summarize changes to the user.
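A minimal line-level SSE reader that follows the event-stream framing rules: `data:` lines are joined with newlines, one leading space is dropped, `:` comment lines are skipped, and an unterminated event at end of stream is discarded. Empty or whitespace-only payloads are treated as heartbeats and skipped. The payload format itself is not specified here, so the sketch yields raw strings; the helper names and the 2-minute notice check are illustrative. If the stream closes without any text, the session-state endpoint is the fallback.

```python
from typing import Iterable, Iterator, List

PROGRESS_NOTICE_SECONDS = 120.0  # "Every 2 min: Still working..."


def iter_sse_data(lines: Iterable[str]) -> Iterator[str]:
    """Yield the data payload of each complete SSE event, skipping heartbeats."""
    buf: List[str] = []
    for raw in lines:
        line = raw.rstrip("\r\n")
        if line == "":
            # Blank line dispatches the event; empty payloads are heartbeats.
            payload = "\n".join(buf)
            buf = []
            if payload.strip():
                yield payload
            continue
        if line.startswith(":"):
            continue  # comment line, commonly used as a keep-alive
        if line.startswith("data:"):
            value = line[len("data:"):]
            buf.append(value[1:] if value.startswith(" ") else value)
        # Other fields (event:, id:, retry:) are ignored in this sketch.
    # Per the SSE framing rules, a trailing event without a blank line is dropped.


def should_send_progress(last_notice: float, now: float,
                         interval: float = PROGRESS_NOTICE_SECONDS) -> bool:
    """True when it is time to tell the user the job is still running."""
    return now - last_notice >= interval
```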
Backend Response Translation
The backend assumes a GUI exists. Translate these into API actions:
| Backend says | You do |
|---|---|
| "click [button]" / "点击" | Execute via API |
| "open [panel]" / "打开" | Query session state |
| "drag/drop" / "拖拽" | Send edit via SSE |
| "preview in timeline" | Show track summary |
| "Export button" / "导出" | Execute export workflow |
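The translation table can be approximated with an ordered phrase lookup. This is a hypothetical sketch: the most specific phrases are checked first so that "Export button" is not swallowed by the generic "click" rule, and substring matching is naive (for example, "drop" also matches "dropdown").

```python
from typing import Optional, Sequence, Tuple

# Most specific phrases first so "Export button" wins over "click".
GUI_HINTS: Sequence[Tuple[Tuple[str, ...], str]] = (
    (("export button", "导出"), "execute export workflow"),
    (("preview in timeline",), "show track summary"),
    (("drag", "drop", "拖拽"), "send edit via SSE"),
    (("open", "打开"), "query session state"),
    (("click", "点击"), "execute via API"),
)


def translate_backend_hint(message: str) -> Optional[str]:
    """Map GUI-style backend wording to the API-side action, or None."""
    text = message.lower()
    for phrases, action in GUI_HINTS:
        if any(phrase in text for phrase in phrases):
            return action
    return None
```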
Draft field mapping: t=tracks, tt=track type (0=video, 1=audio, 7=text), sg=segments, d=duration(ms), m=metadata.
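```
Timeline (3 tracks):
1. Video: city timelapse (0-10s)
2. BGM: Lo-fi (0-10s, 35%)
3. Title: "Urban Dreams" (0-3s)
```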
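A track summary can be produced from the compact draft keys listed above. The mapping of `t`, `tt`, `sg`, and `d` follows this document; the assumption that `d` sits on each segment (rather than on the track) is unconfirmed. Segment start times are not part of the documented mapping, so this sketch reports counts and total duration instead of time ranges.

```python
from typing import Any, Dict, List

TRACK_TYPES = {0: "Video", 1: "Audio", 7: "Text"}


def summarize_draft(draft: Dict[str, Any]) -> List[str]:
    """Turn draft['t'] into human-readable track lines (durations in seconds)."""
    tracks = draft.get("t", [])
    plural = "" if len(tracks) == 1 else "s"
    lines = [f"Timeline ({len(tracks)} track{plural}):"]
    for index, track in enumerate(tracks, start=1):
        kind = TRACK_TYPES.get(track.get("tt"), f"Type {track.get('tt')}")
        segments = track.get("sg", [])
        total_ms = sum(segment.get("d", 0) for segment in segments)  # d is in ms
        lines.append(f"{index}. {kind}: {len(segments)} segment(s), {total_ms / 1000:g}s")
    return lines
```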
Error Handling
| Code | Meaning | Action |
|---|---|---|
| 0 | Success | Continue |
| 1001 | Bad/expired token | Re-auth via anonymous-token (tokens expire after 7 days) |
| 1002 | Session not found | New session §3.0 |
| 2001 | No credits | Anonymous: show registration URL with ?bind=<id> (get <id> from create-session or state response when needed). Registered: "Top up at nemovideo.ai" |
| 4001 | Unsupported file | Show supported formats |
| 4002 | File too large | Suggest compress/trim |
| 400 | Missing X-Client-Id | Generate Client-Id and retry (see §1) |
| 402 | Free plan export blocked | Subscription tier issue, NOT credits. "Register at nemovideo.ai to unlock export." |
| 429 | Rate limit (1 token/client/7 days) | Retry in 30s once |
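The error table maps naturally onto a lookup plus one special case: 429 is retried only once. The action labels below are hypothetical names for the table's "Action" column, and `give_up` is an illustrative outcome for a second 429 that the table does not spell out.

```python
from typing import Optional

ERROR_ACTIONS = {
    0: "continue",
    1001: "reauth_anonymous_token",
    1002: "create_new_session",
    2001: "out_of_credits",
    4001: "show_supported_formats",
    4002: "suggest_compress_or_trim",
    400: "generate_client_id_and_retry",
    402: "prompt_registration_for_export",  # tier issue, not credits
    429: "retry_once_after_30s",
}


def next_action(code: int, already_retried: bool = False) -> Optional[str]:
    """Return the handling step for a response code, or None if unknown."""
    action = ERROR_ACTIONS.get(code)
    if action == "retry_once_after_30s" and already_retried:
        return "give_up"
    return action
```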
Use Cases for Auto Subtitle Generator Free
Social media creators on TikTok, Instagram Reels, and YouTube Shorts benefit enormously from auto subtitles because most mobile users watch videos on mute. Captioning a 60-second clip takes seconds with this skill, and the result immediately boosts watch time and engagement without any extra editing.
Educators and e-learning developers can use this skill to make lecture recordings, how-to videos, and course content compliant with accessibility standards like WCAG and ADA. Students who are deaf, hard of hearing, or learning in a second language get a far better experience when every word is visible on screen.
Businesses producing product demos, onboarding videos, or customer testimonials can caption their content for international audiences by requesting subtitle output in a target language. Marketing teams repurposing webinar recordings into short highlight clips will find this skill especially useful for turning raw footage into polished, shareable assets — all without a paid captioning subscription.
Troubleshooting Common Subtitle Issues
If your generated subtitles appear out of sync with the audio, it usually means the video has a variable frame rate or a long silent intro. Try trimming any dead air from the beginning of your clip before uploading, or mention the approximate start time of the first spoken word in your prompt.
Muffled audio, heavy background music, or strong accents can sometimes cause missed words or awkward line breaks. To improve accuracy, describe the audio quality in your prompt — for example, 'the speaker has a strong Scottish accent' or 'there is background music throughout.' This helps the skill tune its transcription approach.
If subtitles are cut off at the screen edges, request a specific safe-zone margin in your prompt, such as 'keep subtitles within 10% of the screen border.' For videos with multiple speakers, ask for speaker-labeled captions so viewers can follow who is saying what without confusion.