Getting Started
Welcome! Ready to add subtitles to your video without spending anything? Upload your file and tell me what you need — whether it's burning captions directly into the video or exporting a separate subtitle file — and the auto-subtitle-generator-free skill will take care of the rest.
Try saying:
- "Generate subtitles for this mp4 interview video and export them as an SRT file."
- "Add burned-in captions to my YouTube tutorial — the speaker has a slight accent so please be accurate."
- "Create subtitles for this webinar recording and split lines so no caption stays on screen longer than 3 seconds."
First-Time Connection
When a user first opens this skill, connect to the NemoVideo backend automatically. Briefly let them know (e.g. "Connecting to NemoVideo...").
Authentication: Check if NEMO_TOKEN is set in the environment. If it is, skip to step 3.
1. Client identifier: Look for ~/.config/nemovideo/client_id. If missing, generate a UUID and save it there. This file stores only a UUID for rate-limit tracking.
2. Obtain a free token: POST to https://mega-api-prod.nemovideo.ai/api/auth/anonymous-token with header X-Client-Id set to your identifier. The response data.token is your NEMO_TOKEN (100 free credits, valid 7 days).
3. Create a session: POST to https://mega-api-prod.nemovideo.ai/api/tasks/me/with-session/nemo_agent with Authorization: Bearer <token>, Content-Type: application/json, and body {"task_name":"project","language":"<detected>"}. Store the returned session_id for all subsequent requests.
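The connection steps above can be sketched roughly as follows. This is a minimal illustration, not the skill's actual implementation: the helper names are hypothetical, and the functions only describe the requests as plain dictionaries so that any HTTP client could send them.

```python
import json
import uuid
from pathlib import Path

API_BASE = "https://mega-api-prod.nemovideo.ai"


def load_or_create_client_id(path: Path) -> str:
    """Return the stored client UUID, generating and saving one if missing."""
    if path.exists():
        return path.read_text().strip()
    client_id = str(uuid.uuid4())
    path.parent.mkdir(parents=True, exist_ok=True)
    path.write_text(client_id)
    return client_id


def anonymous_token_request(client_id: str) -> dict:
    """Describe the POST that obtains a free anonymous token (step 2)."""
    return {
        "method": "POST",
        "url": f"{API_BASE}/api/auth/anonymous-token",
        "headers": {"X-Client-Id": client_id},
        "body": None,
    }


def create_session_request(token: str, language: str) -> dict:
    """Describe the POST that creates a session (step 3)."""
    return {
        "method": "POST",
        "url": f"{API_BASE}/api/tasks/me/with-session/nemo_agent",
        "headers": {
            "Authorization": f"Bearer {token}",
            "Content-Type": "application/json",
        },
        "body": json.dumps({"task_name": "project", "language": language}),
    }
```

A caller would typically pass Path.home() / ".config/nemovideo/client_id" to load_or_create_client_id and read data.token from the first response before building the second request.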
Keep setup communication brief. Don't display raw API responses or token values to the user.
Captions for Every Video, Zero Cost Attached
Most subtitle tools hide their best features behind a paywall. This skill flips that entirely. The auto-subtitle-generator-free skill listens to the spoken content in your video, converts it into precisely timed captions, and gives you ready-to-use subtitles — all without asking for a credit card.
Upload your video, and the skill gets to work detecting speech, segmenting it into readable chunks, and aligning each line to the correct timestamp. The result is a clean subtitle track that matches the natural rhythm of conversation, not awkward walls of text that disappear before you can read them.
This is especially useful for content creators who publish across multiple platforms, educators building accessible course materials, and small business owners who can't justify expensive captioning services. Whether your video is a 30-second Instagram clip or a 90-minute webinar recording, the skill handles the heavy lifting so you can focus on the content itself.
Routing Your Caption Requests
When you submit a video for auto subtitling, ClawHub parses your input—whether it's a direct upload, a URL, or a file path—and routes the transcription job to the appropriate NemoVideo processing endpoint based on file format, language hint, and subtitle output preference.
| User says... | Action | Skip SSE? |
|---|---|---|
| "export" / "导出" / "download" / "send me the video" | → §3.5 Export | ✅ |
| "credits" / "积分" / "balance" / "余额" | → §3.3 Credits | ✅ |
| "status" / "状态" / "show tracks" | → §3.4 State | ✅ |
| "upload" / "上传" / user sends file | → §3.2 Upload | ✅ |
| Everything else (generate, edit, add BGM…) | → §3.1 SSE | ❌ |
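One way to read the routing table is as a first-match keyword lookup. The sketch below assumes simple substring matching and that rows are checked in table order; neither detail is stated in the source, and the function name is hypothetical.

```python
from typing import Tuple

# Keyword groups in the same order as the routing table.
ROUTES = [
    (("export", "导出", "download", "send me the video"), "export"),
    (("credits", "积分", "balance", "余额"), "credits"),
    (("status", "状态", "show tracks"), "state"),
    (("upload", "上传"), "upload"),
]


def route(message: str, has_file: bool = False) -> Tuple[str, bool]:
    """Return (action, skip_sse) for a user message."""
    if has_file:
        # "user sends file" row: go straight to upload.
        return ("upload", True)
    text = message.lower()
    for keywords, action in ROUTES:
        if any(keyword in text for keyword in keywords):
            return (action, True)
    # Everything else (generate, edit, add BGM...) goes through SSE.
    return ("sse", False)
```

Substring matching is deliberately naive: a request that mentions both generating and exporting would match the export row first, so a real router would likely need intent detection rather than keywords alone.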
NemoVideo API Reference
The NemoVideo backend uses speech-to-text inference to generate frame-accurate SRT, VTT, or burned-in subtitle tracks from your video's audio stream. Subtitle timing, punctuation restoration, and multi-language detection are all handled server-side, so no local processing power is required on your end.
Skill attribution, read from this file's YAML frontmatter at runtime:
- X-Skill-Source: auto-subtitle-generator-free
- X-Skill-Version: from frontmatter version
- X-Skill-Platform: detect from install path (~/.clawhub/ → clawhub, ~/.cursor/skills/ → cursor, else unknown)
All requests must include: Authorization: Bearer <NEMO_TOKEN>, X-Skill-Source, X-Skill-Version, X-Skill-Platform. Missing attribution headers will cause export to fail with 402.
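The required headers could be assembled along these lines. This is a sketch under assumptions: the function names are hypothetical, the platform is detected by a plain substring check on the install path, and the version is passed in by the caller rather than parsed from frontmatter.

```python
def detect_platform(install_path: str) -> str:
    """Map the skill's install path to a platform label."""
    normalized = install_path.replace("\\", "/")
    if "/.clawhub/" in normalized:
        return "clawhub"
    if "/.cursor/skills/" in normalized:
        return "cursor"
    return "unknown"


def build_headers(token: str, source: str, version: str, install_path: str) -> dict:
    """Build the headers that every request must carry."""
    return {
        "Authorization": f"Bearer {token}",
        "X-Skill-Source": source,
        "X-Skill-Version": version,
        "X-Skill-Platform": detect_platform(install_path),
    }
```

Building the headers in one place makes it harder to send an export request without attribution, which is the case the 402 warning above describes.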
API base: https://mega-api-prod.nemovideo.ai
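Create session: POST /api/tasks/me/with-session/nemo_agent with body {"task_name":"project","language":"<lang>"}; returns task_id, session_id. After creating a session, give the user a link: https://nemovideo.com/workspace/claim?token=$TOKEN&task=<task_id>&session=<session_id>&skill_name=auto-subtitle-generator-free&skill_version=1.0.0&skill_source=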
Send message (SSE): POST /run_sse — body {"app_name":"nemo_agent","user_id":"me","session_id":"<sid>","new_message":{"parts":[{"text":"<msg>"}]}} with Accept: text/event-stream. Max timeout: 15 minutes.
Upload: POST /api/upload-video/nemo_agent/me/<sid>, either as a file (multipart -F "files=@/path") or as a URL with body {"urls":["<url>"],"source_type":"url"}
Credits: GET /api/credits/balance/simple; returns available, frozen, total
Session state: GET /api/state/nemo_agent/me/<sid>/latest; key fields: data.state.draft, data.state.video_infos, data.state.generated_media
Export (free, no credits): POST /api/render/proxy/lambda — body {"id":"render_<ts>","sessionId":"<sid>","draft":<json>,"output":{"format":"mp4","quality":"high"}}. Poll GET /api/render/proxy/lambda/<id> every 30s until status = completed. Download URL at output.url.
Supported formats: mp4, mov, avi, webm, mkv, jpg, png, gif, webp, mp3, wav, m4a, aac.
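The message and export payloads described above can be built like this, together with a polling loop for the render job. Treat it as a sketch: fetch_status is a hypothetical callable that performs GET /api/render/proxy/lambda/<id> and returns parsed JSON, the render id uses a Unix timestamp in seconds (the source only says <ts>), and status and output.url are assumed to sit at the top level of the response.

```python
import time


def sse_message_body(session_id: str, text: str) -> dict:
    """Body for POST /run_sse."""
    return {
        "app_name": "nemo_agent",
        "user_id": "me",
        "session_id": session_id,
        "new_message": {"parts": [{"text": text}]},
    }


def export_body(session_id: str, draft: dict, now=None) -> dict:
    """Body for POST /api/render/proxy/lambda."""
    ts = int(now if now is not None else time.time())
    return {
        "id": f"render_{ts}",
        "sessionId": session_id,
        "draft": draft,
        "output": {"format": "mp4", "quality": "high"},
    }


def poll_until_completed(fetch_status, interval_s=30.0, max_polls=30, sleep=time.sleep):
    """Poll the render job until it completes, then return the download URL."""
    for _ in range(max_polls):
        result = fetch_status()
        if result.get("status") == "completed":
            return result["output"]["url"]
        sleep(interval_s)
    raise TimeoutError("render did not complete within the polling budget")
```

The max_polls cap is an added safeguard, not something the source specifies; with the default 30-second interval it bounds the wait at 15 minutes.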
SSE Event Handling
| Event | Action |
|---|---|
| Text response | Apply GUI translation (§4), present to user |
| Tool call/result | Process internally, don't forward |
| heartbeat / empty data: | Keep waiting. Every 2 min: "⏳ Still working..." |
| Stream closes | Process final response |
~30% of editing operations return no text in the SSE stream. When this happens: poll session state to verify the edit was applied, then summarize changes to the user.
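The event rules above might be applied to a raw stream as in the following sketch. The payload shape is an assumption: the source does not document the event JSON, so the code guesses that text arrives under content.parts[].text (mirroring the request's new_message.parts) and treats every other part as a tool call to skip.

```python
import json


def extract_text(lines):
    """Collect user-facing text fragments from raw SSE lines."""
    texts = []
    for raw in lines:
        line = raw.strip()
        if not line.startswith("data:"):
            continue  # comments, event names, heartbeats
        payload = line[len("data:"):].strip()
        if not payload:
            continue  # empty data: keep waiting
        try:
            event = json.loads(payload)
        except json.JSONDecodeError:
            continue
        if not isinstance(event, dict):
            continue
        # Assumed shape: {"content": {"parts": [{"text": "..."}]}}
        for part in (event.get("content") or {}).get("parts") or []:
            if isinstance(part, dict) and "text" in part:
                texts.append(part["text"])
    return texts


def needs_state_poll(texts) -> bool:
    """True when the stream closed without any text to show the user."""
    return not any(t.strip() for t in texts)
```

When needs_state_poll returns True, the next step is the session-state request, followed by a short summary of what changed.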
Backend Response Translation
The backend assumes a GUI exists. Translate these into API actions:
| Backend says | You do |
|---|---|
| "click [button]" / "点击" | Execute via API |
| "open [panel]" / "打开" | Query session state |
| "drag/drop" / "拖拽" | Send edit via SSE |
| "preview in timeline" | Show track summary |
| "Export button" / "导出" | Execute export workflow |
Draft field mapping: t=tracks, tt=track type (0=video, 1=audio, 7=text), sg=segments, d=duration(ms), m=metadata.
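Timeline (3 tracks):
1. Video: city timelapse (0-10s)
2. BGM: Lo-fi (0-10s, 35%)
3. Title: "Urban Dreams" (0-3s)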
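A track summary can be derived from the abbreviated draft fields roughly as follows. The nesting is assumed rather than documented: the sketch supposes that t is a list of tracks, each with a tt type code and an sg list of segments that carry d in milliseconds.

```python
TRACK_TYPES = {0: "video", 1: "audio", 7: "text"}


def summarize_draft(draft: dict) -> list:
    """Return one summary line per track in the draft."""
    lines = []
    for index, track in enumerate(draft.get("t", []), start=1):
        kind = TRACK_TYPES.get(track.get("tt"), "unknown")
        segments = track.get("sg", [])
        # Sum segment durations (ms) to get the track's total length.
        total_ms = sum(segment.get("d", 0) for segment in segments)
        lines.append(f"{index}. {kind}: {len(segments)} segment(s), {total_ms / 1000:g}s")
    return lines
```

Summing durations treats segments as back to back; if segments can overlap or leave gaps, start offsets would be needed as well.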
Error Handling
| Code | Meaning | Action |
|---|---|---|
| 0 | Success | Continue |
| 1001 | Bad/expired token | Re-auth via anonymous-token (tokens expire after 7 days) |
| 1002 | Session not found | New session §3.0 |
| 2001 | No credits | Anonymous: show registration URL with ?bind=<id> (get <id> from create-session or state response when needed). Registered: "Top up at nemovideo.ai" |
| 4001 | Unsupported file | Show supported formats |
| 4002 | File too large | Suggest compress/trim |
| 400 | Missing X-Client-Id | Generate Client-Id and retry (see §1) |
| 402 | Free plan export blocked | Subscription tier issue, NOT credits. "Register at nemovideo.ai to unlock export." |
| 429 | Rate limit (1 token/client/7 days) | Retry in 30s once |
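The table above translates naturally into a lookup from code to next step. The action labels in this sketch are hypothetical names chosen for illustration; only the codes and their meanings come from the table.

```python
ERROR_ACTIONS = {
    0: "continue",
    1001: "reauthenticate",            # bad or expired token
    1002: "create_session",            # session not found
    2001: "show_credit_options",       # no credits
    4001: "show_supported_formats",    # unsupported file
    4002: "suggest_compress_or_trim",  # file too large
    400: "generate_client_id_and_retry",
    402: "prompt_registration",        # subscription tier, not credits
    429: "retry_once_after_30s",
}


def action_for(code: int) -> str:
    """Return the handling step for a backend code, with a fallback."""
    return ERROR_ACTIONS.get(code, "report_unknown_error")
```

Keeping 402 separate from 2001 preserves the table's distinction between a blocked export tier and an empty credit balance.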
Troubleshooting
If captions appear out of sync with the audio, the most common cause is background music or overlapping speech competing with the main voice track. Try uploading a version of the video with reduced background noise, or specify which speaker the subtitles should follow if there are multiple voices.
For videos with heavy accents, technical jargon, or industry-specific terminology, subtitle accuracy improves when you provide a brief context note — for example, mentioning that the video covers medical procedures or software engineering topics helps the skill prioritize correct terminology.
If subtitle lines feel too long or flash by too quickly, request a specific maximum character count per line (typically 42 characters works well for most screens) or a minimum display duration per caption. These small adjustments make a significant difference in readability across different screen sizes.
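To make the 42-character guideline concrete, here is a small local sketch of line wrapping and SRT timestamp formatting. It illustrates the idea only; the skill performs segmentation server-side, and these helper names are hypothetical.

```python
import textwrap


def wrap_caption(text: str, max_chars: int = 42) -> list:
    """Split caption text into lines no longer than max_chars."""
    return textwrap.wrap(text, width=max_chars)


def format_srt_time(seconds: float) -> str:
    """Format seconds as an SRT timestamp (HH:MM:SS,mmm)."""
    total_ms = round(seconds * 1000)
    hours, remainder = divmod(total_ms, 3_600_000)
    minutes, remainder = divmod(remainder, 60_000)
    secs, millis = divmod(remainder, 1000)
    return f"{hours:02d}:{minutes:02d}:{secs:02d},{millis:03d}"
```

Wrapping on word boundaries keeps each line readable, but it does not enforce a minimum display duration; that still has to be requested separately.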
MKV and AVI files occasionally have audio tracks encoded in less common formats. If a file fails to process, converting it to mp4 first using any free converter usually resolves the issue immediately.
Use Cases
Content creators use this skill to make Reels, TikToks, and YouTube Shorts more engaging. Captioned videos tend to hold viewer attention longer, especially when autoplay is muted.
Educators and e-learning developers rely on the auto-subtitle-generator-free skill to meet accessibility requirements without purchasing expensive transcription software. Captioned lecture recordings and training videos are often legally required in academic and corporate settings.
Small business owners producing product demos, testimonial videos, or explainer content can caption their entire video library without outsourcing to a transcription service. Marketers repurposing long-form webinar content into short clips also benefit, since each clip gets its own accurate subtitle track automatically.
Creators producing content in languages other than English can use the skill to generate subtitles that are later translated, which makes international distribution more straightforward.
Common Workflows
The most common workflow starts with uploading a video file — mp4, mov, avi, webm, or mkv — and choosing whether you want subtitles burned into the video itself or exported as a standalone SRT or VTT file. Burned-in captions are ideal for social media platforms that don't support external subtitle tracks, while exported files work better for YouTube, Vimeo, or video players that handle them natively.
Another popular workflow involves batch processing: uploading multiple short clips from the same series and generating consistent subtitle styling across all of them. This is common for podcast highlight reels, course module videos, and social content repurposing.
For longer recordings like webinars or interviews, users often request subtitle segmentation adjustments — breaking lines at natural pauses rather than at fixed word counts. You can specify preferences like maximum characters per line or maximum seconds per caption block, and the skill will apply those rules throughout the entire video.
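Getting Started
Welcome! Ready to add subtitles to your video for free? Upload your file and tell me what you need, whether that is embedding captions directly in the video or exporting a standalone subtitle file, and the auto-subtitle-generator-free skill will handle the rest.
Try saying:
- "Generate subtitles for this MP4 interview video and export them as an SRT file."
- "Add burned-in captions to my YouTube tutorial. The speaker has a slight accent, so please be accurate."
- "Create subtitles for this webinar recording and split the lines so no caption stays on screen longer than 3 seconds."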
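First-Time Connection
When a user first opens this skill, connect to the NemoVideo backend automatically. Briefly let them know (e.g. "Connecting to NemoVideo...").
Authentication: Check whether NEMO_TOKEN is set in the environment. If it is, skip to step 3.
1. Client identifier: Look for ~/.config/nemovideo/client_id. If it does not exist, generate a UUID and save it to that file. The file stores only a UUID used for rate-limit tracking.
2. Obtain a free token: Send a POST request to https://mega-api-prod.nemovideo.ai/api/auth/anonymous-token with the X-Client-Id header set to your identifier. The data.token in the response is your NEMO_TOKEN (100 free credits, valid for 7 days).
3. Create a session: Send a POST request to https://mega-api-prod.nemovideo.ai/api/tasks/me/with-session/nemo_agent with Authorization: Bearer <token>, Content-Type: application/json, and body {"task_name":"project","language":"<detected>"}. Store the returned session_id for all subsequent requests.
Keep setup communication brief. Don't show raw API responses or token values to the user.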
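Captions for Every Video, Zero Cost
Most subtitle tools hide their best features behind a paywall. This skill turns that around. The auto-subtitle-generator-free skill listens to the speech in your video, converts it into precisely timed captions, and gives you ready-to-use subtitles, with no credit card required at any point.
Upload your video and the skill gets to work: detecting speech, splitting it into readable segments, and aligning each line to the correct timestamp. The result is a clean subtitle track that matches the natural rhythm of conversation, rather than bulky blocks of text that vanish before you can read them.
This is especially useful for content creators who publish across multiple platforms, educators building accessible course materials, and small business owners who cannot afford expensive captioning services. Whether your video is a 30-second Instagram clip or a 90-minute webinar recording, the skill handles the heavy lifting so you can focus on the content itself.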
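Routing Your Caption Requests
When you submit a video for automatic subtitling, ClawHub parses your input (a direct upload, a URL, or a file path) and routes the transcription job to the appropriate NemoVideo processing endpoint based on file format, language hint, and subtitle output preference.
| User says... | Action | Skip SSE? |
|---|---|---|
| "export" / "导出" / "download" / "send me the video" | → §3.5 Export | ✅ |
| "credits" / "积分" / "balance" / "余额" | → §3.3 Credits | ✅ |
| "status" / "状态" / "show tracks" | → §3.4 State | ✅ |
| "upload" / "上传" / user sends file | → §3.2 Upload | ✅ |
| Everything else (generate, edit, add background music, etc.) | → §3.1 SSE | ❌ |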
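NemoVideo API Reference
The NemoVideo backend uses speech-to-text inference to generate frame-accurate SRT, VTT, or burned-in subtitle tracks from your video's audio stream. Subtitle timing, punctuation restoration, and multi-language detection are all handled server-side, so no local processing power is needed.
Skill attribution, read from this file's YAML frontmatter at runtime:
- X-Skill-Source: auto-subtitle-generator-free
- X-Skill-Version: from frontmatter version
- X-Skill-Platform: detect from install path (~/.clawhub/ → clawhub, ~/.cursor/skills/ → cursor, else unknown)
All requests must include: Authorization: Bearer <NEMO_TOKEN>, X-Skill-Source, X-Skill-Version, X-Skill-Platform. Missing attribution headers will cause export to fail with a 402 error.
API base: https://mega-api-prod.nemovideo.ai
Create session: POST /api/tasks/me/with-session/nemo_agent with body {"task_name":"project","language":"<lang>"}; returns task_id, session_id. After creating a session, give the user a link: https://nemovideo.com/workspace/claim?token=$TOKEN&task=<task_id>&session=<session_id>&skill_name=auto-subtitle-generator-free&skill_version=1.0.0&skill_source=
Send message (SSE): POST /run_sse with body {"app_name":"nemo_agent","user_id":"me","session_id":"<sid>","new_message":{"parts":[{"text":"<msg>"}]}} and Accept: text/event-stream. Max timeout: 15 minutes.
Upload: POST /api/upload-video/nemo_agent/me/<sid>, either as a file (multipart -F "files=@/path") or as a URL with body {"urls":["<url>"],"source_type":"url"}
Credits: GET /api/credits/balance/simple; returns available, frozen, total
Session state: GET /api/state/nemo_agent/me/<sid>/latest; key fields: data.state.draft, data.state.video_infos, data.state.generated_media
Export (free, no credits): POST /api/render/proxy/lambda with body {"id":"render_<ts>","sessionId":"<sid>","draft":<json>,"output":{"format":"mp4","quality":"high"}}. Poll GET /api/render/proxy/lambda/<id> every 30 seconds until status = completed. The download URL is at output.url.
Supported formats: mp4, mov, avi, webm, mkv, jpg, png, gif, webp, mp3, wav, m4a, aac.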
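SSE Event Handling
| Event | Action |
|---|---|
| Text response | Apply GUI translation (§4), present to user |
| Tool call/result | Process internally, don't forward |
| heartbeat / empty data: | Keep waiting. Every 2 min: "⏳ Still working..." |
| Stream closes | Process final response |
About 30% of editing operations return no text in the SSE stream. When this happens, poll the session state to verify the edit was applied, then summarize the changes for the user.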
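Backend Response Translation
The backend assumes a GUI exists. Translate its instructions into API actions:
| Backend says | You do |
|---|---|
| "click [button]" / "点击" | Execute via API |
| "open [panel]" / "打开" | Query session state |
| "drag/drop" / "拖拽" | Send edit via SSE |
| "preview in timeline" | Show track summary |
| "Export button" / "导出" | Execute export workflow |
Draft field mapping: t=tracks, tt=track type (0=video, 1=audio, 7=text), sg=segments, d=duration (ms), m=metadata.
Timeline (3 tracks):
1. Video: city timelapse (0-10s)
2. BGM: Lo-fi (0-10s, 35%)
3. Title: "Urban Dreams" (0-3s)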
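Error Handling
| Code | Meaning | Action |
|---|---|---|
| 0 | Success | Continue |
| 1001 | Bad/expired token | Re-authenticate via anonymous-token (tokens expire after 7 days) |
| 1002 | Session not found | New session §3.0 |
| 2001 | No credits | Anonymous users: show registration URL with ?bind=<id> (get <id> from the create-session or state response when needed). Registered users: "Top up at nemovideo.ai" |
| 4001 | Unsupported file | Show supported formats |
| 4002 | File too large | Suggest compress/trim |
| 400 | Missing X-Client-Id | Generate a client ID and retry (see §1) |
| 402 | Free plan export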