Getting Started
Welcome! I'm here to help you generate accurate, readable captions for your videos — completely free. Share your video transcript or describe your content, and let's create subtitles that make your videos more accessible and engaging. Ready to caption? Drop your content below!
Try saying:
- "Generate SRT captions for this 3-minute cooking tutorial transcript with natural timing breaks"
- "Create Instagram Reel captions in bold text style for a 60-second motivational speech"
- "Translate and caption this English product demo video into Spanish subtitles"
First-Time Connection
When a user first opens this skill, connect to the processing backend automatically. Briefly let them know (e.g. "Setting up...").
Authentication: Check if NEMO_TOKEN is set in the environment. If it is, skip to step 3.
1. Client identifier: Look for ~/.config/video-caption-generator-free/client_id. If missing, generate a UUID and save it there. This file stores only a UUID for rate-limit tracking.
2. Obtain a free token: POST to https://mega-api-prod.nemovideo.ai/api/auth/anonymous-token with header X-Client-Id set to your identifier. The response data.token is your NEMO_TOKEN (100 free credits, valid 7 days).
3. Create a session: POST to https://mega-api-prod.nemovideo.ai/api/tasks/me/with-session/nemo_agent with Authorization: Bearer <token>, Content-Type: application/json, and body {"task_name":"project","language":"<detected>"}. Store the returned session_id for all subsequent requests.
Keep setup communication brief. Don't display raw API responses or token values to the user.
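The three setup steps above can be sketched as follows. This is a minimal illustration, not the skill's actual implementation: the function names are hypothetical, and the requests are only constructed, not sent, so the response handling (reading data.token and session_id) is left out.

```python
import json
import uuid
from pathlib import Path
from urllib.request import Request

API_BASE = "https://mega-api-prod.nemovideo.ai"


def load_or_create_client_id(path: Path) -> str:
    """Step 1: return the stored client UUID, creating the file on first use."""
    if path.exists():
        stored = path.read_text(encoding="utf-8").strip()
        if stored:
            return stored
    path.parent.mkdir(parents=True, exist_ok=True)
    client_id = str(uuid.uuid4())
    path.write_text(client_id, encoding="utf-8")
    return client_id


def build_token_request(client_id: str) -> Request:
    """Step 2: anonymous-token request carrying the X-Client-Id header."""
    return Request(
        f"{API_BASE}/api/auth/anonymous-token",
        headers={"X-Client-Id": client_id},
        method="POST",
    )


def build_session_request(token: str, language: str) -> Request:
    """Step 3: create-session request with the documented JSON body."""
    body = json.dumps({"task_name": "project", "language": language})
    return Request(
        f"{API_BASE}/api/tasks/me/with-session/nemo_agent",
        data=body.encode("utf-8"),
        headers={
            "Authorization": f"Bearer {token}",
            "Content-Type": "application/json",
        },
        method="POST",
    )
```

Passing each request to urllib.request.urlopen would perform the call; keeping construction separate from sending makes it easier to avoid logging token values.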
Turn Any Video Into Captioned Content Instantly
Getting captions onto your videos used to mean either paying for a transcription service or spending hours manually typing out every word. With this skill, that process collapses into seconds. Simply provide your video content or transcript, describe your needs, and receive clean, formatted captions ready to embed or upload.
This skill is built for real-world video workflows — whether you're posting short-form content on TikTok and Instagram Reels, publishing long-form tutorials on YouTube, or preparing training materials for a corporate team. Captions aren't just an accessibility feature anymore; they're essential for silent autoplay environments, non-native speakers, and search engine discoverability.
The video-caption-generator-free approach here focuses on readability and timing accuracy. Captions are broken into natural reading chunks, avoiding the wall-of-text problem that makes auto-generated subtitles hard to follow. You get output that looks like it was crafted by a human editor — without the invoice.
Routing Your Caption Requests
When you submit a video URL or upload a file, ClawHub parses your input and routes it to the appropriate transcription pipeline based on format, language hint, and caption style preference.
| User says... | Action | Skip SSE? |
|---|---|---|
| "export" / "导出" / "download" / "send me the video" | → §3.5 Export | ✅ |
| "credits" / "积分" / "balance" / "余额" | → §3.3 Credits | ✅ |
| "status" / "状态" / "show tracks" | → §3.4 State | ✅ |
| "upload" / "上传" / user sends file | → §3.2 Upload | ✅ |
| Everything else (generate, edit, add BGM…) | → §3.1 SSE | ❌ |
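One way to read the routing table is as an ordered keyword lookup. The sketch below assumes case-insensitive substring matching, which the table does not specify; the function and action names are hypothetical.

```python
# Keyword groups in table order; the first group that matches wins.
ROUTES = [
    (("export", "导出", "download", "send me the video"), "export"),
    (("credits", "积分", "balance", "余额"), "credits"),
    (("status", "状态", "show tracks"), "state"),
    (("upload", "上传"), "upload"),
]


def route(message: str, has_file: bool = False) -> tuple[str, bool]:
    """Return (action, skip_sse) for a user message."""
    if has_file:
        # "user sends file" routes to upload regardless of the text.
        return "upload", True
    text = message.lower()
    for keywords, action in ROUTES:
        if any(keyword in text for keyword in keywords):
            return action, True
    # Everything else (generate, edit, add BGM...) goes through SSE.
    return "sse", False
```

For example, route("Please export my video") yields ("export", True), while route("add BGM") falls through to ("sse", False).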
Cloud Transcription API Reference
The backend leverages a distributed speech-to-text engine that processes audio streams frame-by-frame, syncing word-level timestamps to generate SRT, VTT, or plain-text caption outputs. Chunked encoding handles long-form video files without timeout failures, keeping subtitle accuracy high across multi-hour content.
Skill attribution — read from this file's YAML frontmatter at runtime:
- X-Skill-Source: video-caption-generator-free
- X-Skill-Version: from frontmatter version
- X-Skill-Platform: detect from install path (~/.clawhub/ → clawhub, ~/.cursor/skills/ → cursor, else unknown)
All requests must include: Authorization: Bearer <NEMO_TOKEN>, X-Skill-Source, X-Skill-Version, X-Skill-Platform. Missing attribution headers will cause export to fail with 402.
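A small helper can assemble the required headers. This sketch assumes the install path is available as a string and that X-Skill-Source carries the skill name used elsewhere in this document; both function names are hypothetical.

```python
def detect_platform(install_path: str) -> str:
    """Map the install path to a platform label, defaulting to "unknown"."""
    normalized = install_path.replace("\\", "/")
    if "/.clawhub/" in normalized:
        return "clawhub"
    if "/.cursor/skills/" in normalized:
        return "cursor"
    return "unknown"


def build_headers(token: str, version: str, install_path: str) -> dict[str, str]:
    """Return the authorization and attribution headers sent with every request."""
    return {
        "Authorization": f"Bearer {token}",
        # Assumption: the source value is the skill name.
        "X-Skill-Source": "video-caption-generator-free",
        "X-Skill-Version": version,
        "X-Skill-Platform": detect_platform(install_path),
    }
```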
API base: https://mega-api-prod.nemovideo.ai
Create session: POST /api/tasks/me/with-session/nemo_agent with body {"task_name":"project","language":"<lang>"}. Returns task_id, session_id.
Send message (SSE): POST /run_sse with body {"app_name":"nemo_agent","user_id":"me","session_id":"<sid>","new_message":{"parts":[{"text":"<msg>"}]}} and header Accept: text/event-stream. Max timeout: 15 minutes.
Upload: POST /api/upload-video/nemo_agent/me/<sid>. For a file, send multipart with -F "files=@/path"; for a URL, send body {"urls":["<url>"],"source_type":"url"}.
Credits: GET /api/credits/balance/simple. Returns available, frozen, total.
Session state: GET /api/state/nemo_agent/me/<sid>/latest. Key fields: data.state.draft, data.state.video_infos, data.state.generated_media.
Export (free, no credits): POST /api/render/proxy/lambda with body {"id":"render_<ts>","sessionId":"<sid>","draft":<json>,"output":{"format":"mp4","quality":"high"}}. Poll GET /api/render/proxy/lambda/<id> every 30s until status = completed. The download URL is at output.url.
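The export step is a submit-then-poll pattern. The sketch below assumes that status and output.url sit at the top level of the poll response and that a caller supplies the HTTP call as fetch_status; the max_polls cap and both function names are hypothetical additions, not documented behavior.

```python
import time
from typing import Callable


def build_render_body(session_id: str, draft: dict, timestamp: int) -> dict:
    """Build the export request body in the documented shape."""
    return {
        "id": f"render_{timestamp}",
        "sessionId": session_id,
        "draft": draft,
        "output": {"format": "mp4", "quality": "high"},
    }


def poll_render(
    fetch_status: Callable[[str], dict],
    render_id: str,
    interval_s: float = 30.0,
    max_polls: int = 40,
    sleep: Callable[[float], None] = time.sleep,
) -> str:
    """Poll until the render reports completed, then return the download URL."""
    for _ in range(max_polls):
        reply = fetch_status(render_id)
        if reply.get("status") == "completed":
            return reply["output"]["url"]
        sleep(interval_s)
    raise TimeoutError(f"render {render_id} did not complete after {max_polls} polls")
```

Injecting fetch_status and sleep keeps the loop testable without network access or real delays.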
Supported formats: mp4, mov, avi, webm, mkv, jpg, png, gif, webp, mp3, wav, m4a, aac.
SSE Event Handling
| Event | Action |
|---|---|
| Text response | Apply GUI translation (§4), present to user |
| Tool call/result | Process internally, don't forward |
| heartbeat / empty data: | Keep waiting. Every 2 min: "⏳ Still working..." |
| Stream closes | Process final response |
~30% of editing operations return no text in the SSE stream. When this happens: poll session state to verify the edit was applied, then summarize changes to the user.
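The event handling above can be sketched as a loop over raw SSE lines. The payload shape is not documented here, so the extractor below is purely hypothetical: it assumes each data: payload is JSON with an optional "text" key. The second return value signals the no-text case, where session state should be polled instead.

```python
import json
from typing import Callable, Iterable, Optional


def collect_text(
    lines: Iterable[str],
    extract_text: Callable[[str], Optional[str]],
) -> tuple[str, bool]:
    """Return (text, needs_state_poll) once the stream has closed."""
    chunks = []
    for raw in lines:
        line = raw.strip()
        if not line.startswith("data:"):
            continue  # blank separators, comments, event-name lines
        payload = line[len("data:"):].strip()
        if not payload:
            continue  # empty data: line, keep waiting
        text = extract_text(payload)
        if text:
            chunks.append(text)
    return "".join(chunks), not chunks


def extract_text_field(payload: str) -> Optional[str]:
    """Hypothetical extractor: JSON payload with an optional "text" key."""
    try:
        event = json.loads(payload)
    except json.JSONDecodeError:
        return None  # heartbeats or other non-JSON payloads carry no text
    return event.get("text") if isinstance(event, dict) else None
```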
Backend Response Translation
The backend assumes a GUI exists. Translate these into API actions:
| Backend says | You do |
|---|---|
| "click [button]" / "点击" | Execute via API |
| "open [panel]" / "打开" | Query session state |
| "drag/drop" / "拖拽" | Send edit via SSE |
| "preview in timeline" | Show track summary |
| "Export button" / "导出" | Execute export workflow |
Draft field mapping: t=tracks, tt=track type (0=video, 1=audio, 7=text), sg=segments, d=duration(ms), m=metadata.
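Example track summary:
Timeline (3 tracks):
1. Video: city timelapse (0-10s)
2. BGM: Lo-fi (0-10s, 35%)
3. Title: "City Dreams" (0-3s)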
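The draft field mapping can be turned into a track summary along these lines. The exact draft layout is an assumption: the sketch supposes a top-level t list whose entries carry tt and an sg list of segments with d in milliseconds. Start offsets are not covered by the mapping, so only total durations are shown.

```python
TRACK_TYPES = {0: "video", 1: "audio", 7: "text"}


def summarize_draft(draft: dict) -> list[str]:
    """Build a short per-track summary from the abbreviated draft keys."""
    tracks = draft.get("t", [])
    lines = [f"Timeline ({len(tracks)} tracks):"]
    for index, track in enumerate(tracks, start=1):
        kind = TRACK_TYPES.get(track.get("tt"), "unknown")
        # Sum segment durations (ms) and report seconds.
        total_ms = sum(segment.get("d", 0) for segment in track.get("sg", []))
        lines.append(f"{index}. {kind}: {total_ms / 1000:g}s")
    return lines
```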
Error Handling
| Code | Meaning | Action |
|---|---|---|
| 0 | Success | Continue |
| 1001 | Bad/expired token | Re-auth via anonymous-token (tokens expire after 7 days) |
| 1002 | Session not found | New session §3.0 |
| 2001 | No credits | Anonymous: show registration URL with ?bind=<id> (get <id> from create-session or state response when needed). Registered: "Top up credits in your account" |
| 4001 | Unsupported file | Show supported formats |
| 4002 | File too large | Suggest compress/trim |
| 400 | Missing X-Client-Id | Generate Client-Id and retry (see §1) |
| 402 | Free plan export blocked | Subscription tier issue, NOT credits. "Register or upgrade your plan to unlock export." |
| 429 | Rate limit (1 token/client/7 days) | Retry in 30s once |
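The error table maps naturally onto a lookup. In this sketch the action labels are hypothetical names for the behaviors listed above, and the fallback for unlisted codes is an assumption.

```python
ERROR_ACTIONS = {
    0: "continue",
    1001: "reauthenticate",            # bad or expired token
    1002: "create_session",            # session not found
    2001: "show_credit_options",       # no credits
    4001: "show_supported_formats",    # unsupported file
    4002: "suggest_compress_or_trim",  # file too large
    400: "generate_client_id_and_retry",
    402: "prompt_register_or_upgrade",  # plan tier issue, not credits
    429: "retry_once_after_30s",
}


def action_for(code: int) -> str:
    """Return the action label for a code, or a generic fallback."""
    return ERROR_ACTIONS.get(code, "report_unknown_error")
```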
FAQ — Video Caption Generator Free
What video formats can I generate captions for? This skill works with any video content where you can provide a transcript or audio description. You can also paste dialogue directly if you don't have a transcript file handy.
Can I get captions in languages other than English? Yes. Specify your target language when submitting your request, and captions will be generated in that language. Translation from English source content is also supported.
What caption file formats are supported? You can request output in SRT, VTT, or plain text formats depending on where you plan to upload. SRT works with YouTube, Vimeo, and most video editors. VTT is preferred for web-based players.
Is there a length limit for videos? There's no strict limit, but for very long videos (over 30 minutes), breaking your transcript into sections produces cleaner, more manageable caption files and makes editing easier afterward.
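For reference, the SRT and VTT formats mentioned above differ mainly in the millisecond separator and the WEBVTT header. A minimal sketch, assuming cues are (start_ms, end_ms, text) tuples; the function names are illustrative only.

```python
def format_timestamp(ms: int, separator: str) -> str:
    """Format milliseconds as HH:MM:SS<sep>mmm ("," for SRT, "." for VTT)."""
    hours, remainder = divmod(ms, 3_600_000)
    minutes, remainder = divmod(remainder, 60_000)
    seconds, millis = divmod(remainder, 1000)
    return f"{hours:02d}:{minutes:02d}:{seconds:02d}{separator}{millis:03d}"


def to_srt(cues: list[tuple[int, int, str]]) -> str:
    """Render numbered SRT blocks separated by blank lines."""
    blocks = []
    for index, (start, end, text) in enumerate(cues, start=1):
        timing = f"{format_timestamp(start, ',')} --> {format_timestamp(end, ',')}"
        blocks.append(f"{index}\n{timing}\n{text}\n")
    return "\n".join(blocks)


def to_vtt(cues: list[tuple[int, int, str]]) -> str:
    """Render a WebVTT file: header line, then unnumbered cues."""
    blocks = ["WEBVTT\n"]
    for start, end, text in cues:
        timing = f"{format_timestamp(start, '.')} --> {format_timestamp(end, '.')}"
        blocks.append(f"{timing}\n{text}\n")
    return "\n".join(blocks)
```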
Best Practices for Getting the Most Accurate Captions
The quality of your captions depends heavily on what you feed into the generator. If you're working from a transcript, clean it up first — remove filler words like 'um' and 'uh' unless your audience expects verbatim accuracy, such as in legal or educational contexts.
For timing accuracy, break your input into timestamped segments whenever possible. Even rough timestamps (every 30 seconds) help the skill distribute caption blocks more naturally across your video's runtime.
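To illustrate why rough timestamps help, here is one simple way caption chunks could be spread across a timestamped segment: each chunk gets a share of the window proportional to its character count. This heuristic is an assumption chosen for illustration, not necessarily how the skill allocates timing.

```python
def distribute(chunks: list[str], start_ms: int, end_ms: int) -> list[tuple[int, int, str]]:
    """Split [start_ms, end_ms) among chunks in proportion to their length."""
    chunks = [chunk for chunk in chunks if chunk]
    total_chars = sum(len(chunk) for chunk in chunks)
    if total_chars == 0 or end_ms <= start_ms:
        return []
    span = end_ms - start_ms
    cues = []
    consumed = 0
    cursor = start_ms
    for chunk in chunks:
        consumed += len(chunk)
        # Integer arithmetic on the running total makes the last cue end at end_ms.
        cue_end = start_ms + span * consumed // total_chars
        cues.append((cursor, cue_end, chunk))
        cursor = cue_end
    return cues
```

With a timestamp every 30 seconds, each call only has to place a few chunks inside a short window, so estimation errors cannot accumulate across the whole video.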
Keep individual caption lines under 42 characters when targeting mobile viewers; this prevents text overflow on smaller screens. For short-form platforms like YouTube Shorts or TikTok, even shorter lines of 20-30 characters are easier to read.
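The line-length guideline can be applied with Python's standard textwrap module. A minimal sketch; wrap_caption is a hypothetical helper name, and 42 is the limit suggested above.

```python
import textwrap


def wrap_caption(text: str, max_chars: int = 42) -> list[str]:
    """Break caption text at word boundaries into lines of at most max_chars."""
    # break_long_words=False keeps words intact; a single word longer than
    # max_chars will therefore still exceed the limit.
    return textwrap.wrap(text, width=max_chars, break_long_words=False)
```

Calling wrap_caption(text, 25) would target the shorter lines suggested for short-form platforms.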
Always review captions for proper nouns, brand names, and technical terminology. Auto-generated captions frequently mishear specialized vocabulary, so a quick manual pass after generation ensures your final output is professional and accurate.