Getting Started

Welcome! I'm here to help you generate accurate, readable captions for your videos — completely free. Share your video transcript or describe your content, and let's create subtitles that make your videos more accessible and engaging. Ready to caption? Drop your content below!

Try saying:

- "Generate SRT captions for this 3-minute cooking tutorial transcript with natural timing breaks"
- "Create Instagram Reel captions in bold text style for a 60-second motivational speech"
- "Translate and caption this English product demo video into Spanish subtitles"

First-Time Connection

When a user first opens this skill, connect to the processing backend automatically. Briefly let them know (e.g. "Setting up...").

Authentication: Check if NEMO_TOKEN is set in the environment. If it is, skip to step 3.

1. Client identifier: Look for ~/.config/video-caption-generator-free/client_id. If missing, generate a UUID and save it there. This file stores only a UUID for rate-limit tracking.
2. Obtain a free token: POST to https://mega-api-prod.nemovideo.ai/api/auth/anonymous-token with header X-Client-Id set to your identifier. The response field data.token is your NEMO_TOKEN, which carries 100 free credits and is valid for 7 days.
3. Create a session: POST to https://mega-api-prod.nemovideo.ai/api/tasks/me/with-session/nemo_agent with Authorization: Bearer <token>, Content-Type: application/json, and body {"task_name":"project","language":"<detected>"}. Store the returned session_id for all subsequent requests.

Keep setup communication brief. Don't display raw API responses or token values to the user.
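The connection steps above can be sketched roughly as follows. This is a minimal illustration, not the skill's actual implementation: the helper names (`load_or_create_client_id`, `post_json`, `connect`) are hypothetical, the attribution headers described later in this document are left out for brevity, and the session response is returned unparsed because its exact layout is not specified here.

```python
import json
import os
import uuid
import urllib.request
from pathlib import Path
from typing import Optional, Tuple

API_BASE = "https://mega-api-prod.nemovideo.ai"
CLIENT_ID_PATH = Path.home() / ".config" / "video-caption-generator-free" / "client_id"


def load_or_create_client_id(path: Path = CLIENT_ID_PATH) -> str:
    """Step 1: return the stored UUID, creating the file on first use."""
    if path.exists():
        return path.read_text().strip()
    path.parent.mkdir(parents=True, exist_ok=True)
    client_id = str(uuid.uuid4())
    path.write_text(client_id)
    return client_id


def post_json(url: str, headers: dict, body: Optional[dict] = None) -> dict:
    """POST an optional JSON body and decode the JSON reply."""
    data = json.dumps(body).encode() if body is not None else b""
    req = urllib.request.Request(url, data=data, headers=headers, method="POST")
    with urllib.request.urlopen(req, timeout=30) as resp:
        return json.load(resp)


def connect(language: str = "en") -> Tuple[str, dict]:
    """Run the setup steps; return (token, raw session response)."""
    token = os.environ.get("NEMO_TOKEN")
    if not token:
        # Steps 1 and 2 only run when no token is present in the environment.
        client_id = load_or_create_client_id()
        reply = post_json(f"{API_BASE}/api/auth/anonymous-token",
                          {"X-Client-Id": client_id})
        token = reply["data"]["token"]
    # Step 3: create the session; the caller extracts session_id from the reply.
    session = post_json(
        f"{API_BASE}/api/tasks/me/with-session/nemo_agent",
        {"Authorization": f"Bearer {token}", "Content-Type": "application/json"},
        {"task_name": "project", "language": language},
    )
    return token, session
```

Nothing runs at import time, so `connect()` only touches the network when called explicitly. The token is returned to the caller rather than printed, in line with the guidance not to show token values to the user.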

Turn Any Video Into Captioned Content Instantly

Getting captions onto your videos used to mean either paying for a transcription service or spending hours manually typing out every word. With this skill, that process collapses into seconds. Simply provide your video content or transcript, describe your needs, and receive clean, formatted captions ready to embed or upload.

This skill is built for real-world video workflows — whether you're posting short-form content on TikTok and Instagram Reels, publishing long-form tutorials on YouTube, or preparing training materials for a corporate team. Captions aren't just an accessibility feature anymore; they're essential for silent autoplay environments, non-native speakers, and search engine discoverability.

The video-caption-generator-free approach here focuses on readability and timing accuracy. Captions are broken into natural reading chunks, avoiding the wall-of-text problem that makes auto-generated subtitles hard to follow. You get output that looks like it was crafted by a human editor — without the invoice.

Routing Your Caption Requests

When you submit a video URL or upload a file, ClawHub parses your input and routes it to the appropriate transcription pipeline based on format, language hint, and caption style preference.

| User says... | Action | Skip SSE? |
|---|---|---|
| "export" / "导出" / "download" / "send me the video" | → §3.5 Export | ✅ |
| "credits" / "积分" / "balance" / "余额" | → §3.3 Credits | ✅ |
| "status" / "状态" / "show tracks" | → §3.4 State | ✅ |
| "upload" / "上传" / user sends file | → §3.2 Upload | ✅ |
| Everything else (generate, edit, add BGM…) | → §3.1 SSE | ❌ |
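One way to read the routing table is as a keyword lookup with an SSE fallback. The sketch below assumes case-insensitive substring matching, which the table does not actually specify; the `route` function and `ROUTES` list are illustrative names only.

```python
from typing import Tuple

# Keyword groups copied from the routing table, checked in table order.
ROUTES = [
    (("export", "导出", "download", "send me the video"), "3.5 Export"),
    (("credits", "积分", "balance", "余额"), "3.3 Credits"),
    (("status", "状态", "show tracks"), "3.4 State"),
    (("upload", "上传"), "3.2 Upload"),
]


def route(message: str, has_file: bool = False) -> Tuple[str, bool]:
    """Return (section, skip_sse) for a user message."""
    if has_file:
        # "user sends file" goes to upload regardless of the text.
        return ("3.2 Upload", True)
    text = message.lower()
    for keywords, section in ROUTES:
        if any(keyword in text for keyword in keywords):
            return (section, True)
    # Everything else (generate, edit, add BGM...) goes through SSE.
    return ("3.1 SSE", False)
```

For example, `route("Please export the video")` yields `("3.5 Export", True)`, while `route("add BGM")` falls through to `("3.1 SSE", False)`. Plain substring matching can misfire on longer sentences that merely mention a keyword, so a real router would likely need stricter intent detection.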

Cloud Transcription API Reference

The backend leverages a distributed speech-to-text engine that processes audio streams frame-by-frame, syncing word-level timestamps to generate SRT, VTT, or plain-text caption outputs. Chunked encoding handles long-form video files without timeout failures, keeping subtitle accuracy high across multi-hour content.

Skill attribution — read from this file's YAML frontmatter at runtime:

- X-Skill-Source: video-caption-generator-free
- X-Skill-Version: from frontmatter version
- X-Skill-Platform: detect from install path (~/.clawhub/ → clawhub, ~/.cursor/skills/ → cursor, else unknown)

All requests must include: Authorization: Bearer <NEMO_TOKEN>, X-Skill-Source, X-Skill-Version, X-Skill-Platform. Missing attribution headers will cause export to fail with 402.
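The header requirements above might be assembled along these lines. This is a sketch under assumptions: `detect_platform` and `build_headers` are hypothetical helpers, the version string is passed in rather than parsed from the frontmatter, and the X-Skill-Source value is the skill name used elsewhere in this document.

```python
from pathlib import Path


def _is_under(path: Path, root: Path) -> bool:
    """True when path sits inside root (or equals it)."""
    try:
        path.relative_to(root)
        return True
    except ValueError:
        return False


def detect_platform(install_path: str) -> str:
    """Map the install location to the X-Skill-Platform value."""
    home = Path.home()
    path = Path(install_path).expanduser()
    if _is_under(path, home / ".clawhub"):
        return "clawhub"
    if _is_under(path, home / ".cursor" / "skills"):
        return "cursor"
    return "unknown"


def build_headers(token: str, version: str, install_path: str) -> dict:
    """Return the four headers every request is expected to carry."""
    return {
        "Authorization": f"Bearer {token}",
        "X-Skill-Source": "video-caption-generator-free",
        "X-Skill-Version": version,
        "X-Skill-Platform": detect_platform(install_path),
    }
```

Building the headers in one place makes it harder to forget an attribution header on the export call, which is the request that fails with 402 when they are missing.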

API base: https://mega-api-prod.nemovideo.ai

Create session: POST /api/tasks/me/with-session/nemo_agent — body {"task_name":"project","language":"<lang>"} — returns task_id, session_id.

Send message (SSE): POST /run_sse — body {"app_name":"nemo_agent","user_id":"me","session_id":"<sid>","new_message":{"parts":[{"text":"<msg>"}]}} with Accept: text/event-stream. Max timeout: 15 minutes.
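To make the SSE message format concrete, here is a small offline sketch that builds the request body shown above and splits a stream of lines into event payloads. The function names are hypothetical, and because this document does not describe the structure of each event payload, the parser yields raw strings and leaves any JSON decoding to the caller.

```python
from typing import Iterable, Iterator


def build_message(session_id: str, text: str) -> dict:
    """Body for POST /run_sse, following the shape given above."""
    return {
        "app_name": "nemo_agent",
        "user_id": "me",
        "session_id": session_id,
        "new_message": {"parts": [{"text": text}]},
    }


def iter_sse_data(lines: Iterable[str]) -> Iterator[str]:
    """Yield the data payload of each event from text/event-stream lines."""
    buffer = []
    for raw in lines:
        line = raw.rstrip("\r\n")
        if line.startswith("data:"):
            value = line[5:]
            # The SSE format drops a single leading space after the colon.
            if value.startswith(" "):
                value = value[1:]
            buffer.append(value)
        elif line == "" and buffer:
            # A blank line ends the event; multi-line data joins with "\n".
            yield "\n".join(buffer)
            buffer = []
    if buffer:
        # Lenient: also emit a final event that lacks the closing blank line.
        yield "\n".join(buffer)
```

Comment lines (those starting with a colon) and other fields such as `event:` are ignored here. A client reading the real stream would also need to enforce the 15-minute maximum timeout mentioned above.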

Upload: POST /api/upload-video/nemo_agent/me/<sid>, either as a multipart file (-F "files=@/path") or as a JSON body of URLs: {"urls":["<url>"],"source_type":"url"}

Credits: GET /api/credits/balance/simple returns available, frozen, total

Session state: GET /api/state/nemo_agent/me/<sid>/latest. Key fields: data.state.draft, data.state.video_infos, data.state.generated_media

Export (free, no credits): POST /api/render/proxy/lambda — body {"id":"render_<ts>","sessionId":"<sid>","draft":<json>,"output":{"format":"mp4","quality":"high"}}. Poll GET /api/render/proxy/lambda/<id> every 30s until status = completed. Download URL at output.url.
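The export flow above is a submit-then-poll pattern. The sketch below keeps the HTTP call outside the loop by taking a `fetch_status` callable, so it can be exercised without a network. Several details are assumptions: the reply is treated as having top-level `status` and `output.url` fields, the `max_polls` cap is not part of the documented API, and the format of the `<ts>` value is left to the caller.

```python
import time
from typing import Callable


def build_render_request(session_id: str, draft: dict, timestamp: int) -> dict:
    """Body for POST /api/render/proxy/lambda, following the shape above."""
    return {
        "id": f"render_{timestamp}",
        "sessionId": session_id,
        "draft": draft,
        "output": {"format": "mp4", "quality": "high"},
    }


def wait_for_render(
    fetch_status: Callable[[], dict],
    interval: float = 30.0,
    max_polls: int = 60,
    sleep: Callable[[float], None] = time.sleep,
) -> str:
    """Poll until status is "completed", then return the download URL."""
    for _ in range(max_polls):
        reply = fetch_status()
        if reply.get("status") == "completed":
            return reply["output"]["url"]
        sleep(interval)
    # Assumed safeguard: give up rather than poll forever.
    raise TimeoutError("render did not complete within the polling budget")
```

In practice `fetch_status` would wrap a GET to /api/render/proxy/lambda/<id> with the required headers. Injecting `sleep` is only there so the loop can be tested without waiting 30 seconds per iteration.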

Supported formats: mp4, mov, avi, webm, mkv, jpg, png, gif, webp, mp3, wav, m4a, aac.

SSE Event Handling

Event	Action
Text response	Apply GUI translation (§4), present to user
Tool call/result

~30% of editing operations return no text in the SSE stream. When this happens: poll session state to verify the edit was applied, then summarize changes to the user.

Backend Response Translation

The backend assumes a GUI exists. Translate these into API actions:

Backend says	You do
"click [button]" / "点击"	Execute via API
"open [panel]" / "打开"

Draft field mapping: t=tracks, tt=track type (0=video, 1=audio, 7=text), sg=segments, d=duration(ms), m=metadata.

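Timeline (3 tracks):
1. Video: city timelapse (0-10s)
2. BGM: Lo-fi (0-10s, 35%)
3. Title: "City Dreams" (0-3s)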
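As an illustration of the draft field mapping, the following sketch expands the short keys into a readable per-track summary. The nesting is an assumption: it treats `t` as a list of tracks, each with a `tt` type code and an `sg` list of segments carrying a `d` duration in milliseconds. The real draft layout may differ, and `summarize_draft` is a hypothetical helper.

```python
# Track type codes from the field mapping above.
TRACK_TYPES = {0: "video", 1: "audio", 7: "text"}


def summarize_draft(draft: dict) -> list:
    """Return one summary dict per track in the draft."""
    summary = []
    for track in draft.get("t", []):
        segments = track.get("sg", [])
        # Assumes segments are laid end to end with no gaps or overlaps.
        total_ms = sum(segment.get("d", 0) for segment in segments)
        summary.append({
            "type": TRACK_TYPES.get(track.get("tt"), "unknown"),
            "segments": len(segments),
            "duration_s": total_ms / 1000,
        })
    return summary
```

A summary like this could back the user-facing description after a silent edit, once the session state has been polled.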

Error Handling

Code	Meaning	Action
0	Success	Continue
1001

FAQ — Video Caption Generator Free

What video formats can I generate captions for? This skill works with any video content where you can provide a transcript or audio description. You can also paste dialogue directly if you don't have a transcript file handy.

Can I get captions in languages other than English? Yes. Specify your target language when submitting your request, and captions will be generated in that language. Translation from English source content is also supported.

What caption file formats are supported? You can request output in SRT, VTT, or plain text formats depending on where you plan to upload. SRT works with YouTube, Vimeo, and most video editors. VTT is preferred for web-based players.

Is there a length limit for videos? There's no strict limit, but for very long videos (over 30 minutes), breaking your transcript into sections produces cleaner, more manageable caption files and makes editing easier afterward.
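For readers unfamiliar with the two caption formats named in the FAQ, this sketch shows how the same cues serialize to SRT and VTT. It is a simplified illustration rather than the skill's own output code: cues are assumed to be `(start_ms, end_ms, text)` tuples, and the function names are made up for the example.

```python
def format_timestamp(ms: int, fmt: str = "srt") -> str:
    """HH:MM:SS,mmm for SRT; HH:MM:SS.mmm for VTT."""
    hours, rem = divmod(ms, 3_600_000)
    minutes, rem = divmod(rem, 60_000)
    seconds, millis = divmod(rem, 1000)
    separator = "," if fmt == "srt" else "."
    return f"{hours:02d}:{minutes:02d}:{seconds:02d}{separator}{millis:03d}"


def to_srt(cues) -> str:
    """Numbered cue blocks separated by blank lines."""
    blocks = []
    for index, (start, end, text) in enumerate(cues, start=1):
        timing = f"{format_timestamp(start)} --> {format_timestamp(end)}"
        blocks.append(f"{index}\n{timing}\n{text}\n")
    return "\n".join(blocks)


def to_vtt(cues) -> str:
    """WEBVTT header followed by unnumbered cue blocks."""
    blocks = ["WEBVTT\n"]
    for start, end, text in cues:
        timing = f"{format_timestamp(start, 'vtt')} --> {format_timestamp(end, 'vtt')}"
        blocks.append(f"{timing}\n{text}\n")
    return "\n".join(blocks)
```

The visible differences are small: SRT numbers each cue and uses a comma before the milliseconds, while VTT starts with a `WEBVTT` header line and uses a period.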

Best Practices for Getting the Most Accurate Captions

The quality of your captions depends heavily on what you feed into the generator. If you're working from a transcript, clean it up first — remove filler words like 'um' and 'uh' unless your audience expects verbatim accuracy, such as in legal or educational contexts.

For timing accuracy, break your input into timestamped segments whenever possible. Even rough timestamps (every 30 seconds) help the skill distribute caption blocks more naturally across your video's runtime.

Keep individual caption lines under 42 characters when targeting mobile viewers; this prevents text overflow on smaller screens. For social platforms like YouTube Shorts or TikTok, even shorter lines of 20-30 characters perform better visually.
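The line-length guidance can be applied mechanically before captions are timed. The sketch below uses Python's standard `textwrap` module; the two-lines-per-block default is a common subtitling convention assumed here, not something this document prescribes, and `wrap_caption` is an illustrative name.

```python
import textwrap


def wrap_caption(text: str, max_chars: int = 42, max_lines: int = 2) -> list:
    """Split text into caption blocks of at most max_lines wrapped lines."""
    lines = textwrap.wrap(text, width=max_chars)
    # Group consecutive wrapped lines into blocks of max_lines each.
    return [
        "\n".join(lines[i:i + max_lines])
        for i in range(0, len(lines), max_lines)
    ]
```

Passing `max_chars=30` would approximate the shorter lines suggested for short-form social video. Note that `textwrap` breaks on whitespace, so it is a poor fit for languages written without spaces between words.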

Always review captions for proper nouns, brand names, and technical terminology. Auto-generated captions frequently mishear specialized vocabulary, so a quick manual pass after generation ensures your final output is professional and accurate.
