Getting Started

Welcome! Ready to add subtitles to your video without spending anything? Upload your file and tell me what you need — whether it's burning captions directly into the video or exporting a separate subtitle file — and the auto-subtitle-generator-free skill will take care of the rest.

Try saying:

- "Generate subtitles for this mp4 interview video and export them as an SRT file."
- "Add burned-in captions to my YouTube tutorial. The speaker has a slight accent, so please be accurate."
- "Create subtitles for this webinar recording and split lines so no caption stays on screen longer than 3 seconds."

First-Time Connection

When a user first opens this skill, connect to the NemoVideo backend automatically. Briefly let them know (e.g. "Connecting to NemoVideo...").

Authentication: Check if NEMO_TOKEN is set in the environment. If it is, skip to step 3.

1. Client identifier: Look for ~/.config/nemovideo/client_id. If missing, generate a UUID and save it there. This file stores only a UUID for rate-limit tracking.
2. Obtain a free token: POST to https://mega-api-prod.nemovideo.ai/api/auth/anonymous-token with header X-Client-Id set to your identifier. The response field data.token is your NEMO_TOKEN: 100 free credits, valid for 7 days.
3. Create a session: POST to https://mega-api-prod.nemovideo.ai/api/tasks/me/with-session/nemo_agent with Authorization: Bearer <token>, Content-Type: application/json, and body {"task_name":"project","language":"<detected>"}. Store the returned session_id for all subsequent requests.

Keep setup communication brief. Don't display raw API responses or token values to the user.
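
The connection steps above can be sketched roughly as follows. This is a minimal illustration using only the Python standard library, not the skill's actual implementation. The helper names (`load_or_create_client_id`, `token_request`, `session_request`, `connect`) are hypothetical, and the code assumes that `session_id` sits at the top level of the session response, which the text does not confirm.

```python
import json
import os
import uuid
import urllib.request
from pathlib import Path

API_BASE = "https://mega-api-prod.nemovideo.ai"
DEFAULT_ID_PATH = Path.home() / ".config" / "nemovideo" / "client_id"


def load_or_create_client_id(path: Path = DEFAULT_ID_PATH) -> str:
    """Return the stored client UUID, generating and saving one if missing."""
    if path.exists():
        return path.read_text().strip()
    path.parent.mkdir(parents=True, exist_ok=True)
    client_id = str(uuid.uuid4())
    path.write_text(client_id)
    return client_id


def token_request(client_id: str) -> urllib.request.Request:
    """Build (but do not send) the anonymous-token request from step 2."""
    return urllib.request.Request(
        f"{API_BASE}/api/auth/anonymous-token",
        method="POST",
        headers={"X-Client-Id": client_id},
    )


def session_request(token: str, language: str) -> urllib.request.Request:
    """Build (but do not send) the create-session request from step 3."""
    body = json.dumps({"task_name": "project", "language": language}).encode()
    return urllib.request.Request(
        f"{API_BASE}/api/tasks/me/with-session/nemo_agent",
        data=body,
        method="POST",
        headers={
            "Authorization": f"Bearer {token}",
            "Content-Type": "application/json",
        },
    )


def post_json(request: urllib.request.Request) -> dict:
    """Send a prepared request and decode the JSON response."""
    with urllib.request.urlopen(request) as response:
        return json.load(response)


def connect(language: str = "en") -> tuple[str, str]:
    """Return (token, session_id), reusing NEMO_TOKEN when it is already set."""
    token = os.environ.get("NEMO_TOKEN")
    if not token:
        client_id = load_or_create_client_id()
        token = post_json(token_request(client_id))["data"]["token"]
    session = post_json(session_request(token, language))
    # Assumption: session_id is a top-level field of the response.
    return token, session["session_id"]
```

Keeping request construction separate from sending makes the headers and bodies easy to inspect without touching the network, and nothing here prints the token.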

Captions for Every Video, Zero Cost Attached

Most subtitle tools hide their best features behind a paywall. This skill flips that entirely. The auto-subtitle-generator-free skill listens to the spoken content in your video, converts it into precisely timed captions, and gives you ready-to-use subtitles — all without asking for a credit card.

Upload your video, and the skill gets to work detecting speech, segmenting it into readable chunks, and aligning each line to the correct timestamp. The result is a clean subtitle track that matches the natural rhythm of conversation, not awkward walls of text that disappear before you can read them.

This is especially useful for content creators who publish across multiple platforms, educators building accessible course materials, and small business owners who can't justify expensive captioning services. Whether your video is a 30-second Instagram clip or a 90-minute webinar recording, the skill handles the heavy lifting so you can focus on the content itself.

Routing Your Caption Requests

When you submit a video for auto subtitling, ClawHub parses your input—whether it's a direct upload, a URL, or a file path—and routes the transcription job to the appropriate NemoVideo processing endpoint based on file format, language hint, and subtitle output preference.

| User says... | Action | Skip SSE? |
|---|---|---|
| "export" / "导出" / "download" / "send me the video" | → §3.5 Export | ✅ |
| "credits" / "积分" / "balance" / "余额" | → §3.3 Credits | ✅ |
| "status" / "状态" / "show tracks" | → §3.4 State | ✅ |
| "upload" / "上传" / user sends file | → §3.2 Upload | ✅ |
| Everything else (generate, edit, add BGM…) | → §3.1 SSE | ❌ |
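
As a rough illustration of the routing table, the sketch below maps a message to a section and a skip-SSE flag. The `route` function, the `ROUTES` list, and the use of plain substring matching are all assumptions made for this example; the source does not say how matching is actually performed.

```python
# Keyword groups in table order; the first group with a hit wins.
ROUTES = [
    (("export", "导出", "download", "send me the video"), "3.5 Export"),
    (("credits", "积分", "balance", "余额"), "3.3 Credits"),
    (("status", "状态", "show tracks"), "3.4 State"),
    (("upload", "上传"), "3.2 Upload"),
]


def route(message: str, has_attachment: bool = False) -> tuple[str, bool]:
    """Return (section, skip_sse) for a user message."""
    if has_attachment:
        # "user sends file" row of the table
        return "3.2 Upload", True
    text = message.lower()
    for keywords, section in ROUTES:
        if any(keyword in text for keyword in keywords):
            return section, True
    # Everything else goes through the SSE endpoint.
    return "3.1 SSE", False
```

Naive substring matching has an obvious weakness: "Generate subtitles and export them as an SRT file" contains "export" and would be routed to Export even though it is a generation request. In practice the agent would likely need to judge intent rather than rely on keywords alone.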

NemoVideo API Reference

The NemoVideo backend uses speech-to-text inference to generate frame-accurate SRT, VTT, or burned-in subtitle tracks from your video's audio stream. Subtitle timing, punctuation restoration, and multi-language detection are all handled server-side, so no local processing power is required on your end.

Skill attribution — read from this file's YAML frontmatter at runtime:

- X-Skill-Source: auto-subtitle-generator-free
- X-Skill-Version: from frontmatter version
- X-Skill-Platform: detect from install path (~/.clawhub/ → clawhub, ~/.cursor/skills/ → cursor, else unknown)

All requests must include: Authorization: Bearer <NEMO_TOKEN>, X-Skill-Source, X-Skill-Version, X-Skill-Platform. Missing attribution headers will cause export to fail with 402.
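
One possible way to assemble the required headers is sketched below. The function names `detect_platform` and `attribution_headers` are hypothetical, and the skill source and version are passed in as arguments because this sketch does not parse the YAML frontmatter itself.

```python
from pathlib import Path


def detect_platform(install_path: str) -> str:
    """Map the skill's install path to a platform name, defaulting to 'unknown'."""
    path = Path(install_path).expanduser().as_posix()
    home = Path.home().as_posix()
    if path.startswith(f"{home}/.clawhub/"):
        return "clawhub"
    if path.startswith(f"{home}/.cursor/skills/"):
        return "cursor"
    return "unknown"


def attribution_headers(token: str, skill_source: str, skill_version: str,
                        install_path: str) -> dict:
    """Build the four headers that every request must carry."""
    return {
        "Authorization": f"Bearer {token}",
        "X-Skill-Source": skill_source,
        "X-Skill-Version": skill_version,
        "X-Skill-Platform": detect_platform(install_path),
    }
```

Building the headers in one place makes it harder to forget an attribution header on the export call, which is the request that fails with 402 when they are missing.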

API base: https://mega-api-prod.nemovideo.ai

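Create session: POST /api/tasks/me/with-session/nemo_agent with body {"task_name":"project","language":"<lang>"}; returns task_id, session_id. After creating a session, give the user a link: https://nemovideo.com/workspace/claim?token=$TOKEN&task=<task_id>&session=<session_id>&skill_name=auto-subtitle-generator-free&skill_version=1.0.0&skill_source=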

Send message (SSE): POST /run_sse — body {"app_name":"nemo_agent","user_id":"me","session_id":"<sid>","new_message":{"parts":[{"text":"<msg>"}]}} with Accept: text/event-stream. Max timeout: 15 minutes.
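
The request body and the stream it returns could be handled along these lines. This is a sketch under assumptions: `build_sse_payload` and `iter_sse_data` are hypothetical helpers, the parser only handles the standard `data:` field of the Server-Sent Events format, and the content of each event from this backend is not documented here, so it is left as an opaque string.

```python
def build_sse_payload(session_id: str, message: str) -> dict:
    """Build the JSON body for POST /run_sse."""
    return {
        "app_name": "nemo_agent",
        "user_id": "me",
        "session_id": session_id,
        "new_message": {"parts": [{"text": message}]},
    }


def iter_sse_data(lines):
    """Yield the data payload of each event from an iterable of decoded lines.

    Consecutive data: lines are joined with newlines; a blank line ends the
    event. Comment lines (starting with ':') and other fields are ignored.
    """
    buffer = []
    for raw in lines:
        line = raw.rstrip("\r\n")
        if line.startswith("data:"):
            value = line[5:]
            if value.startswith(" "):
                value = value[1:]  # strip the single optional leading space
            buffer.append(value)
        elif line == "" and buffer:
            yield "\n".join(buffer)
            buffer = []
    if buffer:
        # Lenient choice: also emit a trailing event with no closing blank line.
        yield "\n".join(buffer)
```

When sending the request, remember the Accept: text/event-stream header and a client timeout that allows for the 15-minute maximum.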

Upload: POST /api/upload-video/nemo_agent/me/<sid>. File: multipart -F "files=@/path"; or URL: {"urls":["<url>"],"source_type":"url"}

Credits: GET /api/credits/balance/simple. Returns available, frozen, total

Session state: GET /api/state/nemo_agent/me/<sid>/latest. Key fields: data.state.draft, data.state.video_infos, data.state.generated_media

Export (free, no credits): POST /api/render/proxy/lambda — body {"id":"render_<ts>","sessionId":"<sid>","draft":<json>,"output":{"format":"mp4","quality":"high"}}. Poll GET /api/render/proxy/lambda/<id> every 30s until status = completed. Download URL at output.url.
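
The export-and-poll flow described above might look like this in outline. `build_render_body` and `wait_for_render` are hypothetical names, `max_polls` is an assumed safety limit that the source does not specify, and the code assumes `status` and `output.url` are top-level fields of the polling response. The HTTP call is injected as `fetch_status` so the loop can be exercised without a network.

```python
import time
from typing import Callable


def build_render_body(session_id: str, draft: dict, timestamp: int) -> dict:
    """Build the JSON body for POST /api/render/proxy/lambda."""
    return {
        "id": f"render_{timestamp}",
        "sessionId": session_id,
        "draft": draft,
        "output": {"format": "mp4", "quality": "high"},
    }


def wait_for_render(fetch_status: Callable[[], dict], interval: float = 30.0,
                    max_polls: int = 60, sleep=time.sleep) -> str:
    """Poll until the render reports status 'completed' and return output.url.

    fetch_status should wrap GET /api/render/proxy/lambda/<id> and return the
    decoded JSON. max_polls is an assumed limit, not a documented one.
    """
    for attempt in range(max_polls):
        job = fetch_status()
        if job.get("status") == "completed":
            return job["output"]["url"]
        if attempt < max_polls - 1:
            sleep(interval)
    raise TimeoutError("render did not complete in time")
```

The source documents only the completed status, so this sketch does not try to recognize failure states; it simply gives up after `max_polls` attempts.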

Supported formats: mp4, mov, avi, webm, mkv, jpg, png, gif, webp, mp3, wav, m4a, aac.

SSE Event Handling

| Event | Action |
|---|---|
| Text response | Apply GUI translation (§4), present to user |
| Tool call/result | |

~30% of editing operations return no text in the SSE stream. When this happens: poll session state to verify the edit was applied, then summarize changes to the user.
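
The silent-edit fallback can be sketched as a small decision function. `resolve_turn` is a hypothetical helper, and comparing track counts is only one simple way to summarize a change; `fetch_draft` is assumed to wrap the session-state endpoint and return `data.state.draft`.

```python
from typing import Callable, Iterable


def resolve_turn(text_parts: Iterable[str], draft_before: dict,
                 fetch_draft: Callable[[], dict]) -> str:
    """Return the streamed text, or a state-based summary if the stream was silent."""
    text = "".join(text_parts).strip()
    if text:
        return text
    # No text came back: check the session draft instead of assuming success.
    draft_after = fetch_draft()
    if draft_after == draft_before:
        return "No change detected in the session draft."
    before = len(draft_before.get("t", []))
    after = len(draft_after.get("t", []))
    return f"Edit applied: tracks {before} -> {after}."
```

Taking a snapshot of the draft before sending the message is what makes the comparison possible, so the state fetch has to happen on both sides of the SSE call.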

Backend Response Translation

The backend assumes a GUI exists. Translate these into API actions:

| Backend says | You do |
|---|---|
| "click [button]" / "点击" | Execute via API |
| "open [panel]" / "打开" | |

Draft field mapping: t=tracks, tt=track type (0=video, 1=audio, 7=text), sg=segments, d=duration(ms), m=metadata.

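Timeline (3 tracks):
1. Video: city timelapse (0-10s)
2. BGM: Lo-fi (0-10s, 35%)
3. Title: "Urban Dreams" (0-3s)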
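
The draft field mapping can be used to produce a short timeline summary for the user. The sketch below is illustrative only: `summarize_draft` is a hypothetical helper, and the nesting it assumes (a draft dict whose `t` list holds tracks, each with a `tt` type and an `sg` list of segments carrying `d` in milliseconds) is inferred from the field names rather than confirmed by the source.

```python
TRACK_TYPES = {0: "video", 1: "audio", 7: "text"}


def summarize_draft(draft: dict) -> str:
    """Summarize a draft as one line per track with its total duration."""
    tracks = draft.get("t", [])
    lines = [f"Timeline ({len(tracks)} tracks):"]
    for index, track in enumerate(tracks, start=1):
        kind = TRACK_TYPES.get(track.get("tt"), "unknown")
        # d is in milliseconds; sum all segments on the track.
        total_ms = sum(segment.get("d", 0) for segment in track.get("sg", []))
        lines.append(f"{index}. {kind}: {total_ms / 1000:g}s")
    return "\n".join(lines)
```

Track types outside the three documented values are reported as "unknown" rather than guessed.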

Error Handling

| Code | Meaning | Action |
|---|---|---|
| 0 | Success | Continue |
| 1001 | | |

Troubleshooting

If captions appear out of sync with the audio, the most common cause is background music or overlapping speech competing with the main voice track. Try uploading a version of the video with reduced background noise, or specify which speaker the subtitles should follow if there are multiple voices.

For videos with heavy accents, technical jargon, or industry-specific terminology, subtitle accuracy improves when you provide a brief context note — for example, mentioning that the video covers medical procedures or software engineering topics helps the skill prioritize correct terminology.

If subtitle lines feel too long or flash by too quickly, request a specific maximum character count per line (typically 42 characters works well for most screens) or a minimum display duration per caption. These small adjustments make a significant difference in readability across different screen sizes.
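
To make the two readability rules concrete, here is a small local sketch of what a 42-character line limit and a minimum display duration mean. The skill applies such rules server-side; `wrap_caption` and `enforce_min_duration` are hypothetical helpers written only to illustrate the constraints, and the 1-second default is an arbitrary example value.

```python
import textwrap


def wrap_caption(text: str, max_chars: int = 42) -> list[str]:
    """Break caption text into lines of at most max_chars, splitting on spaces."""
    return textwrap.wrap(text, width=max_chars)


def enforce_min_duration(start: float, end: float,
                         min_seconds: float = 1.0) -> tuple[float, float]:
    """Extend a caption's end time so it stays on screen at least min_seconds."""
    return start, max(end, start + min_seconds)
```

Extending an end time can make a caption overlap the next one, so a fuller implementation would also need to cap the new end at the following caption's start.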

MKV and AVI files occasionally have audio tracks encoded in less common formats. If a file fails to process, converting it to mp4 first using any free converter usually resolves the issue immediately.

Use Cases

Content creators use this skill to make Reels, TikToks, and YouTube Shorts more engaging, since captions let viewers follow along even when autoplay is muted.

Educators and e-learning developers rely on the auto-subtitle-generator-free skill to meet accessibility requirements without purchasing expensive transcription software. Captioned lecture recordings and training videos are often legally required in academic and corporate settings.

Small business owners producing product demos, testimonial videos, or explainer content can caption their entire video library without outsourcing to a transcription service. Marketers repurposing long-form webinar content into short clips also benefit, since each clip gets its own accurate subtitle track automatically.

Creators producing content in languages other than English will find the skill useful for generating subtitles that can later be translated, which makes international distribution more straightforward.

Common Workflows

The most common workflow starts with uploading a video file — mp4, mov, avi, webm, or mkv — and choosing whether you want subtitles burned into the video itself or exported as a standalone SRT or VTT file. Burned-in captions are ideal for social media platforms that don't support external subtitle tracks, while exported files work better for YouTube, Vimeo, or video players that handle them natively.
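
For readers unfamiliar with the standalone format, the sketch below shows what an SRT file contains: numbered cues, a `HH:MM:SS,mmm --> HH:MM:SS,mmm` timing line, and the caption text, separated by blank lines. The skill produces these files itself; `srt_timestamp` and `to_srt` are hypothetical helpers used only to illustrate the layout.

```python
def srt_timestamp(seconds: float) -> str:
    """Format a time in seconds as an SRT timestamp (HH:MM:SS,mmm)."""
    total_ms = round(seconds * 1000)
    hours, rest = divmod(total_ms, 3_600_000)
    minutes, rest = divmod(rest, 60_000)
    secs, millis = divmod(rest, 1000)
    return f"{hours:02d}:{minutes:02d}:{secs:02d},{millis:03d}"


def to_srt(cues) -> str:
    """Render (start_seconds, end_seconds, text) tuples as SRT text."""
    blocks = []
    for number, (start, end, text) in enumerate(cues, start=1):
        timing = f"{srt_timestamp(start)} --> {srt_timestamp(end)}"
        blocks.append(f"{number}\n{timing}\n{text}\n")
    # Cues are separated by a blank line.
    return "\n".join(blocks)
```

VTT files follow a similar cue structure but use a period instead of a comma before the milliseconds and begin with a `WEBVTT` header line.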

Another popular workflow involves batch processing: uploading multiple short clips from the same series and generating consistent subtitle styling across all of them. This is common for podcast highlight reels, course module videos, and social content repurposing.

For longer recordings like webinars or interviews, users often request subtitle segmentation adjustments — breaking lines at natural pauses rather than at fixed word counts. You can specify preferences like maximum characters per line or maximum seconds per caption block, and the skill will apply those rules throughout the entire video.
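
A "maximum seconds per caption block" rule can be illustrated with a deliberately simple splitter. This sketch divides an over-long cue into equal time slices and spreads the words across them; it does not detect natural pauses, which is what the skill aims for. `split_cue` is a hypothetical helper.

```python
import math


def split_cue(start: float, end: float, text: str,
              max_seconds: float = 3.0) -> list[tuple[float, float, str]]:
    """Split one cue into consecutive blocks no longer than max_seconds."""
    words = text.split()
    parts = max(1, math.ceil((end - start) / max_seconds))
    step = (end - start) / parts
    blocks = []
    for i in range(parts):
        # Integer slicing spreads the words as evenly as possible across parts.
        chunk = words[i * len(words) // parts:(i + 1) * len(words) // parts]
        if chunk:  # skip slices that received no words
            blocks.append((start + i * step, start + (i + 1) * step, " ".join(chunk)))
    return blocks
```

For example, a 7-second cue with a 3-second limit becomes three blocks of about 2.33 seconds each. Splitting by time alone can cut a phrase in awkward places, which is why pause-aware segmentation gives better results.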

