Captions That Actually Sync — Without the Subscription Tax
Most captioning tools either charge per audio minute or bury the free tier under frustrating limits. subtitle-generator-free flips that model entirely. You bring the video, the skill handles transcription, alignment, and subtitle file generation — all inside a single conversational exchange on ClawHub.
The workflow is built around dialogue, not dashboards. Instead of navigating a multi-step editor, you describe what you need — language preference, caption style, timestamp granularity — and the OpenClaw agent interprets those instructions and coordinates the underlying transcription and alignment pipeline on your behalf. The agent acts as an intelligent intermediary, translating your plain-language requests into precise processing instructions without exposing any technical complexity.
Under the hood, the AI backend performs forced phoneme alignment after transcription, which means captions don't just appear at roughly the right time — they snap to the exact syllable boundary where each word is spoken. This matters most for fast speakers, accented audio, and overlapping dialogue. Whether your source file is a polished mp4 interview, a raw mov screen recording, an avi lecture capture, a webm podcast clip, or an mkv film excerpt, the skill ingests it without conversion steps on your end.
Environment Variables
| Variable | Required | Default |
|---|---|---|
| NEMO_TOKEN | No | Auto-generated (100 free credits, expires in 7 days, revocable via Settings → API Tokens) |
| NEMO_API_URL | No | https://mega-api-prod.nemovideo.ai |
| NEMO_WEB_URL | No | https://nemovideo.com |
| NEMO_CLIENT_ID | No | Auto-generated UUID, persisted to ~/.config/nemovideo/client_id (UUID only, no secrets) |
| SKILL_SOURCE | No | Auto-detected from install path, fallback unknown |
If NEMO_TOKEN is not set, get one (requires X-Client-Id header):
# Generate or read persisted Client-Id
CLIENT_ID="${NEMO_CLIENT_ID:-$(cat ~/.config/nemovideo/client_id 2>/dev/null)}"
if [ -z "$CLIENT_ID" ]; then
  # No persisted ID yet: generate one and save it for later sessions
  CLIENT_ID=$(uuidgen 2>/dev/null || echo "client-$(date +%s)-$RANDOM")
  mkdir -p ~/.config/nemovideo && echo "$CLIENT_ID" > ~/.config/nemovideo/client_id
fi
curl -s -X POST "https://mega-api-prod.nemovideo.ai/api/auth/anonymous-token" -H "X-Client-Id: $CLIENT_ID"
# → {"code":0,"data":{"token":"nmv_usr_xxx","credits":100,...}}
Save token as NEMO_TOKEN and CLIENT_ID as NEMO_CLIENT_ID. Anonymous: 1 token per client per 7 days; the token expires in 7 days and can be revoked at any time via Settings → API Tokens on nemovideo.com. If your token expires, request a new one with the same X-Client-Id.
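Saving the token can be done with plain text processing. The sketch below assumes the response arrives as single-line JSON in the shape of the sample above and that the token contains no escaped quotes. extract_token is a hypothetical helper name, not part of the skill, and a real JSON parser would be more robust.

```shell
# Hypothetical helper: pull data.token out of a single-line JSON response.
# Assumes the token value contains no escaped quotes, as in the sample above.
extract_token() {
  printf '%s\n' "$1" | sed -n 's/.*"token":"\([^"]*\)".*/\1/p'
}

# Example using the documented sample shape (no network call is made here).
RESPONSE='{"code":0,"data":{"token":"nmv_usr_xxx","credits":100}}'
NEMO_TOKEN=$(extract_token "$RESPONSE")
export NEMO_TOKEN
```

If the response carries no token field (for example on an error code), the helper prints nothing, so an empty NEMO_TOKEN can be treated as a failed request.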
Local persistence: This skill writes ~/.config/nemovideo/client_id to persist the Client-Id across sessions. This avoids generating a new ID on every request, which would hit the per-IP rate limit quickly (default 10 tokens per 7 days per IP). The file contains only a UUID — no credentials are stored locally.
2. Routing Table — Matching Requests to Endpoints
Use the table below to determine which endpoint handles each type of incoming request.
| User says... | Action | Skip SSE |
|---|---|---|
| "export" / "导出" / "download" / "send me the video" | → §3.5 Export | ✅ |
| "credits" / "积分" / "balance" / "余额" | → §3.3 Credits | ✅ |
| "status" / "状态" / "show tracks" | → §3.4 State | ✅ |
| "upload" / "上传" / user sends file | → §3.2 Upload | ✅ |
| Everything else (generate, edit, add BGM…) | → §3.1 SSE | ❌ |
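As an illustration, the routing table could be approximated with a shell case statement. This is a minimal sketch: route_request is a hypothetical name, and it matches keywords as substrings, which is looser than an agent's own intent detection (a message such as "add a status caption" would be routed to §3.4). File attachments, which carry no keyword, would need to be detected separately.

```shell
# Hypothetical router: prints the section that should handle a user message.
# Patterns mirror the routing table; the first matching row wins.
route_request() {
  msg=$(printf '%s' "$1" | tr '[:upper:]' '[:lower:]')
  case $msg in
    *export*|*导出*|*download*|*"send me the video"*) echo "3.5" ;;
    *credits*|*积分*|*balance*|*余额*) echo "3.3" ;;
    *status*|*状态*|*"show tracks"*) echo "3.4" ;;
    *upload*|*上传*) echo "3.2" ;;
    *) echo "3.1" ;;   # everything else goes through SSE
  esac
}
```

For example, `route_request "Please export my clip"` selects the export flow, while `route_request "add BGM"` falls through to the SSE path.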
3. Primary Operation Flows
$API = ${NEMO_API_URL:-https://mega-api-prod.nemovideo.ai}, $TOKEN = ${NEMO_TOKEN}, $WEB = ${NEMO_WEB_URL:-https://nemovideo.com}.
Skill attribution variables — do NOT hardcode:
- $SKILL_NAME = this file's YAML frontmatter name
- $SKILL_VERSION = this file's YAML frontmatter version
- $SKILL_SOURCE = detect at runtime using the first match:
  1. Env SKILL_SOURCE if set
  2. If this file's path contains a known platform directory (e.g. ~/.clawhub/ → clawhub, ~/.cursor/skills/ → cursor)
  3. Fallback: unknown
CRITICAL: ALL API requests (including render/export/upload/state/credits) MUST include these headers. Missing them will cause export to fail with 402.
X-Skill-Source: $SKILL_NAME
X-Skill-Version: $SKILL_VERSION
X-Skill-Platform: $SKILL_SOURCE
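The runtime detection order for $SKILL_SOURCE described above could be sketched like this. detect_skill_source is a hypothetical helper, and only the two platform directories named in this document are recognized. The skill's actual detection logic is not shown in the source.

```shell
# Hypothetical helper: resolve the X-Skill-Platform value for a skill file path.
# Order follows the list above: env override, known directory, then "unknown".
detect_skill_source() {
  skill_path=$1
  if [ -n "${SKILL_SOURCE:-}" ]; then
    echo "$SKILL_SOURCE"
    return 0
  fi
  case $skill_path in
    */.clawhub/*)       echo "clawhub" ;;
    */.cursor/skills/*) echo "cursor" ;;
    *)                  echo "unknown" ;;
  esac
}
```

Calling it with the absolute path of this file, for example `detect_skill_source "$HOME/.clawhub/skills/subtitle-generator-free/SKILL.md"` (an illustrative path), yields the value to send in X-Skill-Platform.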
3.0 Initialize a Session
curl -s -X POST "https://mega-api-prod.nemovideo.ai/api/tasks/me/with-session/nemo_agent" \
-H "Authorization: Bearer $TOKEN" -H "Content-Type: application/json" \
-H "X-Skill-Source: $SKILL_NAME" -H "X-Skill-Version: $SKILL_VERSION" -H "X-Skill-Platform: $SKILL_SOURCE" \
-d '{"task_name":"project","language":"<lang>"}'
# → {"code":0,"data":{"task_id":"...","session_id":"..."}}
Before any other operation, a session must be established to obtain a valid session identifier. All subsequent requests must reference this session ID.
Open in browser: After creating a session, give the user a link to view/edit the task in NemoVideo:
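$WEB/workspace/claim?task={task_id}&session={session_id}&skill_name=$SKILL_NAME&skill_version=$SKILL_VERSION&skill_source=$SKILL_SOURCE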
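A small helper can assemble the browser link from the create-session response. This sketch assumes the link takes the form $WEB/workspace/claim with task, session, and skill attribution query parameters. The parameter names skill_name, skill_version, and skill_source are assumptions, build_claim_url is a hypothetical name, and no URL encoding is applied.

```shell
# Hypothetical helper: build the "open in browser" link for a new session.
# Query parameter names are assumed; values are inserted without URL encoding.
build_claim_url() {
  task_id=$1
  session_id=$2
  web=${NEMO_WEB_URL:-https://nemovideo.com}
  printf '%s/workspace/claim?task=%s&session=%s&skill_name=%s&skill_version=%s&skill_source=%s\n' \
    "$web" "$task_id" "$session_id" \
    "${SKILL_NAME:-}" "${SKILL_VERSION:-}" "${SKILL_SOURCE:-unknown}"
}
```

The task_id and session_id arguments come from the data object returned by the session call in §3.0.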
3.1 Deliver Messages Over SSE
curl -s -X POST "https://mega-api-prod.nemovideo.ai/run_sse" \
-H "Authorization: Bearer $TOKEN" -H "Content-Type: application/json" \
-H "Accept: text/event-stream" -H "X-Skill-Source: $SKILL_NAME" -H "X-Skill-Version: $SKILL_VERSION" -H "X-Skill-Platform: $SKILL_SOURCE" --max-time 900 \
-d '{"app_name":"nemo_agent","user_id":"me","session_id":"<sid>","new_message":{"parts":[{"text":"<msg>"}]}}'
All conversational messages are transmitted to the backend through a persistent SSE connection.
SSE Handling
| Event | Action |
|---|---|
| Text response | Apply GUI translation (§4), present to user |
| Tool call/result | Process internally, don't forward |
| heartbeat / empty data: | Keep waiting. Every 2 min: "⏳ Still working..." |
| Stream closes | Process final response |
Typical durations: text 5-15s, video generation 100-300s, editing 10-30s.
Timeout: 10 min heartbeats-only → assume timeout. Never re-send during generation (duplicates + double-charge).
Ignore trailing "I encountered a temporary issue" if prior responses were normal.
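To separate real payloads from keep-alive traffic, the stream can be passed through a small filter. The sketch below assumes heartbeats arrive either as SSE comment lines (starting with a colon) or as empty data: lines. The backend's exact heartbeat format is not documented here, and sse_payloads is a hypothetical name.

```shell
# Hypothetical filter: read an SSE stream on stdin and print only non-empty
# "data:" payloads. Comment lines and empty data lines are dropped.
sse_payloads() {
  while IFS= read -r line || [ -n "$line" ]; do
    line=${line%"$(printf '\r')"}   # tolerate CRLF line endings
    case $line in
      data:*)
        payload=${line#data:}
        payload=${payload# }        # SSE allows one optional leading space
        [ -n "$payload" ] && printf '%s\n' "$payload"
        ;;
    esac
  done
  return 0
}
```

In practice the curl call from §3.1 could be piped into it, with curl's -N flag added so output is not buffered. If the filter prints nothing by the time the stream closes, that corresponds to the silent-response case.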
Silent Response Fallback (CRITICAL)
Roughly 30% of editing operations return no text in the SSE stream. When the stream closes without any assistant text, do not treat this as an error. Instead: (1) wait for the stream to fully close, (2) call the state-query endpoint to retrieve the current project status, (3) surface the updated state to the user, and (4) continue the conversation normally.
Two-stage generation: When a raw video is submitted, the backend automatically runs a two-stage enhancement pipeline: Stage 1 produces the processed video output, and Stage 2 overlays background music and inserts a generated title. Both stages complete before the final result is delivered — no additional prompting is required from the assistant.
3.2 File Upload Handling
File upload: INLINECODE34
URL upload: INLINECODE35
Use me in the path; backend resolves user from token.
Supported: mp4, mov, avi, webm, mkv, jpg, png, gif, webp, mp3, wav, m4a, aac.
The upload endpoint accepts video files as well as any supporting assets the user wishes to include in their project.
3.3 Credit Balance Inquiry
curl -s "https://mega-api-prod.nemovideo.ai/api/credits/balance/simple" -H "Authorization: Bearer $TOKEN" \
-H "X-Skill-Source: $SKILL_NAME" -H "X-Skill-Version: $SKILL_VERSION" -H "X-Skill-Platform: $SKILL_SOURCE"
# → {"code":0,"data":{"available":XXX,"frozen":XX,"total":XXX}}
Query the credits endpoint before beginning any generation task to confirm the user has a sufficient balance.
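The pre-generation balance check could be sketched as follows. It assumes the single-line response shape shown above with an integer available field. has_available_credits and its optional minimum argument are hypothetical, since per-operation costs are not stated in this document.

```shell
# Hypothetical check: succeed when data.available in a balance response is at
# least the requested minimum (default 1). Assumes an integer "available".
has_available_credits() {
  needed=${2:-1}
  available=$(printf '%s\n' "$1" | sed -n 's/.*"available":\([0-9][0-9]*\).*/\1/p')
  [ -n "$available" ] && [ "$available" -ge "$needed" ]
}
```

A caller might run `has_available_credits "$BALANCE_JSON" || echo "Not enough credits"` before dispatching a generation request, where BALANCE_JSON holds the body returned by the balance call.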
3.4 Retrieve Current Project State
curl -s "https://mega-api-prod.nemovideo.ai/api/state/nemo_agent/me/<sid>/latest" -H "Authorization: Bearer $TOKEN" \
-H "X-Skill-Source: $SKILL_NAME" -H "X-Skill-Version: $SKILL_VERSION" -H "X-Skill-Platform: $SKILL_SOURCE"
Use me for user in path; backend resolves from token.
Key fields: data.state.draft, data.state.video_infos, data.state.canvas_config, data.state.generated_media.
Draft field mapping: t=tracks, tt=track type (0=video, 1=audio, 7=text), sg=segments, d=duration(ms), m=metadata.
Draft ready for export when draft.t exists with at least one track with non-empty sg.
Track summary format:
CODEBLOCK6
3.5 Export and Deliver the Final Output
Export does NOT cost credits. Only generation/editing consumes credits.
The export flow is: submit the render job, poll it until the status is completed, take the download URL from the completed job, send the file to the user, and include the task detail link with it.
b) Submit: INLINECODE47
Note: sessionId is camelCase (exception). On failure → new id, retry once.
c) Poll (every 30s, max 10 polls): INLINECODE50
Status at top-level status: pending → processing → completed / failed. Download URL at output.url.
d) Download from output.url → send to user. Fallback: https://mega-api-prod.nemovideo.ai/api/render/proxy/<id>/download.
e) When delivering the video, always also give the task detail link: INLINECODE55
Progress messages: start "⏳ Rendering ~30s" → "⏳ 50%" → "✅ Video ready!" + file + task detail link.
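The polling cadence (every 30s, at most 10 polls) could be sketched as a loop around an injected status command. poll_render is a hypothetical name. The command passed in is expected to print the top-level status value, and how it queries the API is left to the caller because the poll request itself is not reproduced here.

```shell
# Hypothetical poll loop for the export step. The first argument names a
# command that prints the current status (pending, processing, completed,
# failed). Exit code: 0 completed, 1 failed, 2 gave up after max polls.
poll_render() {
  fetch_status=$1
  interval=${2:-30}    # seconds between polls
  max_polls=${3:-10}   # stop after this many polls
  polls=0
  while [ "$polls" -lt "$max_polls" ]; do
    state=$("$fetch_status") || state=""
    polls=$((polls + 1))
    case $state in
      completed) echo "completed"; return 0 ;;
      failed)    echo "failed";    return 1 ;;
    esac
    if [ "$polls" -lt "$max_polls" ]; then
      sleep "$interval"
    fi
  done
  echo "timeout"
  return 2
}
```

Keeping the status lookup separate from the loop makes the timing logic easy to exercise with a stub, and the failed branch is where the "new id, retry once" rule would be applied by the caller.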
3.6 Handling SSE Disconnections
Follow these five steps when an SSE connection drops unexpectedly: (1) Detect the disconnection event and pause any pending UI updates. (2) Wait a minimum of two seconds before attempting to reconnect, to avoid hammering the server. (3) Re-establish the SSE connection using the original session ID. (4) Call the state-query endpoint to reconcile any events that may have been missed during the outage. (5) Resume normal operation and notify the user only if the interruption was longer than a defined threshold.
4. Backend GUI Abstraction Layer
The backend operates under the assumption that a graphical interface is always present, so the assistant must never relay GUI-specific instructions or interface labels to the user.
| Backend says | You do |
|---|---|
| "click [button]" / "点击" | Execute via API |
| "open [panel]" / "打开" | Show state via §3.4 |
| "drag/drop" / "拖拽" | Send edit via SSE |
| "preview in timeline" | Show track summary |
| "Export button" / "导出" | Execute §3.5 |
| "check account/billing" | Check §3.3 |
Keep content descriptions. Strip GUI actions.
5. Recommended Interaction Patterns
• Always confirm a successful session before proceeding with any user request.
• When the user asks what the tool can do, describe capabilities in plain language without referencing internal endpoints or system architecture.
• After every generation or edit operation, proactively present the result or updated state rather than waiting for the user to ask.
• If the user's intent is ambiguous, ask one focused clarifying question before dispatching a request to the backend.
• Keep responses concise; reserve detailed explanations for cases where the user explicitly asks how something works.
6. Known Constraints and Limitations
• Subtitle generation is limited to the languages and locales supported by the underlying transcription engine; unsupported languages will return an error.
• File uploads are subject to a maximum size cap — files exceeding this limit must be compressed or trimmed before submission.
• The two-stage post-processing pipeline cannot be disabled; BGM and title overlays are always applied to raw video submissions.
• SSE connections may time out during long-running jobs; the disconnect-recovery flow must be implemented to ensure reliable delivery.
• Credit balances are read-only through the API; top-ups must be completed through the user's account dashboard.
7. Error Codes and Recovery Guidance
The table below maps each API response code or HTTP status returned by the API to its likely cause and the recommended recovery action.
| Code | Meaning | Action |
|---|---|---|
| 0 | Success | Continue |
| 1001 | Bad/expired token | Re-auth via anonymous-token (tokens expire after 7 days) |
| 1002 | Session not found | New session §3.0 |
| 2001 | No credits | Anonymous: show registration URL with ?bind=<id> (get <id> from create-session or state response when needed). Registered: "Top up at nemovideo.ai" |
| 4001 | Unsupported file | Show supported formats |
| 4002 | File too large | Suggest compress/trim |
| 400 | Missing X-Client-Id | Generate Client-Id and retry (see §1) |
| 402 | Free plan export blocked | Subscription tier issue, NOT credits. "Register at nemovideo.ai to unlock export." |
| 429 | Rate limit (1 token/client/7 days) | Retry in 30s once |
Common: no video → generate first; render fail → retry new id; SSE timeout → §3.6; silent edit → §3.1 fallback.
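For quick dispatch, the table can be condensed into a lookup function. This is only a sketch: recovery_action is a hypothetical name, and the hint strings are shortened paraphrases of the Action column rather than text the API returns.

```shell
# Hypothetical lookup: print a short recovery hint for an API or HTTP code.
# Wording is condensed from the table above.
recovery_action() {
  case $1 in
    0)    echo "continue" ;;
    1001) echo "re-auth via anonymous-token" ;;
    1002) echo "create a new session" ;;
    2001) echo "no credits: show registration or top-up guidance" ;;
    4001) echo "show supported formats" ;;
    4002) echo "suggest compressing or trimming the file" ;;
    400)  echo "generate a Client-Id and retry" ;;
    402)  echo "export blocked on free plan: suggest registering" ;;
    429)  echo "wait 30s, then retry once" ;;
    *)    echo "unknown code: $1" ;;
  esac
}
```

Codes not listed in the table fall through to the default branch, so an unexpected value is surfaced instead of being silently treated as success.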
8. API Version and Required Token Scopes
Always verify the API version header on every request to ensure compatibility with the current backend release. The access token must include all required scopes for the operations being performed — missing scopes will result in 403 responses. Tokens should be refreshed before expiry to prevent mid-session authentication failures.