Turn Spoken Words Into Perfectly Timed Captions
Most subtitle tools hand you a raw transcript and leave the cleanup to you. subtitle-generator-ai works differently — it listens to your video's audio track, maps every word to its precise timestamp, and delivers captions that actually match what's being said, even across varying speech speeds, background noise, or multiple speakers.
What sets this skill apart is its conversational editing layer. Once your subtitles are generated, you don't need to dig through a timeline or edit a raw .srt file manually. You can simply describe what needs fixing — 'the speaker's name is spelled wrong in the intro' or 'shorten the third caption, it feels too long' — and the skill will apply those changes in context, preserving sync throughout.
Behind the scenes, the OpenClaw agent orchestrates the transcription pipeline, coordinates timing alignment, and manages iterative edits based on your natural language instructions. Whether you're captioning a short social clip or a full-length lecture, the agent tracks the full subtitle session so every revision builds on the last — no re-uploading, no starting over.
Environment Variables
| Variable | Required | Default |
|---|---|---|
| NEMO_TOKEN | No | Auto-generated (100 free credits, expires in 7 days, revocable via Settings → API Tokens) |
| NEMO_API_URL | No | https://mega-api-prod.nemovideo.ai |
| NEMO_WEB_URL | No | https://nemovideo.com |
| NEMO_CLIENT_ID | No | Auto-generated UUID, persisted to ~/.config/nemovideo/client_id (UUID only, no secrets) |
| SKILL_SOURCE | No | Auto-detected from install path, fallback unknown |
If NEMO_TOKEN is not set, get one (requires X-Client-Id header):
# Resolve the API base URL (same default as in §3)
API="${NEMO_API_URL:-https://mega-api-prod.nemovideo.ai}"
# Generate or read persisted Client-Id
CLIENT_ID="${NEMO_CLIENT_ID:-$(cat ~/.config/nemovideo/client_id 2>/dev/null)}"
if [ -z "$CLIENT_ID" ]; then
  CLIENT_ID=$(uuidgen 2>/dev/null || echo "client-$(date +%s)-$RANDOM")
  mkdir -p ~/.config/nemovideo && echo "$CLIENT_ID" > ~/.config/nemovideo/client_id
fi
# Request an anonymous token for this Client-Id
curl -s -X POST "$API/api/auth/anonymous-token" -H "X-Client-Id: $CLIENT_ID"
# → {"code":0,"data":{"token":"nmv_usr_xxx","credits":100,...}}
Save token as NEMO_TOKEN and CLIENT_ID as NEMO_CLIENT_ID. Anonymous: 1 token per client per 7 days; token expires in 7 days and can be revoked at any time via Settings → API Tokens on nemovideo.com. If your token expires, request a new one with the same X-Client-Id.
Local persistence: This skill writes ~/.config/nemovideo/client_id to persist the Client-Id across sessions. This avoids generating a new ID on every request, which would hit the per-IP rate limit quickly (default 10 tokens per 7 days per IP). The file contains only a UUID — no credentials are stored locally.
2. Routing Incoming Requests to the Correct Endpoint
Each user action maps to a specific API endpoint as outlined in the table below.
| User says... | Action | Skip SSE? |
|---|---|---|
| "export" / "导出" / "download" / "send me the video" | → §3.5 Export | ✅ |
| "credits" / "积分" / "balance" / "余额" | → §3.3 Credits | ✅ |
| "status" / "状态" / "show tracks" | → §3.4 State | ✅ |
| "upload" / "上传" / user sends file | → §3.2 Upload | ✅ |
| Everything else (generate, edit, add BGM…) | → §3.1 SSE | ❌ |
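The routing table can be sketched as a small shell function. This is only an illustration that assumes case-insensitive substring matching is good enough; the function name `route_request` and the labels it prints are hypothetical and not part of the skill or the API.

```bash
# Map a user message to the section that should handle it.
# Prints one of: export, credits, state, upload, sse (labels are illustrative).
route_request() {
  # Lowercase ASCII letters only, so the Chinese keywords pass through untouched.
  msg=$(printf '%s' "$1" | LC_ALL=C tr 'A-Z' 'a-z')
  case "$msg" in
    *export*|*导出*|*download*|*"send me the video"*) echo export ;;
    *credits*|*积分*|*balance*|*余额*)                 echo credits ;;
    *status*|*状态*|*"show tracks"*)                   echo state ;;
    *upload*|*上传*)                                   echo upload ;;
    *)                                                 echo sse ;;
  esac
}
```

Rows are checked in table order, so a message that mentions both "export" and "upload" goes to export. A received file (as opposed to a text message) would still need to be routed to the upload step by the caller.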
3. Primary Workflow Sequences
$API = ${NEMO_API_URL:-https://mega-api-prod.nemovideo.ai}, $TOKEN = ${NEMO_TOKEN}, $WEB = ${NEMO_WEB_URL:-https://nemovideo.com}.
Skill attribution variables — do NOT hardcode:
- $SKILL_NAME = this file's YAML frontmatter name
- $SKILL_VERSION = this file's YAML frontmatter version
- $SKILL_SOURCE = detect at runtime using the first match:
  1. Env SKILL_SOURCE if set
  2. If this file's path contains a known platform directory (e.g. ~/.clawhub/ → clawhub, ~/.cursor/skills/ → cursor)
  3. Fallback: unknown
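The first-match detection order for the skill source could look roughly like the sketch below. The function name `detect_skill_source` is hypothetical, and only the two platform directories named above are covered; a real install may need more patterns.

```bash
# Print the platform label for a skill file path, following the first-match order:
# env override, then known platform directory, then "unknown".
detect_skill_source() {
  skill_path=$1
  if [ -n "${SKILL_SOURCE:-}" ]; then
    echo "$SKILL_SOURCE"          # 1. explicit environment override
    return 0
  fi
  case "$skill_path" in
    */.clawhub/*)        echo clawhub ;;   # 2. known platform directories
    */.cursor/skills/*)  echo cursor ;;
    *)                   echo unknown ;;   # 3. fallback
  esac
}
```

For example, `detect_skill_source "$HOME/.cursor/skills/subtitle-generator-ai/SKILL.md"` prints `cursor` when SKILL_SOURCE is unset.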
CRITICAL: ALL API requests (including render/export/upload/state/credits) MUST include these headers. Missing them will cause export to fail with 402.
X-Skill-Source: $SKILL_NAME
X-Skill-Version: $SKILL_VERSION
X-Skill-Platform: $SKILL_SOURCE
3.0 Initializing a New Session
curl -s -X POST "$API/api/tasks/me/with-session/nemo_agent" \
-H "Authorization: Bearer $TOKEN" -H "Content-Type: application/json" \
-H "X-Skill-Source: $SKILL_NAME" -H "X-Skill-Version: $SKILL_VERSION" -H "X-Skill-Platform: $SKILL_SOURCE" \
-d '{"task_name":"project","language":"<lang>"}'
# → {"code":0,"data":{"task_id":"...","session_id":"..."}}
Before any operations can begin, a session must be established with the server. This session ID ties all subsequent requests together for the duration of the task.
Open in browser: After creating a session, give the user a link to view/edit the task in NemoVideo:
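$WEB/workspace/claim?token=$TOKEN&task={task_id}&session={session_id}&skill_name=$SKILL_NAME&skill_version=$SKILL_VERSION&skill_source=$SKILL_SOURCE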
3.1 Delivering Messages Through SSE
curl -s -X POST "$API/run_sse" \
-H "Authorization: Bearer $TOKEN" -H "Content-Type: application/json" \
-H "Accept: text/event-stream" -H "X-Skill-Source: $SKILL_NAME" -H "X-Skill-Version: $SKILL_VERSION" -H "X-Skill-Platform: $SKILL_SOURCE" --max-time 900 \
-d '{"app_name":"nemo_agent","user_id":"me","session_id":"<sid>","new_message":{"parts":[{"text":"<msg>"}]}}'
All conversational exchanges with the AI are transmitted using Server-Sent Events, which stream responses back to the client in real time.
SSE Handling
| Event | Action |
|---|---|
| Text response | Apply GUI translation (§4), present to user |
| Tool call/result | Wait silently, don't forward |
| heartbeat / empty data: | Keep waiting. Every 2 min: "⏳ Still working..." |
| Stream closes | Process final response |
Typical durations: text 5-15s, video generation 100-300s, editing 10-30s.
Timeout: 10 min heartbeats-only → assume timeout. Never re-send during generation (duplicates + double-charge).
Ignore trailing "I encountered a temporary issue" if prior responses were normal.
Silent Response Fallback (CRITICAL)
Roughly 30% of editing operations complete without returning any text in the SSE stream. When no text content is received, do not treat this as an error or prompt the user to retry. Instead, immediately call the state query endpoint to retrieve the updated project status, then confirm success to the user based on the returned data.
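The waiting rules and the silent-response fallback reduce to two small decisions, sketched here as shell helpers. Both function names and the labels they print are hypothetical; the thresholds (2 minutes between notices, 10 minutes of heartbeats only) are the ones given in the SSE Handling notes.

```bash
# Decide what to do while only heartbeats are arriving.
# $1 = seconds since the last non-heartbeat event
# $2 = seconds since the last "Still working" notice was shown
heartbeat_action() {
  idle=$1
  since_notice=$2
  if [ "$idle" -ge 600 ]; then
    echo timeout      # 10 minutes of heartbeats only: assume timeout
  elif [ "$since_notice" -ge 120 ]; then
    echo notify       # time to show "Still working..." again
  else
    echo wait
  fi
}

# Decide what to do once the stream has closed.
# $1 = all text content collected from the stream (may be empty)
after_stream() {
  text=$(printf '%s' "$1" | tr -d ' \t\n\r')
  if [ -n "$text" ]; then
    echo present        # normal case: show the text to the user
  else
    echo query_state    # silent response: fetch project state instead of retrying
  fi
}
```

Treating whitespace-only output as silent is a design choice of this sketch, not something the API specifies.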
Two-stage generation: When a raw video is submitted, the backend automatically runs a two-stage enhancement pipeline. Stage one processes the core video output, and stage two appends background music and a title sequence without any additional input required. Wait for both stages to complete before presenting results to the user.
3.2 Handling File Uploads
File upload: INLINECODE34
URL upload: INLINECODE35
Use me in the path; backend resolves user from token.
Supported: mp4, mov, avi, webm, mkv, jpg, png, gif, webp, mp3, wav, m4a, aac.
The upload endpoint accepts video, audio, and image files in the formats listed above.
3.3 Checking Available Credits
curl -s "$API/api/credits/balance/simple" -H "Authorization: Bearer $TOKEN" \
-H "X-Skill-Source: $SKILL_NAME" -H "X-Skill-Version: $SKILL_VERSION" -H "X-Skill-Platform: $SKILL_SOURCE"
# → {"code":0,"data":{"available":XXX,"frozen":XX,"total":XXX}}
Query the credits endpoint before initiating any billable operation to confirm the user has a sufficient balance.
3.4 Retrieving Current Project State
curl -s "$API/api/state/nemo_agent/me/<sid>/latest" -H "Authorization: Bearer $TOKEN" \
-H "X-Skill-Source: $SKILL_NAME" -H "X-Skill-Version: $SKILL_VERSION" -H "X-Skill-Platform: $SKILL_SOURCE"
Use me for user in path; backend resolves from token.
Key fields: data.state.draft, data.state.video_infos, data.state.canvas_config, data.state.generated_media.
Draft field mapping: t=tracks, tt=track type (0=video, 1=audio, 7=text), sg=segments, d=duration(ms), m=metadata.
Draft ready for export when draft.t exists with at least one track with non-empty sg.
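A rough way to apply the draft field mapping and the export-readiness rule from the shell, assuming `jq` is installed and that the state response nests tracks as `data.state.draft.t[]` with a `sg` array on each track. The helper names are hypothetical and the exact JSON layout beyond the documented field names is an assumption.

```bash
# Translate a numeric track type (tt) into a readable label.
track_type_name() {
  case "$1" in
    0) echo video ;;
    1) echo audio ;;
    7) echo text ;;
    *) echo "type-$1" ;;   # types not listed in the mapping
  esac
}

# Read a state response on stdin; succeed (exit 0) only if at least one
# track has a non-empty segment list. Requires jq.
draft_ready() {
  jq -e '[.data.state.draft.t[]? | select(((.sg // []) | length) > 0)] | length > 0' >/dev/null
}
```

A caller could pipe the output of the state request into `draft_ready` and only start an export when it succeeds.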
Track summary format:
CODEBLOCK6
3.5 Exporting and Delivering the Final Output
Export does NOT cost credits. Only generation/editing consumes credits.
To complete delivery: submit the export job, poll until it finishes, download the file from the returned URL, and send it to the user together with the task detail link.
b) Submit: INLINECODE47
Note: sessionId is camelCase (exception). On failure → new id, retry once.
c) Poll (every 30s, max 10 polls): INLINECODE50
Status at top-level status: pending → processing → completed / failed. Download URL at output.url.
d) Download from output.url → send to user. Fallback: $API/api/render/proxy/<id>/download.
e) When delivering the video, always also give the task detail link: INLINECODE55
Progress messages: start "⏳ Rendering ~30s" → "⏳ 50%" → "✅ Video ready!" + file + task detail link.
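The polling rule (every 30s, at most 10 polls) can be sketched as a loop that is independent of the actual status request. Here the status check is passed in as a command name, so the real curl call, whose exact URL is not reproduced in this sketch, can be wrapped in a function by the caller. `poll_export` and its arguments are hypothetical.

```bash
# poll_export STATUS_CMD [INTERVAL_SECONDS] [MAX_POLLS]
# STATUS_CMD must print the current status: pending, processing, completed, or failed.
# Prints the final outcome; returns 0 on completed, 1 on failed, 2 on timeout.
poll_export() {
  status_cmd=$1
  interval=${2:-30}
  max_polls=${3:-10}
  i=1
  while [ "$i" -le "$max_polls" ]; do
    st=$($status_cmd)
    case "$st" in
      completed) echo completed; return 0 ;;
      failed)    echo failed;    return 1 ;;
    esac
    # Sleep between polls, but not after the last one.
    if [ "$i" -lt "$max_polls" ]; then
      sleep "$interval"
    fi
    i=$((i + 1))
  done
  echo timeout
  return 2
}
```

On `failed`, the note above applies: retry once with a new id rather than polling the same job again.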
3.6 Recovering from an SSE Disconnection
If the SSE connection drops unexpectedly, follow these five steps to recover gracefully:
1. Detect the disconnection event and log the last received event ID.
2. Wait at least two seconds before attempting to reconnect, to avoid hammering the server.
3. Re-establish the SSE connection using the stored session ID and pass the last event ID in the reconnect header.
4. If reconnection fails after three attempts, call the state query endpoint directly to determine the current job status.
5. Resume normal operation, or notify the user only if the job itself has failed, not merely the connection.
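The retry-then-fall-back part of this recovery procedure might be structured as below. It is a sketch only: `recover_sse` is a hypothetical helper, and the reconnect attempt itself is supplied by the caller as a command that succeeds or fails.

```bash
# recover_sse RECONNECT_CMD [DELAY_SECONDS] [MAX_ATTEMPTS]
# Prints "reconnected" if any attempt succeeds, otherwise "query_state"
# to signal that the caller should check the state endpoint instead.
recover_sse() {
  reconnect_cmd=$1
  delay=${2:-2}
  attempts=${3:-3}
  n=1
  while [ "$n" -le "$attempts" ]; do
    sleep "$delay"               # pause before every attempt
    if $reconnect_cmd; then
      echo reconnected
      return 0
    fi
    n=$((n + 1))
  done
  echo query_state
  return 0
}
```

The function returns success in both cases because a dropped connection alone is not an error; only the job status found afterwards decides whether the user needs to be told anything.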
4. Mapping Backend Responses to the User Interface
The backend operates under the assumption that a graphical interface is present, so GUI-specific instructions from the backend must never be forwarded verbatim to the user.
| Backend says | You do |
|---|---|
| "click [button]" / "点击" | Execute via API |
| "open [panel]" / "打开" | Show state via §3.4 |
| "drag/drop" / "拖拽" | Send edit via SSE |
| "preview in timeline" | Show track summary |
| "Export button" / "导出" | Execute §3.5 |
| "check account/billing" | Check §3.3 |
Keep content descriptions. Strip GUI actions.
5. Recommended Interaction Patterns
• Always confirm a session is active before sending any message or file to the API.
• When a silent response is received, query project state immediately rather than asking the user to repeat their request.
• Present credit balance proactively before any operation that consumes credits, giving the user a chance to cancel.
• After a two-stage processing pipeline completes, summarize both stages in a single, concise status update to avoid overwhelming the user.
• On export completion, provide the download URL alongside a brief description of the output file so the user knows exactly what they are receiving.
6. Known Constraints and Limitations
• Subtitle generation is limited to videos no longer than the maximum duration specified in the current plan tier.
• Only one active export job per session is permitted at a time; concurrent exports are not supported.
• The SSE stream does not guarantee ordered delivery during high-load periods, so always reconcile results against the state endpoint.
• Credit balances are cached for up to 60 seconds, meaning very recent transactions may not reflect immediately.
• File uploads are subject to size caps defined by the account tier and cannot be chunked across multiple requests.
7. Error Codes and Handling Procedures
The table below lists all error codes the API may return along with the recommended recovery action for each.
| Code | Meaning | Action |
|---|---|---|
| 0 | Success | Continue |
| 1001 | Bad/expired token | Re-auth via anonymous-token (tokens expire after 7 days) |
| 1002 | Session not found | New session §3.0 |
| 2001 | No credits | Anonymous: show registration URL with ?bind=<id> (get <id> from create-session or state response when needed). Registered: "Top up at nemovideo.ai" |
| 4001 | Unsupported file | Show supported formats |
| 4002 | File too large | Suggest compress/trim |
| 400 | Missing X-Client-Id | Generate Client-Id and retry (see §1) |
| 402 | Free plan export blocked | Subscription tier issue, NOT credits. "Register at nemovideo.ai to unlock export." |
| 429 | Rate limit (1 token/client/7 days) | Retry in 30s once |
Common: no video → generate first; render fail → retry new id; SSE timeout → §3.6; silent edit → §3.1 fallback.
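One way to keep the recovery actions from the error table in a single place is a lookup function like the sketch below. The function name `error_action` and the action labels are made up for illustration; only the codes and their meanings come from the table.

```bash
# Map an API error code to a short label for the recommended recovery action.
error_action() {
  case "$1" in
    0)    echo continue ;;
    1001) echo reauth_anonymous_token ;;        # bad or expired token
    1002) echo new_session ;;                   # session not found
    2001) echo show_registration_or_topup ;;    # no credits
    4001) echo show_supported_formats ;;        # unsupported file
    4002) echo suggest_compress_or_trim ;;      # file too large
    400)  echo generate_client_id_and_retry ;;  # missing X-Client-Id
    402)  echo prompt_registration_for_export ;; # plan tier, not credits
    429)  echo retry_once_after_30s ;;          # rate limit
    *)    echo unknown_code ;;
  esac
}
```

Codes outside the table fall through to `unknown_code`, so a caller can surface them instead of guessing at a recovery step.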
8. API Version and Required Token Scopes
Always verify the API version header in every response to ensure compatibility with the current skill implementation. The access token must include the following scopes for full functionality: read, write, export, and credits. If any scope is missing, the affected endpoint will return a 403 and the user should be directed to reauthorize the integration.
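Turn Spoken Words Into Perfectly Timed Captions
Most subtitle tools hand you a raw transcript and leave the cleanup to you. subtitle-generator-ai works differently: it listens to your video's audio track, maps every word to its precise timestamp, and produces captions that match what is actually being said, even across varying speech speeds, background noise, or multiple speakers.
What sets this skill apart is its conversational editing layer. Once your subtitles are generated, you don't need to dig through a timeline or edit a raw .srt file by hand. Just describe what needs fixing, such as 'the speaker's name is spelled wrong in the intro' or 'shorten the third caption, it feels too long', and the skill applies those changes in context while keeping everything in sync.
Behind the scenes, the OpenClaw agent orchestrates the transcription pipeline, coordinates timing alignment, and manages iterative edits based on your natural-language instructions. Whether you are captioning a short clip or a full-length lecture, the agent tracks the whole subtitle session so that each revision builds on the last, with no re-uploading and no starting over.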
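Environment Variables
| Variable | Required | Default |
|---|---|---|
| NEMO_TOKEN | No | Auto-generated (100 free credits, expires in 7 days, revocable via Settings → API Tokens) |
| NEMO_API_URL | No | https://mega-api-prod.nemovideo.ai |
| NEMO_WEB_URL | No | https://nemovideo.com |
| NEMO_CLIENT_ID | No | Auto-generated UUID, persisted to ~/.config/nemovideo/client_id (UUID only, no secrets) |
| SKILL_SOURCE | No | Auto-detected from install path, fallback unknown |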
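If NEMO_TOKEN is not set, get one (requires X-Client-Id header):
# Resolve the API base URL (same default as in §3)
API="${NEMO_API_URL:-https://mega-api-prod.nemovideo.ai}"
# Generate or read persisted Client-Id
CLIENT_ID="${NEMO_CLIENT_ID:-$(cat ~/.config/nemovideo/client_id 2>/dev/null)}"
if [ -z "$CLIENT_ID" ]; then
  CLIENT_ID=$(uuidgen 2>/dev/null || echo "client-$(date +%s)-$RANDOM")
  mkdir -p ~/.config/nemovideo && echo "$CLIENT_ID" > ~/.config/nemovideo/client_id
fi
curl -s -X POST "$API/api/auth/anonymous-token" -H "X-Client-Id: $CLIENT_ID"
# → {"code":0,"data":{"token":"nmv_usr_xxx","credits":100,...}}
Save token as NEMO_TOKEN and CLIENT_ID as NEMO_CLIENT_ID. Anonymous: 1 token per client per 7 days; token expires in 7 days and can be revoked at any time via Settings → API Tokens on nemovideo.com. If your token expires, request a new one with the same X-Client-Id.
Local persistence: This skill writes ~/.config/nemovideo/client_id to persist the Client-Id across sessions. This avoids generating a new ID on every request, which would quickly hit the per-IP rate limit (default 10 tokens per 7 days per IP). The file contains only a UUID; no credentials are stored locally.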
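2. Routing Incoming Requests to the Correct Endpoint
Each user action maps to a specific API endpoint, as shown in the table below.
| User says... | Action | Skip SSE? |
|---|---|---|
| "export" / "导出" / "download" / "send me the video" | → §3.5 Export | ✅ |
| "credits" / "积分" / "balance" / "余额" | → §3.3 Credits | ✅ |
| "status" / "状态" / "show tracks" | → §3.4 State | ✅ |
| "upload" / "上传" / user sends file | → §3.2 Upload | ✅ |
| Everything else (generate, edit, add BGM...) | → §3.1 SSE | ❌ |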
3. Primary Workflow Sequences
$API = ${NEMO_API_URL:-https://mega-api-prod.nemovideo.ai}, $TOKEN = ${NEMO_TOKEN}, $WEB = ${NEMO_WEB_URL:-https://nemovideo.com}.
Skill attribution variables (do NOT hardcode):
- $SKILL_NAME = this file's YAML frontmatter name
- $SKILL_VERSION = this file's YAML frontmatter version
- $SKILL_SOURCE = detect at runtime using the first match:
  1. Env SKILL_SOURCE if set
  2. If this file's path contains a known platform directory (e.g. ~/.clawhub/ → clawhub, ~/.cursor/skills/ → cursor)
  3. Fallback: unknown
CRITICAL: ALL API requests (including render/export/upload/state/credits) MUST include these headers. Missing them will cause export to fail with 402.
X-Skill-Source: $SKILL_NAME
X-Skill-Version: $SKILL_VERSION
X-Skill-Platform: $SKILL_SOURCE
3.0 Initializing a New Session
curl -s -X POST "$API/api/tasks/me/with-session/nemo_agent" \
-H "Authorization: Bearer $TOKEN" -H "Content-Type: application/json" \
-H "X-Skill-Source: $SKILL_NAME" -H "X-Skill-Version: $SKILL_VERSION" -H "X-Skill-Platform: $SKILL_SOURCE" \
-d '{"task_name":"project","language":"<lang>"}'
# → {"code":0,"data":{"task_id":"...","session_id":"..."}}
Before any operations can begin, a session must be established with the server. This session ID ties all subsequent requests together for the duration of the task.
Open in browser: After creating a session, give the user a link to view/edit the task in NemoVideo:
$WEB/workspace/claim?token=$TOKEN&task={task_id}&session={session_id}&skill_name=$SKILL_NAME&skill_version=$SKILL_VERSION&skill_source=$SKILL_SOURCE
3.1 Delivering Messages Through SSE
curl -s -X POST "$API/run_sse" \
-H "Authorization: Bearer $TOKEN" -H "Content-Type: application/json" \
-H "Accept: text/event-stream" -H "X-Skill-Source: $SKILL_NAME" -H "X-Skill-Version: $SKILL_VERSION" -H "X-Skill-Platform: $SKILL_SOURCE" --max-time 900 \
-d '{"app_name":"nemo_agent","user_id":"me","session_id":"<sid>","new_message":{"parts":[{"text":"<msg>"}]}}'
All conversational exchanges with the AI are transmitted using Server-Sent Events, which stream responses back to the client in real time.
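SSE Handling
| Event | Action |
|---|---|
| Text response | Apply GUI translation (§4), present to user |
| Tool call/result | Wait silently, don't forward |
| heartbeat / empty data: | Keep waiting. Every 2 min: "⏳ Still working..." |
| Stream closes | Process final response |
Typical durations: text 5-15s, video generation 100-300s, editing 10-30s.
Timeout: 10 min heartbeats-only → assume timeout. Never re-send during generation (duplicates + double-charge).
Ignore a trailing "I encountered a temporary issue" if prior responses were normal.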
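Silent Response Fallback (CRITICAL)
Roughly 30% of editing operations complete without returning any text in the SSE stream. When no text content is received, do not treat this as an error or prompt the user to retry. Instead, immediately call the state query endpoint to retrieve the updated project status, then confirm success to the user based on the returned data.
Two-stage generation: When a raw video is submitted, the backend automatically runs a two-stage enhancement pipeline. Stage one processes the core video output, and stage two appends background music and a title sequence without any additional input. Wait for both stages to complete before presenting results to the user.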
3.2 Handling File Uploads
File upload: curl -s -X POST "$API/api/upload-video/nemo_agent/me/" -H "Authorization: Bearer $TOKEN" -H "X-Skill-Source: $SKILL_NAME" -H "X-Skill-Version: $SKILL_VERSION" -H "X-Skill-Platform: $SKILL_SOURCE" -F files=@/path/to