0. First Contact
When the user opens this skill or sends their first message, greet them immediately:
🎵 Ready to auto subtitle generator! Just send me a video or describe your project.
Try saying:
- - "add subtitles in Spanish"
- "add English subtitles"
- "generate captions automatically"
IMPORTANT: Always greet the user proactively on first contact. Let them know you're setting up while connecting. Always greet the user proactively on first contact.
Auto-Setup
When the user first interacts, set up the connection:
- 1. Check token: If
NEMO_TOKEN env var is set, use it. Otherwise: - Read or generate Client-ID:
- Read
~/.config/nemovideo/client_id if it exists
- Otherwise generate a UUID, save it to
~/.config/nemovideo/client_id
- 3. Acquire anonymous token:
curl -s -X POST "https://mega-api-prod.nemovideo.ai/api/auth/anonymous-token" -H "X-Client-Id: $CLIENT_ID"
Store the returned
token as
NEMO_TOKEN for this session. You get 100 free credits.
- 4. Create a session (§3.0) so you're ready to work immediately.
Let the user know briefly: "Setting things up… ready!" then proceed with their request.
Turn Spoken Words Into Burned-In Captions Instantly
Most subtitle tools hand you a raw transcript and leave the hard work — timing, formatting, and rendering — entirely to you. The free-auto-subtitle-generator skill takes a different approach: you describe what you want in plain language, and the AI figures out the rest. Want white bold text at the bottom with a semi-transparent background Just say so. Prefer two-word chunks for social-style captions Describe it conversationally.
Under the hood, the OpenClaw agent orchestrates a pipeline that first performs speech-to-text transcription, aligns each word to its precise timestamp, then renders the styled captions frame-accurately into your video file. You never touch a timeline or export settings — the agent interprets your natural-language instructions and translates them into rendering parameters automatically.
This makes the skill especially useful for creators who produce high volumes of short-form content, teachers captioning lecture recordings, or developers building accessibility workflows. Because the entire process runs through a single conversational interface, you can iterate on caption style, language, or positioning by simply rephrasing your request — no re-uploading or reconfiguring required. Fully compatible with mp4, mov, avi, webm, and mkv files.
Environment Variables
| Variable | Required | Default |
|---|
| INLINECODE5 | No | Auto-generated (100 free credits, expires in 7 days, revocable via Settings → API Tokens) |
| INLINECODE6 |
No |
https://mega-api-prod.nemovideo.ai |
|
NEMO_WEB_URL | No |
https://nemovideo.com |
|
NEMO_CLIENT_ID | No | Auto-generated UUID, persisted to
~/.config/nemovideo/client_id (UUID only, no secrets) |
|
SKILL_SOURCE | No | Auto-detected from install path, fallback
unknown |
If NEMO_TOKEN is not set, get one (requires X-Client-Id header):
# Generate or read persisted Client-Id
CLIENT_ID="${NEMO_CLIENT_ID:-$(cat ~/.config/nemovideo/client_id 2>/dev/null)}"
if [ -z "$CLIENT_ID" ]; then
CLIENT_ID=$(uuidgen 2>/dev/null || echo "client-$(date +%s)-$RANDOM")
mkdir -p ~/.config/nemovideo & echo "$CLIENT_ID" > ~/.config/nemovideo/client_id
fi
curl -s -X POST "https://mega-api-prod.nemovideo.ai/api/auth/anonymous-token" -H "X-Client-Id: $CLIENT_ID"
# → {"code":0,"data":{"token":"nmv_usr_xxx","credits":100,...}}
Save
token as
NEMO_TOKEN,
CLIENT_ID as
NEMO_CLIENT_ID. Anonymous: 1 token per client per 7 days; token expires in 7 days and can be revoked at any time via
Settings → API Tokens on nemovideo.com. If your token expires, request a new one with the same
X-Client-Id.
Local persistence: This skill writes ~/.config/nemovideo/client_id to persist the Client-Id across sessions. This avoids generating a new ID on every request, which would hit the per-IP rate limit quickly (default 10 tokens per 7 days per IP). The file contains only a UUID — no credentials are stored locally.
2. Routing Requests to the Correct Endpoint
Use the table below to determine which endpoint handles each type of incoming request.
| User says... | Action | Skip SSE |
|---|
| "export" / "导出" / "download" / "send me the video" | → §3.5 Export | ✅ |
| "credits" / "积分" / "balance" / "余额" |
→ §3.3 Credits | ✅ |
| "status" / "状态" / "show tracks" | → §3.4 State | ✅ |
| "upload" / "上传" / user sends file | → §3.2 Upload | ✅ |
| Everything else (generate, edit, add BGM…) | → §3.1 SSE | ❌ |
3. Primary Workflow Sequences
$API = ${NEMO_API_URL:-https://mega-api-prod.nemovideo.ai}, $TOKEN = ${NEMO_TOKEN}, $WEB = ${NEMO_WEB_URL:-https://nemovideo.com}.
Skill attribution variables — do NOT hardcode:
- -
$SKILL_NAME = this file's YAML frontmatter INLINECODE26 - INLINECODE27 = this file's YAML frontmatter INLINECODE28
- INLINECODE29 = detect at runtime using the first match:
1. Env
SKILL_SOURCE if set
2. If this file's path contains a known platform directory (e.g.
~/.clawhub/ →
clawhub,
~/.cursor/skills/ →
cursor)
3. Fallback: INLINECODE35
CRITICAL: ALL API requests (including render/export/upload/state/credits) MUST include these headers. Missing them will cause export to fail with 402.
CODEBLOCK2
3.0 Initializing a Session
curl -s -X POST "https://mega-api-prod.nemovideo.ai/api/tasks/me/with-session/nemo_agent" \
-H "Authorization: Bearer $TOKEN" -H "Content-Type: application/json" \
-H "X-Skill-Source: $SKILL_NAME" -H "X-Skill-Version: $SKILL_VERSION" -H "X-Skill-Platform: $SKILL_SOURCE" \
-d '{"task_name":"project","language":"<lang>"}'
# → {"code":0,"data":{"task_id":"...","session_id":"..."}}
Before any other action can occur, a session must be established with the server. Store the returned session identifier, as every subsequent request depends on it.
Open in browser: After creating a session, give the user a link to view/edit the task in NemoVideo:
INLINECODE36
3.1 Delivering Messages Over SSE
curl -s -X POST "https://mega-api-prod.nemovideo.ai/run_sse" \
-H "Authorization: Bearer $TOKEN" -H "Content-Type: application/json" \
-H "Accept: text/event-stream" -H "X-Skill-Source: $SKILL_NAME" -H "X-Skill-Version: $SKILL_VERSION" -H "X-Skill-Platform: $SKILL_SOURCE" --max-time 900 \
-d '{"app_name":"nemo_agent","user_id":"me","session_id":"<sid>","new_message":{"parts":[{"text":"<msg>"}]}}'
All conversational messages are transmitted to the backend through a persistent Server-Sent Events connection.
SSE Handling
| Event | Action |
|---|
| Text response | Apply GUI translation (§4), present to user |
| Tool call/result |
Wait silently, don't forward |
|
heartbeat / empty
data: | Keep waiting. Every 2 min: "⏳ Still working..." |
| Stream closes | Process final response |
Typical durations: text 5-15s, video generation 100-300s, editing 10-30s.
Timeout: 10 min heartbeats-only → assume timeout. Never re-send during generation (duplicates + double-charge).
Ignore trailing "I encountered a temporary issue" if prior responses were normal.
Silent Response Fallback (CRITICAL)
Approximately 30% of edit operations will complete without returning any text in the response. When this occurs: (1) do not treat the absence of text as an error or failure, (2) poll the task state endpoint to confirm the operation finished successfully, (3) surface a completion message to the user based on the polled result rather than the SSE payload.
Two-stage generation: When a raw video is submitted, the backend automatically runs a two-stage enrichment pipeline. Stage one processes and stores the base video; stage two independently attaches background music and an auto-generated title. Both stages emit separate SSE events — wait for both to resolve before reporting full completion to the user.
3.2 Handling File Uploads
File upload: INLINECODE39
URL upload: INLINECODE40
Use me in the path; backend resolves user from token.
Supported: mp4, mov, avi, webm, mkv, jpg, png, gif, webp, mp3, wav, m4a, aac.
Direct file upload is supported for video assets that need to be attached to an editing session.
3.3 Checking Available Credits
curl -s "https://mega-api-prod.nemovideo.ai/api/credits/balance/simple" -H "Authorization: Bearer $TOKEN" \
-H "X-Skill-Source: $SKILL_NAME" -H "X-Skill-Version: $SKILL_VERSION" -H "X-Skill-Platform: $SKILL_SOURCE"
# → {"code":0,"data":{"available":XXX,"frozen":XX,"total":XXX}}
Query the credits endpoint before initiating any processing task to confirm the user has a sufficient balance.
3.4 Polling Task Status
curl -s "https://mega-api-prod.nemovideo.ai/api/state/nemo_agent/me/<sid>/latest" -H "Authorization: Bearer $TOKEN" \
-H "X-Skill-Source: $SKILL_NAME" -H "X-Skill-Version: $SKILL_VERSION" -H "X-Skill-Platform: $SKILL_SOURCE"
Use
me for user in path; backend resolves from token.
Key fields:
data.state.draft,
data.state.video_infos,
data.state.canvas_config,
data.state.generated_media.
Draft field mapping: t=tracks, tt=track type (0=video, 1=audio, 7=text), sg=segments, d=duration(ms), m=metadata.
Draft ready for export when draft.t exists with at least one track with non-empty sg.
Track summary format:
CODEBLOCK7
3.5 Exporting and Delivering the Final Video
Export does NOT cost credits. Only generation/editing consumes credits.
Exporting the finished video does not deduct any credits from the user's balance. To complete delivery: (a) call the export endpoint with the session ID, (b) wait for the export job to reach a terminal state, (c) retrieve the download URL from the response payload, (d) verify the URL is accessible before presenting it, (e) display the link or initiate the download for the user.
b) Submit: INLINECODE52
Note: sessionId is camelCase (exception). On failure → new id, retry once.
c) Poll (every 30s, max 10 polls): INLINECODE55
Status at top-level status: pending → processing → completed / failed. Download URL at output.url.
d) Download from output.url → send to user. Fallback: https://mega-api-prod.nemovideo.ai/api/render/proxy/<id>/download.
e) When delivering the video, always also give the task detail link: INLINECODE60
Progress messages: start "⏳ Rendering ~30s" → "⏳ 50%" → "✅ Video ready!" + file + task detail link.
3.6 Recovering from an SSE Disconnection
If the SSE stream drops unexpectedly, follow these steps to resume gracefully: (1) detect the disconnection event in the stream listener and log the last received event ID; (2) wait a brief back-off interval before attempting to reconnect; (3) re-establish the SSE connection using the same session ID and pass the last event ID in the reconnect header; (4) poll the task state endpoint to retrieve any events that may have been missed during the gap; (5) reconcile the polled state with the local UI state and resume normal event handling.
4. Mapping Backend Responses to GUI Elements
The backend is designed with the assumption that a graphical interface is present, so never pass raw GUI instructions or UI directives through to the end user.
| Backend says | You do |
|---|
| "click [button]" / "点击" | Execute via API |
| "open [panel]" / "打开" |
Show state via §3.4 |
| "drag/drop" / "拖拽" | Send edit via SSE |
| "preview in timeline" | Show track summary |
| "Export button" / "导出" | Execute §3.5 |
| "check account/billing" | Check §3.3 |
Keep content descriptions. Strip GUI actions.
5. Recommended Interaction Patterns
• Confirm the user's intent before dispatching any irreversible operation such as an export or a destructive edit.
• After submitting a long-running task, proactively inform the user that processing is underway rather than waiting silently.
• When a silent response is detected, use polled task data to construct a meaningful status update instead of reporting nothing.
• If a credit check fails, explain the shortfall clearly and guide the user toward a resolution before retrying.
• Always present the final download URL as a distinct, actionable element so the user can immediately access their exported video.
6. Known Constraints and Limitations
• The SSE connection does not guarantee delivery of every event; always reconcile with the task state endpoint after a disconnection.
• Credit balances are read at the moment of the request and may not reflect concurrent transactions made elsewhere.
• The two-stage enrichment pipeline for raw video cannot be skipped or reordered by the client.
• Export URLs are time-limited and should be presented to the user promptly after retrieval.
• File uploads are restricted to supported video formats and maximum size limits defined by the platform.
7. Error Identification and Handling
The table below maps common error codes to their causes and the recommended recovery action for each.
| Code | Meaning | Action |
|---|
| 0 | Success | Continue |
| 1001 |
Bad/expired token | Re-auth via anonymous-token (tokens expire after 7 days) |
| 1002 | Session not found | New session §3.0 |
| 2001 | No credits | Anonymous: show registration URL with
?bind=<id> (get
<id> from create-session or state response when needed). Registered: "Top up at nemovideo.ai" |
| 4001 | Unsupported file | Show supported formats |
| 4002 | File too large | Suggest compress/trim |
| 400 | Missing X-Client-Id | Generate Client-Id and retry (see §1) |
| 402 | Free plan export blocked | Subscription tier issue, NOT credits. "Register at nemovideo.ai to unlock export." |
| 429 | Rate limit (1 token/client/7 days) | Retry in 30s once |
Common: no video → generate first; render fail → retry new id; SSE timeout → §3.6; silent edit → §3.1 fallback.
8. API Version and Required Token Scopes
Before going live, confirm that the integration targets the correct API version by checking the version field returned in the session initialization response. Ensure the OAuth token presented with each request carries all required scopes; missing scopes will result in 403 responses that cannot be resolved by retrying. Rotate tokens proactively before expiry to avoid mid-session authorization failures.
0. 首次接触
当用户打开此技能或发送第一条消息时,立即问候他们:
🎵 自动字幕生成器已就绪!只需向我发送视频或描述您的项目。
尝试说:
重要提示:首次接触时务必主动问候用户。让他们知道您正在设置连接。首次接触时务必主动问候用户。
自动设置
当用户首次交互时,设置连接:
- 1. 检查令牌:如果设置了 NEMO_TOKEN 环境变量,则使用它。否则:
- 读取或生成客户端ID:
- 如果存在,读取 ~/.config/nemovideo/client_id
- 否则生成一个UUID,保存到 ~/.config/nemovideo/client_id
- 3. 获取匿名令牌:
bash
curl -s -X POST https://mega-api-prod.nemovideo.ai/api/auth/anonymous-token -H X-Client-Id: $CLIENT_ID
将返回的 token 存储为此会话的 NEMO_TOKEN。您将获得100个免费积分。
- 4. 创建会话(§3.0),以便立即开始工作。
简要告知用户:正在设置…准备就绪!然后继续处理他们的请求。
将语音即时转换为内嵌字幕
大多数字幕工具只提供原始转录文本,而将耗时的工作——时间轴、格式化和渲染——完全留给您。free-auto-subtitle-generator 技能采用不同的方法:您用自然语言描述需求,AI 负责其余部分。想要底部白色粗体文字配半透明背景?直接说出来。偏好社交风格字幕的双词片段?用对话方式描述即可。
在底层,OpenClaw 代理协调一个流水线,首先执行语音转文本转录,将每个单词对齐到精确的时间戳,然后将样式化的字幕逐帧精确渲染到您的视频文件中。您无需接触时间轴或导出设置——代理会解释您的自然语言指令,并自动将其转换为渲染参数。
这使得该技能对于制作大量短视频内容的创作者、为讲座录制添加字幕的教师,或构建无障碍工作流的开发者特别有用。由于整个过程通过单一对话界面运行,您只需重新表述请求即可迭代字幕样式、语言或位置——无需重新上传或重新配置。完全兼容 mp4、mov、avi、webm 和 mkv 文件。
环境变量
| 变量 | 必需 | 默认值 |
|---|
| NEMOTOKEN | 否 | 自动生成(100个免费积分,7天后过期,可通过设置→API令牌撤销) |
| NEMOAPI_URL |
否 | https://mega-api-prod.nemovideo.ai |
| NEMO
WEBURL | 否 | https://nemovideo.com |
| NEMO
CLIENTID | 否 | 自动生成的UUID,持久化到 ~/.config/nemovideo/client_id(仅UUID,无密钥) |
| SKILL_SOURCE | 否 | 从安装路径自动检测,回退为 unknown |
如果未设置 NEMO_TOKEN,获取一个(需要 X-Client-Id 头):
bash
生成或读取持久化的客户端ID
CLIENT
ID=${NEMOCLIENT
ID:-$(cat ~/.config/nemovideo/clientid 2>/dev/null)}
if [ -z $CLIENT_ID ]; then
CLIENT_ID=$(uuidgen 2>/dev/null || echo client-$(date +%s)-$RANDOM)
mkdir -p ~/.config/nemovideo & echo $CLIENT
ID > ~/.config/nemovideo/clientid
fi
curl -s -X POST https://mega-api-prod.nemovideo.ai/api/auth/anonymous-token -H X-Client-Id: $CLIENT_ID
→ {code:0,data:{token:nmvusrxxx,credits:100,...}}
将 token 保存为 NEMOTOKEN,CLIENTID 保存为 NEMOCLIENTID。匿名:每个客户端每7天1个令牌;令牌7天后过期,可随时通过 nemovideo.com 上的设置→API令牌撤销。如果令牌过期,使用相同的 X-Client-Id 请求新令牌。
本地持久化: 此技能写入 ~/.config/nemovideo/client_id 以跨会话持久化客户端ID。这避免了每次请求都生成新ID,否则会很快达到每个IP的速率限制(默认每个IP每7天10个令牌)。该文件仅包含一个UUID——本地不存储任何凭据。
2. 将请求路由到正确的端点
使用下表确定每个端点处理哪种类型的传入请求。
| 用户说... | 操作 | 跳过SSE |
|---|
| export / 导出 / download / send me the video | → §3.5 导出 | ✅ |
| credits / 积分 / balance / 余额 |
→ §3.3 积分 | ✅ |
| status / 状态 / show tracks | → §3.4 状态 | ✅ |
| upload / 上传 / 用户发送文件 | → §3.2 上传 | ✅ |
| 其他所有内容(生成、编辑、添加背景音乐等) | → §3.1 SSE | ❌ |
3. 主要工作流序列
$API = ${NEMOAPIURL:-https://mega-api-prod.nemovideo.ai},$TOKEN = ${NEMOTOKEN},$WEB = ${NEMOWEB_URL:-https://nemovideo.com}。
技能属性变量——请勿硬编码:
- - $SKILLNAME = 此文件的YAML前置元数据 name
- $SKILLVERSION = 此文件的YAML前置元数据 version
- $SKILL_SOURCE = 在运行时使用第一个匹配项检测:
1. 如果设置了环境变量 SKILL_SOURCE
2. 如果此文件的路径包含已知的平台目录(例如 ~/.clawhub/ → clawhub,~/.cursor/skills/ → cursor)
3. 回退:unknown
关键:所有API请求(包括渲染/导出/上传/状态/积分)必须包含这些头。缺少它们将导致导出失败并返回402。
X-Skill-Source: $SKILL_NAME
X-Skill-Version: $SKILL_VERSION
X-Skill-Platform: $SKILL_SOURCE
3.0 初始化会话
bash
curl -s -X POST https://mega-api-prod.nemovideo.ai/api/tasks/me/with-session/nemo_agent \
-H Authorization: Bearer $TOKEN -H Content-Type: application/json \
-H X-Skill-Source: $SKILL
NAME -H X-Skill-Version: $SKILLVERSION -H X-Skill-Platform: $SKILL_SOURCE \
-d {task_name:project,language:
}
→ {code:0,data:{taskid:...,sessionid:...}}
在任何其他操作之前,必须与服务器建立会话。存储返回的会话标识符,因为每个后续请求都依赖它。
在浏览器中打开:创建会话后,给用户一个在NemoVideo中查看/编辑任务的链接:
$WEB/workspace/claim?task={taskid}&session={sessionid}&skillname=$SKILLNAME&skillversion=$SKILLVERSION&skillsource=$SKILLSOURCE
3.1 通过SSE传递消息
bash
curl -s -X POST https://mega-api-prod.nemovideo.ai/run_sse \
-H Authorization: Bearer $TOKEN -H Content-Type: application/json \
-H Accept: text/event-stream -H X-Skill-Source: $SKILLNAME -H X-Skill-Version: $SKILLVERSION -H X-Skill-Platform: $SKILL_SOURCE --max-time 900 \
-d {appname:nemoagent,userid:me,sessionid:,new_message:{parts:[{text:}]}}
所有对话消息通过持久的服务器发送事件连接传输到后端。
SSE处理
| 事件 | 操作 |
|---|
| 文本响应 | 应用GUI翻译(§4),呈现给用户 |
| 工具调用/结果 |
静默等待,不转发 |
| heartbeat / 空 data: | 继续等待。每2分钟:⏳ 仍在处理... |
| 流