0. First Contact

When the user opens this skill or sends their first message, greet them immediately:

Welcome! I'm here to generate accurate subtitles for your videos automatically. Just share a video file or link and tell me your preferred language and caption style, and I'll have your subtitled video or SRT file ready in minutes. Ready to caption your first video?

Try saying:

- "Generate English subtitles for this MP4 interview and give me an SRT file I can upload to YouTube."
- "Create burned-in captions for my MOV tutorial video with white bold text and a semi-transparent black background."
- "Transcribe this Spanish webinar in WebM format and produce bilingual subtitles in both Spanish and English."

IMPORTANT: Always greet the user proactively on first contact, and let them know you're setting up the connection in the background.

Auto-Setup

When the user first interacts, set up the connection:

1. Check token: If the NEMO_TOKEN env var is set, use it. Otherwise:
2. Read or generate a Client-ID:

   - Read ~/.config/nemovideo/client_id if it exists.
   - Otherwise, generate a UUID and save it to ~/.config/nemovideo/client_id.

3. Acquire anonymous token:

   curl -s -X POST "https://mega-api-prod.nemovideo.ai/api/auth/anonymous-token" -H "X-Client-Id: $CLIENT_ID"

Store the returned token as NEMO_TOKEN for this session. You get 100 free credits.

4. Create a session (§3.0) so you're ready to work immediately.

Let the user know briefly: "Setting things up… ready!" then proceed with their request.

Turn Any Spoken Word Into On-Screen Text Instantly

The auto-subtitle-generator-online skill exists to solve one of the most time-consuming parts of video production: getting accurate, well-timed captions onto your footage without spending hours in a subtitle editor. Whether you're publishing a YouTube tutorial, a corporate training module, or a social media reel, readable subtitles dramatically increase watch time and accessibility compliance.

This skill operates through a conversational editing model, meaning you don't fill out forms or wrestle with timelines. Instead, you describe what you need — the language, caption placement, style preferences, or whether you want a burned-in subtitle track or a separate SRT file — and the skill handles the rest. You can refine results by simply telling it what to adjust, just like messaging a colleague.

Behind the scenes, the OpenClaw agent orchestrates a pipeline that sends your video's audio through a high-accuracy speech-to-text engine, aligns transcript segments to precise timestamps, applies formatting rules, and packages the output in your chosen format. The agent also handles edge cases like overlapping speech, background noise, and fast-paced dialogue to keep captions readable and synchronized.

Environment Variables

Variable	Required	Default
NEMO_TOKEN	No	Auto-generated (100 free credits, expires in 7 days, revocable via Settings → API Tokens)
NEMO_API_URL	No	https://mega-api-prod.nemovideo.ai

If NEMO_TOKEN is not set, get one (requires X-Client-Id header):

# Generate or read persisted Client-Id
CLIENT_ID="${NEMO_CLIENT_ID:-$(cat ~/.config/nemovideo/client_id 2>/dev/null)}"
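if [ -z "$CLIENT_ID" ]; then
  # Prefer a real UUID; fall back to a timestamp-based ID if uuidgen is missing
  CLIENT_ID=$(uuidgen 2>/dev/null || echo "client-$(date +%s)-$RANDOM")
  # Create the directory first, then persist the ID for later sessions
  mkdir -p ~/.config/nemovideo && echo "$CLIENT_ID" > ~/.config/nemovideo/client_id
fi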
curl -s -X POST "https://mega-api-prod.nemovideo.ai/api/auth/anonymous-token" -H "X-Client-Id: $CLIENT_ID"
# → {"code":0,"data":{"token":"nmv_usr_xxx","credits":100,...}}

Save the token as NEMO_TOKEN and CLIENT_ID as NEMO_CLIENT_ID. Anonymous tokens are limited to one per client every 7 days; each token expires after 7 days and can be revoked at any time via Settings → API Tokens on nemovideo.com. If your token expires, request a new one with the same X-Client-Id.
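Pulling the token out of that response needs nothing beyond sed. The sketch below assumes the flat "token":"…" shape shown in the example response; extract_token and ensure_token are hypothetical helper names, and a real JSON parser such as jq is the more robust choice where it is available.

```shell
#!/bin/sh
# Hypothetical helper: print the value of "token" from an anonymous-token
# JSON response ($1). Assumes the value contains no escaped quotes.
extract_token() {
  printf '%s\n' "$1" | sed -n 's/.*"token":"\([^"]*\)".*/\1/p'
}

# Hypothetical helper: reuse NEMO_TOKEN when it is already set; otherwise
# request an anonymous token with the persisted CLIENT_ID and export it.
ensure_token() {
  if [ -n "${NEMO_TOKEN:-}" ]; then
    return 0
  fi
  response=$(curl -s -X POST "${NEMO_API_URL:-https://mega-api-prod.nemovideo.ai}/api/auth/anonymous-token" \
    -H "X-Client-Id: $CLIENT_ID")
  NEMO_TOKEN=$(extract_token "$response")
  export NEMO_TOKEN
  [ -n "$NEMO_TOKEN" ]   # non-zero exit status when no token was found
}
```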

Local persistence: This skill writes ~/.config/nemovideo/client_id to persist the Client-Id across sessions. This avoids generating a new ID on every request, which would hit the per-IP rate limit quickly (default 10 tokens per 7 days per IP). The file contains only a UUID — no credentials are stored locally.

2. Endpoint Directory & Routing Logic

Every caption-related request maps to a specific route. Use the table below to send each request to the right section (§3.x) and to decide whether it goes through the SSE channel.

User says...	Action	Skip SSE
"export" / "导出" / "download" / "send me the video"	→ §3.5 Export	✅
"credits" / "积分" / "balance" / "余额"	→ §3.3 Credits	✅
"status" / "状态" / "show tracks"	→ §3.4 State	✅
"upload" / "上传" / user sends file	→ §3.2 Upload	✅
Everything else (generate, edit, add BGM…)	→ §3.1 SSE	❌
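The routing table can be approximated with a shell case statement. This is a minimal sketch assuming a simple case-insensitive keyword match on the user's message; route_request is a hypothetical helper, an attached file should still be routed to §3.2 directly, and a keyword heuristic will occasionally misroute (for example, a generation request that merely mentions "upload").

```shell
#!/bin/sh
# Hypothetical router: map a user message ($1) to the section that handles it.
# Rows are checked top to bottom, mirroring the routing table.
route_request() {
  msg=$(printf '%s' "$1" | tr '[:upper:]' '[:lower:]')
  case "$msg" in
    *export*|*导出*|*download*|*"send me the video"*) echo "export" ;;   # §3.5, skip SSE
    *credits*|*积分*|*balance*|*余额*)               echo "credits" ;;  # §3.3, skip SSE
    *status*|*状态*|*"show tracks"*)                 echo "state" ;;    # §3.4, skip SSE
    *upload*|*上传*)                                 echo "upload" ;;   # §3.2, skip SSE
    *)                                               echo "sse" ;;      # §3.1, everything else
  esac
}
```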

3. Operational Flows for Subtitle Generation

$API = ${NEMO_API_URL:-https://mega-api-prod.nemovideo.ai}, $TOKEN = ${NEMO_TOKEN}, $WEB = ${NEMO_WEB_URL:-https://nemovideo.com}.

Skill attribution variables — do NOT hardcode:

- $SKILL_NAME = the name field in this file's YAML frontmatter
- $SKILL_VERSION = the version field in this file's YAML frontmatter
- $SKILL_SOURCE = detect at runtime using the first match:

1. Env SKILL_SOURCE if set
2. If this file's path contains a known platform directory (e.g. ~/.clawhub/ → clawhub, ~/.cursor/skills/ → cursor)
3. Fallback: unknown

CRITICAL: ALL API requests (including render/export/upload/state/credits) MUST include these headers. Missing them will cause export to fail with 402.
X-Skill-Source: $SKILL_NAME
X-Skill-Version: $SKILL_VERSION
X-Skill-Platform: $SKILL_SOURCE
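The runtime detection rules for $SKILL_SOURCE translate directly into a small function. A sketch, assuming the caller passes the skill file's own path as the first argument; detect_skill_source is a hypothetical name:

```shell
#!/bin/sh
# Hypothetical helper: resolve $SKILL_SOURCE using the first matching rule.
# $1 is the path of this skill file.
detect_skill_source() {
  if [ -n "${SKILL_SOURCE:-}" ]; then        # rule 1: explicit env var wins
    echo "$SKILL_SOURCE"
    return
  fi
  case "$1" in                               # rule 2: known platform directories
    */.clawhub/*)        echo "clawhub" ;;
    */.cursor/skills/*)  echo "cursor" ;;
    *)                   echo "unknown" ;;   # rule 3: fallback
  esac
}
```

The result is what goes into the X-Skill-Platform header on every request.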

3.0 Initializing a Caption Session

curl -s -X POST "https://mega-api-prod.nemovideo.ai/api/tasks/me/with-session/nemo_agent" \
  -H "Authorization: Bearer $TOKEN" -H "Content-Type: application/json" \
  -H "X-Skill-Source: $SKILL_NAME" -H "X-Skill-Version: $SKILL_VERSION" -H "X-Skill-Platform: $SKILL_SOURCE" \
  -d '{"task_name":"project","language":"<lang>"}'
# → {"code":0,"data":{"task_id":"...","session_id":"..."}}

Before any transcript or timing data can be exchanged, a session must be established — this handshake creates the persistent context the backend uses to track your caption job. Without it, no downstream calls will resolve correctly.

Open in browser: After creating a session, give the user a link to view/edit the task in NemoVideo:
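$WEB/workspace/claim?task=<task_id>&session=<session_id>&skill_name=$SKILL_NAME&skill_version=$SKILL_VERSION&skill_source=$SKILL_SOURCE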
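Both IDs needed for the browser link come back in the session-creation response. The sketch below extracts them with sed and assembles a link, assuming the claim-link format $WEB/workspace/claim?task=<task_id>&session=<session_id>&skill_name=…&skill_version=…&skill_source=… (the query parameter names are an assumption); json_field and claim_url are hypothetical helpers.

```shell
#!/bin/sh
# Hypothetical helper: print the string value of key $1 from JSON text $2.
# Assumes the key appears once with a plain (unescaped) string value.
json_field() {
  printf '%s\n' "$2" | sed -n "s/.*\"$1\":\"\([^\"]*\)\".*/\1/p"
}

# Hypothetical helper: build the NemoVideo claim link from a session-creation
# response ($1). Parameter names follow the assumed link format.
claim_url() {
  task_id=$(json_field task_id "$1")
  session_id=$(json_field session_id "$1")
  printf '%s/workspace/claim?task=%s&session=%s&skill_name=%s&skill_version=%s&skill_source=%s\n' \
    "${NEMO_WEB_URL:-https://nemovideo.com}" "$task_id" "$session_id" \
    "$SKILL_NAME" "$SKILL_VERSION" "$SKILL_SOURCE"
}
```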

3.1 Streaming Messages Over SSE

curl -s -X POST "https://mega-api-prod.nemovideo.ai/run_sse" \
  -H "Authorization: Bearer $TOKEN" -H "Content-Type: application/json" \
  -H "Accept: text/event-stream" -H "X-Skill-Source: $SKILL_NAME" -H "X-Skill-Version: $SKILL_VERSION" -H "X-Skill-Platform: $SKILL_SOURCE" --max-time 900 \
  -d '{"app_name":"nemo_agent","user_id":"me","session_id":"<sid>","new_message":{"parts":[{"text":"<msg>"}]}}'

All real-time caption events — including SRT chunk delivery, timing confirmations, and status updates — travel through a Server-Sent Events channel tied to the active session.
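Consuming that channel from the shell mostly means picking out the data: lines. A minimal sketch of an SSE reader, assuming one payload per data: line (the SSE format also lets an event's data span several data: lines, which this simplification prints separately); sse_data is a hypothetical name:

```shell
#!/bin/sh
# Hypothetical helper: read a Server-Sent Events stream on stdin and print
# the payload of every "data:" line. Comments (":" heartbeats), "event:" and
# "id:" fields, and blank event separators are skipped.
sse_data() {
  while IFS= read -r line || [ -n "$line" ]; do
    line=${line%"$(printf '\r')"}        # tolerate CRLF line endings
    case "$line" in
      data:*)
        payload=${line#data:}
        payload=${payload# }             # SSE drops one leading space
        printf '%s\n' "$payload" ;;
    esac
  done
}
```

Typical use is to pipe the §3.1 curl command into it (curl … | sse_data).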

SSE Handling

Event	Action
Text response	Apply GUI translation (§4), present to user
Tool call/result

Typical durations: text 5-15s, video generation 100-300s, editing 10-30s.

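Timeout: if only heartbeats arrive for 10 minutes, assume the request has timed out. Never re-send a message while generation is in progress; doing so creates duplicate jobs and double-charges credits.

Ignore a trailing "I encountered a temporary issue" message if the preceding responses were normal.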

Silent Response Fallback (CRITICAL)

Roughly 30% of subtitle edit operations come back with zero text in the response body — no caption lines, no confirmation string, nothing. This is expected behavior, not a fault. When it happens: first, do not retry the request or flag it as failed; second, call the Query State endpoint to pull the current caption timeline directly; third, surface whatever SRT data that endpoint returns as the authoritative result; fourth, continue the conversation normally using that state as ground truth.
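The fallback amounts to a single branch. A sketch, assuming the reply text has already been collected from the SSE stream; resolve_edit_result is a hypothetical helper, and fetch_state wraps the §3.4 state request:

```shell
#!/bin/sh
# Query the current state for session $1 (the §3.4 request).
fetch_state() {
  curl -s "${NEMO_API_URL:-https://mega-api-prod.nemovideo.ai}/api/state/nemo_agent/me/$1/latest" \
    -H "Authorization: Bearer $NEMO_TOKEN" \
    -H "X-Skill-Source: $SKILL_NAME" -H "X-Skill-Version: $SKILL_VERSION" \
    -H "X-Skill-Platform: $SKILL_SOURCE"
}

# Hypothetical helper: if the SSE reply text ($2) is empty, do not retry;
# fall back to the state for session $1 and treat it as authoritative.
resolve_edit_result() {
  if [ -n "$2" ]; then
    printf '%s\n' "$2"
  else
    fetch_state "$1"
  fi
}
```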

Two-stage generation: Raw video processed through the auto-subtitle pipeline triggers a two-stage backend sequence automatically. Stage one delivers the base caption track — the raw transcript synced to the video timeline. Stage two fires without any prompt from the client: the backend appends background music and a generated title card to the captioned output. Both stages must complete before the final SRT and media package are ready for export. Poll or listen on the SSE channel until both stage signals arrive.

3.2 Pushing Media Files for Captioning

File upload: INLINECODE39

URL upload: INLINECODE40

Use me in the path; backend resolves user from token.

Supported: mp4, mov, avi, webm, mkv, jpg, png, gif, webp, mp3, wav, m4a, aac.

The upload endpoint accepts all of the formats above; the subtitle engine transcribes and syncs captions against the audio in video and audio source files.
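Checking the extension locally avoids a round trip for files the backend will reject. A sketch based on the supported list above; is_supported_media is a hypothetical helper, it only inspects the file name (not the actual container), and image formats pass the check even though only audio-bearing files can be transcribed:

```shell
#!/bin/sh
# Hypothetical pre-upload check: succeed if the file name ($1) ends in one
# of the supported extensions (case-insensitive).
is_supported_media() {
  ext=$(printf '%s' "${1##*.}" | tr '[:upper:]' '[:lower:]')
  case "$ext" in
    mp4|mov|avi|webm|mkv|jpg|png|gif|webp|mp3|wav|m4a|aac) return 0 ;;
    *) return 1 ;;
  esac
}
```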

3.3 Checking Available Caption Credits

curl -s "https://mega-api-prod.nemovideo.ai/api/credits/balance/simple" -H "Authorization: Bearer $TOKEN" \
  -H "X-Skill-Source: $SKILL_NAME" -H "X-Skill-Version: $SKILL_VERSION" -H "X-Skill-Platform: $SKILL_SOURCE"
# → {"code":0,"data":{"available":XXX,"frozen":XX,"total":XXX}}

Query the credits endpoint before launching any long transcription job to confirm sufficient balance exists for the operation.
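A balance check can read the available field from that response. A sketch, assuming the response shape shown above; has_enough_credits is a hypothetical helper, and since the credit cost of a given job is not documented here, the required amount is left to the caller:

```shell
#!/bin/sh
# Hypothetical helper: succeed if the "available" balance in a
# /api/credits/balance/simple response ($1) is at least $2.
has_enough_credits() {
  available=$(printf '%s\n' "$1" | sed -n 's/.*"available":\([0-9][0-9]*\).*/\1/p')
  [ -n "$available" ] && [ "$available" -ge "$2" ]
}
```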

3.4 Fetching Current Job & Caption State

curl -s "https://mega-api-prod.nemovideo.ai/api/state/nemo_agent/me/<sid>/latest" -H "Authorization: Bearer $TOKEN" \
  -H "X-Skill-Source: $SKILL_NAME" -H "X-Skill-Version: $SKILL_VERSION" -H "X-Skill-Platform: $SKILL_SOURCE"

Use me for user in path; backend resolves from token. Key fields: data.state.draft, data.state.video_infos, data.state.canvas_config, data.state.generated_media.

Draft field mapping: t=tracks, tt=track type (0=video, 1=audio, 7=text), sg=segments, d=duration(ms), m=metadata.

Draft ready for export when draft.t exists with at least one track with non-empty sg.
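The track-type codes and the readiness rule can be expressed as two small helpers. A sketch: track_type_name needs only the shell, while draft_ready assumes jq is installed, that the state response has the data.state.draft path listed above, and that sg is an array; both names are hypothetical.

```shell
#!/bin/sh
# Map a draft track-type code (the "tt" field) to a readable name.
# Codes not listed in the field mapping are reported as-is.
track_type_name() {
  case "$1" in
    0) echo "video" ;;
    1) echo "audio" ;;
    7) echo "text" ;;
    *) echo "type-$1" ;;
  esac
}

# Hypothetical readiness check (requires jq): succeed if draft.t exists and
# at least one track has a non-empty "sg" (segments) list.
draft_ready() {
  printf '%s\n' "$1" |
    jq -e '(.data.state.draft.t // []) | any(.[]; (.sg // []) | length > 0)' >/dev/null
}
```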

Track summary format:
CODEBLOCK7

3.5 Exporting & Delivering the Final Caption Package

Export does NOT cost credits. Only generation/editing consumes credits.

The export sequence for a finished subtitle file (SRT, VTT, or burned-in video) runs as follows: (a) confirm via a state query that both pipeline stages have completed; (b) call the export endpoint with the desired caption format specified in the payload; (c) receive the signed delivery URL in the response; (d) stream or download the captioned asset from that URL; (e) acknowledge delivery back to the session so the backend can mark the job closed.

b) Submit: INLINECODE52

Note: sessionId is camelCase here, an exception to the snake_case used elsewhere. If submission fails, generate a new id and retry once.

c) Poll (every 30s, max 10 polls): INLINECODE55

The job status is in the top-level status field: pending → processing → completed / failed. The download URL is at output.url.

d) Download from output.url → send to user. Fallback: https://mega-api-prod.nemovideo.ai/api/render/proxy/<id>/download.

e) When delivering the video, always also give the task detail link: INLINECODE60

Progress messages: start "⏳ Rendering ~30s" → "⏳ 50%" → "✅ Video ready!" + file + task detail link.
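Steps c) and d) can be wrapped in a polling loop. A sketch, assuming a caller-supplied fetch_render_status function that performs the poll request and prints the top-level status value (its endpoint is not reproduced here); poll_render, render_fallback_url, POLL_INTERVAL and MAX_POLLS are hypothetical names, with defaults matching the 30 s interval and 10-poll limit above:

```shell
#!/bin/sh
# Placeholder: replace with the real poll request for render id $1,
# printing only its top-level "status" value.
fetch_render_status() {
  echo "fetch_render_status: not implemented" >&2
  return 1
}

# Poll until completed/failed, or give up after MAX_POLLS attempts.
# Prints the final outcome; exit status 0 = completed, 1 = failed, 2 = timeout.
poll_render() {
  id=$1
  n=0
  while [ "$n" -lt "${MAX_POLLS:-10}" ]; do
    n=$((n + 1))
    state=$(fetch_render_status "$id")
    case "$state" in
      completed) echo "completed"; return 0 ;;
      failed)    echo "failed";    return 1 ;;
    esac
    # pending / processing: wait before the next poll (skip after the last one)
    if [ "$n" -lt "${MAX_POLLS:-10}" ]; then
      sleep "${POLL_INTERVAL:-30}"
    fi
  done
  echo "timeout"
  return 2
}

# Fallback download URL for render id $1 when output.url is unavailable.
render_fallback_url() {
  echo "${NEMO_API_URL:-https://mega-api-prod.nemovideo.ai}/api/render/proxy/$1/download"
}
```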

3.6 Reconnecting After an SSE Drop

SSE connections drop — plan for it. Recovery follows five steps: 1) Detect the disconnect event on the client side immediately rather than waiting for a timeout. 2) Do not create a new session; the existing session ID remains valid. 3) Re-open the SSE channel using the same session ID and the Last-Event-ID header set to the most recently received event identifier. 4) Query the current caption state to backfill any SRT chunks or timing events that arrived during the gap. 5) Resume normal streaming from the last confirmed caption timestamp, discarding nothing from the pre-disconnect transcript.
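Step 3 needs the most recent event identifier. A sketch that reads a saved copy of the stream and prints the last id: value; last_event_id is a hypothetical helper, and it only helps if the backend actually emits id: fields:

```shell
#!/bin/sh
# Hypothetical helper: read an SSE stream (or a saved copy) on stdin and
# print the most recent "id:" value, or an empty line if none was seen.
last_event_id() {
  last=""
  while IFS= read -r line || [ -n "$line" ]; do
    line=${line%"$(printf '\r')"}        # tolerate CRLF line endings
    case "$line" in
      id:*) last=${line#id:}; last=${last# } ;;
    esac
  done
  printf '%s\n' "$last"
}
```

One way to keep that copy is to tee the §3.1 stream into a log file (curl … | tee sse.log), then run LAST_ID=$(last_event_id < sse.log) after a drop and reconnect with -H "Last-Event-ID: $LAST_ID".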

4. GUI Layer Translation Rules

The backend assumes a graphical interface sits between it and the end user, so its replies may contain GUI instructions such as button labels or screen-level directions. Never relay these to the user verbatim; perform the corresponding action through the API instead.

Backend says	You do
"click [button]" / "点击"	Execute via API
"open [panel]" / "打开"

Keep content descriptions. Strip GUI actions.
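Stripping GUI actions can be approximated with a line filter. A sketch, assuming each GUI direction sits on its own line and contains one of the keywords from the table; strip_gui_actions is a hypothetical helper, and a keyword match will also drop legitimate lines such as "an open question":

```shell
#!/bin/sh
# Hypothetical filter: drop lines from a backend reply (stdin) that look like
# GUI directions ("click ...", "open ...", 点击, 打开) and keep the rest.
# "|| true" keeps the exit status 0 even when every line is filtered out.
strip_gui_actions() {
  grep -v -i -E '(^|[^[:alnum:]])(click|open)[[:space:]]|点击|打开' || true
}
```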

5. Conversation & Interaction Patterns

- Lead with intent, not mechanics: When a user asks to caption a video, confirm the source file and target language before touching any endpoint; rushing to upload without context produces misaligned SRT output.
- Translate edits into caption operations: Phrases like 'fix that line' or 'move the subtitle earlier' map to specific timing-adjustment calls; identify the correct operation rather than asking the user to specify an endpoint.
- Hold state across turns: The active session and the last-known caption timeline should persist through the entire conversation; never lose track of which subtitle block is currently in scope.
- Surface SRT data meaningfully: When returning caption results, present timing and text in a readable format rather than dumping raw file content; summarize changes made to the transcript when relevant.
- Anticipate the two-stage lag: After raw video is submitted, tell the user that a second processing stage will follow automatically; do not treat the first-stage caption delivery as the final result.

6. Known Constraints & Limitations

- File size and duration ceilings are enforced at the upload stage; submissions exceeding the thresholds are rejected before transcription begins.
- Only the supported caption formats (SRT, VTT, and burned-in render) are available for export; custom or proprietary subtitle formats are out of scope.
- Simultaneous sessions per credential are capped; attempting to open sessions beyond the limit returns an authorization error rather than queuing.
- The auto-subtitle engine processes one language track per job; multi-language caption generation requires separate, sequential submissions.
- BGM and title-card injection in stage two cannot be disabled or skipped; they are applied to every raw video job without exception.

7. Error Codes & Recovery Guidance

When a caption request goes wrong, the response code and message together tell you exactly where in the subtitle pipeline the failure occurred — map them using the table below.

Code	Meaning	Action
0	Success	Continue
1001

Common cases: no video in the draft → generate one first; render failure → retry with a new id; SSE timeout → reconnect per §3.6; silent edit response → use the §3.1 fallback.

8. API Version & Permission Scopes

Always verify the active API version before building a caption request — version mismatches silently alter field availability and can corrupt SRT timing data in the response. Token scopes govern which subtitle operations are permitted: a read-only scope covers state queries and credit checks, while a write scope is required for uploads, session creation, and export triggers. Confirm both the version header value and the granted scopes during initial integration; do not assume defaults carry the permissions needed for full caption pipeline access.

9. Integration Guide

Integrating the auto-subtitle-generator-online skill into your existing workflow is straightforward and requires no dedicated infrastructure on your end. Start by connecting your ClawHub account to your preferred storage layer — whether that's a cloud bucket, a direct file upload, or a public video URL. Once connected, the OpenClaw agent can accept video files in mp4, mov, avi, webm, or mkv formats up to the plan's size limit and begin transcription immediately.

For teams using content management systems or video hosting platforms, the skill supports webhook callbacks that notify your system when a subtitle file is ready for download or when a captioned video has been processed. This makes it easy to slot auto-subtitle-generator-online into automated publishing pipelines without manual intervention at the caption stage.

Developers building custom applications can invoke the skill programmatically through the ClawHub agent API, passing parameters such as target language, caption format (SRT, VTT, or burned-in), font preferences, and output destination. The skill returns structured metadata alongside the subtitle file, including word-level confidence scores and flagged segments where audio quality may have affected accuracy — giving your team clear signals on where a human review pass might add value.
