0. First Contact

When the user opens this skill or sends their first message, greet them immediately:

🚀 How To Add Music To Video at your service! Upload a video or tell me what you're looking for.

Try saying:

- "add a lo-fi beat"
- "add background music"
- "replace the audio track with jazz"

IMPORTANT: Always greet the user proactively on first contact, and let them know you're setting up while connecting.

Auto-Setup

When the user first interacts, set up the connection:

1. Check token: If the NEMO_TOKEN env var is set, use it. Otherwise:
2. Read or generate Client-ID:

- Read ~/.config/nemovideo/client_id if it exists
- Otherwise generate a UUID and save it to ~/.config/nemovideo/client_id

3. Acquire anonymous token:

   curl -s -X POST "https://mega-api-prod.nemovideo.ai/api/auth/anonymous-token" -H "X-Client-Id: $CLIENT_ID"

Store the returned token as NEMO_TOKEN for this session. You get 100 free credits.

4. Create a session (§3.0) so you're ready to work immediately.

Let the user know briefly: "Setting things up… ready!" then proceed with their request.

Drop a Track, Let the Agent Handle the Rest

Adding music to video has historically meant opening a timeline editor, manually aligning waveforms, and fiddling with keyframe volume curves. ClawHub's how-to-add-music-to-video skill replaces that workflow with a plain-language conversation. You describe what you want — a lo-fi background track that fades out in the last five seconds, for example — and the skill interprets that intent directly.

Under the hood, the OpenClaw agent parses your instruction, identifies the audio and video streams within your uploaded file, and determines optimal placement, volume normalization, and crossfade length based on the clip's duration and any timing cues you provide. It treats your words as the edit decision list, so there's no interface to learn.

This skill is particularly useful when you're working across a batch of clips that all need consistent audio branding, or when you're on a deadline and can't afford the setup time of a full editing suite. Whether your source file is an mp4 from a smartphone, a mov from a mirrorless camera, an avi from screen capture software, a webm exported from a browser tool, or an mkv from a media server, the skill ingests and outputs in the same format without quality loss.

Environment Variables

Variable	Required	Default
NEMO_TOKEN	No	Auto-generated (100 free credits, expires in 7 days, revocable via Settings → API Tokens)
NEMO_API_URL	No	https://mega-api-prod.nemovideo.ai

If NEMO_TOKEN is not set, get one (requires X-Client-Id header):

# Generate or read persisted Client-Id
CLIENT_ID="${NEMO_CLIENT_ID:-$(cat ~/.config/nemovideo/client_id 2>/dev/null)}"
if [ -z "$CLIENT_ID" ]; then
  CLIENT_ID=$(uuidgen 2>/dev/null || echo "client-$(date +%s)-$RANDOM")
  # Use && so the ID is written only after the directory exists (a single & would background mkdir)
  mkdir -p ~/.config/nemovideo && echo "$CLIENT_ID" > ~/.config/nemovideo/client_id
fi
curl -s -X POST "https://mega-api-prod.nemovideo.ai/api/auth/anonymous-token" -H "X-Client-Id: $CLIENT_ID"
# → {"code":0,"data":{"token":"nmv_usr_xxx","credits":100,...}}

Save token as NEMO_TOKEN, CLIENT_ID as NEMO_CLIENT_ID. Anonymous: 1 token per client per 7 days; token expires in 7 days and can be revoked at any time via Settings → API Tokens on nemovideo.com. If your token expires, request a new one with the same X-Client-Id.
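
A minimal sketch of capturing the token from that response, assuming the JSON shape shown in the comment above. The helper name extract_token is hypothetical, and the sed pattern is a simplification that only handles a flat, unescaped "token":"..." field; a proper JSON parser would be more robust.

```shell
#!/usr/bin/env bash
# Hypothetical helper: pull the "token" value out of the anonymous-token response.
# Assumes the token itself contains no escaped quotes.
extract_token() {
  printf '%s\n' "$1" | sed -n 's/.*"token":"\([^"]*\)".*/\1/p'
}

# Example with a canned response (no network call is made here).
RESPONSE='{"code":0,"data":{"token":"nmv_usr_xxx","credits":100}}'
NEMO_TOKEN=$(extract_token "$RESPONSE")
export NEMO_TOKEN
echo "$NEMO_TOKEN"   # prints nmv_usr_xxx
```

In real use, RESPONSE would hold the output of the curl call above. An empty result means the request did not return a token and should be treated as a failure.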

Local persistence: This skill writes ~/.config/nemovideo/client_id to persist the Client-Id across sessions. This avoids generating a new ID on every request, which would hit the per-IP rate limit quickly (default 10 tokens per 7 days per IP). The file contains only a UUID — no credentials are stored locally.

2. Routing Incoming Requests to the Correct Endpoint

Use the table below to determine which endpoint should handle each type of incoming request.

User says...	Action	Skip SSE
"export" / "导出" / "download" / "send me the video"	→ §3.5 Export	✅
"credits" / "积分" / "balance" / "余额"	→ §3.3 Credits	✅
"status" / "状态" / "show tracks"	→ §3.4 State	✅
"upload" / "上传" / user sends file	→ §3.2 Upload	✅
Everything else (generate, edit, add BGM…)	→ §3.1 SSE	❌
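
The routing table can be sketched as a keyword match. This is only an illustration: the function name route_request and its return labels are hypothetical, matching is plain substring matching (so a sentence that merely mentions "status" would be routed to the state query), and the "user sends file" case has to be detected separately because it is not visible in the message text.

```shell
#!/usr/bin/env bash
# Hypothetical router: map a user message to the flow that should handle it.
route_request() {
  local msg
  # Lowercase ASCII letters so "Export" and "export" match alike.
  msg=$(printf '%s' "$1" | tr 'A-Z' 'a-z')
  case "$msg" in
    *export*|*导出*|*download*|*"send me the video"*) echo "export" ;;   # §3.5
    *credits*|*积分*|*balance*|*余额*)                 echo "credits" ;;  # §3.3
    *status*|*状态*|*"show tracks"*)                   echo "state" ;;    # §3.4
    *upload*|*上传*)                                   echo "upload" ;;   # §3.2
    *)                                                 echo "sse" ;;      # §3.1
  esac
}

route_request "add a lo-fi beat"   # prints sse
```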

3. Primary Operation Flows

$API = ${NEMO_API_URL:-https://mega-api-prod.nemovideo.ai}, $TOKEN = ${NEMO_TOKEN}, $WEB = ${NEMO_WEB_URL:-https://nemovideo.com}.

Skill attribution variables — do NOT hardcode:

- $SKILL_NAME = this file's YAML frontmatter name
- $SKILL_VERSION = this file's YAML frontmatter version
- $SKILL_SOURCE = detect at runtime using the first match:

1. Env SKILL_SOURCE if set
2. If this file's path contains a known platform directory (e.g. ~/.clawhub/ → clawhub, ~/.cursor/skills/ → cursor)
3. Fallback: unknown
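
The detection order above might look like the following sketch. The function name detect_skill_source is hypothetical, the caller is assumed to know the skill file's path, only the two example directories from the list are covered, and the fallback value "unknown" is an assumption.

```shell
#!/usr/bin/env bash
# Hypothetical helper: resolve the platform label for the X-Skill-Platform header.
# $1: absolute path of the skill file (assumed to be available to the caller).
detect_skill_source() {
  # 1. An explicit environment override wins.
  if [ -n "${SKILL_SOURCE:-}" ]; then
    echo "$SKILL_SOURCE"
    return 0
  fi
  # 2. Otherwise infer the platform from a known directory in the path.
  case "$1" in
    */.clawhub/*)       echo "clawhub" ;;
    */.cursor/skills/*) echo "cursor" ;;
    # 3. Fallback when nothing matches (assumed value).
    *)                  echo "unknown" ;;
  esac
}
```

Usage would be along the lines of SKILL_SOURCE=$(detect_skill_source "/path/to/SKILL.md"), with further platform directories added as extra case branches.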

CRITICAL: ALL API requests (including render/export/upload/state/credits) MUST include these headers. Missing them will cause export to fail with 402.

X-Skill-Source: $SKILL_NAME
X-Skill-Version: $SKILL_VERSION
X-Skill-Platform: $SKILL_SOURCE

3.0 Initializing a Session

curl -s -X POST "https://mega-api-prod.nemovideo.ai/api/tasks/me/with-session/nemo_agent" \
  -H "Authorization: Bearer $TOKEN" -H "Content-Type: application/json" \
  -H "X-Skill-Source: $SKILL_NAME" -H "X-Skill-Version: $SKILL_VERSION" -H "X-Skill-Platform: $SKILL_SOURCE" \
  -d '{"task_name":"project","language":"<lang>"}'
# → {"code":0,"data":{"task_id":"...","session_id":"..."}}

Before any operations can begin, a session must be established. The session_id returned here is required for all subsequent requests within the same workflow.

Open in browser: After creating a session, give the user a link to view/edit the task in NemoVideo:
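$WEB/workspace/claim?task={task_id}&session={session_id}&skill_name=$SKILL_NAME&skill_version=$SKILL_VERSION&skill_source=$SKILL_SOURCE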

3.1 Delivering Messages Through SSE

curl -s -X POST "https://mega-api-prod.nemovideo.ai/run_sse" \
  -H "Authorization: Bearer $TOKEN" -H "Content-Type: application/json" \
  -H "Accept: text/event-stream" -H "X-Skill-Source: $SKILL_NAME" -H "X-Skill-Version: $SKILL_VERSION" -H "X-Skill-Platform: $SKILL_SOURCE" --max-time 900 \
  -d '{"app_name":"nemo_agent","user_id":"me","session_id":"<sid>","new_message":{"parts":[{"text":"<msg>"}]}}'

All conversational messages to the backend are transmitted via Server-Sent Events, keeping the connection open for streamed responses.

SSE Handling

Event	Action
Text response	Apply GUI translation (§4), present to user
Tool call/result

Typical durations: text 5-15s, video generation 100-300s, editing 10-30s.

Timeout: 10 min heartbeats-only → assume timeout. Never re-send during generation (duplicates + double-charge).

Ignore trailing "I encountered a temporary issue" if prior responses were normal.

Silent Response Fallback (CRITICAL)

Roughly 30% of editing operations complete without returning any text in the response stream. When no text is received: (1) do not treat this as an error or retry the request, (2) immediately call the state query endpoint to confirm the operation completed, (3) surface a status confirmation to the user based on that query result.
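
The fallback can be sketched as follows. The handler name handle_edit_result and its confirmation messages are hypothetical, the state request reuses the endpoint from §3.4, and checking for a "draft" key is a deliberately crude stand-in for properly inspecting the returned state.

```shell
#!/usr/bin/env bash
API="${NEMO_API_URL:-https://mega-api-prod.nemovideo.ai}"

# Query the latest session state (endpoint from §3.4).
fetch_state() {
  curl -s "$API/api/state/nemo_agent/me/$1/latest" \
    -H "Authorization: Bearer $NEMO_TOKEN" \
    -H "X-Skill-Source: $SKILL_NAME" -H "X-Skill-Version: $SKILL_VERSION" \
    -H "X-Skill-Platform: $SKILL_SOURCE"
}

# Hypothetical handler: $1 = text collected from the SSE stream, $2 = session id.
handle_edit_result() {
  local text=$1 sid=$2 state
  if [ -n "$text" ]; then
    printf '%s\n' "$text"   # normal case: pass the backend's text along
    return 0
  fi
  # Silent completion: do not retry; confirm through the state endpoint instead.
  state=$(fetch_state "$sid")
  case "$state" in
    *'"draft"'*) echo "Edit applied. Current draft state retrieved." ;;
    *)           echo "No response text; state could not be confirmed." ;;
  esac
}
```

Keeping the state request in its own function makes it easy to swap in a canned response while testing the handler offline.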

Two-stage generation: When a raw video is submitted, the backend automatically runs a two-stage enrichment pipeline: Stage 1 processes and stores the raw footage, then Stage 2 independently attaches background music and generates a title without any additional prompt from the client. Both stages must reach a completed state before the export flow is initiated.

3.2 Handling File Uploads

File upload: INLINECODE39

URL upload: INLINECODE40

Use me in the path; backend resolves user from token.

Supported: mp4, mov, avi, webm, mkv, jpg, png, gif, webp, mp3, wav, m4a, aac.

The upload endpoint accepts video and audio files that will be referenced in subsequent editing operations.
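
Checking the extension before uploading avoids a wasted request. The sketch below is an assumption about how a client might pre-validate files; the function name is_supported_upload is hypothetical, and it looks only at the file name, not at the actual file contents.

```shell
#!/usr/bin/env bash
# Hypothetical pre-check: succeed only for extensions in the supported list.
is_supported_upload() {
  local ext
  # A name without a dot has no extension at all.
  case "$1" in
    *.*) ;;
    *)   return 1 ;;
  esac
  # Take the text after the last dot and lowercase it.
  ext=$(printf '%s' "${1##*.}" | tr 'A-Z' 'a-z')
  case "$ext" in
    mp4|mov|avi|webm|mkv|jpg|png|gif|webp|mp3|wav|m4a|aac) return 0 ;;
    *) return 1 ;;
  esac
}

is_supported_upload "clip.MP4" && echo "ok to upload"
```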

3.3 Checking Available Credits

curl -s "https://mega-api-prod.nemovideo.ai/api/credits/balance/simple" -H "Authorization: Bearer $TOKEN" \
  -H "X-Skill-Source: $SKILL_NAME" -H "X-Skill-Version: $SKILL_VERSION" -H "X-Skill-Platform: $SKILL_SOURCE"
# → {"code":0,"data":{"available":XXX,"frozen":XX,"total":XXX}}

Query the credits endpoint prior to initiating any operation that consumes credits to confirm the account has a sufficient balance.
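
A minimal sketch of that pre-flight check, assuming the response shape shown in the comment above. Both function names are hypothetical, the sed extraction assumes "available" is a plain integer, and the required amount passed as the second argument is the caller's own estimate since no per-operation cost is given here.

```shell
#!/usr/bin/env bash
# Hypothetical helper: read the "available" figure from the balance response.
available_credits() {
  printf '%s\n' "$1" | sed -n 's/.*"available":\([0-9][0-9]*\).*/\1/p'
}

# Succeeds when the available balance covers the estimated cost in $2.
has_enough_credits() {
  local available
  available=$(available_credits "$1")
  [ -n "$available" ] && [ "$available" -ge "$2" ]
}

# Example with a canned response (no network call is made here).
BALANCE='{"code":0,"data":{"available":42,"frozen":0,"total":100}}'
has_enough_credits "$BALANCE" 10 && echo "enough credits"
```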

3.4 Polling for Operation State

curl -s "https://mega-api-prod.nemovideo.ai/api/state/nemo_agent/me/<sid>/latest" -H "Authorization: Bearer $TOKEN" \
  -H "X-Skill-Source: $SKILL_NAME" -H "X-Skill-Version: $SKILL_VERSION" -H "X-Skill-Platform: $SKILL_SOURCE"

Use me for user in path; backend resolves from token. Key fields: data.state.draft, data.state.video_infos, data.state.canvas_config, data.state.generated_media.

Draft field mapping: t=tracks, tt=track type (0=video, 1=audio, 7=text), sg=segments, d=duration(ms), m=metadata.

Draft ready for export when draft.t exists with at least one track with non-empty sg.

Track summary format:
CODEBLOCK7

3.5 Exporting and Delivering the Final Asset

Export does NOT cost credits. Only generation/editing consumes credits.

To complete delivery: call the export endpoint with the session ID, poll until the export status is completed, retrieve the download URL from the response payload, transfer the file to the user, then confirm receipt and close the session.

b) Submit: INLINECODE52

Note: sessionId is camelCase (exception). On failure → new id, retry once.

c) Poll (every 30s, max 10 polls): INLINECODE55

Status at top-level status: pending → processing → completed / failed. Download URL at output.url.

d) Download from output.url → send to user. Fallback: https://mega-api-prod.nemovideo.ai/api/render/proxy/<id>/download.

e) When delivering the video, always also give the task detail link: INLINECODE60

Progress messages: start "⏳ Rendering ~30s" → "⏳ 50%" → "✅ Video ready!" + file + task detail link.
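
The polling step can be sketched as a bounded loop. Everything named here is hypothetical: fetch_render_status is a stub standing in for the real poll request (the endpoint is not reproduced in this sketch), and the sed extraction assumes a flat top-level "status" string and a single unescaped "url" value.

```shell
#!/usr/bin/env bash
POLL_INTERVAL="${POLL_INTERVAL:-30}"   # seconds between polls
MAX_POLLS="${MAX_POLLS:-10}"           # give up after this many polls

# Stub (assumption): replace with the real poll request for render id $1.
fetch_render_status() {
  echo '{"status":"pending"}'
}

# Simplified field extraction; a JSON parser would be more robust.
parse_status() {
  printf '%s\n' "$1" | sed -n 's/.*"status":"\([^"]*\)".*/\1/p'
}
parse_url() {
  printf '%s\n' "$1" | sed -n 's/.*"url":"\([^"]*\)".*/\1/p'
}

# Prints the download URL and returns 0 on success, 1 on failure, 2 on timeout.
poll_render() {
  local id=$1 i=0 body status
  while [ "$i" -lt "$MAX_POLLS" ]; do
    body=$(fetch_render_status "$id")
    status=$(parse_status "$body")
    case "$status" in
      completed) parse_url "$body"; return 0 ;;
      failed)    echo "render failed" >&2; return 1 ;;
    esac
    i=$((i + 1))
    # Sleep only if another poll will follow.
    if [ "$i" -lt "$MAX_POLLS" ]; then
      sleep "$POLL_INTERVAL"
    fi
  done
  echo "render timed out after $MAX_POLLS polls" >&2
  return 2
}
```

With the defaults, the loop waits at most about four and a half minutes (nine sleeps of 30 seconds) before reporting a timeout.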

3.6 Recovering from an SSE Disconnection

If the SSE stream drops unexpectedly, follow these steps: (1) Wait 2 seconds before attempting any recovery action to avoid a thundering-herd retry. (2) Re-authenticate and obtain a fresh session token using the existing session ID rather than creating a new session. (3) Re-open the SSE connection with the updated token. (4) Query the current operation state to determine whether the last action completed, is still running, or failed. (5) Resume from the last confirmed state — do not re-submit any operation already marked as completed.

4. Translating GUI Elements for the Backend

The backend operates under the assumption that a graphical interface is present, so GUI-specific instructions must be translated into API calls and must never be forwarded verbatim to the backend.

Backend says	You do
"click [button]" / "点击"	Execute via API
"open [panel]" / "打开"

Keep content descriptions. Strip GUI actions.

5. Recommended Interaction Patterns

• Always confirm a session is active and valid before dispatching any operation request.
• After every SSE message, check for a silent completion and query state if no text response is received.
• Present progress updates to the user during long-running operations by surfacing intermediate state-query results.
• Translate all user-facing language about music, trimming, or effects into the appropriate API action rather than passing natural language directly to the backend.
• When a two-stage pipeline is triggered, wait for both stages to finish before offering the export option to the user.

6. Known Limitations

• A single session cannot run more than one long-running operation concurrently; queue additional requests until the active operation resolves.
• File uploads are capped at the size limit defined in the upload endpoint documentation; files exceeding this limit must be rejected before submission.
• SSE connections do not persist indefinitely and will time out under prolonged inactivity; implement the disconnection recovery flow proactively.
• Credit balance is not updated in real time; always re-query after an operation completes rather than relying on a cached value.
• Background music selection during the two-stage pipeline is handled entirely by the backend and cannot be overridden mid-pipeline.

7. Error Handling Reference

The table below maps each API response code to its likely cause and the recommended recovery action.

Code	Meaning	Action
0	Success	Continue
1001

Common: no video → generate first; render fail → retry new id; SSE timeout → §3.6; silent edit → §3.1 fallback.

8. API Version and Required Token Scopes

Before going live, verify that the integration targets the correct API version by checking the version field in the base endpoint response. The access token must include all scopes required for the operations in use — at minimum: session:write, media:upload, media:export, and credits:read. Tokens missing any required scope will receive a 403 response; re-authorize with the full scope set rather than retrying with the same token.
