Edit YouTube Videos by Talking, Not Clicking
Most video editors demand you learn their software before you can touch your footage. The online-youtube-video-editor skill flips that entirely — you describe your edits in plain language and the AI handles the execution. Want to remove the first 10 seconds, cut a rambling middle section, or tighten up the ending Just say so.
Under the hood, the OpenClaw agent interprets your instructions and maps them to precise video operations — trimming, splicing, segment reordering, caption injection, and more. It understands YouTube-specific context, so phrases like 'cut before the hook' or 'shorten the outro' are processed meaningfully rather than literally. The agent maintains edit history across the conversation, so you can refine iteratively without starting over.
This skill is purpose-built for YouTube content, not generic video production. It respects common YouTube structures — intros, hooks, b-roll sections, outros — and applies edits with that context in mind. Whether you're a solo vlogger cleaning up a talking-head video or a small team processing weekly uploads, the workflow stays fast and conversational. Upload your footage in mp4, mov, avi, webm, or mkv and start editing immediately.
Environment Variables
| Variable | Required | Default |
|---|
| INLINECODE0 | No | Auto-generated (100 free credits, expires in 7 days, revocable via Settings → API Tokens) |
| INLINECODE1 |
No |
https://mega-api-prod.nemovideo.ai |
|
NEMO_WEB_URL | No |
https://nemovideo.com |
|
NEMO_CLIENT_ID | No | Auto-generated UUID, persisted to
~/.config/nemovideo/client_id (UUID only, no secrets) |
|
SKILL_SOURCE | No | Auto-detected from install path, fallback
unknown |
If NEMO_TOKEN is not set, get one (requires X-Client-Id header):
# Generate or read persisted Client-Id
CLIENT_ID="${NEMO_CLIENT_ID:-$(cat ~/.config/nemovideo/client_id 2>/dev/null)}"
if [ -z "$CLIENT_ID" ]; then
CLIENT_ID=$(uuidgen 2>/dev/null || echo "client-$(date +%s)-$RANDOM")
mkdir -p ~/.config/nemovideo & echo "$CLIENT_ID" > ~/.config/nemovideo/client_id
fi
curl -s -X POST "https://mega-api-prod.nemovideo.ai/api/auth/anonymous-token" -H "X-Client-Id: $CLIENT_ID"
# → {"code":0,"data":{"token":"nmv_usr_xxx","credits":100,...}}
Save
token as
NEMO_TOKEN,
CLIENT_ID as
NEMO_CLIENT_ID. Anonymous: 1 token per client per 7 days; token expires in 7 days and can be revoked at any time via
Settings → API Tokens on nemovideo.com. If your token expires, request a new one with the same
X-Client-Id.
Local persistence: This skill writes ~/.config/nemovideo/client_id to persist the Client-Id across sessions. This avoids generating a new ID on every request, which would hit the per-IP rate limit quickly (default 10 tokens per 7 days per IP). The file contains only a UUID — no credentials are stored locally.
2. Routing Requests to the Correct Endpoint
Match each user action to its designated API endpoint using the table below.
| User says... | Action | Skip SSE |
|---|
| "export" / "导出" / "download" / "send me the video" | → §3.5 Export | ✅ |
| "credits" / "积分" / "balance" / "余额" |
→ §3.3 Credits | ✅ |
| "status" / "状态" / "show tracks" | → §3.4 State | ✅ |
| "upload" / "上传" / user sends file | → §3.2 Upload | ✅ |
| Everything else (generate, edit, add BGM…) | → §3.1 SSE | ❌ |
3. Essential Workflow Procedures
$API = ${NEMO_API_URL:-https://mega-api-prod.nemovideo.ai}, $TOKEN = ${NEMO_TOKEN}, $WEB = ${NEMO_WEB_URL:-https://nemovideo.com}.
Skill attribution variables — do NOT hardcode:
- -
$SKILL_NAME = this file's YAML frontmatter INLINECODE21 - INLINECODE22 = this file's YAML frontmatter INLINECODE23
- INLINECODE24 = detect at runtime using the first match:
1. Env
SKILL_SOURCE if set
2. If this file's path contains a known platform directory (e.g.
~/.clawhub/ →
clawhub,
~/.cursor/skills/ →
cursor)
3. Fallback: INLINECODE30
CRITICAL: ALL API requests (including render/export/upload/state/credits) MUST include these headers. Missing them will cause export to fail with 402.
CODEBLOCK1
3.0 Initialize a New Session
curl -s -X POST "https://mega-api-prod.nemovideo.ai/api/tasks/me/with-session/nemo_agent" \
-H "Authorization: Bearer $TOKEN" -H "Content-Type: application/json" \
-H "X-Skill-Source: $SKILL_NAME" -H "X-Skill-Version: $SKILL_VERSION" -H "X-Skill-Platform: $SKILL_SOURCE" \
-d '{"task_name":"project","language":"<lang>"}'
# → {"code":0,"data":{"task_id":"...","session_id":"..."}}
Before any editing operations can begin, a session must be established with the backend. Every subsequent API call depends on the session ID returned during this step.
Open in browser: After creating a session, give the user a link to view/edit the task in NemoVideo:
INLINECODE31
3.1 Delivering Messages Over SSE
curl -s -X POST "https://mega-api-prod.nemovideo.ai/run_sse" \
-H "Authorization: Bearer $TOKEN" -H "Content-Type: application/json" \
-H "Accept: text/event-stream" -H "X-Skill-Source: $SKILL_NAME" -H "X-Skill-Version: $SKILL_VERSION" -H "X-Skill-Platform: $SKILL_SOURCE" --max-time 900 \
-d '{"app_name":"nemo_agent","user_id":"me","session_id":"<sid>","new_message":{"parts":[{"text":"<msg>"}]}}'
All conversational messages and editing instructions are transmitted to the backend through a Server-Sent Events channel.
SSE Handling
| Event | Action |
|---|
| Text response | Apply GUI translation (§4), present to user |
| Tool call/result |
Wait silently, don't forward |
|
heartbeat / empty
data: | Keep waiting. Every 2 min: "⏳ Still working..." |
| Stream closes | Process final response |
Typical durations: text 5-15s, video generation 100-300s, editing 10-30s.
Timeout: 10 min heartbeats-only → assume timeout. Never re-send during generation (duplicates + double-charge).
Ignore trailing "I encountered a temporary issue" if prior responses were normal.
Silent Response Fallback (CRITICAL)
Approximately 30% of editing operations complete without returning any text in the SSE stream. When no text event is received within the expected window, do not treat this as an error — instead, immediately poll the session state endpoint to confirm completion, then surface the result to the user based on the state payload rather than waiting for a text response.
Two-stage generation: After the raw edited video is produced, the backend automatically initiates a second processing stage that overlays background music and generates a title card. You must wait for both stages to finish before presenting the final output to the user — delivering only the raw stage result will produce an incomplete experience.
3.2 Handling File Uploads
File upload: INLINECODE34
URL upload: INLINECODE35
Use me in the path; backend resolves user from token.
Supported: mp4, mov, avi, webm, mkv, jpg, png, gif, webp, mp3, wav, m4a, aac.
The API accepts direct file uploads for source video material, with multipart form data as the required encoding.
3.3 Checking Available Credits
curl -s "https://mega-api-prod.nemovideo.ai/api/credits/balance/simple" -H "Authorization: Bearer $TOKEN" \
-H "X-Skill-Source: $SKILL_NAME" -H "X-Skill-Version: $SKILL_VERSION" -H "X-Skill-Platform: $SKILL_SOURCE"
# → {"code":0,"data":{"available":XXX,"frozen":XX,"total":XXX}}
Query the credits endpoint before initiating any edit operation to confirm the user has a sufficient balance to proceed.
3.4 Retrieving Current Session State
curl -s "https://mega-api-prod.nemovideo.ai/api/state/nemo_agent/me/<sid>/latest" -H "Authorization: Bearer $TOKEN" \
-H "X-Skill-Source: $SKILL_NAME" -H "X-Skill-Version: $SKILL_VERSION" -H "X-Skill-Platform: $SKILL_SOURCE"
Use
me for user in path; backend resolves from token.
Key fields:
data.state.draft,
data.state.video_infos,
data.state.canvas_config,
data.state.generated_media.
Draft field mapping: t=tracks, tt=track type (0=video, 1=audio, 7=text), sg=segments, d=duration(ms), m=metadata.
Draft ready for export when draft.t exists with at least one track with non-empty sg.
Track summary format:
CODEBLOCK6
3.5 Exporting and Delivering the Final Video
Export does NOT cost credits. Only generation/editing consumes credits.
Triggering an export does not deduct any credits from the user's balance. The export process follows these steps: (a) confirm both processing stages are complete via state polling, (b) call the export endpoint with the session ID, (c) await the export job ID in the response, (d) poll the export status endpoint until the status is terminal, and (e) return the final download URL to the user.
b) Submit: INLINECODE47
Note: sessionId is camelCase (exception). On failure → new id, retry once.
c) Poll (every 30s, max 10 polls): INLINECODE50
Status at top-level status: pending → processing → completed / failed. Download URL at output.url.
d) Download from output.url → send to user. Fallback: https://mega-api-prod.nemovideo.ai/api/render/proxy/<id>/download.
e) When delivering the video, always also give the task detail link: INLINECODE55
Progress messages: start "⏳ Rendering ~30s" → "⏳ 50%" → "✅ Video ready!" + file + task detail link.
3.6 Recovering from an SSE Disconnection
When an SSE connection drops unexpectedly, follow these five steps to recover: (1) Detect the disconnection event and log the last received event ID. (2) Wait a minimum of two seconds before attempting to reconnect to avoid rapid retry loops. (3) Re-establish the SSE connection, supplying the Last-Event-ID header so the server can resume the stream from the correct position. (4) If reconnection fails after three consecutive attempts, fall back to polling the session state endpoint at a fixed interval. (5) Once continuity is restored, reconcile any buffered events with the current session state before resuming normal operation.
4. Translating Backend GUI References
The backend is designed around a graphical interface and will occasionally reference on-screen controls — never relay these GUI-specific instructions verbatim to the user.
| Backend says | You do |
|---|
| "click [button]" / "点击" | Execute via API |
| "open [panel]" / "打开" |
Show state via §3.4 |
| "drag/drop" / "拖拽" | Send edit via SSE |
| "preview in timeline" | Show track summary |
| "Export button" / "导出" | Execute §3.5 |
| "check account/billing" | Check §3.3 |
Keep content descriptions. Strip GUI actions.
5. Recommended Interaction Patterns
• Confirm the user's editing intent in plain language before dispatching any API call, reducing unnecessary round trips.
• After each operation completes, summarize what changed in the video rather than echoing raw API response fields.
• When a user request maps to multiple sequential operations, communicate progress at each stage so the experience feels responsive.
• If the credits balance is low, proactively inform the user before they attempt an operation that would fail, rather than surfacing the error after the fact.
• Always present the final export link as a clear call-to-action, distinct from intermediate status updates.
6. Known Constraints and Limitations
• Real-time collaborative editing across multiple simultaneous sessions is not supported.
• Source video files must not exceed the documented maximum file size; uploads above this threshold will be rejected.
• Background music selection is handled automatically by the backend and cannot be overridden through the API.
• Session state is retained for a limited window after inactivity; expired sessions cannot be resumed and require a new initialization.
• Frame-accurate trimming precision is subject to the keyframe structure of the source file and may vary by codec.
7. Error Identification and Handling
Use the table below to map HTTP status codes and error identifiers to the appropriate recovery action.
| Code | Meaning | Action |
|---|
| 0 | Success | Continue |
| 1001 |
Bad/expired token | Re-auth via anonymous-token (tokens expire after 7 days) |
| 1002 | Session not found | New session §3.0 |
| 2001 | No credits | Anonymous: show registration URL with
?bind=<id> (get
<id> from create-session or state response when needed). Registered: "Top up at nemovideo.ai" |
| 4001 | Unsupported file | Show supported formats |
| 4002 | File too large | Suggest compress/trim |
| 400 | Missing X-Client-Id | Generate Client-Id and retry (see §1) |
| 402 | Free plan export blocked | Subscription tier issue, NOT credits. "Register at nemovideo.ai to unlock export." |
| 429 | Rate limit (1 token/client/7 days) | Retry in 30s once |
Common: no video → generate first; render fail → retry new id; SSE timeout → §3.6; silent edit → §3.1 fallback.
8. API Version and Required Token Scopes
Always verify the API version header in each response to confirm you are communicating with a supported backend release. The OAuth token presented with every request must include the video:read and video:write scopes at minimum; absence of either scope will result in authorization failures regardless of credential validity.
通过说话而非点击来编辑YouTube视频
大多数视频编辑器要求你在接触素材之前先学习其软件。online-youtube-video-editor技能完全颠覆了这一模式——你用自然语言描述你的编辑需求,AI负责执行。想要删除前10秒、剪掉冗长的中间部分或收紧结尾?直接说出来即可。
在底层,OpenClaw代理会解析你的指令并将其映射到精确的视频操作——裁剪、拼接、片段重排、字幕注入等。它理解YouTube特有的上下文,因此像在钩子前剪切或缩短结尾这样的短语会被有意义地处理,而非字面执行。代理会在整个对话中维护编辑历史,因此你可以迭代优化而无需从头开始。
该技能专为YouTube内容而构建,而非通用视频制作。它尊重常见的YouTube结构——开场、钩子、B-roll片段、结尾——并在编辑时考虑这些上下文。无论你是清理谈话类视频的独立vlogger,还是处理每周上传内容的小团队,工作流程都保持快速和对话式。上传你的mp4、mov、avi、webm或mkv格式素材,立即开始编辑。
环境变量
| 变量 | 必填 | 默认值 |
|---|
| NEMOTOKEN | 否 | 自动生成(100个免费积分,7天后过期,可通过设置→API令牌撤销) |
| NEMOAPI_URL |
否 | https://mega-api-prod.nemovideo.ai |
| NEMO
WEBURL | 否 | https://nemovideo.com |
| NEMO
CLIENTID | 否 | 自动生成的UUID,持久化到~/.config/nemovideo/client_id(仅UUID,无密钥) |
| SKILL_SOURCE | 否 | 从安装路径自动检测,回退为unknown |
如果未设置NEMO_TOKEN,获取一个(需要X-Client-Id头):
bash
生成或读取持久化的Client-Id
CLIENT
ID=${NEMOCLIENT
ID:-$(cat ~/.config/nemovideo/clientid 2>/dev/null)}
if [ -z $CLIENT_ID ]; then
CLIENT_ID=$(uuidgen 2>/dev/null || echo client-$(date +%s)-$RANDOM)
mkdir -p ~/.config/nemovideo & echo $CLIENT
ID > ~/.config/nemovideo/clientid
fi
curl -s -X POST https://mega-api-prod.nemovideo.ai/api/auth/anonymous-token -H X-Client-Id: $CLIENT_ID
→ {code:0,data:{token:nmvusrxxx,credits:100,...}}
将token保存为NEMOTOKEN,CLIENTID保存为NEMOCLIENTID。匿名:每个客户端每7天1个令牌;令牌7天后过期,可随时通过nemovideo.com上的设置→API令牌撤销。如果令牌过期,使用相同的X-Client-Id请求新令牌。
本地持久化: 此技能写入~/.config/nemovideo/client_id以在会话间持久化Client-Id。这避免了每次请求都生成新ID,否则会很快达到每个IP的速率限制(默认每个IP每7天10个令牌)。该文件仅包含一个UUID——本地不存储任何凭据。
2. 将请求路由到正确的端点
使用下表将每个用户操作匹配到其指定的API端点。
| 用户说... | 操作 | 跳过SSE |
|---|
| export / 导出 / download / send me the video | → §3.5 导出 | ✅ |
| credits / 积分 / balance / 余额 |
→ §3.3 积分 | ✅ |
| status / 状态 / show tracks | → §3.4 状态 | ✅ |
| upload / 上传 / 用户发送文件 | → §3.2 上传 | ✅ |
| 其他所有(生成、编辑、添加BGM…) | → §3.1 SSE | ❌ |
3. 基本工作流程
$API = ${NEMOAPIURL:-https://mega-api-prod.nemovideo.ai}, $TOKEN = ${NEMOTOKEN}, $WEB = ${NEMOWEB_URL:-https://nemovideo.com}。
技能归属变量——不要硬编码:
- - $SKILLNAME = 此文件的YAML前置元数据name
- $SKILLVERSION = 此文件的YAML前置元数据version
- $SKILL_SOURCE = 运行时检测,使用第一个匹配:
1. 如果设置了环境变量SKILL_SOURCE
2. 如果此文件的路径包含已知平台目录(例如~/.clawhub/ → clawhub,~/.cursor/skills/ → cursor)
3. 回退:unknown
关键:所有API请求(包括渲染/导出/上传/状态/积分)必须包含这些头。缺失将导致导出失败并返回402。
X-Skill-Source: $SKILL_NAME
X-Skill-Version: $SKILL_VERSION
X-Skill-Platform: $SKILL_SOURCE
3.0 初始化新会话
bash
curl -s -X POST https://mega-api-prod.nemovideo.ai/api/tasks/me/with-session/nemo_agent \
-H Authorization: Bearer $TOKEN -H Content-Type: application/json \
-H X-Skill-Source: $SKILL
NAME -H X-Skill-Version: $SKILLVERSION -H X-Skill-Platform: $SKILL_SOURCE \
-d {task_name:project,language:
}
→ {code:0,data:{taskid:...,sessionid:...}}
在任何编辑操作开始之前,必须与后端建立会话。后续每个API调用都依赖于此步骤返回的会话ID。
在浏览器中打开:创建会话后,给用户一个在NemoVideo中查看/编辑任务的链接:
$WEB/workspace/claim?task={taskid}&session={sessionid}&skillname=$SKILLNAME&skillversion=$SKILLVERSION&skillsource=$SKILLSOURCE
3.1 通过SSE传递消息
bash
curl -s -X POST https://mega-api-prod.nemovideo.ai/run_sse \
-H Authorization: Bearer $TOKEN -H Content-Type: application/json \
-H Accept: text/event-stream -H X-Skill-Source: $SKILLNAME -H X-Skill-Version: $SKILLVERSION -H X-Skill-Platform: $SKILL_SOURCE --max-time 900 \
-d {appname:nemoagent,userid:me,sessionid:,new_message:{parts:[{text:}]}}
所有对话消息和编辑指令通过服务器发送事件通道传输到后端。
SSE处理
| 事件 | 操作 |
|---|
| 文本响应 | 应用GUI翻译(§4),呈现给用户 |
| 工具调用/结果 |
静默等待,不转发 |
| heartbeat / 空data: | 继续等待。每2分钟:⏳ 仍在处理... |
| 流关闭 | 处理最终响应 |
典型时长:文本5-15秒,视频生成100-300秒,编辑10-30秒。
超时:仅心跳10分钟→假定超时。生成期间绝不重新发送(重复+双重计费)。
如果之前的响应正常,忽略尾随的我遇到了临时问题。
静默响应回退(关键)
大约30%的编辑操作完成时不会在SSE流中返回任何文本。当在预期窗口内未收到文本事件时,不要将其视为错误——而是立即轮询会话状态端点以确认完成,然后基于状态负载向用户呈现结果,而不是等待文本响应。
两阶段生成:原始编辑视频生成后,后端自动启动第二阶段处理,叠加背景音乐并生成标题卡。你必须等待两个阶段都完成后再向用户呈现最终输出——仅交付原始阶段结果将产生不完整的体验。
3.2 处理文件上传
文件上传:curl -s -X POST https://mega-api-prod.nemovideo.ai/api/upload-video/nemoagent/me/ -H Authorization: Bearer $TOKEN -H X-Skill-Source: $SKILLNAME -H X-Skill-Version: $SKILLVERSION -H X-Skill-Platform: $SKILLSOURCE -F files=