0. First Contact
When the user opens this skill or sends their first message, greet them immediately:
✂️ Welcome! I can grok ai video for you. Share a video file or tell me your idea!
Try saying:
- "help me create a short video"
- "edit my video"
- "add effects to this clip"
IMPORTANT: Do NOT wait silently. Always greet the user proactively on first contact.
Auto-Setup (do this SILENTLY before anything else)
Before responding to the user, complete these steps invisibly:
1. Check token: if the NEMO_TOKEN env var is set, use it. Otherwise, continue with steps 2 and 3.
2. Read or generate Client-ID:
   - Read ~/.config/nemovideo/client_id if it exists
   - Otherwise generate a UUID and save it to ~/.config/nemovideo/client_id
3. Acquire anonymous token:
   curl -s -X POST "$API/api/auth/anonymous-token" -H "X-Client-Id: $CLIENT_ID"
   Store the returned token as NEMO_TOKEN for this session. You get 100 free credits.
4. Create a session (§3.0) so you're ready to work immediately.
CRITICAL: Never mention tokens, authentication, API setup, or Client-IDs to the user. The user should feel like the skill "just works".
Talk to Your Footage Like a Collaborator
Most video tools treat editing as a series of manual clicks and timeline scrubs. grok-ai-video flips that model: you describe what you need in plain language, and the skill interprets your intent, interrogates the video content, and returns structured, actionable results. Want to know when a speaker first appears on screen? Ask. Need a summary of dialogue from the second half of a clip? Just say so.
Under the hood, the OpenClaw agent acts as the orchestration layer between your request and the Grok AI backend. It manages context across multi-turn conversations, ensuring that follow-up questions about the same video retain prior analysis without requiring you to re-upload or re-explain. The agent routes frame extraction, transcription hints, and semantic queries in the correct sequence so responses stay coherent and grounded in the actual footage.
This skill is purpose-built for iterative, conversational video review rather than one-shot processing. Creators, researchers, and media teams who work with large libraries of unedited footage will find it especially useful — the skill narrows down relevant segments fast, reducing the time spent scrubbing through hours of content manually. Supported formats include mp4, mov, avi, webm, and mkv.
Environment Variables
| Variable | Required | Default |
|---|---|---|
| NEMO_TOKEN | No | Auto-generated (100 free credits, expires in 7 days, revocable via Settings → API Tokens) |
| NEMO_API_URL | No | https://mega-api-prod.nemovideo.ai |
| NEMO_WEB_URL | No | https://nemovideo.com |
| NEMO_CLIENT_ID | No | Auto-generated UUID, persisted to ~/.config/nemovideo/client_id (UUID only, no secrets) |
| SKILL_SOURCE | No | Auto-detected from install path, fallback unknown |
If NEMO_TOKEN is not set, get one (requires X-Client-Id header):
# Generate or read persisted Client-Id
CLIENT_ID="${NEMO_CLIENT_ID:-$(cat ~/.config/nemovideo/client_id 2>/dev/null)}"
if [ -z "$CLIENT_ID" ]; then
  CLIENT_ID=$(uuidgen 2>/dev/null || echo "client-$(date +%s)-$RANDOM")
  # && (not &) so the file is written only after the directory exists
  mkdir -p ~/.config/nemovideo && echo "$CLIENT_ID" > ~/.config/nemovideo/client_id
fi
curl -s -X POST "$API/api/auth/anonymous-token" -H "X-Client-Id: $CLIENT_ID"
# → {"code":0,"data":{"token":"nmv_usr_xxx","credits":100,...}}
Save token as NEMO_TOKEN and CLIENT_ID as NEMO_CLIENT_ID. Anonymous tokens are limited to 1 per client per 7 days; each token expires in 7 days and can be revoked at any time via Settings → API Tokens on nemovideo.com. If your token expires, request a new one with the same X-Client-Id.
Local persistence: This skill writes ~/.config/nemovideo/client_id to persist the Client-Id across sessions. This avoids generating a new ID on every request, which would hit the per-IP rate limit quickly (default 10 tokens per 7 days per IP). The file contains only a UUID — no credentials are stored locally.
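
The curl call above prints the raw JSON, but the step that moves the token into NEMO_TOKEN is left implicit. A minimal sketch of that step, assuming the response keeps the flat shape shown in the comment above and that no JSON parser such as jq is installed; the helper name extract_token is hypothetical and not part of the skill:

```bash
#!/usr/bin/env bash
# Hypothetical helper: pull the "token" value out of an anonymous-token response.
# Assumes the token itself contains no escaped quotes.
extract_token() {
  printf '%s' "$1" | sed -n 's/.*"token"[[:space:]]*:[[:space:]]*"\([^"]*\)".*/\1/p'
}

# Sample payload modeled on the response comment above (trimmed).
response='{"code":0,"data":{"token":"nmv_usr_xxx","credits":100}}'
NEMO_TOKEN="$(extract_token "$response")"
export NEMO_TOKEN
```

If the response carries no token field (for example on an error code), the helper prints nothing, so an empty NEMO_TOKEN is the signal to stop rather than continue with a broken session.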
2. Routing Incoming Requests to the Correct Endpoint
Use the table below to determine which endpoint should handle each type of incoming request.
| User says... | Action | Skip SSE |
|---|---|---|
| "export" / "导出" / "download" / "send me the video" | → §3.5 Export | ✅ |
| "credits" / "积分" / "balance" / "余额" | → §3.3 Credits | ✅ |
| "status" / "状态" / "show tracks" | → §3.4 State | ✅ |
| "upload" / "上传" / user sends file | → §3.2 Upload | ✅ |
| Everything else (generate, edit, add BGM…) | → §3.1 SSE | ❌ |
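
The routing table can be sketched as a keyword match. This is only an illustration: the function name route_request and the returned labels are hypothetical, plain substring matching is an assumption (it would also catch words such as "exported"), and a file attachment has to be detected separately because it is not visible in the message text.

```bash
#!/usr/bin/env bash
# Hypothetical router: echo the handler label for a user message.
route_request() {
  local msg
  # Lowercase ASCII letters; the Chinese keywords pass through unchanged.
  msg="$(printf '%s' "$1" | tr 'A-Z' 'a-z')"
  case "$msg" in
    *export*|*导出*|*download*|*"send me the video"*) echo "export" ;;   # §3.5
    *credits*|*积分*|*balance*|*余额*)                echo "credits" ;;  # §3.3
    *status*|*状态*|*"show tracks"*)                  echo "state" ;;    # §3.4
    *upload*|*上传*)                                  echo "upload" ;;   # §3.2
    *)                                                echo "sse" ;;      # §3.1
  esac
}
```

Only the final branch opens an SSE stream; the first four map to the rows marked as skipping SSE.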
3. Primary Operation Workflows
$API = ${NEMO_API_URL:-https://mega-api-prod.nemovideo.ai}, $TOKEN = ${NEMO_TOKEN}, $WEB = ${NEMO_WEB_URL:-https://nemovideo.com}.
Skill attribution variables (do NOT hardcode):
- $SKILL_NAME = this file's YAML frontmatter name
- $SKILL_VERSION = this file's YAML frontmatter version
- $SKILL_SOURCE = detect at runtime using the first match:
  1. Env SKILL_SOURCE if set
  2. If this file's path contains a known platform directory (e.g. ~/.clawhub/ → clawhub, ~/.cursor/skills/ → cursor)
  3. Fallback: unknown
CRITICAL: ALL API requests (including render/export/upload/state/credits) MUST include these headers. Missing them will cause export to fail with 402.
X-Skill-Source: $SKILL_NAME
X-Skill-Version: $SKILL_VERSION
X-Skill-Platform: $SKILL_SOURCE
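
One way to keep the attribution headers from being forgotten is to resolve the platform once and wrap curl. The sketch below follows the three detection rules listed above; the function names and the SKILL_FILE variable (the path of this skill file) are hypothetical, and SKILL_NAME and SKILL_VERSION are assumed to have been read from the frontmatter already.

```bash
#!/usr/bin/env bash
# Hypothetical helper: resolve the value sent as X-Skill-Platform.
detect_skill_source() {
  if [ -n "${SKILL_SOURCE:-}" ]; then
    echo "$SKILL_SOURCE"                  # 1. explicit env override
    return 0
  fi
  case "$1" in
    */.clawhub/*)       echo "clawhub" ;; # 2. known platform directories
    */.cursor/skills/*) echo "cursor" ;;
    *)                  echo "unknown" ;; # 3. fallback
  esac
}

# Wrap curl so every request carries the three attribution headers.
skill_curl() {
  curl -s "$@" \
    -H "X-Skill-Source: ${SKILL_NAME}" \
    -H "X-Skill-Version: ${SKILL_VERSION}" \
    -H "X-Skill-Platform: $(detect_skill_source "${SKILL_FILE}")"
}
```

A call would then look like: skill_curl "$API/api/credits/balance/simple" -H "Authorization: Bearer $TOKEN".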
3.0 Establishing a New Session
curl -s -X POST "$API/api/tasks/me/with-session/nemo_agent" \
-H "Authorization: Bearer $TOKEN" -H "Content-Type: application/json" \
-H "X-Skill-Source: $SKILL_NAME" -H "X-Skill-Version: $SKILL_VERSION" -H "X-Skill-Platform: $SKILL_SOURCE" \
-d '{"task_name":"project","language":"<lang>"}'
# → {"code":0,"data":{"task_id":"...","session_id":"..."}}
Before any other operations can occur, a session must be initialized. Store the returned session identifier, as every subsequent request depends on it.
Open in browser: After creating a session, give the user a link to view/edit the task in NemoVideo:
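$WEB/workspace/claim?task={task_id}&session={session_id}&skill_name=$SKILL_NAME&skill_version=$SKILL_VERSION&skill_source=$SKILL_SOURCE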
3.1 Delivering Messages Through SSE
curl -s -X POST "$API/run_sse" \
-H "Authorization: Bearer $TOKEN" -H "Content-Type: application/json" \
-H "Accept: text/event-stream" -H "X-Skill-Source: $SKILL_NAME" -H "X-Skill-Version: $SKILL_VERSION" -H "X-Skill-Platform: $SKILL_SOURCE" --max-time 900 \
-d '{"app_name":"nemo_agent","user_id":"me","session_id":"<sid>","new_message":{"parts":[{"text":"<msg>"}]}}'
All conversational messages are transmitted to the backend via a Server-Sent Events stream.
SSE Handling
| Event | Action |
|---|---|
| Text response | Apply GUI translation (§4), present to user |
| Tool call/result | Wait silently, don't forward |
| heartbeat / empty data: | Keep waiting. Every 2 min: "⏳ Still working..." |
| Stream closes | Process final response |
Typical durations: text 5-15s, video generation 100-300s, editing 10-30s.
Timeout: if the stream carries nothing but heartbeats for 10 min, assume it has timed out. Never re-send a message during generation; doing so creates duplicate jobs and double-charges credits.
Ignore trailing "I encountered a temporary issue" if prior responses were normal.
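
The handling rules above can be sketched as a per-line classifier. The source does not specify how heartbeats look on the wire, so this assumes standard SSE framing in which a heartbeat arrives either as an empty data: line or as a data: line containing the word heartbeat; the function name and the returned labels are hypothetical.

```bash
#!/usr/bin/env bash
# Hypothetical classifier for one line of the SSE stream.
# Echoes one of: keepalive | payload | ignore
classify_sse_line() {
  local line body cr
  cr="$(printf '\r')"
  line="${1%$cr}"                   # tolerate CRLF line endings
  case "$line" in
    data:*)
      body="${line#data:}"
      body="${body# }"              # SSE allows one optional space after the colon
      case "$body" in
        ""|*heartbeat*) echo "keepalive" ;;  # keep waiting, do not forward
        *)              echo "payload" ;;    # text or tool event, inspect further
      esac ;;
    *) echo "ignore" ;;             # blank separators, comments, event:/id:/retry: fields
  esac
}
```

A reader loop would pipe the curl output into while IFS= read -r line, call the classifier on each line, and send the "⏳ Still working..." notice when two minutes pass with only keepalive results.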
Silent Response Fallback (CRITICAL)
Approximately 30% of edit operations complete without returning any text content. When this occurs: (1) check whether a video asset URL is present in the response payload, (2) if a URL exists, treat the operation as successful and present the result to the user, (3) do not re-submit the request or prompt the user to retry, (4) if neither text nor a URL is present after the stream closes, then and only then surface an error.
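
The fallback order can be expressed as a small decision helper that runs once the stream has closed. This is a sketch: the function name and labels are hypothetical, and extracting the text and the video URL from the payload is assumed to happen before the call.

```bash
#!/usr/bin/env bash
# Hypothetical decision helper, called after the stream closes.
# $1 = accumulated text (may be empty), $2 = video asset URL (may be empty)
resolve_edit_result() {
  local text="$1" url="$2"
  if [ -n "$url" ]; then
    echo "success"     # a URL alone counts as success, even with no text
  elif [ -n "$text" ]; then
    echo "text-only"   # ordinary reply with nothing to download
  else
    echo "error"       # neither text nor URL: surface an error, never auto-retry
  fi
}
```

Checking the URL first is what keeps a silent but successful edit from being reported as a failure.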
Two-stage generation: When the backend delivers a raw video output, it automatically initiates a second processing stage that appends background music and a title card. Do not report completion to the user after the first stage — wait for the second stage to finish and then surface the final enriched video as the result.
3.2 Handling File Uploads
File upload: INLINECODE39
URL upload: INLINECODE40
Use me in the path; backend resolves user from token.
Supported: mp4, mov, avi, webm, mkv, jpg, png, gif, webp, mp3, wav, m4a, aac.
Video, image, and audio uploads are supported; each file must be associated with the active session before a message references it.
3.3 Checking Available Credits
curl -s "$API/api/credits/balance/simple" -H "Authorization: Bearer $TOKEN" \
-H "X-Skill-Source: $SKILL_NAME" -H "X-Skill-Version: $SKILL_VERSION" -H "X-Skill-Platform: $SKILL_SOURCE"
# → {"code":0,"data":{"available":XXX,"frozen":XX,"total":XXX}}
Query the credits endpoint before initiating any generation task to confirm the user has a sufficient balance.
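
A pre-flight balance check might look like the sketch below. It assumes the response shape shown in the comment above with an integer available field; the helper names are hypothetical, and the cost passed in is the caller's own estimate because the source does not publish per-task prices.

```bash
#!/usr/bin/env bash
# Hypothetical helper: read data.available from a balance response.
available_credits() {
  printf '%s' "$1" | sed -n 's/.*"available"[[:space:]]*:[[:space:]]*\([0-9][0-9]*\).*/\1/p'
}

# Succeeds when the balance covers the estimated cost.
# $1 = balance response JSON, $2 = estimated cost in credits (caller's assumption)
has_enough_credits() {
  local available
  available="$(available_credits "$1")"
  [ -n "$available" ] && [ "$available" -ge "$2" ]
}
```

When the check fails, the no-credits path in §7 (code 2001) applies instead of starting the generation.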
3.4 Polling for Task Status
curl -s "$API/api/state/nemo_agent/me/<sid>/latest" -H "Authorization: Bearer $TOKEN" \
-H "X-Skill-Source: $SKILL_NAME" -H "X-Skill-Version: $SKILL_VERSION" -H "X-Skill-Platform: $SKILL_SOURCE"
Use me for user in path; backend resolves from token.
Key fields: data.state.draft, data.state.video_infos, data.state.canvas_config, data.state.generated_media.
Draft field mapping: t=tracks, tt=track type (0=video, 1=audio, 7=text), sg=segments, d=duration(ms), m=metadata.
Draft ready for export when draft.t exists with at least one track with non-empty sg.
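
The field mapping and the readiness rule can be illustrated with two small helpers. Both names are hypothetical, and the per-track segment counts are assumed to have been extracted from draft.t beforehand, since JSON parsing is outside this sketch.

```bash
#!/usr/bin/env bash
# Map the draft's numeric tt code to a readable track type.
track_type_name() {
  case "$1" in
    0) echo "video" ;;
    1) echo "audio" ;;
    7) echo "text" ;;
    *) echo "unknown($1)" ;;   # codes not listed in the mapping above
  esac
}

# Export readiness: succeeds if at least one track has a non-empty sg.
# Arguments are the sg lengths of each track; no arguments means draft.t is missing.
draft_ready_for_export() {
  local count
  for count in "$@"; do
    if [ "$count" -gt 0 ]; then
      return 0
    fi
  done
  return 1
}
```

For a draft with one empty video track and one audio track holding two segments, draft_ready_for_export 0 2 succeeds.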
Track summary format:
CODEBLOCK7
3.5 Exporting and Delivering the Final Asset
Export does NOT cost credits. Only generation/editing consumes credits.
To deliver the finished file: (a) call the export endpoint with the session ID and asset reference, (b) poll the returned job ID until its status is completed, (c) retrieve the download URL from the completed job payload, (d) present the URL to the user with a clear download or preview action, (e) confirm delivery and close the export job.
b) Submit: INLINECODE52
Note: sessionId is camelCase (exception). On failure → new id, retry once.
c) Poll (every 30s, max 10 polls): INLINECODE55
Status at top-level status: pending → processing → completed / failed. Download URL at output.url.
d) Download from output.url → send to user. Fallback: $API/api/render/proxy/<id>/download.
e) When delivering the video, always also give the task detail link: INLINECODE60
Progress messages: start "⏳ Rendering ~30s" → "⏳ 50%" → "✅ Video ready!" + file + task detail link.
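
The polling rule in step c (every 30s, at most 10 polls) can be sketched as a bounded loop. The status request itself is left to a caller-supplied function, because its endpoint is not reproduced here; that hook, the global STATUS variable, and the return codes are assumptions of this sketch.

```bash
#!/usr/bin/env bash
# Poll an export job until it finishes or the poll budget runs out.
# $1 = name of a function that sets the global STATUS (hypothetical hook
#      wrapping the real status request)
# $2 = max polls (default 10), $3 = seconds between polls (default 30)
# Returns 0 = completed, 1 = failed, 2 = still unfinished after max polls.
poll_export() {
  local fetch_status="$1" max_polls="${2:-10}" interval="${3:-30}" i=1
  while [ "$i" -le "$max_polls" ]; do
    "$fetch_status"
    case "$STATUS" in
      completed) return 0 ;;
      failed)    return 1 ;;
    esac
    # pending / processing: wait, unless this was the last allowed poll
    if [ "$i" -lt "$max_polls" ]; then
      sleep "$interval"
    fi
    i=$((i + 1))
  done
  return 2
}
```

A return code of 1 corresponds to the "new id, retry once" rule above, while 2 means the render outlasted the polling budget and the task detail link is the remaining way to follow it.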
3.6 Recovering from an SSE Disconnection
If the SSE stream drops unexpectedly, follow these steps: (1) detect the disconnection event and pause any user-facing progress indicators, (2) wait a minimum of two seconds before attempting to reconnect to avoid hammering the server, (3) re-open the SSE stream using the original session ID and the last received event ID as the reconnection cursor, (4) if the stream does not re-establish within three attempts, poll the task-status endpoint directly to retrieve the current job state, (5) once continuity is confirmed, resume normal progress reporting to the user.
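
The recovery sequence can be sketched as three delayed reconnect attempts followed by a polling fallback. Both hooks are hypothetical: the reconnect function would re-open the stream with the original session ID (the standard SSE Last-Event-ID header is one way to pass the cursor, though whether the backend honors it is an assumption), and the polling function would call the state endpoint from §3.4.

```bash
#!/usr/bin/env bash
# Try to re-open the stream; fall back to status polling after three failures.
# $1 = function that attempts one reconnect (returns 0 on success)
# $2 = function that polls the task-status endpoint
# $3 = delay in seconds before each attempt (the text asks for at least 2)
# Sets the global SSE_RECOVERY to "reconnected" or "polled".
recover_sse() {
  local reconnect="$1" poll_status="$2" delay="${3:-2}" attempt
  for attempt in 1 2 3; do
    sleep "$delay"
    if "$reconnect"; then
      SSE_RECOVERY="reconnected"
      return 0
    fi
  done
  "$poll_status"
  SSE_RECOVERY="polled"
}
```

Progress messages to the user should stay paused until SSE_RECOVERY is set, which matches steps (1) and (5).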
4. Mapping Backend Responses to User-Facing Interface Elements
The backend operates under the assumption that a graphical interface is always present, so the AI must never forward raw GUI-level instructions or interface directives to the user.
| Backend says | You do |
|---|---|
| "click [button]" / "点击" | Execute via API |
| "open [panel]" / "打开" | Show state via §3.4 |
| "drag/drop" / "拖拽" | Send edit via SSE |
| "preview in timeline" | Show track summary |
| "Export button" / "导出" | Execute §3.5 |
| "check account/billing" | Check §3.3 |
Keep content descriptions. Strip GUI actions.
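
The translation table can be sketched as a phrase matcher. This is an illustration with hypothetical names and labels; substring matching is an assumption, and the more specific phrases are tested first so that "click the Export button" resolves to an export rather than a generic click.

```bash
#!/usr/bin/env bash
# Hypothetical mapper: turn a GUI-style backend instruction into a skill action.
translate_gui_instruction() {
  local msg
  msg="$(printf '%s' "$1" | tr 'A-Z' 'a-z')"
  case "$msg" in
    *"export button"*|*导出*)     echo "run-export" ;;          # §3.5
    *"preview in timeline"*)      echo "show-track-summary" ;;
    *drag*|*drop*|*拖拽*)         echo "send-edit-via-sse" ;;   # §3.1
    *"check account"*|*billing*)  echo "check-credits" ;;       # §3.3
    *open*|*打开*)                echo "show-state" ;;          # §3.4
    *click*|*点击*)               echo "execute-via-api" ;;
    *)                            echo "pass-through" ;;        # plain content description
  esac
}
```

Anything that falls through to pass-through is content description and can be shown to the user as is.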
5. Recommended Conversational Interaction Patterns
• Always confirm with the user what outcome they want before dispatching an edit or generation request.
• Provide incremental progress updates while long-running SSE streams are active so the user knows work is in progress.
• When a task finishes, summarize what was produced in plain language before presenting any download or preview links.
• If the user's request is ambiguous, ask a single focused clarifying question rather than making assumptions.
• After delivering a result, offer a concise list of logical next steps the user might want to take.
6. Known Constraints and Limitations
• Real-time or live video streams are not supported; only pre-recorded file uploads are accepted.
• A single session cannot process more than one generation job concurrently.
• Export jobs do not support partial or chapter-level downloads; the entire asset must be exported at once.
• Credit balances are read-only through the API and cannot be topped up programmatically.
• SSE streams will be terminated by the server after a fixed inactivity timeout, requiring reconnection.
7. Error Response Handling
Refer to the table below to map each response code, whether an API-level code or an HTTP status, to the appropriate user-facing message and recovery action.
| Code | Meaning | Action |
|---|---|---|
| 0 | Success | Continue |
| 1001 | Bad/expired token | Re-auth via anonymous-token (tokens expire after 7 days) |
| 1002 | Session not found | New session §3.0 |
| 2001 | No credits | Anonymous: show registration URL with ?bind=<id> (get <id> from create-session or state response when needed). Registered: "Top up at nemovideo.ai" |
| 4001 | Unsupported file | Show supported formats |
| 4002 | File too large | Suggest compress/trim |
| 400 | Missing X-Client-Id | Generate Client-Id and retry (see §1) |
| 402 | Free plan export blocked | Subscription tier issue, NOT credits. "Register at nemovideo.ai to unlock export." |
| 429 | Rate limit (1 token/client/7 days) | Retry in 30s once |
Common: no video → generate first; render fail → retry new id; SSE timeout → §3.6; silent edit → §3.1 fallback.
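
The table can be turned into a lookup so that every response goes through one dispatch point. A minimal sketch; the function name and the action labels are hypothetical shorthand for the Action column.

```bash
#!/usr/bin/env bash
# Hypothetical lookup: map a response code to the recovery action from the table.
action_for_code() {
  case "$1" in
    0)    echo "continue" ;;
    1001) echo "reauth" ;;              # request a fresh anonymous token
    1002) echo "new-session" ;;         # §3.0
    2001) echo "prompt-topup" ;;        # registration URL or top-up message
    4001) echo "show-formats" ;;
    4002) echo "suggest-compress" ;;
    400)  echo "generate-client-id" ;;
    402)  echo "prompt-register" ;;     # subscription tier, not credits
    429)  echo "retry-after-30s" ;;     # retry once only
    *)    echo "unhandled" ;;
  esac
}
```

Codes outside the table fall through to unhandled, so an unknown failure is never mistaken for success.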
8. API Version and Required Token Scopes
Before going live, confirm that the API version header matches the version documented in this skill to avoid unexpected breaking changes. The access token supplied with every request must include all scopes required for the operations the user intends to perform; missing scopes will result in 403 errors that cannot be resolved without re-authorizing with the correct permissions.