0. First Contact
When the user opens this skill or sends their first message, greet them immediately:
🎨 Welcome! I can analyze and edit your video with Grok AI. Share a video file or tell me your idea!
Try saying:
- "help me create a short video"
- "edit my video"
- "add effects to this clip"
IMPORTANT: Always greet the user proactively on first contact, and let them know you're setting up while the connection is established.
Auto-Setup
When the user first interacts, set up the connection:
1. Check token: If the NEMO_TOKEN env var is set, use it. Otherwise:
2. Read or generate Client-ID:
   - Read ~/.config/nemovideo/client_id if it exists
   - Otherwise generate a UUID and save it to ~/.config/nemovideo/client_id
3. Acquire anonymous token:
   curl -s -X POST "$API/api/auth/anonymous-token" -H "X-Client-Id: $CLIENT_ID"
   Store the returned token as NEMO_TOKEN for this session. You get 100 free credits.
4. Create a session (§3.0) so you're ready to work immediately.
Let the user know briefly: "Setting things up… ready!" then proceed with their request.
When Grok's Reasoning Meets Your Raw Footage
Most video tools treat your footage as a passive file to be trimmed or filtered. The grok-ai skill inside ClawHub treats it as something worth actually understanding. By routing your video through Grok's advanced reasoning model, ClawHub can interpret scene context, spoken content, and visual patterns — then act on them with precision edits or structured summaries you actually asked for.
The interaction model here is conversational rather than menu-driven. You describe what you want — 'pull every moment where the speaker changes slides' or 'cut anything under five seconds between transitions' — and the OpenClaw agent translates that intent into a concrete editing pipeline. The OpenClaw agent acts as the coordination layer, dispatching tasks to the grok-ai backend, managing frame analysis, and assembling the final output without requiring you to manually sequence every step.
This makes grok-ai particularly well-suited for content creators, researchers, and product teams who work with dense, information-rich recordings where manual scrubbing wastes hours. The skill is built to handle the formats you already use — mp4, mov, avi, webm, and mkv — so there's no conversion friction before the real work begins.
Environment Variables
| Variable | Required | Default |
|---|---|---|
| NEMO_TOKEN | No | Auto-generated (100 free credits, expires in 7 days, revocable via Settings → API Tokens) |
| NEMO_API_URL | No | https://mega-api-prod.nemovideo.ai |
| NEMO_WEB_URL | No | https://nemovideo.com |
| NEMO_CLIENT_ID | No | Auto-generated UUID, persisted to ~/.config/nemovideo/client_id (UUID only, no secrets) |
| SKILL_SOURCE | No | Auto-detected from install path, fallback unknown |
If NEMO_TOKEN is not set, get one (requires X-Client-Id header):
# Resolve the API base URL (same default as in §3)
API="${NEMO_API_URL:-https://mega-api-prod.nemovideo.ai}"
# Generate or read persisted Client-Id
CLIENT_ID="${NEMO_CLIENT_ID:-$(cat ~/.config/nemovideo/client_id 2>/dev/null)}"
if [ -z "$CLIENT_ID" ]; then
  CLIENT_ID=$(uuidgen 2>/dev/null || echo "client-$(date +%s)-$RANDOM")
  # Create the config directory first, then persist the ID
  mkdir -p ~/.config/nemovideo && echo "$CLIENT_ID" > ~/.config/nemovideo/client_id
fi
# Request an anonymous token for this client
curl -s -X POST "$API/api/auth/anonymous-token" -H "X-Client-Id: $CLIENT_ID"
# → {"code":0,"data":{"token":"nmv_usr_xxx","credits":100,...}}
Save token as NEMO_TOKEN and CLIENT_ID as NEMO_CLIENT_ID. Anonymous: 1 token per client per 7 days; the token expires in 7 days and can be revoked at any time via Settings → API Tokens on nemovideo.com. If your token expires, request a new one with the same X-Client-Id.
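As a rough illustration of the save step, the sketch below pulls the token field out of a response shaped like the sample above. The helper name extract_token is hypothetical, and the sed pattern assumes the compact single-line JSON shown in the sample; a real JSON parser would be more robust.

```bash
# Hypothetical helper: print the value of the "token" field from a
# single-line JSON response such as the anonymous-token sample.
# Prints nothing if no "token" field is present.
extract_token() {
  printf '%s\n' "$1" | sed -n 's/.*"token":"\([^"]*\)".*/\1/p'
}

# Example with a response shaped like the documented sample:
sample='{"code":0,"data":{"token":"nmv_usr_xxx","credits":100}}'
NEMO_TOKEN="$(extract_token "$sample")"
echo "$NEMO_TOKEN"   # prints nmv_usr_xxx
```

An empty result is a reasonable signal to inspect the code field of the response before continuing.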
Local persistence: This skill writes ~/.config/nemovideo/client_id to persist the Client-Id across sessions. This avoids generating a new ID on every request, which would hit the per-IP rate limit quickly (default 10 tokens per 7 days per IP). The file contains only a UUID — no credentials are stored locally.
2. Routing Requests to the Correct Endpoint
Use the following table to determine which endpoint should handle each type of incoming request.
| User says... | Action | Skip SSE |
|---|---|---|
| "export" / "导出" / "download" / "send me the video" | → §3.5 Export | ✅ |
| "credits" / "积分" / "balance" / "余额" | → §3.3 Credits | ✅ |
| "status" / "状态" / "show tracks" | → §3.4 State | ✅ |
| "upload" / "上传" / user sends file | → §3.2 Upload | ✅ |
| Everything else (generate, edit, add BGM…) | → §3.1 SSE | ❌ |
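The routing table can be approximated with a keyword match. The following is a minimal sketch: the function name route_request and its return labels are hypothetical, it only looks at message text (an attached file would need a separate check), and the first matching row wins.

```bash
# Hypothetical router: map a user message to the flow that handles it.
# Keyword lists mirror the routing table; ASCII matching is case-insensitive.
route_request() {
  local msg
  msg="$(printf '%s' "$1" | tr '[:upper:]' '[:lower:]')"
  case "$msg" in
    *export*|*导出*|*download*|*"send me the video"*) echo "export" ;;
    *credits*|*积分*|*balance*|*余额*) echo "credits" ;;
    *status*|*状态*|*"show tracks"*) echo "state" ;;
    *upload*|*上传*) echo "upload" ;;
    *) echo "sse" ;;   # everything else goes through the SSE flow
  esac
}
```

For example, route_request "Please export it" yields export, while route_request "add some background music" falls through to sse.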
3. Primary Operation Flows
$API = ${NEMO_API_URL:-https://mega-api-prod.nemovideo.ai}, $TOKEN = ${NEMO_TOKEN}, $WEB = ${NEMO_WEB_URL:-https://nemovideo.com}.
Skill attribution variables (do NOT hardcode):
- $SKILL_NAME = this file's YAML frontmatter name
- $SKILL_VERSION = this file's YAML frontmatter version
- $SKILL_SOURCE = detect at runtime using the first match:
  1. Env SKILL_SOURCE if set
  2. If this file's path contains a known platform directory (e.g. ~/.clawhub/ → clawhub, ~/.cursor/skills/ → cursor)
  3. Fallback: unknown
CRITICAL: ALL API requests (including render/export/upload/state/credits) MUST include these headers. Missing them will cause export to fail with 402.
X-Skill-Source: $SKILL_NAME
X-Skill-Version: $SKILL_VERSION
X-Skill-Platform: $SKILL_SOURCE
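The detection order for $SKILL_SOURCE described above might be sketched as follows. The function name detect_skill_source is hypothetical, and only the two platform directories named in this document are recognized.

```bash
# Hypothetical helper: resolve the value sent as X-Skill-Platform.
# $1 is the path of the skill file. The SKILL_SOURCE env var wins if set.
detect_skill_source() {
  local skill_path="$1"
  if [ -n "${SKILL_SOURCE:-}" ]; then
    echo "$SKILL_SOURCE"
    return
  fi
  case "$skill_path" in
    */.clawhub/*)       echo "clawhub" ;;
    */.cursor/skills/*) echo "cursor" ;;
    *)                  echo "unknown" ;;   # documented fallback
  esac
}
```

Calling it once at startup and reusing the result keeps the header value consistent across requests.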
3.0 Establishing a Session
curl -s -X POST "$API/api/tasks/me/with-session/nemo_agent" \
-H "Authorization: Bearer $TOKEN" -H "Content-Type: application/json" \
-H "X-Skill-Source: $SKILL_NAME" -H "X-Skill-Version: $SKILL_VERSION" -H "X-Skill-Platform: $SKILL_SOURCE" \
-d '{"task_name":"project","language":"<lang>"}'
# → {"code":0,"data":{"task_id":"...","session_id":"..."}}
Before any other operations can proceed, a session must be initialized. This session ID anchors all subsequent requests within the same working context.
Open in browser: After creating a session, give the user a link to view/edit the task in NemoVideo:
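$WEB/workspace/claim?task={task_id}&session={session_id}&skill_name=$SKILL_NAME&skill_version=$SKILL_VERSION&skill_source=$SKILL_SOURCE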
3.1 Delivering Messages Over SSE
curl -s -X POST "$API/run_sse" \
-H "Authorization: Bearer $TOKEN" -H "Content-Type: application/json" \
-H "Accept: text/event-stream" -H "X-Skill-Source: $SKILL_NAME" -H "X-Skill-Version: $SKILL_VERSION" -H "X-Skill-Platform: $SKILL_SOURCE" --max-time 900 \
-d '{"app_name":"nemo_agent","user_id":"me","session_id":"<sid>","new_message":{"parts":[{"text":"<msg>"}]}}'
All conversational messages are transmitted to the backend and responses are received through a Server-Sent Events stream.
SSE Handling
| Event | Action |
|---|---|
| Text response | Apply GUI translation (§4), present to user |
| Tool call/result | Wait silently, don't forward |
| heartbeat / empty data: | Keep waiting. Every 2 min: "⏳ Still working..." |
| Stream closes | Process final response |
Typical durations: text 5-15s, video generation 100-300s, editing 10-30s.
Timeout: 10 min heartbeats-only → assume timeout. Never re-send during generation (duplicates + double-charge).
Ignore trailing "I encountered a temporary issue" if prior responses were normal.
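The heartbeat timing rules above (a progress notice every 2 minutes, timeout after 10 minutes of heartbeats only) reduce to a small decision function. This is a sketch under assumptions: the name sse_wait_action and its two elapsed-time arguments are hypothetical, and the caller is expected to track both timers while reading the stream.

```bash
# Hypothetical helper: decide what to do while only heartbeats arrive.
# $1 = seconds since the last non-heartbeat event
# $2 = seconds since the last "Still working" notice was shown
sse_wait_action() {
  local idle="$1" since_notice="$2"
  if [ "$idle" -ge 600 ]; then
    echo "timeout"   # 10 minutes of heartbeats only
  elif [ "$since_notice" -ge 120 ]; then
    echo "notify"    # time to show "⏳ Still working..."
  else
    echo "wait"
  fi
}
```

On timeout the message should not be re-sent, in line with the warning above about duplicates and double charges.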
Silent Response Fallback (CRITICAL)
Approximately 30% of editing operations complete without returning any text in the SSE stream. When no text content is detected in the response, do not treat this as an error. Instead: (1) check the session state endpoint to confirm the operation finished, (2) retrieve the output asset URL directly, (3) present the result to the user as a successfully completed edit, and (4) skip any prompt asking the user to retry.
Two-stage generation: When a raw video asset is produced, the backend automatically initiates a second processing stage that layers in background music and generates a title overlay — no additional instruction is required. Stage one delivers the unprocessed video; stage two delivers the fully composed version. Always wait for stage two to complete before surfacing the final result to the user.
3.2 Uploading Assets
File upload: INLINECODE39
URL upload: INLINECODE40
Use me in the path; backend resolves user from token.
Supported: mp4, mov, avi, webm, mkv, jpg, png, gif, webp, mp3, wav, m4a, aac.
Video, image, and audio files in the formats listed above are all uploaded through this endpoint.
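A client-side extension check before uploading can avoid a round trip that would end in error 4001. The sketch below is illustrative only: is_supported_upload is a hypothetical name, and it inspects the file extension, not the file contents.

```bash
# Hypothetical helper: succeed if the file extension is in the supported list.
is_supported_upload() {
  local ext="${1##*.}"   # text after the last dot (whole name if no dot)
  ext="$(printf '%s' "$ext" | tr '[:upper:]' '[:lower:]')"
  case "$ext" in
    mp4|mov|avi|webm|mkv|jpg|png|gif|webp|mp3|wav|m4a|aac) return 0 ;;
    *) return 1 ;;
  esac
}
```

Usage: is_supported_upload "clip.MP4" succeeds, while is_supported_upload "notes.txt" fails and the supported formats can be shown to the user.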
3.3 Checking Available Credits
curl -s "$API/api/credits/balance/simple" -H "Authorization: Bearer $TOKEN" \
-H "X-Skill-Source: $SKILL_NAME" -H "X-Skill-Version: $SKILL_VERSION" -H "X-Skill-Platform: $SKILL_SOURCE"
# → {"code":0,"data":{"available":XXX,"frozen":XX,"total":XXX}}
Query the credits endpoint before initiating any operation that consumes credits to confirm the account has a sufficient balance.
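A balance check along these lines could gate credit-consuming operations. This is a sketch under assumptions: has_enough_credits is a hypothetical name, the required amount is supplied by the caller (this document does not list per-operation costs), and the sed pattern expects the compact single-line JSON shown in the sample response.

```bash
# Hypothetical helper: read "available" from a single-line balance response
# and succeed only if it covers the requested amount.
# $1 = raw JSON response, $2 = credits the caller expects to need
has_enough_credits() {
  local response="$1" needed="$2" available
  available="$(printf '%s\n' "$response" \
    | sed -n 's/.*"available":\([0-9][0-9]*\).*/\1/p')"
  # Fail when the field is missing or the balance is too low
  [ -n "$available" ] && [ "$available" -ge "$needed" ]
}
```

A missing available field is treated as insufficient, so an error response never passes the check by accident.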
3.4 Polling Operation State
curl -s "$API/api/state/nemo_agent/me/<sid>/latest" -H "Authorization: Bearer $TOKEN" \
-H "X-Skill-Source: $SKILL_NAME" -H "X-Skill-Version: $SKILL_VERSION" -H "X-Skill-Platform: $SKILL_SOURCE"
Use me for user in path; backend resolves from token.
Key fields: data.state.draft, data.state.video_infos, data.state.canvas_config, data.state.generated_media.
Draft field mapping: t=tracks, tt=track type (0=video, 1=audio, 7=text), sg=segments, d=duration(ms), m=metadata.
Draft ready for export when draft.t exists with at least one track with non-empty sg.
Track summary format:
CODEBLOCK7
3.5 Exporting and Delivering the Final Asset
Export does NOT cost credits. Only generation/editing consumes credits.
The export flow proceeds as follows: (a) call the export endpoint with the session ID, (b) poll until the export status returns completed, (c) retrieve the signed download URL from the response, (d) verify the URL is accessible, and (e) present the download link or stream the asset directly to the user.
b) Submit: INLINECODE52
Note: sessionId is camelCase (exception). On failure → new id, retry once.
c) Poll (every 30s, max 10 polls): INLINECODE55
Status at top-level status: pending → processing → completed / failed. Download URL at output.url.
d) Download from output.url → send to user. Fallback: $API/api/render/proxy/<id>/download.
e) When delivering the video, always also give the task detail link: INLINECODE60
Progress messages: start "⏳ Rendering ~30s" → "⏳ 50%" → "✅ Video ready!" + file + task detail link.
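The polling step (every 30s, at most 10 polls) can be sketched as a loop that stops on the first terminal status. The names poll_export and EXPORT_STATUS are hypothetical, and the status fetch is left to a caller-supplied function so that the sketch does not guess at the poll endpoint.

```bash
# Hypothetical polling loop for the export step.
# $1 = name of a function that sets EXPORT_STATUS (e.g. by calling the
#      poll endpoint); $2 = max polls; $3 = seconds between polls.
# Returns 0 on completed, 1 on failed, 2 if still unfinished at the end.
poll_export() {
  local fetch_status="$1" max_polls="${2:-10}" interval="${3:-30}" i
  for ((i = 1; i <= max_polls; i++)); do
    "$fetch_status"
    case "$EXPORT_STATUS" in
      completed) return 0 ;;
      failed)    return 1 ;;
    esac
    # pending / processing: wait before the next poll, except after the last
    [ "$i" -lt "$max_polls" ] && sleep "$interval"
  done
  return 2
}
```

With the defaults this waits at most about 4.5 minutes. A return code of 1 maps to the documented "new id, retry once" path.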
3.6 Handling SSE Disconnections
When an SSE connection drops unexpectedly, follow these five steps to recover: (1) Wait 2 seconds before attempting to reconnect to avoid hammering the server. (2) Re-open the SSE stream using the original session ID — do not create a new session. (3) Resume listening from the last received event ID by passing it in the Last-Event-ID header. (4) If reconnection fails after three attempts, fall back to polling the session state endpoint at 5-second intervals. (5) Once the operation status confirms completion, retrieve the asset URL and deliver the result to the user as normal.
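The recovery steps above amount to a bounded retry with a polling fallback. A minimal sketch follows, assuming the caller supplies two functions: one that re-opens the stream with the original session ID and the Last-Event-ID header, and one that polls the state endpoint. The name recover_sse and both callbacks are hypothetical.

```bash
# Hypothetical recovery wrapper for a dropped SSE stream.
# $1 = function that re-opens the stream (same session, Last-Event-ID set)
# $2 = function that polls the state endpoint as a fallback
# $3 = max reconnect attempts, $4 = seconds to wait before each attempt
recover_sse() {
  local reconnect="$1" poll_fallback="$2"
  local max_attempts="${3:-3}" delay="${4:-2}" attempt
  for ((attempt = 1; attempt <= max_attempts; attempt++)); do
    sleep "$delay"            # step 1: pause before reconnecting
    if "$reconnect"; then     # steps 2-3: resume the original stream
      echo "reconnected"
      return 0
    fi
  done
  "$poll_fallback"            # step 4: fall back to state polling
  echo "polling"
}
```

The printed label tells the caller which path delivered the result, after which the asset URL is retrieved as in step 5.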
4. Translating GUI Elements for the Backend
The backend operates under the assumption that a graphical interface is present, so GUI-specific instructions must never be forwarded verbatim — they must be translated into the appropriate API calls.
| Backend says | You do |
|---|---|
| "click [button]" / "点击" | Execute via API |
| "open [panel]" / "打开" | Show state via §3.4 |
| "drag/drop" / "拖拽" | Send edit via SSE |
| "preview in timeline" | Show track summary |
| "Export button" / "导出" | Execute §3.5 |
| "check account/billing" | Check §3.3 |
Keep content descriptions. Strip GUI actions.
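One way to apply the table is a substring match that checks the more specific phrases first, so that "click the Export button" maps to export instead of a generic click. The function name gui_action and its return labels are hypothetical, and plain substring matching will misfire on some wording (for instance "dropdown" contains "drop").

```bash
# Hypothetical mapper: classify a backend GUI instruction into an action.
# More specific phrases are tested before generic verbs such as "click".
gui_action() {
  local text
  text="$(printf '%s' "$1" | tr '[:upper:]' '[:lower:]')"
  case "$text" in
    *"export button"*|*导出*) echo "export" ;;
    *"preview in timeline"*) echo "track_summary" ;;
    *"check account"*|*billing*) echo "credits" ;;
    *drag*|*drop*|*拖拽*) echo "sse_edit" ;;
    *open*|*打开*) echo "state" ;;
    *click*|*点击*) echo "api" ;;
    *) echo "none" ;;   # plain content description: present as-is
  esac
}
```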
5. Recommended Interaction Patterns
- Always confirm a session is active before dispatching any editing request, creating a new one if none exists.
- After submitting a message, listen on the SSE stream and apply the silent response fallback if no text arrives within the expected window.
- Surface credit balance proactively when a user is about to run a credit-intensive operation, so they are not caught off guard by a failure.
- When a user requests an export, remind them that the action is free of charge before proceeding.
- If an operation appears stalled, use the state polling endpoint to get an authoritative status rather than asking the user to retry manually.
6. Known Limitations
- Session IDs are not persistent across separate browser or app sessions and must be re-established each time.
- The SSE stream does not guarantee a text payload for every completed operation; the silent fallback pattern is mandatory, not optional.
- Credit balances are read-only through the API; top-ups must be handled through the ClawHub dashboard.
- Concurrent editing operations within a single session are not supported; operations must be queued sequentially.
- Asset URLs returned by the export endpoint are time-limited signed URLs and will expire after the period specified in the response.
7. Error Response Handling
The table below maps HTTP status codes and API error codes to their recommended recovery actions.
| Code | Meaning | Action |
|---|---|---|
| 0 | Success | Continue |
| 1001 | Bad/expired token | Re-auth via anonymous-token (tokens expire after 7 days) |
| 1002 | Session not found | New session §3.0 |
| 2001 | No credits | Anonymous: show registration URL with ?bind=<id> (get <id> from create-session or state response when needed). Registered: "Top up at nemovideo.ai" |
| 4001 | Unsupported file | Show supported formats |
| 4002 | File too large | Suggest compress/trim |
| 400 | Missing X-Client-Id | Generate Client-Id and retry (see §1) |
| 402 | Free plan export blocked | Subscription tier issue, NOT credits. "Register at nemovideo.ai to unlock export." |
| 429 | Rate limit (1 token/client/7 days) | Retry in 30s once |
Common: no video → generate first; render fail → retry new id; SSE timeout → §3.6; silent edit → §3.1 fallback.
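The error table lends itself to a single dispatch function. The sketch below only maps codes to action labels; the name action_for_code and the labels are hypothetical, and the recovery itself is left to the flows described in the earlier sections.

```bash
# Hypothetical dispatcher: map a status or error code to a recovery action.
action_for_code() {
  case "$1" in
    0)    echo "continue" ;;
    1001) echo "reauth" ;;               # request a new anonymous token
    1002) echo "new_session" ;;          # §3.0
    2001) echo "no_credits" ;;
    4001) echo "show_formats" ;;
    4002) echo "suggest_compress" ;;
    400)  echo "generate_client_id" ;;
    402)  echo "prompt_register" ;;      # subscription tier, not credits
    429)  echo "retry_after_30s" ;;      # retry once only
    *)    echo "unhandled" ;;
  esac
}
```

Keeping 402 and 2001 as separate labels preserves the distinction the table draws between a plan restriction and an empty balance.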
8. API Version and Required Token Scopes
Before making any calls, confirm the API version in use matches the version this skill was built against by inspecting the version field in the base URL or the response headers. All requests must be authenticated with a bearer token that carries the required scopes: video:read, video:write, session:manage, and export:read. Tokens missing any of these scopes will receive a 403 response, and the user should be directed to regenerate their token from the ClawHub dashboard with the full scope set.