0. First Contact
When the user opens this skill or sends their first message, greet them immediately:
📹 Ready to use the grok ai video generator! Just send me a video or describe your project.
Try saying:
- "add effects to this clip"
- "edit my video"
- "help me create a short video"
IMPORTANT: Always greet the user proactively on first contact, and let them know you are setting up the connection in the background.
Auto-Setup
When the user first interacts, set up the connection:
1. Check token: If the NEMO_TOKEN env var is set, use it. Otherwise, continue with the steps below.
2. Read or generate Client-ID:
   - Read ~/.config/nemovideo/client_id if it exists
   - Otherwise generate a UUID and save it to ~/.config/nemovideo/client_id
3. Acquire anonymous token:
   curl -s -X POST "https://mega-api-prod.nemovideo.ai/api/auth/anonymous-token" -H "X-Client-Id: $CLIENT_ID"
   Store the returned token as NEMO_TOKEN for this session. You get 100 free credits.
4. Create a session (§3.0) so you're ready to work immediately.
Let the user know briefly: "Setting things up… ready!" then proceed with their request.
Conversational Video Creation Powered by Grok
Most video tools demand you learn their interface before you create anything useful. The grok-ai-video-generator skill flips that assumption entirely — you describe what you want in plain language, and the system figures out how to build it. Whether you're assembling a product walkthrough, generating a short narrative clip, or iterating on an existing cut, the workflow stays conversational from start to finish.
Under the hood, the OpenClaw agent acts as the orchestration layer between your instructions and Grok's multimodal reasoning capabilities. It interprets your intent, breaks down complex video requests into discrete generation and editing steps, and manages the sequencing so outputs arrive coherent and on-spec. The agent doesn't just pass prompts — it tracks context across your session, meaning follow-up instructions like 'make that transition slower' or 'swap the background music' land correctly without you re-explaining the full project.
This skill is particularly effective for iterative work — rough cuts that need rapid refinement, social content that requires multiple format variants, or scripted sequences where timing and visual phrasing matter. Because the pipeline handles mp4, mov, avi, webm, and mkv natively, you're never forced to convert files before or after generation.
Environment Variables
| Variable | Required | Default |
|---|---|---|
| NEMO_TOKEN | No | Auto-generated (100 free credits, expires in 7 days, revocable via Settings → API Tokens) |
| NEMO_API_URL | No | https://mega-api-prod.nemovideo.ai |
| NEMO_WEB_URL | No | https://nemovideo.com |
| NEMO_CLIENT_ID | No | Auto-generated UUID, persisted to ~/.config/nemovideo/client_id (UUID only, no secrets) |
| SKILL_SOURCE | No | Auto-detected from install path, fallback unknown |
If NEMO_TOKEN is not set, get one (requires X-Client-Id header):
# Generate or read the persisted Client-Id
CLIENT_ID="${NEMO_CLIENT_ID:-$(cat ~/.config/nemovideo/client_id 2>/dev/null)}"
if [ -z "$CLIENT_ID" ]; then
  # Prefer a real UUID; fall back to a timestamp-based ID if uuidgen is unavailable
  CLIENT_ID=$(uuidgen 2>/dev/null || echo "client-$(date +%s)-$RANDOM")
  # Use && so the file is written only after the directory has been created
  mkdir -p ~/.config/nemovideo && echo "$CLIENT_ID" > ~/.config/nemovideo/client_id
fi
# Request an anonymous token for this client
curl -s -X POST "https://mega-api-prod.nemovideo.ai/api/auth/anonymous-token" -H "X-Client-Id: $CLIENT_ID"
# → {"code":0,"data":{"token":"nmv_usr_xxx","credits":100,...}}
Save token as NEMO_TOKEN, CLIENT_ID as NEMO_CLIENT_ID. Anonymous: 1 token per client per 7 days; token expires in 7 days and can be revoked at any time via Settings → API Tokens on nemovideo.com. If your token expires, request a new one with the same X-Client-Id.
Local persistence: This skill writes ~/.config/nemovideo/client_id to persist the Client-Id across sessions. This avoids generating a new ID on every request, which would hit the per-IP rate limit quickly (default 10 tokens per 7 days per IP). The file contains only a UUID — no credentials are stored locally.
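As a rough illustration of the "save token as NEMO_TOKEN" step, the sketch below pulls the token field out of a response shaped like the sample above. The helper name extract_token is hypothetical, and the sed pattern is a deliberately simple assumption: it expects the token value to contain no escaped quotes. If jq is installed, `jq -r '.data.token'` would be a sturdier choice.

```bash
# Hypothetical helper (not part of the skill): print the first "token"
# value found in a JSON response read from stdin.
# Assumes the value contains no escaped quotes, as in the sample response.
extract_token() {
  sed -n 's/.*"token"[[:space:]]*:[[:space:]]*"\([^"]*\)".*/\1/p' | head -n 1
}

# Usage against the live endpoint (not executed here):
#   NEMO_TOKEN=$(curl -s -X POST \
#     "https://mega-api-prod.nemovideo.ai/api/auth/anonymous-token" \
#     -H "X-Client-Id: $CLIENT_ID" | extract_token)
#   export NEMO_TOKEN
```

An empty result means the response carried no token, which is a reasonable cue to inspect the code field before continuing.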
2. Routing Logic — Matching User Requests to the Correct Endpoint
Use the table below to determine which API endpoint to invoke based on the nature of the incoming user request.
| User says... | Action | Skip SSE |
|---|---|---|
| "export" / "导出" / "download" / "send me the video" | → §3.5 Export | ✅ |
| "credits" / "积分" / "balance" / "余额" | → §3.3 Credits | ✅ |
| "status" / "状态" / "show tracks" | → §3.4 State | ✅ |
| "upload" / "上传" / user sends file | → §3.2 Upload | ✅ |
| Everything else (generate, edit, add BGM…) | → §3.1 SSE | ❌ |
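The routing table can be approximated with a keyword match. The sketch below is one possible reading, not the skill's actual router: the function name route_request and the returned labels are made up for illustration, rows are tested top to bottom so the first match wins, and the "user sends file" case is left out because it cannot be detected from message text alone.

```bash
# Hypothetical router: prints a label for the section that should handle
# the given user message. Labels are illustrative only.
route_request() {
  # Lowercase ASCII letters so "Export" and "export" match alike;
  # the Chinese keywords pass through unchanged.
  case "$(printf '%s' "$1" | tr '[:upper:]' '[:lower:]')" in
    *export*|*导出*|*download*|*"send me the video"*) echo "3.5-export" ;;
    *credits*|*积分*|*balance*|*余额*)                 echo "3.3-credits" ;;
    *status*|*状态*|*"show tracks"*)                   echo "3.4-state" ;;
    *upload*|*上传*)                                   echo "3.2-upload" ;;
    # Everything else (generate, edit, add BGM...) goes through SSE.
    *)                                                 echo "3.1-sse" ;;
  esac
}
```

Substring matching is crude: a request such as "write a status update caption" would be routed to §3.4, so a real implementation would likely want stricter intent detection.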
3. Primary Operation Flows
$API = ${NEMO_API_URL:-https://mega-api-prod.nemovideo.ai}, $TOKEN = ${NEMO_TOKEN}, $WEB = ${NEMO_WEB_URL:-https://nemovideo.com}.
Skill attribution variables — do NOT hardcode:
- $SKILL_NAME = this file's YAML frontmatter name
- $SKILL_VERSION = this file's YAML frontmatter version
- $SKILL_SOURCE = detect at runtime using the first match:
  1. Env SKILL_SOURCE if set
  2. If this file's path contains a known platform directory (e.g. ~/.clawhub/ → clawhub, ~/.cursor/skills/ → cursor)
  3. Fallback: unknown
CRITICAL: ALL API requests (including render/export/upload/state/credits) MUST include these headers. Missing them will cause export to fail with 402.
X-Skill-Source: $SKILL_NAME
X-Skill-Version: $SKILL_VERSION
X-Skill-Platform: $SKILL_SOURCE
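The three-step detection order for $SKILL_SOURCE might look like the sketch below. The function name detect_skill_source is hypothetical, and only the two platform directories named above are recognised; any other platform would need its own case arm.

```bash
# Hypothetical helper: resolve the platform value for a given skill file path.
# Usage: detect_skill_source /path/to/this/skill/file
detect_skill_source() {
  # 1. An explicit SKILL_SOURCE environment variable wins.
  if [ -n "${SKILL_SOURCE:-}" ]; then
    echo "$SKILL_SOURCE"
    return
  fi
  # 2. Otherwise look for a known platform directory in the path.
  case "$1" in
    */.clawhub/*)        echo "clawhub" ;;
    */.cursor/skills/*)  echo "cursor" ;;
    # 3. Fallback when nothing matches.
    *)                   echo "unknown" ;;
  esac
}
```

The result can then be passed as the X-Skill-Platform header value on every request.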
3.0 Initialize a Session
curl -s -X POST "https://mega-api-prod.nemovideo.ai/api/tasks/me/with-session/nemo_agent" \
-H "Authorization: Bearer $TOKEN" -H "Content-Type: application/json" \
-H "X-Skill-Source: $SKILL_NAME" -H "X-Skill-Version: $SKILL_VERSION" -H "X-Skill-Platform: $SKILL_SOURCE" \
-d '{"task_name":"project","language":"<lang>"}'
# → {"code":0,"data":{"task_id":"...","session_id":"..."}}
A session must be established before any other operations can proceed. Each session maintains its own context and state throughout the conversation lifecycle.
Open in browser: After creating a session, give the user a link to view/edit the task in NemoVideo:
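$WEB/workspace/claim?task={task_id}&session={session_id}&skill_name=$SKILL_NAME&skill_version=$SKILL_VERSION&skill_source=$SKILL_SOURCE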
3.1 Deliver Messages Over SSE
curl -s -X POST "https://mega-api-prod.nemovideo.ai/run_sse" \
-H "Authorization: Bearer $TOKEN" -H "Content-Type: application/json" \
-H "Accept: text/event-stream" -H "X-Skill-Source: $SKILL_NAME" -H "X-Skill-Version: $SKILL_VERSION" -H "X-Skill-Platform: $SKILL_SOURCE" --max-time 900 \
-d '{"app_name":"nemo_agent","user_id":"me","session_id":"<sid>","new_message":{"parts":[{"text":"<msg>"}]}}'
All conversational messages and generation requests are transmitted to the backend via a persistent Server-Sent Events connection.
SSE Handling
| Event | Action |
|---|---|
| Text response | Apply GUI translation (§4), present to user |
| Tool call/result | Wait silently, don't forward |
| heartbeat / empty data: | Keep waiting. Every 2 min: "⏳ Still working..." |
| Stream closes | Process final response |
Typical durations: text 5-15s, video generation 100-300s, editing 10-30s.
Timeout: 10 min heartbeats-only → assume timeout. Never re-send during generation (duplicates + double-charge).
Ignore trailing "I encountered a temporary issue" if prior responses were normal.
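The waiting rules above (a progress note every 2 minutes, timeout after 10 minutes of heartbeats only) can be expressed as a small decision function. This is a sketch under assumptions: sse_wait_action is a made-up name, and the caller is assumed to invoke it once per second with the number of seconds elapsed since the last non-heartbeat event.

```bash
# Hypothetical helper: decide what to do after $1 seconds of
# heartbeat-only traffic. Prints "timeout", "notify", or "wait".
sse_wait_action() {
  if [ "$1" -ge 600 ]; then
    # 10 minutes of heartbeats only: assume timeout.
    echo "timeout"
  elif [ "$1" -gt 0 ] && [ $(( $1 % 120 )) -eq 0 ]; then
    # Every 2 minutes: show the user "Still working...".
    echo "notify"
  else
    # Keep waiting; never re-send the message during generation.
    echo "wait"
  fi
}
```

On "timeout" the safer follow-up is to check task state (§3.4) rather than re-send, since re-sending risks duplicates and a double charge.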
Silent Response Fallback (CRITICAL)
Approximately 30% of edit operations complete without returning any text in the SSE stream. When no textual response is detected after the stream closes, do not treat this as a failure. Instead: (1) poll the task state endpoint to confirm completion, (2) retrieve the updated video artifact directly, (3) confirm success to the user based on the retrieved asset rather than the absent text response.
Two-stage generation: After a raw video is produced, the backend automatically initiates a second processing stage that layers in background music and a title sequence. This means every generation produces two distinct output events — the initial raw clip followed by the fully composed final version. Both stages must be tracked before reporting completion to the user.
3.2 Asset Upload
File upload: INLINECODE39
URL upload: INLINECODE40
Use me in the path; backend resolves user from token.
Supported: mp4, mov, avi, webm, mkv, jpg, png, gif, webp, mp3, wav, m4a, aac.
The upload endpoint accepts user-provided media files that can be referenced as source material within subsequent generation or editing requests.
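Checking the file extension locally before uploading avoids a round trip that would end in error 4001. The sketch below is illustrative only: is_supported_upload is a hypothetical name, and it looks at the extension alone, so it says nothing about the file's actual contents or size.

```bash
# Hypothetical pre-check: succeed (exit 0) if the file name carries one of
# the supported extensions listed above, fail (exit 1) otherwise.
is_supported_upload() {
  # A name without a dot has no extension to check.
  case "$1" in
    *.*) ;;
    *)   return 1 ;;
  esac
  # Compare the text after the last dot, ignoring ASCII case.
  case "$(printf '%s' "${1##*.}" | tr '[:upper:]' '[:lower:]')" in
    mp4|mov|avi|webm|mkv|jpg|png|gif|webp|mp3|wav|m4a|aac) return 0 ;;
    *) return 1 ;;
  esac
}
```

A caller might use it as `is_supported_upload "$file" || echo "Supported: mp4, mov, avi, ..."` before attempting the upload.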
3.3 Credit Balance Check
curl -s "https://mega-api-prod.nemovideo.ai/api/credits/balance/simple" -H "Authorization: Bearer $TOKEN" \
-H "X-Skill-Source: $SKILL_NAME" -H "X-Skill-Version: $SKILL_VERSION" -H "X-Skill-Platform: $SKILL_SOURCE"
# → {"code":0,"data":{"available":XXX,"frozen":XX,"total":XXX}}
Query the credits endpoint to retrieve the user's current balance before initiating any operation that consumes credits.
3.4 Task Status Polling
curl -s "https://mega-api-prod.nemovideo.ai/api/state/nemo_agent/me/<sid>/latest" -H "Authorization: Bearer $TOKEN" \
-H "X-Skill-Source: $SKILL_NAME" -H "X-Skill-Version: $SKILL_VERSION" -H "X-Skill-Platform: $SKILL_SOURCE"
Use me for user in path; backend resolves from token.
Key fields: data.state.draft, data.state.video_infos, data.state.canvas_config, data.state.generated_media.
Draft field mapping: t=tracks, tt=track type (0=video, 1=audio, 7=text), sg=segments, d=duration(ms), m=metadata.
Draft ready for export when draft.t exists with at least one track with non-empty sg.
Track summary format:
CODEBLOCK7
3.5 Export and Deliver the Final Asset
Export does NOT cost credits. Only generation/editing consumes credits.
The export flow proceeds as follows: (a) confirm the task has reached a completed state, (b) call the export endpoint with the relevant task identifier, (c) await the export job confirmation, (d) retrieve the download URL from the export response, (e) present the URL to the user as the deliverable. Steps b) through e) are detailed below.
b) Submit: INLINECODE52
Note: sessionId is camelCase (exception). On failure → new id, retry once.
c) Poll (every 30s, max 10 polls): INLINECODE55
Status at top-level status: pending → processing → completed / failed. Download URL at output.url.
d) Download from output.url → send to user. Fallback: https://mega-api-prod.nemovideo.ai/api/render/proxy/<id>/download.
e) When delivering the video, always also give the task detail link: INLINECODE60
Progress messages: start "⏳ Rendering ~30s" → "⏳ 50%" → "✅ Video ready!" + file + task detail link.
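The polling rule in step c) (every 30s, at most 10 polls) could be wrapped in a loop like the one below. This is a sketch, not the skill's implementation: poll_export is a hypothetical name, and the status command passed in is assumed to be supplied by the caller and to print the top-level status value (pending, processing, completed, or failed). No endpoint is called here.

```bash
# Hypothetical polling helper.
# Usage: poll_export <status-command> [interval-seconds] [max-polls]
# Prints the final outcome: completed, failed, or timeout.
poll_export() {
  local status_cmd="$1"
  local interval="${2:-30}"
  local max_polls="${3:-10}"
  local attempt=1
  local status
  while [ "$attempt" -le "$max_polls" ]; do
    status=$("$status_cmd")
    case "$status" in
      completed) echo "completed"; return 0 ;;
      failed)    echo "failed";    return 1 ;;
    esac
    # pending / processing: sleep between polls, but not after the last one.
    if [ "$attempt" -lt "$max_polls" ]; then
      sleep "$interval"
    fi
    attempt=$((attempt + 1))
  done
  echo "timeout"
  return 2
}
```

With the defaults this waits up to roughly 4.5 minutes. A "failed" result maps to the "new id, retry once" rule above, and "timeout" is a signal to tell the user that rendering is still in progress rather than to resubmit.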
3.6 Handling SSE Connection Drops
If the SSE stream disconnects before a final result is received, execute the following recovery sequence: (1) record the task ID that was active at the time of disconnection; (2) wait a minimum of five seconds before attempting any follow-up request; (3) call the task state endpoint using the stored task ID to determine current progress; (4) if the task is still running, continue polling at a reasonable interval until a terminal state is reached; (5) once a completed or failed state is confirmed, resume the normal delivery or error-handling flow accordingly.
4. Backend GUI Abstraction Layer
The backend operates under the assumption that all interactions occur through a graphical interface, so GUI-specific instructions must never be forwarded verbatim to the API.
| Backend says | You do |
|---|---|
| "click [button]" / "点击" | Execute via API |
| "open [panel]" / "打开" | Show state via §3.4 |
| "drag/drop" / "拖拽" | Send edit via SSE |
| "preview in timeline" | Show track summary |
| "Export button" / "导出" | Execute §3.5 |
| "check account/billing" | Check §3.3 |
Keep content descriptions. Strip GUI actions.
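One way to apply the translation table is a keyword lookup over each sentence of backend text. The sketch below makes several assumptions: gui_action and its output labels are invented for illustration, and the more specific phrases are tested before the generic "click" and "open" so that "click the Export button" resolves to the export flow. Plain keyword matching will misfire on content descriptions that happen to contain words like "open", so treat it as a starting point.

```bash
# Hypothetical mapper: prints the action to take for one sentence of
# backend text. "keep" means the sentence has no GUI instruction and
# can be shown to the user as-is.
gui_action() {
  # Lowercase ASCII letters; Chinese keywords pass through unchanged.
  case "$(printf '%s' "$1" | tr '[:upper:]' '[:lower:]')" in
    *"export button"*|*导出*)      echo "export" ;;         # §3.5
    *"check account"*|*billing*)   echo "credits" ;;        # §3.3
    *"preview in timeline"*)       echo "track-summary" ;;
    *drag*|*drop*|*拖拽*)          echo "sse-edit" ;;       # §3.1
    *open*|*打开*)                 echo "state" ;;          # §3.4
    *click*|*点击*)                echo "api-call" ;;
    *)                             echo "keep" ;;
  esac
}
```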
5. Recommended Interaction Patterns
• Always confirm the user's intent before invoking a credit-consuming operation, summarizing what will be generated or modified.
• After each generation completes, present a concise summary of what was produced along with any available preview or download link.
• When a task is running, proactively communicate progress using polled state data rather than leaving the user without feedback.
• If a user request is ambiguous, ask one clarifying question before proceeding rather than making assumptions that could waste credits.
• Surface credit balance information proactively when the remaining balance is low or when the requested operation has a high cost.
6. Known Constraints and Limitations
• Video generation tasks are asynchronous and may take several minutes; synchronous responses should never be assumed.
• Editing operations have no guaranteed text response in the SSE stream; silent completions are expected and must be handled gracefully.
• A single session context cannot be shared across multiple concurrent generation tasks.
• Uploaded assets are subject to format and size restrictions defined by the API; unsupported files will be rejected at the upload stage.
• Credit deductions are applied at the point of task initiation and are non-refundable for tasks that reach a processing state.
7. Error Recognition and Response Handling
Refer to the table below to identify the appropriate recovery action for each error code or failure condition returned by the API.
| Code | Meaning | Action |
|---|---|---|
| 0 | Success | Continue |
| 1001 | Bad/expired token | Re-auth via anonymous-token (tokens expire after 7 days) |
| 1002 | Session not found | New session §3.0 |
| 2001 | No credits | Anonymous: show registration URL with ?bind=<id> (get <id> from create-session or state response when needed). Registered: "Top up at nemovideo.ai" |
| 4001 | Unsupported file | Show supported formats |
| 4002 | File too large | Suggest compress/trim |
| 400 | Missing X-Client-Id | Generate Client-Id and retry (see §1) |
| 402 | Free plan export blocked | Subscription tier issue, NOT credits. "Register at nemovideo.ai to unlock export." |
| 429 | Rate limit (1 token/client/7 days) | Retry in 30s once |
Common: no video → generate first; render fail → retry new id; SSE timeout → §3.6; silent edit → §3.1 fallback.
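The error table lends itself to a simple dispatch function. In the sketch below, error_action and the action labels are hypothetical names chosen for illustration; the function only reports which recovery path applies and leaves the actual recovery to the caller. It also assumes the caller has already reduced the response to a single number, whether that came from the JSON code field or from the HTTP status.

```bash
# Hypothetical dispatcher: map an error code from the table above to a
# short label naming the recovery action.
error_action() {
  case "$1" in
    0)    echo "continue" ;;
    1001) echo "reauth" ;;              # request a new anonymous token
    1002) echo "new-session" ;;         # §3.0
    2001) echo "no-credits" ;;          # registration URL or top-up message
    4001) echo "show-formats" ;;
    4002) echo "suggest-compress" ;;
    400)  echo "generate-client-id" ;;  # then retry
    402)  echo "prompt-register" ;;     # subscription tier, not credits
    429)  echo "retry-30s-once" ;;
    *)    echo "unknown" ;;
  esac
}
```

Codes outside the table fall through to "unknown", which is probably best surfaced to the user verbatim rather than retried blindly.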
8. API Version and Required Token Scopes
Before establishing a session, verify that the API version in use matches the version this skill was built against; a mismatch may cause undocumented behavior. The OAuth token supplied with each request must include all scopes required for the intended operations — at minimum, read access for status polling and write access for generation, editing, upload, and export endpoints. Tokens missing required scopes will result in authorization errors that cannot be resolved at runtime without re-authenticating.