From Frozen Frame to Living Scene in Seconds
Still images carry stories that movement can unlock. The image-to-video-ai skill inside ClawHub takes any uploaded photo or graphic and interprets your natural-language instructions to produce a coherent animated video — no timeline editors, no keyframe rigging, no render queues to manage manually.
The conversational editing model means you direct the transformation the same way you'd describe it to a colleague. Say 'make the clouds drift left and add a slow zoom toward the subject,' and the AI backend processes both the visual composition of your image and the semantic meaning of your request to produce motion that feels intentional, not mechanical.
Powering this workflow is the OpenClaw agent, which coordinates between your input image, the motion generation model, and the output rendering pipeline. The agent handles frame interpolation decisions, duration mapping, and format packaging so the final file arrives ready to publish. Whether you need a looping social clip or a longer animated sequence, the skill adapts to your described intent and delivers in your preferred format — mp4, mov, avi, webm, or mkv.
Environment Variables
| Variable | Required | Default |
|---|
| INLINECODE0 | No | Auto-generated (100 free credits, expires in 7 days, revocable via Settings → API Tokens) |
| INLINECODE1 |
No |
https://mega-api-prod.nemovideo.ai |
|
NEMO_WEB_URL | No |
https://nemovideo.com |
|
NEMO_CLIENT_ID | No | Auto-generated UUID, persisted to
~/.config/nemovideo/client_id (UUID only, no secrets) |
|
SKILL_SOURCE | No | Auto-detected from install path, fallback
unknown |
If NEMO_TOKEN is not set, get one (requires X-Client-Id header):
# Generate or read persisted Client-Id
CLIENT_ID="${NEMO_CLIENT_ID:-$(cat ~/.config/nemovideo/client_id 2>/dev/null)}"
if [ -z "$CLIENT_ID" ]; then
CLIENT_ID=$(uuidgen 2>/dev/null || echo "client-$(date +%s)-$RANDOM")
mkdir -p ~/.config/nemovideo && echo "$CLIENT_ID" > ~/.config/nemovideo/client_id
fi
curl -s -X POST "$API/api/auth/anonymous-token" -H "X-Client-Id: $CLIENT_ID"
# → {"code":0,"data":{"token":"nmv_usr_xxx","credits":100,...}}
Save
token as
NEMO_TOKEN,
CLIENT_ID as
NEMO_CLIENT_ID. Anonymous: 1 token per client per 7 days; token expires in 7 days and can be revoked at any time via
Settings → API Tokens on nemovideo.com. If your token expires, request a new one with the same
X-Client-Id.
Local persistence: This skill writes ~/.config/nemovideo/client_id to persist the Client-Id across sessions. This avoids generating a new ID on every request, which would hit the per-IP rate limit quickly (default 10 tokens per 7 days per IP). The file contains only a UUID — no credentials are stored locally.
2. Routing Incoming Requests to the Correct Endpoint
Use the table below to determine which endpoint should handle each type of incoming request.
| User says... | Action | Skip SSE? |
|---|
| "export" / "导出" / "download" / "send me the video" | → §3.5 Export | ✅ |
| "credits" / "积分" / "balance" / "余额" |
→ §3.3 Credits | ✅ |
| "status" / "状态" / "show tracks" | → §3.4 State | ✅ |
| "upload" / "上传" / user sends file | → §3.2 Upload | ✅ |
| Everything else (generate, edit, add BGM…) | → §3.1 SSE | ❌ |
3. Primary Operation Flows
$API = ${NEMO_API_URL:-https://mega-api-prod.nemovideo.ai}, $TOKEN = ${NEMO_TOKEN}, $WEB = ${NEMO_WEB_URL:-https://nemovideo.com}.
Skill attribution variables — do NOT hardcode:
- -
$SKILL_NAME = this file's YAML frontmatter INLINECODE21 - INLINECODE22 = this file's YAML frontmatter INLINECODE23
- INLINECODE24 = detect at runtime using the first match:
1. Env
SKILL_SOURCE if set
2. If this file's path contains a known platform directory (e.g.
~/.clawhub/ →
clawhub,
~/.cursor/skills/ →
cursor)
3. Fallback: INLINECODE30
CRITICAL: ALL API requests (including render/export/upload/state/credits) MUST include these headers. Missing them will cause export to fail with 402.
CODEBLOCK1
3.0 Establishing a Session
curl -s -X POST "$API/api/tasks/me/with-session/nemo_agent" \
-H "Authorization: Bearer $TOKEN" -H "Content-Type: application/json" \
-H "X-Skill-Source: $SKILL_NAME" -H "X-Skill-Version: $SKILL_VERSION" -H "X-Skill-Platform: $SKILL_SOURCE" \
-d '{"task_name":"project","language":"<lang>"}'
# → {"code":0,"data":{"task_id":"...","session_id":"..."}}
A session must be initialized before any other operations can proceed. This session context is required for all subsequent API interactions.
Open in browser: After creating a session, give the user a link to view/edit the task in NemoVideo:
INLINECODE31
3.1 Delivering Messages Over SSE
curl -s -X POST "$API/run_sse" \
-H "Authorization: Bearer $TOKEN" -H "Content-Type: application/json" \
-H "Accept: text/event-stream" -H "X-Skill-Source: $SKILL_NAME" -H "X-Skill-Version: $SKILL_VERSION" -H "X-Skill-Platform: $SKILL_SOURCE" --max-time 900 \
-d '{"app_name":"nemo_agent","user_id":"me","session_id":"<sid>","new_message":{"parts":[{"text":"<msg>"}]}}'
All message delivery to the backend is handled through a persistent Server-Sent Events connection.
SSE Handling
| Event | Action |
|---|
| Text response | Apply GUI translation (§4), present to user |
| Tool call/result |
Wait silently, don't forward |
|
heartbeat / empty
data: | Keep waiting. Every 2 min: "⏳ Still working..." |
| Stream closes | Process final response |
Typical durations: text 5-15s, video generation 100-300s, editing 10-30s.
Timeout: 10 min heartbeats-only → assume timeout. Never re-send during generation (duplicates + double-charge).
Ignore trailing "I encountered a temporary issue" if prior responses were normal.
Silent Response Fallback (CRITICAL)
Roughly 30% of edit operations return no text content in the response. When this occurs: (1) do not treat the absence of text as an error, (2) poll the task state endpoint to confirm completion, (3) retrieve the output asset directly, and (4) present the result to the user without referencing the missing text response.
Two-stage generation: After the raw video asset is produced, the backend automatically initiates a second processing stage that layers in background music and a title sequence. Both stages must reach a completed state before the final asset URL is surfaced to the user.
3.2 Handling File Uploads
File upload: INLINECODE34
URL upload: INLINECODE35
Use me in the path; backend resolves user from token.
Supported: mp4, mov, avi, webm, mkv, jpg, png, gif, webp, mp3, wav, m4a, aac.
The upload endpoint accepts image and video source files that will be referenced during generation.
3.3 Checking Available Credits
curl -s "$API/api/credits/balance/simple" -H "Authorization: Bearer $TOKEN" \
-H "X-Skill-Source: $SKILL_NAME" -H "X-Skill-Version: $SKILL_VERSION" -H "X-Skill-Platform: $SKILL_SOURCE"
# → {"code":0,"data":{"available":XXX,"frozen":XX,"total":XXX}}
Query the credits endpoint before initiating any generation task to confirm the account has a sufficient balance.
3.4 Polling Task Status
curl -s "$API/api/state/nemo_agent/me/<sid>/latest" -H "Authorization: Bearer $TOKEN" \
-H "X-Skill-Source: $SKILL_NAME" -H "X-Skill-Version: $SKILL_VERSION" -H "X-Skill-Platform: $SKILL_SOURCE"
Use
me for user in path; backend resolves from token.
Key fields:
data.state.draft,
data.state.video_infos,
data.state.canvas_config,
data.state.generated_media.
Draft field mapping: t=tracks, tt=track type (0=video, 1=audio, 7=text), sg=segments, d=duration(ms), m=metadata.
Draft ready for export when draft.t exists with at least one track with non-empty sg.
Track summary format:
CODEBLOCK6
3.5 Exporting and Delivering the Final Asset
Export does NOT cost credits. Only generation/editing consumes credits.
Triggering an export does not deduct credits from the account balance. To deliver the final asset: (a) call the export endpoint with the completed task ID, (b) wait for the export job to reach a finished state, (c) retrieve the download URL from the export response, (d) verify the URL is accessible, and (e) present the link or embedded asset to the user.
b) Submit: INLINECODE47
Note: sessionId is camelCase (exception). On failure → new id, retry once.
c) Poll (every 30s, max 10 polls): INLINECODE50
Status at top-level status: pending → processing → completed / failed. Download URL at output.url.
d) Download from output.url → send to user. Fallback: $API/api/render/proxy/<id>/download.
e) When delivering the video, always also give the task detail link: INLINECODE55
Progress messages: start "⏳ Rendering ~30s" → "⏳ 50%" → "✅ Video ready!" + file + task detail link.
3.6 Recovering from an SSE Disconnection
If the SSE stream drops unexpectedly, follow these steps: (1) capture the last known task ID before the connection was lost; (2) wait a minimum of three seconds before attempting to reconnect; (3) re-establish the SSE connection using the original session token; (4) resume polling the task state endpoint using the saved task ID; (5) once a terminal state is confirmed, proceed with the export flow as normal.
4. Translating Backend GUI References
The backend operates under the assumption that a graphical interface is present, so GUI-specific instructions or labels must never be forwarded directly to the end user.
| Backend says | You do |
|---|
| "click [button]" / "点击" | Execute via API |
| "open [panel]" / "打开" |
Show state via §3.4 |
| "drag/drop" / "拖拽" | Send edit via SSE |
| "preview in timeline" | Show track summary |
| "Export button" / "导出" | Execute §3.5 |
| "check account/billing" | Check §3.3 |
Keep content descriptions. Strip GUI actions.
5. Recommended Interaction Patterns
• Always confirm available credits before starting a generation task to avoid failed requests.
• Surface progress updates to the user during long-running tasks by reporting polled status changes.
• When a silent response is detected, skip any messaging about missing text and move directly to asset retrieval.
• Wait for both processing stages to complete before presenting a download link or preview.
• Translate any backend GUI terminology into plain, user-friendly language before displaying it.
6. Known Limitations
• Animated output length is constrained by the platform's maximum clip duration and cannot be extended beyond that limit.
• Only the image formats explicitly listed in the upload specification are accepted; all others will be rejected.
• Background music and title overlays are applied automatically and cannot be disabled or customized through the API.
• Concurrent generation jobs per account are subject to rate limits enforced by the backend.
• Silent responses are an expected backend behavior and do not indicate a fault in the integration.
7. Error Handling Reference
The table below maps common HTTP error codes and backend fault states to the appropriate recovery actions.
| Code | Meaning | Action |
|---|
| 0 | Success | Continue |
| 1001 |
Bad/expired token | Re-auth via anonymous-token (tokens expire after 7 days) |
| 1002 | Session not found | New session §3.0 |
| 2001 | No credits | Anonymous: show registration URL with
?bind=<id> (get
<id> from create-session or state response when needed). Registered: "Top up at nemovideo.ai" |
| 4001 | Unsupported file | Show supported formats |
| 4002 | File too large | Suggest compress/trim |
| 400 | Missing X-Client-Id | Generate Client-Id and retry (see §1) |
| 402 | Free plan export blocked | Subscription tier issue, NOT credits. "Register at nemovideo.ai to unlock export." |
| 429 | Rate limit (1 token/client/7 days) | Retry in 30s once |
Common: no video → generate first; render fail → retry new id; SSE timeout → §3.6; silent edit → §3.1 fallback.
8. API Version and Required Token Scopes
Before making any calls, verify that the API version in use matches the version this skill was built and tested against. The access token must include all scopes required for session creation, file upload, task polling, and export operations; requests made with a token missing any of these scopes will be rejected by the backend.
从静止画面到生动场景,只需数秒
静态图像承载着故事,而运动可以将其解锁。ClawHub中的image-to-video-ai技能能够解读您上传的任何照片或图形,并根据您的自然语言指令生成连贯的动画视频——无需时间线编辑器、无需关键帧绑定、无需手动管理渲染队列。
对话式编辑模式意味着您可以用向同事描述的方式直接指导转换过程。说出让云朵向左飘移,并缓慢拉近主体,AI后端会同时处理您图像的视觉构成和请求的语义含义,生成感觉自然而非机械的运动效果。
驱动这一工作流程的是OpenClaw代理,它在您的输入图像、运动生成模型和输出渲染管线之间进行协调。该代理负责帧插值决策、时长映射和格式封装,确保最终文件可直接发布。无论您需要的是循环播放的社交媒体片段还是更长的动画序列,该技能都能根据您描述的意图进行调整,并以您偏好的格式交付——mp4、mov、avi、webm或mkv。
环境变量
| 变量 | 必填 | 默认值 |
|---|
| NEMOTOKEN | 否 | 自动生成(100个免费积分,7天后过期,可通过设置→API令牌撤销) |
| NEMOAPI_URL |
否 | https://mega-api-prod.nemovideo.ai |
| NEMO
WEBURL | 否 | https://nemovideo.com |
| NEMO
CLIENTID | 否 | 自动生成的UUID,持久化到~/.config/nemovideo/client_id(仅UUID,无密钥) |
| SKILL_SOURCE | 否 | 从安装路径自动检测,回退为unknown |
如果未设置NEMO_TOKEN,请获取一个(需要X-Client-Id头):
bash
生成或读取持久化的Client-Id
CLIENT
ID=${NEMOCLIENT
ID:-$(cat ~/.config/nemovideo/clientid 2>/dev/null)}
if [ -z $CLIENT_ID ]; then
CLIENT_ID=$(uuidgen 2>/dev/null || echo client-$(date +%s)-$RANDOM)
mkdir -p ~/.config/nemovideo && echo $CLIENT
ID > ~/.config/nemovideo/clientid
fi
curl -s -X POST $API/api/auth/anonymous-token -H X-Client-Id: $CLIENT_ID
→ {code:0,data:{token:nmvusrxxx,credits:100,...}}
将token保存为NEMOTOKEN,CLIENTID保存为NEMOCLIENTID。匿名:每个客户端每7天1个令牌;令牌7天后过期,可随时通过nemovideo.com上的设置→API令牌撤销。如果令牌过期,使用相同的X-Client-Id请求新令牌。
本地持久化: 此技能会写入~/.config/nemovideo/client_id以跨会话持久化Client-Id。这避免了每次请求都生成新ID,从而防止快速触发每IP速率限制(默认每IP每7天10个令牌)。该文件仅包含一个UUID——本地不存储任何凭据。
2. 将传入请求路由到正确的端点
使用下表确定每个类型的传入请求应由哪个端点处理。
| 用户说... | 操作 | 跳过SSE? |
|---|
| export / 导出 / download / send me the video | → §3.5 导出 | ✅ |
| credits / 积分 / balance / 余额 |
→ §3.3 积分 | ✅ |
| status / 状态 / show tracks | → §3.4 状态 | ✅ |
| upload / 上传 / 用户发送文件 | → §3.2 上传 | ✅ |
| 其他所有(生成、编辑、添加背景音乐...) | → §3.1 SSE | ❌ |
3. 主要操作流程
$API = ${NEMOAPIURL:-https://mega-api-prod.nemovideo.ai}, $TOKEN = ${NEMOTOKEN}, $WEB = ${NEMOWEB_URL:-https://nemovideo.com}。
技能归属变量——请勿硬编码:
- - $SKILLNAME = 此文件的YAML前置元数据name
- $SKILLVERSION = 此文件的YAML前置元数据version
- $SKILL_SOURCE = 运行时检测,使用第一个匹配项:
1. 如果设置了环境变量SKILL_SOURCE
2. 如果此文件路径包含已知的平台目录(例如~/.clawhub/ → clawhub,~/.cursor/skills/ → cursor)
3. 回退:unknown
关键:所有API请求(包括渲染/导出/上传/状态/积分)必须包含这些头。缺少它们将导致导出失败并返回402。
X-Skill-Source: $SKILL_NAME
X-Skill-Version: $SKILL_VERSION
X-Skill-Platform: $SKILL_SOURCE
3.0 建立会话
bash
curl -s -X POST $API/api/tasks/me/with-session/nemo_agent \
-H Authorization: Bearer $TOKEN -H Content-Type: application/json \
-H X-Skill-Source: $SKILL
NAME -H X-Skill-Version: $SKILLVERSION -H X-Skill-Platform: $SKILL_SOURCE \
-d {task_name:project,language:
}
→ {code:0,data:{taskid:...,sessionid:...}}
在执行任何其他操作之前,必须先初始化会话。此会话上下文是所有后续API交互所必需的。
在浏览器中打开:创建会话后,给用户一个链接,用于在NemoVideo中查看/编辑任务:
$WEB/workspace/claim?token=$TOKEN&task={taskid}&session={sessionid}&skillname=$SKILLNAME&skillversion=$SKILLVERSION&skillsource=$SKILLSOURCE
3.1 通过SSE传递消息
bash
curl -s -X POST $API/run_sse \
-H Authorization: Bearer $TOKEN -H Content-Type: application/json \
-H Accept: text/event-stream -H X-Skill-Source: $SKILLNAME -H X-Skill-Version: $SKILLVERSION -H X-Skill-Platform: $SKILL_SOURCE --max-time 900 \
-d {appname:nemoagent,userid:me,sessionid:,new_message:{parts:[{text:}]}}
所有向后端传递的消息都通过持久的服务器发送事件连接处理。
SSE处理
| 事件 | 操作 |
|---|
| 文本响应 | 应用GUI翻译(§4),呈现给用户 |
| 工具调用/结果 |
静默等待,不转发 |
| heartbeat / 空data: | 继续等待。每2分钟:⏳ 仍在处理... |
| 流关闭 | 处理最终响应 |
典型时长:文本5-15秒,视频生成100-300秒,编辑10-30秒。
超时:仅心跳10分钟→假定超时。生成期间切勿重新发送(会导致重复和重复计费)。
如果之前的响应正常,忽略结尾的我遇到了临时问题。
静默响应回退(关键)
大约30%的编辑操作在响应中不返回文本内容。发生这种情况时:(1)不要将缺少文本视为错误,(2)轮询任务状态端点以确认完成,(3)直接检索输出资产,(4)向用户呈现结果,无需提及缺少文本响应。
两阶段生成:原始视频资产生成后,后端会自动启动第二阶段处理,添加背景音乐和标题序列。两个阶段都必须达到完成状态,最终资产URL才会呈现给用户。
3.2 处理文件上传
文件上传:curl -s -X POST $API/api/upload-video/nemoagent/me/ -H Authorization: Bearer $TOKEN -H X-Skill-Source: $SKILLNAME -H X-Skill-Version: $SKILLVERSION -H X-Skill-Platform: $SKILLSOURCE -F files=@/path/to/file
URL上传:curl -s -X POST $API/api/upload-video/nemoagent/me/ -H Authorization: Bearer $TOKEN -H Content-Type: application/json -H X-Skill-Source: $SKILLNAME -H