0. First Contact

When the user opens this skill or sends their first message, greet them immediately:

📱 Ready to turn your images into video! Just send me an image or video, or describe your project.

Try saying:

- "help me create a short video"
- "add effects to this clip"
- "edit my video"

IMPORTANT: Do NOT wait silently. Always greet the user proactively on first contact.

Auto-Setup (do this SILENTLY before anything else)

Before responding to the user, complete these steps invisibly:

1. Check token: If the NEMO_TOKEN env var is set, use it. Otherwise:
2. Read or generate Client-ID:

- Read ~/.config/nemovideo/client_id if it exists
- Otherwise generate a UUID and save it to ~/.config/nemovideo/client_id

3. Acquire anonymous token:

   curl -s -X POST "$API/api/auth/anonymous-token" -H "X-Client-Id: $CLIENT_ID"

Store the returned token as NEMO_TOKEN for this session. You get 100 free credits.

4. Create a session (§3.0) so you're ready to work immediately.

CRITICAL: Never mention tokens, authentication, API setup, or Client-IDs to the user. The user should feel like the skill "just works".

Turn Static Images Into Cinematic Motion Stories

Still images carry meaning, but video captures attention. This skill bridges that gap by letting you describe exactly how you want your image to move — whether that's a gentle parallax drift, a dramatic zoom, flowing particle effects, or scene transitions — and delivering a rendered video that matches your creative intent.

The experience is fully conversational. Rather than wrestling with timelines, keyframes, or motion curves, you simply describe what you want in plain language. The OpenClaw agent interprets your instructions, communicates with the underlying AI video synthesis backend, and iterates with you until the result feels right. You can refine motion speed, direction, loop behavior, and visual tone across multiple turns in the same session.

Under the hood, the AI backend analyzes depth, subject boundaries, and visual composition within your source image to generate plausible, coherent motion that respects the original scene. The result is not a slideshow or a pan-and-scan effect — it is a genuinely animated video derived from a single frame. Final exports are available in mp4, mov, avi, webm, and mkv formats to fit any platform or publishing workflow.

Environment Variables

Variable	Required	Default
NEMO_TOKEN	No	Auto-generated (100 free credits, expires in 7 days, revocable via Settings → API Tokens)
NEMO_API_URL

If NEMO_TOKEN is not set, get one (requires X-Client-Id header):

# Generate or read persisted Client-Id
CLIENT_ID="${NEMO_CLIENT_ID:-$(cat ~/.config/nemovideo/client_id 2>/dev/null)}"
if [ -z "$CLIENT_ID" ]; then
  CLIENT_ID=$(uuidgen 2>/dev/null || echo "client-$(date +%s)-$RANDOM")
  mkdir -p ~/.config/nemovideo && echo "$CLIENT_ID" > ~/.config/nemovideo/client_id
fi
curl -s -X POST "$API/api/auth/anonymous-token" -H "X-Client-Id: $CLIENT_ID"
# → {"code":0,"data":{"token":"nmv_usr_xxx","credits":100,...}}

Save token as NEMO_TOKEN, CLIENT_ID as NEMO_CLIENT_ID. Anonymous: 1 token per client per 7 days; token expires in 7 days and can be revoked at any time via Settings → API Tokens on nemovideo.com. If your token expires, request a new one with the same X-Client-Id.

Local persistence: This skill writes ~/.config/nemovideo/client_id to persist the Client-Id across sessions. This avoids generating a new ID on every request, which would hit the per-IP rate limit quickly (default 10 tokens per 7 days per IP). The file contains only a UUID — no credentials are stored locally.

2. Routing Incoming Requests

Use the following table to determine which endpoint handles each type of incoming request.

User says...	Action	Skip SSE?
"export" / "导出" / "download" / "send me the video"	→ §3.5 Export	✅
"credits" / "积分" / "balance" / "余额"	→ §3.3 Credits	✅
"status" / "状态" / "show tracks"	→ §3.4 State	✅
"upload" / "上传" / user sends file	→ §3.2 Upload	✅
Everything else (generate, edit, add BGM…)	→ §3.1 SSE	❌
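The routing table can be sketched as a small shell function. This is a minimal illustration, not the skill's actual dispatcher: the name `route_request` and the substring matching are assumptions, and the "user sends file" case is not covered because it depends on the message carrying an attachment rather than a keyword.

```shell
# Hypothetical helper: map a user message to the section that handles it.
# Keywords come from the routing table; matching by substring is an assumption.
route_request() {
  # Lowercase ASCII letters so "Export" and "export" match alike.
  msg=$(printf '%s' "$1" | tr 'A-Z' 'a-z')
  case "$msg" in
    *export*|*导出*|*download*|*"send me the video"*) echo "export" ;;   # §3.5
    *credits*|*积分*|*balance*|*余额*)                 echo "credits" ;;  # §3.3
    *status*|*状态*|*"show tracks"*)                   echo "state" ;;    # §3.4
    *upload*|*上传*)                                   echo "upload" ;;   # §3.2
    *)                                                 echo "sse" ;;      # §3.1
  esac
}
```

Substring matching is deliberately loose, so a generation prompt that happens to contain a keyword such as "status" would be misrouted. A stricter match may be needed in practice.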

3. Primary Operation Flows

$API = ${NEMO_API_URL:-https://mega-api-prod.nemovideo.ai}, $TOKEN = ${NEMO_TOKEN}, $WEB = ${NEMO_WEB_URL:-https://nemovideo.com}.

Skill attribution variables — do NOT hardcode:

- $SKILL_NAME = this file's YAML frontmatter name
- $SKILL_VERSION = this file's YAML frontmatter version
- $SKILL_SOURCE = detect at runtime using the first match:

1. Env SKILL_SOURCE if set
2. If this file's path contains a known platform directory (e.g. ~/.clawhub/ → clawhub, ~/.cursor/skills/ → cursor)
3. Fallback: unknown
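The first-match detection order above could look like the following sketch. The function name `detect_skill_source` is hypothetical, only the two platform directories named in the text are checked, and the fallback value `unknown` is an assumption about what the skill expects.

```shell
# Hypothetical helper: resolve the platform name for a given skill file path.
detect_skill_source() {
  skill_path=$1
  # 1. An explicit SKILL_SOURCE environment variable wins.
  if [ -n "${SKILL_SOURCE:-}" ]; then
    echo "$SKILL_SOURCE"
    return
  fi
  # 2. Otherwise look for a known platform directory in the path.
  case "$skill_path" in
    */.clawhub/*)        echo "clawhub" ;;
    */.cursor/skills/*)  echo "cursor" ;;
    # 3. Fallback when nothing matches (assumed value).
    *)                   echo "unknown" ;;
  esac
}
```

Call it with the path of this skill file and store the result in SKILL_SOURCE before building request headers.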

CRITICAL: ALL API requests (including render/export/upload/state/credits) MUST include these headers. Missing them will cause export to fail with 402.
X-Skill-Source: $SKILL_NAME
X-Skill-Version: $SKILL_VERSION
X-Skill-Platform: $SKILL_SOURCE

3.0 Establishing a Session

curl -s -X POST "$API/api/tasks/me/with-session/nemo_agent" \
  -H "Authorization: Bearer $TOKEN" -H "Content-Type: application/json" \
  -H "X-Skill-Source: $SKILL_NAME" -H "X-Skill-Version: $SKILL_VERSION" -H "X-Skill-Platform: $SKILL_SOURCE" \
  -d '{"task_name":"project","language":"<lang>"}'
# → {"code":0,"data":{"task_id":"...","session_id":"..."}}

A session must be initialized before any other operations can proceed. This creates the context that all subsequent requests will be tied to.

Open in browser: After creating a session, give the user a link to view/edit the task in NemoVideo:
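$WEB/workspace/claim?token=$TOKEN&task={task_id}&session={session_id}&skill_name=$SKILL_NAME&skill_version=$SKILL_VERSION&skill_source=$SKILL_SOURCE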
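Later requests need the task_id and session_id from the session response. A minimal sketch of extracting them, assuming the flat response shape shown in the comment above; the helper name `json_string_field` and the sample ids are made up for illustration.

```shell
# Hypothetical helper: pull one string field out of a flat JSON response.
# A sed pattern is brittle; a real JSON parser is safer when one is available.
json_string_field() {
  # $1 = JSON text, $2 = field name
  printf '%s\n' "$1" | sed -n 's/.*"'"$2"'":"\([^"]*\)".*/\1/p'
}

# Sample response in the documented shape (the ids are made up).
resp='{"code":0,"data":{"task_id":"t-123","session_id":"s-456"}}'
TASK_ID=$(json_string_field "$resp" task_id)
SESSION_ID=$(json_string_field "$resp" session_id)
echo "task=$TASK_ID session=$SESSION_ID"
```

In real use, `resp` would hold the output of the session-creation curl call, and an empty result should be treated as a failed session.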

3.1 Delivering Messages via SSE

curl -s -X POST "$API/run_sse" \
  -H "Authorization: Bearer $TOKEN" -H "Content-Type: application/json" \
  -H "Accept: text/event-stream" -H "X-Skill-Source: $SKILL_NAME" -H "X-Skill-Version: $SKILL_VERSION" -H "X-Skill-Platform: $SKILL_SOURCE" --max-time 900 \
  -d '{"app_name":"nemo_agent","user_id":"me","session_id":"<sid>","new_message":{"parts":[{"text":"<msg>"}]}}'

All conversational messages are transmitted to the backend through a Server-Sent Events connection.

SSE Handling

Event	Action
Text response	Apply GUI translation (§4), present to user
Tool call/result

Typical durations: text 5-15s, video generation 100-300s, editing 10-30s.

Timeout: 10 min heartbeats-only → assume timeout. Never re-send during generation (duplicates + double-charge).

Ignore trailing "I encountered a temporary issue" if prior responses were normal.

Silent Response Fallback (CRITICAL)

Approximately 30% of edit operations return no visible text in the response. When this occurs: (1) do not treat the absence of text as a failure, (2) poll the task state endpoint to confirm processing is underway, (3) once the task reaches a completed state, proceed directly to the export step, and (4) inform the user that their edit is being processed without alarming them about the lack of a text reply.

Two-stage generation: After the raw video is produced, the backend automatically initiates a second processing stage that layers in background music and a title sequence. Do not treat the first completed video as the final deliverable — wait for both stages to finish before presenting the result to the user.

3.2 Handling File Uploads

File upload: INLINECODE39

URL upload: INLINECODE40

Use me in the path; backend resolves user from token.

Supported: mp4, mov, avi, webm, mkv, jpg, png, gif, webp, mp3, wav, m4a, aac.

The upload endpoint accepts image, video, and audio files in the formats listed above; they serve as source material for generation tasks.
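Checking the extension before uploading avoids a wasted request. The sketch below uses only the extensions listed above; the function name `is_supported_upload` is hypothetical, and it inspects the file name only, not the file contents.

```shell
# Hypothetical helper: succeed when the file name ends in a supported extension.
is_supported_upload() {
  # A name without a dot has no extension to check.
  case "$1" in
    *.*) ;;
    *) return 1 ;;
  esac
  # Take the text after the last dot and lowercase it.
  ext=$(printf '%s' "${1##*.}" | tr 'A-Z' 'a-z')
  case "$ext" in
    mp4|mov|avi|webm|mkv|jpg|png|gif|webp|mp3|wav|m4a|aac) return 0 ;;
    *) return 1 ;;
  esac
}
```

Note that `.jpeg` is rejected here because the supported list names only `jpg`; whether the backend accepts it is not stated.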

3.3 Checking Available Credits

curl -s "$API/api/credits/balance/simple" -H "Authorization: Bearer $TOKEN" \
  -H "X-Skill-Source: $SKILL_NAME" -H "X-Skill-Version: $SKILL_VERSION" -H "X-Skill-Platform: $SKILL_SOURCE"
# → {"code":0,"data":{"available":XXX,"frozen":XX,"total":XXX}}

Query the credits endpoint before initiating any generation task to confirm the user has a sufficient balance.
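A pre-flight balance check might be sketched as follows. The helper `has_enough_credits` is hypothetical, and the required amount is an assumed parameter because the cost of a generation task is not documented here.

```shell
# Hypothetical helper: succeed when "available" is at least the required amount.
has_enough_credits() {
  # $1 = balance response JSON, $2 = credits needed (an assumed figure)
  available=$(printf '%s\n' "$1" | sed -n 's/.*"available":\([0-9][0-9]*\).*/\1/p')
  # Fail when the field is missing or the balance is too low.
  [ -n "$available" ] && [ "$available" -ge "$2" ]
}
```

Pass it the body returned by the balance request, for example `has_enough_credits "$balance_json" 10 || echo "Not enough credits"`.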

3.4 Polling Task Status

curl -s "$API/api/state/nemo_agent/me/<sid>/latest" -H "Authorization: Bearer $TOKEN" \
  -H "X-Skill-Source: $SKILL_NAME" -H "X-Skill-Version: $SKILL_VERSION" -H "X-Skill-Platform: $SKILL_SOURCE"

Use me for user in path; backend resolves from token. Key fields: data.state.draft, data.state.video_infos, data.state.canvas_config, data.state.generated_media.

Draft field mapping: t=tracks, tt=track type (0=video, 1=audio, 7=text), sg=segments, d=duration(ms), m=metadata.

Draft ready for export when draft.t exists with at least one track with non-empty sg.
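The compact draft fields are easier to report once decoded. The two helpers below are a sketch with hypothetical names: they translate the tt codes listed above and convert a d value from milliseconds to seconds.

```shell
# Hypothetical helpers for reading the compact draft fields.
track_type_name() {
  # tt codes from the mapping above; any other code is reported as-is.
  case "$1" in
    0) echo "video" ;;
    1) echo "audio" ;;
    7) echo "text" ;;
    *) echo "unknown($1)" ;;
  esac
}

ms_to_seconds() {
  # d is in milliseconds; print seconds with millisecond precision.
  printf '%d.%03d\n' "$(( $1 / 1000 ))" "$(( $1 % 1000 ))"
}
```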

Track summary format:
CODEBLOCK7

3.5 Exporting and Delivering the Final Asset

Export does NOT cost credits. Only generation/editing consumes credits.

To deliver the finished asset: (a) call the export endpoint once the task is confirmed complete, (b) retrieve the download URL from the response, (c) verify the URL is accessible, (d) present the link or embed the asset directly in the chat, and (e) confirm successful delivery to the user.

b) Submit: INLINECODE52

Note: sessionId is camelCase (exception). On failure → new id, retry once.

c) Poll (every 30s, max 10 polls): INLINECODE55

Status at top-level status: pending → processing → completed / failed. Download URL at output.url.
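The polling rule (every 30s, at most 10 polls, stop on completed or failed) can be sketched as a loop. Everything named here is hypothetical: `poll_render` is an illustrative helper, and its first argument is a stand-in command that prints the current top-level status, since the real poll request is not reproduced in this sketch.

```shell
# Hypothetical helper: poll a status command until it reports a terminal state.
# $1 = command that prints the current status (stands in for the real poll request)
# $2 = seconds between polls (30 in the text), $3 = maximum polls (10 in the text)
poll_render() {
  fetch_status=$1
  interval=$2
  max_polls=$3
  attempt=1
  while [ "$attempt" -le "$max_polls" ]; do
    status=$("$fetch_status")
    case "$status" in
      completed) echo "completed"; return 0 ;;
      failed)    echo "failed";    return 1 ;;
    esac
    # pending / processing: wait before the next poll, but not after the last one.
    [ "$attempt" -lt "$max_polls" ] && sleep "$interval"
    attempt=$((attempt + 1))
  done
  echo "timeout"
  return 2
}
```

A call such as `poll_render my_status_cmd 30 10` follows the documented cadence. The distinct return codes let the caller tell a failed render (retry once with a new id) from a poll budget that simply ran out.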

d) Download from output.url → send to user. Fallback: $API/api/render/proxy/<id>/download.

e) When delivering the video, always also give the task detail link: INLINECODE60

Progress messages: start "⏳ Rendering ~30s" → "⏳ 50%" → "✅ Video ready!" + file + task detail link.

3.6 Recovering from an SSE Disconnection

If the SSE stream drops unexpectedly, follow these steps to recover: (1) detect the disconnection event and log it internally without surfacing an error to the user prematurely, (2) attempt to re-establish the SSE connection using the existing session ID, (3) if reconnection succeeds, resume listening for task progress events from where the stream left off, (4) if reconnection fails after the maximum number of retries, fall back to polling the task state endpoint at a regular interval, and (5) once a terminal task state is confirmed, proceed with the export flow as normal.

4. Translating GUI Elements

The backend operates under the assumption that a graphical interface is present, so GUI-specific instructions must never be passed through directly to the user.

Backend says	You do
"click [button]" / "点击"	Execute via API
"open [panel]" / "打开"

Keep content descriptions. Strip GUI actions.

5. Recommended Interaction Patterns

• Acknowledge the user's request immediately and set clear expectations about processing time before the generation task begins.
• Provide incremental progress updates during long-running tasks so users remain informed without needing to ask.
• When a task completes, always surface the final exported asset rather than an intermediate result.
• If the user submits an ambiguous prompt, ask a single focused clarifying question rather than making assumptions.
• After delivering the finished video, invite the user to request edits or refinements to keep the conversation moving forward.

6. Known Limitations

• Generation tasks can take several minutes to complete; real-time delivery is not possible.
• The system does not support more than one concurrent generation task per session.
• Source images must meet minimum resolution requirements or the upload will be rejected.
• Background music and title overlays are applied automatically and cannot be individually disabled through the API.
• Credit balances are read-only via the API; top-ups must be handled through the platform's billing interface.

7. Error Handling Reference

The table below maps common error codes to their likely causes and the recommended recovery action for each.

Code	Meaning	Action
0	Success	Continue
1001

Common: no video → generate first; render fail → retry new id; SSE timeout → §3.6; silent edit → §3.1 fallback.

8. API Version and Required Scopes

Always verify that the API version header matches the version documented in this skill before making requests, as older versions may not support all endpoints described here. The access token provided at session creation must include the required scopes for generation, upload, export, and credits reading; requests made with tokens missing any of these scopes will return a 403 response.
