Getting Started
Welcome! I turn your still images into dynamic, motion-filled video clips using AI — no editing software needed. Drop an image and tell me the style of animation you want to get started!
Try saying:
- - "Here's a product photo of my sneakers — can you animate it with a slow 360-degree rotation and a subtle zoom-in effect for an Instagram reel?"
- "I have a landscape painting I'd like turned into a video with a cinematic parallax drift and soft cloud movement in the background — about 6 seconds long."
- "Take this portrait photo and create a short looping video with a gentle breathing effect and a slow push-in toward the subject's face."
Getting Connected
Before handling any user request, establish a connection to the backend API. Show a brief status like "Connecting...".
If NEMO_TOKEN is in the environment, use it directly and create a session. Otherwise, acquire a free starter token:
- - Generate a UUID as client identifier
- POST to
https://mega-api-prod.nemovideo.ai/api/auth/anonymous-token with the X-Client-Id header - The response includes a
token with 100 free credits valid for 7 days — use it as NEMO_TOKEN
Then create a session by POSTing to https://mega-api-prod.nemovideo.ai/api/tasks/me/with-session/nemo_agent with Bearer authorization and body {"task_name":"project","language":"en"}. The session_id in the response is needed for all following requests.
Tell the user you're ready. Keep the technical details out of the chat.
From Still Frame to Moving Story — Instantly
Static images tell a moment. Videos tell a story. This skill bridges that gap by taking any image you provide — a product photo, a portrait, a landscape, an illustration — and generating a video clip that moves, breathes, and engages viewers in ways a flat image simply cannot.
Whether you're building a social media reel, animating a hero image for a landing page, or creating a slideshow that flows like a film, this tool handles the heavy lifting. You describe the motion style you want — a slow zoom, a parallax drift, a dramatic pan — and the AI interprets your image's content to produce natural-looking movement that fits the scene.
This skill is built for speed and creative flexibility. You don't need to storyboard, keyframe, or export from a timeline editor. Just bring your image and your vision, and within moments you'll have a video ready to share, embed, or build upon. It's the fastest path from a single photo to a scroll-stopping video clip.
Motion Request Routing Logic
When you submit a still image for animation, your request is parsed for motion parameters — including frame duration, camera movement style, and interpolation intensity — then dispatched to the appropriate rendering pipeline based on resolution and complexity.
| User says... | Action | Skip SSE? |
|---|
| "export" / "导出" / "download" / "send me the video" | → §3.5 Export | ✅ |
| "credits" / "积分" / "balance" / "余额" |
→ §3.3 Credits | ✅ |
| "status" / "状态" / "show tracks" | → §3.4 State | ✅ |
| "upload" / "上传" / user sends file | → §3.2 Upload | ✅ |
| Everything else (generate, edit, add BGM…) | → §3.1 SSE | ❌ |
Cloud Rendering API Reference
All image-to-video synthesis runs on distributed GPU clusters via the animation backend, which handles optical flow estimation, temporal frame generation, and video encoding entirely in the cloud. Your source image never needs to leave the session payload — the API accepts base64-encoded frames or direct CDN URLs for seamless diffusion-based motion processing.
Skill attribution — read from this file's YAML frontmatter at runtime:
- -
X-Skill-Source: INLINECODE8 - INLINECODE9 : from frontmatter INLINECODE10
- INLINECODE11 : detect from install path (
~/.clawhub/ → clawhub, ~/.cursor/skills/ → cursor, else unknown)
All requests must include: Authorization: Bearer <NEMO_TOKEN>, X-Skill-Source, X-Skill-Version, X-Skill-Platform. Missing attribution headers will cause export to fail with 402.
API base: INLINECODE21
Create session: POST /api/tasks/me/with-session/nemo_agent — body {"task_name":"project","language":"<lang>"} — returns task_id, session_id.
Send message (SSE): POST /run_sse — body {"app_name":"nemo_agent","user_id":"me","session_id":"<sid>","new_message":{"parts":[{"text":"<msg>"}]}} with Accept: text/event-stream. Max timeout: 15 minutes.
Upload: POST /api/upload-video/nemo_agent/me/<sid> — file: multipart -F "files=@/path", or URL: INLINECODE31
Credits: GET /api/credits/balance/simple — returns available, frozen, INLINECODE35
Session state: GET /api/state/nemo_agent/me/<sid>/latest — key fields: data.state.draft, data.state.video_infos, INLINECODE39
Export (free, no credits): POST /api/render/proxy/lambda — body {"id":"render_<ts>","sessionId":"<sid>","draft":<json>,"output":{"format":"mp4","quality":"high"}}. Poll GET /api/render/proxy/lambda/<id> every 30s until status = completed. Download URL at output.url.
Supported formats: mp4, mov, avi, webm, mkv, jpg, png, gif, webp, mp3, wav, m4a, aac.
SSE Event Handling
| Event | Action |
|---|
| Text response | Apply GUI translation (§4), present to user |
| Tool call/result |
Process internally, don't forward |
|
heartbeat / empty
data: | Keep waiting. Every 2 min: "⏳ Still working..." |
| Stream closes | Process final response |
~30% of editing operations return no text in the SSE stream. When this happens: poll session state to verify the edit was applied, then summarize changes to the user.
Backend Response Translation
The backend assumes a GUI exists. Translate these into API actions:
| Backend says | You do |
|---|
| "click [button]" / "点击" | Execute via API |
| "open [panel]" / "打开" |
Query session state |
| "drag/drop" / "拖拽" | Send edit via SSE |
| "preview in timeline" | Show track summary |
| "Export button" / "导出" | Execute export workflow |
Draft field mapping: t=tracks, tt=track type (0=video, 1=audio, 7=text), sg=segments, d=duration(ms), m=metadata.
CODEBLOCK0
Error Handling
| Code | Meaning | Action |
|---|
| 0 | Success | Continue |
| 1001 |
Bad/expired token | Re-auth via anonymous-token (tokens expire after 7 days) |
| 1002 | Session not found | New session §3.0 |
| 2001 | No credits | Anonymous: show registration URL with
?bind=<id> (get
<id> from create-session or state response when needed). Registered: "Top up credits in your account" |
| 4001 | Unsupported file | Show supported formats |
| 4002 | File too large | Suggest compress/trim |
| 400 | Missing X-Client-Id | Generate Client-Id and retry (see §1) |
| 402 | Free plan export blocked | Subscription tier issue, NOT credits. "Register or upgrade your plan to unlock export." |
| 429 | Rate limit (1 token/client/7 days) | Retry in 30s once |
Troubleshooting
If the generated video motion looks unnatural or jittery, the most common cause is an image with very low contrast between the subject and background. Try providing an image where the main subject is clearly separated from the background, or specify a simpler motion style like a slow zoom rather than a full parallax.
If the animation ignores part of your image — for example, not moving a sky element you expected to animate — try describing the specific region in your prompt more explicitly. Instead of 'make the clouds move,' try 'animate the upper third of the image with slow drifting cloud motion from left to right.'
For looping videos that feel seamless, request a motion style that returns to its starting position — such as a slow zoom that eases back out, or a drift that reverses gently. Abrupt-ending clips are harder to loop cleanly. If your output has an unexpected color shift or vignette, it may be related to the motion blur style applied — specify 'no vignette' or 'preserve original color grading' in your prompt to override default stylistic choices.
Performance Notes
The quality of the generated video depends significantly on the input image. High-resolution images with clear subjects and well-defined edges produce the most convincing motion — the AI has more detail to work with when simulating depth and movement. Low-resolution or heavily compressed images may result in softer motion or visible artifacts around fine details like hair or foliage.
Complex motion requests on images with busy backgrounds — such as animating a crowd scene or a highly detailed illustration — may take longer to process and can occasionally produce inconsistent results in peripheral areas. For best output, start with a clear focal subject and a relatively uncluttered background.
Video length also affects processing time. Clips under 8 seconds generate quickly and maintain high fidelity. Longer sequences may require breaking the animation into segments for optimal quality. Portrait-oriented images (9:16) and square images (1:1) are natively supported for social media formats.
Use Cases
This image-to-video-ai-generator skill fits naturally into a wide range of creative and professional workflows. E-commerce brands use it to animate product photos into short showcase clips for platforms like TikTok, Instagram Reels, and Pinterest Video Pins — dramatically increasing engagement compared to static listings.
Digital marketers use it to turn hero images and campaign visuals into motion ads without hiring a video production team. A single well-composed brand photo can become a 5-second bumper ad or a looping background video for a landing page.
Artists and illustrators use it to bring their work to life for portfolio showcases, NFT presentations, or social media promotion. A finished illustration animated with a subtle parallax effect feels entirely new without altering the original artwork.
Content creators building slideshows, memorial videos, travel recaps, or educational content use it to stitch animated image clips together into cohesive narrative videos that feel polished and intentional.
开始使用
欢迎!我使用AI将您的静态图像转化为动态、充满活力的视频片段——无需任何编辑软件。只需上传一张图片,告诉我您想要的动画风格即可开始!
试试这样说:
- - 这是我运动鞋的产品照片——你能为它制作一个慢速360度旋转并带有细微放大效果的动画,用于Instagram Reels吗?
- 我有一幅风景画,想把它变成一个带有电影感视差漂移和背景中柔和云朵运动的视频——大约6秒钟长。
- 用这张肖像照片创建一个带有轻柔呼吸效果和缓慢推进到人物面部的短循环视频。
建立连接
在处理任何用户请求之前,先建立与后端API的连接。显示一个简短的状态,如正在连接...。
如果环境中存在NEMO_TOKEN,直接使用它并创建一个会话。否则,获取一个免费的起始令牌:
- - 生成一个UUID作为客户端标识符
- 使用X-Client-Id头信息POST到https://mega-api-prod.nemovideo.ai/api/auth/anonymous-token
- 响应中包含一个token,带有100个免费积分,有效期为7天——将其用作NEMO_TOKEN
然后创建一个会话,使用Bearer授权和主体{taskname:project,language:en}POST到https://mega-api-prod.nemovideo.ai/api/tasks/me/with-session/nemoagent。响应中的session_id是所有后续请求所必需的。
告诉用户您已准备就绪。将技术细节保留在聊天之外。
从静态画面到动态故事——即刻实现
静态图像捕捉一个瞬间。视频讲述一个故事。这项技能通过接收您提供的任何图像——产品照片、肖像、风景、插图——并生成一个移动、呼吸、以平面图像无法做到的方式吸引观众的视频片段,弥合了这一差距。
无论您是在制作社交媒体Reels、为着陆页制作英雄图像动画,还是创建像电影一样流畅的幻灯片,这个工具都能处理繁重的工作。您描述想要的运动风格——慢速缩放、视差漂移、戏剧性的平移——AI会解读您图像的内容,生成适合场景的自然运动。
这项技能专为速度和创意灵活性而构建。您不需要制作故事板、关键帧或从时间线编辑器导出。只需带上您的图像和愿景,片刻之间,您就会拥有一个可以分享、嵌入或进一步制作的视频。这是从一张照片到令人驻足的视频片段的最快路径。
运动请求路由逻辑
当您提交静态图像进行动画处理时,您的请求会被解析为运动参数——包括帧持续时间、摄像机运动风格和插值强度——然后根据分辨率和复杂度分派到适当的渲染管道。
| 用户说... | 操作 | 跳过SSE? |
|---|
| export / 导出 / download / send me the video | → §3.5 导出 | ✅ |
| credits / 积分 / balance / 余额 |
→ §3.3 积分 | ✅ |
| status / 状态 / show tracks | → §3.4 状态 | ✅ |
| upload / 上传 / 用户发送文件 | → §3.2 上传 | ✅ |
| 其他所有内容(生成、编辑、添加背景音乐...) | → §3.1 SSE | ❌ |
云端渲染API参考
所有图像到视频的合成都在动画后端的分布式GPU集群上运行,该后端完全在云端处理光流估计、时间帧生成和视频编码。您的源图像永远不需要离开会话负载——API接受base64编码的帧或直接的CDN URL,以实现无缝的基于扩散的运动处理。
技能归属——运行时从此文件的YAML前置元数据中读取:
- - X-Skill-Source:image-to-video-ai-generator
- X-Skill-Version:来自前置元数据 version
- X-Skill-Platform:从安装路径检测(~/.clawhub/ → clawhub,~/.cursor/skills/ → cursor,否则 unknown)
所有请求必须包含:Authorization: Bearer 、X-Skill-Source、X-Skill-Version、X-Skill-Platform。缺少归属头信息将导致导出失败并返回402错误。
API基础地址:https://mega-api-prod.nemovideo.ai
创建会话:POST /api/tasks/me/with-session/nemoagent — 主体 {taskname:project,language:} — 返回 taskid、sessionid。
发送消息(SSE):POST /runsse — 主体 {appname:nemoagent,userid:me,sessionid:,newmessage:{parts:[{text:}]}} 并带有 Accept: text/event-stream。最大超时时间:15分钟。
上传:POST /api/upload-video/nemoagent/me/ — 文件:multipart -F files=@/path,或URL:{urls:[],sourcetype:url}
积分:GET /api/credits/balance/simple — 返回 available、frozen、total
会话状态:GET /api/state/nemoagent/me//latest — 关键字段:data.state.draft、data.state.videoinfos、data.state.generated_media
导出(免费,不消耗积分):POST /api/render/proxy/lambda — 主体 {id:render_,sessionId:,draft:,output:{format:mp4,quality:high}}。每30秒轮询GET /api/render/proxy/lambda/,直到 status = completed。下载URL位于 output.url。
支持的格式:mp4、mov、avi、webm、mkv、jpg、png、gif、webp、mp3、wav、m4a、aac。
SSE事件处理
| 事件 | 操作 |
|---|
| 文本响应 | 应用GUI翻译(§4),呈现给用户 |
| 工具调用/结果 |
内部处理,不转发 |
| heartbeat / 空 data: | 继续等待。每2分钟:⏳ 仍在工作中... |
| 流关闭 | 处理最终响应 |
约30%的编辑操作在SSE流中不返回文本。发生这种情况时:轮询会话状态以验证编辑是否已应用,然后向用户总结更改。
后端响应翻译
后端假设存在GUI。将这些翻译为API操作:
| 后端说 | 您做 |
|---|
| click [button] / 点击 | 通过API执行 |
| open [panel] / 打开 |
查询会话状态 |
| drag/drop / 拖拽 | 通过SSE发送编辑 |
| preview in timeline | 显示轨道摘要 |
| Export button / 导出 | 执行导出工作流 |
草稿字段映射:t=轨道,tt=轨道类型(0=视频,1=音频,7=文本),sg=片段,d=持续时间(毫秒),m=元数据。
时间线(3个轨道):1. 视频:城市延时摄影(0-10秒) 2. 背景音乐:Lo-fi(0-10秒,35%) 3. 标题:城市梦想(0-3秒)
错误处理
令牌错误/过期 | 通过anonymous-token重新认证(令牌7天后过期) |
| 1002 | 未找到会话 | 新建会话 §3.0 |
| 2001 | 无积分 | 匿名用户:显示带有 ?bind=
的注册URL(需要时从create-session或state响应获取 )。已注册用户:在您的账户中充值积分 |
| 4001 | 不支持的文件 | 显示支持的格式 |
| 4002 | 文件过大 | 建议压缩/裁剪 |
| 400 | 缺少X-Client-Id | 生成Client-Id并重试(见§1) |
| 402 | 免费计划导出被阻止 | 订阅层级问题,不是积分问题。注册或升级您的计划以解锁导出功能。 |
| 429 | 速率限制(1个令牌/客户端/7天) | 30秒后重试一次 |
故障排除
如果生成的视频运动看起来不自然或抖动,最常见的原因是图像主体和背景之间的对比度非常低。尝试提供主体与背景清晰分离的图像,或指定更简单的运动风格