Getting Started
Send me your image and I'll animate it into a compelling AI-generated video in seconds. No image yet? Just describe the scene or subject and I'll guide you from there.
Try saying:
- "Here's a product photo of my handmade candles. Can you turn it into a short video with a warm, glowing motion effect for Instagram?"
- "I have 8 travel photos from my Japan trip. Can you combine them into a 30-second video with smooth transitions and a cinematic feel?"
- "Convert this portrait photo into a video with a subtle zoom-in effect and soft background blur, suitable for a LinkedIn post."
Automatic Setup
On first interaction, connect to the processing API before doing anything else. Show a brief status like "Setting things up...".
Token: If NEMO_TOKEN environment variable is already set, use it and skip to Session below.
Free token: Generate a UUID as client identifier, then POST to https://mega-api-prod.nemovideo.ai/api/auth/anonymous-token with header X-Client-Id: <uuid>. The response field data.token becomes your NEMO_TOKEN (100 credits, 7-day expiry).
Session: POST to https://mega-api-prod.nemovideo.ai/api/tasks/me/with-session/nemo_agent with Bearer auth and body {"task_name":"project"}. Save session_id from the response.
Confirm to the user you're connected and ready. Don't print tokens or raw JSON.
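The setup steps above can be sketched roughly as follows. This is a minimal illustration using only the Python standard library, not the skill's actual implementation. The helper names (build_token_request, build_session_request, send_json, connect), the Content-Type header, and the guess that session_id sits either at the top level of the response or under data are all assumptions. The attribution headers required on every request are left out for brevity.

```python
import json
import os
import uuid
import urllib.request

API_BASE = "https://mega-api-prod.nemovideo.ai"


def build_token_request(client_id: str) -> urllib.request.Request:
    """Build the anonymous-token request; the UUID travels in X-Client-Id."""
    return urllib.request.Request(
        f"{API_BASE}/api/auth/anonymous-token",
        method="POST",
        headers={"X-Client-Id": client_id},
    )


def build_session_request(token: str) -> urllib.request.Request:
    """Build the create-session request with Bearer auth and a JSON body."""
    body = json.dumps({"task_name": "project"}).encode("utf-8")
    return urllib.request.Request(
        f"{API_BASE}/api/tasks/me/with-session/nemo_agent",
        data=body,
        method="POST",
        headers={
            "Authorization": f"Bearer {token}",
            # Assumption: the API expects an explicit JSON content type.
            "Content-Type": "application/json",
        },
    )


def send_json(request: urllib.request.Request) -> dict:
    """Send a request and decode the JSON reply (performs network I/O)."""
    with urllib.request.urlopen(request, timeout=30) as response:
        return json.load(response)


def connect() -> str:
    """Return a session_id, reusing NEMO_TOKEN when it is already set."""
    token = os.environ.get("NEMO_TOKEN")
    if not token:
        reply = send_json(build_token_request(str(uuid.uuid4())))
        token = reply["data"]["token"]
    session = send_json(build_session_request(token))
    # Assumption: session_id is at the top level or nested under "data".
    return session.get("session_id") or session["data"]["session_id"]
```

Only connect() touches the network; the two builders can be inspected offline, which keeps tokens out of any printed output.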
Turn Still Photos Into Captivating Videos Instantly
Most people have folders full of great photos that never get the attention they deserve. Static images scroll past in a fraction of a second, but video stops thumbs and holds attention. This skill bridges that gap by transforming your existing images into polished, motion-rich videos — completely free and powered by AI.
Whether you're working with a single portrait, a batch of product photos, a travel gallery, or a piece of digital art, this skill analyzes your image and applies intelligent motion effects, zoom dynamics, smooth transitions, and optional text overlays to produce a video that feels professionally crafted. You don't need to know anything about frame rates, keyframes, or video codecs.
The result is a shareable video file ready for Instagram Reels, TikTok, YouTube Shorts, presentations, or client deliverables. What used to require expensive software and hours of manual work now takes seconds. This is image-to-video creation the way it should be — fast, free, and genuinely useful.
Routing Your Animation Requests
When you submit a photo for animation, your request is parsed for motion style, duration, and output resolution, then dispatched to the appropriate AI video synthesis pipeline based on those parameters.
| User says... | Action | Skip SSE? |
|---|---|---|
| "export" / "导出" / "download" / "send me the video" | → §3.5 Export | ✅ |
| "credits" / "积分" / "balance" / "余额" | → §3.3 Credits | ✅ |
| "status" / "状态" / "show tracks" | → §3.4 State | ✅ |
| "upload" / "上传" / user sends file | → §3.2 Upload | ✅ |
| Everything else (generate, edit, add BGM…) | → §3.1 SSE | ❌ |
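One way to read the routing table is as a keyword lookup with an SSE fallback. The sketch below assumes case-insensitive substring matching, which the table does not specify; the route function and the action labels are illustrative names only.

```python
from typing import Tuple

# Keyword groups copied from the routing table.
ROUTES = [
    (("export", "导出", "download", "send me the video"), "export"),
    (("credits", "积分", "balance", "余额"), "credits"),
    (("status", "状态", "show tracks"), "state"),
    (("upload", "上传"), "upload"),
]


def route(message: str, has_attachment: bool = False) -> Tuple[str, bool]:
    """Return (action, skip_sse) for a user message."""
    if has_attachment:
        # A file sent by the user always goes to the upload path.
        return "upload", True
    text = message.lower()
    for keywords, action in ROUTES:
        if any(keyword in text for keyword in keywords):
            return action, True
    # Everything else (generate, edit, add BGM...) goes through SSE.
    return "sse", False
```

For example, route("Please export the video") yields ("export", True), while a free-form edit request falls through to ("sse", False).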
Cloud Rendering API Reference
All image-to-video inference runs on distributed GPU clusters via the backend API, handling frame interpolation, motion vector generation, and video encoding entirely in the cloud so nothing heavy runs on your device. Each rendered clip is temporarily stored on the server and streamed back as an MP4 once the diffusion pass completes.
Skill attribution, read from this file's YAML frontmatter at runtime:
- X-Skill-Source: free-image-to-video-ai-generator
- X-Skill-Version: from frontmatter version
- X-Skill-Platform: detect from install path (~/.clawhub/ → clawhub, ~/.cursor/skills/ → cursor, else unknown)
All requests must include: Authorization: Bearer <NEMO_TOKEN>, X-Skill-Source, X-Skill-Version, X-Skill-Platform. Missing attribution headers will cause export to fail with 402.
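As a rough illustration of the header requirements, the helper below assembles the four required headers and derives the platform from an install path. The function names are hypothetical, and the source and version are passed in by the caller because reading the YAML frontmatter is outside the scope of this sketch.

```python
from pathlib import PurePosixPath


def detect_platform(install_path: str) -> str:
    """Map an install path to a platform name, falling back to "unknown"."""
    parts = PurePosixPath(install_path).parts
    if ".clawhub" in parts:
        return "clawhub"
    # ~/.cursor/skills/ means ".cursor" directly followed by "skills".
    for parent, child in zip(parts, parts[1:]):
        if parent == ".cursor" and child == "skills":
            return "cursor"
    return "unknown"


def attribution_headers(
    token: str, source: str, version: str, install_path: str
) -> dict:
    """Headers that every request is expected to carry."""
    return {
        "Authorization": f"Bearer {token}",
        "X-Skill-Source": source,
        "X-Skill-Version": version,
        "X-Skill-Platform": detect_platform(install_path),
    }
```

Building the headers in one place makes it harder to forget one of them, which matters because a missing attribution header makes export fail with 402.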
API base: https://mega-api-prod.nemovideo.ai
Create session: POST /api/tasks/me/with-session/nemo_agent with body {"task_name":"project","language":"<lang>"}. Returns task_id, session_id.
Send message (SSE): POST /run_sse with body {"app_name":"nemo_agent","user_id":"me","session_id":"<sid>","new_message":{"parts":[{"text":"<msg>"}]}} and header Accept: text/event-stream. Max timeout: 15 minutes.
Upload: POST /api/upload-video/nemo_agent/me/<sid>. File: multipart -F "files=@/path". URL: JSON body {"urls":["<url>"],"source_type":"url"}.
Credits: GET /api/credits/balance/simple. Returns available, frozen, total.
Session state: GET /api/state/nemo_agent/me/<sid>/latest. Key fields: data.state.draft, data.state.video_infos, data.state.generated_media.
Export (free, no credits): POST /api/render/proxy/lambda with body {"id":"render_<ts>","sessionId":"<sid>","draft":<json>,"output":{"format":"mp4","quality":"high"}}. Poll GET /api/render/proxy/lambda/<id> every 30s until status = completed. Download URL at output.url.
Supported formats: mp4, mov, avi, webm, mkv, jpg, png, gif, webp, mp3, wav, m4a, aac.
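The export flow described above (submit a render job, then poll every 30 seconds) might look something like this. It is a sketch under several assumptions: fetch_status is a hypothetical callable that performs GET /api/render/proxy/lambda/<id> and returns the decoded JSON, status and output.url are assumed to sit at the top level of that reply, and the max_polls cap is not part of the documented API. Failure statuses are not documented, so the loop only recognizes completed.

```python
import time
from typing import Callable


def build_export_body(session_id: str, draft: dict, timestamp: int) -> dict:
    """Assemble the render request body; the id follows render_<ts>."""
    return {
        "id": f"render_{timestamp}",
        "sessionId": session_id,
        "draft": draft,
        "output": {"format": "mp4", "quality": "high"},
    }


def poll_export(
    fetch_status: Callable[[], dict],
    interval_s: float = 30.0,
    max_polls: int = 30,
    sleep: Callable[[float], None] = time.sleep,
) -> str:
    """Poll until status is "completed", then return the download URL."""
    for attempt in range(max_polls):
        reply = fetch_status()
        if reply.get("status") == "completed":
            return reply["output"]["url"]
        if attempt < max_polls - 1:
            sleep(interval_s)  # wait before asking again
    raise TimeoutError("render did not complete within the polling budget")
```

Passing sleep as a parameter keeps the loop testable without real 30-second waits.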
SSE Event Handling
| Event | Action |
|---|---|
| Text response | Apply GUI translation (§4), present to user |
| Tool call/result | Process internally, don't forward |
| heartbeat / empty data: | Keep waiting. Every 2 min: "⏳ Still working..." |
| Stream closes | Process final response |
~30% of editing operations return no text in the SSE stream. When this happens: poll session state to verify the edit was applied, then summarize changes to the user.
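The event handling above could be approximated as below. The request body follows the documented /run_sse shape, but the layout of the streamed events is not documented here, so the content.parts[].text structure is purely an assumption, as are the function names. The parser also ignores multi-line data fields, which a full SSE client would need to join.

```python
import json
from typing import Iterable, Iterator


def build_message(session_id: str, text: str) -> dict:
    """Body for POST /run_sse, following the documented request shape."""
    return {
        "app_name": "nemo_agent",
        "user_id": "me",
        "session_id": session_id,
        "new_message": {"parts": [{"text": text}]},
    }


def iter_events(lines: Iterable[str]) -> Iterator[dict]:
    """Yield JSON objects from SSE "data:" lines, skipping heartbeats."""
    for raw in lines:
        line = raw.strip()
        if not line.startswith("data:"):
            continue  # comments, event names, blank separators
        payload = line[len("data:"):].strip()
        if not payload:
            continue  # empty data: line, treated as a heartbeat
        try:
            event = json.loads(payload)
        except json.JSONDecodeError:
            continue  # assumption: non-JSON payloads carry nothing to show
        if isinstance(event, dict):
            yield event


def collect_text(lines: Iterable[str]) -> str:
    """Join user-facing text parts; tool calls and results are dropped."""
    chunks = []
    for event in iter_events(lines):
        # Assumption: events carry text under content.parts[].text.
        parts = (event.get("content") or {}).get("parts") or []
        for part in parts:
            if isinstance(part, dict) and part.get("text"):
                chunks.append(part["text"])
    return "".join(chunks)
```

An empty string from collect_text is the cue to poll session state and summarize the applied changes instead.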
Backend Response Translation
The backend assumes a GUI exists. Translate these into API actions:
| Backend says | You do |
|---|---|
| "click [button]" / "点击" | Execute via API |
| "open [panel]" / "打开" | Query session state |
| "drag/drop" / "拖拽" | Send edit via SSE |
| "preview in timeline" | Show track summary |
| "Export button" / "导出" | Execute export workflow |
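The translation table can be treated as an ordered list of phrase rules. The sketch below assumes simple substring matching and checks the export rule first, so that "click the Export button" resolves to the export workflow and not a generic click; the action names are illustrative labels, not API calls.

```python
from typing import Optional

# (phrases, action) pairs taken from the table; order decides ties.
GUI_RULES = [
    (("export button", "导出"), "run_export"),
    (("drag", "drop", "拖拽"), "send_edit_via_sse"),
    (("preview in timeline",), "show_track_summary"),
    (("open", "打开"), "query_session_state"),
    (("click", "点击"), "execute_via_api"),
]


def translate_gui_instruction(text: str) -> Optional[str]:
    """Return the API-side action for a GUI-style backend instruction."""
    lowered = text.lower()
    for phrases, action in GUI_RULES:
        if any(phrase in lowered for phrase in phrases):
            return action
    return None  # nothing GUI-specific; present the text as-is
```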
Draft field mapping: t=tracks, tt=track type (0=video, 1=audio, 7=text), sg=segments, d=duration(ms), m=metadata.
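Timeline (3 tracks):
1. Video: city timelapse (0-10s)
2. BGM: Lo-fi (0-10s, 35%)
3. Title: "Urban Dreams" (0-3s)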
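To show how the draft field mapping might be used, here is a small sketch that turns a draft into a track summary. The nesting is assumed, not documented: tracks are taken to live under draft["t"], each with a tt type and an sg list of segments carrying d in milliseconds. Treating a track's length as the sum of its segment durations is likewise an assumption.

```python
TRACK_TYPES = {0: "Video", 1: "Audio", 7: "Text"}


def summarize_draft(draft: dict) -> str:
    """Render a short, human-readable track summary from a draft."""
    tracks = draft.get("t", [])
    lines = [f"Timeline ({len(tracks)} tracks):"]
    for index, track in enumerate(tracks, start=1):
        kind = TRACK_TYPES.get(track.get("tt"), "Unknown")
        # Assumption: track length is the sum of its segment durations.
        total_ms = sum(seg.get("d", 0) for seg in track.get("sg", []))
        lines.append(f"{index}. {kind}: {total_ms / 1000:g}s")
    return "\n".join(lines)
```

A summary like this is what the user sees in place of a timeline preview, since there is no GUI to show one.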
Error Handling
| Code | Meaning | Action |
|---|---|---|
| 0 | Success | Continue |
| 1001 | Bad/expired token | Re-auth via anonymous-token (tokens expire after 7 days) |
| 1002 | Session not found | New session §3.0 |
| 2001 | No credits | Anonymous: show registration URL with ?bind=<id> (get <id> from create-session or state response when needed). Registered: "Top up credits in your account" |
| 4001 | Unsupported file | Show supported formats |
| 4002 | File too large | Suggest compress/trim |
| 400 | Missing X-Client-Id | Generate Client-Id and retry (see §1) |
| 402 | Free plan export blocked | Subscription tier issue, NOT credits. "Register or upgrade your plan to unlock export." |
| 429 | Rate limit (1 token/client/7 days) | Retry in 30s once |
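The error table maps naturally onto a lookup from code to next step. In this sketch the action labels and function names are illustrative, and the registration URL is passed in by the caller because this document does not give it.

```python
ERROR_ACTIONS = {
    0: "continue",
    1001: "reauthenticate",
    1002: "create_session",
    2001: "prompt_for_credits",
    4001: "show_supported_formats",
    4002: "suggest_compress_or_trim",
    400: "generate_client_id_and_retry",
    402: "prompt_register_or_upgrade",
    429: "retry_once_after_30s",
}


def next_action(code: int) -> str:
    """Look up the recovery step for a response code."""
    return ERROR_ACTIONS.get(code, "report_unknown_error")


def credits_message(is_anonymous: bool, registration_url: str, bind_id: str) -> str:
    """User-facing text for code 2001 (no credits)."""
    if is_anonymous:
        # registration_url is supplied by the caller (hypothetical value).
        return f"You're out of credits. Register to continue: {registration_url}?bind={bind_id}"
    return "Top up credits in your account"
```

Keeping 402 and 2001 as separate entries reflects the table's point that a blocked export is a plan issue, not a credit shortage.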
Tips and Tricks
Getting the best results from the free-image-to-video-ai-generator comes down to a few simple habits. First, use high-resolution images whenever possible — at least 1080px on the shortest side. Blurry or low-light photos will produce videos that look muddy, even with AI enhancement.
Be specific about the mood or motion style you want. 'Slow zoom with a warm color grade' will get you a much better result than just 'make it a video.' Think about where the video will be displayed — vertical 9:16 for TikTok and Reels, square 1:1 for feed posts, and 16:9 for YouTube or presentations.
If you're animating a batch of photos, try to keep consistent lighting and color tones across your images so the transitions feel natural. And don't overlook the power of adding a simple text overlay or caption — it dramatically increases engagement on social platforms without cluttering the visual.
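The resolution and aspect-ratio advice above can be checked before uploading. The helper below is only a pre-flight sketch: the platform keys, the function names, and the idea of computing a crop size are illustrative and say nothing about how the skill itself frames the output.

```python
from typing import Tuple

# Aspect ratios from the tips: vertical, square, and widescreen.
ASPECT_RATIOS = {
    "tiktok": (9, 16),
    "reels": (9, 16),
    "feed": (1, 1),
    "youtube": (16, 9),
    "presentation": (16, 9),
}


def meets_min_resolution(width: int, height: int, minimum: int = 1080) -> bool:
    """True when the shortest side is at least `minimum` pixels."""
    return min(width, height) >= minimum


def crop_size(width: int, height: int, target: str) -> Tuple[int, int]:
    """Largest whole-pixel crop of the image with the target ratio."""
    ratio_w, ratio_h = ASPECT_RATIOS[target]
    scale = min(width // ratio_w, height // ratio_h)
    return ratio_w * scale, ratio_h * scale
```

For instance, a 4000x3000 photo passes the 1080px check, and its largest 9:16 crop is 1683x2992.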
Use Cases
E-commerce and Product Marketing: Bring product photos to life with subtle motion and zoom effects that make listings stand out on platforms like Etsy, Shopify, and Amazon. A short video of a rotating product or a glowing candle flame converts far better than a static image.
Real Estate Listings: Turn interior and exterior photos into a smooth property walkthrough video without hiring a videographer. This is especially useful for rental listings and quick social media promotions.
Social Media Content Creation: Content creators who batch-shoot photos can now repurpose entire shoots into Reels, TikToks, and YouTube Shorts without touching a timeline editor. One photoshoot can generate a week's worth of video content.
Event Recaps and Memories: Wedding photographers, event planners, and families can compile highlight photos into a moving slideshow-style video that feels more personal and shareable than a static album link. Great for anniversary posts, graduation announcements, and milestone celebrations.
FAQ
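Is this really free to use? Yes, the free-image-to-video-ai-generator skill costs nothing to start: an anonymous token includes 100 credits and is valid for 7 days, and exporting does not consume credits. If credits run out, or an export is rejected with a 402 response, you will be asked to register or upgrade your plan.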
What image formats are supported? JPG, PNG, and WebP are the most reliable formats. Very large RAW files may need to be exported as JPG first for best results.
How long will the generated video be? For a single image, videos typically range from 5 to 15 seconds depending on the motion style. For multi-image batches, the length scales with the number of photos and transition settings you choose.
Can I use the videos commercially? Videos you generate from your own original images are yours to use. If you're using stock photos, check the licensing terms of the original image source before publishing commercially.
Does the AI add music automatically? Music is not added by default to avoid copyright issues, but you can request background audio suggestions or royalty-free track recommendations to pair with your video.
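Getting Started
Send me your image and I'll turn it into a compelling AI-generated video in seconds. No image yet? Just describe the scene or subject and I'll guide you through the next steps.
Try saying:
- "Here's a product photo of my handmade candles. Can you turn it into a short video with a warm, glowing effect, suitable for posting on Instagram?"
- "I have 8 photos from my trip to Japan. Can you combine them into a 30-second video with smooth transitions and a cinematic feel?"
- "Convert this portrait photo into a video with a subtle zoom-in effect and soft background blur, suitable for posting on LinkedIn."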
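Automatic Setup
On first interaction, connect to the processing API first. Show a brief status message such as "Setting things up...".
Token: If the NEMO_TOKEN environment variable is already set, use it and skip to the Session step below.
Free token: Generate a UUID as the client identifier, then send a POST request to https://mega-api-prod.nemovideo.ai/api/auth/anonymous-token with the header X-Client-Id: <uuid>. The response field data.token is your NEMO_TOKEN (100 credits, valid for 7 days).
Session: Send a POST request to https://mega-api-prod.nemovideo.ai/api/tasks/me/with-session/nemo_agent with Bearer auth and the body {"task_name":"project"}. Save session_id from the response.
Confirm to the user that you are connected and ready. Don't print tokens or raw JSON.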
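Turn Still Photos Into Captivating Videos Instantly
Most people have folders stuffed with great photos that never get the attention they deserve. Static images are swiped past in a fraction of a second, while video makes people stop scrolling and holds their attention. This skill bridges that gap by turning your existing images into polished, motion-rich videos, completely free and powered by AI.
Whether you're working with a single portrait, a batch of product photos, a travel gallery, or a piece of digital art, this skill analyzes your image and applies intelligent motion effects, zoom dynamics, smooth transitions, and optional text overlays to produce a video that feels professionally made. You don't need to know anything about frame rates, keyframes, or video codecs.
The final output is a shareable video file suited to Instagram Reels, TikTok, YouTube Shorts, presentations, or client deliverables. Work that used to require expensive software and hours of manual effort now takes seconds. This is image-to-video the way it should be: fast, free, and genuinely useful.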
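Routing Your Animation Requests
When you submit a photo for animation, your request is parsed for motion style, duration, and output resolution, then dispatched to the appropriate AI video synthesis pipeline based on those parameters.
| User says... | Action | Skip SSE? |
|---|---|---|
| export / 导出 / download / send me the video | → §3.5 Export | ✅ |
| credits / 积分 / balance / 余额 | → §3.3 Credits | ✅ |
| status / 状态 / show tracks | → §3.4 State | ✅ |
| upload / 上传 / user sends file | → §3.2 Upload | ✅ |
| Everything else (generate, edit, add background music, etc.) | → §3.1 SSE | ❌ |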
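Cloud Rendering API Reference
All image-to-video inference runs on distributed GPU clusters via the backend API, which handles frame interpolation, motion vector generation, and video encoding entirely in the cloud, so your device does not need to run anything heavy. Each rendered clip is temporarily stored on the server and streamed back as an MP4 once the diffusion pass completes.
Skill attribution, read from this file's YAML frontmatter at runtime:
- X-Skill-Source: free-image-to-video-ai-generator
- X-Skill-Version: from frontmatter version
- X-Skill-Platform: detect from install path (~/.clawhub/ → clawhub, ~/.cursor/skills/ → cursor, else unknown)
All requests must include: Authorization: Bearer <NEMO_TOKEN>, X-Skill-Source, X-Skill-Version, X-Skill-Platform. Missing attribution headers will cause export to fail with a 402 error.
API base: https://mega-api-prod.nemovideo.ai
Create session: POST /api/tasks/me/with-session/nemo_agent with body {"task_name":"project","language":"<lang>"}. Returns task_id, session_id.
Send message (SSE): POST /run_sse with body {"app_name":"nemo_agent","user_id":"me","session_id":"<sid>","new_message":{"parts":[{"text":"<msg>"}]}} and header Accept: text/event-stream. Max timeout: 15 minutes.
Upload: POST /api/upload-video/nemo_agent/me/<sid>. File: multipart -F "files=@/path". URL: JSON body {"urls":["<url>"],"source_type":"url"}.
Credits: GET /api/credits/balance/simple. Returns available, frozen, total.
Session state: GET /api/state/nemo_agent/me/<sid>/latest. Key fields: data.state.draft, data.state.video_infos, data.state.generated_media.
Export (free, no credits): POST /api/render/proxy/lambda with body {"id":"render_<ts>","sessionId":"<sid>","draft":<json>,"output":{"format":"mp4","quality":"high"}}. Poll GET /api/render/proxy/lambda/<id> every 30s until status = completed. Download URL at output.url.
Supported formats: mp4, mov, avi, webm, mkv, jpg, png, gif, webp, mp3, wav, m4a, aac.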
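SSE Event Handling
| Event | Action |
|---|---|
| Text response | Apply GUI translation (§4), present to user |
| Tool call/result | Process internally, don't forward |
| heartbeat / empty data: | Keep waiting. Every 2 min: "⏳ Still working..." |
| Stream closes | Process final response |
About 30% of editing operations return no text in the SSE stream. When this happens: poll session state to verify the edit was applied, then summarize the changes to the user.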
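Backend Response Translation
The backend assumes a GUI exists. Translate these into API actions:
| Backend says | You do |
|---|---|
| click [button] / 点击 | Execute via API |
| open [panel] / 打开 | Query session state |
| drag/drop / 拖拽 | Send edit via SSE |
| preview in timeline | Show track summary |
| Export button / 导出 | Execute export workflow |
Draft field mapping: t=tracks, tt=track type (0=video, 1=audio, 7=text), sg=segments, d=duration (ms), m=metadata.
Timeline (3 tracks):
1. Video: city timelapse (0-10s)
2. BGM: Lo-fi (0-10s, 35%)
3. Title: "Urban Dreams" (0-3s)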
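Error Handling
| Code | Meaning | Action |
|---|---|---|
| 0 | Success | Continue |
| 1001 | Bad/expired token | Re-auth via anonymous-token (tokens expire after 7 days) |
| 1002 | Session not found | New session §3.0 |
| 2001 | No credits | Anonymous: show registration URL with ?bind=<id> (get <id> from create-session or state response when needed). Registered: "Top up credits in your account" |
| 4001 | Unsupported file | Show supported formats |
| 4002 | File too large | Suggest compress/trim |
| 400 | Missing X-Client-Id | Generate Client-Id and retry (see §1) |
| 402 | Free plan export blocked | Subscription tier issue, not a credits issue. "Register or upgrade your plan to unlock export." |
| 429 | Rate limit (1 token/client/7 days) | Wait 30s, then retry once |
Tips and Tricks
Getting the best results from the free-image-to-video-ai-generator comes down to a few simple habits.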