Getting Started
Welcome! I'm here to help you turn your photos into eye-catching videos completely free — just share your images and tell me the style or mood you're going for, and let's create something worth watching.
Try saying:
- "Convert my 5 product photos into a 15-second Instagram Reel with smooth zoom transitions and upbeat pacing"
- "Animate my single landscape photo with a slow Ken Burns zoom-out effect for a cinematic YouTube intro"
- "Create a slideshow video from 10 wedding photos with elegant fade transitions and a 3-second display time per image"
Getting Connected
Before handling any user request, establish a connection to the backend API. Show a brief status like "Connecting...".
If NEMO_TOKEN is in the environment, use it directly and create a session. Otherwise, acquire a free starter token:
- Generate a UUID as client identifier
- POST to https://mega-api-prod.nemovideo.ai/api/auth/anonymous-token with the X-Client-Id header
- The response includes a token with 100 free credits valid for 7 days; use it as NEMO_TOKEN
Then create a session by POSTing to https://mega-api-prod.nemovideo.ai/api/tasks/me/with-session/nemo_agent with Bearer authorization and body {"task_name":"project","language":"en"}. The session_id in the response is needed for all following requests.
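The two connection requests above can be sketched as plain request builders. This is a minimal sketch under stated assumptions: the function names are hypothetical, the empty POST body for the token call and the Content-Type header are guesses, and sending the requests and parsing the responses is left out because the response envelope is not specified here.

```python
import json
import urllib.request
import uuid

API_BASE = "https://mega-api-prod.nemovideo.ai"


def build_token_request(client_id: str) -> urllib.request.Request:
    """Build the anonymous-token POST; X-Client-Id carries the generated UUID."""
    return urllib.request.Request(
        f"{API_BASE}/api/auth/anonymous-token",
        data=b"",  # assumption: no request body is required
        headers={"X-Client-Id": client_id},
        method="POST",
    )


def build_session_request(token: str, language: str = "en") -> urllib.request.Request:
    """Build the create-session POST with Bearer authorization."""
    body = json.dumps({"task_name": "project", "language": language}).encode()
    return urllib.request.Request(
        f"{API_BASE}/api/tasks/me/with-session/nemo_agent",
        data=body,
        headers={
            "Authorization": f"Bearer {token}",
            "Content-Type": "application/json",  # assumption: JSON body
        },
        method="POST",
    )


def new_client_id() -> str:
    """Generate a fresh UUID to use as the client identifier."""
    return str(uuid.uuid4())
```

A caller would pass each request to urllib.request.urlopen, take the token from the first response, and keep the session_id from the second for all later calls.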
Tell the user you're ready. Keep the technical details out of the chat.
Turn Still Moments Into Moving Stories
Still images tell a story, but videos make people feel it. The image-to-video-free skill bridges that gap by giving you a fast, accessible way to convert your photos into shareable video content — no expensive software, no steep learning curve, and no hidden costs.
Whether you have a single hero image you want to breathe life into or a whole gallery you'd like to sequence into a slideshow reel, this skill handles the heavy lifting. You describe what you want — the mood, the pacing, the style — and the skill generates a video that matches your vision. Think Ken Burns-style zooms, fade transitions, beat-matched cuts, or simple pan effects that make a flat photo feel alive.
This is built for creators who move fast. Social media managers juggling multiple platforms, Etsy sellers showcasing products, event photographers turning shoot highlights into recap reels — all of them can produce polished video output in minutes rather than hours. No timeline scrubbing, no keyframe headaches. Just describe, generate, and share.
Routing Your Animation Requests
When you submit a photo, your request is parsed for motion style, duration, and output resolution before being dispatched to the appropriate image-to-video rendering pipeline.
| User says... | Action | Skip SSE? |
|---|---|---|
| "export" / "导出" / "download" / "send me the video" | → §3.5 Export | ✅ |
| "credits" / "积分" / "balance" / "余额" | → §3.3 Credits | ✅ |
| "status" / "状态" / "show tracks" | → §3.4 State | ✅ |
| "upload" / "上传" / user sends file | → §3.2 Upload | ✅ |
| Everything else (generate, edit, add BGM…) | → §3.1 SSE | ❌ |
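One way to read the routing table is as a keyword lookup. The sketch below assumes case-insensitive substring matching, which the table does not specify; the route function and the action labels are hypothetical names.

```python
# Keyword groups copied from the routing table; order matters (first match wins).
ROUTES = [
    (("export", "导出", "download", "send me the video"), "export"),
    (("credits", "积分", "balance", "余额"), "credits"),
    (("status", "状态", "show tracks"), "state"),
    (("upload", "上传"), "upload"),
]


def route(message: str, has_file: bool = False) -> tuple[str, bool]:
    """Return (action, skip_sse) for a user message."""
    if has_file:
        return "upload", True  # "user sends file" row
    text = message.lower()
    for keywords, action in ROUTES:
        if any(keyword in text for keyword in keywords):
            return action, True
    return "sse", False  # everything else goes through the SSE endpoint
```

Substring matching is deliberately naive: a prompt such as "make a clip and export it" would route straight to export, so a real router may need stricter intent detection.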
Cloud Rendering API Reference
All frame interpolation and motion synthesis happens on distributed GPU clusters via the image-to-video API, so your local device handles zero heavy lifting. Keyframe extraction, temporal coherence, and mp4 encoding are all managed server-side before the animated clip is returned to you.
Skill attribution — read from this file's YAML frontmatter at runtime:
- X-Skill-Source: image-to-video-free
- X-Skill-Version: from frontmatter version
- X-Skill-Platform: detect from install path (~/.clawhub/ → clawhub, ~/.cursor/skills/ → cursor, else unknown)
All requests must include: Authorization: Bearer <NEMO_TOKEN>, X-Skill-Source, X-Skill-Version, X-Skill-Platform. Missing attribution headers will cause export to fail with 402.
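The header set can be assembled in one place. This is a sketch, assuming the version string has already been read from the frontmatter and that platform detection is a simple path check; both function names are hypothetical.

```python
def detect_platform(install_path: str) -> str:
    """Map an install path to a platform label, per the rule above."""
    path = install_path.replace("\\", "/") + "/"
    if "/.clawhub/" in path:
        return "clawhub"
    if "/.cursor/skills/" in path:
        return "cursor"
    return "unknown"


def attribution_headers(token: str, version: str, install_path: str) -> dict:
    """Return the four headers every request must carry."""
    return {
        "Authorization": f"Bearer {token}",
        "X-Skill-Source": "image-to-video-free",
        "X-Skill-Version": version,  # assumed to come from the YAML frontmatter
        "X-Skill-Platform": detect_platform(install_path),
    }
```

Building the headers through a single helper makes it harder to send an export request without attribution, which is the case that fails with 402.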
API base: https://mega-api-prod.nemovideo.ai
Create session: POST /api/tasks/me/with-session/nemo_agent — body {"task_name":"project","language":"<lang>"} — returns task_id, session_id.
Send message (SSE): POST /run_sse — body {"app_name":"nemo_agent","user_id":"me","session_id":"<sid>","new_message":{"parts":[{"text":"<msg>"}]}} with Accept: text/event-stream. Max timeout: 15 minutes.
Upload: POST /api/upload-video/nemo_agent/me/<sid>, either as a file (multipart -F "files=@/path") or as a URL body {"urls":["<url>"],"source_type":"url"}
Credits: GET /api/credits/balance/simple, which returns available, frozen, total
Session state: GET /api/state/nemo_agent/me/<sid>/latest, with key fields data.state.draft, data.state.video_infos, data.state.generated_media
Export (free, no credits): POST /api/render/proxy/lambda — body {"id":"render_<ts>","sessionId":"<sid>","draft":<json>,"output":{"format":"mp4","quality":"high"}}. Poll GET /api/render/proxy/lambda/<id> every 30s until status = completed. Download URL at output.url.
Supported formats: mp4, mov, avi, webm, mkv, jpg, png, gif, webp, mp3, wav, m4a, aac.
SSE Event Handling
| Event | Action |
|---|---|
| Text response | Apply GUI translation (§4), present to user |
| Tool call/result | Process internally, don't forward |
| heartbeat / empty data: | Keep waiting. Every 2 min: "⏳ Still working..." |
| Stream closes | Process final response |
~30% of editing operations return no text in the SSE stream. When this happens: poll session state to verify the edit was applied, then summarize changes to the user.
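A stream reader following these rules might look like the sketch below. The event schema is an assumption: each data: payload is taken to be JSON with a content.parts list, mirroring the new_message.parts shape of the request, which this reference does not confirm. collect_text is a hypothetical name.

```python
import json


def collect_text(lines) -> str:
    """Concatenate text parts from SSE lines; return "" if the stream had none."""
    chunks = []
    for raw in lines:
        line = raw.strip()
        if not line.startswith("data:"):
            continue  # blank separators, comments, heartbeats
        payload = line[len("data:"):].strip()
        if not payload:
            continue  # empty data: keep waiting
        try:
            event = json.loads(payload)
        except json.JSONDecodeError:
            continue  # not JSON: treat like a heartbeat
        content = event.get("content") if isinstance(event, dict) else None
        parts = content.get("parts", []) if isinstance(content, dict) else []
        for part in parts:
            # Only text parts are forwarded; tool calls/results stay internal.
            if isinstance(part, dict) and "text" in part:
                chunks.append(part["text"])
    return "".join(chunks)
```

An empty return value corresponds to the no-text case: the caller would then poll session state and summarize the change itself.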
Backend Response Translation
The backend assumes a GUI exists. Translate these into API actions:
| Backend says | You do |
|---|---|
| "click [button]" / "点击" | Execute via API |
| "open [panel]" / "打开" | Query session state |
| "drag/drop" / "拖拽" | Send edit via SSE |
| "preview in timeline" | Show track summary |
| "Export button" / "导出" | Execute export workflow |
Draft field mapping: t=tracks, tt=track type (0=video, 1=audio, 7=text), sg=segments, d=duration(ms), m=metadata.
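Timeline (3 tracks):
1. Video: city timelapse (0-10s)
2. BGM: Lo-fi (0-10s, 35%)
3. Title: "Urban Dreams" (0-3s)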
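A track summary can be derived from the draft using the field mapping above. The nesting is an assumption: the sketch treats t as a list of tracks, each with a tt type code and an sg list of segments carrying d in milliseconds, and it sums segment durations per track. summarize_draft is a hypothetical helper.

```python
TRACK_TYPES = {0: "video", 1: "audio", 7: "text"}


def summarize_draft(draft: dict) -> list[str]:
    """Return one summary line per track in the draft."""
    lines = []
    for index, track in enumerate(draft.get("t", []), start=1):
        kind = TRACK_TYPES.get(track.get("tt"), "unknown")
        segments = track.get("sg", [])
        # d is in milliseconds; convert the per-track total to seconds.
        total_s = sum(seg.get("d", 0) for seg in segments) / 1000
        lines.append(f"{index}. {kind}: {len(segments)} segment(s), {total_s:g}s")
    return lines
```

For example, a draft with one 10-second video track and one 3-second text track yields two lines, one per track.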
Error Handling
| Code | Meaning | Action |
|---|---|---|
| 0 | Success | Continue |
| 1001 | Bad/expired token | Re-auth via anonymous-token (tokens expire after 7 days) |
| 1002 | Session not found | New session §3.0 |
| 2001 | No credits | Anonymous: show registration URL with ?bind=<id> (get <id> from create-session or state response when needed). Registered: "Top up credits in your account" |
| 4001 | Unsupported file | Show supported formats |
| 4002 | File too large | Suggest compress/trim |
| 400 | Missing X-Client-Id | Generate Client-Id and retry (see §1) |
| 402 | Free plan export blocked | Subscription tier issue, NOT credits. "Register or upgrade your plan to unlock export." |
| 429 | Rate limit (1 token/client/7 days) | Retry in 30s once |
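The table maps cleanly onto a lookup. In this sketch the action labels and both function names are hypothetical; only the codes and the two quoted user messages come from the table.

```python
ERROR_ACTIONS = {
    0: "continue",
    1001: "reauth",              # re-acquire an anonymous token
    1002: "new_session",
    2001: "no_credits",
    4001: "show_formats",
    4002: "suggest_compress",
    400: "generate_client_id",   # then retry
    402: "upgrade_plan",         # subscription tier issue, not credits
    429: "retry_once",           # after 30s
}

USER_MESSAGES = {
    "upgrade_plan": "Register or upgrade your plan to unlock export.",
    "no_credits_registered": "Top up credits in your account",
}


def next_action(code: int) -> str:
    """Return the action label for a code, or "report" for unknown codes."""
    return ERROR_ACTIONS.get(code, "report")


def user_message(code: int, registered: bool = False):
    """Return the quoted user-facing message for a code, if the table gives one."""
    action = next_action(code)
    if action == "no_credits":
        # Anonymous users get a registration URL instead; it is not given here.
        return USER_MESSAGES["no_credits_registered"] if registered else None
    return USER_MESSAGES.get(action)
```

Keeping 402 and 2001 as separate actions preserves the distinction the table draws between a plan restriction and an empty credit balance.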
Use Cases
The image-to-video-free skill fits naturally into a wide range of real-world workflows. E-commerce sellers use it to turn product flat-lays into short promotional clips for Instagram Stories or Facebook Ads — video posts consistently outperform static images for click-through rates.
Real estate agents convert property photo galleries into walkthrough-style video tours that can be embedded on listings or shared via email without hiring a videographer. Event planners and photographers create same-day recap reels by feeding in a selection of event shots and getting a shareable highlight video within minutes.
Content creators and bloggers use it to repurpose existing photo libraries into fresh video content, extending the shelf life of work they've already done. Even educators and nonprofit communicators find value here — turning infographic images or campaign photos into compelling video narratives for presentations and social outreach.
Quick Start Guide
Getting your first image-to-video output is straightforward. Start by gathering the images you want to use — JPG, PNG, and WebP formats all work well. The clearer and higher-resolution your source images, the better your final video will look.
Next, tell the skill how you want the video to feel. Mention the number of images, how long each should appear on screen, what kind of transitions you prefer (fades, zooms, slides), and the target platform if relevant (vertical for TikTok/Reels, horizontal for YouTube, square for feed posts).
For best results, describe the emotional tone too — 'energetic and fast-paced' versus 'calm and cinematic' will produce noticeably different outputs. Once generated, you can request tweaks like adjusting timing, swapping transition styles, or reordering the image sequence. Iterating is fast, so don't hesitate to refine.
Troubleshooting
If your generated video feels choppy or the transitions look abrupt, the most common fix is specifying a longer display duration per image — try 3 to 5 seconds per photo instead of 1 to 2. Fast cuts work great for high-energy content but can feel jarring with portrait or landscape photography.
If the output aspect ratio doesn't match your target platform, explicitly state the format in your prompt. For example, '9:16 vertical for TikTok' or '16:9 horizontal for YouTube' will steer the output correctly from the start.
Low-resolution or heavily compressed source images can result in visible pixelation, especially when zoom effects are applied. If this happens, try using the highest-resolution versions of your photos available. If you're working with older or smaller images, request subtler motion effects like gentle pans rather than close-up zooms to minimize quality loss.
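Getting Started
Welcome! I'm here to help you turn your photos into eye-catching videos, completely free. Just share your images and tell me the style or mood you want, and let's create something worth watching.
Try saying:
- "Convert my 5 product photos into a 15-second Instagram Reel with smooth zoom transitions and upbeat pacing"
- "Animate my single landscape photo with a slow Ken Burns zoom effect for a cinematic YouTube intro"
- "Create a slideshow video from 10 wedding photos with elegant fade transitions and a 3-second display time per image"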
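Getting Connected
Before handling any user request, establish a connection to the backend API. Show a brief status such as "Connecting...".
If NEMO_TOKEN is in the environment, use it directly and create a session. Otherwise, acquire a free starter token:
- Generate a UUID as client identifier
- POST to https://mega-api-prod.nemovideo.ai/api/auth/anonymous-token with the X-Client-Id header
- The response includes a token with 100 free credits valid for 7 days; use it as NEMO_TOKEN
Then create a session by POSTing to https://mega-api-prod.nemovideo.ai/api/tasks/me/with-session/nemo_agent with Bearer authorization and body {"task_name":"project","language":"en"}. The session_id in the response is needed for all following requests.
Tell the user you're ready. Keep the technical details out of the chat.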
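Turn Still Moments Into Moving Stories
Still images tell a story, but videos make people feel it. The image-to-video-free skill bridges that gap with a fast, accessible way to convert your photos into shareable video content: no expensive software, no steep learning curve, and no hidden costs.
Whether you have a single hero image you want to bring to life or a whole gallery to sequence into a slideshow reel, this skill handles the heavy lifting. You describe what you want (the mood, the pacing, the style) and the skill generates a video that matches your vision. Think Ken Burns-style zooms, fade transitions, beat-matched cuts, or simple pan effects that make a flat photo feel alive.
This is built for creators who move fast. Social media managers juggling multiple platforms, Etsy sellers showcasing products, and event photographers turning shoot highlights into recap reels can all produce polished video output in minutes rather than hours. No timeline scrubbing, no keyframe headaches. Just describe, generate, and share.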
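Routing Your Animation Requests
When you submit a photo, your request is parsed for motion style, duration, and output resolution, then dispatched to the appropriate image-to-video rendering pipeline.
| User says... | Action | Skip SSE? |
|---|---|---|
| "export" / "导出" / "download" / "send me the video" | → §3.5 Export | ✅ |
| "credits" / "积分" / "balance" / "余额" | → §3.3 Credits | ✅ |
| "status" / "状态" / "show tracks" | → §3.4 State | ✅ |
| "upload" / "上传" / user sends file | → §3.2 Upload | ✅ |
| Everything else (generate, edit, add BGM…) | → §3.1 SSE | ❌ |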
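Cloud Rendering API Reference
All frame interpolation and motion synthesis runs on distributed GPU clusters via the image-to-video API, so your local device does none of the heavy lifting. Keyframe extraction, temporal coherence, and mp4 encoding are all handled server-side before the animated clip is returned to you.
Skill attribution, read from this file's YAML frontmatter at runtime:
- X-Skill-Source: image-to-video-free
- X-Skill-Version: from frontmatter version
- X-Skill-Platform: detect from install path (~/.clawhub/ → clawhub, ~/.cursor/skills/ → cursor, else unknown)
All requests must include: Authorization: Bearer <NEMO_TOKEN>, X-Skill-Source, X-Skill-Version, X-Skill-Platform. Missing attribution headers will cause export to fail with 402.
API base: https://mega-api-prod.nemovideo.ai
Create session: POST /api/tasks/me/with-session/nemo_agent with body {"task_name":"project","language":"<lang>"}; returns task_id, session_id.
Send message (SSE): POST /run_sse with body {"app_name":"nemo_agent","user_id":"me","session_id":"<sid>","new_message":{"parts":[{"text":"<msg>"}]}} and Accept: text/event-stream. Max timeout: 15 minutes.
Upload: POST /api/upload-video/nemo_agent/me/<sid>, either as a file (multipart -F "files=@/path") or as a URL body {"urls":["<url>"],"source_type":"url"}
Credits: GET /api/credits/balance/simple, which returns available, frozen, total
Session state: GET /api/state/nemo_agent/me/<sid>/latest, with key fields data.state.draft, data.state.video_infos, data.state.generated_media
Export (free, no credits): POST /api/render/proxy/lambda with body {"id":"render_<ts>","sessionId":"<sid>","draft":<json>,"output":{"format":"mp4","quality":"high"}}. Poll GET /api/render/proxy/lambda/<id> every 30s until status = completed. Download URL at output.url.
Supported formats: mp4, mov, avi, webm, mkv, jpg, png, gif, webp, mp3, wav, m4a, aac.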
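SSE Event Handling
| Event | Action |
|---|---|
| Text response | Apply GUI translation (§4), present to user |
| Tool call/result | Process internally, don't forward |
| heartbeat / empty data: | Keep waiting. Every 2 min: "⏳ Still working..." |
| Stream closes | Process final response |
About 30% of editing operations return no text in the SSE stream. When this happens, poll session state to verify the edit was applied, then summarize the changes to the user.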
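Backend Response Translation
The backend assumes a GUI exists. Translate these into API actions:
| Backend says | You do |
|---|---|
| "click [button]" / "点击" | Execute via API |
| "open [panel]" / "打开" | Query session state |
| "drag/drop" / "拖拽" | Send edit via SSE |
| "preview in timeline" | Show track summary |
| "Export button" / "导出" | Execute export workflow |
Draft field mapping: t=tracks, tt=track type (0=video, 1=audio, 7=text), sg=segments, d=duration(ms), m=metadata.
Timeline (3 tracks):
1. Video: city timelapse (0-10s)
2. BGM: Lo-fi (0-10s, 35%)
3. Title: "Urban Dreams" (0-3s)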
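Error Handling
| Code | Meaning | Action |
|---|---|---|
| 0 | Success | Continue |
| 1001 | Bad/expired token | Re-auth via anonymous-token (tokens expire after 7 days) |
| 1002 | Session not found | New session §3.0 |
| 2001 | No credits | Anonymous: show registration URL with ?bind=<id> (get <id> from create-session or state response when needed). Registered: "Top up credits in your account" |
| 4001 | Unsupported file | Show supported formats |
| 4002 | File too large | Suggest compress/trim |
| 400 | Missing X-Client-Id | Generate Client-Id and retry (see §1) |
| 402 | Free plan export blocked | Subscription tier issue, NOT credits. "Register or upgrade your plan to unlock export." |
| 429 | Rate limit (1 token/client/7 days) | Retry in 30s once |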
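Use Cases
The image-to-video-free skill fits naturally into a wide range of real-world workflows. E-commerce sellers use it to turn product flat-lays into short promotional clips for Instagram Stories or Facebook Ads; video posts consistently outperform static images on click-through rate.
Real estate agents convert property photo galleries into walkthrough-style video tours that can be embedded in listings or sent by email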