Getting Started
Welcome! Ready to turn your photos into a video that actually gets watched? Share your images and tell me how you'd like your video to look — let's build something great together.
Try saying:
- "Turn these 12 product photos into a 30-second promotional video with smooth fade transitions"
- "Create a travel video slideshow from my vacation photos, showing each image for 3 seconds with a cinematic feel"
- "Make a birthday tribute video using these 20 family photos, ordered chronologically with gentle transitions"
Quick Start Setup
This skill connects to NemoVideo's cloud backend. On first use, set up the connection automatically and let the user know ("Connecting to NemoVideo...").
Token check: Look for NEMO_TOKEN in the environment. If found, skip to session creation. Otherwise:
- Read ~/.config/nemovideo/client_id, or generate a UUID and save it there
- POST https://mega-api-prod.nemovideo.ai/api/auth/anonymous-token with an X-Client-Id header
- Extract data.token from the response; this is your NEMO_TOKEN (100 free credits, 7-day expiry)
Session: POST /api/tasks/me/with-session/nemo_agent at the same host with Bearer auth and body {"task_name":"project"}. Keep the returned session_id for all operations.
Let the user know with a brief "Ready!" when setup is complete. Don't expose tokens or raw API output.
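The setup steps above can be sketched roughly as follows. This is a minimal illustration, not the skill's actual implementation: the helper names (`load_or_create_client_id`, `post_json`, `get_token`, `create_session`) are hypothetical, the attribution headers described later are omitted for brevity, and the exact position of `session_id` in the response is an assumption.

```python
import json
import os
import uuid
from pathlib import Path
from urllib import request

API_BASE = "https://mega-api-prod.nemovideo.ai"
CLIENT_ID_PATH = Path.home() / ".config" / "nemovideo" / "client_id"


def load_or_create_client_id(path=CLIENT_ID_PATH):
    """Return the stored client ID, generating and saving a UUID if absent."""
    if path.exists():
        stored = path.read_text().strip()
        if stored:
            return stored
    client_id = str(uuid.uuid4())
    path.parent.mkdir(parents=True, exist_ok=True)
    path.write_text(client_id)
    return client_id


def post_json(url, headers, body=None):
    """POST a JSON body (or an empty one) and decode the JSON response."""
    data = json.dumps(body).encode() if body is not None else b""
    req = request.Request(url, data=data, method="POST",
                          headers={"Content-Type": "application/json", **headers})
    with request.urlopen(req, timeout=30) as resp:
        return json.load(resp)


def get_token(env=os.environ, post=post_json, client_id_path=CLIENT_ID_PATH):
    """Use NEMO_TOKEN if it is set; otherwise request an anonymous token."""
    token = env.get("NEMO_TOKEN")
    if token:
        return token
    client_id = load_or_create_client_id(client_id_path)
    reply = post(f"{API_BASE}/api/auth/anonymous-token", {"X-Client-Id": client_id})
    return reply["data"]["token"]


def create_session(token, post=post_json):
    """Create a session and return its session_id."""
    reply = post(f"{API_BASE}/api/tasks/me/with-session/nemo_agent",
                 {"Authorization": f"Bearer {token}"}, {"task_name": "project"})
    # Assumption: session_id may sit at the top level or under "data".
    return reply.get("session_id") or reply.get("data", {}).get("session_id")
```

Passing `post` as a parameter keeps the network call swappable, so the flow can be exercised without contacting the backend.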
Turn Your Best Photos Into Videos Worth Watching
Most people have hundreds of photos that never get shared simply because a single image doesn't tell the whole story. The image-to-video-maker skill changes that by stitching your still photos together into a flowing video that captures attention and communicates a narrative — whether it's a travel recap, a product showcase, a real estate walkthrough, or a family memory reel.
With this skill, you stay in control of the creative direction. You can specify how long each image should appear on screen, choose the order photos are displayed, and describe the mood or pacing you're going for. The result is a video that feels intentional and professional, not like an auto-generated slideshow from a decade-old app.
This tool is built for real-world use cases: social media content creators who need quick turnaround, small business owners showcasing products without a video budget, photographers delivering client galleries in a new format, and everyday users who just want to make something memorable. Whatever your reason, image-to-video-maker gets you from a folder of photos to a finished video in minutes.
Routing Your Slideshow Requests
Each request — whether you're uploading photos, setting transition styles, adjusting timing, or exporting your final video — is parsed and routed to the matching NemoVideo endpoint based on the action type detected in your message.
| User says... | Action | Skip SSE? |
|---|---|---|
| "export" / "导出" / "download" / "send me the video" | → §3.5 Export | ✅ |
| "credits" / "积分" / "balance" / "余额" | → §3.3 Credits | ✅ |
| "status" / "状态" / "show tracks" | → §3.4 State | ✅ |
| "upload" / "上传" / user sends file | → §3.2 Upload | ✅ |
| Everything else (generate, edit, add BGM…) | → §3.1 SSE | ❌ |
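One way to picture the routing table is a keyword lookup that falls through to SSE. The sketch below is an assumption about how detection might work, since the source does not say how the action type is recognized. The `route` function and the action names are hypothetical, and plain substring matching will misfire on phrases such as "don't export yet".

```python
# Keyword groups taken from the routing table; checked in table order, first match wins.
ROUTES = [
    ("export", ("export", "导出", "download", "send me the video")),
    ("credits", ("credits", "积分", "balance", "余额")),
    ("state", ("status", "状态", "show tracks")),
    ("upload", ("upload", "上传")),
]


def route(message, has_attachment=False):
    """Return (action, skip_sse) for a user message."""
    if has_attachment:
        # A file sent by the user always goes to the upload endpoint.
        return "upload", True
    text = message.lower()
    for action, keywords in ROUTES:
        if any(keyword in text for keyword in keywords):
            return action, True
    # Everything else (generate, edit, add BGM) goes through the SSE endpoint.
    return "sse", False
```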
NemoVideo API Reference
The NemoVideo backend handles frame sequencing, Ken Burns motion effects, transition rendering, and audio sync to stitch your still images into a polished video slideshow. All processing happens server-side, so output quality and render speed depend on your active NemoVideo plan tier.
Skill attribution — read from this file's YAML frontmatter at runtime:
- X-Skill-Source: image-to-video-maker
- X-Skill-Version: from frontmatter version
- X-Skill-Platform: detect from install path (~/.clawhub/ → clawhub, ~/.cursor/skills/ → cursor, else unknown)
All requests must include: Authorization: Bearer <NEMO_TOKEN>, X-Skill-Source, X-Skill-Version, X-Skill-Platform. Missing attribution headers will cause export to fail with 402.
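As an illustration of the header requirements above, the following sketch assembles the four headers and derives the platform from the install path. The function names are hypothetical, the version is passed in rather than parsed from the frontmatter, and `Path.is_relative_to` needs Python 3.9 or later.

```python
from pathlib import Path


def detect_platform(install_path):
    """Map the skill's install path to a platform name."""
    path = Path(install_path).expanduser()
    if path.is_relative_to(Path.home() / ".clawhub"):
        return "clawhub"
    if path.is_relative_to(Path.home() / ".cursor" / "skills"):
        return "cursor"
    return "unknown"


def build_headers(token, version, install_path):
    """Return the auth and attribution headers every request must carry."""
    return {
        "Authorization": f"Bearer {token}",
        "X-Skill-Source": "image-to-video-maker",
        "X-Skill-Version": version,  # read from the YAML frontmatter by the caller
        "X-Skill-Platform": detect_platform(install_path),
    }
```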
API base: https://mega-api-prod.nemovideo.ai
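Create session: POST /api/tasks/me/with-session/nemo_agent with body {"task_name":"project","language":"<lang>"}; returns task_id and session_id. After creating a session, give the user a link: https://nemovideo.com/workspace/claim?token=<token>&task=<task_id>&session=<session_id>&skill_name=image-to-video-maker&skill_version=1.0.0&skill_source=<platform>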
Send message (SSE): POST /run_sse — body {"app_name":"nemo_agent","user_id":"me","session_id":"<sid>","new_message":{"parts":[{"text":"<msg>"}]}} with Accept: text/event-stream. Max timeout: 15 minutes.
Upload: POST /api/upload-video/nemo_agent/me/<sid> with either a multipart file (-F "files=@/path") or a JSON body of URLs: {"urls":["<url>"],"source_type":"url"}
Credits: GET /api/credits/balance/simple; returns available, frozen, total
Session state: GET /api/state/nemo_agent/me/<sid>/latest; key fields: data.state.draft, data.state.video_infos, data.state.generated_media
Export (free, no credits): POST /api/render/proxy/lambda — body {"id":"render_<ts>","sessionId":"<sid>","draft":<json>,"output":{"format":"mp4","quality":"high"}}. Poll GET /api/render/proxy/lambda/<id> every 30s until status = completed. Download URL at output.url.
Supported formats: mp4, mov, avi, webm, mkv, jpg, png, gif, webp, mp3, wav, m4a, aac.
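The export flow described above (submit a render, then poll every 30 seconds) could look roughly like this. It is a sketch under several assumptions: `post` and `get` are hypothetical caller-supplied helpers that add the required headers and return decoded JSON, `<ts>` is taken to be a Unix timestamp in seconds, and the `max_polls` cap is not part of the reference.

```python
import time


def export_video(session_id, draft, post, get, poll_seconds=30, max_polls=60,
                 sleep=time.sleep, now=time.time):
    """Start a render and poll until it completes; return the download URL."""
    render_id = f"render_{int(now())}"  # assumption: <ts> is a Unix timestamp
    post("/api/render/proxy/lambda", {
        "id": render_id,
        "sessionId": session_id,
        "draft": draft,
        "output": {"format": "mp4", "quality": "high"},
    })
    for _ in range(max_polls):
        reply = get(f"/api/render/proxy/lambda/{render_id}")
        if reply.get("status") == "completed":
            return reply["output"]["url"]
        sleep(poll_seconds)
    raise TimeoutError(f"render {render_id} did not complete after {max_polls} polls")
```

The reference only names the `completed` status, so the sketch treats every other status as "still rendering" and relies on the poll cap to stop.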
SSE Event Handling
| Event | Action |
|---|---|
| Text response | Apply GUI translation (§4), present to user |
| Tool call/result | Process internally, don't forward |
| heartbeat / empty data: | Keep waiting. Every 2 min: "⏳ Still working..." |
| Stream closes | Process final response |
~30% of editing operations return no text in the SSE stream. When this happens: poll session state to verify the edit was applied, then summarize changes to the user.
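The event rules and the no-text fallback can be sketched as below. The JSON shape of each `data:` payload is a guess (`{"content": {"parts": [{"text": ...}]}}`), since the reference does not document it, and `collect_text`, `handle_stream`, and `fetch_state` are hypothetical names.

```python
import json


def collect_text(lines):
    """Gather user-facing text from an iterable of SSE lines.

    Parts without "text" (assumed to be tool calls or results) are skipped,
    as are heartbeats and empty data lines.
    """
    chunks = []
    for raw in lines:
        line = raw.strip()
        if not line.startswith("data:"):
            continue  # comments, event names, blank separators
        payload = line[len("data:"):].strip()
        if not payload:
            continue  # empty data line: keep waiting
        try:
            event = json.loads(payload)
        except json.JSONDecodeError:
            continue  # e.g. a bare heartbeat marker
        if not isinstance(event, dict):
            continue
        parts = (event.get("content") or {}).get("parts") or []
        chunks.extend(p["text"] for p in parts
                      if isinstance(p, dict) and p.get("text"))
    return "".join(chunks)


def handle_stream(lines, fetch_state):
    """Return the reply text, or fall back to session state if none arrived."""
    text = collect_text(lines)
    if text:
        return {"kind": "text", "text": text}
    # No text in the stream: verify the edit through the session state instead.
    return {"kind": "state", "state": fetch_state()}
```

When the result has kind `state`, the caller would summarize the changes from the returned draft rather than forwarding raw output.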
Backend Response Translation
The backend assumes a GUI exists. Translate these into API actions:
| Backend says | You do |
|---|---|
| "click [button]" / "点击" | Execute via API |
| "open [panel]" / "打开" | Query session state |
| "drag/drop" / "拖拽" | Send edit via SSE |
| "preview in timeline" | Show track summary |
| "Export button" / "导出" | Execute export workflow |
Draft field mapping: t=tracks, tt=track type (0=video, 1=audio, 7=text), sg=segments, d=duration(ms), m=metadata.
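Example track summary:
Timeline (3 tracks):
1. Video: city timelapse (0-10s)
2. BGM: Lo-fi (0-10s, 35%)
3. Title: "Urban Dreams" (0-3s)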
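To show how the short draft keys might be read, here is a small sketch that turns a draft into per-track summary lines. The nesting is an assumption not confirmed by the reference: `draft["t"]` is taken to be a list of tracks, each with `tt` and a list `sg` of segments that carry `d` in milliseconds. `summarize_draft` is a hypothetical helper.

```python
TRACK_TYPES = {0: "video", 1: "audio", 7: "text"}


def summarize_draft(draft):
    """Return one summary line per track in a draft (assumed nesting, see above)."""
    lines = []
    for index, track in enumerate(draft.get("t", []), start=1):
        kind = TRACK_TYPES.get(track.get("tt"), "unknown")
        segments = track.get("sg", [])
        # d is a duration in milliseconds; report seconds.
        total_s = sum(seg.get("d", 0) for seg in segments) / 1000
        lines.append(f"{index}. {kind}: {len(segments)} segment(s), {total_s:g}s")
    return lines
```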
Error Handling
| Code | Meaning | Action |
|---|---|---|
| 0 | Success | Continue |
| 1001 | Bad/expired token | Re-auth via anonymous-token (tokens expire after 7 days) |
| 1002 | Session not found | New session §3.0 |
| 2001 | No credits | Anonymous: show registration URL with ?bind=<id> (get <id> from create-session or state response when needed). Registered: "Top up at nemovideo.ai" |
| 4001 | Unsupported file | Show supported formats |
| 4002 | File too large | Suggest compress/trim |
| 400 | Missing X-Client-Id | Generate Client-Id and retry (see §1) |
| 402 | Free plan export blocked | Subscription tier issue, NOT credits. "Register at nemovideo.ai to unlock export." |
| 429 | Rate limit (1 token/client/7 days) | Retry in 30s once |
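The table above maps naturally onto a lookup from code to next step. In this sketch the action names are hypothetical labels for the table's "Action" column, and giving up after one 429 retry is one reading of "Retry in 30s once".

```python
# Hypothetical action labels, one per row of the error table.
ERROR_ACTIONS = {
    0: "continue",
    1001: "reauthenticate",
    1002: "create_session",
    2001: "prompt_credits",
    4001: "show_supported_formats",
    4002: "suggest_compress_or_trim",
    400: "generate_client_id_and_retry",
    402: "prompt_registration",
    429: "retry_once_after_30s",
}


def next_action(code, retried=False):
    """Pick the follow-up step for a response code."""
    if code == 429 and retried:
        return "give_up"  # the table allows a single retry
    return ERROR_ACTIONS.get(code, "report_unknown_error")
```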
FAQ
How many images can I include in one video? There's no hard cap, but for smooth performance and a watchable result, most users work with between 5 and 60 images per video. Very large batches may benefit from being split into segments.
Can I add music or audio to my image video? Yes — you can describe the type of background music you want, or mention if you'd like a silent video. If you have a specific audio file in mind, note that in your prompt.
What aspect ratio will the video be? By default, the output is optimized for landscape (16:9), which works well for most platforms. If you need a square (1:1) format for Instagram or vertical (9:16) for TikTok or Reels, just specify that in your request.
Will the skill work with screenshots or graphics, not just photos? Absolutely. The image-to-video-maker handles any still image file — photographs, illustrations, screenshots, infographics, or design mockups all work as source material.
Troubleshooting
Images appear out of order in the final video: Make sure to specify the sequence explicitly when uploading. You can number your files or describe the desired order in your prompt (e.g., 'start with the exterior shots, then move to interiors').
Video output looks blurry or pixelated: This usually happens when the source images are low resolution. For best results, use photos that are at least 1080px on the shortest side. If you're working with older or compressed images, mention this upfront so the output settings can be adjusted accordingly.
The video is too long or too short: You can control pacing by specifying how many seconds each image should display. If you didn't set a duration and the result feels off, simply ask for a revised version with a specific timing (e.g., '2 seconds per image' or 'fit everything into 60 seconds').
Unsupported file format on playback: The image-to-video-maker supports mp4, mov, avi, webm, and mkv. If your device or platform has trouble playing the output, request a specific format in your next prompt.
Quick Start Guide
Step 1 — Gather your images: Collect the photos or graphics you want to include. For the best output, use consistently sized images and remove any duplicates or low-quality shots before uploading.
Step 2 — Describe your vision: In your prompt, tell the skill how long the video should be, how long each image should appear, what kind of transitions you prefer (fade, slide, cut, etc.), and whether you want any text overlays or audio.
Step 3 — Specify your output format: Choose from mp4, mov, avi, webm, or mkv depending on where you plan to use the video. If you're unsure, mp4 is the most universally compatible choice.
Step 4 — Review and refine: Once your first video is generated, watch it through and note anything you'd like adjusted — timing, order, pacing, or format. You can iterate quickly by describing what needs to change in a follow-up message. Most users get to a final result within two or three rounds of feedback.
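Getting Started
Welcome! Ready to turn your photos into a video that actually gets watched? Share your images and tell me how you'd like your video to look, and let's build something great together.
Try saying:
- "Turn these 12 product photos into a 30-second promotional video with smooth fade transitions"
- "Create a travel video slideshow from my vacation photos, showing each image for 3 seconds with a cinematic feel"
- "Make a birthday tribute video using these 20 family photos, ordered chronologically with gentle transitions"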
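Quick Start Setup
This skill connects to NemoVideo's cloud backend. On first use, set up the connection automatically and let the user know ("Connecting to NemoVideo...").
Token check: Look for NEMO_TOKEN in the environment. If found, skip to session creation. Otherwise:
- Read ~/.config/nemovideo/client_id, or generate a UUID and save it there
- POST to https://mega-api-prod.nemovideo.ai/api/auth/anonymous-token with an X-Client-Id header
- Extract data.token from the response; this is your NEMO_TOKEN (100 free credits, 7-day expiry)
Session: POST /api/tasks/me/with-session/nemo_agent at the same host with Bearer auth and body {"task_name":"project"}. Keep the returned session_id for all operations.
Let the user know with a brief "Ready!" when setup is complete. Don't expose tokens or raw API output.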
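Turn Your Best Photos Into Videos Worth Watching
Most people have hundreds of photos that never get shared, simply because a single image doesn't tell the whole story. The image-to-video-maker skill changes that by stitching your still photos into a flowing video that captures attention and communicates a narrative, whether it's a travel recap, a product showcase, a real estate walkthrough, or a family memory reel.
With this skill, you stay in control of the creative direction. You can specify how long each image appears on screen, choose the order in which photos are displayed, and describe the mood or pacing you're going for. The result is a video that feels intentional and professional, not like an auto-generated slideshow from a decade-old app.
This tool is built for real-world use cases: social media content creators who need quick turnaround, small business owners showcasing products without a video budget, photographers delivering client galleries in a new format, and everyday users who just want to make something memorable. Whatever your reason, image-to-video-maker gets you from a folder of photos to a finished video in minutes.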
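Routing Your Slideshow Requests
Each request, whether you're uploading photos, setting transition styles, adjusting timing, or exporting your final video, is parsed and routed to the matching NemoVideo endpoint based on the action type detected in your message.
| User says... | Action | Skip SSE? |
|---|---|---|
| "export" / "导出" / "download" / "send me the video" | → §3.5 Export | ✅ |
| "credits" / "积分" / "balance" / "余额" | → §3.3 Credits | ✅ |
| "status" / "状态" / "show tracks" | → §3.4 State | ✅ |
| "upload" / "上传" / user sends file | → §3.2 Upload | ✅ |
| Everything else (generate, edit, add BGM...) | → §3.1 SSE | ❌ |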
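NemoVideo API Reference
The NemoVideo backend handles frame sequencing, Ken Burns motion effects, transition rendering, and audio sync to stitch your still images into a polished video slideshow. All processing happens server-side, so output quality and render speed depend on your active NemoVideo plan tier.
Skill attribution, read from this file's YAML frontmatter at runtime:
- X-Skill-Source: image-to-video-maker
- X-Skill-Version: from frontmatter version
- X-Skill-Platform: detect from install path (~/.clawhub/ → clawhub, ~/.cursor/skills/ → cursor, else unknown)
All requests must include: Authorization: Bearer <NEMO_TOKEN>, X-Skill-Source, X-Skill-Version, X-Skill-Platform. Missing attribution headers will cause export to fail with 402.
API base: https://mega-api-prod.nemovideo.ai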
Create session: POST /api/tasks/me/with-session/nemo_agent with body {"task_name":"project","language":"<lang>"}; returns task_id and session_id. After creating a session, give the user a link: https://nemovideo.com/workspace/claim?token=<token>&task=<task_id>&session=<session_id>&skill_name=image-to-video-maker&skill_version=1.0.0&skill_source=<platform>
Send message (SSE): POST /run_sse with body {"app_name":"nemo_agent","user_id":"me","session_id":"<sid>","new_message":{"parts":[{"text":"<msg>"}]}} and Accept: text/event-stream. Max timeout: 15 minutes.
Upload: POST /api/upload-video/nemo_agent/me/<sid> with either a multipart file (-F "files=@/path") or a JSON body of URLs: {"urls":["<url>"],"source_type":"url"}
Credits: GET /api/credits/balance/simple; returns available, frozen, total
Session state: GET /api/state/nemo_agent/me/<sid>/latest; key fields: data.state.draft, data.state.video_infos, data.state.generated_media
Export (free, no credits): POST /api/render/proxy/lambda with body {"id":"render_<ts>","sessionId":"<sid>","draft":<json>,"output":{"format":"mp4","quality":"high"}}. Poll GET /api/render/proxy/lambda/<id> every 30s until status = completed. Download URL at output.url.
Supported formats: mp4, mov, avi, webm, mkv, jpg, png, gif, webp, mp3, wav, m4a, aac.
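SSE Event Handling
| Event | Action |
|---|---|
| Text response | Apply GUI translation (§4), present to user |
| Tool call/result | Process internally, don't forward |
| heartbeat / empty data: | Keep waiting. Every 2 min: "⏳ Still working..." |
| Stream closes | Process final response |
About 30% of editing operations return no text in the SSE stream. When this happens, poll session state to verify the edit was applied, then summarize the changes to the user.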
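Backend Response Translation
The backend assumes a GUI exists. Translate these into API actions:
| Backend says | You do |
|---|---|
| "click [button]" / "点击" | Execute via API |
| "open [panel]" / "打开" | Query session state |
| "drag/drop" / "拖拽" | Send edit via SSE |
| "preview in timeline" | Show track summary |
| "Export button" / "导出" | Execute export workflow |
Draft field mapping: t=tracks, tt=track type (0=video, 1=audio, 7=text), sg=segments, d=duration(ms), m=metadata.
Timeline (3 tracks):
1. Video: city timelapse (0-10s)
2. BGM: Lo-fi (0-10s, 35%)
3. Title: "Urban Dreams" (0-3s)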
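Error Handling
| Code | Meaning | Action |
|---|---|---|
| 0 | Success | Continue |
| 1001 | Bad/expired token | Re-auth via anonymous-token (tokens expire after 7 days) |
| 1002 | Session not found | New session §3.0 |
| 2001 | No credits | Anonymous: show registration URL with ?bind=<id> (get <id> from create-session or state response when needed). Registered: "Top up at nemovideo.ai" |
| 4001 | Unsupported file | Show supported formats |
| 4002 | File too large | Suggest compress/trim |
| 400 | Missing X-Client-Id | Generate Client-Id and retry (see §1) |
| 402 | Free plan export blocked | Subscription tier issue, NOT credits. "Register at nemovideo.ai to unlock export." |
| 429 | Rate limit (1 token/client/7 days) | Retry in 30s once |
FAQ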