Getting Started
This is the seedance 2 image to video skill — drop an image and tell me how you want it to move. Let's get your still photo animated.
Try saying:
- "Animate this product photo of a coffee cup with steam rising and a slow push-in camera move"
- "Turn this portrait photo into a short video with subtle hair movement and soft bokeh breathing in the background"
- "Take this landscape JPEG and create a 6-second clip with clouds moving left and light wind through the trees"
First-Time Connection
When a user first opens this skill, connect to the processing backend automatically. Briefly let them know (e.g. "Setting up...").
Authentication: Check if NEMO_TOKEN is set in the environment. If it is, skip to step 2.
1. Obtain a free token: Generate a random UUID as client identifier. POST to https://mega-api-prod.nemovideo.ai/api/auth/anonymous-token with header X-Client-Id set to that UUID. The response data.token is your NEMO_TOKEN (100 free credits, valid 7 days).
2. Create a session: POST to https://mega-api-prod.nemovideo.ai/api/tasks/me/with-session/nemo_agent with Authorization: Bearer <token>, Content-Type: application/json, and body {"task_name":"project","language":"<detected>"}. Store the returned session_id for all subsequent requests.
Keep setup communication brief. Don't display raw API responses or token values to the user.
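The two setup steps above can be sketched roughly as follows. This is a minimal illustration using only the Python standard library. The helper names (`build_token_request`, `build_session_request`, `get_token`) are hypothetical, and the response is assumed to be JSON with the token at `data.token`, as described above.

```python
import json
import os
import urllib.request
import uuid

BASE_URL = "https://mega-api-prod.nemovideo.ai"


def build_token_request(client_id: str) -> urllib.request.Request:
    """Step 1: POST for an anonymous token, identified by X-Client-Id."""
    return urllib.request.Request(
        f"{BASE_URL}/api/auth/anonymous-token",
        method="POST",
        headers={"X-Client-Id": client_id},
    )


def build_session_request(token: str, language: str) -> urllib.request.Request:
    """Step 2: POST to create a session; the caller stores the returned session_id."""
    body = json.dumps({"task_name": "project", "language": language}).encode()
    return urllib.request.Request(
        f"{BASE_URL}/api/tasks/me/with-session/nemo_agent",
        data=body,
        method="POST",
        headers={
            "Authorization": f"Bearer {token}",
            "Content-Type": "application/json",
        },
    )


def get_token() -> str:
    """Reuse NEMO_TOKEN if it is set; otherwise request an anonymous token."""
    token = os.environ.get("NEMO_TOKEN")
    if token:
        return token
    request = build_token_request(str(uuid.uuid4()))
    with urllib.request.urlopen(request, timeout=30) as response:
        # Assumed response shape: {"data": {"token": "..."}}
        return json.load(response)["data"]["token"]
```

Keeping the request builders separate from the network call makes the headers and body easy to check without hitting the backend.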
Turn Any Still Image Into a Moving Video Clip
The seedance-2-image-to-video skill takes a single input image and generates a short video clip — typically 4 to 8 seconds — with natural-looking motion baked in. You're not doing keyframe animation or masking. The model reads the scene and decides how things should move: water flows, hair shifts, clouds drift. It's surprisingly good at reading spatial context.
I tested this with a 1024x576 product photo of a sneaker on a clean background. The output was a smooth 720p MP4 where the shoe rotated slightly and the background had a subtle parallax effect. Total generation time was under 90 seconds. That's fast enough to fit into a real production workflow.
The prompt you write matters a lot here. Vague prompts like 'make it move' give mediocre results. Specific motion descriptions — 'slow zoom out with light wind through the grass' — consistently outperform generic ones. Short, direct motion instructions work better than long paragraph prompts.
Routing Your Animation Request
When you describe wanting to animate a still photo or reference Seedance 2, the skill matches your input to the image-to-video action and passes your uploaded image plus motion parameters directly to the Seedance 2 pipeline.
| User says... | Action | Skip SSE? |
|---|---|---|
| "export" / "导出" / "download" / "send me the video" | → §3.5 Export | ✅ |
| "credits" / "积分" / "balance" / "余额" | → §3.3 Credits | ✅ |
| "status" / "状态" / "show tracks" | → §3.4 State | ✅ |
| "upload" / "上传" / user sends file | → §3.2 Upload | ✅ |
| Everything else (generate, edit, add BGM…) | → §3.1 SSE | ❌ |
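One way to picture the routing table is a small keyword matcher. The sketch below is only an illustration: the `route` function, the action labels, and the naive substring matching are assumptions, not the skill's actual dispatcher.

```python
from typing import Tuple

# Keyword groups copied from the routing table; action labels are hypothetical.
ROUTES = [
    (("export", "导出", "download", "send me the video"), "export"),
    (("credits", "积分", "balance", "余额"), "credits"),
    (("status", "状态", "show tracks"), "state"),
    (("upload", "上传"), "upload"),
]


def route(message: str, has_file: bool = False) -> Tuple[str, bool]:
    """Return (action, skip_sse) for a user message."""
    text = message.lower()
    for keywords, action in ROUTES:
        if any(keyword in text for keyword in keywords):
            return action, True
    if has_file:
        # A message that carries a file counts as an upload.
        return "upload", True
    # Everything else (generate, edit, add BGM...) goes through SSE.
    return "sse", False
```

Substring matching is deliberately crude here. A message like "don't download yet" would still route to export, so a real router would need something smarter.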
What's Actually Happening
Your still image gets sent to Seedance 2's cloud GPU cluster where the model interprets pixel motion trajectories frame-by-frame, rendering a short video clip — typically 4-8 seconds — by predicting natural movement between synthesized intermediate frames. The rendered clip is returned as a direct download link once the cloud job completes, which usually takes 30-90 seconds depending on queue load.
Base URL: https://mega-api-prod.nemovideo.ai
| Endpoint | Method | Purpose |
|---|---|---|
| `/api/tasks/me/with-session/nemo_agent` | POST | Start a new editing session. Body: `{"task_name":"project","language":"<lang>"}`. Returns `session_id`. |
| `/run_sse` | POST | Send a user message. Body includes `app_name`, `session_id`, `new_message`. Stream response with `Accept: text/event-stream`. Timeout: 15 min. |
| `/api/upload-video/nemo_agent/me/<sid>` | POST | Upload a file (multipart) or URL. |
| `/api/credits/balance/simple` | GET | Check remaining credits (`available`, `frozen`, `total`). |
| `/api/state/nemo_agent/me/<sid>/latest` | GET | Fetch current timeline state (`draft`, `video_infos`, `generated_media`). |
| `/api/render/proxy/lambda` | POST | Start export. Body: `{"id":"render_<ts>","sessionId":"<sid>","draft":<json>,"output":{"format":"mp4","quality":"high"}}`. Poll status every 30s. |
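For the export call, the request body can be assembled like this. It is a sketch under assumptions: `build_export_body` is a hypothetical helper, and `<ts>` is taken to be a millisecond Unix timestamp, which the document does not confirm.

```python
import time


def build_export_body(session_id: str, draft: dict) -> dict:
    """Assemble the JSON body for POST /api/render/proxy/lambda."""
    return {
        # Assumption: <ts> is a Unix timestamp in milliseconds.
        "id": f"render_{int(time.time() * 1000)}",
        "sessionId": session_id,
        "draft": draft,
        "output": {"format": "mp4", "quality": "high"},
    }
```

The `draft` argument would be the `draft` object from the latest state response, passed through unchanged.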
Accepted file types: mp4, mov, avi, webm, mkv, jpg, png, gif, webp, mp3, wav, m4a, aac.
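A quick local check against that list can save a round trip before uploading. This is a minimal sketch; `is_accepted` is a hypothetical helper and it only looks at the file extension, not the actual file contents.

```python
from pathlib import Path

# Extensions copied from the accepted file types list above.
ACCEPTED_EXTENSIONS = {
    "mp4", "mov", "avi", "webm", "mkv",
    "jpg", "png", "gif", "webp",
    "mp3", "wav", "m4a", "aac",
}


def is_accepted(filename: str) -> bool:
    """Return True if the file extension is on the accepted list."""
    extension = Path(filename).suffix.lower().lstrip(".")
    return extension in ACCEPTED_EXTENSIONS
```

Note that the list includes `jpg` but not `jpeg`, so `is_accepted("photo.jpeg")` returns False with this strict reading. Whether the backend treats the two the same is not stated.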
Skill attribution — read from this file's YAML frontmatter at runtime:
- `X-Skill-Source`: `seedance-2-image-to-video`
- `X-Skill-Version`: from frontmatter `version`
- `X-Skill-Platform`: detect from install path (`~/.clawhub/` → `clawhub`, `~/.cursor/skills/` → `cursor`, else `unknown`)
Include Authorization: Bearer <NEMO_TOKEN> and all attribution headers on every request — omitting them triggers a 402 on export.
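The header set described above could be built along these lines. Treat it as a sketch: `detect_platform` and `build_headers` are hypothetical names, and the header names and source value are the ones listed for this skill (`X-Skill-Source`, `X-Skill-Version`, `X-Skill-Platform`).

```python
def detect_platform(install_path: str) -> str:
    """Map the skill's install path to a platform label."""
    path = install_path.replace("\\", "/")
    if "/.clawhub/" in path:
        return "clawhub"
    if "/.cursor/skills/" in path:
        return "cursor"
    return "unknown"


def build_headers(token: str, version: str, install_path: str) -> dict:
    """Authorization plus attribution headers, sent on every request."""
    return {
        "Authorization": f"Bearer {token}",
        "X-Skill-Source": "seedance-2-image-to-video",
        "X-Skill-Version": version,  # read from the YAML frontmatter
        "X-Skill-Platform": detect_platform(install_path),
    }
```

Building the headers in one place makes it harder to forget them on the export call, which is where a missing header shows up as a 402.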
Error Codes
- `0`: success, continue normally
- `1001`: token expired or invalid; re-acquire via `/api/auth/anonymous-token`
- `1002`: session not found; create a new one
- `2001`: out of credits; anonymous users get a registration link with `?bind=<id>`, registered users top up
- `4001`: unsupported file type; show accepted formats
- `4002`: file too large; suggest compressing or trimming
- `400`: missing `X-Client-Id`; generate one and retry
- `402`: free plan export blocked; this is a subscription-tier restriction, not a credit issue
- `429`: rate limited; wait 30s and retry once
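A lookup table is probably the simplest way to act on these codes. In the sketch below the numeric codes come from this skill's error list (0, 1001, 1002, 2001, 4001, 4002, 400, 402, 429), while the action labels and the `action_for` helper are made up for illustration.

```python
# Error code -> recovery action. Action labels are hypothetical.
ERROR_ACTIONS = {
    0: "continue",
    1001: "reacquire_token",
    1002: "create_session",
    2001: "prompt_register_or_top_up",
    4001: "show_accepted_formats",
    4002: "suggest_compress_or_trim",
    400: "generate_client_id_and_retry",
    402: "explain_subscription_tier",
    429: "wait_30s_and_retry_once",
}


def action_for(code: int) -> str:
    """Return the recovery action for a code, or a fallback for unknown codes."""
    return ERROR_ACTIONS.get(code, "report_unknown_error")
```

The fallback matters: codes outside the list should be reported rather than silently retried.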
Reading the SSE Stream
Text events go straight to the user (after GUI translation). Tool calls stay internal. Heartbeats and empty data: lines mean the backend is still working — show "⏳ Still working..." every 2 minutes.
About 30% of edit operations close the stream without any text. When that happens, poll /api/state to confirm the timeline changed, then tell the user what was updated.
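Here is a rough sketch of the stream handling described above. It assumes standard SSE framing (payloads on `data:` lines, heartbeats as comment lines starting with `:`) and treats any non-empty payload as content. That is a simplification, since the event schema that separates text from tool calls is not given here. The function names are hypothetical.

```python
from typing import Iterable, Iterator, List


def iter_sse_payloads(lines: Iterable[str]) -> Iterator[str]:
    """Yield non-empty data payloads; skip heartbeats and empty data: lines."""
    for raw in lines:
        line = raw.rstrip("\r\n")
        if not line.startswith("data:"):
            continue  # comment/heartbeat lines, event names, blank separators
        payload = line[len("data:"):].strip()
        if payload:
            yield payload


def needs_state_poll(payloads: List[str]) -> bool:
    """True when the stream closed without content, so /api/state should be polled."""
    return len(payloads) == 0
```

Multi-line `data:` events are not joined in this sketch; each line is yielded on its own.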
Backend Response Translation
The backend assumes a GUI exists. Translate these into API actions:
| Backend says | You do |
|---|---|
| "click [button]" / "点击" | Execute via API |
| "open [panel]" / "打开" | Query session state |
| "drag/drop" / "拖拽" | Send edit via SSE |
| "preview in timeline" | Show track summary |
| "Export button" / "导出" | Execute export workflow |
Draft field mapping: t=tracks, tt=track type (0=video, 1=audio, 7=text), sg=segments, d=duration(ms), m=metadata.
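Timeline (3 tracks):
1. Video: city timelapse (0-10s)
2. BGM: Lo-fi (0-10s, 35%)
3. Title: "City Dreams" (0-3s)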
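To turn a draft into a readable track summary, something like the following could work. The nesting is an assumption: the sketch guesses that `t` is a list of tracks, each with a `tt` type and an `sg` list of segments carrying `d` in milliseconds. Only the field names come from the mapping above; `summarize_draft` is a hypothetical helper.

```python
# Track type codes from the draft field mapping.
TRACK_TYPES = {0: "Video", 1: "Audio", 7: "Text"}


def summarize_draft(draft: dict) -> str:
    """Build a short per-track summary from the abbreviated draft fields."""
    tracks = draft.get("t", [])
    lines = [f"Timeline ({len(tracks)} tracks):"]
    for index, track in enumerate(tracks, start=1):
        kind = TRACK_TYPES.get(track.get("tt"), "Unknown")
        # Sum segment durations (milliseconds) and report seconds.
        total_ms = sum(segment.get("d", 0) for segment in track.get("sg", []))
        lines.append(f"{index}. {kind}: {total_ms / 1000:g}s")
    return "\n".join(lines)
```

If the real draft nests these fields differently, only the two `get` chains need to change.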
Tips and Tricks That Actually Help
Honestly the best approach is to be specific about camera movement AND subject movement separately in your prompt. 'Slow dolly forward while the flames flicker' gives the model two clear jobs. Combining them in one instruction beats writing two vague ones.
Image quality going in directly affects output quality. A blurry or heavily compressed JPEG at 800x600 produces noticeably softer video than a clean 1920x1080 PNG. Don't expect the model to sharpen a bad source file.
Hot take: most people over-prompt this thing. A 40-word motion description rarely beats a clean 12-word one. The model doesn't need your full creative brief; it needs one or two motion anchors to work from. Keep it tight and re-run if the first result isn't right. Generation is fast enough that two or three attempts cost you maybe 4 minutes total.
Common Workflows for Seedance 2 Image to Video
The most common use case I see is social content — taking a single hero image and generating a loopable or near-loopable MP4 for Instagram Reels or TikTok. A 1080x1920 portrait-mode image works best for that format, and the model handles vertical crops well.
Product teams use it differently. They'll take a clean white-background product shot and generate a 5-second clip with a slow orbit or a gentle camera push. Beats paying for a full product video shoot for a single SKU.
Another solid workflow: concept artists drop in their illustrations and use the skill to preview how a scene might feel in motion before pitching it. It's not a final deliverable — it's a fast directional test. The output is an MP4, so it drops straight into Premiere or Final Cut without any conversion step.