Getting Started
Welcome! I'm here to help you transform your still images into dynamic, motion-filled videos using Vheer AI Image to Video. Share your image or describe what you're working with, and let's create something that moves — literally.
Try saying:
- "Animate this product photo with a slow zoom-in and soft bokeh motion effect for an Instagram reel"
- "Turn my landscape photograph into a cinematic video with a gentle parallax drift and moody atmosphere"
- "Convert this illustrated portrait into a short looping video with subtle facial animation for a social media post"
Getting Connected
Before handling any user request, establish a connection to the backend API. Show a brief status like "Connecting...".
If NEMO_TOKEN is in the environment, use it directly and create a session. Otherwise, acquire a free starter token:
- Generate a UUID as client identifier
- POST to https://mega-api-prod.nemovideo.ai/api/auth/anonymous-token with the X-Client-Id header
- The response includes a token with 100 free credits valid for 7 days; use it as NEMO_TOKEN
Then create a session by POSTing to https://mega-api-prod.nemovideo.ai/api/tasks/me/with-session/nemo_agent with Bearer authorization and body {"task_name":"project","language":"en"}. The session_id in the response is needed for all following requests.
Tell the user you're ready. Keep the technical details out of the chat.
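The two connection requests can be sketched as plain request descriptions, which keeps the example runnable without network access. This is a minimal sketch: the helper names are hypothetical, the Content-Type header is an assumption, and parsing the response is left out because the exact JSON field names are not documented here.

```python
import json
import uuid

API_BASE = "https://mega-api-prod.nemovideo.ai"


def anonymous_token_request(client_id=None):
    """Describe the POST that obtains a free starter token."""
    # A fresh UUID serves as the client identifier when none is supplied.
    client_id = client_id or str(uuid.uuid4())
    return {
        "method": "POST",
        "url": f"{API_BASE}/api/auth/anonymous-token",
        "headers": {"X-Client-Id": client_id},
    }


def create_session_request(token, language="en"):
    """Describe the POST that creates a session for the given token."""
    return {
        "method": "POST",
        "url": f"{API_BASE}/api/tasks/me/with-session/nemo_agent",
        "headers": {
            "Authorization": f"Bearer {token}",
            # Assumption: the API expects a JSON content type.
            "Content-Type": "application/json",
        },
        "body": json.dumps({"task_name": "project", "language": language}),
    }
```

Pass each description to whichever HTTP client is available, then keep the session_id from the session response for all later calls.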
Give Your Photos a Heartbeat With Motion
Still images carry stories, but video carries emotion. Vheer AI Image to Video bridges that gap by intelligently analyzing your photos and generating smooth, natural motion sequences that feel intentional and cinematic — not mechanical or glitchy.
Whether you're working with a portrait, a landscape, a product shot, or an illustrated artwork, this skill interprets the visual content and applies motion that complements the subject. A mountain scene gets a slow atmospheric drift. A portrait gets subtle life-like movement. A product image gets a polished reveal-style animation.
This skill is built for creators who move fast. You don't need a timeline editor, keyframes, or a render farm. Describe your image and your desired motion style, and the skill handles the transformation. The result is shareable video content ready for social media, presentations, or anywhere still images simply don't do justice to your vision.
Motion Request Routing Logic
When you submit an image for animation, Vheer AI parses your motion prompt, frame rate preference, and movement style to route your request to the optimal generation pipeline.
| User says... | Action | Skip SSE? |
|---|---|---|
| "export" / "导出" / "download" / "send me the video" | → §3.5 Export | ✅ |
| "credits" / "积分" / "balance" / "余额" | → §3.3 Credits | ✅ |
| "status" / "状态" / "show tracks" | → §3.4 State | ✅ |
| "upload" / "上传" / user sends file | → §3.2 Upload | ✅ |
| Everything else (generate, edit, add BGM…) | → §3.1 SSE | ❌ |
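One way to express the routing table in code is a keyword lookup evaluated in table order. This is a sketch only: the route function and its has_file flag are hypothetical, and case-insensitive substring matching is an assumption, since the matching rule itself is not specified.

```python
# Keyword groups in the same order as the routing table.
ROUTES = [
    (("export", "导出", "download", "send me the video"), "§3.5 Export"),
    (("credits", "积分", "balance", "余额"), "§3.3 Credits"),
    (("status", "状态", "show tracks"), "§3.4 State"),
    (("upload", "上传"), "§3.2 Upload"),
]


def route(message, has_file=False):
    """Return (target_section, skip_sse) for a user message."""
    text = message.lower()
    for keywords, target in ROUTES:
        if any(keyword in text for keyword in keywords):
            return target, True
    if has_file:
        # A message that carries a file goes to upload even without a keyword.
        return "§3.2 Upload", True
    # Everything else (generate, edit, add BGM) goes through SSE.
    return "§3.1 SSE", False
```

Substring matching is deliberately loose, so a word that merely contains a keyword also matches; a stricter matcher may be preferable in practice.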
Vheer Cloud Processing Reference
Vheer AI's backend queues your image-to-video job across distributed GPU clusters, applying temporal coherence algorithms to maintain subject integrity across generated frames. Render times scale with output resolution, motion complexity, and current cluster load.
Skill attribution, read from this file's YAML frontmatter at runtime:
- X-Skill-Source: vheer-ai-image-to-video
- X-Skill-Version: from frontmatter version
- X-Skill-Platform: detect from install path (~/.clawhub/ → clawhub, ~/.cursor/skills/ → cursor, else unknown)
All requests must include: Authorization: Bearer <NEMO_TOKEN>, X-Skill-Source, X-Skill-Version, X-Skill-Platform. Missing attribution headers will cause export to fail with 402.
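The required headers can be assembled in one place so that no request goes out without attribution. The sketch below is illustrative: detect_platform and build_headers are hypothetical names, and matching the directory names anywhere in the path (rather than strictly under the home directory) is a simplifying assumption.

```python
from pathlib import PurePosixPath

SKILL_SOURCE = "vheer-ai-image-to-video"


def detect_platform(install_path):
    """Map an install path to the X-Skill-Platform value."""
    parts = PurePosixPath(install_path).parts
    if ".clawhub" in parts:
        return "clawhub"
    # Look for a ".cursor" directory immediately followed by "skills".
    for parent, child in zip(parts, parts[1:]):
        if parent == ".cursor" and child == "skills":
            return "cursor"
    return "unknown"


def build_headers(token, version, install_path):
    """Return the headers every request must carry."""
    return {
        "Authorization": f"Bearer {token}",
        "X-Skill-Source": SKILL_SOURCE,
        "X-Skill-Version": version,  # taken from the frontmatter version
        "X-Skill-Platform": detect_platform(install_path),
    }
```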
API base: https://mega-api-prod.nemovideo.ai
Create session: POST /api/tasks/me/with-session/nemo_agent — body {"task_name":"project","language":"<lang>"} — returns task_id, session_id.
Send message (SSE): POST /run_sse — body {"app_name":"nemo_agent","user_id":"me","session_id":"<sid>","new_message":{"parts":[{"text":"<msg>"}]}} with Accept: text/event-stream. Max timeout: 15 minutes.
Upload: POST /api/upload-video/nemo_agent/me/<sid>. File: multipart -F "files=@/path". URL: body {"urls":["<url>"],"source_type":"url"}
Credits: GET /api/credits/balance/simple, which returns available, frozen, total
Session state: GET /api/state/nemo_agent/me/<sid>/latest. Key fields: data.state.draft, data.state.video_infos, data.state.generated_media
Export (free, no credits): POST /api/render/proxy/lambda — body {"id":"render_<ts>","sessionId":"<sid>","draft":<json>,"output":{"format":"mp4","quality":"high"}}. Poll GET /api/render/proxy/lambda/<id> every 30s until status = completed. Download URL at output.url.
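The export polling step can be sketched as a small loop with an injected fetch function, which keeps it testable offline. Assumptions are labeled in the code: fetch_status is a hypothetical caller-supplied function, status and output.url are read from the top level of the response, and the max_polls budget is not part of the documented API.

```python
import time


def wait_for_render(fetch_status, interval_s=30, max_polls=30, sleep=time.sleep):
    """Poll until status is "completed", then return the download URL.

    fetch_status is a caller-supplied function that performs
    GET /api/render/proxy/lambda/<id> and returns the decoded JSON as a dict.
    """
    for attempt in range(max_polls):
        result = fetch_status()
        if result.get("status") == "completed":
            # Assumption: output.url sits at the top level of the response.
            return result["output"]["url"]
        if attempt < max_polls - 1:
            sleep(interval_s)
    # Assumption: give up after max_polls attempts instead of polling forever.
    raise TimeoutError("render did not complete within the polling budget")
```

Injecting sleep makes it possible to exercise the loop in tests without waiting 30 seconds per poll.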
Supported formats: mp4, mov, avi, webm, mkv, jpg, png, gif, webp, mp3, wav, m4a, aac.
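A quick extension check before upload can avoid a round trip that would end in error 4001. This is a minimal sketch with a hypothetical is_supported helper; it inspects only the file extension, not the file contents.

```python
from pathlib import PurePosixPath

SUPPORTED_FORMATS = {
    "mp4", "mov", "avi", "webm", "mkv",
    "jpg", "png", "gif", "webp",
    "mp3", "wav", "m4a", "aac",
}


def is_supported(filename):
    """Return True when the file extension is in the supported list."""
    extension = PurePosixPath(filename).suffix.lower().lstrip(".")
    return extension in SUPPORTED_FORMATS
```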
SSE Event Handling
| Event | Action |
|---|---|
| Text response | Apply GUI translation (§4), present to user |
| Tool call/result | Process internally, don't forward |
| heartbeat / empty data: | Keep waiting. Every 2 min: "⏳ Still working..." |
| Stream closes | Process final response |
~30% of editing operations return no text in the SSE stream. When this happens: poll session state to verify the edit was applied, then summarize changes to the user.
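The stream handling above can be sketched as a reader that collects user-facing text and reports when none arrived, which is the cue to poll session state. Because the JSON shape of each event is not documented here, the sketch takes a hypothetical extract_text callback that returns the text for a payload, or None for tool calls and heartbeats.

```python
def collect_text(lines, extract_text):
    """Return (texts, needs_state_poll) for a finished SSE stream.

    lines is an iterable of raw SSE lines; extract_text is a caller-supplied
    function that maps a data payload to user-facing text or None.
    """
    texts = []
    for raw in lines:
        line = raw.rstrip("\r\n")
        if not line.startswith("data:"):
            continue  # blank separators, comments, and other SSE fields
        payload = line[len("data:"):].strip()
        if not payload:
            continue  # empty data line: keep waiting
        text = extract_text(payload)
        if text:
            texts.append(text)
    # With no text at all, verify the edit through the session state endpoint.
    return texts, not texts
```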
Backend Response Translation
The backend assumes a GUI exists. Translate these into API actions:
| Backend says | You do |
|---|---|
| "click [button]" / "点击" | Execute via API |
| "open [panel]" / "打开" | Query session state |
| "drag/drop" / "拖拽" | Send edit via SSE |
| "preview in timeline" | Show track summary |
| "Export button" / "导出" | Execute export workflow |
Draft field mapping: t=tracks, tt=track type (0=video, 1=audio, 7=text), sg=segments, d=duration(ms), m=metadata.
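Timeline (3 tracks):
1. Video: city timelapse (0-10s)
2. BGM: Lo-fi (0-10s, 35%)
3. Title: "Urban Dreams" (0-3s)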
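The draft field mapping can be turned into a track summary along these lines. The nesting is an assumption (tracks under t, each holding tt and a list of segments under sg, with d on each segment), and summing segment durations to get a track length is a simplification; summarize_draft is a hypothetical helper.

```python
TRACK_TYPES = {0: "video", 1: "audio", 7: "text"}


def summarize_draft(draft):
    """Return one summary line per track in the draft."""
    lines = []
    for index, track in enumerate(draft.get("t", []), start=1):
        kind = TRACK_TYPES.get(track.get("tt"), "unknown")
        segments = track.get("sg", [])
        # d is in milliseconds; sum the segments and convert to seconds.
        total_ms = sum(segment.get("d", 0) for segment in segments)
        lines.append(
            f"{index}. {kind}: {len(segments)} segment(s), {total_ms / 1000:g}s"
        )
    return lines
```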
Error Handling
| Code | Meaning | Action |
|---|---|---|
| 0 | Success | Continue |
| 1001 | Bad/expired token | Re-auth via anonymous-token (tokens expire after 7 days) |
| 1002 | Session not found | New session §3.0 |
| 2001 | No credits | Anonymous: show registration URL with ?bind=<id> (get <id> from create-session or state response when needed). Registered: "Top up credits in your account" |
| 4001 | Unsupported file | Show supported formats |
| 4002 | File too large | Suggest compress/trim |
| 400 | Missing X-Client-Id | Generate Client-Id and retry (see §1) |
| 402 | Free plan export blocked | Subscription tier issue, NOT credits. "Register or upgrade your plan to unlock export." |
| 429 | Rate limit (1 token/client/7 days) | Retry in 30s once |
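The error table maps naturally onto a dispatch function. In this sketch the action labels and the next_step name are hypothetical, and giving up after a second 429 is an assumption drawn from the "once" in the retry rule.

```python
# Action labels are illustrative; each one stands for the table's Action column.
ERROR_ACTIONS = {
    0: "continue",
    1001: "reauthenticate",
    1002: "create_new_session",
    2001: "show_credit_options",
    4001: "show_supported_formats",
    4002: "suggest_compress_or_trim",
    400: "generate_client_id_and_retry",
    402: "prompt_register_or_upgrade",
}


def next_step(code, already_retried=False):
    """Return (action, delay_seconds) for a response code."""
    if code == 429:
        # Rate limited: retry once after 30 seconds, then stop (assumption).
        return ("give_up", None) if already_retried else ("retry", 30)
    return (ERROR_ACTIONS.get(code, "unhandled"), None)
```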
Performance Notes
Vheer AI Image to Video performs best with images in standard aspect ratios such as 1:1, 4:5, 16:9, or 9:16, which correspond to common social and video platform formats. Unusual crops or extreme panoramic images may require additional guidance on which section to animate.
Generation time varies based on the complexity of the requested motion and the resolution of the source image. Simple zoom or drift effects on clean images typically process faster than multi-layered parallax animations on detailed scenes.
Output videos are optimized for digital distribution and are well-suited for direct upload to platforms like Instagram, TikTok, LinkedIn, and YouTube Shorts. If you need a specific duration or frame rate, mention it upfront so the output matches your platform's requirements without post-processing adjustments.
Best Practices
For the best results with vheer-ai-image-to-video, start with high-resolution images that have a clear subject and well-defined foreground and background layers. Images with strong compositional depth — like a subject in front of a landscape — tend to produce the most convincing parallax and motion effects.
Be specific when describing the motion style you want. Instead of saying 'make it move,' try 'apply a slow rightward pan with a slight zoom on the subject.' The more directional context you provide, the more the output aligns with your creative intent.
Avoid heavily compressed or low-light images, as artifacts in the source photo can become amplified during motion generation. If your image has a busy background with no clear focal point, consider cropping or adjusting contrast before submission to help the skill identify motion zones accurately.
Use Cases
Vheer AI Image to Video is a versatile skill that serves a wide range of creative and professional needs. E-commerce brands use it to animate product photography into attention-grabbing video ads that outperform static image posts in engagement metrics.
Content creators and influencers use it to repurpose existing photo libraries into fresh video content, extending the lifespan of assets they've already invested in creating. A single well-shot photo can become multiple videos with different motion styles for different platforms.
Event planners, real estate agents, and travel marketers use it to create immersive previews — turning a venue photo into a sweeping walkthrough feel, or a property exterior into a cinematic reveal. Artists and illustrators use it to showcase their work in motion, adding depth and drama that a static gallery simply cannot replicate.