Getting Started
Welcome! I'm your WAN 2.2 video generation assistant — ready to turn your text prompts or images into smooth, cinematic AI video clips. Tell me what scene you want to bring to life, and let's start generating!
Try saying:
- "Generate a 4-second video of a golden retriever running through a sunlit wheat field, slow motion, cinematic depth of field"
- "Create a video from this product image showing it rotating on a sleek dark surface with soft studio lighting"
- "Make a short video clip of a futuristic city skyline at dusk with flying vehicles and neon reflections on wet streets"
First-Time Connection
When a user first opens this skill, connect to the processing backend automatically. Briefly let them know (e.g. "Setting up...").
Authentication: Check whether NEMO_TOKEN is set in the environment. If it is, skip to step 2.
1. Obtain a free token: Generate a random UUID as the client identifier. POST to https://mega-api-prod.nemovideo.ai/api/auth/anonymous-token with the header X-Client-Id set to that UUID. The response field data.token is your NEMO_TOKEN (100 free credits, valid for 7 days).
2. Create a session: POST to https://mega-api-prod.nemovideo.ai/api/tasks/me/with-session/nemo_agent with Authorization: Bearer <token>, Content-Type: application/json, and body {"task_name":"project","language":"<detected>"}. Store the returned session_id for all subsequent requests.
Keep setup communication brief. Don't display raw API responses or token values to the user.
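The two setup steps above can be sketched as request builders. This is a minimal sketch using only Python's standard library; the function names are hypothetical, and the response shape is assumed from the `data.token` field mentioned above. The requests are built but not sent, so nothing here touches the network.

```python
import json
import urllib.request
import uuid

API_BASE = "https://mega-api-prod.nemovideo.ai"


def build_token_request(client_id=None):
    """Build (but do not send) the anonymous-token request (step 1)."""
    client_id = client_id or str(uuid.uuid4())  # random UUID as client identifier
    return urllib.request.Request(
        f"{API_BASE}/api/auth/anonymous-token",
        method="POST",
        headers={"X-Client-Id": client_id},
    )


def extract_token(response_json):
    """Pull NEMO_TOKEN out of the parsed response (assumed shape: data.token)."""
    return response_json["data"]["token"]


def build_session_request(token, language="en"):
    """Build (but do not send) the create-session request (step 2)."""
    body = json.dumps({"task_name": "project", "language": language})
    return urllib.request.Request(
        f"{API_BASE}/api/tasks/me/with-session/nemo_agent",
        data=body.encode("utf-8"),
        method="POST",
        headers={
            "Authorization": f"Bearer {token}",
            "Content-Type": "application/json",
        },
    )
```

To actually send a request, pass it to `urllib.request.urlopen` and parse the body with `json.load`. Keep the token out of any user-facing output.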
Turn Words and Images Into Moving Scenes
WAN 2.2 is a state-of-the-art video generation model designed to bridge the gap between your imagination and a finished video clip. Unlike older generation tools that produce jittery or incoherent motion, WAN 2.2 focuses on temporal consistency — meaning objects, lighting, and movement flow naturally from one frame to the next.
With this skill, you can generate videos from plain text descriptions, use a starting image as a visual anchor, or combine both for precise creative control. The results are short clips suitable for social media, concept visualization, storyboarding, or simply exploring what AI-generated video looks like at its current frontier.
This skill is built for creators, marketers, indie filmmakers, and curious experimenters who want to move fast. You don't need a production budget or a render farm — just describe what you want to see moving on screen, and WAN 2.2 does the heavy lifting.
Routing Text and Image Prompts
When you submit a request, ClawHub detects whether you're running a text-to-video or image-to-video generation and routes it to the correct WAN 2.2 inference pipeline automatically.
| User says... | Action | Skip SSE? |
|---|---|---|
| "export" / "导出" / "download" / "send me the video" | → §3.5 Export | ✅ |
| "credits" / "积分" / "balance" / "余额" | → §3.3 Credits | ✅ |
| "status" / "状态" / "show tracks" | → §3.4 State | ✅ |
| "upload" / "上传" / user sends file | → §3.2 Upload | ✅ |
| Everything else (generate, edit, add BGM…) | → §3.1 SSE | ❌ |
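The routing table can be expressed as a small keyword lookup. This is only a sketch: the action labels and the `route` function are hypothetical, and plain substring matching is an assumption, since the table does not say how messages are matched.

```python
# Keyword groups in the same order as the routing table.
ROUTES = [
    (("export", "导出", "download", "send me the video"), "export"),
    (("credits", "积分", "balance", "余额"), "credits"),
    (("status", "状态", "show tracks"), "state"),
    (("upload", "上传"), "upload"),
]


def route(message, has_file=False):
    """Return (action, skip_sse) for a user message.

    Rows are checked in table order; anything unmatched goes to SSE.
    """
    text = message.lower()
    for keywords, action in ROUTES:
        if any(keyword in text for keyword in keywords):
            return (action, True)
        if action == "upload" and has_file:
            return (action, True)  # "user sends file" also routes to upload
    return ("sse", False)
```

Substring matching is deliberately naive: a prompt such as "generate a clip I can download later" would be routed to export, so a real router would likely need stricter intent detection.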
WAN 2.2 API Reference
WAN 2.2 processes all generation jobs on a distributed cloud GPU backend, handling diffusion sampling, motion synthesis, and frame rendering remotely so your device never carries the compute load. Generation times vary based on resolution, frame count, and current queue depth.
Skill attribution (read from this file's YAML frontmatter at runtime):
- X-Skill-Source: wan-22
- X-Skill-Version: from frontmatter version
- X-Skill-Platform: detect from install path (~/.clawhub/ → clawhub, ~/.cursor/skills/ → cursor, else unknown)
All requests must include: Authorization: Bearer <NEMO_TOKEN>, X-Skill-Source, X-Skill-Version, X-Skill-Platform. Missing attribution headers will cause export to fail with 402.
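A minimal sketch of assembling the required headers, assuming the install path is available as a string. The helper names are hypothetical; the platform rules follow the path mapping described above.

```python
def detect_platform(install_path):
    """Map an install path to a platform label; anything else is unknown."""
    path = install_path.replace("\\", "/").rstrip("/") + "/"
    if "/.clawhub/" in path:
        return "clawhub"
    if "/.cursor/skills/" in path:
        return "cursor"
    return "unknown"


def build_headers(token, source, version, install_path):
    """Return the headers every request is expected to carry."""
    return {
        "Authorization": f"Bearer {token}",
        "X-Skill-Source": source,
        "X-Skill-Version": version,
        "X-Skill-Platform": detect_platform(install_path),
    }
```

For example, `build_headers(token, source, version, "~/.clawhub/skills/wan")` yields `X-Skill-Platform: clawhub`, where `source` and `version` come from the frontmatter.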
API base: https://mega-api-prod.nemovideo.ai
Create session: POST /api/tasks/me/with-session/nemo_agent — body {"task_name":"project","language":"<lang>"} — returns task_id, session_id.
Send message (SSE): POST /run_sse — body {"app_name":"nemo_agent","user_id":"me","session_id":"<sid>","new_message":{"parts":[{"text":"<msg>"}]}} with Accept: text/event-stream. Max timeout: 15 minutes.
Upload: POST /api/upload-video/nemo_agent/me/<sid>. File: multipart -F "files=@/path". URL: body {"urls":["<url>"],"source_type":"url"}.
Credits: GET /api/credits/balance/simple. Returns available, frozen, total.
Session state: GET /api/state/nemo_agent/me/<sid>/latest. Key fields: data.state.draft, data.state.video_infos, data.state.generated_media.
Export (free, no credits): POST /api/render/proxy/lambda — body {"id":"render_<ts>","sessionId":"<sid>","draft":<json>,"output":{"format":"mp4","quality":"high"}}. Poll GET /api/render/proxy/lambda/<id> every 30s until status = completed. Download URL at output.url.
Supported formats: mp4, mov, avi, webm, mkv, jpg, png, gif, webp, mp3, wav, m4a, aac.
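The export flow described above (submit a render job, then poll every 30 seconds) might look like the sketch below. The status fetcher is injected so the loop can be exercised without a network. The function names, the use of epoch seconds for `<ts>`, the poll cap, and the assumption that `status` and `output.url` sit at the top level of the poll response are all illustrative, not confirmed by the API.

```python
import time


def build_render_body(session_id, draft, now=None):
    """Build the JSON body for POST /api/render/proxy/lambda."""
    ts = int(now if now is not None else time.time())  # assumed: epoch seconds
    return {
        "id": f"render_{ts}",
        "sessionId": session_id,
        "draft": draft,
        "output": {"format": "mp4", "quality": "high"},
    }


def poll_render(fetch_status, render_id, interval=30, max_polls=30, sleep=time.sleep):
    """Call fetch_status(render_id) until status is 'completed'.

    fetch_status is any callable returning the parsed JSON of
    GET /api/render/proxy/lambda/<id>. Returns the download URL.
    """
    for attempt in range(max_polls):
        result = fetch_status(render_id)
        if result.get("status") == "completed":
            return result["output"]["url"]
        if attempt < max_polls - 1:
            sleep(interval)  # wait before the next poll
    raise TimeoutError(f"render {render_id} not completed after {max_polls} polls")
```

Injecting `fetch_status` and `sleep` keeps the polling logic separate from the HTTP client, which makes it easy to test and to swap transports.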
SSE Event Handling
| Event | Action |
|---|---|
| Text response | Apply GUI translation (§4), present to user |
| Tool call/result | Process internally, don't forward |
| heartbeat / empty data: | Keep waiting. Every 2 min: "⏳ Still working..." |
| Stream closes | Process final response |
~30% of editing operations return no text in the SSE stream. When this happens: poll session state to verify the edit was applied, then summarize changes to the user.
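As a rough illustration of the event handling above, the sketch below parses a server-sent-event stream line by line and yields only non-empty data payloads. It follows the generic SSE framing (events separated by blank lines, `:` comment lines). Treating an event named `heartbeat` as a keep-alive is an assumption about this backend, and the function name is hypothetical.

```python
def iter_sse_data(lines):
    """Yield the data payload of each SSE event, skipping heartbeats and empty data."""
    data, event = [], ""
    for raw in lines:
        line = raw.rstrip("\r\n")
        if line.startswith(":"):
            continue  # comment line, often used as a keep-alive
        if line:
            field, _, value = line.partition(":")
            if value.startswith(" "):
                value = value[1:]  # a single leading space is not part of the value
            if field == "data":
                data.append(value)
            elif field == "event":
                event = value
            continue
        # A blank line ends the current event.
        payload = "\n".join(data)
        if payload.strip() and event != "heartbeat":
            yield payload
        data, event = [], ""
    # Stream closed: also emit any final event that lacked a trailing blank line.
    payload = "\n".join(data)
    if payload.strip() and event != "heartbeat":
        yield payload
```

If the stream closes without yielding anything, that matches the no-text case above, and the caller would fall back to polling session state. Telling text responses apart from tool calls depends on the payload schema, which is not shown here.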
Backend Response Translation
The backend assumes a GUI exists. Translate these into API actions:
| Backend says | You do |
|---|---|
| "click [button]" / "点击" | Execute via API |
| "open [panel]" / "打开" | Query session state |
| "drag/drop" / "拖拽" | Send edit via SSE |
| "preview in timeline" | Show track summary |
| "Export button" / "导出" | Execute export workflow |
Draft field mapping: t=tracks, tt=track type (0=video, 1=audio, 7=text), sg=segments, d=duration(ms), m=metadata.
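Timeline (3 tracks):
1. Video: city timelapse (0-10s)
2. BGM: Lo-fi (0-10s, 35%)
3. Title: "Urban Dreams" (0-3s)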
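The draft field mapping can be turned into a short track summary along these lines. The exact draft layout is not documented here, so this sketch assumes a dictionary with a list of tracks under `t`, each carrying `tt` and a list of segments under `sg`, with `d` (milliseconds) on each segment. The function name is hypothetical.

```python
TRACK_TYPES = {0: "Video", 1: "Audio", 7: "Text"}


def summarize_draft(draft):
    """Render a one-line-per-track summary from a draft dict (assumed layout)."""
    tracks = draft.get("t", [])
    lines = [f"Timeline ({len(tracks)} tracks):"]
    for index, track in enumerate(tracks, start=1):
        kind = TRACK_TYPES.get(track.get("tt"), "Unknown")
        segments = track.get("sg", [])
        total_ms = sum(segment.get("d", 0) for segment in segments)
        lines.append(
            f"{index}. {kind}: {len(segments)} segment(s), {total_ms / 1000:.1f}s"
        )
    return "\n".join(lines)
```

A summary like this is what "preview in timeline" would translate to: show the user the tracks rather than a GUI instruction.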
Error Handling
| Code | Meaning | Action |
|---|---|---|
| 0 | Success | Continue |
| 1001 | Bad/expired token | Re-auth via anonymous-token (tokens expire after 7 days) |
| 1002 | Session not found | New session §3.0 |
| 2001 | No credits | Anonymous: show registration URL with ?bind=<id> (get <id> from create-session or state response when needed). Registered: "Top up credits in your account" |
| 4001 | Unsupported file | Show supported formats |
| 4002 | File too large | Suggest compress/trim |
| 400 | Missing X-Client-Id | Generate Client-Id and retry (see §1) |
| 402 | Free plan export blocked | Subscription tier issue, NOT credits. "Register or upgrade your plan to unlock export." |
| 429 | Rate limit (1 token/client/7 days) | Retry in 30s once |
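The error table maps naturally onto a lookup from code to next step. The action labels below are hypothetical names for the behaviours listed in the table. The only logic added is the "retry once" rule for 429.

```python
ERROR_ACTIONS = {
    0: "continue",
    1001: "reauth_anonymous_token",
    1002: "create_new_session",
    2001: "prompt_register_or_top_up",
    4001: "show_supported_formats",
    4002: "suggest_compress_or_trim",
    400: "generate_client_id_and_retry",
    402: "prompt_register_or_upgrade_plan",
    429: "retry_after_30s",
}


def next_action(code, already_retried=False):
    """Return the action label for a backend code.

    A 429 is retried only once; a second 429 is reported instead.
    """
    if code == 429 and already_retried:
        return "report_rate_limit"
    return ERROR_ACTIONS.get(code, "report_unknown_error")
```

Keeping 402 and 2001 as separate actions preserves the distinction the table draws between a plan restriction and an empty credit balance.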
Common Workflows
Product Visualization: Attach a product photo and prompt WAN 2.2 to animate it — rotating, zooming, or placing it in a lifestyle context. This is popular for e-commerce teams who need quick video assets without a studio shoot.
Storyboard Animatics: Feed WAN 2.2 a sequence of scene descriptions one at a time to generate rough motion clips for each beat of your story. It's not a replacement for full production, but it dramatically accelerates the pre-visualization phase.
Social Media Content: Generate looping-style background videos, abstract motion graphics, or short narrative clips tailored to platform formats. Pair a strong visual prompt with a specific aspect ratio note for best results.
Creative Exploration: Use WAN 2.2 as a brainstorming tool — generate several interpretations of an abstract concept to discover unexpected visual directions before committing to a full production pipeline.
Quick Start Guide
Getting your first WAN 2.2 video is straightforward. Start by writing a clear, visual prompt — describe the subject, setting, lighting, camera style, and any motion you want. The more specific you are, the more control you have over the output. For example, instead of 'a car driving,' try 'a red sports car accelerating down a coastal highway at sunset, low camera angle, motion blur on wheels.'
If you have a reference image, attach it alongside your prompt. WAN 2.2 will use it as a visual starting frame and animate outward from there, preserving key visual elements like color palette and composition.
Once you submit, the model processes your request and returns a downloadable video clip. If the result isn't quite right, refine your prompt — try adjusting the motion description, changing the camera perspective, or specifying a mood like 'dramatic,' 'serene,' or 'energetic.' Iteration is fast, so don't hesitate to run multiple variations.
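Getting Started
Welcome! I'm your WAN 2.2 video generation assistant, ready to turn your text prompts or images into smooth, cinematic AI video clips. Tell me what scene you want to bring to life, and let's start generating!
Try saying:
- "Generate a 4-second video of a golden retriever running through a sunlit wheat field, slow motion, cinematic depth of field"
- "Create a video from this product image showing it rotating on a smooth dark tabletop with soft studio lighting"
- "Make a short video of a futuristic city skyline at dusk, with flying vehicles passing by and neon reflections on wet streets"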
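First-Time Connection
When a user first opens this skill, connect to the processing backend automatically. Briefly let them know (e.g. "Setting up...").
Authentication: Check whether NEMO_TOKEN is set in the environment. If it is, skip to step 2.
1. Obtain a free token: Generate a random UUID as the client identifier. POST to https://mega-api-prod.nemovideo.ai/api/auth/anonymous-token with the header X-Client-Id set to that UUID. The data.token field in the response is your NEMO_TOKEN (100 free credits, valid for 7 days).
2. Create a session: POST to https://mega-api-prod.nemovideo.ai/api/tasks/me/with-session/nemo_agent with Authorization: Bearer <token>, Content-Type: application/json, and body {"task_name":"project","language":"<detected language>"}. Store the returned session_id for all subsequent requests.
Keep setup communication brief. Don't display raw API responses or token values to the user.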
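Turn Words and Images Into Moving Scenes
WAN 2.2 is a state-of-the-art video generation model designed to bridge the gap between your imagination and a finished video clip. Unlike older generation tools that produce jittery or incoherent motion, WAN 2.2 focuses on temporal consistency: objects, lighting, and movement flow naturally from one frame to the next.
With this skill, you can generate videos from plain text descriptions, use a starting image as a visual anchor, or combine both for precise creative control. The resulting short clips are suitable for social media, concept visualization, storyboarding, or simply exploring what AI-generated video looks like at its current frontier.
This skill is built for creators, marketers, indie filmmakers, and experimenters who want to move fast. You don't need a production budget or a render farm. Just describe the motion you want to see on screen, and WAN 2.2 does the heavy lifting.
Routing Text and Image Prompts
When you submit a request, ClawHub automatically detects whether you're running a text-to-video or image-to-video generation and routes it to the correct WAN 2.2 inference pipeline.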
| User says... | Action | Skip SSE? |
|---|---|---|
| "export" / "导出" / "download" / "send me the video" | → §3.5 Export | ✅ |
| "credits" / "积分" / "balance" / "余额" | → §3.3 Credits | ✅ |
| "status" / "状态" / "show tracks" | → §3.4 State | ✅ |
| "upload" / "上传" / user sends file | → §3.2 Upload | ✅ |
| Everything else (generate, edit, add background music…) | → §3.1 SSE | ❌ |
WAN 2.2 API Reference
WAN 2.2 processes all generation jobs on a distributed cloud GPU backend, handling diffusion sampling, motion synthesis, and frame rendering remotely so your device never carries the compute load. Generation times vary with resolution, frame count, and current queue depth.
Skill attribution (read from this file's YAML frontmatter at runtime):
- X-Skill-Source: wan-22
- X-Skill-Version: from frontmatter version
- X-Skill-Platform: detect from install path (~/.clawhub/ → clawhub, ~/.cursor/skills/ → cursor, else unknown)
All requests must include: Authorization: Bearer <NEMO_TOKEN>, X-Skill-Source, X-Skill-Version, X-Skill-Platform. Missing attribution headers will cause export to fail with a 402 error.
API base: https://mega-api-prod.nemovideo.ai
Create session: POST /api/tasks/me/with-session/nemo_agent. Body {"task_name":"project","language":"<lang>"}. Returns task_id, session_id.
Send message (SSE): POST /run_sse. Body {"app_name":"nemo_agent","user_id":"me","session_id":"<sid>","new_message":{"parts":[{"text":"<msg>"}]}} with Accept: text/event-stream. Max timeout: 15 minutes.
Upload: POST /api/upload-video/nemo_agent/me/<sid>. File: multipart -F "files=@/path". URL: body {"urls":["<url>"],"source_type":"url"}.
Credits: GET /api/credits/balance/simple. Returns available, frozen, total.
Session state: GET /api/state/nemo_agent/me/<sid>/latest. Key fields: data.state.draft, data.state.video_infos, data.state.generated_media.
Export (free, no credits): POST /api/render/proxy/lambda. Body {"id":"render_<ts>","sessionId":"<sid>","draft":<json>,"output":{"format":"mp4","quality":"high"}}. Poll GET /api/render/proxy/lambda/<id> every 30s until status = completed. Download URL at output.url.
Supported formats: mp4, mov, avi, webm, mkv, jpg, png, gif, webp, mp3, wav, m4a, aac.
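SSE Event Handling
| Event | Action |
|---|---|
| Text response | Apply GUI translation (§4), present to user |
| Tool call/result | Process internally, don't forward |
| heartbeat / empty data: | Keep waiting. Every 2 min: "⏳ Still working..." |
| Stream closes | Process final response |
About 30% of editing operations return no text in the SSE stream. When this happens: poll session state to verify the edit was applied, then summarize the changes to the user.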
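Backend Response Translation
The backend assumes a GUI exists. Translate these into API actions:
| Backend says | You do |
|---|---|
| "click [button]" / "点击" | Execute via API |
| "open [panel]" / "打开" | Query session state |
| "drag/drop" / "拖拽" | Send edit via SSE |
| "preview in timeline" | Show track summary |
| "Export button" / "导出" | Execute export workflow |
Draft field mapping: t=tracks, tt=track type (0=video, 1=audio, 7=text), sg=segments, d=duration(ms), m=metadata.
Timeline (3 tracks):
1. Video: city timelapse (0-10s)
2. BGM: Lo-fi (0-10s, 35%)
3. Title: "Urban Dreams" (0-3s)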
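Error Handling
| Code | Meaning | Action |
|---|---|---|
| 0 | Success | Continue |
| 1001 | Bad/expired token | Re-auth via anonymous-token (tokens expire after 7 days) |
| 1002 | Session not found | New session §3.0 |
| 2001 | No credits | Anonymous: show registration URL with ?bind=<id> (get <id> from create-session or state response when needed). Registered: "Top up credits in your account" |
| 4001 | Unsupported file | Show supported formats |
| 4002 | File too large | Suggest compress/trim |
| 400 | Missing X-Client-Id | Generate Client-Id and retry (see §1) |
| 402 | Free plan export blocked | Subscription tier issue, NOT credits. "Register or upgrade your plan to unlock export." |
| 429 | Rate limit (1 token/client/7 days)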