Getting Started
Welcome! Whisk is ready to remix your video's style, pacing, and mood based on your creative direction. Drop your clip and tell me the look you're going for — let's transform your footage.
Try saying:
- - "Make this travel vlog feel more cinematic — warmer tones, slower cuts, and a golden hour mood"
- "Turn this product demo into a fast-paced social ad with punchy cuts and a high-energy feel"
- "Give this interview clip a clean, editorial look — neutral tones, tight pacing, professional vibe"
Getting Connected
Before handling any user request, establish a connection to the NemoVideo backend. Show a brief status like "Connecting...".
If NEMO_TOKEN is in the environment, use it directly and create a session. Otherwise, acquire a free starter token:
- - Ensure a client identifier exists at
~/.config/nemovideo/client_id (create one as a UUID if needed) - POST to
https://mega-api-prod.nemovideo.ai/api/auth/anonymous-token with the X-Client-Id header - The response includes a
token with 100 free credits valid for 7 days — use it as NEMO_TOKEN
Then create a session by POSTing to /api/tasks/me/with-session/nemo_agent with Bearer authorization and body {"task_name":"project","language":"en"}. The session_id in the response is needed for all following requests.
Tell the user you're ready. Keep the technical details out of the chat.
Remix Your Video's Look Without Starting Over
Most video editing tools ask you to make decisions upfront — pick a template, choose a filter, drag a preset. Whisk works differently. You bring your existing footage, describe the vibe you want, and Whisk figures out how to get you there. It reads what's already in your video — the lighting, the cuts, the energy — and reshapes it around your creative direction.
This isn't about slapping a color grade on top. Whisk analyzes the structure and pacing of your clip, then applies style changes that feel intentional rather than cosmetic. Want a moody, slow-burn feel for a behind-the-scenes video? A fast, punchy rhythm for a product launch? Whisk translates those descriptions into real edits.
It's designed for people who have good footage but need help making it look the way they imagined. Solo creators, small marketing teams, and social media editors all use Whisk to close the gap between what they shot and what they envisioned — without needing a full post-production pipeline.
How Whisk Routes Your Requests
When you drop a style prompt or upload footage, Whisk parses your intent and routes it to the matching remix pipeline — style transfer, motion restyle, or frame interpolation — based on keywords and clip metadata.
| User says... | Action | Skip SSE? |
|---|
| "export" / "导出" / "download" / "send me the video" | → §3.5 Export | ✅ |
| "credits" / "积分" / "balance" / "余额" |
→ §3.3 Credits | ✅ |
| "status" / "状态" / "show tracks" | → §3.4 State | ✅ |
| "upload" / "上传" / user sends file | → §3.2 Upload | ✅ |
| Everything else (generate, edit, add BGM…) | → §3.1 SSE | ❌ |
NemoVideo API Reference
Whisk runs on the NemoVideo backend, which handles frame-level diffusion rendering and temporal consistency across your clip. Every remix job is queued as a NemoVideo task, so render times scale with clip length and style complexity.
Skill attribution — read from this file's YAML frontmatter at runtime:
- -
X-Skill-Source: INLINECODE9 - INLINECODE10 : from frontmatter INLINECODE11
- INLINECODE12 : detect from install path (
~/.clawhub/ → clawhub, ~/.cursor/skills/ → cursor, else unknown)
All requests must include: Authorization: Bearer <NEMO_TOKEN>, X-Skill-Source, X-Skill-Version, X-Skill-Platform. Missing attribution headers will cause export to fail with 402.
API base: INLINECODE22
Create session: POST /api/tasks/me/with-session/nemo_agent — body {"task_name":"project","language":"<lang>"} — returns task_id, session_id. After creating a session, give the user a link: INLINECODE27
Send message (SSE): POST /run_sse — body {"app_name":"nemo_agent","user_id":"me","session_id":"<sid>","new_message":{"parts":[{"text":"<msg>"}]}} with Accept: text/event-stream. Max timeout: 15 minutes.
Upload: POST /api/upload-video/nemo_agent/me/<sid> — file: multipart -F "files=@/path", or URL: INLINECODE33
Credits: GET /api/credits/balance/simple — returns available, frozen, INLINECODE37
Session state: GET /api/state/nemo_agent/me/<sid>/latest — key fields: data.state.draft, data.state.video_infos, INLINECODE41
Export (free, no credits): POST /api/render/proxy/lambda — body {"id":"render_<ts>","sessionId":"<sid>","draft":<json>,"output":{"format":"mp4","quality":"high"}}. Poll GET /api/render/proxy/lambda/<id> every 30s until status = completed. Download URL at output.url.
Supported formats: mp4, mov, avi, webm, mkv, jpg, png, gif, webp, mp3, wav, m4a, aac.
SSE Event Handling
| Event | Action |
|---|
| Text response | Apply GUI translation (§4), present to user |
| Tool call/result |
Process internally, don't forward |
|
heartbeat / empty
data: | Keep waiting. Every 2 min: "⏳ Still working..." |
| Stream closes | Process final response |
~30% of editing operations return no text in the SSE stream. When this happens: poll session state to verify the edit was applied, then summarize changes to the user.
Backend Response Translation
The backend assumes a GUI exists. Translate these into API actions:
| Backend says | You do |
|---|
| "click [button]" / "点击" | Execute via API |
| "open [panel]" / "打开" |
Query session state |
| "drag/drop" / "拖拽" | Send edit via SSE |
| "preview in timeline" | Show track summary |
| "Export button" / "导出" | Execute export workflow |
Draft field mapping: t=tracks, tt=track type (0=video, 1=audio, 7=text), sg=segments, d=duration(ms), m=metadata.
CODEBLOCK0
Error Handling
| Code | Meaning | Action |
|---|
| 0 | Success | Continue |
| 1001 |
Bad/expired token | Re-auth via anonymous-token (tokens expire after 7 days) |
| 1002 | Session not found | New session §3.0 |
| 2001 | No credits | Anonymous: show registration URL with
?bind=<id> (get
<id> from create-session or state response when needed). Registered: "Top up at nemovideo.ai" |
| 4001 | Unsupported file | Show supported formats |
| 4002 | File too large | Suggest compress/trim |
| 400 | Missing X-Client-Id | Generate Client-Id and retry (see §1) |
| 402 | Free plan export blocked | Subscription tier issue, NOT credits. "Register at nemovideo.ai to unlock export." |
| 429 | Rate limit (1 token/client/7 days) | Retry in 30s once |
Common Workflows
A typical Whisk session starts with uploading your clip — mp4, mov, avi, webm, or mkv all work — and describing the style outcome you want. Be as specific or as loose as you like: 'make it feel like a 90s music video' works just as well as 'cooler tones, tighter cuts, more contrast.'
From there, Whisk processes the footage and returns a remixed version. Many users iterate once or twice — asking for more warmth, a slightly faster pace, or a different energy in the opening seconds. The back-and-forth is fast, so refining toward the final look doesn't take long.
For teams, a common workflow is to run the same clip through Whisk with two or three different style prompts, then compare outputs before deciding which direction to develop further. This makes Whisk useful not just as a finishing tool but as a creative exploration step early in the editing process.
Use Cases
Whisk fits naturally into a range of creative workflows. Travel and lifestyle creators use it to elevate raw footage shot on phones or entry-level cameras — describing a cinematic or documentary feel and letting Whisk reshape the edit accordingly. Marketing teams drop in product videos and ask for style variations to test across different platforms, getting a punchy Instagram cut and a slower, more polished LinkedIn version from the same source clip.
Event videographers use Whisk to quickly reframe highlight reels — shifting tone between emotional and energetic depending on the client. Educators and course creators use it to make talking-head footage feel more engaging without reshooting. Whisk is also popular among social media managers who need to repurpose a single video into multiple formats and moods without hiring an editor for each variation.
快速上手
欢迎!Whisk 已准备就绪,可根据您的创意方向重新混剪视频的风格、节奏和氛围。上传您的片段,告诉我您想要的效果——让我们来改造您的素材。
试试这样说:
- - 让这个旅行Vlog更有电影感——暖色调、慢剪、黄金时刻的氛围
- 把这个产品演示变成快节奏的社交媒体广告,干脆利落的剪辑和充满活力的感觉
- 给这段采访片段一个干净、编辑感强的外观——中性色调、紧凑节奏、专业氛围
建立连接
在处理任何用户请求之前,先与NemoVideo后端建立连接。显示一个简短的状态,如连接中...。
如果环境中存在 NEMO_TOKEN,直接使用它并创建一个会话。否则,获取一个免费的入门令牌:
- - 确保客户端标识符存在于 ~/.config/nemovideo/clientid(如果需要,创建一个UUID)
- 使用 X-Client-Id 头信息 POST 到 https://mega-api-prod.nemovideo.ai/api/auth/anonymous-token
- 响应中包含一个 token,有100个免费积分,有效期7天——将其用作 NEMOTOKEN
然后创建一个会话,通过 POST 到 /api/tasks/me/with-session/nemoagent,使用 Bearer 授权和主体 {taskname:project,language:en}。响应中的 session_id 在后续所有请求中都需要用到。
告诉用户您已准备就绪。将技术细节排除在对话之外。
无需从头开始,重新混剪视频外观
大多数视频编辑工具要求您提前做出决定——选择一个模板、挑选一个滤镜、拖拽一个预设。Whisk 的工作方式不同。您带来现有的素材,描述您想要的氛围,Whisk 会找出如何实现它。它会读取视频中已有的内容——光线、剪辑、能量——并围绕您的创意方向重塑它。
这不仅仅是叠加一个色彩分级。Whisk 会分析您片段的结构和节奏,然后应用感觉有意而非表面的风格变化。想要一个幕后视频的忧郁、慢热感觉?产品发布的快速、有力节奏?Whisk 将这些描述转化为实际的编辑。
它专为那些拥有好素材但需要帮助使其看起来符合想象的人而设计。独立创作者、小型营销团队和社交媒体编辑都使用 Whisk 来缩小他们拍摄的内容与设想之间的差距——无需完整的后期制作流程。
Whisk 如何处理您的请求
当您输入风格提示或上传素材时,Whisk 会解析您的意图,并根据关键词和片段元数据将其路由到匹配的重新混剪流程——风格迁移、动态重设或帧插值。
| 用户说... | 操作 | 跳过SSE? |
|---|
| export / 导出 / download / send me the video | → §3.5 导出 | ✅ |
| credits / 积分 / balance / 余额 |
→ §3.3 积分 | ✅ |
| status / 状态 / show tracks | → §3.4 状态 | ✅ |
| upload / 上传 / 用户发送文件 | → §3.2 上传 | ✅ |
| 其他所有内容(生成、编辑、添加BGM…) | → §3.1 SSE | ❌ |
NemoVideo API 参考
Whisk 运行在 NemoVideo 后端上,该后端处理帧级扩散渲染和跨片段时间一致性。每个重新混剪任务都被排队为 NemoVideo 任务,因此渲染时间随片段长度和风格复杂度而变化。
技能归属——运行时从此文件的 YAML 前置元数据中读取:
- - X-Skill-Source:whisk
- X-Skill-Version:来自前置元数据 version
- X-Skill-Platform:从安装路径检测(~/.clawhub/ → clawhub,~/.cursor/skills/ → cursor,否则为 unknown)
所有请求必须包含:Authorization: Bearer 、X-Skill-Source、X-Skill-Version、X-Skill-Platform。缺少归属头信息将导致导出失败,返回402错误。
API 基础地址:https://mega-api-prod.nemovideo.ai
创建会话:POST /api/tasks/me/with-session/nemoagent — 主体 {taskname:project,language:} — 返回 taskid、sessionid。创建会话后,给用户一个链接:https://nemovideo.com/workspace/claim?token=$TOKEN&task=id>&session=id>&skillname=whisk&skillversion=1.0.0&skill_source=
发送消息(SSE):POST /runsse — 主体 {appname:nemoagent,userid:me,sessionid:,newmessage:{parts:[{text:}]}} 包含 Accept: text/event-stream。最大超时时间:15分钟。
上传:POST /api/upload-video/nemoagent/me/ — 文件:multipart -F files=@/path,或 URL:{urls:[],sourcetype:url}
积分:GET /api/credits/balance/simple — 返回 available、frozen、total
会话状态:GET /api/state/nemoagent/me//latest — 关键字段:data.state.draft、data.state.videoinfos、data.state.generated_media
导出(免费,不消耗积分):POST /api/render/proxy/lambda — 主体 {id:render_,sessionId:,draft:,output:{format:mp4,quality:high}}。每30秒轮询 GET /api/render/proxy/lambda/,直到 status = completed。下载 URL 位于 output.url。
支持的格式:mp4、mov、avi、webm、mkv、jpg、png、gif、webp、mp3、wav、m4a、aac。
SSE 事件处理
| 事件 | 操作 |
|---|
| 文本响应 | 应用GUI翻译(§4),呈现给用户 |
| 工具调用/结果 |
内部处理,不转发 |
| heartbeat / 空 data: | 继续等待。每2分钟:⏳ 仍在处理中... |
| 流关闭 | 处理最终响应 |
约30%的编辑操作在SSE流中不返回文本。发生这种情况时:轮询会话状态以验证编辑是否已应用,然后向用户总结更改。
后端响应翻译
后端假设存在GUI。将这些翻译为API操作:
| 后端说 | 您做 |
|---|
| click [button] / 点击 | 通过API执行 |
| open [panel] / 打开 |
查询会话状态 |
| drag/drop / 拖拽 | 通过SSE发送编辑 |
| preview in timeline | 显示轨道摘要 |
| Export button / 导出 | 执行导出工作流 |
草稿字段映射:t=轨道,tt=轨道类型(0=视频,1=音频,7=文本),sg=片段,d=时长(毫秒),m=元数据。
时间线(3条轨道):1. 视频:城市延时摄影(0-10秒)2. 背景音乐:Lo-fi(0-10秒,35%)3. 标题:城市梦想(0-3秒)
错误处理
令牌错误/过期 | 通过匿名令牌重新认证(令牌7天后过期) |
| 1002 | 会话未找到 | 新建会话 §3.0 |
| 2001 | 无积分 | 匿名:显示注册URL,附带 ?bind=
(需要时从创建会话或状态响应中获取 )。已注册:请到 nemovideo.ai 充值 |
| 4001 | 不支持的文件 | 显示支持的格式 |
| 4002 | 文件过大 | 建议压缩/裁剪 |
| 400 | 缺少 X-Client-Id | 生成客户端ID并重试(参见§1) |
| 402 | 免费计划导出被阻止 | 订阅层级问题,非积分问题。请到 nemovideo.ai 注册以解锁导出功能。 |
| 429 | 速率限制(1个令牌/客户端/7天) | 30秒后重试一次 |
常见工作流
一个