Getting Started

Welcome! Whisk is ready to remix your video's style, pacing, and mood based on your creative direction. Drop your clip and tell me the look you're going for — let's transform your footage.

Try saying:

- "Make this travel vlog feel more cinematic: warmer tones, slower cuts, and a golden hour mood"
- "Turn this product demo into a fast-paced social ad with punchy cuts and a high-energy feel"
- "Give this interview clip a clean, editorial look: neutral tones, tight pacing, professional vibe"

Getting Connected

Before handling any user request, establish a connection to the NemoVideo backend. Show a brief status like "Connecting...".

If NEMO_TOKEN is in the environment, use it directly and create a session. Otherwise, acquire a free starter token:

- Ensure a client identifier exists at ~/.config/nemovideo/client_id (create one as a UUID if needed)
- POST to https://mega-api-prod.nemovideo.ai/api/auth/anonymous-token with the X-Client-Id header
- The response includes a token with 100 free credits valid for 7 days; use it as NEMO_TOKEN
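The token steps above can be sketched in Python with only the standard library. The function names are hypothetical, and the sketch assumes the client_id file holds a bare UUID string; the endpoint, header name, and file path come from the text. Because the text does not name the response field that carries the token, the sketch stops at building the request.

```python
import uuid
from pathlib import Path
from urllib.request import Request

TOKEN_URL = "https://mega-api-prod.nemovideo.ai/api/auth/anonymous-token"
DEFAULT_CLIENT_ID_PATH = Path.home() / ".config" / "nemovideo" / "client_id"


def load_or_create_client_id(path: Path = DEFAULT_CLIENT_ID_PATH) -> str:
    """Return the stored client id, creating a fresh UUID if none exists."""
    if path.exists():
        existing = path.read_text(encoding="utf-8").strip()
        if existing:
            return existing
    client_id = str(uuid.uuid4())
    path.parent.mkdir(parents=True, exist_ok=True)
    path.write_text(client_id, encoding="utf-8")
    return client_id


def build_anonymous_token_request(client_id: str) -> Request:
    """Build (but do not send) the POST that asks for a starter token."""
    # Assumption: the endpoint needs no request body, only the header.
    return Request(
        TOKEN_URL,
        data=b"",
        method="POST",
        headers={"X-Client-Id": client_id},
    )
```

Sending the request would be a call to `urllib.request.urlopen(build_anonymous_token_request(client_id))`; how to read the token out of the response depends on a response shape the text does not specify.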

Then create a session by POSTing to /api/tasks/me/with-session/nemo_agent with Bearer authorization and body {"task_name":"project","language":"en"}. The session_id in the response is needed for all following requests.

Tell the user you're ready. Keep the technical details out of the chat.

Remix Your Video's Look Without Starting Over

Most video editing tools ask you to make decisions upfront — pick a template, choose a filter, drag a preset. Whisk works differently. You bring your existing footage, describe the vibe you want, and Whisk figures out how to get you there. It reads what's already in your video — the lighting, the cuts, the energy — and reshapes it around your creative direction.

This isn't about slapping a color grade on top. Whisk analyzes the structure and pacing of your clip, then applies style changes that feel intentional rather than cosmetic. Want a moody, slow-burn feel for a behind-the-scenes video? A fast, punchy rhythm for a product launch? Whisk translates those descriptions into real edits.

It's designed for people who have good footage but need help making it look the way they imagined. Solo creators, small marketing teams, and social media editors all use Whisk to close the gap between what they shot and what they envisioned — without needing a full post-production pipeline.

How Whisk Routes Your Requests

When you drop a style prompt or upload footage, Whisk parses your intent and routes it to the matching remix pipeline — style transfer, motion restyle, or frame interpolation — based on keywords and clip metadata.

| User says... | Action | Skip SSE? |
|---|---|---|
| "export" / "导出" / "download" / "send me the video" | → §3.5 Export | ✅ |
| "credits" / "积分" / "balance" / "余额" | → §3.3 Credits | ✅ |
| "status" / "状态" / "show tracks" | → §3.4 State | ✅ |
| "upload" / "上传" / user sends file | → §3.2 Upload | ✅ |
| Everything else (generate, edit, add BGM…) | → §3.1 SSE | ❌ |
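The routing table can be approximated with a small keyword matcher. This is a minimal sketch, not the skill's actual router: the function names, the case-insensitive substring matching, and the order in which rows are checked are all assumptions; only the keywords and their targets come from the table.

```python
EXPORT, CREDITS, STATE, UPLOAD, SSE = "export", "credits", "state", "upload", "sse"

# Keywords copied from the routing table. Checking rows top to bottom is an
# assumption; the table does not define a precedence.
ROUTES = [
    (EXPORT, ("export", "导出", "download", "send me the video")),
    (CREDITS, ("credits", "积分", "balance", "余额")),
    (STATE, ("status", "状态", "show tracks")),
    (UPLOAD, ("upload", "上传")),
]


def route(message: str, has_file: bool = False) -> str:
    """Pick an action for a user message; anything unmatched goes to SSE."""
    if has_file:
        return UPLOAD
    text = message.lower()
    for action, keywords in ROUTES:
        if any(keyword in text for keyword in keywords):
            return action
    return SSE


def skips_sse(action: str) -> bool:
    """Every route except the catch-all bypasses the SSE endpoint."""
    return action != SSE
```

Plain substring matching is deliberately naive: a prompt such as "don't export yet" would still route to export, so a real router would likely need stricter intent detection.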

NemoVideo API Reference

Whisk runs on the NemoVideo backend, which handles frame-level diffusion rendering and temporal consistency across your clip. Every remix job is queued as a NemoVideo task, so render times scale with clip length and style complexity.

Skill attribution — read from this file's YAML frontmatter at runtime:

- X-Skill-Source: whisk
- X-Skill-Version: from frontmatter version
- X-Skill-Platform: detect from install path (~/.clawhub/ → clawhub, ~/.cursor/skills/ → cursor, else unknown)

All requests must include: Authorization: Bearer <NEMO_TOKEN>, X-Skill-Source, X-Skill-Version, X-Skill-Platform. Missing attribution headers will cause export to fail with 402.
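A header builder along these lines would cover the required fields. It is a sketch under assumptions: the function names are hypothetical, the caller is expected to pass in the source and version values read from the frontmatter, and platform detection is reduced to a substring check on the install path.

```python
def detect_platform(install_path: str) -> str:
    """Map the skill's install path to a platform name."""
    path = install_path.replace("\\", "/")
    if "/.clawhub/" in path:
        return "clawhub"
    if "/.cursor/skills/" in path:
        return "cursor"
    return "unknown"


def build_headers(token: str, source: str, version: str, install_path: str) -> dict:
    """Return the four headers every request must carry."""
    return {
        "Authorization": f"Bearer {token}",
        "X-Skill-Source": source,
        "X-Skill-Version": version,
        "X-Skill-Platform": detect_platform(install_path),
    }
```

Building the headers in one place makes it harder to omit an attribution header on the export call, which is where the text says a 402 would surface.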

API base: https://mega-api-prod.nemovideo.ai

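Create session: POST /api/tasks/me/with-session/nemo_agent with body {"task_name":"project","language":"<lang>"}; returns task_id, session_id. After creating a session, give the user a link: https://nemovideo.com/workspace/claim?token=$TOKEN&task=<task_id>&session=<session_id>&skill_name=whisk&skill_version=1.0.0&skill_source=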

Send message (SSE): POST /run_sse — body {"app_name":"nemo_agent","user_id":"me","session_id":"<sid>","new_message":{"parts":[{"text":"<msg>"}]}} with Accept: text/event-stream. Max timeout: 15 minutes.
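The request body and the stream parsing can be sketched as follows. The body fields come from the text; the function names are hypothetical, and the parser handles only the `data:` field of the Server-Sent Events wire format. What each payload contains is not specified here, so the sketch yields raw strings.

```python
from typing import Iterable, Iterator


def build_sse_body(session_id: str, message: str) -> dict:
    """Build the JSON body for POST /run_sse."""
    return {
        "app_name": "nemo_agent",
        "user_id": "me",
        "session_id": session_id,
        "new_message": {"parts": [{"text": message}]},
    }


def iter_sse_data(lines: Iterable[str]) -> Iterator[str]:
    """Yield the data payload of each event from decoded text lines."""
    buffer = []
    for raw in lines:
        line = raw.rstrip("\r\n")
        if line.startswith("data:"):
            value = line[5:]
            # A single leading space after the colon is not part of the value.
            if value.startswith(" "):
                value = value[1:]
            buffer.append(value)
        elif line == "" and buffer:
            # A blank line terminates the current event.
            yield "\n".join(buffer)
            buffer = []
    # Lenient choice: flush a trailing event even without a final blank line.
    if buffer:
        yield "\n".join(buffer)
```

A caller would send the body with `Accept: text/event-stream`, set a read timeout of up to 15 minutes, and feed the decoded response lines to `iter_sse_data`.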

Upload: POST /api/upload-video/nemo_agent/me/<sid> with either a file (multipart -F "files=@/path") or a URL body: {"urls":["<url>"],"source_type":"url"}

Credits: GET /api/credits/balance/simple, which returns available, frozen, total

Session state: GET /api/state/nemo_agent/me/<sid>/latest; key fields: data.state.draft, data.state.video_infos, data.state.generated_media

Export (free, no credits): POST /api/render/proxy/lambda — body {"id":"render_<ts>","sessionId":"<sid>","draft":<json>,"output":{"format":"mp4","quality":"high"}}. Poll GET /api/render/proxy/lambda/<id> every 30s until status = completed. Download URL at output.url.
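The export flow can be sketched as a body builder plus a polling loop. Several details are assumptions rather than documented behavior: `<ts>` is treated as an integer timestamp, `status` and `output.url` are read from the top level of the poll response, and `max_polls` is a hypothetical safety cap. The HTTP call is injected as `fetch_status` so the loop stays independent of any client library.

```python
import time
from typing import Callable


def build_export_body(session_id: str, draft: dict, timestamp: int) -> dict:
    """Build the JSON body for POST /api/render/proxy/lambda."""
    return {
        "id": f"render_{timestamp}",
        "sessionId": session_id,
        "draft": draft,
        "output": {"format": "mp4", "quality": "high"},
    }


def poll_export(
    fetch_status: Callable[[str], dict],
    render_id: str,
    interval_s: float = 30.0,
    max_polls: int = 60,
    sleep: Callable[[float], None] = time.sleep,
) -> str:
    """Poll until the render completes, then return the download URL."""
    for attempt in range(max_polls):
        result = fetch_status(render_id)
        if result.get("status") == "completed":
            return result["output"]["url"]
        if attempt < max_polls - 1:
            sleep(interval_s)
    raise TimeoutError(f"render {render_id} did not complete after {max_polls} polls")
```

The text does not list failure statuses, so this loop only distinguishes "completed" from everything else; a real client would want to stop early on an explicit error status.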

Supported formats: mp4, mov, avi, webm, mkv, jpg, png, gif, webp, mp3, wav, m4a, aac.

SSE Event Handling

| Event | Action |
|---|---|
| Text response | Apply GUI translation (§4), present to user |
| Tool call/result | |

~30% of editing operations return no text in the SSE stream. When this happens: poll session state to verify the edit was applied, then summarize changes to the user.
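One way to handle the silent case is to fall back to a draft comparison when the stream carried no text. This is only a sketch: the function name, the idea of comparing the draft before and after the edit, and the wording of the fallback messages are all assumptions, and `fetch_draft` stands in for a call to the session state endpoint.

```python
from typing import Callable, Iterable


def reply_for_edit(
    text_chunks: Iterable[str],
    draft_before: dict,
    fetch_draft: Callable[[], dict],
) -> str:
    """Return the streamed text, or a state-based summary if there was none."""
    text = "".join(text_chunks).strip()
    if text:
        return text
    # No text came back: check the session state instead of assuming success.
    draft_after = fetch_draft()
    if draft_after == draft_before:
        return "The edit does not appear to have been applied yet."
    return "Edit applied: the draft changed."
```

Comparing whole drafts only detects that something changed; summarizing what changed would need a field-by-field diff of the draft.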

Backend Response Translation

The backend assumes a GUI exists. Translate these into API actions:

| Backend says | You do |
|---|---|
| "click [button]" / "点击" | Execute via API |
| "open [panel]" / "打开" | |

Draft field mapping: t=tracks, tt=track type (0=video, 1=audio, 7=text), sg=segments, d=duration(ms), m=metadata.

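Timeline (3 tracks):
1. Video: city timelapse (0-10s)
2. BGM: Lo-fi (0-10s, 35%)
3. Title: "City Dreams" (0-3s)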
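The field mapping above could drive a short timeline summary like the following. The nesting is an assumption: the sketch treats each track as carrying `tt` and `sg`, and each segment as carrying `d`, which the mapping itself does not state. The function name and output wording are hypothetical.

```python
# Track type codes from the draft field mapping.
TRACK_TYPES = {0: "video", 1: "audio", 7: "text"}


def summarize_draft(draft: dict) -> str:
    """Render a one-line-per-track summary of a draft."""
    tracks = draft.get("t", [])
    lines = [f"Timeline ({len(tracks)} tracks):"]
    for index, track in enumerate(tracks, start=1):
        kind = TRACK_TYPES.get(track.get("tt"), "unknown")
        # Assumption: a track's length is the sum of its segment durations (ms).
        total_ms = sum(segment.get("d", 0) for segment in track.get("sg", []))
        lines.append(f"{index}. {kind}: {total_ms / 1000:g}s")
    return "\n".join(lines)
```

Unknown `tt` codes are reported as "unknown" rather than raising, since only three codes are documented.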

Error Handling

| Code | Meaning | Action |
|---|---|---|
| 0 | Success | Continue |
| 1001 | | |

Common Workflows

A typical Whisk session starts with uploading your clip — mp4, mov, avi, webm, or mkv all work — and describing the style outcome you want. Be as specific or as loose as you like: 'make it feel like a 90s music video' works just as well as 'cooler tones, tighter cuts, more contrast.'

From there, Whisk processes the footage and returns a remixed version. Many users iterate once or twice — asking for more warmth, a slightly faster pace, or a different energy in the opening seconds. The back-and-forth is fast, so refining toward the final look doesn't take long.

For teams, a common workflow is to run the same clip through Whisk with two or three different style prompts, then compare outputs before deciding which direction to develop further. This makes Whisk useful not just as a finishing tool but as a creative exploration step early in the editing process.

Use Cases

Whisk fits naturally into a range of creative workflows. Travel and lifestyle creators use it to elevate raw footage shot on phones or entry-level cameras — describing a cinematic or documentary feel and letting Whisk reshape the edit accordingly. Marketing teams drop in product videos and ask for style variations to test across different platforms, getting a punchy Instagram cut and a slower, more polished LinkedIn version from the same source clip.

Event videographers use Whisk to quickly reframe highlight reels — shifting tone between emotional and energetic depending on the client. Educators and course creators use it to make talking-head footage feel more engaging without reshooting. Whisk is also popular among social media managers who need to repurpose a single video into multiple formats and moods without hiring an editor for each variation.
