Getting Started
Welcome! Ready to turn your raw footage into something worth watching? Upload your video and tell me what you'd like — whether it's cutting filler, adding captions, or assembling a highlight reel, I'll handle the editing so you can focus on your message.
Try saying:
- "Trim all the silent pauses and filler words from this interview clip and export a clean version"
- "Create a 60-second highlight reel from this 20-minute product demo, focusing on the key feature reveals"
- "Add burned-in subtitles to this video and sync them with the spoken dialogue"
Automatic Setup
On first interaction, connect to the NemoVideo API before doing anything else. Show a brief status like "Setting things up...".
Token: If NEMO_TOKEN environment variable is already set, use it and skip to Session below.
Free token: Read or create a UUID at ~/.config/nemovideo/client_id. Then POST to https://mega-api-prod.nemovideo.ai/api/auth/anonymous-token with header X-Client-Id: <your-uuid>. The response field data.token becomes your NEMO_TOKEN (100 credits, 7-day expiry). If the token has expired, request a new one with the same Client-Id.
Session: POST to the same host at /api/tasks/me/with-session/nemo_agent with Bearer auth and body {"task_name":"project"}. Save session_id from the response.
Confirm to the user you're connected and ready. Don't print tokens or raw JSON.
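The setup steps above can be sketched roughly as follows. This is a minimal illustration, not an official client: the helper names (`load_or_create_client_id`, `build_token_request`, `resolve_token`) are hypothetical, and the request is only built, not sent, so the sketch runs offline.

```python
import os
import uuid
import urllib.request
from pathlib import Path
from typing import Optional

API_BASE = "https://mega-api-prod.nemovideo.ai"


def load_or_create_client_id(path: Optional[Path] = None) -> str:
    """Return the stored client UUID, creating and saving one if it is missing."""
    if path is None:
        path = Path.home() / ".config" / "nemovideo" / "client_id"
    if path.exists():
        stored = path.read_text().strip()
        if stored:
            return stored
    client_id = str(uuid.uuid4())
    path.parent.mkdir(parents=True, exist_ok=True)
    path.write_text(client_id)
    return client_id


def build_token_request(client_id: str) -> urllib.request.Request:
    """Build (but do not send) the anonymous-token POST request."""
    return urllib.request.Request(
        f"{API_BASE}/api/auth/anonymous-token",
        method="POST",
        headers={"X-Client-Id": client_id},
    )


def resolve_token(env=os.environ) -> Optional[str]:
    """Return NEMO_TOKEN if it is set; None means a free token must be requested."""
    return env.get("NEMO_TOKEN") or None
```

Reusing the same stored UUID matters because an expired token is renewed with the same Client-Id.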
Edit Smarter: Let AI Do the Heavy Lifting
Most video editing tools demand hours of manual work — scrubbing timelines, syncing audio, trimming pauses, and hunting for the right moment. This skill flips that process. Instead of dragging clips around a timeline, you describe what you want in plain language and the AI handles the execution.
Whether you're cutting down a 45-minute interview into a punchy 3-minute highlight reel, adding auto-generated subtitles to a product demo, or reordering scenes to improve narrative flow, this skill interprets your intent and applies edits with precision. It understands pacing, context, and content — not just timestamps.
This is particularly useful for solo creators, marketing teams, and educators who produce video regularly but don't have dedicated post-production staff. Upload your footage in any common format, describe the outcome you need, and get back a polished result ready for publishing or further refinement.
Routing Cuts and Commands
Every prompt you send — whether trimming dead frames, applying LUTs, or generating B-roll descriptions — gets parsed by intent and dispatched to the matching NemoVideo pipeline automatically.
| User says... | Action | Skip SSE? |
|---|---|---|
| "export" / "导出" / "download" / "send me the video" | → §3.5 Export | ✅ |
| "credits" / "积分" / "balance" / "余额" | → §3.3 Credits | ✅ |
| "status" / "状态" / "show tracks" | → §3.4 State | ✅ |
| "upload" / "上传" / user sends file | → §3.2 Upload | ✅ |
| Everything else (generate, edit, add BGM…) | → §3.1 SSE | ❌ |
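One way to picture the routing table is as a keyword lookup with an SSE fallback. The sketch below is illustrative only: the `route` function, the action labels, and the use of case-insensitive substring matching are assumptions, since the document does not say how intent is actually parsed.

```python
# (keywords, action label, skip SSE?) in the same order as the table above.
ROUTES = [
    (("export", "导出", "download", "send me the video"), "export", True),
    (("credits", "积分", "balance", "余额"), "credits", True),
    (("status", "状态", "show tracks"), "state", True),
    (("upload", "上传"), "upload", True),
]


def route(message: str, has_file: bool = False):
    """Return (action, skip_sse) for a user message.

    Assumption: a simple substring match stands in for real intent parsing.
    """
    if has_file:
        return ("upload", True)
    text = message.lower()
    for keywords, action, skip_sse in ROUTES:
        if any(keyword in text for keyword in keywords):
            return (action, skip_sse)
    # Everything else (generate, edit, add BGM, ...) goes through SSE.
    return ("sse", False)
```

Checking the first matching row keeps the behavior predictable when a message mentions more than one keyword.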
NemoVideo API Reference
The NemoVideo backend processes your raw footage metadata and edit instructions through a multi-model inference layer, handling everything from scene detection and auto-reframing to AI-driven color grading and subtitle generation. Requests are stateful within a session, so context like project resolution, timeline cuts, and style presets persist across consecutive prompts.
Skill attribution — read from this file's YAML frontmatter at runtime:
- X-Skill-Source: video-editing-with-ai
- X-Skill-Version: from frontmatter version
- X-Skill-Platform: detect from install path (~/.clawhub/ → clawhub, ~/.cursor/skills/ → cursor, else unknown)
All requests must include: Authorization: Bearer <NEMO_TOKEN>, X-Skill-Source, X-Skill-Version, X-Skill-Platform. Missing attribution headers will cause export to fail with 402.
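The header requirements above might be assembled as in the following sketch. The function names and the `home` parameter are hypothetical; the version string is passed in by the caller because reading the YAML frontmatter is out of scope here.

```python
from pathlib import Path
from typing import Dict, Optional


def _is_under(path: Path, parent: Path) -> bool:
    """True if `path` is `parent` or lies below it."""
    try:
        path.relative_to(parent)
        return True
    except ValueError:
        return False


def detect_platform(install_path: str, home: Optional[Path] = None) -> str:
    """Map the install path to a platform label, defaulting to 'unknown'."""
    home = home if home is not None else Path.home()
    path = Path(install_path).expanduser()
    if _is_under(path, home / ".clawhub"):
        return "clawhub"
    if _is_under(path, home / ".cursor" / "skills"):
        return "cursor"
    return "unknown"


def build_headers(token: str, skill_source: str, skill_version: str,
                  platform: str) -> Dict[str, str]:
    """Return the four headers that every request must carry."""
    return {
        "Authorization": f"Bearer {token}",
        "X-Skill-Source": skill_source,
        "X-Skill-Version": skill_version,
        "X-Skill-Platform": platform,
    }
```

Building the headers in one place makes it harder to forget an attribution header, which the text says leads to a 402 on export.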
API base: https://mega-api-prod.nemovideo.ai
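Create session: POST /api/tasks/me/with-session/nemo_agent with body {"task_name":"project","language":"<lang>"}; returns task_id, session_id. After creating a session, give the user a link: https://nemovideo.com/workspace/claim?token=$TOKEN&task=<task_id>&session=<session_id>&skill_name=video-editing-with-ai&skill_version=1.0.0&skill_source=<platform>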
Send message (SSE): POST /run_sse — body {"app_name":"nemo_agent","user_id":"me","session_id":"<sid>","new_message":{"parts":[{"text":"<msg>"}]}} with Accept: text/event-stream. Max timeout: 15 minutes.
Upload: POST /api/upload-video/nemo_agent/me/<sid> with either a multipart file (-F "files=@/path") or a URL body: {"urls":["<url>"],"source_type":"url"}
Credits: GET /api/credits/balance/simple; returns available, frozen, total
Session state: GET /api/state/nemo_agent/me/<sid>/latest; key fields: data.state.draft, data.state.video_infos, data.state.generated_media
Export (free, no credits): POST /api/render/proxy/lambda — body {"id":"render_<ts>","sessionId":"<sid>","draft":<json>,"output":{"format":"mp4","quality":"high"}}. Poll GET /api/render/proxy/lambda/<id> every 30s until status = completed. Download URL at output.url.
Supported formats: mp4, mov, avi, webm, mkv, jpg, png, gif, webp, mp3, wav, m4a, aac.
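The export workflow described above (submit a render, then poll every 30 seconds) could look something like this sketch. Several details are assumptions: `<ts>` is taken to be a Unix timestamp in seconds, the status response is assumed to expose `status` and `output.url` at the top level, `max_polls` is a hypothetical safety cap, and `fetch_status` is a caller-supplied function that performs the actual GET.

```python
import time
from typing import Callable, Optional


def build_export_payload(session_id: str, draft: dict,
                         now: Optional[float] = None) -> dict:
    """Build the render request body; the id format render_<ts> is assumed."""
    ts = int(now if now is not None else time.time())
    return {
        "id": f"render_{ts}",
        "sessionId": session_id,
        "draft": draft,
        "output": {"format": "mp4", "quality": "high"},
    }


def poll_until_completed(fetch_status: Callable[[str], dict], render_id: str,
                         interval_s: float = 30, max_polls: int = 40,
                         sleep: Callable[[float], None] = time.sleep) -> str:
    """Poll the render status until it is 'completed' and return the download URL."""
    for attempt in range(max_polls):
        result = fetch_status(render_id)
        if result.get("status") == "completed":
            return result["output"]["url"]
        if attempt < max_polls - 1:
            sleep(interval_s)
    raise TimeoutError(f"render {render_id} did not complete after {max_polls} polls")
```

Injecting `fetch_status` and `sleep` keeps the loop testable without network access or real waiting.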
SSE Event Handling
| Event | Action |
|---|---|
| Text response | Apply GUI translation (§4), present to user |
| Tool call/result | Process internally, don't forward |
| heartbeat / empty data: | Keep waiting. Every 2 min: "⏳ Still working..." |
| Stream closes | Process final response |
~30% of editing operations return no text in the SSE stream. When this happens: poll session state to verify the edit was applied, then summarize changes to the user.
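The event handling above can be illustrated with a small stream consumer. The payload shape is an assumption: the sketch supposes each `data:` line carries JSON with text under `content.parts[].text`, mirroring the request body, which the document does not confirm. The function name `consume_stream` is hypothetical.

```python
import json
from typing import Iterable, List, Tuple


def consume_stream(lines: Iterable[str]) -> Tuple[List[str], bool]:
    """Collect user-facing text from SSE lines.

    Returns (texts, needs_state_poll). The flag is True when the stream
    closed without any text, which is the cue to poll session state.
    """
    texts: List[str] = []
    for raw in lines:
        line = raw.strip()
        if not line or line.startswith(":"):
            continue  # blank separator or SSE comment line
        if not line.startswith("data:"):
            continue  # ignore other SSE fields such as event: or id:
        payload = line[len("data:"):].strip()
        if not payload or payload == "heartbeat":
            continue  # heartbeat or empty data: keep waiting
        try:
            event = json.loads(payload)
        except json.JSONDecodeError:
            continue
        content = event.get("content") if isinstance(event, dict) else None
        if not isinstance(content, dict):
            continue
        for part in content.get("parts") or []:
            # Tool calls and results carry no "text" key and are not forwarded.
            if isinstance(part, dict) and part.get("text"):
                texts.append(part["text"])
    return texts, not texts
```

When the second return value is True, the caller would fetch the latest session state and summarize the changes itself.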
Backend Response Translation
The backend assumes a GUI exists. Translate these into API actions:
| Backend says | You do |
|---|---|
| "click [button]" / "点击" | Execute via API |
| "open [panel]" / "打开" | Query session state |
| "drag/drop" / "拖拽" | Send edit via SSE |
| "preview in timeline" | Show track summary |
| "Export button" / "导出" | Execute export workflow |
Draft field mapping: t=tracks, tt=track type (0=video, 1=audio, 7=text), sg=segments, d=duration(ms), m=metadata.
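Timeline (3 tracks):
1. Video: city timelapse (0-10s)
2. BGM: Lo-fi (0-10s, 35%)
3. Title: "Urban Dreams" (0-3s)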
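The field mapping above could be turned into a track summary along these lines. The nesting is an assumption (a top-level `t` list of tracks, each with `tt` and a list `sg` of segments carrying `d`), and `summarize_draft` is a hypothetical helper, not part of the API.

```python
# Track type codes taken from the draft field mapping.
TRACK_TYPES = {0: "video", 1: "audio", 7: "text"}


def summarize_draft(draft: dict) -> list:
    """Return one summary line per track: type, segment count, total duration."""
    lines = []
    for index, track in enumerate(draft.get("t", []), start=1):
        code = track.get("tt")
        kind = TRACK_TYPES.get(code, f"type {code}")
        segments = track.get("sg", [])
        # d is in milliseconds; the sum assumes segments do not overlap.
        total_ms = sum(segment.get("d", 0) for segment in segments)
        lines.append(f"{index}. {kind}: {len(segments)} segment(s), {total_ms / 1000:.1f}s")
    return lines
```

Unknown track type codes are reported by number rather than guessed.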
Error Handling
| Code | Meaning | Action |
|---|---|---|
| 0 | Success | Continue |
| 1001 | Bad/expired token | Re-auth via anonymous-token (tokens expire after 7 days) |
| 1002 | Session not found | New session §3.0 |
| 2001 | No credits | Anonymous: show registration URL with ?bind=<id> (get <id> from create-session or state response when needed). Registered: "Top up at nemovideo.ai" |
| 4001 | Unsupported file | Show supported formats |
| 4002 | File too large | Suggest compress/trim |
| 400 | Missing X-Client-Id | Generate Client-Id and retry (see §1) |
| 402 | Free plan export blocked | Subscription tier issue, NOT credits. "Register at nemovideo.ai to unlock export." |
| 429 | Rate limit (1 token/client/7 days) | Retry in 30s once |
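The error table lends itself to a simple dispatch map. In this sketch the action labels and the fallback for unlisted codes are illustrative choices, not names defined by the API.

```python
# Error code -> short action label, following the table above.
ERROR_ACTIONS = {
    0: "continue",
    1001: "reauth_anonymous_token",
    1002: "create_new_session",
    2001: "handle_no_credits",
    4001: "show_supported_formats",
    4002: "suggest_compress_or_trim",
    400: "generate_client_id_and_retry",
    402: "prompt_registration_for_export",
    429: "retry_once_after_30s",
}


def action_for(code: int) -> str:
    """Look up the action for a code; unlisted codes get a generic fallback (assumption)."""
    return ERROR_ACTIONS.get(code, "report_unknown_error")


def is_credit_problem(code: int) -> bool:
    """Only 2001 is a credits issue; 402 is a subscription tier issue."""
    return code == 2001
```

Keeping 402 and 2001 distinct in code mirrors the table's warning that a blocked export is not a credits problem.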
Quick Start Guide
Getting started with video-editing-with-ai takes less than two minutes. First, upload your video file — supported formats include mp4, mov, avi, webm, and mkv. Files up to standard upload limits are accepted, and longer recordings are handled in segments automatically.
Once your file is uploaded, describe your editing goal in plain language. Be as specific or as broad as you like. For example: 'Remove all pauses longer than 2 seconds' is a precise instruction, while 'Make this feel more energetic and cut it down to under 3 minutes' gives the AI creative latitude to make judgment calls.
After processing, you'll receive your edited video along with a summary of the changes made — cuts applied, captions added, or segments reordered. You can then request further adjustments in the same conversation. Think of it as a back-and-forth with an editor who never gets tired and always remembers your preferences from earlier in the session.
Integration Guide
The video-editing-with-ai skill is designed to slot into existing content production pipelines without disruption. If you're working within ClawHub's broader platform, you can chain this skill with transcription or translation skills — for instance, first transcribing a recorded webinar, then using those transcripts to drive intelligent cuts based on topic segments.
For teams with structured workflows, the skill accepts batch-style instructions, meaning you can describe a consistent editing template — intro trim, silence removal, outro addition — and apply it uniformly across multiple uploads in a session. This is especially useful for podcast video exports, training content libraries, or recurring social media series.
Output files are delivered in the same format as the input by default, preserving resolution and audio quality. If you need a specific output format or resolution target for a platform like YouTube Shorts, Instagram Reels, or LinkedIn, simply include that in your prompt and the skill will adapt the export accordingly.