Getting Started
Share your footage description, transcript, or edit goals and I'll give you a full editing plan, cut list, or caption draft. No footage on hand? Just describe the video you're making.
Try saying:
- "I have a 12-minute interview recording and need to cut it down to a 3-minute highlight reel for LinkedIn. Here's the transcript — which sections should I keep and in what order?"
- "Write an edit script for a 60-second Instagram Reel promoting a skincare product launch, including suggested shot types, text overlays, and music mood."
- "I'm editing a travel vlog shot in Portugal. Suggest a pacing structure, b-roll placement strategy, and transition style that fits a cinematic YouTube format."
First-Time Connection
When a user first opens this skill, connect to the processing backend automatically. Briefly let them know (e.g. "Setting up...").
Authentication: Check whether NEMO_TOKEN is set in the environment. If it is, skip to step 2.
1. Obtain a free token: Generate a random UUID as the client identifier. POST to https://mega-api-prod.nemovideo.ai/api/auth/anonymous-token with header X-Client-Id set to that UUID. The response field data.token is your NEMO_TOKEN (100 free credits, valid 7 days).
2. Create a session: POST to https://mega-api-prod.nemovideo.ai/api/tasks/me/with-session/nemo_agent with Authorization: Bearer <token>, Content-Type: application/json, and body {"task_name":"project","language":"<detected>"}. Store the returned session_id for all subsequent requests.
Keep setup communication brief. Don't display raw API responses or token values to the user.
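The two setup steps above can be sketched as plain request descriptions. This is a minimal sketch, not the skill's actual client: the helper names (existing_token, anonymous_token_request, extract_token, create_session_request) are hypothetical, nothing is sent over the network, and only the URLs, headers, body, and the data.token field come from the text above.

```python
import json
import os
import uuid

API_BASE = "https://mega-api-prod.nemovideo.ai"


def existing_token(env=os.environ):
    """Return NEMO_TOKEN if it is already set, else None."""
    return env.get("NEMO_TOKEN") or None


def anonymous_token_request(client_id=None):
    """Describe the POST that requests a free anonymous token (step 1)."""
    client_id = client_id or str(uuid.uuid4())
    return {
        "method": "POST",
        "url": f"{API_BASE}/api/auth/anonymous-token",
        "headers": {"X-Client-Id": client_id},
        "body": None,
    }


def extract_token(response_json):
    """Read the token from the response's data.token field."""
    return response_json["data"]["token"]


def create_session_request(token, language):
    """Describe the POST that creates a session (step 2)."""
    return {
        "method": "POST",
        "url": f"{API_BASE}/api/tasks/me/with-session/nemo_agent",
        "headers": {
            "Authorization": f"Bearer {token}",
            "Content-Type": "application/json",
        },
        "body": json.dumps({"task_name": "project", "language": language}),
    }
```

Keeping request construction separate from sending makes it easy to avoid logging the token, in line with the note about not displaying token values.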
Your AI Co-Editor for Every Kind of Video
Video editing is more than cutting clips — it's about pacing, storytelling, and knowing exactly where to hold a shot and where to let it breathe. This skill acts as your intelligent editing partner, helping you plan, script, and structure video projects from a single rough idea all the way to a frame-by-frame edit list.
Whether you're working on a YouTube documentary, a 15-second product ad, a wedding highlight reel, or a corporate explainer, the approach adapts to your format and platform. You can describe your footage, paste a transcript, or share a rough outline — and get back structured edit notes, caption drafts, b-roll suggestions, music cue recommendations, and scene-by-scene pacing guidance.
This isn't a one-size-fits-all tool. It understands the difference between editing for TikTok versus editing for a film festival submission. The goal is to give you creative direction that actually fits your project — so you spend less time staring at a timeline and more time publishing work you're proud of.
Routing Cuts, Captions & Prompts
Every request — whether you're trimming a timeline, generating auto-captions, or prompting a creative direction change — gets parsed and routed to the appropriate AI processing pipeline based on intent, media context, and edit complexity.
| User says... | Action | Skip SSE? |
|---|---|---|
| "export" / "导出" / "download" / "send me the video" | → §3.5 Export | ✅ |
| "credits" / "积分" / "balance" / "余额" | → §3.3 Credits | ✅ |
| "status" / "状态" / "show tracks" | → §3.4 State | ✅ |
| "upload" / "上传" / user sends file | → §3.2 Upload | ✅ |
| Everything else (generate, edit, add BGM…) | → §3.1 SSE | ❌ |
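The routing table can be approximated with a keyword lookup. This is only a sketch under a strong simplification: it uses naive substring matching, whereas the text describes routing by intent, media context, and edit complexity. The handler names and the route function are hypothetical.

```python
# Keyword groups in the same order as the table rows.
ROUTES = [
    ("export", ("export", "导出", "download", "send me the video")),
    ("credits", ("credits", "积分", "balance", "余额")),
    ("state", ("status", "状态", "show tracks")),
    ("upload", ("upload", "上传")),
]


def route(message, has_file=False):
    """Return a handler name for a user message; default to the SSE path."""
    if has_file:
        return "upload"
    text = message.lower()
    for handler, keywords in ROUTES:
        if any(keyword in text for keyword in keywords):
            return handler
    return "sse"
```

For example, route("Please export my video") returns "export", while route("add BGM to the intro") falls through to "sse".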
Cloud Processing API Reference
All video analysis, frame segmentation, and caption generation run through a distributed cloud backend that processes your media asynchronously — so heavy multi-track renders and AI-driven cut suggestions don't bottleneck your local machine. API calls are stateful within an active session, meaning the model retains timeline context across sequential edits.
Skill attribution (read from this file's YAML frontmatter at runtime):
- X-Skill-Source: video-editing-with-ai
- X-Skill-Version: from frontmatter version
- X-Skill-Platform: detect from install path (~/.clawhub/ → clawhub, ~/.cursor/skills/ → cursor, else unknown)
All requests must include: Authorization: Bearer <NEMO_TOKEN>, X-Skill-Source, X-Skill-Version, X-Skill-Platform. Missing attribution headers will cause export to fail with 402.
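A minimal sketch of how the required headers might be assembled, assuming the install path is known to the caller. The function names detect_platform and attribution_headers are hypothetical, and the source and version values are passed in rather than read from frontmatter here.

```python
import os

# Install roots named in the text, mapped to their platform labels.
PLATFORM_ROOTS = {"~/.clawhub": "clawhub", "~/.cursor/skills": "cursor"}


def detect_platform(install_path):
    """Map the skill's install path to an X-Skill-Platform value."""
    path = os.path.abspath(os.path.expanduser(install_path))
    for root, platform in PLATFORM_ROOTS.items():
        root = os.path.abspath(os.path.expanduser(root))
        if path == root or path.startswith(root + os.sep):
            return platform
    return "unknown"


def attribution_headers(token, source, version, install_path):
    """Build the headers that every request must carry."""
    return {
        "Authorization": f"Bearer {token}",
        "X-Skill-Source": source,
        "X-Skill-Version": version,
        "X-Skill-Platform": detect_platform(install_path),
    }
```

Building all four headers in one place reduces the chance of an export failing with 402 because an attribution header was left out.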
API base: https://mega-api-prod.nemovideo.ai
Create session: POST /api/tasks/me/with-session/nemo_agent — body {"task_name":"project","language":"<lang>"} — returns task_id, session_id.
Send message (SSE): POST /run_sse — body {"app_name":"nemo_agent","user_id":"me","session_id":"<sid>","new_message":{"parts":[{"text":"<msg>"}]}} with Accept: text/event-stream. Max timeout: 15 minutes.
Upload: POST /api/upload-video/nemo_agent/me/<sid> with either a file (multipart -F "files=@/path") or a URL body ({"urls":["<url>"],"source_type":"url"})
Credits: GET /api/credits/balance/simple returns available, frozen, total
Session state: GET /api/state/nemo_agent/me/<sid>/latest, key fields: data.state.draft, data.state.video_infos, data.state.generated_media
Export (free, no credits): POST /api/render/proxy/lambda — body {"id":"render_<ts>","sessionId":"<sid>","draft":<json>,"output":{"format":"mp4","quality":"high"}}. Poll GET /api/render/proxy/lambda/<id> every 30s until status = completed. Download URL at output.url.
Supported formats: mp4, mov, avi, webm, mkv, jpg, png, gif, webp, mp3, wav, m4a, aac.
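The export flow (submit a render, then poll every 30 seconds) could look roughly like this. It is a sketch under assumptions: fetch_status is a hypothetical callable that performs GET /api/render/proxy/lambda/<id> and returns the decoded JSON, status and output.url are assumed to sit at the top level of that JSON, the timestamp in the render id is assumed to be Unix seconds, and max_polls is an invented safety limit.

```python
import time


def export_body(session_id, draft, timestamp=None):
    """Build the JSON body for POST /api/render/proxy/lambda."""
    ts = int(time.time()) if timestamp is None else timestamp
    return {
        "id": f"render_{ts}",
        "sessionId": session_id,
        "draft": draft,
        "output": {"format": "mp4", "quality": "high"},
    }


def wait_for_render(fetch_status, render_id, interval=30, max_polls=60,
                    sleep=time.sleep):
    """Poll until status is 'completed', then return the download URL."""
    for attempt in range(max_polls):
        result = fetch_status(render_id)
        if result.get("status") == "completed":
            return result["output"]["url"]
        if attempt < max_polls - 1:
            sleep(interval)  # wait before the next poll
    raise TimeoutError(f"render {render_id} not completed after {max_polls} polls")
```

Injecting fetch_status and sleep keeps the loop testable without a network connection or real delays. The text does not document failure statuses, so this sketch treats anything other than completed as still in progress.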
SSE Event Handling
| Event | Action |
|---|---|
| Text response | Apply GUI translation (§4), present to user |
| Tool call/result | Process internally, don't forward |
| heartbeat / empty data: | Keep waiting. Every 2 min: "⏳ Still working..." |
| Stream closes | Process final response |
~30% of editing operations return no text in the SSE stream. When this happens: poll session state to verify the edit was applied, then summarize changes to the user.
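The handling rules above can be sketched as a small line filter. The event payload shape is an assumption, not something this document specifies: the sketch guesses that each data: line carries JSON with text under content.parts[].text, and that heartbeats are either empty or not valid JSON. The function name collect_text is hypothetical.

```python
import json


def collect_text(lines):
    """Gather text parts from SSE lines; return None if no text arrived."""
    chunks = []
    for raw in lines:
        line = raw.strip()
        if not line.startswith("data:"):
            continue  # blank separators, comments, event names
        payload = line[len("data:"):].strip()
        if not payload:
            continue  # empty data: line, keep waiting
        try:
            event = json.loads(payload)
        except json.JSONDecodeError:
            continue  # assumed heartbeat marker, not JSON
        if not isinstance(event, dict):
            continue
        content = event.get("content")
        parts = content.get("parts", []) if isinstance(content, dict) else []
        for part in parts:
            # Parts without a text field (assumed tool calls) stay internal.
            if isinstance(part, dict) and part.get("text"):
                chunks.append(part["text"])
    return "".join(chunks) or None
```

A None result signals the no-text case described above, where the caller would poll session state and summarize the changes instead.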
Backend Response Translation
The backend assumes a GUI exists. Translate these into API actions:
| Backend says | You do |
|---|---|
| "click [button]" / "点击" | Execute via API |
| "open [panel]" / "打开" | Query session state |
| "drag/drop" / "拖拽" | Send edit via SSE |
| "preview in timeline" | Show track summary |
| "Export button" / "导出" | Execute export workflow |
Draft field mapping: t=tracks, tt=track type (0=video, 1=audio, 7=text), sg=segments, d=duration(ms), m=metadata.
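Timeline (3 tracks):
1. Video: city timelapse (0-10s)
2. BGM: Lo-fi (0-10s, 35%)
3. Title: "Urban Dreams" (0-3s)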
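A track summary can be derived from the abbreviated draft fields. This sketch assumes a nesting the document does not spell out: that t is a list of tracks, each with a tt type code and an sg list of segments, each with a d duration in milliseconds. The function name summarize_tracks is hypothetical.

```python
# Track type codes from the draft field mapping.
TRACK_TYPES = {0: "video", 1: "audio", 7: "text"}


def summarize_tracks(draft):
    """Return one line per track with its type and summed segment duration."""
    lines = []
    for index, track in enumerate(draft.get("t", []), start=1):
        kind = TRACK_TYPES.get(track.get("tt"), "unknown")
        total_ms = sum(segment.get("d", 0) for segment in track.get("sg", []))
        lines.append(f"{index}. {kind}: {total_ms / 1000:g}s")
    return lines
```

For a draft with one 10-second video segment, the first line would read "1. video: 10s".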
Error Handling
| Code | Meaning | Action |
|---|---|---|
| 0 | Success | Continue |
| 1001 | Bad/expired token | Re-auth via anonymous-token (tokens expire after 7 days) |
| 1002 | Session not found | New session §3.0 |
| 2001 | No credits | Anonymous: show registration URL with ?bind=<id> (get <id> from create-session or state response when needed). Registered: "Top up credits in your account" |
| 4001 | Unsupported file | Show supported formats |
| 4002 | File too large | Suggest compress/trim |
| 400 | Missing X-Client-Id | Generate Client-Id and retry (see §1) |
| 402 | Free plan export blocked | Subscription tier issue, NOT credits. "Register or upgrade your plan to unlock export." |
| 429 | Rate limit (1 token/client/7 days) | Retry in 30s once |
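The error table maps naturally onto a lookup. This is a sketch only: the action labels are hypothetical names for the behaviors listed in the table, and the fallback for unlisted codes is an assumption, since the table does not say what to do with them.

```python
# Codes from the error table mapped to short action labels.
ERROR_ACTIONS = {
    0: "continue",
    1001: "reauthenticate",
    1002: "create_session",
    2001: "prompt_for_credits",
    4001: "show_supported_formats",
    4002: "suggest_compress_or_trim",
    400: "generate_client_id_and_retry",
    402: "prompt_register_or_upgrade",
    429: "retry_once_after_30s",
}


def action_for(code):
    """Return the action label for a code, or a generic fallback."""
    return ERROR_ACTIONS.get(code, "report_unknown_error")
```

Keeping 402 and 2001 as separate entries mirrors the table's distinction between a subscription tier issue and running out of credits.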
Performance Notes
This skill works best when you give it context about your footage — even a rough description of what's in each clip goes a long way. If you have a transcript, paste it in full; the more raw material available, the more precise the edit recommendations will be.
For longer projects (30+ minutes of footage), break your input into segments and work through the edit in stages rather than trying to process everything at once. This keeps the output focused and actionable rather than overwhelming.
The target platform significantly affects the quality of the edit plan. Specifying whether you're cutting for YouTube Shorts, Instagram Reels, TikTok, LinkedIn, or long-form YouTube changes the pacing logic, caption style, and structural recommendations considerably. Always mention your target platform upfront for the most relevant edit plan.
Tips and Tricks
One of the most underused features of this skill is transcript-based editing. If you paste a raw spoken transcript, it can identify the strongest soundbites, flag filler-heavy sections to cut, and reorder content for better narrative flow — saving hours of manual scrubbing through footage.
For social-first content, ask for a 'hook-first' edit structure. This prompts the skill to identify the most attention-grabbing moment in your footage and restructure the edit so that moment appears in the first three seconds, a common technique for reducing scroll-past rates.
Don't overlook music and pacing prompts. Describing the emotional tone you want (e.g. 'urgent and energetic' vs. 'warm and nostalgic') helps generate cut rhythm suggestions that align your editing beats with the right music tempo range. You can also ask for chapter markers, end screen placement ideas, and thumbnail moment callouts as part of any edit plan.