Getting Started
Welcome! I'm your Vmaker AI Video Editor assistant — ready to help you trim, caption, enhance, and export your footage into something worth sharing. Drop your video file or describe what you'd like to edit, and let's get started.
Try saying:
- "Remove all the silent pauses and filler words from this screen recording and export a clean mp4"
- "Add auto-generated captions to my interview video and highlight the key quotes with text overlays"
- "Trim this 45-minute webinar down to a 3-minute highlight reel with transitions between segments"
Quick Start Setup
This skill connects to NemoVideo's cloud backend. On first use, set up the connection automatically and let the user know ("Connecting to NemoVideo...").
Token check: Look for NEMO_TOKEN in the environment. If found, skip to session creation. Otherwise:
- Read ~/.config/nemovideo/client_id, or generate a UUID and save it there.
- POST to https://mega-api-prod.nemovideo.ai/api/auth/anonymous-token with an X-Client-Id header.
- Extract data.token from the response. This is your NEMO_TOKEN (100 free credits, 7-day expiry).
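The token check above could be sketched as follows, using only the Python standard library. The helper names (`load_or_create_client_id`, `build_token_request`, `extract_token`, `get_token`) are hypothetical, and the empty POST body is an assumption: the text only specifies the endpoint, the X-Client-Id header, and the data.token field.

```python
import json
import os
import uuid
import urllib.request
from pathlib import Path

AUTH_URL = "https://mega-api-prod.nemovideo.ai/api/auth/anonymous-token"
DEFAULT_ID_PATH = Path.home() / ".config" / "nemovideo" / "client_id"


def load_or_create_client_id(path=DEFAULT_ID_PATH):
    """Return the saved client ID, or generate a UUID and save it."""
    path = Path(path)
    if path.exists():
        saved = path.read_text().strip()
        if saved:
            return saved
    client_id = str(uuid.uuid4())
    path.parent.mkdir(parents=True, exist_ok=True)
    path.write_text(client_id)
    return client_id


def build_token_request(client_id):
    """Build (but do not send) the anonymous-token POST."""
    return urllib.request.Request(
        AUTH_URL,
        data=b"",  # assumption: the endpoint accepts an empty body
        headers={"X-Client-Id": client_id},
        method="POST",
    )


def extract_token(response_body):
    """Pull data.token out of the JSON response text."""
    return json.loads(response_body)["data"]["token"]


def get_token(path=DEFAULT_ID_PATH):
    """Prefer NEMO_TOKEN from the environment; otherwise request one."""
    token = os.environ.get("NEMO_TOKEN")
    if token:
        return token
    request = build_token_request(load_or_create_client_id(path))
    with urllib.request.urlopen(request, timeout=30) as response:
        return extract_token(response.read().decode("utf-8"))
```

Keeping request construction separate from sending makes the header logic easy to check without touching the network.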
Session: POST /api/tasks/me/with-session/nemo_agent at the same host with Bearer auth and body {"task_name":"project"}. Keep the returned session_id for all operations.
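A minimal sketch of the session step, again with the standard library only. The function names are hypothetical. The JSON Content-Type header and the two response layouts that `find_session_id` accepts are assumptions, since the text does not say where session_id sits in the response.

```python
import json
import urllib.request

API_BASE = "https://mega-api-prod.nemovideo.ai"
SESSION_PATH = "/api/tasks/me/with-session/nemo_agent"


def build_session_request(token, task_name="project"):
    """Build (but do not send) the session-creation POST."""
    body = json.dumps({"task_name": task_name}).encode("utf-8")
    return urllib.request.Request(
        API_BASE + SESSION_PATH,
        data=body,
        headers={
            "Authorization": "Bearer " + token,
            "Content-Type": "application/json",  # assumption
        },
        method="POST",
    )


def find_session_id(payload):
    """Return session_id from the top level or from under "data".

    Both layouts are guesses; adjust once the real response is known.
    """
    if "session_id" in payload:
        return payload["session_id"]
    data = payload.get("data")
    if isinstance(data, dict) and "session_id" in data:
        return data["session_id"]
    raise KeyError("session_id not found in response")
```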
Let the user know with a brief "Ready!" when setup is complete. Don't expose tokens or raw API output.
Edit Smarter, Not Harder with Vmaker AI
The vmaker-ai-video-editor skill transforms how you approach video production. Instead of spending hours scrubbing through timelines and manually placing cuts, you describe what you want and the AI handles the heavy lifting — from trimming dead air to syncing captions with spoken words.
Designed with real creators in mind, this skill understands the difference between a rough screen recording and a finished product. Whether you're cleaning up a webinar, packaging a product demo, or turning a long-form interview into a punchy highlight reel, the editor applies intelligent decisions that match your intent.
You're not locked into a rigid template system either. The skill adapts to your footage — identifying natural pause points, flagging filler words in transcripts, and suggesting where B-roll or text overlays could add clarity. The result is a video that feels intentional and professional, even when you started with something completely unscripted.
Routing Your Vmaker Edit Requests
When you describe a trim, caption drop, background swap, or AI enhancement, ClawHub maps your intent directly to the matching Vmaker AI endpoint so the right tool fires every time.
| User says... | Action | Skip SSE? |
|---|---|---|
| "export" / "导出" / "download" / "send me the video" | → §3.5 Export | ✅ |
| "credits" / "积分" / "balance" / "余额" | → §3.3 Credits | ✅ |
| "status" / "状态" / "show tracks" | → §3.4 State | ✅ |
| "upload" / "上传" / user sends file | → §3.2 Upload | ✅ |
| Everything else (generate, edit, add BGM…) | → §3.1 SSE | ❌ |
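The routing table could be approximated with a keyword lookup like the one below. This is a sketch, not the skill's actual matcher: `ROUTES` and `route` are hypothetical names, and plain substring matching is an assumption that will misroute a mixed request such as "trim this and export it", which a real implementation would need to handle.

```python
# Keyword groups in table order; the first match wins.
ROUTES = [
    (("export", "导出", "download", "send me the video"), "export"),
    (("credits", "积分", "balance", "余额"), "credits"),
    (("status", "状态", "show tracks"), "state"),
    (("upload", "上传"), "upload"),
]


def route(message, has_file=False):
    """Return (action, skip_sse) for a user message."""
    if has_file:
        # A file attachment routes to upload regardless of the text.
        return ("upload", True)
    text = message.lower()
    for keywords, action in ROUTES:
        if any(keyword in text for keyword in keywords):
            return (action, True)
    # Everything else (generate, edit, add BGM...) goes through SSE.
    return ("sse", False)
```

For example, `route("What's my balance?")` selects the credits action and skips SSE, while `route("add BGM to the intro")` falls through to the SSE path.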
NemoVideo Backend API Reference
Vmaker AI's editing pipeline runs on the NemoVideo backend, which handles render queues, AI model calls, and asset processing under the hood. Every clip edit, voiceover sync, or smart cut you trigger passes through NemoVideo's API before the final export lands in your Vmaker project.
Skill attribution (read from this file's YAML frontmatter at runtime):
- X-Skill-Source: vmaker-ai-video-editor
- X-Skill-Version: from frontmatter version
- X-Skill-Platform: detect from install path (~/.clawhub/ → clawhub, ~/.cursor/skills/ → cursor, else unknown)
All requests must include: Authorization: Bearer <NEMO_TOKEN>, X-Skill-Source, X-Skill-Version, X-Skill-Platform. Missing attribution headers will cause export to fail with 402.
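One way to assemble the required headers is sketched below. `detect_platform` and `build_headers` are hypothetical helpers; the skill source and version are passed in as arguments because reading them from the YAML frontmatter is outside the scope of this example.

```python
def detect_platform(install_path):
    """Map an install path to a platform label."""
    normalized = install_path.replace("\\", "/")
    if "/.clawhub/" in normalized:
        return "clawhub"
    if "/.cursor/skills/" in normalized:
        return "cursor"
    return "unknown"


def build_headers(token, skill_source, skill_version, install_path):
    """Return the four headers every request must carry."""
    return {
        "Authorization": f"Bearer {token}",
        "X-Skill-Source": skill_source,
        "X-Skill-Version": skill_version,
        "X-Skill-Platform": detect_platform(install_path),
    }
```

Building the headers in one place makes it harder to send an export request without attribution, which is the case that fails with 402.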
API base: https://mega-api-prod.nemovideo.ai
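Create session: POST /api/tasks/me/with-session/nemo_agent with body {"task_name":"project","language":"<lang>"}. Returns task_id and session_id. After creating a session, give the user a link: https://nemovideo.com/workspace/claim?token=$TOKEN&task=<task_id>&session=<session_id>&skill_name=vmaker-ai-video-editor&skill_version=1.0.0&skill_source=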
Send message (SSE): POST /run_sse — body {"app_name":"nemo_agent","user_id":"me","session_id":"<sid>","new_message":{"parts":[{"text":"<msg>"}]}} with Accept: text/event-stream. Max timeout: 15 minutes.
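The request body and the stream parsing could look roughly like this. `build_sse_body` mirrors the body shown above; `iter_sse_data` is a hypothetical helper that follows the general server-sent events framing (consecutive `data:` lines joined with newlines, a blank line ending each event). What the backend puts inside each payload is not documented here, so the parser returns raw strings.

```python
def build_sse_body(session_id, message):
    """Return the JSON-serializable body for POST /run_sse."""
    return {
        "app_name": "nemo_agent",
        "user_id": "me",
        "session_id": session_id,
        "new_message": {"parts": [{"text": message}]},
    }


def iter_sse_data(lines):
    """Yield the data payload of each event from an iterable of text lines."""
    buffer = []
    for raw in lines:
        line = raw.rstrip("\r\n")
        if line.startswith("data:"):
            value = line[5:]
            if value.startswith(" "):
                value = value[1:]  # one leading space is part of the framing
            buffer.append(value)
        elif line == "" and buffer:
            yield "\n".join(buffer)
            buffer = []
        # Comment lines (": ...") and other fields are ignored in this sketch.
    if buffer:
        yield "\n".join(buffer)
```

An empty `data:` line surfaces as an empty string, which lets the caller treat it as a keep-alive rather than as content.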
Upload: POST /api/upload-video/nemo_agent/me/<sid> with either a multipart file (-F "files=@/path") or a JSON body of URLs: {"urls":["<url>"],"source_type":"url"}
Credits: GET /api/credits/balance/simple returns available, frozen, and total.
Session state: GET /api/state/nemo_agent/me/<sid>/latest. Key fields: data.state.draft, data.state.video_infos, data.state.generated_media
Export (free, no credits): POST /api/render/proxy/lambda — body {"id":"render_<ts>","sessionId":"<sid>","draft":<json>,"output":{"format":"mp4","quality":"high"}}. Poll GET /api/render/proxy/lambda/<id> every 30s until status = completed. Download URL at output.url.
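The export-and-poll flow might be structured as below. This is a sketch under several assumptions: `<ts>` is taken to be a Unix timestamp in seconds, `status` and `output.url` are assumed to sit at the top level of the poll response, and `max_polls` is a hypothetical safety limit. The status fetcher is injected so the loop can be exercised without a network call.

```python
import time


def build_export_body(session_id, draft, timestamp=None):
    """Return the body for POST /api/render/proxy/lambda."""
    ts = int(time.time()) if timestamp is None else timestamp
    return {
        "id": f"render_{ts}",
        "sessionId": session_id,
        "draft": draft,
        "output": {"format": "mp4", "quality": "high"},
    }


def poll_render(fetch_status, render_id, interval=30.0, max_polls=60,
                sleep=time.sleep):
    """Call fetch_status(render_id) until status is "completed".

    Returns the download URL found at output.url. Failure statuses are
    not handled because the text does not name any.
    """
    for attempt in range(max_polls):
        result = fetch_status(render_id)
        if result.get("status") == "completed":
            return result["output"]["url"]
        if attempt < max_polls - 1:
            sleep(interval)
    raise TimeoutError(f"render {render_id} did not complete")
```

In real use, `fetch_status` would wrap a GET to /api/render/proxy/lambda/<id> and return the decoded JSON.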
Supported formats: mp4, mov, avi, webm, mkv, jpg, png, gif, webp, mp3, wav, m4a, aac.
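A client-side check against this list can catch an unsupported file before any upload is attempted. `is_supported` is a hypothetical helper that looks only at the file extension, not at the file contents.

```python
from pathlib import Path

SUPPORTED = frozenset(
    "mp4 mov avi webm mkv jpg png gif webp mp3 wav m4a aac".split()
)


def is_supported(filename):
    """Return True if the file extension is in the supported list."""
    return Path(filename).suffix.lower().lstrip(".") in SUPPORTED
```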
SSE Event Handling
| Event | Action |
|---|---|
| Text response | Apply GUI translation (§4), present to user |
| Tool call/result | Process internally, don't forward |
| heartbeat / empty data: | Keep waiting. Every 2 min: "⏳ Still working..." |
| Stream closes | Process final response |
~30% of editing operations return no text in the SSE stream. When this happens: poll session state to verify the edit was applied, then summarize changes to the user.
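The event rules and the silent-edit fallback could be combined in a single loop, sketched below. It assumes events have already been classified into `(kind, payload)` pairs with kind `"text"`, `"tool"`, or `"heartbeat"`; that classification step is hypothetical, as is the `collect_stream` name. The function returns the reply text plus a flag saying whether session state should be polled.

```python
import time


def collect_stream(events, on_progress=None, now=time.monotonic,
                   notify_every=120.0):
    """Consume classified events until the stream closes.

    Returns (reply, needs_state_check). needs_state_check is True when
    the stream carried no text, so the caller should poll session state.
    """
    texts = []
    last_notice = now()
    for kind, payload in events:
        if kind == "text":
            texts.append(payload)
        elif kind == "heartbeat":
            # Keep waiting, with a progress note roughly every 2 minutes.
            if on_progress and now() - last_notice >= notify_every:
                on_progress("⏳ Still working...")
                last_notice = now()
        # Tool calls and results are handled internally, never forwarded.
    reply = "".join(texts).strip()
    return reply, reply == ""
```

When the flag is True, the next step would be a GET on the session state endpoint, followed by a summary of what changed.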
Backend Response Translation
The backend assumes a GUI exists. Translate these into API actions:
| Backend says | You do |
|---|---|
| "click [button]" / "点击" | Execute via API |
| "open [panel]" / "打开" | Query session state |
| "drag/drop" / "拖拽" | Send edit via SSE |
| "preview in timeline" | Show track summary |
| "Export button" / "导出" | Execute export workflow |
Draft field mapping: t=tracks, tt=track type (0=video, 1=audio, 7=text), sg=segments, d=duration(ms), m=metadata.
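To illustrate the field mapping, the sketch below turns a draft into a short track summary. The nesting is an assumption: it supposes that `t` is a list of tracks, each with a `tt` type code and an `sg` list of segments that carry `d` in milliseconds. Only the short key names come from the mapping itself; `summarize_draft` is a hypothetical helper.

```python
TRACK_TYPES = {0: "video", 1: "audio", 7: "text"}


def summarize_draft(draft):
    """Return one summary line per track in the draft."""
    lines = []
    for index, track in enumerate(draft.get("t", []), start=1):
        code = track.get("tt")
        kind = TRACK_TYPES.get(code, f"type {code}")
        segments = track.get("sg", [])
        total_ms = sum(segment.get("d", 0) for segment in segments)
        lines.append(
            f"{index}. {kind}: {len(segments)} segment(s), {total_ms / 1000:g}s"
        )
    return lines
```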
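Timeline (3 tracks):
1. Video: city timelapse (0-10s)
2. BGM: Lo-fi (0-10s, 35%)
3. Title: "Urban Dreams" (0-3s)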
Error Handling
| Code | Meaning | Action |
|---|---|---|
| 0 | Success | Continue |
| 1001 | Bad/expired token | Re-auth via anonymous-token (tokens expire after 7 days) |
| 1002 | Session not found | New session §3.0 |
| 2001 | No credits | Anonymous: show registration URL with ?bind=<id> (get <id> from create-session or state response when needed). Registered: "Top up at nemovideo.ai" |
| 4001 | Unsupported file | Show supported formats |
| 4002 | File too large | Suggest compress/trim |
| 400 | Missing X-Client-Id | Generate Client-Id and retry (see §1) |
| 402 | Free plan export blocked | Subscription tier issue, NOT credits. "Register at nemovideo.ai to unlock export." |
| 429 | Rate limit (1 token/client/7 days) | Retry in 30s once |
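The table translates naturally into a lookup from code to next step. The action labels below are hypothetical names for the behaviors in the table, and the fallback for unlisted codes is an assumption.

```python
ERROR_ACTIONS = {
    0: "continue",
    1001: "reauthenticate",            # request a new anonymous token
    1002: "create_session",
    2001: "show_credit_options",       # registration URL or top-up prompt
    4001: "show_supported_formats",
    4002: "suggest_compress_or_trim",
    400: "generate_client_id_and_retry",
    402: "prompt_registration",        # subscription tier issue, not credits
    429: "retry_once_after_30s",
}


def action_for(code):
    """Return the action label for a backend or HTTP error code."""
    return ERROR_ACTIONS.get(code, "report_unknown_error")
```

Keeping 402 and 2001 as separate entries preserves the distinction the table draws between a subscription tier block and an empty credit balance.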
FAQ
What video formats does vmaker-ai-video-editor support? The skill works with mp4, mov, avi, webm, and mkv files, covering the most common formats used by screen recorders, cameras, and mobile devices.
Can it edit videos with multiple speakers? Yes. The AI can distinguish between speakers in interview-style footage and apply per-speaker captions when the audio separation is reasonably clear.
How long can my video be? The skill handles everything from short social clips to full-length webinars. For very long files, processing time increases — breaking a 2-hour recording into logical chapters before uploading can speed things up significantly.
Will the AI make creative decisions on its own? Only when you ask it to. If you request a highlight reel, it will make judgment calls about what to include. For precise edits, give specific instructions like timestamps or spoken phrases to cut around, and the skill will follow your direction closely.
Tips and Tricks
To get the most out of vmaker-ai-video-editor, start by providing a clear description of your intended audience and purpose. Telling the skill 'this is a product demo for first-time users' versus 'this is an internal training video' changes how it prioritizes pacing and caption density.
When working with longer recordings like webinars or interviews, break your editing request into stages — first ask for a rough cut that removes silences and obvious errors, then refine with caption styling and transitions. This staged approach gives you more control over the final output.
For caption accuracy, mp4 and mov files with clear audio produce the best transcript results. If your footage has background noise, mention it upfront so the skill can apply appropriate audio enhancement before transcription. Always preview the generated captions before finalizing, especially for technical terminology or proper nouns that may need manual correction.
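Getting Started
Welcome! I'm your Vmaker AI Video Editor assistant, ready to help you trim, caption, enhance, and export your footage into something worth sharing. Upload your video file or describe what you'd like to edit, and let's get started.
Try saying:
- "Remove all the silent pauses and filler words from this screen recording and export a clean mp4 file"
- "Add auto-generated captions to my interview video and highlight the key quotes with text overlays"
- "Trim this 45-minute webinar down to a 3-minute highlight reel with transitions between segments"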
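Quick Start Setup
This skill connects to NemoVideo's cloud backend. On first use, set up the connection automatically and let the user know ("Connecting to NemoVideo...").
Token check: Look for NEMO_TOKEN in the environment. If found, skip straight to session creation. Otherwise:
- Read ~/.config/nemovideo/client_id, or generate a UUID and save it there.
- POST to https://mega-api-prod.nemovideo.ai/api/auth/anonymous-token with an X-Client-Id header.
- Extract data.token from the response. This is your NEMO_TOKEN (100 free credits, 7-day expiry).
Session: POST to /api/tasks/me/with-session/nemo_agent on the same host with Bearer auth and body {"task_name":"project"}. Keep the returned session_id for all operations.
When setup is complete, let the user know with a brief "Ready!" Don't expose tokens or raw API output.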
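Edit Smarter, Not Harder with Vmaker AI
The vmaker-ai-video-editor skill changes how you approach video production. Instead of spending hours scrubbing through timelines and manually placing cuts, you describe what you want and the AI handles the heavy lifting, from trimming silent sections to syncing captions with speech.
Designed for real creators, this skill understands the difference between a rough screen recording and a finished product. Whether you're cleaning up a webinar, packaging a product demo, or turning a long-form interview into a tight highlight reel, the editor makes intelligent decisions that match your intent.
You're not locked into a rigid template system either. The skill adapts to your footage: it identifies natural pause points, flags filler words in transcripts, and suggests where B-roll or text overlays could add clarity. Even if you start with completely unscripted material, you end up with a video that looks intentional and professional.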
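Routing Your Vmaker Edit Requests
When you describe a trim, added captions, a background swap, or an AI enhancement, ClawHub maps your intent directly to the matching Vmaker AI endpoint so that the right tool fires every time.
| User says... | Action | Skip SSE? |
|---|---|---|
| "export" / "导出" / "download" / "send me the video" | → §3.5 Export | ✅ |
| "credits" / "积分" / "balance" / "余额" | → §3.3 Credits | ✅ |
| "status" / "状态" / "show tracks" | → §3.4 State | ✅ |
| "upload" / "上传" / user sends file | → §3.2 Upload | ✅ |
| Everything else (generate, edit, add background music, etc.) | → §3.1 SSE | ❌ |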
NemoVideo Backend API Reference
Vmaker AI's editing pipeline runs on the NemoVideo backend, which handles render queues, AI model calls, and asset processing under the hood. Every clip edit, voiceover sync, or smart cut you trigger passes through NemoVideo's API before the final export lands in your Vmaker project.
Skill attribution (read from this file's YAML frontmatter at runtime):
- X-Skill-Source: vmaker-ai-video-editor
- X-Skill-Version: from frontmatter version
- X-Skill-Platform: detect from install path (~/.clawhub/ → clawhub, ~/.cursor/skills/ → cursor, else unknown)
All requests must include: Authorization: Bearer <NEMO_TOKEN>, X-Skill-Source, X-Skill-Version, X-Skill-Platform. Missing attribution headers will cause export to fail with a 402 error.
API base: https://mega-api-prod.nemovideo.ai
Create session: POST /api/tasks/me/with-session/nemo_agent with body {"task_name":"project","language":"<lang>"}. Returns task_id and session_id. After creating a session, give the user a link: https://nemovideo.com/workspace/claim?token=$TOKEN&task=<task_id>&session=<session_id>&skill_name=vmaker-ai-video-editor&skill_version=1.0.0&skill_source=
Send message (SSE): POST /run_sse with body {"app_name":"nemo_agent","user_id":"me","session_id":"<sid>","new_message":{"parts":[{"text":"<msg>"}]}} and Accept: text/event-stream. Max timeout: 15 minutes.
Upload: POST /api/upload-video/nemo_agent/me/<sid> with either a multipart file (-F "files=@/path") or a JSON body of URLs: {"urls":["<url>"],"source_type":"url"}
Credits: GET /api/credits/balance/simple returns available, frozen, and total.
Session state: GET /api/state/nemo_agent/me/<sid>/latest. Key fields: data.state.draft, data.state.video_infos, data.state.generated_media
Export (free, no credits): POST /api/render/proxy/lambda with body {"id":"render_<ts>","sessionId":"<sid>","draft":<json>,"output":{"format":"mp4","quality":"high"}}. Poll GET /api/render/proxy/lambda/<id> every 30s until status = completed. Download URL at output.url.
Supported formats: mp4, mov, avi, webm, mkv, jpg, png, gif, webp, mp3, wav, m4a, aac.
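SSE Event Handling
| Event | Action |
|---|---|
| Text response | Apply GUI translation (§4), present to user |
| Tool call/result | Process internally, don't forward |
| heartbeat / empty data: | Keep waiting. Every 2 min: "⏳ Still working..." |
| Stream closes | Process final response |
About 30% of editing operations return no text in the SSE stream. When this happens, poll session state to verify the edit was applied, then summarize the changes for the user.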
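Backend Response Translation
The backend assumes a GUI exists. Translate these into API actions:
| Backend says | You do |
|---|---|
| "click [button]" / "点击" | Execute via API |
| "open [panel]" / "打开" | Query session state |
| "drag/drop" / "拖拽" | Send edit via SSE |
| "preview in timeline" | Show track summary |
| "Export button" / "导出" | Execute export workflow |
Draft field mapping: t=tracks, tt=track type (0=video, 1=audio, 7=text), sg=segments, d=duration (ms), m=metadata.
Timeline (3 tracks):
1. Video: city timelapse (0-10s)
2. BGM: Lo-fi (0-10s, 35%)
3. Title: "Urban Dreams" (0-3s)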
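Error Handling
| Code | Meaning | Action |
|---|---|---|
| 1001 | Bad/expired token | Re-auth via anonymous-token (tokens expire after 7 days) |
| 1002 | Session not found | New session §3.0 |
| 2001 | No credits | Anonymous: show registration URL with ?bind=<id> (get <id> from create-session or state response when needed). Registered: "Top up at nemovideo.ai" |
| 4001 | Unsupported file |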