Getting Started
Welcome! I'm here to help you craft, refine, and time voiceover scripts that bring your videos to life. Tell me about your video — what it covers, who it's for, and the tone you're going for — and let's build your narration together.
Try saying:
- "Write a 90-second voiceover script for a product demo video showcasing a new project management app, aimed at small business owners."
- "Break this 3-minute explainer script into timestamped cue points so I can record it in sync with my video timeline."
- "Rewrite my existing voiceover draft to sound warmer and more conversational — it currently feels too stiff and corporate."
Automatic Setup
On first interaction, connect to the processing API before doing anything else. Show a brief status like "Setting things up...".
Token: If the NEMO_TOKEN environment variable is already set, use it and skip to Session below.
Free token: Read or create a UUID at ~/.config/voiceover-app/client_id. Then POST to https://mega-api-prod.nemovideo.ai/api/auth/anonymous-token with header X-Client-Id: <your-uuid>. The response field data.token becomes your NEMO_TOKEN (100 credits, 7-day expiry). If the token has expired, request a new one with the same Client-Id.
Session: POST to https://mega-api-prod.nemovideo.ai/api/tasks/me/with-session/nemo_agent with Bearer auth and body {"task_name":"project"}. Save session_id from the response.
Confirm to the user you're connected and ready. Don't print tokens or raw JSON.
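The setup steps above can be sketched with the Python standard library. This is a minimal sketch, not the skill's own code: the function names are hypothetical, the empty POST body for the token call and the JSON Content-Type for the session call are assumptions, and the attribution headers required by the API reference are passed in through `extra_headers`.

```python
import json
import urllib.request
import uuid
from pathlib import Path
from typing import Optional

API_BASE = "https://mega-api-prod.nemovideo.ai"

def get_or_create_client_id(path: Path) -> str:
    """Read the UUID stored at `path`, creating the file on first use."""
    if path.exists():
        existing = path.read_text().strip()
        if existing:
            return existing
    client_id = str(uuid.uuid4())
    path.parent.mkdir(parents=True, exist_ok=True)
    path.write_text(client_id)
    return client_id

def build_token_request(client_id: str) -> urllib.request.Request:
    """Anonymous-token POST (assumption: an empty body is accepted)."""
    return urllib.request.Request(
        f"{API_BASE}/api/auth/anonymous-token",
        data=b"",
        headers={"X-Client-Id": client_id},
        method="POST",
    )

def extract_token(response_body: bytes) -> str:
    """The token is read from data.token in the JSON response."""
    return json.loads(response_body)["data"]["token"]

def build_session_request(token: str, extra_headers: Optional[dict] = None) -> urllib.request.Request:
    """Session POST with Bearer auth and body {"task_name": "project"}."""
    headers = {"Authorization": f"Bearer {token}", "Content-Type": "application/json"}
    headers.update(extra_headers or {})  # e.g. the X-Skill-* attribution headers
    return urllib.request.Request(
        f"{API_BASE}/api/tasks/me/with-session/nemo_agent",
        data=json.dumps({"task_name": "project"}).encode(),
        headers=headers,
        method="POST",
    )
```

In use, the client ID path would be `Path.home() / ".config/voiceover-app/client_id"`, and each request is sent with `urllib.request.urlopen(...)`. The exact location of session_id inside the session response is not spelled out here, so check a real response before hard-coding a path.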
Give Your Videos a Voice That Captivates
Great video content isn't just about what viewers see — it's about what they hear. A well-crafted voiceover can transform a rough cut into a polished, professional piece that holds attention and communicates clearly. The voiceover-app skill is designed to help you craft narration that feels natural, purposeful, and perfectly timed to your footage.
Whether you're a solo creator working on a YouTube channel, an instructional designer building e-learning modules, or a marketing team producing product demos, this skill meets you where you are. You can generate full voiceover scripts from a brief description of your video, refine existing narration for tone and pacing, or break down a long script into timestamped cue points that match your timeline.
The goal is simple: remove the friction from voiceover production. Instead of staring at a blank page or wrestling with awkward phrasing, you get a working draft in seconds — one you can record immediately or hand off to a voice actor with confidence. Every output is written for the ear, not the eye, so your narration always sounds like it belongs.
Routing Takes to the Right Track
When you submit a narration request — whether it's punching in a retake, syncing a new audio layer, or scrubbing noise from a recorded take — ClawHub parses the intent and routes it to the matching voiceover workflow automatically.
| User says... | Action | Skip SSE? |
|---|---|---|
| "export" / "导出" / "download" / "send me the video" | → §3.5 Export | ✅ |
| "credits" / "积分" / "balance" / "余额" | → §3.3 Credits | ✅ |
| "status" / "状态" / "show tracks" | → §3.4 State | ✅ |
| "upload" / "上传" / user sends file | → §3.2 Upload | ✅ |
| Everything else (generate, edit, add BGM…) | → §3.1 SSE | ❌ |
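The routing table can be read as a simple keyword lookup. The sketch below is illustrative only: lowercased substring matching, checking rows in table order, and letting an attached file win over any keywords are all assumptions, and a real intent parser may be smarter.

```python
from typing import NamedTuple

class Route(NamedTuple):
    section: str
    skip_sse: bool

# Keywords copied from the routing table; rows are checked in table order.
ROUTES = [
    (("export", "导出", "download", "send me the video"), Route("§3.5 Export", True)),
    (("credits", "积分", "balance", "余额"), Route("§3.3 Credits", True)),
    (("status", "状态", "show tracks"), Route("§3.4 State", True)),
    (("upload", "上传"), Route("§3.2 Upload", True)),
]

def route(message: str, has_attachment: bool = False) -> Route:
    """Pick the workflow for a user message; anything unmatched goes to SSE."""
    if has_attachment:  # "user sends file" row
        return Route("§3.2 Upload", True)
    text = message.lower()
    for keywords, target in ROUTES:
        if any(keyword in text for keyword in keywords):
            return target
    return Route("§3.1 SSE", False)
```

Because matching is by substring, a phrase such as "exported" also routes to Export; that trade-off keeps the sketch short.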
Cloud Engine API Reference
All audio processing runs through a cloud-based rendering backend that handles waveform alignment, ADR sync, and noise reduction in real time without taxing your local machine. Session state, take metadata, and mix settings are persisted server-side so your project stays intact across devices.
Skill attribution (read from this file's YAML frontmatter at runtime):
- X-Skill-Source: voiceover-app
- X-Skill-Version: from frontmatter version
- X-Skill-Platform: detect from install path (~/.clawhub/ → clawhub, ~/.cursor/skills/ → cursor, else unknown)
All requests must include: Authorization: Bearer <NEMO_TOKEN>, X-Skill-Source, X-Skill-Version, X-Skill-Platform. Missing attribution headers will cause export to fail with 402.
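A minimal sketch of assembling the four required headers. It assumes the source value is voiceover-app and that the caller has already read `version` from the YAML frontmatter (frontmatter parsing is left out); the function names are hypothetical.

```python
import os

# Install-path prefixes from the attribution rules, checked in order.
PLATFORM_RULES = (("~/.clawhub/", "clawhub"), ("~/.cursor/skills/", "cursor"))

def detect_platform(install_path: str) -> str:
    """Map the skill's install path to an X-Skill-Platform value."""
    path = os.path.expanduser(install_path)
    for prefix, name in PLATFORM_RULES:
        if path.startswith(os.path.expanduser(prefix)):
            return name
    return "unknown"

def build_headers(token: str, version: str, install_path: str) -> dict:
    """Auth plus attribution headers; export fails with 402 without the latter."""
    return {
        "Authorization": f"Bearer {token}",
        "X-Skill-Source": "voiceover-app",
        "X-Skill-Version": version,
        "X-Skill-Platform": detect_platform(install_path),
    }
```

Expanding both the install path and the rule prefixes means `~/...` and absolute home paths compare the same way.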
API base: https://mega-api-prod.nemovideo.ai
Create session: POST /api/tasks/me/with-session/nemo_agent — body {"task_name":"project","language":"<lang>"} — returns task_id, session_id.
Send message (SSE): POST /run_sse — body {"app_name":"nemo_agent","user_id":"me","session_id":"<sid>","new_message":{"parts":[{"text":"<msg>"}]}} with Accept: text/event-stream. Max timeout: 15 minutes.
Upload: POST /api/upload-video/nemo_agent/me/<sid>. File: multipart -F "files=@/path", or URL: {"urls":["<url>"],"source_type":"url"}
Credits: GET /api/credits/balance/simple; returns available, frozen, total
Session state: GET /api/state/nemo_agent/me/<sid>/latest; key fields: data.state.draft, data.state.video_infos, data.state.generated_media
Export (free, no credits): POST /api/render/proxy/lambda — body {"id":"render_<ts>","sessionId":"<sid>","draft":<json>,"output":{"format":"mp4","quality":"high"}}. Poll GET /api/render/proxy/lambda/<id> every 30s until status = completed. Download URL at output.url.
Supported formats: mp4, mov, avi, webm, mkv, jpg, png, gif, webp, mp3, wav, m4a, aac.
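The export step (build the render body, then poll every 30 seconds until completion) can be sketched as below. Assumptions: `<ts>` is a Unix timestamp in seconds, `status` and `output.url` sit at the top level of the poll response, and `fetch_status` is a hypothetical callable that performs GET /api/render/proxy/lambda/<id> and returns decoded JSON. The 40-poll cap (about 20 minutes) is also an assumption.

```python
import time
from typing import Callable

def build_export_body(session_id: str, draft: dict) -> dict:
    """Body for POST /api/render/proxy/lambda (export costs no credits)."""
    return {
        "id": f"render_{int(time.time())}",  # assumption: <ts> is Unix seconds
        "sessionId": session_id,
        "draft": draft,
        "output": {"format": "mp4", "quality": "high"},
    }

def poll_export(fetch_status: Callable[[str], dict], render_id: str,
                interval_s: float = 30.0, max_polls: int = 40) -> str:
    """Poll until status == "completed", then return the download URL."""
    for _ in range(max_polls):
        body = fetch_status(render_id)
        if body.get("status") == "completed":
            return body["output"]["url"]
        time.sleep(interval_s)  # wait before the next poll
    raise TimeoutError(f"render {render_id} did not complete")
```

Injecting `fetch_status` keeps the polling loop independent of any HTTP client and easy to exercise with canned responses.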
SSE Event Handling
| Event | Action |
|---|---|
| Text response | Apply GUI translation (§4), present to user |
| Tool call/result | Process internally, don't forward |
| heartbeat / empty data: | Keep waiting. Every 2 min: "⏳ Still working..." |
| Stream closes | Process final response |
~30% of editing operations return no text in the SSE stream. When this happens: poll session state to verify the edit was applied, then summarize changes to the user.
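A sketch of the SSE side: building the /run_sse body and filtering the stream per the event table. The event payload shape is an assumption (JSON objects whose user-facing text arrives as content.parts[].text, with tool calls and results in other part types); verify it against the real stream. The two-minute "Still working" notice is timer logic and is left out.

```python
import json
from typing import Iterable, Iterator, List

def build_run_sse_body(session_id: str, text: str) -> dict:
    """Body for POST /run_sse (sent with Accept: text/event-stream)."""
    return {
        "app_name": "nemo_agent",
        "user_id": "me",
        "session_id": session_id,
        "new_message": {"parts": [{"text": text}]},
    }

def iter_sse_data(lines: Iterable[str]) -> Iterator[str]:
    """Yield one payload per SSE event; multiple data: lines are joined with newlines."""
    buf: List[str] = []
    for raw in lines:
        line = raw.rstrip("\r\n")
        if line == "":  # a blank line ends the current event
            if buf:
                yield "\n".join(buf)
                buf = []
        elif line.startswith("data:"):
            value = line[5:]
            buf.append(value[1:] if value.startswith(" ") else value)
        # comment lines (":...") and other fields are ignored
    if buf:
        yield "\n".join(buf)

def collect_text(lines: Iterable[str]) -> List[str]:
    """Keep user-facing text; skip heartbeats, empty data, and tool calls/results."""
    texts: List[str] = []
    for payload in iter_sse_data(lines):
        if not payload.strip():
            continue  # heartbeat / empty data: keep waiting
        try:
            event = json.loads(payload)
        except json.JSONDecodeError:
            continue
        content = event.get("content") if isinstance(event, dict) else None
        if not isinstance(content, dict):
            continue
        for part in content.get("parts") or []:
            if isinstance(part, dict) and "text" in part:
                texts.append(part["text"])  # non-text (tool) parts are not forwarded
    return texts
```

An empty result from `collect_text` is the no-text case described above: fall back to GET /api/state/nemo_agent/me/<sid>/latest and summarize the changes. With urllib the response yields bytes, so decode each line before passing it in.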
Backend Response Translation
The backend assumes a GUI exists. Translate these into API actions:
| Backend says | You do |
|---|---|
| "click [button]" / "点击" | Execute via API |
| "open [panel]" / "打开" | Query session state |
| "drag/drop" / "拖拽" | Send edit via SSE |
| "preview in timeline" | Show track summary |
| "Export button" / "导出" | Execute export workflow |
Draft field mapping: t=tracks, tt=track type (0=video, 1=audio, 7=text), sg=segments, d=duration(ms), m=metadata.
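Example track summary:
Timeline (3 tracks):
1. Video: city timelapse (0-10s)
2. BGM: Lo-fi (0-10s, 35%)
3. Title: "Urban Dreams" (0-3s)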
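The draft field mapping can be decoded into a plain track summary for the user. This sketch assumes the draft nests as the abbreviations suggest: t is a list of tracks, each with tt and sg, and each segment has d (milliseconds) and optional m. Summing segment durations per track is a simplification that ignores gaps and overlaps.

```python
TRACK_TYPES = {0: "video", 1: "audio", 7: "text"}

def summarize_draft(draft: dict) -> list:
    """One line per track: type, segment count, summed duration in seconds."""
    tracks = draft.get("t", [])
    lines = [f"Timeline ({len(tracks)} tracks):"]
    for i, track in enumerate(tracks, start=1):
        kind = TRACK_TYPES.get(track.get("tt"), f"type {track.get('tt')}")
        segments = track.get("sg", [])
        total_ms = sum(seg.get("d", 0) for seg in segments)
        lines.append(f"{i}. {kind}: {len(segments)} segment(s), {total_ms / 1000:g}s")
    return lines
```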
Error Handling
| Code | Meaning | Action |
|---|---|---|
| 0 | Success | Continue |
| 1001 | Bad/expired token | Re-auth via anonymous-token (tokens expire after 7 days) |
| 1002 | Session not found | New session §3.0 |
| 2001 | No credits | Anonymous: show registration URL with ?bind=<id> (get <id> from create-session or state response when needed). Registered: "Top up credits in your account" |
| 4001 | Unsupported file | Show supported formats |
| 4002 | File too large | Suggest compress/trim |
| 400 | Missing X-Client-Id | Generate Client-Id and retry (see §1) |
| 402 | Free plan export blocked | Subscription tier issue, NOT credits. "Register or upgrade your plan to unlock export." |
| 429 | Rate limit (1 token/client/7 days) | Retry in 30s once |
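The error table can be encoded as a small dispatch function. The returned action names are hypothetical labels for the table's Action column; the two context-dependent rows (2001 and 429) take extra flags.

```python
def next_step(code: int, anonymous: bool = True, already_retried: bool = False) -> str:
    """Map a backend code to the action named in the error table."""
    if code == 2001:
        return "show_registration_url" if anonymous else "ask_to_top_up"
    if code == 429:
        # The table allows exactly one retry, after 30 seconds.
        return "give_up" if already_retried else "retry_after_30s"
    return {
        0: "continue",
        1001: "reauth_anonymous_token",
        1002: "create_new_session",
        4001: "show_supported_formats",
        4002: "suggest_compress_or_trim",
        400: "generate_client_id_and_retry",
        402: "ask_to_register_or_upgrade",  # plan tier issue, not credits
    }.get(code, "report_unknown_error")
```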
Frequently Asked Questions
Can this skill write voiceover scripts for any type of video? Yes. Whether you're producing a YouTube tutorial, a corporate training video, a real estate walkthrough, or a short film narration, the voiceover-app skill adapts to your format, audience, and tone. Just describe your video and what you want viewers to feel or understand.
How do I get a script that matches my video's length? Provide the target duration (e.g., 60 seconds, 3 minutes) and the skill will pace the script accordingly. A general rule is roughly 130–150 words per minute for natural spoken delivery, and the output respects that rhythm.
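The 130-150 words-per-minute rule turns a target duration into a word budget; for a 90-second script that is 1.5 × 130 = 195 to 1.5 × 150 = 225 words. A small helper, with the 140 wpm midpoint for timing an existing draft chosen as an assumption:

```python
def word_budget(seconds: float, wpm_low: int = 130, wpm_high: int = 150) -> tuple:
    """Word range that fits `seconds` of natural spoken delivery."""
    minutes = seconds / 60
    return round(minutes * wpm_low), round(minutes * wpm_high)

def spoken_seconds(script: str, wpm: int = 140) -> float:
    """Estimated read time of a script at a steady pace."""
    return len(script.split()) * 60 / wpm
```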
Can I use this to edit a voiceover script I've already written? Absolutely. Paste in your existing draft and describe what isn't working — too formal, too long, awkward transitions — and the skill will revise it while preserving your original intent.
Does this skill help with multi-voice or interview-style formats? Yes. You can request scripts formatted for two speakers, a host-and-guest structure, or even documentary-style narration with alternating voices and natural pause cues built in.
Integration Guide
Fitting voiceover-app into your video production workflow is straightforward once you know where it slots in best. Most creators use it at two key stages: pre-production (scripting before recording) and post-production (refining narration after a rough cut is assembled).
Pre-production use: Before you record a single word, use the skill to generate a full voiceover script from your video outline or storyboard. Export the script as plain text and load it into your teleprompter app, or share it directly with your voice actor alongside the video brief.
Post-production use: Once your edit is locked, paste in your timeline notes or scene descriptions and request a timestamped script. This gives you precise cue points — for example, 'at 0:42, transition to product close-up' — so your narration lands exactly where it should in the final cut.
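A rough way to draft timestamped cue points is to assume a steady reading pace and label each sentence with its estimated start time in the same m:ss style. This is a sketch, not how the skill computes timing: sentence splitting on ., !, or ? followed by whitespace and the 140 wpm default are assumptions, and real cues should be checked against the locked edit.

```python
import re

def cue_points(script: str, wpm: int = 140, start_s: float = 0.0) -> list:
    """Prefix each sentence with an estimated m:ss start time."""
    sentences = [s.strip() for s in re.split(r"(?<=[.!?])\s+", script) if s.strip()]
    t = start_s
    cues = []
    for sentence in sentences:
        minutes, seconds = divmod(int(t), 60)
        cues.append(f"{minutes}:{seconds:02d} {sentence}")
        t += len(sentence.split()) * 60 / wpm  # advance by this sentence's read time
    return cues
```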
Tool compatibility: The plain-text outputs from voiceover-app work cleanly with tools like Adobe Premiere Pro's captions panel, DaVinci Resolve's subtitle track, CapCut, and any teleprompter or recording app that accepts text input. No special formatting or conversion is needed.