Getting Started

LipSync Video AI is ready. Upload your video and audio, or describe what you need synced.

Try saying:

- "sync this voiceover to the speaker"
- "replace the audio and match lip movements"
- "dub this clip with my recording"

Initial Setup

On first run, the skill connects to the processing backend and shows a brief "Getting ready..." message.

Token: Check the environment for NEMO_TOKEN. If it is set, skip straight to session setup.

1. Get a free token: Generate a UUID as the client identifier. POST to https://mega-api-prod.nemovideo.ai/api/auth/anonymous-token with that UUID in the X-Client-Id header. The response field data.token is your auth token (100 credits, valid for 7 days).
2. Start a session: POST to https://mega-api-prod.nemovideo.ai/api/tasks/me/with-session/nemo_agent with Bearer auth and the body {"task_name":"project","language":"<lang>"}. Save the returned session_id for later calls.

Raw JSON and tokens stay hidden from the user.
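The two setup steps above can be sketched with Python's standard library. This is a minimal sketch, assuming the endpoints and body fields exactly as listed; the helper names are hypothetical, and nothing about the responses beyond data.token is relied on. The X-Skill-* attribution headers described under Backend Architecture are left out for brevity and would need to be added to real calls.

```python
import json
import os
import uuid
import urllib.request

# Root URL used by both setup calls.
BASE_URL = "https://mega-api-prod.nemovideo.ai"


def token_request(client_id: str) -> urllib.request.Request:
    """Step 1: request an anonymous token, identified by a client UUID."""
    return urllib.request.Request(
        f"{BASE_URL}/api/auth/anonymous-token",
        method="POST",
        headers={"X-Client-Id": client_id},
    )


def session_request(token: str, language: str) -> urllib.request.Request:
    """Step 2: open a nemo_agent session; the response carries task_id and session_id."""
    body = {"task_name": "project", "language": language}
    return urllib.request.Request(
        f"{BASE_URL}/api/tasks/me/with-session/nemo_agent",
        data=json.dumps(body).encode("utf-8"),
        method="POST",
        headers={
            "Authorization": f"Bearer {token}",
            "Content-Type": "application/json",
        },
    )


def get_token() -> str:
    """Reuse NEMO_TOKEN when it is set; otherwise fetch a fresh anonymous token."""
    token = os.environ.get("NEMO_TOKEN")
    if token:
        return token
    with urllib.request.urlopen(token_request(str(uuid.uuid4()))) as resp:
        return json.load(resp)["data"]["token"]
```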

Sync Audio to Lip Movements in Your Clips

Upload your video along with the audio you want synced. Cloud GPUs do the heavy lifting, so nothing is processed locally.

Here is how it works in practice: I had a training video where the speaker's mic died halfway through. I recorded a clean voiceover separately, uploaded both files, typed "sync the new audio to match the speaker's mouth movements", and got a clean result in about 75 seconds. The output was a 1080p MP4.

Pro tip: shorter clips give tighter sync. If you have a long video, consider breaking it into segments first.
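One way to do that split locally, assuming ffmpeg is installed (it is not part of this skill), is ffmpeg's segment muxer with stream copy. The hypothetical helper below only builds the command. The 60-second default is an arbitrary choice, and because streams are copied rather than re-encoded, cuts land on keyframes, so pieces come out roughly (not exactly) that long.

```python
from pathlib import Path


def split_command(src: Path, seconds: int = 60) -> list:
    """Build an ffmpeg command that cuts src into ~`seconds`-long pieces."""
    # e.g. training.mp4 -> training_part000.mp4, training_part001.mp4, ...
    pattern = src.with_name(f"{src.stem}_part%03d{src.suffix}")
    return [
        "ffmpeg", "-i", str(src),
        "-c", "copy",                  # no re-encode: fast, cuts on keyframes
        "-f", "segment",               # ffmpeg's segment muxer
        "-segment_time", str(seconds),
        "-reset_timestamps", "1",      # each piece starts at t=0
        str(pattern),
    ]

# usage: subprocess.run(split_command(Path("training.mp4")), check=True)
```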

Request Categories

Your input gets matched to the right processing path automatically.

You type...	Goes to...	Uses SSE?
"export" / "download" / "get video" / "导出"	Export pipeline	No
"credits" / "balance" / "remaining" / "积分"

Backend Architecture

Files are sent to a GPU farm for processing. Output is 1080p, encoded at 8 Mbps. Lip-sync boundaries are accurate to the frame.

Every request must include Authorization: Bearer <NEMO_TOKEN> plus the attribution headers X-Skill-Source, X-Skill-Version, and X-Skill-Platform. If attribution is missing, export fails with HTTP 402.

Attribution values come from this file's YAML frontmatter: X-Skill-Source is lipsyncvideo-ai, X-Skill-Version is the version listed in the frontmatter, and X-Skill-Platform depends on the install location (~/.clawhub/ → clawhub, ~/.cursor/skills/ → cursor, anything else → unknown).
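A sketch of how these headers could be assembled, assuming the caller already knows the skill's install directory and the frontmatter version (both are passed in; any version string in a usage example is only a placeholder). The function names are hypothetical. The path check uses Path.is_relative_to, which needs Python 3.9 or later.

```python
from pathlib import Path
from typing import Optional

SKILL_SOURCE = "lipsyncvideo-ai"


def detect_platform(install_dir: Path, home: Path) -> str:
    """Map the skill's install location to an X-Skill-Platform value."""
    rules = [
        (home / ".clawhub", "clawhub"),
        (home / ".cursor" / "skills", "cursor"),
    ]
    for root, platform in rules:
        if install_dir.is_relative_to(root):  # Python 3.9+
            return platform
    return "unknown"


def auth_headers(token: str, version: str, install_dir: Path,
                 home: Optional[Path] = None) -> dict:
    """Bearer auth plus the three attribution headers every request needs."""
    home = home if home is not None else Path.home()
    return {
        "Authorization": f"Bearer {token}",
        "X-Skill-Source": SKILL_SOURCE,
        "X-Skill-Version": version,  # copy from the frontmatter version field
        "X-Skill-Platform": detect_platform(install_dir, home),
    }
```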

Root URL: https://mega-api-prod.nemovideo.ai

New session: POST /api/tasks/me/with-session/nemo_agent with {"task_name":"project","language":"<lang>"}. Returns task_id, session_id.

SSE message: POST /run_sse with {"app_name":"nemo_agent","user_id":"me","session_id":"<sid>","new_message":{"parts":[{"text":"<msg>"}]}} and the header Accept: text/event-stream. Each stream is capped at 15 minutes.
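A minimal sketch of the message call, assuming Python's standard library and a token and session_id obtained during setup. The payload mirrors the documented body; the helper names and the line-by-line reader are illustrative, not part of any published client, and attribution headers are again omitted.

```python
import json
import time
import urllib.request
from typing import Iterator

BASE_URL = "https://mega-api-prod.nemovideo.ai"
STREAM_CAP_SECONDS = 15 * 60  # documented cap on one SSE exchange


def sse_request(token: str, session_id: str, text: str) -> urllib.request.Request:
    """Build the POST /run_sse call that sends one user message to the agent."""
    payload = {
        "app_name": "nemo_agent",
        "user_id": "me",
        "session_id": session_id,
        "new_message": {"parts": [{"text": text}]},
    }
    return urllib.request.Request(
        f"{BASE_URL}/run_sse",
        data=json.dumps(payload).encode("utf-8"),
        method="POST",
        headers={
            "Authorization": f"Bearer {token}",
            "Content-Type": "application/json",
            "Accept": "text/event-stream",
        },
    )


def stream_lines(req: urllib.request.Request,
                 cap: float = STREAM_CAP_SECONDS) -> Iterator[str]:
    """Yield decoded response lines until the stream ends or the cap passes.

    The deadline is only checked as lines arrive; a completely silent
    connection is not interrupted by this check.
    """
    deadline = time.monotonic() + cap
    with urllib.request.urlopen(req) as resp:
        for raw in resp:
            if time.monotonic() > deadline:
                break
            yield raw.decode("utf-8").rstrip("\r\n")
```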

File upload: POST /api/upload-video/nemo_agent/me/<sid>, either multipart (-F "files=@/path") or URL mode ({"urls":["<url>"],"source_type":"url"}).
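Both upload modes can be sketched with the standard library. This assumes the backend accepts a hand-built multipart body equivalent to what curl -F "files=@/path" sends (one form field named files); the helper names are hypothetical and attribution headers are omitted.

```python
import json
import uuid
import urllib.request
from pathlib import Path
from typing import Iterable

BASE_URL = "https://mega-api-prod.nemovideo.ai"


def upload_endpoint(session_id: str) -> str:
    return f"{BASE_URL}/api/upload-video/nemo_agent/me/{session_id}"


def url_upload_request(token: str, session_id: str,
                       urls: Iterable[str]) -> urllib.request.Request:
    """URL mode: the backend fetches the media itself."""
    body = {"urls": list(urls), "source_type": "url"}
    return urllib.request.Request(
        upload_endpoint(session_id),
        data=json.dumps(body).encode("utf-8"),
        method="POST",
        headers={"Authorization": f"Bearer {token}",
                 "Content-Type": "application/json"},
    )


def file_upload_request(token: str, session_id: str,
                        path: Path) -> urllib.request.Request:
    """Multipart mode with a single "files" field.

    Reads the whole file into memory, which is fine for a sketch but not
    for files near the 500MB limit.
    """
    boundary = uuid.uuid4().hex
    head = (
        f"--{boundary}\r\n"
        f'Content-Disposition: form-data; name="files"; filename="{path.name}"\r\n'
        "Content-Type: application/octet-stream\r\n\r\n"
    ).encode("utf-8")
    tail = f"\r\n--{boundary}--\r\n".encode("utf-8")
    return urllib.request.Request(
        upload_endpoint(session_id),
        data=head + path.read_bytes() + tail,
        method="POST",
        headers={"Authorization": f"Bearer {token}",
                 "Content-Type": f"multipart/form-data; boundary={boundary}"},
    )
```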

Balance: GET /api/credits/balance/simple returns available, frozen, total.

State: GET /api/state/nemo_agent/me/<sid>/latest, then check data.state.draft, data.state.video_infos, and data.state.generated_media.
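A sketch of the state check, useful for confirming that an edit was applied. It assumes the response nests the fields under data.state as listed and that video_infos and generated_media are lists (their element shape is not documented here, so they are passed through untouched). Helper names are hypothetical.

```python
import json
import urllib.request

BASE_URL = "https://mega-api-prod.nemovideo.ai"


def state_request(token: str, session_id: str) -> urllib.request.Request:
    """GET the latest session state (no body, so urllib sends a GET)."""
    return urllib.request.Request(
        f"{BASE_URL}/api/state/nemo_agent/me/{session_id}/latest",
        headers={"Authorization": f"Bearer {token}"},
    )


def fetch_state(token: str, session_id: str) -> dict:
    with urllib.request.urlopen(state_request(token, session_id)) as resp:
        return json.load(resp)


def state_fields(body: dict) -> dict:
    """Pick out the three fields worth checking; missing ones come back empty."""
    state = (body.get("data") or {}).get("state") or {}
    return {
        "draft": state.get("draft"),
        "video_infos": state.get("video_infos") or [],
        "generated_media": state.get("generated_media") or [],
    }
```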

Export (free): POST /api/render/proxy/lambda with {"id":"render_<ts>","sessionId":"<sid>","draft":<json>,"output":{"format":"mp4","quality":"high"}}. Poll GET /api/render/proxy/lambda/<id> every 30 seconds until status is completed; the finished file is at output.url.
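The export flow as a sketch: build the render body, then poll until the job completes. It assumes status and output.url sit at the top level of the poll response (the text does not say whether they are wrapped in data), that the draft is the one read from data.state.draft, and that <ts> is a timestamp such as int(time.time()). The max_polls safety limit is a hypothetical addition. The export POST itself must also carry the attribution headers, or it fails with 402. The fetch function is injectable so the polling logic can be exercised without the network.

```python
import json
import time
import urllib.request
from typing import Callable

BASE_URL = "https://mega-api-prod.nemovideo.ai"


def render_payload(session_id: str, draft: dict, ts: int) -> dict:
    """Body for POST /api/render/proxy/lambda."""
    return {
        "id": f"render_{ts}",
        "sessionId": session_id,
        "draft": draft,
        "output": {"format": "mp4", "quality": "high"},
    }


def http_status_fetcher(token: str) -> Callable[[str], dict]:
    """Return a function that GETs the render job and decodes its JSON."""
    def fetch(render_id: str) -> dict:
        req = urllib.request.Request(
            f"{BASE_URL}/api/render/proxy/lambda/{render_id}",
            headers={"Authorization": f"Bearer {token}"},
        )
        with urllib.request.urlopen(req) as resp:
            return json.load(resp)
    return fetch


def wait_for_render(fetch: Callable[[str], dict], render_id: str,
                    interval: float = 30.0, max_polls: int = 60,
                    sleep: Callable[[float], None] = time.sleep) -> str:
    """Poll every `interval` seconds until status is completed; return output.url."""
    for _ in range(max_polls):
        job = fetch(render_id)
        if job.get("status") == "completed":
            return job["output"]["url"]
        sleep(interval)
    raise TimeoutError(f"{render_id} not completed after {max_polls} polls")
```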

Supported formats: mp4, mov, avi, webm, mkv, jpg, png, gif, webp, mp3, wav, m4a, aac.
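A client-side pre-check can reject unsupported files before uploading. This sketch knows only the extensions listed above and groups them as video, image, or audio (the grouping is an assumption, not a documented category). It matches extensions exactly, so a variant such as .jpeg is rejected even though the backend might accept it.

```python
from pathlib import Path

VIDEO_EXTS = {"mp4", "mov", "avi", "webm", "mkv"}
IMAGE_EXTS = {"jpg", "png", "gif", "webp"}
AUDIO_EXTS = {"mp3", "wav", "m4a", "aac"}


def media_kind(filename: str) -> str:
    """Classify an upload by extension; raise ValueError for anything unlisted."""
    ext = Path(filename).suffix.lower().lstrip(".")
    for kind, exts in (("video", VIDEO_EXTS), ("image", IMAGE_EXTS), ("audio", AUDIO_EXTS)):
        if ext in exts:
            return kind
    raise ValueError(f"unsupported format: {filename}")
```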

Errors

Code	Means	Fix
0	Success	Continue
1001

Converting GUI Instructions

Backend output sometimes refers to controls in a visual interface. Convert those instructions:

Backend output	Your action
"click [X]" / "点击"	Invoke the API equivalent
"open [panel]" / "打开"

How SSE Works

Forward text events to the user (after GUI translation). Absorb tool calls silently. Heartbeats and empty data lines mean the job is still processing. After every 2 minutes without output, tell the user "Hang on, still processing..."

About 30% of edit operations return no text. If the stream closes without any text, check the session state to confirm the edit was applied, then tell the user.
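The stream-handling rules above could be implemented roughly like this. It is a sketch under two assumptions: lines arrive already decoded, and the JSON shape that distinguishes text events from tool calls is not documented here, so the caller supplies a hypothetical extract_text function that returns user-facing text or None. Tool calls are absorbed simply by returning None for them. The parser follows the basic event-stream rules: data: lines accumulate, a blank line dispatches the event, and lines starting with a colon are comments.

```python
from typing import Callable, Iterable, Iterator, Optional

QUIET_NOTICE_SECONDS = 120  # "every 2 minutes of quiet"


def sse_events(lines: Iterable[str]) -> Iterator[Optional[str]]:
    """Group raw event-stream lines into payloads (simplified SSE parsing).

    Yields the joined data of each dispatched event, or None for heartbeat
    comments and empty events, which both mean "still processing".
    """
    data: list = []
    for line in lines:
        if line == "":                  # blank line ends the current event
            payload = "\n".join(data)
            data = []
            yield payload or None
        elif line.startswith(":"):      # comment line, typically a keep-alive
            yield None
        elif line.startswith("data:"):
            value = line[5:]
            data.append(value[1:] if value.startswith(" ") else value)
        # event:, id: and retry: fields are ignored in this sketch
    # an unterminated event at end of stream is discarded, as the SSE spec requires


def relay(events: Iterable[Optional[str]],
          extract_text: Callable[[str], Optional[str]],
          notify: Callable[[str], None],
          now: Callable[[], float]) -> bool:
    """Forward text, swallow everything else, and nudge the user on long silence.

    Returns True if any text was forwarded; False means the caller should
    check session state before reporting the result.
    """
    last_spoken = now()
    saw_text = False
    for payload in events:
        text = extract_text(payload) if payload else None
        if text:
            notify(text)
            last_spoken = now()
            saw_text = True
        elif now() - last_spoken >= QUIET_NOTICE_SECONDS:
            # only checked when some event (even a heartbeat) arrives
            notify("Hang on, still processing...")
            last_spoken = now()
    return saw_text
```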

Draft keys: t (tracks), tt (track type: 0=video, 1=audio, 7=text), sg (segments), d (duration, ms), m (metadata).

```text
Timeline (2 tracks):
1. Video: interview clip (0-45s)
2. Audio: voiceover narration (0-45s)
```
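Given those keys, a draft could be turned into a short per-track summary. This sketch assumes t is a list of tracks, each track holds its segments under sg, and each segment carries its duration in d (milliseconds). Where m or start offsets live is not documented, so they are ignored and segment durations are simply summed. The function name is hypothetical.

```python
TRACK_TYPES = {0: "Video", 1: "Audio", 7: "Text"}


def summarize_draft(draft: dict) -> str:
    """Render a draft as 'Timeline (N tracks):' followed by one line per track."""
    tracks = draft.get("t", [])
    lines = [f"Timeline ({len(tracks)} tracks):"]
    for i, track in enumerate(tracks, start=1):
        kind = TRACK_TYPES.get(track.get("tt"), f"type {track.get('tt')}")
        segments = track.get("sg", [])
        total_ms = sum(seg.get("d", 0) for seg in segments)
        lines.append(f"{i}. {kind}: {len(segments)} segment(s), {total_ms / 1000:g}s")
    return "\n".join(lines)
```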

Common Workflows

Basic lip sync: Upload video + audio, ask for sync. Done.

Audio replacement: Upload new audio, tell the skill to swap it in and match the mouth movements.

Multi-speaker: Works best when speakers take turns. For overlapping speech, split into separate segments first.

FAQ

How accurate is the sync? Frame-level for clear speech. Mumbled or very fast speech may be slightly off.

What audio formats? MP3, WAV, M4A, AAC all work.

File size limit? 500MB. Compress if you're over.

Cost? The anonymous token comes with 100 free credits, valid for 7 days. No signup required.
