Getting Started
Welcome! Ready to turn your song into a scroll-stopping lyric video? Drop your video file, share your lyrics, and tell me the vibe you're going for — let's make your music impossible to ignore.
Try saying:
- "Create a lyric video for my pop track using this mp4 background, with bold white text that fades in line by line on each beat"
- "Make a lyric video with a dark moody aesthetic, neon pink lyrics, and smooth slide-in transitions for each verse"
- "Generate a lyric video from my live concert footage with the chorus lyrics highlighted in a different color than the verses"
Quick Start Setup
This skill connects to NemoVideo's cloud backend. On first use, set up the connection automatically and let the user know ("Connecting to NemoVideo...").
Token check: Look for NEMO_TOKEN in the environment. If found, skip to session creation. Otherwise:
- Read ~/.config/nemovideo/client_id, or generate a UUID and save it there.
- POST https://mega-api-prod.nemovideo.ai/api/auth/anonymous-token with an X-Client-Id header.
- Extract data.token from the response. This is your NEMO_TOKEN (100 free credits, 7-day expiry).
Session: POST /api/tasks/me/with-session/nemo_agent at the same host with Bearer auth and body {"task_name":"project"}. Keep the returned session_id for all operations.
Let the user know with a brief "Ready!" when setup is complete. Don't expose tokens or raw API output.
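The setup steps above can be sketched roughly as follows. This is a minimal illustration rather than the skill's actual implementation: the helper names are hypothetical, the JSON Content-Type header is an assumption, and the requests are only built here, not sent.

```python
import json
import uuid
from pathlib import Path
from urllib import request

API_BASE = "https://mega-api-prod.nemovideo.ai"


def load_or_create_client_id(path: Path) -> str:
    """Return the stored client ID, or generate a UUID and save it at `path`."""
    if path.exists():
        stored = path.read_text(encoding="utf-8").strip()
        if stored:
            return stored
    client_id = str(uuid.uuid4())
    path.parent.mkdir(parents=True, exist_ok=True)
    path.write_text(client_id, encoding="utf-8")
    return client_id


def build_anonymous_token_request(client_id: str) -> request.Request:
    """Build (but do not send) the anonymous-token POST."""
    return request.Request(
        f"{API_BASE}/api/auth/anonymous-token",
        method="POST",
        headers={"X-Client-Id": client_id},
    )


def build_session_request(token: str) -> request.Request:
    """Build (but do not send) the session-creation POST."""
    body = json.dumps({"task_name": "project"}).encode("utf-8")
    return request.Request(
        f"{API_BASE}/api/tasks/me/with-session/nemo_agent",
        data=body,
        method="POST",
        headers={
            "Authorization": f"Bearer {token}",
            # Assumption: the backend expects a JSON content type.
            "Content-Type": "application/json",
        },
    )
```

A caller would pass `Path.home() / ".config/nemovideo/client_id"` as the path, send the first request, read `data.token` from the JSON response, and then send the second.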
Bring Your Lyrics to Life on Screen
Every song tells a story, and this skill makes sure your audience reads every word at exactly the right moment. The Lyric Video Maker lets you take any music track paired with a video background — whether it's a live performance clip, abstract visualizer footage, or a simple color gradient — and overlay your lyrics with precise, beat-matched timing.
Unlike generic subtitle tools, this skill is built specifically for music content. You can control how each line of text enters and exits the frame, choose from bold display fonts or elegant script styles, and adjust colors to complement your album artwork or brand palette. The result feels intentional and crafted, not auto-generated.
Whether you're releasing a new single, building a YouTube presence, or creating lyric content for Instagram Reels and TikTok, this tool meets you where you are. No timeline scrubbing, no keyframe headaches — just upload your video, paste your lyrics, describe your preferred style, and let the skill handle the rest.
Routing Your Lyric Sync Requests
When you drop in a track and paste your lyrics, the skill parses your timing cues, animation style preferences, and font choices to route each request to the correct rendering pipeline.
| User says... | Action | Skip SSE? |
|---|---|---|
| "export" / "导出" / "download" / "send me the video" | → §3.5 Export | ✅ |
| "credits" / "积分" / "balance" / "余额" | → §3.3 Credits | ✅ |
| "status" / "状态" / "show tracks" | → §3.4 State | ✅ |
| "upload" / "上传" / user sends file | → §3.2 Upload | ✅ |
| Everything else (generate, edit, add BGM…) | → §3.1 SSE | ❌ |
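One way to read the routing table is as a keyword lookup with SSE as the fallback. The sketch below assumes simple case-insensitive substring matching, which the table does not specify, and the action labels are hypothetical.

```python
# Keyword groups taken from the routing table; matching strategy is an assumption.
ROUTES = [
    ("export", ("export", "导出", "download", "send me the video")),
    ("credits", ("credits", "积分", "balance", "余额")),
    ("state", ("status", "状态", "show tracks")),
    ("upload", ("upload", "上传")),
]


def route(message: str, has_file: bool = False) -> str:
    """Return the action for a user message; anything unmatched goes to SSE."""
    if has_file:
        return "upload"  # "user sends file" row
    text = message.lower()
    for action, keywords in ROUTES:
        if any(keyword in text for keyword in keywords):
            return action
    return "sse"


def skips_sse(action: str) -> bool:
    """Only the catch-all route goes through the SSE endpoint."""
    return action != "sse"
```

For example, `route("Please export my video")` yields `"export"`, while a styling request such as `route("make the chorus pink")` falls through to `"sse"`.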
NemoVideo Backend API Reference
The NemoVideo backend handles frame-accurate lyric stamping, beat-sync detection, and animated text rendering — every syllable marker and transition effect you set gets processed through its video composition engine. Calls are authenticated per session, so your project state, timeline edits, and export queue persist until the session closes.
Skill attribution — read from this file's YAML frontmatter at runtime:
- X-Skill-Source: lyric-video-maker
- X-Skill-Version: from frontmatter version
- X-Skill-Platform: detect from install path (~/.clawhub/ → clawhub, ~/.cursor/skills/ → cursor, else unknown)
All requests must include: Authorization: Bearer <NEMO_TOKEN>, X-Skill-Source, X-Skill-Version, X-Skill-Platform. Missing attribution headers will cause export to fail with 402.
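As an illustration of the header requirements, the following sketch assembles the four headers and detects the platform from an install path. The function names are hypothetical, and the substring check on the path is an assumption about how detection might be done.

```python
def detect_platform(install_path: str) -> str:
    """Map an install path to a platform label, defaulting to 'unknown'."""
    normalized = install_path.replace("\\", "/")
    if "/.clawhub/" in normalized:
        return "clawhub"
    if "/.cursor/skills/" in normalized:
        return "cursor"
    return "unknown"


def build_headers(token: str, skill_source: str, skill_version: str,
                  install_path: str) -> dict:
    """Return the headers that every request is expected to carry."""
    return {
        "Authorization": f"Bearer {token}",
        "X-Skill-Source": skill_source,
        "X-Skill-Version": skill_version,  # read from the YAML frontmatter
        "X-Skill-Platform": detect_platform(install_path),
    }
```

Building the headers in one place makes it harder to send a request without the attribution fields, which the text above says leads to a 402 on export.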
API base: https://mega-api-prod.nemovideo.ai
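Create session: POST /api/tasks/me/with-session/nemo_agent with body {"task_name":"project","language":"<lang>"}. Returns task_id and session_id. After creating a session, give the user a link: https://nemovideo.com/workspace/claim?token=<NEMO_TOKEN>&task=<task_id>&session=<session_id>&skill_name=lyric-video-maker&skill_version=1.0.0&skill_source=<platform>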
Send message (SSE): POST /run_sse — body {"app_name":"nemo_agent","user_id":"me","session_id":"<sid>","new_message":{"parts":[{"text":"<msg>"}]}} with Accept: text/event-stream. Max timeout: 15 minutes.
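The message body above is easy to get wrong by hand, so here is a small sketch that builds it. The function name is hypothetical and the JSON Content-Type header is an assumption; the field names and the 15-minute ceiling come from the description above.

```python
import json

# Maximum time to keep the stream open, per the 15-minute limit.
SSE_TIMEOUT_SECONDS = 15 * 60


def build_run_sse_request(session_id: str, message: str):
    """Return (headers, body_bytes) for POST /run_sse."""
    headers = {
        "Accept": "text/event-stream",
        "Content-Type": "application/json",  # assumption
    }
    body = {
        "app_name": "nemo_agent",
        "user_id": "me",
        "session_id": session_id,
        "new_message": {"parts": [{"text": message}]},
    }
    return headers, json.dumps(body).encode("utf-8")
```

The attribution and Authorization headers would be merged into `headers` before sending.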
Upload: POST /api/upload-video/nemo_agent/me/<sid>. For a file, send multipart with -F "files=@/path". For a URL, send body {"urls":["<url>"],"source_type":"url"}
Credits: GET /api/credits/balance/simple. Returns available, frozen, total
Session state: GET /api/state/nemo_agent/me/<sid>/latest. Key fields: data.state.draft, data.state.video_infos, data.state.generated_media
Export (free, no credits): POST /api/render/proxy/lambda — body {"id":"render_<ts>","sessionId":"<sid>","draft":<json>,"output":{"format":"mp4","quality":"high"}}. Poll GET /api/render/proxy/lambda/<id> every 30s until status = completed. Download URL at output.url.
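The export flow above amounts to one POST followed by a polling loop. The sketch below is a hedged illustration: `fetch_status` is a hypothetical callable that performs the GET and returns the decoded JSON, the response is assumed to expose `status` and `output.url` at the top level, the `<ts>` placeholder is assumed to be a millisecond timestamp, and the `max_polls` cap is an added safeguard that the source does not mention.

```python
import time
from typing import Callable


def build_render_body(session_id: str, draft: dict, timestamp_ms: int) -> dict:
    """Body for POST /api/render/proxy/lambda."""
    return {
        "id": f"render_{timestamp_ms}",
        "sessionId": session_id,
        "draft": draft,
        "output": {"format": "mp4", "quality": "high"},
    }


def poll_render(fetch_status: Callable[[], dict], interval_s: float = 30.0,
                max_polls: int = 60, sleep: Callable[[float], None] = time.sleep) -> str:
    """Poll until status is 'completed', then return the download URL."""
    for _ in range(max_polls):
        result = fetch_status()
        if result.get("status") == "completed":
            return result["output"]["url"]
        sleep(interval_s)  # wait 30s between polls by default
    raise TimeoutError("render did not complete within the polling budget")
```

Passing `sleep` as a parameter keeps the loop testable without real waiting.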
Supported formats: mp4, mov, avi, webm, mkv, jpg, png, gif, webp, mp3, wav, m4a, aac.
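A client could check a file against this list before uploading, which avoids a round trip that would end in an unsupported-file error. This is only a sketch: the helper name is hypothetical, and checking by file extension is an assumption about what "format" means here.

```python
from pathlib import Path

SUPPORTED_FORMATS = {
    "mp4", "mov", "avi", "webm", "mkv",   # video
    "jpg", "png", "gif", "webp",          # image
    "mp3", "wav", "m4a", "aac",           # audio
}


def is_supported(filename: str) -> bool:
    """Return True if the file extension is in the supported list."""
    extension = Path(filename).suffix.lower().lstrip(".")
    return extension in SUPPORTED_FORMATS
```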
SSE Event Handling
| Event | Action |
|---|---|
| Text response | Apply GUI translation (§4), present to user |
| Tool call/result | Process internally, don't forward |
| heartbeat / empty data: | Keep waiting. Every 2 min: "⏳ Still working..." |
| Stream closes | Process final response |
~30% of editing operations return no text in the SSE stream. When this happens: poll session state to verify the edit was applied, then summarize changes to the user.
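The event handling above can be sketched as a small parser over the raw stream lines. The event payload shape is not documented here, so the code assumes each `data:` line carries JSON with text under `content.parts[].text`, mirroring the request's `new_message` shape. Treat that shape and the function names as assumptions.

```python
import json
from typing import Iterable, Iterator


def iter_sse_payloads(lines: Iterable[str]) -> Iterator[dict]:
    """Yield JSON objects from `data:` lines, skipping heartbeats and empty data."""
    for raw in lines:
        line = raw.strip()
        if not line.startswith("data:"):
            continue  # comments, event names, blank separators
        data = line[len("data:"):].strip()
        if not data:
            continue  # empty data: keep waiting
        try:
            payload = json.loads(data)
        except json.JSONDecodeError:
            continue  # non-JSON keep-alive text
        if isinstance(payload, dict):
            yield payload


def collect_text(payloads: Iterable[dict]) -> str:
    """Join text parts; tool calls and results are ignored (assumed shape)."""
    chunks = []
    for payload in payloads:
        for part in (payload.get("content") or {}).get("parts") or []:
            if isinstance(part, dict) and part.get("text"):
                chunks.append(part["text"])
    return "".join(chunks)


def needs_state_check(text: str) -> bool:
    """True when the stream produced no text and session state should be polled."""
    return not text.strip()
```

When `needs_state_check` returns true, the caller would fetch the latest session state and summarize what changed instead of showing an empty reply.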
Backend Response Translation
The backend assumes a GUI exists. Translate these into API actions:
| Backend says | You do |
|---|---|
| "click [button]" / "点击" | Execute via API |
| "open [panel]" / "打开" | Query session state |
| "drag/drop" / "拖拽" | Send edit via SSE |
| "preview in timeline" | Show track summary |
| "Export button" / "导出" | Execute export workflow |
Draft field mapping: t=tracks, tt=track type (0=video, 1=audio, 7=text), sg=segments, d=duration(ms), m=metadata.
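Timeline (3 tracks):
1. Video: city timelapse (0-10s)
2. BGM: Lo-fi (0-10s, 35%)
3. Title: "Urban Dreams" (0-3s)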
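To turn a draft into a track summary, the abbreviated field names can be expanded as in the sketch below. The nesting is an assumption: the code supposes that `t` is a list of tracks, each with a `tt` type code and an `sg` list of segments carrying `d` in milliseconds. Only the abbreviations themselves come from the mapping above.

```python
TRACK_TYPES = {0: "video", 1: "audio", 7: "text"}


def summarize_draft(draft: dict) -> list:
    """Return one summary line per track, with total segment duration in seconds."""
    lines = []
    for index, track in enumerate(draft.get("t", []), start=1):
        kind = TRACK_TYPES.get(track.get("tt"), "unknown")
        segments = track.get("sg", [])
        total_ms = sum(segment.get("d", 0) for segment in segments)
        lines.append(
            f"{index}. {kind}: {len(segments)} segment(s), {total_ms / 1000:.1f}s"
        )
    return lines
```

The summed duration is the total of all segment lengths, which equals the track length only if segments do not overlap or leave gaps.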
Error Handling
| Code | Meaning | Action |
|---|---|---|
| 0 | Success | Continue |
| 1001 | Bad/expired token | Re-auth via anonymous-token (tokens expire after 7 days) |
| 1002 | Session not found | New session §3.0 |
| 2001 | No credits | Anonymous: show registration URL with ?bind=<id> (get <id> from create-session or state response when needed). Registered: "Top up at nemovideo.ai" |
| 4001 | Unsupported file | Show supported formats |
| 4002 | File too large | Suggest compress/trim |
| 400 | Missing X-Client-Id | Generate Client-Id and retry (see §1) |
| 402 | Free plan export blocked | Subscription tier issue, NOT credits. "Register at nemovideo.ai to unlock export." |
| 429 | Rate limit (1 token/client/7 days) | Retry in 30s once |
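The error table maps naturally onto a lookup from code to next step. In this sketch the action labels and function names are hypothetical, and the registration URL is passed in by the caller because the table does not state it.

```python
# Codes come from the error table; the action labels are illustrative only.
ERROR_ACTIONS = {
    0: "continue",
    1001: "reauth_anonymous_token",
    1002: "create_new_session",
    2001: "handle_no_credits",
    4001: "show_supported_formats",
    4002: "suggest_compress_or_trim",
    400: "generate_client_id_and_retry",
    402: "prompt_registration",
    429: "retry_once_after_30s",
}


def next_action(code: int) -> str:
    """Return the action label for a code, or a generic fallback."""
    return ERROR_ACTIONS.get(code, "report_unknown_error")


def no_credits_message(is_anonymous: bool, registration_url: str, bind_id: str) -> str:
    """Build the user-facing message for code 2001."""
    if is_anonymous:
        return f"Register to continue: {registration_url}?bind={bind_id}"
    return "Top up at nemovideo.ai"
```

Note that 402 and 2001 lead to different messages: the first is a subscription tier issue, the second a credit balance issue.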
Best Practices
Keep lyrics grouped by sung phrase, not by sentence.
Breaking lyrics into the natural phrases a singer delivers — rather than full grammatical sentences — makes the on-screen text feel natural and easy to follow. Short bursts of 3–6 words per line tend to read best at video speed.
Match your text style to your genre.
A heavy metal track calls for aggressive, high-contrast typography, while an acoustic folk song might suit a soft, handwritten font on a muted background. Describe the emotional tone of your song and the skill can suggest a matching visual direction.
Use high-contrast backgrounds for readability.
Dark backgrounds with light text (or vice versa) ensure lyrics are legible across all screen sizes, including mobile. If your background footage is busy or mid-toned, ask for a subtle text shadow or semi-transparent backing bar behind the lyrics.
Plan for platform aspect ratios.
Mention upfront whether your lyric video is destined for YouTube (16:9 landscape), Instagram Reels (9:16 vertical), or a square format. This affects how text is positioned and sized throughout the video.
FAQ
What video formats does the Lyric Video Maker support?
You can upload video backgrounds in mp4, mov, avi, webm, or mkv format. Most standard exports from phones, cameras, and editing software will work without any conversion needed.
Do I need to time-stamp every lyric manually?
Not necessarily. You can provide rough timestamps for each line or verse, or simply describe the song's structure (e.g., 'the chorus starts at 0:45') and the skill will handle placement. For precise sync, providing a timestamped lyric sheet gives the best results.
Can I customize fonts, colors, and animation styles?
Yes — describe your preferred look in plain language. For example: 'serif font, cream text, slow fade-in per line' or 'bold uppercase, glowing yellow, quick pop-on effect.' The skill interprets style descriptions and applies them consistently throughout the video.
What's the ideal video length for best results?
The skill handles videos from short social clips (under 60 seconds) up to full song lengths (typically 3–5 minutes). Very long files may require a moment to process.
Quick Start Guide
Step 1 — Prepare Your Files
Have your video background ready in mp4, mov, avi, webm, or mkv format. This can be anything from abstract motion graphics to a performance video. Also prepare your full lyrics as plain text.
Step 2 — Describe Your Timing
Paste your lyrics and indicate where key sections fall in the song. Even rough markers like 'verse 1 runs from 0:00–0:45, chorus at 0:45–1:10' give the skill enough to work with. A full timestamped lyric file produces the tightest sync.
Step 3 — Define Your Visual Style
Tell the skill what you want the text to look like. Mention font style (bold, script, sans-serif), color, text size, and how you want lines to animate (fade, slide, pop, typewriter, etc.).
Step 4 — Review and Refine
Once the lyric video is generated, review the timing and style. You can request adjustments — 'make the chorus text larger' or 'slow down the fade-out on each line' — and the skill will revise accordingly until it matches your vision.
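Getting Started
Welcome! Ready to turn your song into a lyric video that makes viewers stop and watch? Upload your video file, share your lyrics, and tell me the style you want. Let's make your music impossible to ignore.
Try saying:
- "Use this mp4 background to create a lyric video for my pop song, with bold white text that fades in line by line on each beat"
- "Make a lyric video with a dark, moody look, neon pink lyrics, and smooth slide-in transitions for each verse"
- "Generate a lyric video from my live concert footage, with the chorus lyrics highlighted in a different color from the verses"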
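Quick Start Setup
This skill connects to NemoVideo's cloud backend. On first use, set up the connection automatically and notify the user ("Connecting to NemoVideo...").
Token check: Look for NEMO_TOKEN in the environment. If found, skip to session creation. Otherwise:
- Read ~/.config/nemovideo/client_id, or generate a UUID and save it there
- POST to https://mega-api-prod.nemovideo.ai/api/auth/anonymous-token with an X-Client-Id header
- Extract data.token from the response. This is your NEMO_TOKEN (100 free credits, valid for 7 days)
Session: POST to /api/tasks/me/with-session/nemo_agent on the same host with Bearer auth and body {"task_name":"project"}. Keep the returned session_id for all operations.
Once setup is complete, notify the user with a brief "Ready!". Do not expose tokens or raw API output.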
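Bring Your Lyrics to Life on Screen
Every song tells a story, and this skill makes sure your audience reads every word at exactly the right moment. The Lyric Video Maker lets you pair any music track with a video background, whether that is a live performance clip, abstract visual footage, or a simple color gradient, and overlay the lyrics with precise beat sync.
Unlike generic subtitle tools, this skill is built specifically for music content. You can control how each line of text enters and leaves the frame, choose anything from bold display fonts to elegant handwritten styles, and adjust colors to match your album artwork or brand palette. The result looks carefully designed rather than auto-generated.
Whether you are releasing a new single, building a YouTube presence, or creating lyric content for Instagram Reels and TikTok, this tool can meet your needs. No timeline dragging and no keyframe headaches: just upload your video, paste your lyrics, describe your preferred style, and let the skill handle the rest.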
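Routing Your Lyric Sync Requests
When you upload a track and paste your lyrics, the skill parses your timing cues, animation style preferences, and font choices, and routes each request to the correct rendering pipeline.
| User says... | Action | Skip SSE? |
|---|---|---|
| "export" / "导出" / "download" / "send me the video" | → §3.5 Export | ✅ |
| "credits" / "积分" / "balance" / "余额" | → §3.3 Credits | ✅ |
| "status" / "状态" / "show tracks" | → §3.4 State | ✅ |
| "upload" / "上传" / user sends file | → §3.2 Upload | ✅ |
| Everything else (generate, edit, add background music, etc.) | → §3.1 SSE | ❌ |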
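NemoVideo Backend API Reference
The NemoVideo backend handles frame-accurate lyric stamping, beat-sync detection, and animated text rendering. Every syllable marker and transition effect you set is processed by its video composition engine. Calls are authenticated per session, so your project state, timeline edits, and export queue persist until the session closes.
Skill attribution (read from this file's YAML frontmatter at runtime):
- X-Skill-Source: lyric-video-maker
- X-Skill-Version: from frontmatter version
- X-Skill-Platform: detect from install path (~/.clawhub/ → clawhub, ~/.cursor/skills/ → cursor, else unknown)
All requests must include: Authorization: Bearer <NEMO_TOKEN>, X-Skill-Source, X-Skill-Version, X-Skill-Platform. Missing attribution headers will cause export to fail with a 402 error.
API base: https://mega-api-prod.nemovideo.ai
Create session: POST /api/tasks/me/with-session/nemo_agent with body {"task_name":"project","language":"<lang>"}. Returns task_id and session_id. After creating a session, give the user a link: https://nemovideo.com/workspace/claim?token=<NEMO_TOKEN>&task=<task_id>&session=<session_id>&skill_name=lyric-video-maker&skill_version=1.0.0&skill_source=<platform>
Send message (SSE): POST /run_sse with body {"app_name":"nemo_agent","user_id":"me","session_id":"<sid>","new_message":{"parts":[{"text":"<msg>"}]}} and Accept: text/event-stream. Max timeout: 15 minutes.
Upload: POST /api/upload-video/nemo_agent/me/<sid>. For a file, send multipart with -F "files=@/path". For a URL, send body {"urls":["<url>"],"source_type":"url"}
Credits: GET /api/credits/balance/simple. Returns available, frozen, total
Session state: GET /api/state/nemo_agent/me/<sid>/latest. Key fields: data.state.draft, data.state.video_infos, data.state.generated_media
Export (free, consumes no credits): POST /api/render/proxy/lambda with body {"id":"render_<ts>","sessionId":"<sid>","draft":<json>,"output":{"format":"mp4","quality":"high"}}. Poll GET /api/render/proxy/lambda/<id> every 30 seconds until status = completed. The download URL is at output.url.
Supported formats: mp4, mov, avi, webm, mkv, jpg, png, gif, webp, mp3, wav, m4a, aac.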
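SSE Event Handling
| Event | Action |
|---|---|
| Text response | Apply GUI translation (§4), present to user |
| Tool call/result | Process internally, don't forward |
| heartbeat / empty data: | Keep waiting. Every 2 minutes: "⏳ Still working..." |
| Stream closes | Process final response |
About 30% of editing operations return no text in the SSE stream. When this happens, poll the session state to verify that the edit was applied, then summarize the changes for the user.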
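Backend Response Translation
The backend assumes a GUI exists. Translate its instructions into API actions:
| Backend says | You do |
|---|---|
| "click [button]" / "点击" | Execute via API |
| "open [panel]" / "打开" | Query session state |
| "drag/drop" / "拖拽" | Send edit via SSE |
| "preview in timeline" | Show track summary |
| "Export button" / "导出" | Execute export workflow |
Draft field mapping: t=tracks, tt=track type (0=video, 1=audio, 7=text), sg=segments, d=duration (ms), m=metadata.
Timeline (3 tracks):
1. Video: city timelapse (0-10s)
2. BGM: Lo-fi (0-10s, 35%)
3. Title: "Urban Dreams" (0-3s)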
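Error Handling
| Code | Meaning | Action |
|---|---|---|
| 0 | Success | Continue |
| 1001 | Bad/expired token | Re-auth via anonymous-token (tokens expire after 7 days) |
| 1002 | Session not found | New session §3.0 |
| 2001 | No credits | Anonymous users: show registration URL with ?bind=<id> (get <id> from the create-session or state response when needed). Registered users: "Top up at nemovideo.ai" |
| 4001 | Unsupported file | Show supported formats |
| 4002 | File too large | Suggest compress/trim |
| 400 | Missing X-Client-Id | Generate Client-Id and retry (see §1) |
| 402 | Free plan export blocked | Subscription tier issue, not a credits issue. "Register at nemovideo.ai to unlock export." |
| 429 | Rate limit (1 token/client/7 days) | 30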