Getting Started

Welcome! Ready to turn your song into a scroll-stopping lyric video? Drop your video file, share your lyrics, and tell me the vibe you're going for — let's make your music impossible to ignore.

Try saying:

- "Create a lyric video for my pop track using this mp4 background — bold white text that fades in line by line on each beat"
- "Make a lyric video with a dark moody aesthetic, neon pink lyrics, and smooth slide-in transitions for each verse"
- "Generate a lyric video from my live concert footage with the chorus lyrics highlighted in a different color than the verses"

Quick Start Setup

This skill connects to NemoVideo's cloud backend. On first use, set up the connection automatically and let the user know ("Connecting to NemoVideo...").

Token check: Look for NEMO_TOKEN in the environment. If found, skip to session creation. Otherwise:

- Read ~/.config/nemovideo/client_id or generate a UUID and save it there
- POST https://mega-api-prod.nemovideo.ai/api/auth/anonymous-token with an X-Client-Id header
- Extract data.token from the response; this is your NEMO_TOKEN (100 free credits, 7-day expiry)

Session: POST /api/tasks/me/with-session/nemo_agent at the same host with Bearer auth and body {"task_name":"project"}. Keep the returned session_id for all operations.

Let the user know with a brief "Ready!" when setup is complete. Don't expose tokens or raw API output.
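The token check and client ID steps above can be sketched in Python. This is a minimal sketch, not the skill's actual implementation: the function names are hypothetical, the request is only built (not sent), and the response is assumed to be JSON with the token at data.token as described above.

```python
import json
import os
import uuid
import urllib.request
from pathlib import Path
from typing import Optional

API_BASE = "https://mega-api-prod.nemovideo.ai"
CLIENT_ID_PATH = Path.home() / ".config" / "nemovideo" / "client_id"


def existing_token() -> Optional[str]:
    """Return NEMO_TOKEN from the environment, or None if it is unset or empty."""
    return os.environ.get("NEMO_TOKEN") or None


def load_or_create_client_id(path: Path = CLIENT_ID_PATH) -> str:
    """Return the saved client ID, or generate a UUID and save it at path."""
    if path.exists():
        saved = path.read_text(encoding="utf-8").strip()
        if saved:
            return saved
    client_id = str(uuid.uuid4())
    path.parent.mkdir(parents=True, exist_ok=True)
    path.write_text(client_id, encoding="utf-8")
    return client_id


def build_token_request(client_id: str) -> urllib.request.Request:
    """Build the anonymous-token POST; sending it is left to the caller."""
    return urllib.request.Request(
        f"{API_BASE}/api/auth/anonymous-token",
        method="POST",
        headers={"X-Client-Id": client_id},
    )


def extract_token(response_body: bytes) -> str:
    """Read data.token from the JSON response body (shape assumed from the text)."""
    return json.loads(response_body)["data"]["token"]
```

Keeping the request builder separate from the network call makes it easy to check the header and URL without contacting the backend.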

Bring Your Lyrics to Life on Screen

Every song tells a story, and this skill makes sure your audience reads every word at exactly the right moment. The Lyric Video Maker lets you take any music track paired with a video background — whether it's a live performance clip, abstract visualizer footage, or a simple color gradient — and overlay your lyrics with precise, beat-matched timing.

Unlike generic subtitle tools, this skill is built specifically for music content. You can control how each line of text enters and exits the frame, choose from bold display fonts or elegant script styles, and adjust colors to complement your album artwork or brand palette. The result feels intentional and crafted, not auto-generated.

Whether you're releasing a new single, building a YouTube presence, or creating lyric content for Instagram Reels and TikTok, this tool meets you where you are. No timeline scrubbing, no keyframe headaches — just upload your video, paste your lyrics, describe your preferred style, and let the skill handle the rest.

Routing Your Lyric Sync Requests

When you drop in a track and paste your lyrics, the skill parses your timing cues, animation style preferences, and font choices to route each request to the correct rendering pipeline.

| User says... | Action | Skip SSE? |
|---|---|---|
| "export" / "导出" / "download" / "send me the video" | → §3.5 Export | ✅ |
| "credits" / "积分" / "balance" / "余额" | → §3.3 Credits | ✅ |
| "status" / "状态" / "show tracks" | → §3.4 State | ✅ |
| "upload" / "上传" / user sends file | → §3.2 Upload | ✅ |
| Everything else (generate, edit, add BGM…) | → §3.1 SSE | ❌ |
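The routing rules can be expressed as a small lookup. This is a sketch under stated assumptions: the function name is hypothetical, and case-insensitive substring matching is an assumption, since the skill does not say how keywords are detected.

```python
# Keyword groups and their actions, in the order the routing rules list them.
ROUTES = [
    (("export", "导出", "download", "send me the video"), "export"),
    (("credits", "积分", "balance", "余额"), "credits"),
    (("status", "状态", "show tracks"), "state"),
    (("upload", "上传"), "upload"),
]


def route(message: str, has_file: bool = False) -> tuple[str, bool]:
    """Return (action, skip_sse) for a user message."""
    if has_file:
        # A file attachment always goes to the upload path.
        return "upload", True
    text = message.lower()
    for keywords, action in ROUTES:
        if any(keyword in text for keyword in keywords):
            return action, True
    # Everything else (generate, edit, add BGM...) goes through SSE.
    return "sse", False
```

Plain substring matching can misfire on requests such as "add end credits to my video", so a real router would likely need stricter intent detection.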

NemoVideo Backend API Reference

The NemoVideo backend handles frame-accurate lyric stamping, beat-sync detection, and animated text rendering — every syllable marker and transition effect you set gets processed through its video composition engine. Calls are authenticated per session, so your project state, timeline edits, and export queue persist until the session closes.

Skill attribution — read from this file's YAML frontmatter at runtime:

- X-Skill-Source: lyric-video-maker
- X-Skill-Version: from frontmatter version
- X-Skill-Platform: detect from install path (~/.clawhub/ → clawhub, ~/.cursor/skills/ → cursor, else unknown)

All requests must include: Authorization: Bearer <NEMO_TOKEN>, X-Skill-Source, X-Skill-Version, X-Skill-Platform. Missing attribution headers will cause export to fail with 402.
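One way to assemble the required headers is sketched below. The helper names are hypothetical, the home parameter exists only to make the check testable, and reading the version from the YAML frontmatter is left to the caller.

```python
from pathlib import Path
from typing import Optional

SKILL_SOURCE = "lyric-video-maker"


def _is_under(path: Path, root: Path) -> bool:
    """True when path sits inside root (pure path comparison, no disk access)."""
    try:
        path.relative_to(root)
        return True
    except ValueError:
        return False


def detect_platform(install_path: str, home: Optional[Path] = None) -> str:
    """Map the install path to clawhub, cursor, or unknown."""
    home = home or Path.home()
    path = Path(install_path).expanduser()
    if _is_under(path, home / ".clawhub"):
        return "clawhub"
    if _is_under(path, home / ".cursor" / "skills"):
        return "cursor"
    return "unknown"


def build_headers(token: str, skill_version: str, install_path: str,
                  home: Optional[Path] = None) -> dict[str, str]:
    """Return the auth and attribution headers every request must carry."""
    return {
        "Authorization": f"Bearer {token}",
        "X-Skill-Source": SKILL_SOURCE,
        "X-Skill-Version": skill_version,
        "X-Skill-Platform": detect_platform(install_path, home),
    }
```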

API base: https://mega-api-prod.nemovideo.ai

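Create session: POST /api/tasks/me/with-session/nemo_agent with body {"task_name":"project","language":"<lang>"}. Returns task_id and session_id. After creating a session, give the user a link: https://nemovideo.com/workspace/claim?token=<token>&task=<task_id>&session=<session_id>&skill_name=lyric-video-maker&skill_version=1.0.0&skill_source=<skill_source>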

Send message (SSE): POST /run_sse — body {"app_name":"nemo_agent","user_id":"me","session_id":"<sid>","new_message":{"parts":[{"text":"<msg>"}]}} with Accept: text/event-stream. Max timeout: 15 minutes.
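The message body and a generic reader for the event stream might look like the sketch below. The function names are hypothetical. The parser follows the standard server-sent events framing (data: lines, blank line ends an event) and makes no assumption about what each event's payload contains, because the backend's event schema is not described here.

```python
from typing import Iterable, Iterator


def build_run_sse_body(session_id: str, message: str) -> dict:
    """Build the JSON body for POST /run_sse."""
    return {
        "app_name": "nemo_agent",
        "user_id": "me",
        "session_id": session_id,
        "new_message": {"parts": [{"text": message}]},
    }


def iter_sse_data(lines: Iterable[str]) -> Iterator[str]:
    """Yield the data payload of each complete event in an SSE stream."""
    buffer: list[str] = []
    for raw in lines:
        line = raw.rstrip("\r\n")
        if line == "":
            # A blank line terminates the current event.
            if buffer:
                yield "\n".join(buffer)
                buffer = []
        elif line.startswith("data:"):
            value = line[5:]
            # The framing strips one optional space after the colon.
            buffer.append(value[1:] if value.startswith(" ") else value)
        # Comment lines (":...") and other fields are ignored in this sketch.
```

An event that is cut off before its closing blank line is dropped, which matches how the event-stream format treats incomplete events.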

Upload: POST /api/upload-video/nemo_agent/me/<sid>. For a file, send multipart with -F "files=@/path"; for a URL, send the body {"urls":["<url>"],"source_type":"url"}

Credits: GET /api/credits/balance/simple, which returns available, frozen, and total

Session state: GET /api/state/nemo_agent/me/<sid>/latest. Key fields: data.state.draft, data.state.video_infos, data.state.generated_media

Export (free, no credits): POST /api/render/proxy/lambda — body {"id":"render_<ts>","sessionId":"<sid>","draft":<json>,"output":{"format":"mp4","quality":"high"}}. Poll GET /api/render/proxy/lambda/<id> every 30s until status = completed. Download URL at output.url.
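The export flow can be sketched as a body builder plus a polling loop. Several details are assumptions: the function names are hypothetical, <ts> is taken to be epoch milliseconds, status and output.url are read from the top level of the poll response, and the attempt cap is arbitrary. Failure statuses are not documented here, so the sketch only stops on completed or on timeout.

```python
import time
from typing import Callable, Optional


def build_render_body(session_id: str, draft: dict, now: Optional[float] = None) -> dict:
    """Build the POST /api/render/proxy/lambda body for an mp4 export."""
    ts = int((now if now is not None else time.time()) * 1000)  # assumed unit: ms
    return {
        "id": f"render_{ts}",
        "sessionId": session_id,
        "draft": draft,
        "output": {"format": "mp4", "quality": "high"},
    }


def poll_render(fetch_status: Callable[[], dict],
                interval_s: float = 30.0,
                max_attempts: int = 40,
                sleep: Callable[[float], None] = time.sleep) -> str:
    """Call fetch_status every interval_s seconds until the render completes."""
    for _ in range(max_attempts):
        job = fetch_status()
        if job.get("status") == "completed":
            return job["output"]["url"]
        sleep(interval_s)
    raise TimeoutError("render did not complete within the polling budget")
```

Passing fetch_status and sleep in as callables keeps the loop independent of any HTTP client and lets it run instantly in a dry run.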

Supported formats: mp4, mov, avi, webm, mkv, jpg, png, gif, webp, mp3, wav, m4a, aac.

SSE Event Handling

| Event | Action |
|---|---|
| Text response | Apply GUI translation (§4), present to user |
| Tool call/result | |

~30% of editing operations return no text in the SSE stream. When this happens: poll session state to verify the edit was applied, then summarize changes to the user.
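When the stream carries no text, one way to verify an edit is to compare the draft before and after. The sketch below is hypothetical throughout: it assumes the draft is a dict with a list of tracks under t and a track type under tt (the short keys this document uses for drafts), and it only detects added or removed tracks, not changes inside a track.

```python
from collections import Counter

# Track type codes as listed in the draft field mapping.
TRACK_TYPES = {0: "video", 1: "audio", 7: "text"}


def track_counts(draft: dict) -> Counter:
    """Count tracks per type in a draft (assumed shape: {"t": [{"tt": 0}, ...]})."""
    return Counter(TRACK_TYPES.get(track.get("tt"), "other")
                   for track in draft.get("t", []))


def describe_changes(before: dict, after: dict) -> list[str]:
    """Summarize track-level differences between two draft snapshots."""
    old, new = track_counts(before), track_counts(after)
    changes = []
    for kind in sorted(set(old) | set(new)):
        delta = new[kind] - old[kind]
        if delta > 0:
            changes.append(f"added {delta} {kind} track(s)")
        elif delta < 0:
            changes.append(f"removed {-delta} {kind} track(s)")
    return changes
```

An empty result means no track was added or removed, so a finer comparison would be needed before telling the user nothing changed.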

Backend Response Translation

The backend assumes a GUI exists. Translate these into API actions:

| Backend says | You do |
|---|---|
| "click [button]" / "点击" | Execute via API |
| "open [panel]" / "打开" | |

Draft field mapping: t=tracks, tt=track type (0=video, 1=audio, 7=text), sg=segments, d=duration(ms), m=metadata.

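Timeline (3 tracks):
1. Video: city timelapse (0-10s)
2. BGM: Lo-fi (0-10s, 35%)
3. Title: "Urban Dream" (0-3s)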
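The field mapping can be turned into a readable summary with a few lines of Python. This is a sketch under assumptions: the function name is hypothetical, segments (sg) are assumed to be nested inside each track, and each segment is assumed to carry its own duration d in milliseconds. The real draft layout may differ.

```python
# Track type codes from the draft field mapping.
TRACK_TYPES = {0: "Video", 1: "Audio", 7: "Text"}


def summarize_draft(draft: dict) -> str:
    """Render a draft as a short, user-facing timeline summary."""
    tracks = draft.get("t", [])
    lines = [f"Timeline ({len(tracks)} tracks):"]
    for index, track in enumerate(tracks, start=1):
        kind = TRACK_TYPES.get(track.get("tt"), "Other")
        # Sum segment durations (ms) to get the track's total length.
        total_ms = sum(segment.get("d", 0) for segment in track.get("sg", []))
        lines.append(f"{index}. {kind}: {total_ms / 1000:g}s")
    return "\n".join(lines)
```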

Error Handling

| Code | Meaning | Action |
|---|---|---|
| 0 | Success | Continue |
| 1001 | | |

Best Practices

Keep lyrics grouped by sung phrase, not by sentence.
Breaking lyrics into the natural phrases a singer delivers — rather than full grammatical sentences — makes the on-screen text feel natural and easy to follow. Short bursts of 3–6 words per line tend to read best at video speed.

Match your text style to your genre.
A heavy metal track calls for aggressive, high-contrast typography, while an acoustic folk song might suit a soft, handwritten font on a muted background. Describe the emotional tone of your song and the skill can suggest a matching visual direction.

Use high-contrast backgrounds for readability.
Dark backgrounds with light text (or vice versa) ensure lyrics are legible across all screen sizes, including mobile. If your background footage is busy or mid-toned, ask for a subtle text shadow or semi-transparent backing bar behind the lyrics.

Plan for platform aspect ratios.
Mention upfront whether your lyric video is destined for YouTube (16:9 landscape), Instagram Reels (9:16 vertical), or a square format. This affects how text is positioned and sized throughout the video.

FAQ

What video formats does the Lyric Video Maker support?
You can upload video backgrounds in mp4, mov, avi, webm, or mkv format. Most standard exports from phones, cameras, and editing software will work without any conversion needed.

Do I need to time-stamp every lyric manually?
Not necessarily. You can provide rough timestamps for each line or verse, or simply describe the song's structure (e.g., 'the chorus starts at 0:45') and the skill will handle placement. For precise sync, providing a timestamped lyric sheet gives the best results.

Can I customize fonts, colors, and animation styles?
Yes — describe your preferred look in plain language. For example: 'serif font, cream text, slow fade-in per line' or 'bold uppercase, glowing yellow, quick pop-on effect.' The skill interprets style descriptions and applies them consistently throughout the video.

What's the ideal video length for best results?
The skill handles videos from short social clips (under 60 seconds) up to full song lengths (typically 3–5 minutes). Very long files may require a moment to process.

Quick Start Guide

Step 1 — Prepare Your Files
Have your video background ready in mp4, mov, avi, webm, or mkv format. This can be anything from abstract motion graphics to a performance video. Also prepare your full lyrics as plain text.

Step 2 — Describe Your Timing
Paste your lyrics and indicate where key sections fall in the song. Even rough markers like 'verse 1 runs from 0:00–0:45, chorus at 0:45–1:10' give the skill enough to work with. A full timestamped lyric file produces the tightest sync.
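If you want to check a timestamped lyric sheet before pasting it, a small parser helps. The sketch below assumes an LRC-style layout of [m:ss] or [m:ss.xx] followed by the lyric text; that layout is an assumption for illustration, not a format the skill is documented to require, and the names are hypothetical.

```python
import re
from typing import NamedTuple

# Matches "[m:ss] text" or "[m:ss.xx] text"; the fraction is optional.
LINE_RE = re.compile(r"^\[(\d+):(\d{2})(?:\.(\d{1,3}))?\]\s*(.*)$")


class LyricLine(NamedTuple):
    start_ms: int
    text: str


def parse_timestamped_lyrics(sheet: str) -> list[LyricLine]:
    """Parse stamped lines into (start_ms, text), sorted by start time."""
    parsed = []
    for raw in sheet.splitlines():
        match = LINE_RE.match(raw.strip())
        if not match:
            continue  # skip lines without a timestamp
        minutes, seconds, fraction, text = match.groups()
        # Right-pad the fraction so ".5" means 500 ms and ".25" means 250 ms.
        fraction_ms = int((fraction or "0").ljust(3, "0"))
        start_ms = (int(minutes) * 60 + int(seconds)) * 1000 + fraction_ms
        parsed.append(LyricLine(start_ms, text))
    return sorted(parsed)
```

Sorting by start time surfaces lines that were pasted out of order, which is a common cause of lyrics appearing at the wrong moment.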

Step 3 — Define Your Visual Style
Tell the skill what you want the text to look like. Mention font style (bold, script, sans-serif), color, text size, and how you want lines to animate (fade, slide, pop, typewriter, etc.).

Step 4 — Review and Refine
Once the lyric video is generated, review the timing and style. You can request adjustments — 'make the chorus text larger' or 'slow down the fade-out on each line' — and the skill will revise accordingly until it matches your vision.
