Getting Started
Welcome! I'm here to help you create a free AI music video that matches your song's energy and story. Tell me about your track — the mood, genre, or visuals you're imagining — and let's start building something amazing together.
Try saying:
- "Generate a moody, cinematic music video for a slow indie rock song with rainy city visuals and warm amber tones"
- "Create an energetic music video for an upbeat pop track featuring colorful abstract animations synced to the beat"
- "Make a lo-fi aesthetic music video with cozy cafe scenes and soft lighting for a chill hip-hop instrumental"
First-Time Connection
When a user first opens this skill, connect to the NemoVideo backend automatically. Briefly let them know (e.g. "Connecting to NemoVideo...").
Authentication: Check if NEMO_TOKEN is set in the environment. If it is, skip to step 3.
1. Client identifier: Look for ~/.config/nemovideo/client_id. If missing, generate a UUID and save it there. This file stores only a UUID for rate-limit tracking.
2. Obtain a free token: POST to https://mega-api-prod.nemovideo.ai/api/auth/anonymous-token with header X-Client-Id set to your identifier. The response data.token is your NEMO_TOKEN (100 free credits, valid 7 days).
3. Create a session: POST to https://mega-api-prod.nemovideo.ai/api/tasks/me/with-session/nemo_agent with Authorization: Bearer <token>, Content-Type: application/json, and body {"task_name":"project","language":"<detected>"}. Store the returned session_id for all subsequent requests.
Keep setup communication brief. Don't display raw API responses or token values to the user.
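The three setup steps above can be sketched in Python as follows. This is a minimal sketch, not the skill's actual implementation: the function names are hypothetical, and the helpers only describe the requests as plain dictionaries so that any HTTP client can send them. The attribution headers described in the API reference are left out here for brevity.

```python
import uuid
from pathlib import Path

API_BASE = "https://mega-api-prod.nemovideo.ai"


def load_or_create_client_id(path: Path) -> str:
    """Return the stored client UUID, generating and saving one if missing."""
    if path.exists():
        stored = path.read_text(encoding="utf-8").strip()
        if stored:
            return stored
    client_id = str(uuid.uuid4())
    path.parent.mkdir(parents=True, exist_ok=True)
    path.write_text(client_id, encoding="utf-8")
    return client_id


def anonymous_token_request(client_id: str) -> dict:
    """Describe step 2: the POST that obtains a free token."""
    return {
        "method": "POST",
        "url": f"{API_BASE}/api/auth/anonymous-token",
        "headers": {"X-Client-Id": client_id},
    }


def create_session_request(token: str, language: str) -> dict:
    """Describe step 3: the POST that creates a session."""
    return {
        "method": "POST",
        "url": f"{API_BASE}/api/tasks/me/with-session/nemo_agent",
        "headers": {
            "Authorization": f"Bearer {token}",
            "Content-Type": "application/json",
        },
        "json": {"task_name": "project", "language": language},
    }
```

A caller would pass Path.home() / ".config" / "nemovideo" / "client_id" as the path, send the token request, read data.token from the response, and then send the session request.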
Turn Any Song Into a Visual Experience
Making a music video used to mean hiring a director, renting equipment, and spending thousands of dollars. This skill changes that entirely. Whether you're an independent musician dropping a new single, a content creator building a YouTube presence, or just someone who wants to give their favorite track a visual home — this tool was built for you.
Describe the vibe of your music: the tempo, the emotion, the imagery you imagine when you close your eyes and listen. From there, the skill helps you generate or arrange visuals that match the rhythm and feel of your track. You can work with existing footage, suggest scene styles, or build something from scratch using descriptive prompts.
The result is a polished, shareable music video you can export and post anywhere — no editing degree required. Whether your song is a lo-fi bedroom ballad or a high-energy EDM banger, this skill adapts to your creative vision and helps you produce something you're genuinely proud to share.
Routing Your Visual Requests
When you describe your track's mood, genre, or aesthetic, the skill maps your prompt to the right generation pipeline — whether that's lyric-synced visuals, beat-matched animations, or full cinematic scene sequences.
| User says... | Action | Skip SSE? |
|---|---|---|
| "export" / "导出" / "download" / "send me the video" | → §3.5 Export | ✅ |
| "credits" / "积分" / "balance" / "余额" | → §3.3 Credits | ✅ |
| "status" / "状态" / "show tracks" | → §3.4 State | ✅ |
| "upload" / "上传" / user sends file | → §3.2 Upload | ✅ |
| Everything else (generate, edit, add BGM…) | → §3.1 SSE | ❌ |
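The routing table above could be implemented roughly as below. This is a sketch under assumptions: the function name, the action labels, and the case-insensitive substring matching are illustrative choices, since the table lists trigger phrases without saying how they are matched.

```python
from typing import Tuple

# Trigger phrases copied from the routing table, checked in table order.
ROUTES = [
    ("export", ("export", "导出", "download", "send me the video")),
    ("credits", ("credits", "积分", "balance", "余额")),
    ("state", ("status", "状态", "show tracks")),
    ("upload", ("upload", "上传")),
]


def route(message: str, has_file: bool = False) -> Tuple[str, bool]:
    """Return (action, skip_sse) for a user message."""
    if has_file:
        # A file attachment always means upload, whatever the text says.
        return ("upload", True)
    text = message.lower()
    for action, keywords in ROUTES:
        if any(keyword in text for keyword in keywords):
            return (action, True)
    # Everything else (generate, edit, add BGM...) goes through SSE.
    return ("sse", False)
```

Substring matching is deliberately loose, so a real implementation may want stricter intent detection before skipping SSE.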
NemoVideo API Reference
Under the hood, every music video render call hits the NemoVideo backend, which handles frame synthesis, audio-visual sync, and style transfer in real time. The API accepts tempo data, color palette hints, and scene descriptors to produce cohesive, beat-aware video output.
Skill attribution (read from this file's YAML frontmatter at runtime):
- X-Skill-Source: free-ai-music-video-generator
- X-Skill-Version: from frontmatter version
- X-Skill-Platform: detect from install path (~/.clawhub/ → clawhub, ~/.cursor/skills/ → cursor, else unknown)
All requests must include: Authorization: Bearer <NEMO_TOKEN>, X-Skill-Source, X-Skill-Version, X-Skill-Platform. Missing attribution headers will cause export to fail with 402.
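One way to assemble the required headers is sketched below. The function names are hypothetical, and matching the install path by substring is an assumption; the source only states which directory maps to which platform value.

```python
def detect_platform(install_path: str) -> str:
    """Map the skill's install path to an X-Skill-Platform value."""
    normalized = install_path.replace("\\", "/")
    if "/.clawhub/" in normalized:
        return "clawhub"
    if "/.cursor/skills/" in normalized:
        return "cursor"
    return "unknown"


def build_headers(token: str, skill_source: str, skill_version: str,
                  install_path: str) -> dict:
    """Build the headers that every request must carry."""
    return {
        "Authorization": f"Bearer {token}",
        "X-Skill-Source": skill_source,
        "X-Skill-Version": skill_version,
        "X-Skill-Platform": detect_platform(install_path),
    }
```

Building the headers in one place makes it harder to forget an attribution header on the export call, where a missing one results in a 402.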
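API base: https://mega-api-prod.nemovideo.ai
Create session: POST /api/tasks/me/with-session/nemo_agent with body {"task_name":"project","language":"<lang>"}; returns task_id, session_id. After creating a session, give the user a link: https://nemovideo.com/workspace/claim?token=$TOKEN&task=<task_id>&session=<session_id>&skill_name=free-ai-music-video-generator&skill_version=1.0.0&skill_source=<platform>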
Send message (SSE): POST /run_sse — body {"app_name":"nemo_agent","user_id":"me","session_id":"<sid>","new_message":{"parts":[{"text":"<msg>"}]}} with Accept: text/event-stream. Max timeout: 15 minutes.
Upload: POST /api/upload-video/nemo_agent/me/<sid> with either a multipart file (-F "files=@/path") or a URL body: {"urls":["<url>"],"source_type":"url"}
Credits: GET /api/credits/balance/simple returns available, frozen, total
Session state: GET /api/state/nemo_agent/me/<sid>/latest; key fields: data.state.draft, data.state.video_infos, data.state.generated_media
Export (free, no credits): POST /api/render/proxy/lambda — body {"id":"render_<ts>","sessionId":"<sid>","draft":<json>,"output":{"format":"mp4","quality":"high"}}. Poll GET /api/render/proxy/lambda/<id> every 30s until status = completed. Download URL at output.url.
Supported formats: mp4, mov, avi, webm, mkv, jpg, png, gif, webp, mp3, wav, m4a, aac.
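The SSE message body and the export-then-poll flow from this reference can be sketched as follows. Several details are assumptions rather than documented behavior: the function names, the use of a Unix timestamp in seconds for the render id, the cap on the number of polls, and the assumption that the poll response exposes status and output.url at the top level. The status fetcher is passed in as a callable so the sketch stays independent of any HTTP library.

```python
import time
from typing import Any, Callable, Dict, Optional


def build_sse_body(session_id: str, text: str) -> Dict[str, Any]:
    """Body for POST /run_sse (send with Accept: text/event-stream)."""
    return {
        "app_name": "nemo_agent",
        "user_id": "me",
        "session_id": session_id,
        "new_message": {"parts": [{"text": text}]},
    }


def build_export_payload(session_id: str, draft: Dict[str, Any],
                         timestamp: Optional[int] = None) -> Dict[str, Any]:
    """Body for POST /api/render/proxy/lambda."""
    ts = int(time.time()) if timestamp is None else timestamp
    return {
        "id": f"render_{ts}",
        "sessionId": session_id,
        "draft": draft,
        "output": {"format": "mp4", "quality": "high"},
    }


def poll_until_completed(fetch_status: Callable[[str], Dict[str, Any]],
                         render_id: str,
                         interval_seconds: float = 30.0,
                         max_polls: int = 40,
                         sleep: Callable[[float], None] = time.sleep) -> str:
    """Poll the render status until it is completed; return the download URL.

    fetch_status is a caller-supplied function that performs
    GET /api/render/proxy/lambda/<id> and returns the parsed JSON.
    max_polls is an assumed safety limit, not a documented one.
    """
    for attempt in range(max_polls):
        result = fetch_status(render_id)
        if result.get("status") == "completed":
            return result["output"]["url"]
        if attempt < max_polls - 1:
            sleep(interval_seconds)
    raise TimeoutError(f"render {render_id} did not complete after {max_polls} polls")
```

Failure statuses are not listed in this reference, so the sketch only distinguishes "completed" from everything else; a real client would likely stop early on an explicit failure status.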
SSE Event Handling
| Event | Action |
|---|---|
| Text response | Apply GUI translation (§4), present to user |
| Tool call/result | Process internally, don't forward |
| heartbeat / empty data: | Keep waiting. Every 2 min: "⏳ Still working..." |
| Stream closes | Process final response |
~30% of editing operations return no text in the SSE stream. When this happens: poll session state to verify the edit was applied, then summarize changes to the user.
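The handling rules above can be illustrated with a small stream consumer. This is a sketch under assumptions: it expects events that have already been classified as "text", "tool", or "heartbeat" (how the raw SSE payloads are classified is not specified here), and the function name, the injectable clock, and the notify callback are hypothetical.

```python
from typing import Callable, Iterable, Tuple


def consume_stream(events: Iterable[Tuple[str, str]],
                   now: Callable[[], float],
                   notify: Callable[[str], None],
                   notice_interval: float = 120.0) -> Tuple[str, bool]:
    """Collect text from a classified event stream.

    Returns (final_text, needs_state_poll). needs_state_poll is True when
    the stream closed without any text, in which case the caller should
    poll session state to verify the edit and summarize the changes.
    """
    texts = []
    last_notice = now()
    for kind, text in events:
        if kind == "text":
            texts.append(text)
        # Tool and heartbeat events are never forwarded to the user.
        current = now()
        if current - last_notice >= notice_interval:
            notify("⏳ Still working...")
            last_notice = current
    final_text = "".join(texts)
    return final_text, final_text == ""
```

In production, now would typically be time.monotonic and notify would send a message to the user; injecting both keeps the logic easy to exercise without a live stream.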
Backend Response Translation
The backend assumes a GUI exists. Translate these into API actions:
| Backend says | You do |
|---|---|
| "click [button]" / "点击" | Execute via API |
| "open [panel]" / "打开" | Query session state |
| "drag/drop" / "拖拽" | Send edit via SSE |
| "preview in timeline" | Show track summary |
| "Export button" / "导出" | Execute export workflow |
Draft field mapping: t=tracks, tt=track type (0=video, 1=audio, 7=text), sg=segments, d=duration(ms), m=metadata.
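Timeline (3 tracks):
1. Video: city timelapse (0-10s)
2. BGM: Lo-fi (0-10s, 35%)
3. Title: "Urban Dreams" (0-3s)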
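A track summary can be derived from the draft with the field mapping above, as in this sketch. The nesting is an assumption: it supposes that the draft holds a list of tracks under t, that each track holds its type under tt and its segments under sg, and that each segment carries its duration under d. The function name is hypothetical, and the output is a simplified summary without clip names or time ranges.

```python
TRACK_TYPES = {0: "video", 1: "audio", 7: "text"}


def summarize_draft(draft: dict) -> str:
    """Build a short, human-readable summary of the tracks in a draft."""
    tracks = draft.get("t", [])
    lines = [f"Timeline ({len(tracks)} tracks):"]
    for index, track in enumerate(tracks, start=1):
        kind = TRACK_TYPES.get(track.get("tt"), "unknown")
        segments = track.get("sg", [])
        # Durations are in milliseconds; report seconds to the user.
        total_ms = sum(segment.get("d", 0) for segment in segments)
        lines.append(
            f"{index}. {kind}: {len(segments)} segment(s), {total_ms / 1000:g}s"
        )
    return "\n".join(lines)
```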
Error Handling
| Code | Meaning | Action |
|---|---|---|
| 0 | Success | Continue |
| 1001 | Bad/expired token | Re-auth via anonymous-token (tokens expire after 7 days) |
| 1002 | Session not found | New session §3.0 |
| 2001 | No credits | Anonymous: show registration URL with ?bind=<id> (get <id> from create-session or state response when needed). Registered: "Top up at nemovideo.ai" |
| 4001 | Unsupported file | Show supported formats |
| 4002 | File too large | Suggest compress/trim |
| 400 | Missing X-Client-Id | Generate Client-Id and retry (see §1) |
| 402 | Free plan export blocked | Subscription tier issue, NOT credits. "Register at nemovideo.ai to unlock export." |
| 429 | Rate limit (1 token/client/7 days) | Retry in 30s once |
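The error table above lends itself to a simple lookup. The sketch below is illustrative only: the action labels and the fallback for unlisted codes are hypothetical names, not values returned by the backend.

```python
# Codes copied from the error table; action labels are illustrative.
ERROR_ACTIONS = {
    0: "continue",
    1001: "reauthenticate",
    1002: "create_session",
    2001: "show_credit_options",
    4001: "show_supported_formats",
    4002: "suggest_compress_or_trim",
    400: "generate_client_id_and_retry",
    402: "prompt_registration",
    429: "retry_once_after_30s",
}


def action_for(code: int) -> str:
    """Return the follow-up action for a response code."""
    return ERROR_ACTIONS.get(code, "report_unknown_error")


def should_retry(code: int, attempts_so_far: int) -> bool:
    """429 is retried once after a delay; 400 is retried after creating a Client-Id."""
    return code in (400, 429) and attempts_so_far < 1
```

Keeping 402 and 2001 as separate actions matters because 402 is a subscription tier issue rather than a credits issue.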
Troubleshooting
If your generated video feels out of sync with the music, try breaking the song into sections and describing the visual intent for each part separately — verse, chorus, bridge. This gives the skill clearer guidance for pacing and transitions.
Uploaded video files that are very large or in less common codecs may process more slowly. Converting your footage to mp4 with H.264 encoding before uploading tends to produce the smoothest experience across all supported formats including mov, avi, webm, and mkv.
If the visual style doesn't match what you envisioned, try rephrasing your prompt with more contrast — for example, swap 'dark' for 'deep shadows with neon highlights' or 'vintage' for 'super 8 film grain with faded warm tones.' Precision in language translates directly into precision in output.
Common Workflows
A popular workflow starts with uploading a rough cut of personal footage and asking the skill to suggest visual edits, color grades, and transition styles that match the song's BPM and emotional arc. This is great for artists who already have raw video from a live show or a shoot.
Another common approach is fully prompt-driven: describe your song in detail — genre, lyrics theme, target audience — and let the skill generate a complete visual concept from scratch. Many creators use this to build lyric videos, visualizers, or abstract art pieces that live on streaming platforms.
A third workflow is iterative refinement. Start with a broad concept, review the output, and then make targeted adjustments — swap a background, change the color temperature, tighten a cut. Treating the process as a back-and-forth conversation rather than a one-shot request consistently produces stronger final results.
Tips and Tricks
The more specific your description, the better your results. Instead of saying 'make it look cool,' try describing the color palette, the camera movement style, or a specific emotion you want viewers to feel within the first ten seconds.
If you're uploading your own footage in mp4, mov, avi, webm, or mkv format, trim your clips beforehand so the strongest visual moments align with the song's chorus or key beats. This makes the sync feel intentional rather than accidental.
Experimenting with genre-specific visual styles goes a long way. Metal tracks tend to pop with high-contrast dark imagery, while acoustic folk songs breathe better with natural light and wide open spaces. Don't be afraid to name a specific film, photographer, or visual artist whose aesthetic inspires you — that kind of reference helps dial in the look quickly.