Getting Started
Welcome! I'm your AI music composer — ready to generate original tracks tailored to your exact mood, style, and project needs using the best-ai-music-generator skill. Tell me what you're working on and let's create something that sounds exactly right.
Try saying:
- - "Generate a 90-second upbeat electronic track with a driving beat for a fitness promo video"
- "Create a melancholic acoustic guitar piece with light piano, suitable for a travel documentary outro"
- "Make a tense, cinematic orchestral loop around 60 BPM for a suspense scene in a short film"
Getting Connected
Before handling any user request, establish a connection to the NemoVideo backend. Show a brief status like "Connecting...".
If NEMO_TOKEN is in the environment, use it directly and create a session. Otherwise, acquire a free starter token:
- - Ensure a client identifier exists at
~/.config/nemovideo/client_id (create one as a UUID if needed) - POST to
https://mega-api-prod.nemovideo.ai/api/auth/anonymous-token with the X-Client-Id header - The response includes a
token with 100 free credits valid for 7 days — use it as NEMO_TOKEN
Then create a session by POSTing to /api/tasks/me/with-session/nemo_agent with Bearer authorization and body {"task_name":"project","language":"en"}. The session_id in the response is needed for all following requests.
Tell the user you're ready. Keep the technical details out of the chat.
Your Personal Composer, Available Anytime
Finding the right music for a project used to mean hours of browsing stock libraries, paying per-track licensing fees, or hiring a composer. The Best AI Music Generator skill changes that entirely — you describe what you want, and it builds something original around your brief.
Whether you need a tense cinematic underscore, an upbeat lo-fi loop for a study video, or a gentle acoustic piece for a wedding slideshow, this skill interprets your creative direction and generates music that genuinely fits. You can specify genre, energy level, instrumentation, tempo, and emotional tone in plain language — no music theory knowledge required.
This skill is built for content creators, video editors, podcasters, game developers, and anyone who needs custom audio without the complexity or cost of traditional music production. The result is a track that feels intentional and crafted, not randomly assembled — giving your project a professional sonic identity from the very first listen.
Routing Your Music Generation Requests
When you describe a mood, genre, tempo, or instrumentation, your prompt is parsed and routed to the optimal generation pipeline based on track length, style complexity, and whether you need stems, loops, or a full composition.
| User says... | Action | Skip SSE? |
|---|
| "export" / "导出" / "download" / "send me the video" | → §3.5 Export | ✅ |
| "credits" / "积分" / "balance" / "余额" |
→ §3.3 Credits | ✅ |
| "status" / "状态" / "show tracks" | → §3.4 State | ✅ |
| "upload" / "上传" / user sends file | → §3.2 Upload | ✅ |
| Everything else (generate, edit, add BGM…) | → §3.1 SSE | ❌ |
NemoVideo API Music Reference
The NemoVideo backend processes your music generation requests by translating natural-language prompts into structured audio synthesis parameters — handling BPM mapping, key signatures, and timbre modeling under the hood. Latency varies with track duration and polyphonic complexity, so longer cinematic pieces take more render time than short ambient loops.
Skill attribution — read from this file's YAML frontmatter at runtime:
- -
X-Skill-Source: INLINECODE9 - INLINECODE10 : from frontmatter INLINECODE11
- INLINECODE12 : detect from install path (
~/.clawhub/ → clawhub, ~/.cursor/skills/ → cursor, else unknown)
All requests must include: Authorization: Bearer <NEMO_TOKEN>, X-Skill-Source, X-Skill-Version, X-Skill-Platform. Missing attribution headers will cause export to fail with 402.
API base: INLINECODE22
Create session: POST /api/tasks/me/with-session/nemo_agent — body {"task_name":"project","language":"<lang>"} — returns task_id, session_id. After creating a session, give the user a link: INLINECODE27
Send message (SSE): POST /run_sse — body {"app_name":"nemo_agent","user_id":"me","session_id":"<sid>","new_message":{"parts":[{"text":"<msg>"}]}} with Accept: text/event-stream. Max timeout: 15 minutes.
Upload: POST /api/upload-video/nemo_agent/me/<sid> — file: multipart -F "files=@/path", or URL: INLINECODE33
Credits: GET /api/credits/balance/simple — returns available, frozen, INLINECODE37
Session state: GET /api/state/nemo_agent/me/<sid>/latest — key fields: data.state.draft, data.state.video_infos, INLINECODE41
Export (free, no credits): POST /api/render/proxy/lambda — body {"id":"render_<ts>","sessionId":"<sid>","draft":<json>,"output":{"format":"mp4","quality":"high"}}. Poll GET /api/render/proxy/lambda/<id> every 30s until status = completed. Download URL at output.url.
Supported formats: mp4, mov, avi, webm, mkv, jpg, png, gif, webp, mp3, wav, m4a, aac.
SSE Event Handling
| Event | Action |
|---|
| Text response | Apply GUI translation (§4), present to user |
| Tool call/result |
Process internally, don't forward |
|
heartbeat / empty
data: | Keep waiting. Every 2 min: "⏳ Still working..." |
| Stream closes | Process final response |
~30% of editing operations return no text in the SSE stream. When this happens: poll session state to verify the edit was applied, then summarize changes to the user.
Backend Response Translation
The backend assumes a GUI exists. Translate these into API actions:
| Backend says | You do |
|---|
| "click [button]" / "点击" | Execute via API |
| "open [panel]" / "打开" |
Query session state |
| "drag/drop" / "拖拽" | Send edit via SSE |
| "preview in timeline" | Show track summary |
| "Export button" / "导出" | Execute export workflow |
Draft field mapping: t=tracks, tt=track type (0=video, 1=audio, 7=text), sg=segments, d=duration(ms), m=metadata.
CODEBLOCK0
Error Handling
| Code | Meaning | Action |
|---|
| 0 | Success | Continue |
| 1001 |
Bad/expired token | Re-auth via anonymous-token (tokens expire after 7 days) |
| 1002 | Session not found | New session §3.0 |
| 2001 | No credits | Anonymous: show registration URL with
?bind=<id> (get
<id> from create-session or state response when needed). Registered: "Top up at nemovideo.ai" |
| 4001 | Unsupported file | Show supported formats |
| 4002 | File too large | Suggest compress/trim |
| 400 | Missing X-Client-Id | Generate Client-Id and retry (see §1) |
| 402 | Free plan export blocked | Subscription tier issue, NOT credits. "Register at nemovideo.ai to unlock export." |
| 429 | Rate limit (1 token/client/7 days) | Retry in 30s once |
Tips and Tricks
The more specific your prompt, the better your result. Instead of saying 'make something happy,' try 'create a bright, bouncy ukulele track at 120 BPM with hand claps, suitable for a children's educational video.' Naming a reference mood, a decade, or even a film genre gives the generator useful creative anchors.
If you're generating music to sync with a video, upload your clip alongside your prompt. Describing the pacing of key moments — like 'the beat should drop at the 30-second mark' — helps align the musical energy with your visual cuts.
Don't be afraid to iterate. Generate a first version, then refine with follow-up prompts like 'make it slightly slower' or 'add more bass presence in the low end.' Treating it like a conversation with a composer consistently produces better results than a single broad request.
For looping background music, mention that explicitly — the generator can structure tracks with clean loop points so they repeat seamlessly in presentations or apps.
Performance Notes
Generation time varies depending on track length and complexity. Short loops under 60 seconds typically render quickly, while longer full-length compositions with layered instrumentation may take additional processing time. Keeping initial requests under 3 minutes is recommended for fastest turnaround.
Highly complex prompts requesting multiple distinct sections — like a track with a quiet intro, building middle, and explosive finale — will take longer than a single-mood piece. If speed matters, generate sections individually and combine them in your editing workflow.
Audio output quality is optimized for standard video and podcast production use. For projects requiring broadcast-grade mastering or specific technical audio specs, plan to run the generated track through a dedicated mastering tool as a final step. The skill focuses on creative composition, not post-production processing.
Uploaded video files in mp4/mov/avi/webm/mkv formats are used for reference and sync previewing only — they are not altered or re-exported by this skill.
开始使用
欢迎!我是你的AI音乐作曲人——随时准备根据你的情绪、风格和项目需求,使用最佳AI音乐生成技能为你量身打造原创曲目。告诉我你在做什么,让我们一起创作出恰到好处的音乐。
试试这样说:
- - 为健身宣传视频生成一段90秒、节奏强劲的欢快电子乐
- 创作一首带有轻钢琴伴奏的忧郁原声吉他曲,适合旅行纪录片片尾
- 为短片中的悬疑场景制作一段约60BPM、充满紧张感的电影级管弦乐循环
建立连接
在处理任何用户请求前,先与NemoVideo后端建立连接。显示简短状态如正在连接...。
如果环境中存在NEMO_TOKEN,直接使用并创建会话。否则,获取免费启动令牌:
- - 确保~/.config/nemovideo/clientid中存在客户端标识符(如无则创建UUID)
- 使用X-Client-Id头向https://mega-api-prod.nemovideo.ai/api/auth/anonymous-token发送POST请求
- 响应包含一个token,附带100个免费积分,有效期7天——将其用作NEMOTOKEN
然后创建会话:使用Bearer授权向/api/tasks/me/with-session/nemoagent发送POST请求,请求体为{taskname:project,language:en}。后续所有请求都需要响应中的session_id。
告知用户你已就绪。不要在聊天中透露技术细节。
你的私人作曲人,随时待命
为项目找到合适的音乐,过去意味着花费数小时浏览素材库、按曲目支付授权费,或雇佣作曲人。最佳AI音乐生成技能彻底改变了这一点——你描述需求,它围绕你的要求创作原创作品。
无论你需要紧张的影视背景音乐、学习视频的欢快低保真循环,还是婚礼幻灯片的轻柔原声曲目,这项技能都能理解你的创作方向,生成真正契合的音乐。你可以用日常语言指定风格、能量水平、配器、节奏和情感基调——无需任何音乐理论知识。
这项技能专为内容创作者、视频编辑、播客制作者、游戏开发者以及任何需要定制音频但不想面对传统音乐制作的复杂性和成本的人打造。最终生成的曲目听起来有意图、有匠心,而非随机拼凑——从第一次聆听起就为你的项目赋予专业的听觉身份。
路由你的音乐生成请求
当你描述情绪、风格、节奏或配器时,你的提示词会被解析并路由到最优生成管道,具体取决于曲目长度、风格复杂度,以及你是否需要分轨、循环或完整作品。
| 用户说... | 操作 | 跳过SSE? |
|---|
| export / 导出 / download / send me the video | → §3.5 导出 | ✅ |
| credits / 积分 / balance / 余额 |
→ §3.3 积分 | ✅ |
| status / 状态 / show tracks | → §3.4 状态 | ✅ |
| upload / 上传 / 用户发送文件 | → §3.2 上传 | ✅ |
| 其他所有(生成、编辑、添加背景音乐…) | → §3.1 SSE | ❌ |
NemoVideo API 音乐参考
NemoVideo后端通过将自然语言提示词转化为结构化音频合成参数来处理你的音乐生成请求——在底层处理BPM映射、调性和音色建模。延迟时间随曲目时长和复调复杂度变化,因此较长的电影级作品比短环境循环需要更多渲染时间。
技能归属——运行时从此文件的YAML前置元数据读取:
- - X-Skill-Source:best-ai-music-generator
- X-Skill-Version:来自前置元数据version
- X-Skill-Platform:从安装路径检测(~/.clawhub/ → clawhub,~/.cursor/skills/ → cursor,否则unknown)
所有请求必须包含:Authorization: Bearer 、X-Skill-Source、X-Skill-Version、X-Skill-Platform。缺少归属头将导致导出失败,返回402错误。
API基础地址:https://mega-api-prod.nemovideo.ai
创建会话:POST /api/tasks/me/with-session/nemoagent — 请求体{taskname:project,language:} — 返回taskid、sessionid。创建会话后,给用户一个链接:https://nemovideo.com/workspace/claim?token=$TOKEN&task=id>&session=id>&skillname=best-ai-music-generator&skillversion=1.0.0&skill_source=
发送消息(SSE):POST /runsse — 请求体{appname:nemoagent,userid:me,sessionid:,newmessage:{parts:[{text:}]}},附带Accept: text/event-stream。最大超时:15分钟。
上传:POST /api/upload-video/nemoagent/me/ — 文件:multipart -F files=@/path,或URL:{urls:[],sourcetype:url}
积分:GET /api/credits/balance/simple — 返回available、frozen、total
会话状态:GET /api/state/nemoagent/me//latest — 关键字段:data.state.draft、data.state.videoinfos、data.state.generated_media
导出(免费,不消耗积分):POST /api/render/proxy/lambda — 请求体{id:render_,sessionId:,draft:,output:{format:mp4,quality:high}}。每30秒轮询GET /api/render/proxy/lambda/,直到status = completed。下载URL位于output.url。
支持格式:mp4、mov、avi、webm、mkv、jpg、png、gif、webp、mp3、wav、m4a、aac。
SSE事件处理
| 事件 | 操作 |
|---|
| 文本响应 | 应用GUI翻译(§4),呈现给用户 |
| 工具调用/结果 |
内部处理,不转发 |
| heartbeat / 空data: | 继续等待。每2分钟:⏳ 仍在处理... |
| 流关闭 | 处理最终响应 |
约30%的编辑操作在SSE流中不返回文本。发生这种情况时:轮询会话状态以确认编辑已应用,然后向用户总结变更。
后端响应翻译
后端假定存在GUI。将这些翻译为API操作:
| 后端说 | 你执行 |
|---|
| click [button] / 点击 | 通过API执行 |
| open [panel] / 打开 |
查询会话状态 |
| drag/drop / 拖拽 | 通过SSE发送编辑 |
| preview in timeline | 显示曲目摘要 |
| Export button / 导出 | 执行导出工作流 |
草稿字段映射:t=轨道,tt=轨道类型(0=视频,1=音频,7=文本),sg=片段,d=时长(毫秒),m=元数据。
时间线(3条轨道):1. 视频:城市延时摄影(0-10秒)2. 背景音乐:低保真(0-10秒,35%)3. 标题:都市梦想(0-3秒)
错误处理
令牌错误/过期 | 通过anonymous-token重新认证(令牌7天后过期) |
| 1002 | 未找到会话 | 新建会话 §3.0 |
| 2001 | 无积分 | 匿名用户:显示注册URL,附带?bind=
(需要时从create-session或state响应获取)。已注册用户:请前往nemovideo.ai充值 |
| 4001 | 不支持的文件 | 显示支持的格式 |
| 4002 | 文件过大 | 建议压缩/裁剪 |
| 400 | 缺少X-Client-Id | 生成Client-Id并重试(见§1) |
| 402 | 免费计划导出受限 | 订阅层级问题,非积分问题。请前往nemovideo.ai注册以解锁导出功能。 |
| 429 | 速率限制(1令牌/客户端/7天) | 等待