Getting Started
Welcome! 🎵 Ready to create original, royalty-free music for your video using the AI music generator free skill? Upload your clip or describe your project, and let's build the perfect soundtrack together — just tell me the mood, genre, or style you're going for!
Try saying:
- "Generate a 90-second upbeat acoustic track for a travel vlog montage with a warm, adventurous feel"
- "Create a tense, cinematic background score for a 2-minute thriller short film with building tension and a dramatic finale"
- "Make a lo-fi chill hip-hop loop around 60 seconds long for a study tips YouTube video"
Quick Start Setup
This skill connects to NemoVideo's cloud backend. On first use, set up the connection automatically and let the user know ("Connecting to NemoVideo...").
Token check: Look for NEMO_TOKEN in the environment. If found, skip to session creation. Otherwise:
- Read ~/.config/nemovideo/client_id, or generate a UUID and save it there.
- POST https://mega-api-prod.nemovideo.ai/api/auth/anonymous-token with an X-Client-Id header.
- Extract data.token from the response. This is your NEMO_TOKEN (100 free credits, 7-day expiry).
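The token steps above can be sketched as follows. This is a minimal illustration rather than the skill's actual implementation: the helper names are hypothetical, the empty POST body is an assumption (the text only specifies the X-Client-Id header), and the request is prepared but not sent.

```python
import json
import uuid
from pathlib import Path
from urllib.request import Request

API_BASE = "https://mega-api-prod.nemovideo.ai"
CLIENT_ID_PATH = Path.home() / ".config" / "nemovideo" / "client_id"


def load_or_create_client_id(path: Path = CLIENT_ID_PATH) -> str:
    """Return the saved client ID, or generate a UUID and save it."""
    if path.exists():
        saved = path.read_text(encoding="utf-8").strip()
        if saved:
            return saved
    client_id = str(uuid.uuid4())
    path.parent.mkdir(parents=True, exist_ok=True)
    path.write_text(client_id, encoding="utf-8")
    return client_id


def build_anonymous_token_request(client_id: str) -> Request:
    """Prepare (but do not send) the anonymous-token POST.

    The empty body is an assumption; only the header is documented.
    """
    return Request(
        f"{API_BASE}/api/auth/anonymous-token",
        data=b"",
        headers={"X-Client-Id": client_id},
        method="POST",
    )


def extract_token(response_body: str) -> str:
    """Pull data.token out of the JSON response body."""
    return json.loads(response_body)["data"]["token"]
```

Passing the prepared request to `urllib.request.urlopen` and feeding the decoded body to `extract_token` would yield the value to use as NEMO_TOKEN.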
Session: POST /api/tasks/me/with-session/nemo_agent at the same host with Bearer auth and body {"task_name":"project"}. Keep the returned session_id for all operations.
Let the user know with a brief "Ready!" when setup is complete. Don't expose tokens or raw API output.
Your Videos Deserve Music That Actually Fits
Finding the right background music for a video used to mean hours of scrolling through stock libraries, paying for licenses, or settling for something that almost works. The AI Music Generator Free skill changes that entirely. You describe the feeling you want — tense and cinematic, warm and acoustic, punchy and electronic — and it generates an original composition built around your video's needs.
This skill is designed for creators who move fast and need results that feel intentional. Whether you're editing a travel vlog, a product promo, a short film, or a social media reel, the generated music adapts to your described mood and duration. You're not picking from a catalog — you're commissioning something original, every time.
Beyond just generating a track, you can refine the output by specifying tempo, instrumentation, energy level, and emotional arc. Want something that starts quietly and builds to a crescendo? Just say so. The result is music that feels like it was made for your specific video — because it was.
Routing Your Soundtrack Requests
When you describe your video's mood, genre, tempo, or instrumentation, the skill parses those creative parameters and routes your generation request directly to the appropriate NemoVideo AI music pipeline.
| User says... | Action | Skip SSE? |
|---|---|---|
| "export" / "导出" / "download" / "send me the video" | → §3.5 Export | ✅ |
| "credits" / "积分" / "balance" / "余额" | → §3.3 Credits | ✅ |
| "status" / "状态" / "show tracks" | → §3.4 State | ✅ |
| "upload" / "上传" / user sends file | → §3.2 Upload | ✅ |
| Everything else (generate, edit, add BGM…) | → §3.1 SSE | ❌ |
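A rough sketch of how the routing table above might be applied. The `route` function and its substring matching are assumptions made for illustration; real intent detection would likely need to be more careful than a keyword scan.

```python
# Keyword groups copied from the routing table; checked in table order.
EXPORT_WORDS = ("export", "导出", "download", "send me the video")
CREDIT_WORDS = ("credits", "积分", "balance", "余额")
STATUS_WORDS = ("status", "状态", "show tracks")
UPLOAD_WORDS = ("upload", "上传")


def route(message: str, has_file: bool = False) -> tuple[str, bool]:
    """Return (target section, skip_sse) for a user message."""
    text = message.lower()
    if any(word in text for word in EXPORT_WORDS):
        return ("§3.5 Export", True)
    if any(word in text for word in CREDIT_WORDS):
        return ("§3.3 Credits", True)
    if any(word in text for word in STATUS_WORDS):
        return ("§3.4 State", True)
    if has_file or any(word in text for word in UPLOAD_WORDS):
        return ("§3.2 Upload", True)
    # Everything else (generate, edit, add BGM...) goes through SSE.
    return ("§3.1 SSE", False)
```

For example, `route("show tracks")` resolves to the State section without opening an SSE stream, while a generation prompt falls through to the last branch.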
NemoVideo API Reference Guide
The NemoVideo backend processes your text-to-music prompts through a diffusion-based audio synthesis engine, returning royalty-free, stems-ready audio tracks optimized for video sync points. Latency varies by track length and model depth, so longer cinematic scores may take a few extra seconds to render.
Skill attribution — read from this file's YAML frontmatter at runtime:
- X-Skill-Source: ai-music-generator-free
- X-Skill-Version: from frontmatter version
- X-Skill-Platform: detect from install path (~/.clawhub/ → clawhub, ~/.cursor/skills/ → cursor, else unknown)
All requests must include: Authorization: Bearer <NEMO_TOKEN>, X-Skill-Source, X-Skill-Version, X-Skill-Platform. Missing attribution headers will cause export to fail with 402.
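One way the required headers could be assembled is sketched below. The function names are hypothetical, and the prefix check on the install path is an assumed reading of "detect from install path".

```python
from pathlib import Path


def detect_platform(install_path: str) -> str:
    """Map an install path to a platform name, per the rules above."""
    normalized = Path(install_path).expanduser().as_posix()
    home = Path.home().as_posix()
    if normalized.startswith(f"{home}/.clawhub/"):
        return "clawhub"
    if normalized.startswith(f"{home}/.cursor/skills/"):
        return "cursor"
    return "unknown"


def build_headers(token: str, source: str, version: str, install_path: str) -> dict:
    """Build the four headers every request must carry."""
    return {
        "Authorization": f"Bearer {token}",
        "X-Skill-Source": source,
        "X-Skill-Version": version,
        "X-Skill-Platform": detect_platform(install_path),
    }
```

Building the headers in one place makes it harder to forget an attribution header on the export call, where a missing header is reported to fail with 402.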
API base: https://mega-api-prod.nemovideo.ai
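Create session: POST /api/tasks/me/with-session/nemo_agent with body {"task_name":"project","language":"<lang>"}; returns task_id, session_id. After creating a session, give the user a link: https://nemovideo.com/workspace/claim?token=$TOKEN&task=<task_id>&session=<session_id>&skill_name=ai-music-generator-free&skill_version=1.0.0&skill_source=<platform>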
Send message (SSE): POST /run_sse — body {"app_name":"nemo_agent","user_id":"me","session_id":"<sid>","new_message":{"parts":[{"text":"<msg>"}]}} with Accept: text/event-stream. Max timeout: 15 minutes.
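The message call might be prepared like this. It is a sketch under assumptions: the helper name is hypothetical, the Content-Type header is assumed (only Accept is documented), and the authorization and attribution headers are left to the caller.

```python
import json

# Documented upper bound for a single SSE call.
SSE_TIMEOUT_SECONDS = 15 * 60


def build_run_sse_request(session_id: str, message: str) -> tuple[dict, bytes]:
    """Return (headers, encoded body) for POST /run_sse."""
    headers = {
        "Accept": "text/event-stream",
        "Content-Type": "application/json",  # assumption, not documented
    }
    payload = {
        "app_name": "nemo_agent",
        "user_id": "me",
        "session_id": session_id,
        "new_message": {"parts": [{"text": message}]},
    }
    return headers, json.dumps(payload, ensure_ascii=False).encode("utf-8")
```

Encoding with `ensure_ascii=False` keeps Chinese prompts readable in the request body instead of escaping them.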
Upload: POST /api/upload-video/nemo_agent/me/<sid>. File: multipart -F "files=@/path"; URL: {"urls":["<url>"],"source_type":"url"}
Credits: GET /api/credits/balance/simple; returns available, frozen, total
Session state: GET /api/state/nemo_agent/me/<sid>/latest; key fields: data.state.draft, data.state.video_infos, data.state.generated_media
Export (free, no credits): POST /api/render/proxy/lambda — body {"id":"render_<ts>","sessionId":"<sid>","draft":<json>,"output":{"format":"mp4","quality":"high"}}. Poll GET /api/render/proxy/lambda/<id> every 30s until status = completed. Download URL at output.url.
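The export-and-poll loop could look roughly like the following. Several details are assumptions: `<ts>` is taken to be a Unix timestamp, `status` and `output.url` are read from the top level of the poll response, and the `max_polls` cap is added so the loop cannot run forever. `fetch_status` is a caller-supplied function that performs the actual GET.

```python
import time
from typing import Callable, Optional


def build_render_body(session_id: str, draft: dict, timestamp: Optional[int] = None) -> dict:
    """Build the POST /api/render/proxy/lambda body."""
    ts = int(time.time()) if timestamp is None else timestamp
    return {
        "id": f"render_{ts}",  # assumes <ts> is a Unix timestamp
        "sessionId": session_id,
        "draft": draft,
        "output": {"format": "mp4", "quality": "high"},
    }


def poll_render(
    fetch_status: Callable[[str], dict],
    render_id: str,
    interval: float = 30.0,
    max_polls: int = 60,
    sleep: Callable[[float], None] = time.sleep,
) -> str:
    """Poll until status is 'completed', then return the download URL."""
    for _ in range(max_polls):
        result = fetch_status(render_id)
        if result.get("status") == "completed":
            return result["output"]["url"]
        sleep(interval)
    raise TimeoutError(f"render {render_id} did not complete in time")
```

Injecting `sleep` and `fetch_status` keeps the loop testable without waiting 30 seconds per iteration or touching the network.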
Supported formats: mp4, mov, avi, webm, mkv, jpg, png, gif, webp, mp3, wav, m4a, aac.
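A pre-upload check against this list can be sketched as below; the function name is hypothetical and the check only looks at the file extension, not the file contents.

```python
from pathlib import Path

SUPPORTED_FORMATS = {
    "mp4", "mov", "avi", "webm", "mkv",
    "jpg", "png", "gif", "webp",
    "mp3", "wav", "m4a", "aac",
}


def is_supported(filename: str) -> bool:
    """True if the file extension is in the supported list (case-insensitive)."""
    return Path(filename).suffix.lower().lstrip(".") in SUPPORTED_FORMATS
```

Running this before the upload call lets the skill show the supported formats immediately instead of waiting for a 4001 response.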
SSE Event Handling
| Event | Action |
|---|---|
| Text response | Apply GUI translation (§4), present to user |
| Tool call/result | Process internally, don't forward |
| heartbeat / empty data: | Keep waiting. Every 2 min: "⏳ Still working..." |
| Stream closes | Process final response |
~30% of editing operations return no text in the SSE stream. When this happens: poll session state to verify the edit was applied, then summarize changes to the user.
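The event handling and the no-text fallback can be sketched as follows. The wire format is not documented here, so the event shape (`content.parts[].text`) is an assumption modeled on the request body, and the function names are hypothetical.

```python
import json
from typing import Iterable


def collect_text(sse_lines: Iterable[str]) -> str:
    """Gather user-facing text from raw SSE lines, skipping everything else."""
    chunks = []
    for raw in sse_lines:
        line = raw.strip()
        if not line.startswith("data:"):
            continue  # event names, comments, blank separators
        data = line[len("data:"):].strip()
        if not data or data == "heartbeat":
            continue  # keep waiting
        try:
            event = json.loads(data)
        except json.JSONDecodeError:
            continue
        if not isinstance(event, dict):
            continue
        content = event.get("content") or {}
        for part in content.get("parts", []):
            # Tool calls and results carry no "text" key and are not forwarded.
            if isinstance(part, dict) and part.get("text"):
                chunks.append(part["text"])
    return "".join(chunks)


def needs_state_poll(text: str) -> bool:
    """True when the stream produced no text, so session state must be polled."""
    return not text.strip()
```

When `needs_state_poll` returns true, the next step is a GET on the session state endpoint to confirm the edit landed before summarizing it for the user.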
Backend Response Translation
The backend assumes a GUI exists. Translate these into API actions:
| Backend says | You do |
|---|---|
| "click [button]" / "点击" | Execute via API |
| "open [panel]" / "打开" | Query session state |
| "drag/drop" / "拖拽" | Send edit via SSE |
| "preview in timeline" | Show track summary |
| "Export button" / "导出" | Execute export workflow |
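The translation table above could be applied with a small lookup like this. The action names are hypothetical labels, and the substring matching is a simplification; the export rule is checked first so that "click the Export button" resolves to the export workflow rather than a generic click.

```python
from typing import Optional

# (phrases, action) pairs; more specific rules come first.
GUI_RULES = [
    (("export button", "导出"), "run_export"),
    (("preview in timeline",), "show_track_summary"),
    (("drag", "drop", "拖拽"), "send_sse_edit"),
    (("open", "打开"), "query_state"),
    (("click", "点击"), "call_api"),
]


def translate_gui(backend_text: str) -> Optional[str]:
    """Return the API-side action for a GUI instruction, or None if no rule matches."""
    text = backend_text.lower()
    for phrases, action in GUI_RULES:
        if any(phrase in text for phrase in phrases):
            return action
    return None
```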
Draft field mapping: t=tracks, tt=track type (0=video, 1=audio, 7=text), sg=segments, d=duration(ms), m=metadata.
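Example track summary:
Timeline (3 tracks):
1. Video: city timelapse (0-10s)
2. BGM: Lo-fi (0-10s, 35%)
3. Title: "Urban Dreams" (0-3s)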
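The draft field mapping can be turned into a user-facing summary along these lines. The nesting is an assumption: the sketch supposes that `t` is a list of tracks, each with a `tt` type and an `sg` list whose items carry `d` in milliseconds. The real draft layout may differ.

```python
TRACK_TYPES = {0: "video", 1: "audio", 7: "text"}


def summarize_draft(draft: dict) -> list[str]:
    """Produce one summary line per track from an abbreviated draft."""
    lines = []
    for index, track in enumerate(draft.get("t", []), start=1):
        kind = TRACK_TYPES.get(track.get("tt"), "unknown")
        segments = track.get("sg", [])
        total_ms = sum(segment.get("d", 0) for segment in segments)
        lines.append(
            f"{index}. {kind}: {len(segments)} segment(s), {total_ms / 1000:.1f}s"
        )
    return lines
```

A summary like this is what the skill would show in place of a "preview in timeline" instruction from the backend.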
Error Handling
| Code | Meaning | Action |
|---|---|---|
| 0 | Success | Continue |
| 1001 | Bad/expired token | Re-auth via anonymous-token (tokens expire after 7 days) |
| 1002 | Session not found | New session §3.0 |
| 2001 | No credits | Anonymous: show registration URL with ?bind=<id> (get <id> from create-session or state response when needed). Registered: "Top up at nemovideo.ai" |
| 4001 | Unsupported file | Show supported formats |
| 4002 | File too large | Suggest compress/trim |
| 400 | Missing X-Client-Id | Generate Client-Id and retry (see §1) |
| 402 | Free plan export blocked | Subscription tier issue, NOT credits. "Register at nemovideo.ai to unlock export." |
| 429 | Rate limit (1 token/client/7 days) | Retry in 30s once |
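The error table maps naturally onto a dispatch dictionary. In this sketch the action names are hypothetical labels, and all codes are treated as one integer key space even though the table appears to mix application codes with HTTP statuses; that flattening is an assumption.

```python
ERROR_ACTIONS = {
    0: "continue",
    1001: "reauthenticate",            # token bad or expired (7-day lifetime)
    1002: "create_session",
    2001: "show_credit_options",
    4001: "show_supported_formats",
    4002: "suggest_compress_or_trim",
    400: "generate_client_id_and_retry",
    402: "prompt_registration",        # subscription tier issue, not credits
    429: "retry_once_after_30s",
}


def action_for(code: int) -> str:
    """Look up the handling action for a response code."""
    return ERROR_ACTIONS.get(code, "report_unknown_error")


def should_retry(code: int, attempts_so_far: int) -> bool:
    """A 429 is retried exactly once, per the table above."""
    return code == 429 and attempts_so_far < 1
```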
Performance Notes
The AI music generator free skill produces best results when your prompt includes specific descriptors about mood, tempo, and intended use. Vague requests like 'make something nice' will yield generic outputs, while detailed prompts — specifying BPM range, instrumentation preferences, emotional arc, and video genre — produce tracks that feel purpose-built.
Generation time varies based on track length and complexity. Shorter loops (under 60 seconds) are produced quickly, while longer compositions with dynamic changes may take slightly more time to render fully. For videos with distinct scene changes, consider requesting a track with intentional shifts in energy rather than a flat, single-mood loop.
Output audio is delivered in a standard format compatible with most video editing tools. If you're syncing music to specific visual moments, describe those timing needs clearly in your prompt for better alignment.
Best Practices
To get the most out of the ai-music-generator-free skill, always start by describing your video's emotional journey rather than just its topic. A cooking video can feel cozy and nostalgic or fast-paced and energetic — telling the skill which one matters enormously.
Specify duration explicitly. If your video is 2 minutes and 15 seconds, mention that. Tracks generated to match a specific length avoid awkward fades or loops that cut off unnaturally.
Experiment with genre blending. You're not locked into one style — requesting 'cinematic orchestral with subtle electronic undertones' often yields more interesting results than a single-genre prompt. Also consider asking for variations: generate two or three versions with slightly different energy levels and choose the one that fits best in your edit.
For social media content, lean into platform norms — upbeat and punchy for TikTok/Reels, slightly longer and atmospheric for YouTube intros.
FAQ
Is the generated music actually royalty-free? Yes. Music created through the ai-music-generator-free skill is original and generated for your use, meaning you're not pulling from a licensed library that could trigger content ID claims.
Can I use this for commercial projects? The generated tracks are intended for broad creative use. If you're producing content for paid campaigns or broadcast, double-check the platform's terms for AI-generated content to stay compliant.
What if the track doesn't match my vision? Refine your prompt. The more specific you are about tempo, mood, instrumentation, and duration, the closer the output will be to what you're imagining. You can also ask for a variation on a previous generation.
Does this work with video files I upload? Yes — the skill supports mp4, mov, avi, webm, and mkv formats. You can upload your video as reference context when describing timing or mood needs.