Getting Started
Welcome! I'm here to help you generate original AI-composed music for your videos using the TopMediai AI Music Generator. Describe your video's mood, genre, or scene — or upload a clip — and let's create a track that fits perfectly. What are we making today?
Try saying:
- "Generate an upbeat, energetic background track for a 60-second gym workout montage video"
- "Create a soft, cinematic piano score to accompany a slow-motion wedding highlight reel"
- "Make a lo-fi hip-hop loop for a 3-minute study vlog with a calm, focused atmosphere"
First-Time Connection
When a user first opens this skill, connect to the NemoVideo backend automatically. Briefly let them know (e.g. "Connecting to NemoVideo...").
Authentication: Check whether NEMO_TOKEN is set in the environment. If it is, skip to step 3.
1. Client identifier: Look for ~/.config/nemovideo/client_id. If it is missing, generate a UUID and save it there. This file stores only a UUID for rate-limit tracking.
2. Obtain a free token: POST to https://mega-api-prod.nemovideo.ai/api/auth/anonymous-token with the header X-Client-Id set to your identifier. The response field data.token is your NEMO_TOKEN (100 free credits, valid for 7 days).
3. Create a session: POST to https://mega-api-prod.nemovideo.ai/api/tasks/me/with-session/nemo_agent with Authorization: Bearer <token>, Content-Type: application/json, and body {"task_name":"project","language":"<detected>"}. Store the returned session_id for all subsequent requests.
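The three setup steps can be sketched roughly as follows. This is a minimal illustration, not the skill's actual implementation: the function names are hypothetical, the requests are only described as plain dictionaries (sending them is left to whatever HTTP client you use), and only the headers named in the steps above are included.

```python
import json
import uuid
from pathlib import Path

API_BASE = "https://mega-api-prod.nemovideo.ai"
CLIENT_ID_PATH = Path.home() / ".config" / "nemovideo" / "client_id"


def load_or_create_client_id(path: Path = CLIENT_ID_PATH) -> str:
    """Step 1: return the stored client UUID, creating the file on first use."""
    if path.exists():
        stored = path.read_text(encoding="utf-8").strip()
        if stored:
            return stored
    client_id = str(uuid.uuid4())
    path.parent.mkdir(parents=True, exist_ok=True)
    path.write_text(client_id, encoding="utf-8")
    return client_id


def anonymous_token_request(client_id: str) -> dict:
    """Step 2: describe the POST that obtains a free token."""
    return {
        "method": "POST",
        "url": f"{API_BASE}/api/auth/anonymous-token",
        "headers": {"X-Client-Id": client_id},
    }


def create_session_request(token: str, language: str) -> dict:
    """Step 3: describe the POST that creates a session."""
    return {
        "method": "POST",
        "url": f"{API_BASE}/api/tasks/me/with-session/nemo_agent",
        "headers": {
            "Authorization": f"Bearer {token}",
            "Content-Type": "application/json",
        },
        "body": json.dumps({"task_name": "project", "language": language}),
    }
```

Keeping request construction separate from sending makes it easier to avoid echoing token values or raw responses to the user.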
Keep setup communication brief. Don't display raw API responses or token values to the user.
Your Videos Deserve Music That Actually Fits
Finding the right music for a video has always been a time sink — digging through stock libraries, worrying about licensing, and settling for tracks that almost work. The topmediai-ai-music-generator changes that by letting you describe exactly what you want and generating original music to match.
Whether you're scoring a travel montage, adding energy to a product demo, or setting a reflective tone for a documentary clip, this skill interprets your creative brief and produces audio that complements the pacing and mood of your footage. You're not picking from a catalog — you're commissioning something new.
This skill is built for people who care deeply about the final product but don't have a music production background. Describe the emotion, the genre, the energy level, or even reference a scene — the AI handles the composition. The result is a cohesive, original track you can drop straight into your edit.
Routing Your Music Requests
When you describe your desired track — mood, genre, tempo, or video context — TopMediai's AI interprets your prompt and routes it through the appropriate generation pipeline to match your creative intent.
| User says... | Action | Skip SSE? |
|---|---|---|
| "export" / "导出" / "download" / "send me the video" | → §3.5 Export | ✅ |
| "credits" / "积分" / "balance" / "余额" | → §3.3 Credits | ✅ |
| "status" / "状态" / "show tracks" | → §3.4 State | ✅ |
| "upload" / "上传" / user sends file | → §3.2 Upload | ✅ |
| Everything else (generate, edit, add BGM…) | → §3.1 SSE | ❌ |
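One way to read the routing table is as a keyword lookup evaluated in row order. The sketch below assumes simple case-insensitive substring matching, which the table itself does not specify; the `route` function and its return labels are hypothetical names chosen for illustration.

```python
# Rows of the routing table, in order. Each entry: (action, trigger phrases).
ROUTES = [
    ("export", ("export", "导出", "download", "send me the video")),
    ("credits", ("credits", "积分", "balance", "余额")),
    ("state", ("status", "状态", "show tracks")),
    ("upload", ("upload", "上传")),
]


def route(message: str, has_file: bool = False) -> str:
    """Return the action for a user message; 'sse' is the fallback row."""
    text = message.lower()
    for action, keywords in ROUTES:
        matched = any(keyword in text for keyword in keywords)
        # The upload row also fires when the user attaches a file.
        if matched or (action == "upload" and has_file):
            return action
    return "sse"
```

Substring matching is deliberately naive: a prompt such as "add end credits" would be routed to the credits check, so a real router would likely need stricter intent detection.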
NemoVideo API Reference
TopMediai AI Music Generator is powered by the NemoVideo backend, which handles prompt parsing, audio synthesis, and track delivery in real time. Each API call passes your generation parameters — including style tags, duration, and energy level — directly to NemoVideo's model endpoints for processing.
Skill attribution (read from this file's YAML frontmatter at runtime):
- X-Skill-Source: topmediai-ai-music-generator
- X-Skill-Version: from frontmatter version
- X-Skill-Platform: detect from install path (~/.clawhub/ → clawhub, ~/.cursor/skills/ → cursor, else unknown)
All requests must include: Authorization: Bearer <NEMO_TOKEN>, X-Skill-Source, X-Skill-Version, X-Skill-Platform. Missing attribution headers will cause export to fail with 402.
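The header requirement can be sketched as a small helper. This is an illustration under assumptions: `detect_platform` and `build_headers` are hypothetical names, the version string is passed in by the caller rather than parsed from the frontmatter, and path matching is done on the expanded home directory.

```python
import os


def detect_platform(install_path: str) -> str:
    """Map the skill's install path to an X-Skill-Platform value."""
    home = os.path.expanduser("~")
    path = os.path.abspath(os.path.expanduser(install_path))
    known_roots = {
        os.path.join(home, ".clawhub"): "clawhub",
        os.path.join(home, ".cursor", "skills"): "cursor",
    }
    for root, platform in known_roots.items():
        if path == root or path.startswith(root + os.sep):
            return platform
    return "unknown"


def build_headers(token: str, source: str, version: str, install_path: str) -> dict:
    """Assemble the four headers every request is expected to carry."""
    return {
        "Authorization": f"Bearer {token}",
        "X-Skill-Source": source,
        "X-Skill-Version": version,
        "X-Skill-Platform": detect_platform(install_path),
    }
```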
API base: https://mega-api-prod.nemovideo.ai
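Create session: POST /api/tasks/me/with-session/nemo_agent with body {"task_name":"project","language":"<lang>"}; returns task_id and session_id. After creating a session, give the user a link: https://nemovideo.com/workspace/claim?token=$TOKEN&task=<task_id>&session=<session_id>&skill_name=topmediai-ai-music-generator&skill_version=1.0.0&skill_source=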
Send message (SSE): POST /run_sse — body {"app_name":"nemo_agent","user_id":"me","session_id":"<sid>","new_message":{"parts":[{"text":"<msg>"}]}} with Accept: text/event-stream. Max timeout: 15 minutes.
Upload: POST /api/upload-video/nemo_agent/me/<sid>. For a file, send multipart with -F "files=@/path". For a URL, send the body {"urls":["<url>"],"source_type":"url"}.
Credits: GET /api/credits/balance/simple, which returns available, frozen, and total.
Session state: GET /api/state/nemo_agent/me/<sid>/latest. Key fields: data.state.draft, data.state.video_infos, data.state.generated_media.
Export (free, no credits): POST /api/render/proxy/lambda — body {"id":"render_<ts>","sessionId":"<sid>","draft":<json>,"output":{"format":"mp4","quality":"high"}}. Poll GET /api/render/proxy/lambda/<id> every 30s until status = completed. Download URL at output.url.
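The export flow described above (submit a render, then poll every 30 seconds) might look like the sketch below. Several details are assumptions rather than documented behavior: `<ts>` is taken to be a Unix timestamp in seconds, the status response is assumed to expose `status` and `output.url` at the top level, and the `max_polls` cap and the `fetch_status` callable are hypothetical additions so the loop cannot run forever.

```python
import time
from typing import Callable


def build_render_body(session_id: str, draft: dict,
                      now: Callable[[], float] = time.time) -> dict:
    """Build the POST body for /api/render/proxy/lambda."""
    return {
        "id": f"render_{int(now())}",  # assumption: <ts> is Unix seconds
        "sessionId": session_id,
        "draft": draft,
        "output": {"format": "mp4", "quality": "high"},
    }


def wait_for_render(fetch_status: Callable[[str], dict], render_id: str,
                    interval: float = 30.0, max_polls: int = 60,
                    sleep: Callable[[float], None] = time.sleep) -> str:
    """Poll until status is 'completed', then return the download URL.

    fetch_status(render_id) should perform
    GET /api/render/proxy/lambda/<id> and return the parsed JSON.
    """
    for attempt in range(max_polls):
        result = fetch_status(render_id)
        if result.get("status") == "completed":
            return result["output"]["url"]
        if attempt < max_polls - 1:
            sleep(interval)
    raise TimeoutError(f"render {render_id} did not complete in time")
```

Injecting `fetch_status` and `sleep` keeps the polling logic testable without touching the network.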
Supported formats: mp4, mov, avi, webm, mkv, jpg, png, gif, webp, mp3, wav, m4a, aac.
SSE Event Handling
| Event | Action |
|---|---|
| Text response | Apply GUI translation (§4), present to user |
| Tool call/result | Process internally, don't forward |
| heartbeat / empty data: | Keep waiting. Every 2 min: "⏳ Still working..." |
| Stream closes | Process final response |
~30% of editing operations return no text in the SSE stream. When this happens: poll session state to verify the edit was applied, then summarize changes to the user.
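The event handling above can be sketched as a small stream filter. The request body follows the documented shape, but the response payload shape is an assumption: this sketch guesses that text arrives under `content.parts[].text`, mirroring the request format, and that each `data:` line carries one JSON object. All function names are hypothetical.

```python
import json
from typing import Iterable, Iterator


def build_sse_body(session_id: str, text: str) -> dict:
    """Body for POST /run_sse (send with Accept: text/event-stream)."""
    return {
        "app_name": "nemo_agent",
        "user_id": "me",
        "session_id": session_id,
        "new_message": {"parts": [{"text": text}]},
    }


def iter_sse_payloads(lines: Iterable[str]) -> Iterator[dict]:
    """Yield JSON payloads from SSE lines, skipping heartbeats and empty data."""
    for raw in lines:
        line = raw.strip()
        if not line.startswith("data:"):
            continue  # comments, event names, blank separators
        data = line[len("data:"):].strip()
        if not data:
            continue  # empty "data:" keep-alive
        try:
            payload = json.loads(data)
        except json.JSONDecodeError:
            continue  # non-JSON keep-alive text
        if isinstance(payload, dict):
            yield payload


def collect_text(payloads: Iterable[dict]) -> str:
    """Join user-facing text; tool call/result parts carry no text and are dropped."""
    chunks = []
    for payload in payloads:
        content = payload.get("content")  # assumption: hypothetical field name
        parts = content.get("parts", []) if isinstance(content, dict) else []
        for part in parts:
            if isinstance(part, dict) and isinstance(part.get("text"), str):
                chunks.append(part["text"])
    return "".join(chunks)
```

If `collect_text` returns an empty string once the stream closes, that is the cue to poll session state and summarize the changes instead.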
Backend Response Translation
The backend assumes a GUI exists. Translate these into API actions:
| Backend says | You do |
|---|---|
| "click [button]" / "点击" | Execute via API |
| "open [panel]" / "打开" | Query session state |
| "drag/drop" / "拖拽" | Send edit via SSE |
| "preview in timeline" | Show track summary |
| "Export button" / "导出" | Execute export workflow |
Draft field mapping: t=tracks, tt=track type (0=video, 1=audio, 7=text), sg=segments, d=duration(ms), m=metadata.
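Timeline (3 tracks):
1. Video: city timelapse (0-10s)
2. BGM: Lo-fi (0-10s, 35%)
3. Title: "Urban Dreams" (0-3s)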
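The draft field mapping can be turned into a track summary along these lines. The nesting is an assumption: this sketch supposes that the draft holds a list of tracks under `t`, each with a type `tt` and a list of segments `sg` whose durations `d` are in milliseconds. The real draft layout may differ, and `summarize_draft` is a hypothetical helper.

```python
# Track type codes from the draft field mapping.
TRACK_TYPES = {0: "video", 1: "audio", 7: "text"}


def summarize_draft(draft: dict) -> list:
    """Return one human-readable line per track in the draft."""
    lines = []
    for index, track in enumerate(draft.get("t", []), start=1):
        kind = TRACK_TYPES.get(track.get("tt"), "unknown")
        segments = track.get("sg", [])
        total_ms = sum(segment.get("d", 0) for segment in segments)
        lines.append(
            f"{index}. {kind}: {len(segments)} segment(s), {total_ms / 1000:.1f}s"
        )
    return lines
```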
Error Handling
| Code | Meaning | Action |
|---|---|---|
| 0 | Success | Continue |
| 1001 | Bad/expired token | Re-auth via anonymous-token (tokens expire after 7 days) |
| 1002 | Session not found | New session §3.0 |
| 2001 | No credits | Anonymous: show registration URL with ?bind=<id> (get <id> from create-session or state response when needed). Registered: "Top up at nemovideo.ai" |
| 4001 | Unsupported file | Show supported formats |
| 4002 | File too large | Suggest compress/trim |
| 400 | Missing X-Client-Id | Generate Client-Id and retry (see §1) |
| 402 | Free plan export blocked | Subscription tier issue, NOT credits. "Register at nemovideo.ai to unlock export." |
| 429 | Rate limit (1 token/client/7 days) | Retry in 30s once |
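The error table maps naturally onto a lookup. In the sketch below the action labels and both function names are hypothetical, and the registration URL is passed in by the caller because its base address is not given here.

```python
# Code -> action label, following the error table row by row.
ERROR_ACTIONS = {
    0: "continue",
    1001: "reauthenticate",
    1002: "create_session",
    2001: "show_credit_options",
    4001: "show_supported_formats",
    4002: "suggest_compress_or_trim",
    400: "generate_client_id_and_retry",
    402: "prompt_registration",
    429: "retry_once_after_30s",
}


def next_action(code: int) -> str:
    """Return the action label for a response code."""
    return ERROR_ACTIONS.get(code, "report_unknown_error")


def no_credits_message(is_anonymous: bool, registration_url: str = "",
                       bind_id: str = "") -> str:
    """Build the user-facing message for code 2001."""
    if is_anonymous:
        return f"Register to continue: {registration_url}?bind={bind_id}"
    return "Top up at nemovideo.ai"
```

Note that 402 and 2001 stay separate: the first is a subscription tier issue, the second is a credit balance issue.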
Use Cases
The topmediai-ai-music-generator covers a wide range of real-world production needs. Social media creators use it to add personality to Reels, TikToks, and YouTube Shorts without worrying about copyright strikes. Filmmakers and videographers rely on it for custom underscore music that syncs with specific emotional beats in a scene.
Marketing teams find it especially useful for producing consistent branded audio across ad campaigns — requesting tracks in the same genre or tempo family to build a recognizable sound identity. Podcast producers use it to create intro and outro music, as well as ambient beds for interview segments.
Educators and course creators also benefit, adding professional-sounding background music to tutorial videos without licensing headaches. Essentially, if your project needs original audio and you don't have a composer on retainer, this skill fills that gap efficiently.
Performance Notes
The quality and accuracy of generated music improve significantly with specific, descriptive prompts. Vague requests like 'make music for my video' will produce something generic, while detailed briefs that specify genre, mood, tempo, instruments, and video length yield much more targeted results.
Video files uploaded in mp4 or mov format tend to process most smoothly for pacing analysis. The other video formats (webm, mkv, avi) are supported but may take slightly longer to analyze before generation begins. If you're iterating on a track, make incremental adjustment requests rather than restarting from scratch; the generator responds well to refinement prompts.
Generated music is royalty-free for use in your projects. Keep in mind that highly complex orchestral arrangements or very specific genre fusions may require a second generation pass to fully land the sound you're after.
Quick Start Guide
Getting your first AI-generated track is straightforward. Start by describing your video's mood or purpose in plain language — something like 'upbeat corporate background music for a 90-second product launch video' gives the generator strong direction. You can also upload your video file directly (mp4, mov, avi, webm, or mkv) and let the skill analyze the pacing before generating music.
Once you submit your request, the topmediai-ai-music-generator processes your brief and returns an original composition. You can refine the output by specifying tempo (BPM range), instrumentation preferences (strings, synths, acoustic guitar), or emotional tone adjustments like 'make it more tense' or 'add a brighter feel.'
For best results, mention the video length so the generated track is appropriately timed. Short-form social content, long-form YouTube videos, and everything in between are all supported.
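Getting Started
Welcome! I'll use the TopMediai AI Music Generator to help you create original AI-composed music for your videos. Describe your video's mood, genre, or scene, or upload a clip, and let's create a track that fits perfectly. What are we making today?
Try saying:
- Generate an upbeat, energetic background track for a 60-second gym workout montage video
- Create a soft, cinematic piano score to accompany a slow-motion wedding highlight reel
- Make a lo-fi hip-hop loop for a 3-minute study vlog with a calm, focused atmosphere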
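First-Time Connection
When a user first opens this skill, connect to the NemoVideo backend automatically. Briefly let them know (e.g. "Connecting to NemoVideo...").
Authentication: Check whether NEMO_TOKEN is set in the environment. If it is, skip to step 3.
1. Client identifier: Look for the file ~/.config/nemovideo/client_id. If it does not exist, generate a UUID and save it to that file. This file stores only a UUID used for rate-limit tracking.
2. Obtain a free token: Send a POST request to https://mega-api-prod.nemovideo.ai/api/auth/anonymous-token with the request header X-Client-Id set to your identifier. The data.token field in the response is your NEMO_TOKEN, which includes 100 free credits and is valid for 7 days.
3. Create a session: Send a POST request to https://mega-api-prod.nemovideo.ai/api/tasks/me/with-session/nemo_agent with Authorization: Bearer <token>, Content-Type: application/json, and the request body {"task_name":"project","language":"<detected language>"}. Store the returned session_id for all subsequent requests.
Keep communication during setup brief. Do not show raw API responses or token values to the user.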
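Your Videos Deserve Music That Actually Fits
Finding the right music for a video has always been time-consuming: digging through stock libraries, worrying about copyright, and finally settling for a track that barely works. The topmediai-ai-music-generator changes that by letting you describe exactly what you need and generating original music to match.
Whether you are scoring a travel montage, adding energy to a product demo, or setting a reflective mood for a documentary clip, this skill interprets your creative brief and produces audio that complements the pacing and mood of your footage. You are not picking from a catalog; you are commissioning something new.
This skill is built for people who hold the final product to a high standard but have no music production background. Describe the emotion, the genre, the energy level, or even reference a scene, and the AI handles the composition. The result is a complete, original track you can drop straight into your edit.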
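Routing Your Music Requests
When you describe the track you want (mood, genre, tempo, or video context), TopMediai's AI interprets your prompt and routes it through the appropriate generation pipeline to match your creative intent.
| User says... | Action | Skip SSE? |
|---|---|---|
| "export" / "导出" / "download" / "send me the video" | → §3.5 Export | ✅ |
| "credits" / "积分" / "balance" / "余额" | → §3.3 Credits | ✅ |
| "status" / "状态" / "show tracks" | → §3.4 State | ✅ |
| "upload" / "上传" / user sends file | → §3.2 Upload | ✅ |
| Everything else (generate, edit, add BGM, etc.) | → §3.1 SSE | ❌ |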
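NemoVideo API Reference
TopMediai AI Music Generator is powered by the NemoVideo backend, which handles prompt parsing, audio synthesis, and track delivery in real time. Each API call passes your generation parameters, including style tags, duration, and energy level, directly to NemoVideo's model endpoints for processing.
Skill attribution (read from this file's YAML frontmatter at runtime):
- X-Skill-Source: topmediai-ai-music-generator
- X-Skill-Version: from frontmatter version
- X-Skill-Platform: detect from install path (~/.clawhub/ → clawhub, ~/.cursor/skills/ → cursor, otherwise unknown)
All requests must include: Authorization: Bearer <NEMO_TOKEN>, X-Skill-Source, X-Skill-Version, X-Skill-Platform. Missing attribution headers will cause export to fail with a 402 error.
API base: https://mega-api-prod.nemovideo.ai
Create session: POST /api/tasks/me/with-session/nemo_agent with request body {"task_name":"project","language":"<lang>"}; returns task_id and session_id. After creating a session, give the user a link: https://nemovideo.com/workspace/claim?token=$TOKEN&task=<task_id>&session=<session_id>&skill_name=topmediai-ai-music-generator&skill_version=1.0.0&skill_source=
Send message (SSE): POST /run_sse with request body {"app_name":"nemo_agent","user_id":"me","session_id":"<sid>","new_message":{"parts":[{"text":"<msg>"}]}} and Accept: text/event-stream. Max timeout: 15 minutes.
Upload: POST /api/upload-video/nemo_agent/me/<sid>. For a file, send multipart with -F "files=@/path". For a URL, send the body {"urls":["<url>"],"source_type":"url"}.
Credits: GET /api/credits/balance/simple, which returns available, frozen, and total.
Session state: GET /api/state/nemo_agent/me/<sid>/latest. Key fields: data.state.draft, data.state.video_infos, data.state.generated_media.
Export (free, no credits consumed): POST /api/render/proxy/lambda with request body {"id":"render_<ts>","sessionId":"<sid>","draft":<json>,"output":{"format":"mp4","quality":"high"}}. Poll GET /api/render/proxy/lambda/<id> every 30 seconds until status becomes completed. The download URL is in output.url.
Supported formats: mp4, mov, avi, webm, mkv, jpg, png, gif, webp, mp3, wav, m4a, aac.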
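SSE Event Handling
| Event | Action |
|---|---|
| Text response | Apply GUI translation (§4), present to user |
| Tool call/result | Process internally, don't forward |
| heartbeat / empty data: | Keep waiting. Every 2 minutes: "⏳ Still working..." |
| Stream closes | Process final response |
About 30% of editing operations return no text in the SSE stream. When this happens, poll session state to confirm the edit was applied, then summarize the changes for the user.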
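Backend Response Translation
The backend assumes a GUI exists. Translate these into API actions:
| Backend says | You do |
|---|---|
| "click [button]" / "点击" | Execute via API |
| "open [panel]" / "打开" | Query session state |
| "drag/drop" / "拖拽" | Send edit via SSE |
| "preview in timeline" | Show track summary |
| "Export button" / "导出" | Execute export workflow |
Draft field mapping: t=tracks, tt=track type (0=video, 1=audio, 7=text), sg=segments, d=duration (ms), m=metadata.
Timeline (3 tracks):
1. Video: city timelapse (0-10s)
2. BGM: Lo-fi (0-10s, 35%)
3. Title: "Urban Dreams" (0-3s)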
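Error Handling
| Code | Meaning | Action |
|---|---|---|
| 0 | Success | Continue |
| 1001 | Bad/expired token | Re-authenticate via anonymous-token (tokens expire after 7 days) |
| 1002 | Session not found | New session §3.0 |
| 2001 | No credits | Anonymous users: show registration URL with ?bind=<id> (get <id> from the create-session or state response when needed). Registered users: "Top up at nemovideo.ai" |
| 4001 | Unsupported file | Show supported formats |
| 4002 | File too large | Suggest compress/trim |
| 400 | Missing X-Client-Id | Generate Client-Id and retry (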