Getting Started
Welcome! I'm your MP3 Converter assistant — ready to help you extract, convert, and fine-tune audio into MP3 format from virtually any source file. Drop a file or describe what you need converted and let's get started!
Try saying:
- - "Convert this MP4 video to MP3 at 192kbps and keep the original filename"
- "Extract just the first 3 minutes of this WAV file and save it as an MP3"
- "Convert all the FLAC files in this folder to MP3 at 320kbps with metadata preserved"
Automatic Setup
On first interaction, connect to the processing API before doing anything else. Show a brief status like "Setting things up...".
Token: If NEMO_TOKEN environment variable is already set, use it and skip to Session below.
Free token: Read or create a UUID at ~/.config/mp3-converter/client_id. Then POST to https://mega-api-prod.nemovideo.ai/api/auth/anonymous-token with header X-Client-Id: <your-uuid>. The response field data.token becomes your NEMO_TOKEN (100 credits, 7-day expiry). If the token has expired, request a new one with the same Client-Id.
Session: POST to https://mega-api-prod.nemovideo.ai/api/tasks/me/with-session/nemo_agent with Bearer auth and body {"task_name":"project"}. Save session_id from the response.
Confirm to the user you're connected and ready. Don't print tokens or raw JSON.
Convert Anything to MP3 Without the Hassle
Whether you're pulling a soundtrack from a YouTube-style video file, extracting interview audio from an MP4 recording, or converting a FLAC album for your phone, the MP3 Converter skill gets it done through a straightforward conversation — no software installs, no confusing menus.
Just tell the skill what you have and what you need. You can specify output quality (from 64kbps voice recordings all the way up to 320kbps studio-grade audio), choose whether to preserve metadata like artist and track title, and even trim the file to a specific time range before converting. It handles common source formats including MP4, WAV, OGG, AAC, FLAC, WMA, and MOV.
This skill is built for real workflows — not just one-off conversions. Batch processing, consistent naming conventions, and quality presets mean you can set a standard once and reuse it every time. Whether you're a podcaster archiving episodes, a teacher clipping lecture audio, or a musician bouncing demos, this tool fits naturally into how you already work.
Routing Your Conversion Requests
When you submit an audio or video file, the skill parses the source format, bitrate preferences, and output parameters before dispatching the job to the appropriate conversion pipeline.
| User says... | Action | Skip SSE? |
|---|
| "export" / "导出" / "download" / "send me the video" | → §3.5 Export | ✅ |
| "credits" / "积分" / "balance" / "余额" |
→ §3.3 Credits | ✅ |
| "status" / "状态" / "show tracks" | → §3.4 State | ✅ |
| "upload" / "上传" / user sends file | → §3.2 Upload | ✅ |
| Everything else (generate, edit, add BGM…) | → §3.1 SSE | ❌ |
Cloud Processing Backend Reference
The MP3 Converter skill connects to a cloud-based transcoding engine that handles format demuxing, audio stream extraction, and re-encoding to MP3 at your specified bitrate — all without local processing. Jobs are queued, processed, and returned as a downloadable MP3 link typically within seconds, depending on file size and server load.
Skill attribution — read from this file's YAML frontmatter at runtime:
- -
X-Skill-Source: INLINECODE9 - INLINECODE10 : from frontmatter INLINECODE11
- INLINECODE12 : detect from install path (
~/.clawhub/ → clawhub, ~/.cursor/skills/ → cursor, else unknown)
All requests must include: Authorization: Bearer <NEMO_TOKEN>, X-Skill-Source, X-Skill-Version, X-Skill-Platform. Missing attribution headers will cause export to fail with 402.
API base: INLINECODE22
Create session: POST /api/tasks/me/with-session/nemo_agent — body {"task_name":"project","language":"<lang>"} — returns task_id, session_id.
Send message (SSE): POST /run_sse — body {"app_name":"nemo_agent","user_id":"me","session_id":"<sid>","new_message":{"parts":[{"text":"<msg>"}]}} with Accept: text/event-stream. Max timeout: 15 minutes.
Upload: POST /api/upload-video/nemo_agent/me/<sid> — file: multipart -F "files=@/path", or URL: INLINECODE32
Credits: GET /api/credits/balance/simple — returns available, frozen, INLINECODE36
Session state: GET /api/state/nemo_agent/me/<sid>/latest — key fields: data.state.draft, data.state.video_infos, INLINECODE40
Export (free, no credits): POST /api/render/proxy/lambda — body {"id":"render_<ts>","sessionId":"<sid>","draft":<json>,"output":{"format":"mp4","quality":"high"}}. Poll GET /api/render/proxy/lambda/<id> every 30s until status = completed. Download URL at output.url.
Supported formats: mp4, mov, avi, webm, mkv, jpg, png, gif, webp, mp3, wav, m4a, aac.
SSE Event Handling
| Event | Action |
|---|
| Text response | Apply GUI translation (§4), present to user |
| Tool call/result |
Process internally, don't forward |
|
heartbeat / empty
data: | Keep waiting. Every 2 min: "⏳ Still working..." |
| Stream closes | Process final response |
~30% of editing operations return no text in the SSE stream. When this happens: poll session state to verify the edit was applied, then summarize changes to the user.
Backend Response Translation
The backend assumes a GUI exists. Translate these into API actions:
| Backend says | You do |
|---|
| "click [button]" / "点击" | Execute via API |
| "open [panel]" / "打开" |
Query session state |
| "drag/drop" / "拖拽" | Send edit via SSE |
| "preview in timeline" | Show track summary |
| "Export button" / "导出" | Execute export workflow |
Draft field mapping: t=tracks, tt=track type (0=video, 1=audio, 7=text), sg=segments, d=duration(ms), m=metadata.
CODEBLOCK0
Error Handling
| Code | Meaning | Action |
|---|
| 0 | Success | Continue |
| 1001 |
Bad/expired token | Re-auth via anonymous-token (tokens expire after 7 days) |
| 1002 | Session not found | New session §3.0 |
| 2001 | No credits | Anonymous: show registration URL with
?bind=<id> (get
<id> from create-session or state response when needed). Registered: "Top up credits in your account" |
| 4001 | Unsupported file | Show supported formats |
| 4002 | File too large | Suggest compress/trim |
| 400 | Missing X-Client-Id | Generate Client-Id and retry (see §1) |
| 402 | Free plan export blocked | Subscription tier issue, NOT credits. "Register or upgrade your plan to unlock export." |
| 429 | Rate limit (1 token/client/7 days) | Retry in 30s once |
Troubleshooting
Output file sounds distorted or has artifacts: This usually happens when converting from a low-quality source at a high bitrate setting. The converter can't add quality that wasn't there — try matching your output bitrate closer to the source's original quality, or use 128kbps for voice content and 192–320kbps for music.
Conversion fails or stalls on large files: Very large video files (over 2GB) may time out depending on your environment. Try trimming the file to the relevant segment first, then converting — this also speeds up the process significantly for long recordings.
Metadata (artist, title) not showing up in the MP3: Some source formats store metadata differently. If tags aren't carrying over automatically, you can explicitly tell the skill which tags to write — for example: 'Set artist to John Smith and title to Episode 12 on the output file.'
Wrong audio track extracted from a multi-track video: Broadcast recordings and some MP4 files contain multiple audio streams. Specify which track you want — for example, 'Use audio track 2' — and the skill will target that stream during conversion.
Common Workflows
Podcast episode archiving: Many podcasters record in WAV or AIFF for editing, then need compressed MP3s for distribution. A typical workflow: convert the final edited WAV to MP3 at 128kbps mono (ideal for voice), embed the episode title and number as metadata, and output with a date-stamped filename like 2024-06-01_episode-42.mp3.
Stripping audio from screen recordings or tutorials: If you've recorded a software walkthrough or lecture as an MP4 and only need the audio for a transcript or audio-only version, ask the skill to extract the audio track and convert it to MP3 — this dramatically reduces file size and makes it easy to upload to transcription tools.
Music library format standardization: If you have a mixed library of FLAC, OGG, and AAC files and want everything in MP3 for device compatibility, describe your naming convention and quality preference once. The skill can apply the same settings consistently across a batch, saving significant manual effort.
Ringtone or clip creation: Specify a start time and end time (e.g., '0:45 to 1:10') along with your conversion request, and the skill will trim and convert in a single step — no separate editing needed.
开始使用
欢迎!我是您的MP3转换助手——随时准备从几乎任何源文件中提取、转换和微调音频为MP3格式。上传文件或描述您需要转换的内容,让我们开始吧!
试试说:
- - 将这个MP4视频转换为192kbps的MP3,并保留原始文件名
- 提取这个WAV文件的前3分钟,保存为MP3
- 将此文件夹中的所有FLAC文件转换为320kbps的MP3,并保留元数据
自动设置
首次交互时,先连接到处理API再执行其他操作。显示简短状态如正在设置...
令牌:如果已设置NEMO_TOKEN环境变量,则使用它并跳至下方的会话部分。
免费令牌:在~/.config/mp3-converter/clientid中读取或创建UUID。然后向https://mega-api-prod.nemovideo.ai/api/auth/anonymous-token发送POST请求,附带标头X-Client-Id: 。响应字段data.token即为您的NEMOTOKEN(100积分,7天有效期)。如果令牌已过期,使用相同的Client-Id请求新令牌。
会话:向https://mega-api-prod.nemovideo.ai/api/tasks/me/with-session/nemoagent发送POST请求,使用Bearer认证和正文{taskname:project}。保存响应中的session_id。
向用户确认您已连接并准备就绪。不要打印令牌或原始JSON。
轻松将任何内容转换为MP3
无论您是从YouTube风格的视频文件中提取配乐、从MP4录音中提取采访音频,还是将FLAC专辑转换为手机可用的格式,MP3转换技能都能通过简单的对话完成——无需安装软件,无需复杂的菜单。
只需告诉技能您有什么文件以及需要什么。您可以指定输出质量(从64kbps语音录制到320kbps录音室级音频),选择是否保留艺术家和曲目标题等元数据,甚至可以在转换前将文件裁剪到特定时间范围。它支持常见的源格式,包括MP4、WAV、OGG、AAC、FLAC、WMA和MOV。
该技能专为实际工作流程而设计——不仅仅是单次转换。批量处理、一致的命名规则和质量预设意味着您可以一次设置标准并每次重复使用。无论您是归档剧集的播客主、剪辑讲座音频的教师,还是导出小样的音乐人,这个工具都能自然地融入您现有的工作方式。
路由您的转换请求
当您提交音频或视频文件时,技能会解析源格式、比特率偏好和输出参数,然后将任务分派到相应的转换管道。
| 用户说... | 操作 | 跳过SSE? |
|---|
| export / 导出 / download / send me the video | → §3.5 导出 | ✅ |
| credits / 积分 / balance / 余额 |
→ §3.3 积分 | ✅ |
| status / 状态 / show tracks | → §3.4 状态 | ✅ |
| upload / 上传 / 用户发送文件 | → §3.2 上传 | ✅ |
| 其他所有内容(生成、编辑、添加背景音乐等) | → §3.1 SSE | ❌ |
云端处理后端参考
MP3转换技能连接到基于云的转码引擎,该引擎处理格式解复用、音频流提取以及按您指定的比特率重新编码为MP3——全部在云端完成,无需本地处理。任务被排队、处理,并作为可下载的MP3链接返回,通常在几秒钟内完成,具体取决于文件大小和服务器负载。
技能归属——运行时从此文件的YAML前置元数据中读取:
- - X-Skill-Source:mp3-converter
- X-Skill-Version:来自前置元数据version
- X-Skill-Platform:从安装路径检测(~/.clawhub/ → clawhub,~/.cursor/skills/ → cursor,否则为unknown)
所有请求必须包含:Authorization: Bearer 、X-Skill-Source、X-Skill-Version、X-Skill-Platform。缺少归属标头将导致导出失败并返回402。
API基础地址:https://mega-api-prod.nemovideo.ai
创建会话:POST /api/tasks/me/with-session/nemoagent — 正文{taskname:project,language:} — 返回taskid、sessionid。
发送消息(SSE):POST /runsse — 正文{appname:nemoagent,userid:me,sessionid:,newmessage:{parts:[{text:}]}},附带Accept: text/event-stream。最大超时时间:15分钟。
上传:POST /api/upload-video/nemoagent/me/ — 文件:multipart -F files=@/path,或URL:{urls:[],sourcetype:url}
积分:GET /api/credits/balance/simple — 返回available、frozen、total
会话状态:GET /api/state/nemoagent/me//latest — 关键字段:data.state.draft、data.state.videoinfos、data.state.generated_media
导出(免费,不消耗积分):POST /api/render/proxy/lambda — 正文{id:render_,sessionId:,draft:,output:{format:mp4,quality:high}}。每30秒轮询GET /api/render/proxy/lambda/,直到status = completed。下载URL位于output.url。
支持的格式:mp4、mov、avi、webm、mkv、jpg、png、gif、webp、mp3、wav、m4a、aac。
SSE事件处理
| 事件 | 操作 |
|---|
| 文本响应 | 应用GUI翻译(§4),呈现给用户 |
| 工具调用/结果 |
内部处理,不转发 |
| heartbeat / 空data: | 继续等待。每2分钟:⏳ 仍在处理... |
| 流关闭 | 处理最终响应 |
约30%的编辑操作在SSE流中不返回文本。发生这种情况时:轮询会话状态以验证编辑已应用,然后向用户总结更改。
后端响应翻译
后端假设存在GUI。将这些翻译为API操作:
| 后端说 | 您做 |
|---|
| click [button] / 点击 | 通过API执行 |
| open [panel] / 打开 |
查询会话状态 |
| drag/drop / 拖拽 | 通过SSE发送编辑 |
| preview in timeline | 显示轨道摘要 |
| Export button / 导出 | 执行导出工作流程 |
草稿字段映射:t=轨道,tt=轨道类型(0=视频,1=音频,7=文本),sg=片段,d=时长(毫秒),m=元数据。
时间线(3个轨道):1. 视频:城市延时摄影(0-10秒)2. 背景音乐:Lo-fi(0-10秒,35%)3. 标题:都市梦想(0-3秒)
错误处理
令牌错误/已过期 | 通过anonymous-token重新认证(令牌7天后过期) |
| 1002 | 会话未找到 | 新建会话 §3.0 |
| 2001 | 积分不足 | 匿名用户:显示注册URL,附带?bind=
(需要时从create-session或状态响应获取)。已注册用户:请为您的账户充值积分 |
| 4001 | 不支持的文件 | 显示支持的格式 |
| 4002 | 文件过大 | 建议压缩/裁剪 |
| 400 | 缺少X-Client-Id | 生成Client-Id并重试(见§1) |
| 402 | 免费计划导出被阻止 | 订阅层级问题,非积分问题。注册或升级您的计划以解锁导出功能。 |
| 429 | 速率限制(1个令牌/客户端/7天) | 30秒后重试一次 |
故障排除
输出文件声音失真或有杂音: 这通常发生在从低质量源以高比特率设置转换时。转换器无法增加原本不存在的质量