Getting Started
Hey there! Ready to pull the audio out of your video? Drop your MP4, MOV, AVI, WEBM, or MKV file here and I'll convert it to a clean MP3 for you — just upload your file to get started.
Try saying:
- "Convert this MP4 recording of my band's rehearsal to an MP3 so I can listen to it on my phone"
- "Extract the audio from this MOV file — it's a lecture I recorded and I just need the sound"
- "I have a WEBM video from a Zoom call, can you turn it into an MP3 so I can review it while commuting?"
Automatic Setup
On first interaction, connect to the NemoVideo API before doing anything else. Show a brief status like "Setting things up...".
Token: If the NEMO_TOKEN environment variable is already set, use it and skip to Session below.
Free token: Read or create a UUID at ~/.config/nemovideo/client_id. Then POST to https://mega-api-prod.nemovideo.ai/api/auth/anonymous-token with header X-Client-Id: <your-uuid>. The response field data.token becomes your NEMO_TOKEN (100 credits, 7-day expiry). If the token has expired, request a new one with the same Client-Id.
Session: POST to the same host at /api/tasks/me/with-session/nemo_agent with Bearer auth and body {"task_name":"project"}. Save session_id from the response.
Confirm to the user you're connected and ready. Don't print tokens or raw JSON.
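The token step above can be sketched in Python roughly as follows. This is a minimal sketch, not the skill's actual implementation: the function names are hypothetical, and it assumes the response JSON exposes the token at data.token exactly as described. Only the standard library is used.

```python
import json
import os
import uuid
from pathlib import Path
from urllib import request

API_BASE = "https://mega-api-prod.nemovideo.ai"
CLIENT_ID_PATH = Path.home() / ".config" / "nemovideo" / "client_id"


def load_or_create_client_id(path: Path = CLIENT_ID_PATH) -> str:
    """Return the stored client UUID, creating and saving one if absent."""
    if path.exists():
        stored = path.read_text().strip()
        if stored:
            return stored
    client_id = str(uuid.uuid4())
    path.parent.mkdir(parents=True, exist_ok=True)
    path.write_text(client_id)
    return client_id


def anonymous_token_request(client_id: str) -> request.Request:
    """Build (but do not send) the anonymous-token POST request."""
    return request.Request(
        f"{API_BASE}/api/auth/anonymous-token",
        method="POST",
        headers={"X-Client-Id": client_id},
    )


def get_token(path: Path = CLIENT_ID_PATH) -> str:
    """Prefer NEMO_TOKEN from the environment; otherwise fetch a free token."""
    token = os.environ.get("NEMO_TOKEN")
    if token:
        return token
    req = anonymous_token_request(load_or_create_client_id(path))
    with request.urlopen(req, timeout=30) as resp:
        # Assumed response shape: {"data": {"token": "..."}}
        return json.load(resp)["data"]["token"]
```

Reusing the same stored UUID on every call matters because an expired token is renewed with the same Client-Id.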
Pull the Sound Out, Leave the Video Behind
Sometimes you recorded something on video but all you really needed was the audio. Maybe it was a lecture, a live performance, a meeting recording, or a song you filmed on your phone. Carrying around a video file just to listen to the audio is clunky — and that's exactly the problem this skill solves.
The MP4 to MP3 Converter skill extracts the audio track from your video and hands it back to you as a standalone MP3 file. It works with the most common video formats you'll actually encounter: MP4, MOV, AVI, WEBM, and MKV. Whether you're on a Mac, Windows machine, or uploading from your phone, the format support has you covered.
The result is a clean, portable MP3 you can drop into any music player, podcast editor, or audio project. No complicated settings to configure, no software to install. Just upload your video, and get your audio back. It's the kind of tool you'll reach for more often than you expect once you have it.
Routing Your Conversion Requests
When you drop an MP4 file or paste a video URL, the skill parses your intent and routes the job to the NemoVideo API, which extracts the audio stream and returns a downloadable MP3 without re-encoding the video track. Match each request against the table below.
| User says... | Action | Skip SSE? |
|---|---|---|
| "export" / "导出" / "download" / "send me the video" | → §3.5 Export | ✅ |
| "credits" / "积分" / "balance" / "余额" | → §3.3 Credits | ✅ |
| "status" / "状态" / "show tracks" | → §3.4 State | ✅ |
| "upload" / "上传" / user sends file | → §3.2 Upload | ✅ |
| Everything else (generate, edit, add BGM…) | → §3.1 SSE | ❌ |
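One way to express the routing table in code is a keyword lookup, sketched below. The function name, the action labels, and the use of case-insensitive substring matching are assumptions for illustration; the table itself does not specify how phrases are matched.

```python
from typing import Tuple

# Keyword groups taken from the routing table, checked in order.
ROUTES = [
    ("export", ("export", "导出", "download", "send me the video")),
    ("credits", ("credits", "积分", "balance", "余额")),
    ("state", ("status", "状态", "show tracks")),
    ("upload", ("upload", "上传")),
]


def route(message: str, has_attachment: bool = False) -> Tuple[str, bool]:
    """Return (action, skip_sse) for a user message."""
    text = message.lower()
    for action, keywords in ROUTES:
        if any(keyword in text for keyword in keywords):
            return action, True
    if has_attachment:
        # "user sends file" row: treat a bare attachment as an upload.
        return "upload", True
    # Everything else goes through the SSE message endpoint.
    return "sse", False
```

For example, `route("What's my balance?")` yields `("credits", True)`, while an editing request such as `route("add BGM")` falls through to `("sse", False)`.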
NemoVideo API Reference
The NemoVideo backend demuxes the audio track (typically AAC) from your MP4 container and transcodes it to a standard 128–320 kbps MP3, preserving metadata such as title and artist tags where available. All processing happens server-side, so no local codec installation is needed.
Skill attribution (read from this file's YAML frontmatter at runtime):
- X-Skill-Source: mp4-to-mp3-converter
- X-Skill-Version: from frontmatter version
- X-Skill-Platform: detect from install path (~/.clawhub/ → clawhub, ~/.cursor/skills/ → cursor, else unknown)
All requests must include: Authorization: Bearer <NEMO_TOKEN>, X-Skill-Source, X-Skill-Version, X-Skill-Platform. Missing attribution headers will cause export to fail with 402.
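The attribution rules could be implemented along these lines. This is a sketch under assumptions: the helper names are hypothetical, and the source and version values are passed in by the caller rather than parsed from the frontmatter.

```python
from pathlib import Path
from typing import Dict


def detect_platform(install_path: str) -> str:
    """Map the skill's install path to a platform label."""
    resolved = Path(install_path).expanduser()
    home = Path.home()
    known = (
        (home / ".clawhub", "clawhub"),
        (home / ".cursor" / "skills", "cursor"),
    )
    for folder, name in known:
        if resolved == folder or folder in resolved.parents:
            return name
    return "unknown"


def build_headers(token: str, source: str, version: str, install_path: str) -> Dict[str, str]:
    """Assemble the four headers every request must carry."""
    return {
        "Authorization": f"Bearer {token}",
        "X-Skill-Source": source,
        "X-Skill-Version": version,
        "X-Skill-Platform": detect_platform(install_path),
    }
```

Building the headers in one place makes it harder to send an export request without attribution, which is the case that fails with 402.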
API base: https://mega-api-prod.nemovideo.ai
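Create session: POST /api/tasks/me/with-session/nemo_agent, body {"task_name":"project","language":"<lang>"}, returns task_id, session_id. After creating a session, give the user a link: https://nemovideo.com/workspace/claim?token=$TOKEN&task=<task_id>&session=<session_id>&skill_name=mp4-to-mp3-converter&skill_version=1.0.0&skill_source=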
Send message (SSE): POST /run_sse — body {"app_name":"nemo_agent","user_id":"me","session_id":"<sid>","new_message":{"parts":[{"text":"<msg>"}]}} with Accept: text/event-stream. Max timeout: 15 minutes.
Upload: POST /api/upload-video/nemo_agent/me/<sid>. For a file, send multipart with -F "files=@/path"; for a URL, send the body {"urls":["<url>"],"source_type":"url"}.
Credits: GET /api/credits/balance/simple, returns available, frozen, total.
Session state: GET /api/state/nemo_agent/me/<sid>/latest, key fields: data.state.draft, data.state.video_infos, data.state.generated_media.
Export (free, no credits): POST /api/render/proxy/lambda — body {"id":"render_<ts>","sessionId":"<sid>","draft":<json>,"output":{"format":"mp4","quality":"high"}}. Poll GET /api/render/proxy/lambda/<id> every 30s until status = completed. Download URL at output.url.
Supported formats: mp4, mov, avi, webm, mkv, jpg, png, gif, webp, mp3, wav, m4a, aac.
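The export flow described above (submit a render job, then poll every 30 seconds until the status is completed) might look like the sketch below. Several details are assumptions: `fetch_status` is a hypothetical callable that performs the GET and returns parsed JSON, the status and output.url fields are read from the top level of that JSON, the `<ts>` in the render id is taken to be a Unix timestamp in seconds, and the `max_polls` cap is an added safeguard.

```python
import time
from typing import Any, Callable, Dict, Optional


def build_export_body(session_id: str, draft: Dict[str, Any],
                      now: Optional[float] = None) -> Dict[str, Any]:
    """Build the POST body for /api/render/proxy/lambda."""
    ts = int(now if now is not None else time.time())
    return {
        "id": f"render_{ts}",
        "sessionId": session_id,
        "draft": draft,
        "output": {"format": "mp4", "quality": "high"},
    }


def poll_until_completed(fetch_status: Callable[[str], Dict[str, Any]],
                         render_id: str,
                         interval: float = 30.0,
                         max_polls: int = 60,
                         sleep: Callable[[float], None] = time.sleep) -> str:
    """Poll the render status until it is completed, then return the download URL."""
    for attempt in range(max_polls):
        payload = fetch_status(render_id)
        if payload.get("status") == "completed":
            return payload["output"]["url"]
        if attempt < max_polls - 1:
            sleep(interval)
    raise TimeoutError(f"render {render_id} did not complete after {max_polls} polls")
```

Passing `sleep` and `fetch_status` as parameters keeps the loop testable without network access or real waiting.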
SSE Event Handling
| Event | Action |
|---|---|
| Text response | Apply GUI translation (§4), present to user |
| Tool call/result | Process internally, don't forward |
| heartbeat / empty data: | Keep waiting. Every 2 min: "⏳ Still working..." |
| Stream closes | Process final response |
~30% of editing operations return no text in the SSE stream. When this happens: poll session state to verify the edit was applied, then summarize changes to the user.
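A small parser for the stream could separate user-facing text from heartbeats, as sketched here. The payload shape is an assumption: the sketch supposes each data line carries JSON whose text lives at content.parts[].text, mirroring the parts structure of the request body. The real event schema may differ, so treat the field names as placeholders.

```python
import json
from typing import Iterable, Tuple


def collect_text(lines: Iterable[str]) -> Tuple[str, int]:
    """Return (joined_text, heartbeat_count) from raw SSE lines."""
    chunks = []
    heartbeats = 0
    for raw in lines:
        line = raw.strip()
        if not line.startswith("data:"):
            continue  # blank separators and non-data fields are skipped
        body = line[len("data:"):].strip()
        if not body:
            heartbeats += 1  # empty data line: keep waiting
            continue
        try:
            payload = json.loads(body)
        except json.JSONDecodeError:
            continue
        if not isinstance(payload, dict):
            continue
        content = payload.get("content") or {}
        for part in content.get("parts", []):
            # Parts without "text" (e.g. tool calls) are not forwarded.
            if isinstance(part, dict) and part.get("text"):
                chunks.append(part["text"])
    return "".join(chunks), heartbeats
```

If the returned text is empty once the stream closes, that is the cue to poll session state and summarize the change instead.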
Backend Response Translation
The backend assumes a GUI exists. Translate these into API actions:
| Backend says | You do |
|---|---|
| "click [button]" / "点击" | Execute via API |
| "open [panel]" / "打开" | Query session state |
| "drag/drop" / "拖拽" | Send edit via SSE |
| "preview in timeline" | Show track summary |
| "Export button" / "导出" | Execute export workflow |
Draft field mapping: t=tracks, tt=track type (0=video, 1=audio, 7=text), sg=segments, d=duration(ms), m=metadata.
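Example track summary:
Timeline (3 tracks):
1. Video: city timelapse (0-10s)
2. BGM: Lo-fi (0-10s, 35%)
3. Title: "Urban Dreams" (0-3s)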
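The draft field mapping can be turned into a track summary with a few lines of Python. The nesting used here is an assumption: the sketch supposes the draft holds a list of tracks under t, each with a tt type code and a list of segments under sg, and that each segment has its own d in milliseconds. Only the key abbreviations themselves come from the mapping above.

```python
from typing import Any, Dict, List

TRACK_TYPES = {0: "video", 1: "audio", 7: "text"}


def summarize_tracks(draft: Dict[str, Any]) -> List[str]:
    """Produce one human-readable line per track in the draft."""
    lines = []
    for index, track in enumerate(draft.get("t", []), start=1):
        kind = TRACK_TYPES.get(track.get("tt"), "unknown")
        segments = track.get("sg", [])
        # Sum of segment durations, converted from milliseconds to seconds.
        total_ms = sum(segment.get("d", 0) for segment in segments)
        lines.append(f"{index}. {kind}: {len(segments)} segment(s), {total_ms / 1000:g}s")
    return lines
```

A summary like this is what the "preview in timeline" translation would show the user in place of a GUI view.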
Error Handling
| Code | Meaning | Action |
|---|---|---|
| 0 | Success | Continue |
| 1001 | Bad/expired token | Re-auth via anonymous-token (tokens expire after 7 days) |
| 1002 | Session not found | New session §3.0 |
| 2001 | No credits | Anonymous: show registration URL with ?bind=<id> (get <id> from create-session or state response when needed). Registered: "Top up at nemovideo.ai" |
| 4001 | Unsupported file | Show supported formats |
| 4002 | File too large | Suggest compress/trim |
| 400 | Missing X-Client-Id | Generate Client-Id and retry (see §1) |
| 402 | Free plan export blocked | Subscription tier issue, NOT credits. "Register at nemovideo.ai to unlock export." |
| 429 | Rate limit (1 token/client/7 days) | Retry in 30s once |
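The error table maps naturally onto a lookup, sketched below. The action labels and the fallback for unlisted codes are hypothetical names chosen for illustration; only the codes and their meanings come from the table.

```python
# Error code -> next step, following the table above.
ERROR_ACTIONS = {
    0: "continue",
    1001: "reauth_anonymous_token",
    1002: "create_new_session",
    2001: "handle_no_credits",
    4001: "show_supported_formats",
    4002: "suggest_compress_or_trim",
    400: "generate_client_id_and_retry",
    402: "prompt_registration",
    429: "retry_once_after_30s",
}


def next_action(code: int) -> str:
    """Return the action label for an error code."""
    return ERROR_ACTIONS.get(code, "report_unknown_error")


def no_credits_message(is_anonymous: bool, bind_id: str = "") -> str:
    """User-facing hint for code 2001, depending on account type."""
    if is_anonymous:
        return f"Register to continue (bind id: {bind_id})"
    return "Top up at nemovideo.ai"
```

Keeping 402 and 2001 as distinct actions reflects the table's point that a blocked export is a subscription issue rather than a credits issue.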
Troubleshooting
If your converted MP3 comes out silent or extremely quiet, the issue is almost always with the source video itself. Some screen recordings and certain WEBM files captured from browsers are exported with very low audio gain. Try playing the original video file with your volume turned up to confirm whether audio is present before converting.
If you're uploading an MKV file and the conversion stalls or fails, check whether the file contains multiple audio tracks or uncommon codecs — some MKV containers pack in formats that require special handling. Re-exporting the MKV from your video software as a standard MP4 first usually resolves this.
For AVI files recorded by older cameras or screen capture tools, the audio track is sometimes encoded in a legacy format. If the resulting MP3 sounds choppy or has sync issues, this is typically the cause. Converting the AVI to MP4 using a basic video tool before running it through the mp4-to-mp3-converter skill will produce a cleaner result.
Use Cases
The most common reason people reach for an mp4-to-mp3-converter is simple: they have a video but only need the audio. Podcasters frequently record video interviews as a backup, then need to extract the audio track for their feed. The skill handles that in one step.
Musicians and performers often capture live sets or practice sessions on a phone or camera. Converting those MP4 or MOV files to MP3 makes them easy to share with bandmates, upload to SoundCloud, or review in any audio editor without dragging around a large video file.
For students and professionals, lecture recordings, webinar replays, and meeting captures saved as video files become far more convenient once converted to MP3 — easier to scrub through, compatible with more playback tools, and much smaller in file size. Language learners also use this workflow to extract audio from video lessons for offline listening practice.