Getting Started

Hey there! Ready to pull the audio out of your video? Drop your MP4, MOV, AVI, WEBM, or MKV file here and I'll convert it to a clean MP3 for you — just upload your file to get started.

Try saying:

- "Convert this MP4 recording of my band's rehearsal to an MP3 so I can listen to it on my phone"
- "Extract the audio from this MOV file. It's a lecture I recorded and I just need the sound"
- "I have a WEBM video from a Zoom call, can you turn it into an MP3 so I can review it while commuting?"

Automatic Setup

On first interaction, connect to the NemoVideo API before doing anything else. Show a brief status like "Setting things up...".

Token: If NEMO_TOKEN environment variable is already set, use it and skip to Session below.

Free token: Read or create a UUID at ~/.config/nemovideo/client_id. Then POST to https://mega-api-prod.nemovideo.ai/api/auth/anonymous-token with header X-Client-Id: <your-uuid>. The response field data.token becomes your NEMO_TOKEN (100 credits, 7-day expiry). If the token has expired, request a new one with the same Client-Id.
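The free-token step can be sketched as follows. This is a minimal illustration, not the skill's actual implementation: the helper names (`load_or_create_client_id`, `build_anonymous_token_request`, `extract_token`) are hypothetical, and the request is only built here, not sent.

```python
import uuid
from pathlib import Path
from urllib.request import Request

AUTH_URL = "https://mega-api-prod.nemovideo.ai/api/auth/anonymous-token"


def load_or_create_client_id(path: Path) -> str:
    """Return the UUID stored at `path`, creating the file if missing or empty."""
    if path.exists():
        existing = path.read_text(encoding="utf-8").strip()
        if existing:
            return existing
    path.parent.mkdir(parents=True, exist_ok=True)
    client_id = str(uuid.uuid4())
    path.write_text(client_id, encoding="utf-8")
    return client_id


def build_anonymous_token_request(client_id: str) -> Request:
    """Build (but do not send) the POST that asks for an anonymous token."""
    return Request(AUTH_URL, method="POST", headers={"X-Client-Id": client_id})


def extract_token(response_json: dict) -> str:
    """Pull NEMO_TOKEN out of the decoded response (field data.token)."""
    return response_json["data"]["token"]
```

In real use the path would be `Path.home() / ".config/nemovideo/client_id"`, and the request would be sent with `urllib.request.urlopen`. Reusing the stored UUID is what allows an expired token to be renewed under the same Client-Id.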

Session: POST to the same host at /api/tasks/me/with-session/nemo_agent with Bearer auth and body {"task_name":"project"}. Save session_id from the response.
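A rough sketch of the session request, assuming a JSON body and a `Content-Type: application/json` header (the header is an assumption; the text above only specifies Bearer auth and the body). `build_session_request` is a hypothetical helper name.

```python
import json
from urllib.request import Request

API_BASE = "https://mega-api-prod.nemovideo.ai"


def build_session_request(token: str, task_name: str = "project") -> Request:
    """Build (but do not send) the POST that creates a session."""
    body = json.dumps({"task_name": task_name}).encode("utf-8")
    return Request(
        API_BASE + "/api/tasks/me/with-session/nemo_agent",
        data=body,
        method="POST",
        headers={
            "Authorization": f"Bearer {token}",
            # Assumed: the API expects JSON to be declared explicitly.
            "Content-Type": "application/json",
        },
    )
```

The response should contain `session_id`. Where it sits in the JSON is not stated here, so the sketch leaves parsing to the caller.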

Confirm to the user you're connected and ready. Don't print tokens or raw JSON.

Pull the Sound Out, Leave the Video Behind

Sometimes you recorded something on video but all you really needed was the audio. Maybe it was a lecture, a live performance, a meeting recording, or a song you filmed on your phone. Carrying around a video file just to listen to the audio is clunky — and that's exactly the problem this skill solves.

The MP4 to MP3 Converter skill extracts the audio track from your video and hands it back to you as a standalone MP3 file. It works with the most common video formats you'll actually encounter: MP4, MOV, AVI, WEBM, and MKV. Whether you're on a Mac, Windows machine, or uploading from your phone, the format support has you covered.

The result is a clean, portable MP3 you can drop into any music player, podcast editor, or audio project. No complicated settings to configure, no software to install. Just upload your video, and get your audio back. It's the kind of tool you'll reach for more often than you expect once you have it.

Routing Your Conversion Requests

When you drop an MP4 file or paste a video URL, the skill parses your intent and routes the extraction job directly to the NemoVideo API, stripping the audio stream and returning a downloadable MP3 without re-encoding the video track.

| User says... | Action | Skip SSE? |
|---|---|---|
| "export" / "导出" / "download" / "send me the video" | → §3.5 Export | ✅ |
| "credits" / "积分" / "balance" / "余额" | → §3.3 Credits | ✅ |
| "status" / "状态" / "show tracks" | → §3.4 State | ✅ |
| "upload" / "上传" / user sends file | → §3.2 Upload | ✅ |
| Everything else (generate, edit, add BGM…) | → §3.1 SSE | ❌ |
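One way to express the routing rules above in code. This is a sketch under assumptions: keywords are matched as case-insensitive substrings (the source does not say how matching works), and the `route` function and its action labels are hypothetical.

```python
# Keyword groups taken from the routing rules; each maps to a direct API action.
DIRECT_ROUTES = [
    (("export", "导出", "download", "send me the video"), "export"),
    (("credits", "积分", "balance", "余额"), "credits"),
    (("status", "状态", "show tracks"), "state"),
    (("upload", "上传"), "upload"),
]


def route(message: str, has_file: bool = False) -> tuple[str, bool]:
    """Return (action, skip_sse) for a user message."""
    if has_file:
        # A file attachment always goes to upload, whatever the text says.
        return ("upload", True)
    text = message.lower()
    for keywords, action in DIRECT_ROUTES:
        if any(keyword in text for keyword in keywords):
            return (action, True)
    # Everything else (generate, edit, add BGM...) goes through the SSE endpoint.
    return ("sse", False)
```

Substring matching is deliberately loose. A stricter matcher would avoid false positives such as a message that merely mentions the word "status".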

NemoVideo API Reference

The NemoVideo backend demuxes the audio track (typically AAC) from your MP4 container and transcodes it to a standard 128–320 kbps MP3, preserving metadata such as title and artist tags where available. All processing happens server-side, so no local codec installation is needed.

Skill attribution — read from this file's YAML frontmatter at runtime:

- X-Skill-Source: mp4-to-mp3-converter
- X-Skill-Version: from frontmatter version
- X-Skill-Platform: detect from install path (~/.clawhub/ → clawhub, ~/.cursor/skills/ → cursor, else unknown)

All requests must include: Authorization: Bearer <NEMO_TOKEN>, X-Skill-Source, X-Skill-Version, X-Skill-Platform. Missing attribution headers will cause export to fail with 402.
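The header requirements can be sketched like this. It is an illustration only: `detect_platform` and `build_headers` are hypothetical helpers, and the install path is passed in as a plain POSIX-style string instead of being discovered at runtime.

```python
def detect_platform(install_path: str, home: str) -> str:
    """Map the skill's install path to a platform label."""
    home = home.rstrip("/")
    if install_path.startswith(home + "/.clawhub/"):
        return "clawhub"
    if install_path.startswith(home + "/.cursor/skills/"):
        return "cursor"
    return "unknown"


def build_headers(token: str, source: str, version: str, platform: str) -> dict:
    """Assemble the four headers that every request must carry."""
    return {
        "Authorization": f"Bearer {token}",
        "X-Skill-Source": source,
        "X-Skill-Version": version,
        "X-Skill-Platform": platform,
    }
```

Building the headers in one place makes it harder to send an export request without attribution, which is the case that fails with 402.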

API base: https://mega-api-prod.nemovideo.ai

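Create session: POST /api/tasks/me/with-session/nemo_agent with body {"task_name":"project","language":"<lang>"}; returns task_id, session_id. After creating a session, give the user a link: https://nemovideo.com/workspace/claim?token=$TOKEN&task=<task_id>&session=<session_id>&skill_name=mp4-to-mp3-converter&skill_version=1.0.0&skill_source=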

Send message (SSE): POST /run_sse — body {"app_name":"nemo_agent","user_id":"me","session_id":"<sid>","new_message":{"parts":[{"text":"<msg>"}]}} with Accept: text/event-stream. Max timeout: 15 minutes.
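A minimal sketch of the message body and of reading the resulting event stream. The parser follows the generic Server-Sent Events framing (`data:` lines, events separated by a blank line). The JSON shape of NemoVideo's individual events is not documented here, so payloads are returned as raw strings. Both function names are hypothetical.

```python
from typing import Iterable, Iterator


def build_run_sse_body(session_id: str, message: str) -> dict:
    """Body for POST /run_sse; send it with Accept: text/event-stream."""
    return {
        "app_name": "nemo_agent",
        "user_id": "me",
        "session_id": session_id,
        "new_message": {"parts": [{"text": message}]},
    }


def iter_sse_data(lines: Iterable[str]) -> Iterator[str]:
    """Yield the data payload of each SSE event from an iterable of text lines."""
    buffer = []
    for raw in lines:
        line = raw.rstrip("\r\n")
        if line.startswith("data:"):
            value = line[5:]
            # The SSE format allows one optional space after the colon.
            if value.startswith(" "):
                value = value[1:]
            buffer.append(value)
        elif line == "" and buffer:
            # A blank line terminates the current event.
            yield "\n".join(buffer)
            buffer = []
        # Comment lines (starting with ":") and other fields are ignored.
    if buffer:
        # Lenient choice: flush a trailing event that lacks a final blank line.
        yield "\n".join(buffer)
```

With a 15-minute maximum, the HTTP client's read timeout should be set at least that high so long-running jobs are not cut off client-side.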

Upload: POST /api/upload-video/nemo_agent/me/<sid> with either a file (multipart -F "files=@/path") or a URL body: {"urls":["<url>"],"source_type":"url"}

Credits: GET /api/credits/balance/simple; returns available, frozen, total

Session state: GET /api/state/nemo_agent/me/<sid>/latest; key fields: data.state.draft, data.state.video_infos, data.state.generated_media

Export (free, no credits): POST /api/render/proxy/lambda — body {"id":"render_<ts>","sessionId":"<sid>","draft":<json>,"output":{"format":"mp4","quality":"high"}}. Poll GET /api/render/proxy/lambda/<id> every 30s until status = completed. Download URL at output.url.
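The export-and-poll flow might look like the sketch below. Assumptions are labeled in the comments: `<ts>` is taken to be a millisecond timestamp, `status` and `output.url` are assumed to sit at the top level of the poll response, and the poll count is capped because the source does not list failure statuses. `fetch_status` stands in for the real GET call.

```python
import time
from typing import Callable


def build_render_request(session_id: str, draft: dict, now_ms: int) -> dict:
    """Body for POST /api/render/proxy/lambda."""
    return {
        "id": f"render_{now_ms}",  # assumed: <ts> is a millisecond timestamp
        "sessionId": session_id,
        "draft": draft,
        "output": {"format": "mp4", "quality": "high"},
    }


def poll_until_completed(
    fetch_status: Callable[[], dict],
    interval_s: float = 30.0,
    max_polls: int = 60,
    sleep: Callable[[float], None] = time.sleep,
) -> dict:
    """Call fetch_status() every interval_s seconds until status is completed."""
    for attempt in range(max_polls):
        result = fetch_status()
        if result.get("status") == "completed":
            return result
        if attempt < max_polls - 1:
            sleep(interval_s)
    # Assumed safeguard: give up after max_polls attempts.
    raise TimeoutError(f"render not completed after {max_polls} polls")
```

Passing `sleep` as a parameter keeps the loop testable without real waiting. On success, the download URL would be read from `result["output"]["url"]` under the assumption stated above.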

Supported formats: mp4, mov, avi, webm, mkv, jpg, png, gif, webp, mp3, wav, m4a, aac.

SSE Event Handling

| Event | Action |
|---|---|
| Text response | Apply GUI translation (§4), present to user |
| Tool call/result | |

~30% of editing operations return no text in the SSE stream. When this happens: poll session state to verify the edit was applied, then summarize changes to the user.
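The fallback described above can be sketched as a small helper. `reply_or_state_summary` is a hypothetical name, and `fetch_state` and `summarize` are placeholders for the real session-state call and for whatever summary format is preferred.

```python
from typing import Callable, Iterable


def reply_or_state_summary(
    text_chunks: Iterable[str],
    fetch_state: Callable[[], dict],
    summarize: Callable[[dict], str],
) -> str:
    """Return the streamed text, or a state-based summary if no text arrived."""
    text = "".join(text_chunks).strip()
    if text:
        return text
    # No text came back: check session state before telling the user
    # anything, so an unapplied edit is not reported as done.
    return summarize(fetch_state())
```

The state is fetched only on the silent path, so the common case costs no extra request.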

Backend Response Translation

The backend assumes a GUI exists. Translate these into API actions:

| Backend says | You do |
|---|---|
| "click [button]" / "点击" | Execute via API |
| "open [panel]" / "打开" | |

Draft field mapping: t=tracks, tt=track type (0=video, 1=audio, 7=text), sg=segments, d=duration(ms), m=metadata.

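Timeline (3 tracks):
1. Video: city timelapse (0-10s)
2. BGM: Lo-fi (0-10s, 35%)
3. Title: "Urban Dreams" (0-3s)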
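The draft field mapping can be used to turn a raw draft into a short per-track summary. The nesting assumed here is a guess, not a documented schema: `t` is taken to be a list of tracks, each with `tt` and a list `sg` of segments that carry `d` in milliseconds. `summarize_draft` is a hypothetical helper.

```python
# Track type codes from the draft field mapping.
TRACK_TYPES = {0: "video", 1: "audio", 7: "text"}


def summarize_draft(draft: dict) -> list[str]:
    """Return one summary line per track in the draft."""
    lines = []
    for index, track in enumerate(draft.get("t", []), start=1):
        kind = TRACK_TYPES.get(track.get("tt"), "unknown")
        segments = track.get("sg", [])
        # Assumed: each segment stores its own duration under "d" (ms).
        total_ms = sum(segment.get("d", 0) for segment in segments)
        lines.append(
            f"{index}. {kind}: {len(segments)} segment(s), {total_ms / 1000:g}s"
        )
    return lines
```

Unknown track type codes are labeled `unknown` so that a new code from the backend is surfaced, not silently dropped.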

Error Handling

| Code | Meaning | Action |
|---|---|---|
| 0 | Success | Continue |
| 1001 | | |

Troubleshooting

If your converted MP3 comes out silent or extremely quiet, the issue is almost always with the source video itself. Some screen recordings and certain WEBM files captured from browsers are exported with very low audio gain. Try playing the original video file with your volume turned up to confirm whether audio is present before converting.

If you're uploading an MKV file and the conversion stalls or fails, check whether the file contains multiple audio tracks or uncommon codecs — some MKV containers pack in formats that require special handling. Re-exporting the MKV from your video software as a standard MP4 first usually resolves this.

For AVI files recorded by older cameras or screen capture tools, the audio track is sometimes encoded in a legacy format. If the resulting MP3 sounds choppy or has sync issues, this is typically the cause. Converting the AVI to MP4 using a basic video tool before running it through the mp4-to-mp3-converter skill will produce a cleaner result.

Use Cases

The most common reason people reach for an mp4-to-mp3-converter is simple: they have a video but only need the audio. Podcasters frequently record video interviews as a backup, then need to extract the audio track for their feed. The skill handles that in one step.

Musicians and performers often capture live sets or practice sessions on a phone or camera. Converting those MP4 or MOV files to MP3 makes them easy to share with bandmates, upload to SoundCloud, or review in any audio editor without dragging around a large video file.

For students and professionals, lecture recordings, webinar replays, and meeting captures saved as video files become far more convenient once converted to MP3 — easier to scrub through, compatible with more playback tools, and much smaller in file size. Language learners also use this workflow to extract audio from video lessons for offline listening practice.
