Getting Started

Hey there! Ready to pull the audio from your video? Drop in your mp4, mov, avi, webm, or mkv file and I'll convert it to a clean MP3 for you — just tell me your file and any bitrate preferences to get started.

Try saying:

- "Convert this mp4 interview recording to an MP3 at 192kbps for my podcast episode."
- "Extract the audio from my webinar.mkv file and save it as a high-quality MP3."
- "I have a mov file from my iPhone. Can you pull the audio out as an MP3 with a small file size?"

Getting Connected

Before handling any user request, establish a connection to the NemoVideo backend. Show a brief status like "Connecting...".

If NEMO_TOKEN is in the environment, use it directly and create a session. Otherwise, acquire a free starter token:

- Ensure a client identifier exists at ~/.config/nemovideo/client_id (create one as a UUID if needed)
- POST to https://mega-api-prod.nemovideo.ai/api/auth/anonymous-token with the X-Client-Id header
- The response includes a token with 100 free credits valid for 7 days; use it as NEMO_TOKEN

Then create a session by POSTing to /api/tasks/me/with-session/nemo_agent with Bearer authorization and body {"task_name":"project","language":"en"}. The session_id in the response is needed for all following requests.
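The connection steps above can be sketched as follows. This is a minimal illustration using only the Python standard library, not the skill's actual implementation: the helper names are hypothetical, the Content-Type header is an assumption, and the requests are built but not sent, since the response field names are not documented here.

```python
import json
import uuid
from pathlib import Path
from urllib.request import Request

API_BASE = "https://mega-api-prod.nemovideo.ai"


def ensure_client_id(path: Path) -> str:
    """Return the stored client ID, creating a fresh UUID if none exists."""
    if path.exists():
        return path.read_text().strip()
    path.parent.mkdir(parents=True, exist_ok=True)
    client_id = str(uuid.uuid4())
    path.write_text(client_id)
    return client_id


def anonymous_token_request(client_id: str) -> Request:
    """Build (but do not send) the request for a free starter token."""
    return Request(
        f"{API_BASE}/api/auth/anonymous-token",
        method="POST",
        headers={"X-Client-Id": client_id},
    )


def create_session_request(token: str, language: str = "en") -> Request:
    """Build (but do not send) the session-creation request."""
    body = json.dumps({"task_name": "project", "language": language}).encode()
    return Request(
        f"{API_BASE}/api/tasks/me/with-session/nemo_agent",
        data=body,
        method="POST",
        headers={
            "Authorization": f"Bearer {token}",
            # Assumption: the backend expects a JSON content type.
            "Content-Type": "application/json",
        },
    )
```

Each Request can be passed to urllib.request.urlopen to send it. In practice the client ID path would be Path.home() / ".config" / "nemovideo" / "client_id".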

Tell the user you're ready. Keep the technical details out of the chat.

Pull Studio-Quality Audio Straight From Your Video

You shot the perfect interview, recorded a live session, or captured a webinar, but now you need just the audio. This skill takes a supported video file and extracts the audio track as an MP3, ready to upload, edit, or share. No silent gaps and no leftover video, just the audio from the original footage, encoded at the bitrate you choose.

Whether you're a podcaster who records video calls and needs the audio-only version, a content creator repurposing YouTube videos for a podcast feed, or a video editor who needs to hand off a music or voiceover track separately, this skill fits naturally into your workflow. Drop in your file, specify your preferences, and get your MP3 back fast.

Support covers the most common video containers — mp4, mov, avi, webm, and mkv — so regardless of how your footage was captured or exported, the conversion just works. Bitrate control is also available, giving you the flexibility to optimize for file size or maximum audio fidelity depending on where the file is headed.

Routing Your Conversion Requests

When you drop a video file or URL into the chat, the skill parses your intent and routes the extraction job to the appropriate FFmpeg pipeline based on format, bitrate preferences, and any codec flags you specify.

| User says... | Action | Skip SSE? |
| --- | --- | --- |
| "export" / "导出" / "download" / "send me the video" | → §3.5 Export | ✅ |
| "credits" / "积分" / "balance" / "余额" | → §3.3 Credits | ✅ |
| "status" / "状态" / "show tracks" | → §3.4 State | ✅ |
| "upload" / "上传" / user sends file | → §3.2 Upload | ✅ |
| Everything else (generate, edit, add BGM…) | → §3.1 SSE | ❌ |
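The routing table can be read as a simple keyword lookup. The sketch below is one possible reading, assuming plain case-insensitive substring matching; the function name and the short action labels are hypothetical, and a real router would likely need stricter matching.

```python
from typing import List, Tuple

# (keywords, action) pairs, checked in order; every match here skips SSE.
ROUTES: List[Tuple[Tuple[str, ...], str]] = [
    (("export", "导出", "download", "send me the video"), "export"),
    (("credits", "积分", "balance", "余额"), "credits"),
    (("status", "状态", "show tracks"), "state"),
    (("upload", "上传"), "upload"),
]


def route(message: str, has_file: bool = False) -> Tuple[str, bool]:
    """Return (action, skip_sse) for a user message."""
    if has_file:
        # A file attachment always goes to the upload endpoint.
        return "upload", True
    text = message.lower()
    for keywords, action in ROUTES:
        if any(keyword in text for keyword in keywords):
            return action, True
    # Everything else (generate, edit, add BGM...) goes through SSE.
    return "sse", False
```

For example, route("What's my balance?") returns ("credits", True), while route("Add BGM to the intro") falls through to ("sse", False).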

NemoVideo API Reference

The NemoVideo backend spins up a containerized FFmpeg instance that demuxes your source container (MP4, MOV, AVI, WebM, or MKV), strips the video stream, and encodes the audio track to MP3 at your target bitrate (up to 320kbps CBR). Extraction without re-encoding is only possible when the source audio is already MP3; any other source codec is re-encoded. All processing happens server-side, so no local FFmpeg installation is required.
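For intuition, a roughly equivalent local FFmpeg invocation can be assembled like this. The backend's actual command is not documented here, so treat this as a sketch: the function name and the 192kbps default are assumptions, while the flags themselves (-map, -vn, -c:a libmp3lame, -b:a) are standard FFmpeg options.

```python
from typing import List


def ffmpeg_mp3_args(
    src: str,
    dst: str,
    bitrate_kbps: int = 192,  # assumed default; the skill's own default is not stated
    audio_index: int = 0,
) -> List[str]:
    """Build an ffmpeg argument list that extracts one audio stream as CBR MP3."""
    if not 0 < bitrate_kbps <= 320:
        raise ValueError("MP3 bitrate must be between 1 and 320 kbps")
    if audio_index < 0:
        raise ValueError("audio_index must be non-negative")
    return [
        "ffmpeg",
        "-i", src,
        "-map", f"0:a:{audio_index}",  # select one audio stream of the first input
        "-vn",                          # drop any video stream
        "-c:a", "libmp3lame",           # encode with the LAME MP3 encoder
        "-b:a", f"{bitrate_kbps}k",     # constant target bitrate
        dst,
    ]
```

Passing audio_index=1 would target the second audio stream, for example a commentary track. The list can be handed to subprocess.run on a machine where FFmpeg is installed with libmp3lame support.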

Skill attribution — read from this file's YAML frontmatter at runtime:

- X-Skill-Source: ffmpeg-video-to-mp3
- X-Skill-Version: from frontmatter version
- X-Skill-Platform: detect from install path (~/.clawhub/ → clawhub, ~/.cursor/skills/ → cursor, else unknown)

All requests must include: Authorization: Bearer <NEMO_TOKEN>, X-Skill-Source, X-Skill-Version, X-Skill-Platform. Missing attribution headers will cause export to fail with 402.
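A minimal sketch of assembling these headers, assuming the install path and frontmatter version are already known to the caller. The function names are hypothetical.

```python
from pathlib import Path
from typing import Dict


def detect_platform(install_path: str) -> str:
    """Map the skill's install path to a platform label."""
    path = Path(install_path).expanduser().as_posix()
    home = Path.home().as_posix()
    if path.startswith(f"{home}/.clawhub/"):
        return "clawhub"
    if path.startswith(f"{home}/.cursor/skills/"):
        return "cursor"
    return "unknown"


def build_headers(token: str, version: str, install_path: str) -> Dict[str, str]:
    """Return the headers every request must carry."""
    return {
        "Authorization": f"Bearer {token}",
        "X-Skill-Source": "ffmpeg-video-to-mp3",
        "X-Skill-Version": version,
        "X-Skill-Platform": detect_platform(install_path),
    }
```

Building the headers in one place makes it harder to forget an attribution header, which matters because a missing one causes export to fail with 402.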

API base: https://mega-api-prod.nemovideo.ai

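Create session: POST /api/tasks/me/with-session/nemo_agent with body {"task_name":"project","language":"<lang>"}; returns task_id, session_id. After creating a session, give the user a link: https://nemovideo.com/workspace/claim?token=$TOKEN&task=<task_id>&session=<session_id>&skill_name=ffmpeg-video-to-mp3&skill_version=1.0.0&skill_source=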

Send message (SSE): POST /run_sse — body {"app_name":"nemo_agent","user_id":"me","session_id":"<sid>","new_message":{"parts":[{"text":"<msg>"}]}} with Accept: text/event-stream. Max timeout: 15 minutes.
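The message body and the stream parsing can be sketched as below. The payload follows the body given above; the parser handles generic server-sent events (data: lines, events separated by blank lines). The JSON shape of each event is not documented here, so this sketch only returns the raw data strings. Function names are hypothetical.

```python
from typing import Dict, Iterable, Iterator, List


def run_sse_payload(session_id: str, message: str) -> Dict[str, object]:
    """Build the JSON body for POST /run_sse."""
    return {
        "app_name": "nemo_agent",
        "user_id": "me",
        "session_id": session_id,
        "new_message": {"parts": [{"text": message}]},
    }


def iter_sse_data(lines: Iterable[str]) -> Iterator[str]:
    """Yield the data payload of each SSE event in a stream of text lines."""
    buffer: List[str] = []
    for raw in lines:
        line = raw.rstrip("\r\n")
        if line == "":
            # A blank line terminates the current event.
            if buffer:
                yield "\n".join(buffer)
                buffer = []
        elif line.startswith("data:"):
            value = line[5:]
            # A single leading space after the colon is not part of the value.
            buffer.append(value[1:] if value.startswith(" ") else value)
        # Comment lines (starting with ":") and other fields are ignored here.
    if buffer:
        # Lenient: flush a trailing event that lacks a final blank line.
        yield "\n".join(buffer)
```

Each yielded string can then be passed to json.loads if the backend sends JSON events, which is an assumption to verify against real responses.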

Upload: POST /api/upload-video/nemo_agent/me/<sid>. File: multipart -F "files=@/path"; URL: JSON body {"urls":["<url>"],"source_type":"url"}

Credits: GET /api/credits/balance/simple; returns available, frozen, total

Session state: GET /api/state/nemo_agent/me/<sid>/latest; key fields: data.state.draft, data.state.video_infos, data.state.generated_media

Export (free, no credits): POST /api/render/proxy/lambda — body {"id":"render_<ts>","sessionId":"<sid>","draft":<json>,"output":{"format":"mp4","quality":"high"}}. Poll GET /api/render/proxy/lambda/<id> every 30s until status = completed. Download URL at output.url.
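The export flow is a submit-then-poll loop. The sketch below assumes the status response is a JSON object with status at the top level and the download link at output.url, that <ts> is a timestamp, and that a poll limit is acceptable; none of these are confirmed beyond what is stated above. The status fetcher is injected so the loop stays independent of any HTTP client.

```python
import time
from typing import Callable, Dict


def render_request_body(session_id: str, draft: dict, timestamp: int) -> Dict[str, object]:
    """Build the JSON body for POST /api/render/proxy/lambda."""
    return {
        "id": f"render_{timestamp}",  # assumption: <ts> is a Unix timestamp
        "sessionId": session_id,
        "draft": draft,
        "output": {"format": "mp4", "quality": "high"},
    }


def wait_for_render(
    fetch_status: Callable[[], dict],
    interval_s: float = 30.0,
    max_polls: int = 60,  # hypothetical safety limit, not from the API docs
    sleep: Callable[[float], None] = time.sleep,
) -> str:
    """Poll until the render job completes and return its download URL."""
    for _ in range(max_polls):
        job = fetch_status()
        if job.get("status") == "completed":
            return job["output"]["url"]
        sleep(interval_s)
    raise TimeoutError("render did not complete within the polling limit")
```

Here fetch_status would wrap GET /api/render/proxy/lambda/<id>. Failure statuses are not documented in this section, so the sketch only distinguishes completed from everything else.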

Supported formats: mp4, mov, avi, webm, mkv, jpg, png, gif, webp, mp3, wav, m4a, aac.

SSE Event Handling

| Event | Action |
| --- | --- |
| Text response | Apply GUI translation (§4), present to user |
| Tool call/result | |

~30% of editing operations return no text in the SSE stream. When this happens: poll session state to verify the edit was applied, then summarize changes to the user.
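One way to handle a silent stream is to compare the draft before and after the turn. This is a sketch under assumptions: the drafts are plain dictionaries using the t (tracks) key, the fetcher wraps the session state endpoint, and the reply wording is purely illustrative.

```python
from typing import Callable


def reply_for_turn(
    stream_text: str,
    draft_before: dict,
    fetch_draft: Callable[[], dict],
) -> str:
    """Return the stream text, or a state-based summary when the stream was silent."""
    text = stream_text.strip()
    if text:
        return text
    # No text came back: verify the edit by looking at the session state.
    draft_after = fetch_draft()
    if draft_after == draft_before:
        return "No change detected yet; the edit may still be processing."
    before = len(draft_before.get("t", []))
    after = len(draft_after.get("t", []))
    return f"Edit applied: the draft now has {after} track(s) (was {before})."
```

The state is only fetched when the stream carried no text, so the common case costs no extra request.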

Backend Response Translation

The backend assumes a GUI exists. Translate these into API actions:

| Backend says | You do |
| --- | --- |
| "click [button]" / "点击" | Execute via API |
| "open [panel]" / "打开" | |

Draft field mapping: t=tracks, tt=track type (0=video, 1=audio, 7=text), sg=segments, d=duration(ms), m=metadata.

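Timeline (3 tracks):
1. Video: city timelapse (0-10s)
2. BGM: Lo-fi (0-10s, 35%)
3. Title: "Urban Dreams" (0-3s)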
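The field mapping can be turned into a readable summary with a few lines of code. The draft layout below (a t list of tracks, each with tt and an sg list whose items carry d) is an assumption based on the abbreviations above; the real draft may nest these fields differently.

```python
from typing import Dict

TRACK_TYPES: Dict[int, str] = {0: "video", 1: "audio", 7: "text"}


def summarize_draft(draft: dict) -> str:
    """Render a short, human-readable timeline from an abbreviated draft."""
    tracks = draft.get("t", [])
    lines = [f"Timeline ({len(tracks)} tracks):"]
    for number, track in enumerate(tracks, start=1):
        kind = TRACK_TYPES.get(track.get("tt"), "unknown")
        segments = track.get("sg", [])
        # d is a duration in milliseconds; sum it across the track's segments.
        total_ms = sum(segment.get("d", 0) for segment in segments)
        lines.append(
            f"{number}. {kind}: {len(segments)} segment(s), {total_ms / 1000:g}s"
        )
    return "\n".join(lines)
```

Unknown track type codes are reported as "unknown" rather than raising, since only three codes are listed in the mapping.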

Error Handling

| Code | Meaning | Action |
| --- | --- | --- |
| 0 | Success | Continue |
| 1001 | | |

Use Cases

Podcast Production: Many podcasters record their guest interviews over video calls. Rather than keeping the full video file, this skill lets you extract just the audio track as an MP3, ready to drop into your editing timeline or upload directly to your hosting platform.

Content Repurposing: If you publish video content on YouTube or social media, converting those videos to MP3 lets you distribute the same content on podcast platforms or audio apps without re-recording anything. One shoot, multiple formats.

Music and Live Performance Recordings: Videographers who capture live concerts or rehearsals can use this skill to deliver an MP3 of the performance to artists or clients alongside the video — useful for demos, archives, or promotional material.

E-Learning and Training: Course creators who record video lessons can extract the audio tracks so learners have an MP3 version to listen to on the go, extending the reach of existing content without extra production work.

FAQ

What video formats are supported? This skill handles mp4, mov, avi, webm, and mkv files — the most widely used video containers across cameras, phones, screen recorders, and editing software.

Can I control the MP3 bitrate? Yes. You can specify a target bitrate such as 128kbps for smaller files or 320kbps for maximum audio quality. If you don't specify, a sensible default is applied automatically.
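To see what a bitrate choice means for file size, the arithmetic for constant-bitrate audio is size = bitrate × duration / 8. The helper below is an illustrative sketch (its name is hypothetical), and it ignores small overheads such as ID3 tags.

```python
def estimate_mp3_size_mb(duration_s: float, bitrate_kbps: int) -> float:
    """Approximate CBR MP3 size in megabytes (1 MB = 1,000,000 bytes)."""
    bytes_per_second = bitrate_kbps * 1000 / 8
    return bytes_per_second * duration_s / 1_000_000
```

A one-hour recording comes to about 57.6 MB at 128kbps and about 144 MB at 320kbps, so the higher setting costs roughly 2.5 times the storage.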

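Will the audio quality degrade during conversion? The audio comes directly from the video's existing audio stream. MP3 is a lossy format, so when the source audio uses a different codec it is re-encoded, which can introduce a small loss at low bitrates; choosing a higher bitrate keeps the result closer to the source. As long as the source file has a decent audio track, the resulting MP3 will reflect that quality accurately.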

What if my video has multiple audio tracks? You can specify which audio track to extract — for example, if a video has both a main mix and a commentary track, just mention which one you want in your request and the correct stream will be targeted.
