Requirements
Required:
- ffmpeg / ffprobe: core audio processing
Optional (for advanced features):
- sox: additional noise reduction
- whisper: local transcription (or use an API)
- demucs: stem separation
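Before running anything, it can help to confirm which of these tools are actually installed. The sketch below is one way to do that in POSIX shell; `check_tools` is a hypothetical helper name, and the executable names `whisper` and `demucs` are assumed to match the commands those packages install on your system.

```shell
# Report which of the given tools are on PATH.
# check_tools is a hypothetical helper, not part of FFmpeg or SoX.
# Returns 0 if every tool was found, 1 if at least one is missing.
check_tools() {
  missing=0
  for tool in "$@"; do
    if command -v "$tool" >/dev/null 2>&1; then
      echo "found: $tool"
    else
      echo "missing: $tool"
      missing=1
    fi
  done
  return "$missing"
}

# Usage (required tools must be present; optional ones only produce a report):
#   check_tools ffmpeg ffprobe || echo "install FFmpeg first"
#   check_tools sox whisper demucs || true
```

Keeping the required and optional checks as separate calls lets a missing optional tool disable one feature without blocking basic conversion.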
Quick Reference
| Situation | Load |
|---|---|
| FFmpeg commands by task | commands.md |
| Loudness standards by platform | loudness.md |
| Podcast production workflow | podcast.md |
| Transcription workflow | transcription.md |
Core Capabilities
| Task | Method |
|---|---|
| Convert formats | FFmpeg (-acodec) |
| Remove noise | FFmpeg filters or SoX |
| Normalize loudness | ffmpeg-normalize or -af loudnorm |
| Transcribe | Whisper → text, SRT, VTT |
| Separate stems | Demucs (vocals, drums, bass, other) |
Execution Pattern
1. Clarify goal: What format? What loudness? What platform?
2. Analyze source: ffprobe for codec, sample rate, channels, duration
3. Process: FFmpeg/SoX for transformation
4. Verify: Check that the output plays, meets specs, and sounds correct
5. Deliver: Provide the file to the user
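The analyze, process, and verify steps can be sketched as three small shell functions. This is a minimal illustration, not the skill's actual implementation: the function names are hypothetical, the MP3 settings are just one possible "process" step, and `-n` is used so an existing output file is never overwritten.

```shell
# Analyze: print codec, sample rate, channel count, and duration
# for the first audio stream of the file.
probe_audio() {
  ffprobe -v error -select_streams a:0 \
    -show_entries stream=codec_name,sample_rate,channels:format=duration \
    -of default=noprint_wrappers=1 "$1"
}

# Verify: decode the whole file and discard the result.
# Decode problems are printed to stderr; a file that cannot be
# opened at all gives a non-zero exit status.
verify_audio() {
  ffmpeg -v error -i "$1" -f null -
}

# Analyze -> process -> verify, stopping at the first failure.
# The conversion shown (VBR MP3) is only an example transformation.
process_audio() {
  src=$1
  dst=$2
  probe_audio "$src" || return 1
  ffmpeg -hide_banner -n -i "$src" -acodec libmp3lame -q:a 2 "$dst" || return 1
  verify_audio "$dst" && probe_audio "$dst"
}

# Usage:
#   process_audio interview.wav interview.mp3
```

Probing the output a second time makes it easy to compare codec, sample rate, and duration against the source before delivering the file. Listening to the result is still a manual step.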
Common Requests → Actions
| User says | Agent does |
|---|---|
| "Convert to MP3" | -acodec libmp3lame -q:a 2 |
| "Remove background noise" | Apply highpass/lowpass filters or a dedicated denoiser |
| "Normalize for podcast" | -af loudnorm=I=-16:TP=-1.5:LRA=11 |
| "Transcribe this" | Whisper → output SRT/VTT/TXT |
| "Extract audio from video" | -vn -acodec copy or re-encode |
| "Make it smaller" | Lower bitrate: -b:a 128k or -b:a 96k |
| "Speed up 1.5x" | -af atempo=1.5 |
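The flags in the table are fragments; a complete command also needs an input and an output file. The sketch below wraps several rows as shell functions. The function names are hypothetical, and the 80 Hz and 12 kHz cutoffs in `denoise_basic` are illustrative assumptions rather than values from this document.

```shell
# "Convert to MP3": VBR MP3, dropping any video or cover-art stream.
to_mp3() { ffmpeg -i "$1" -vn -acodec libmp3lame -q:a 2 "$2"; }

# "Extract audio from video": copy the audio stream without re-encoding.
# The output container must suit the source codec (for example .m4a for AAC).
extract_audio() { ffmpeg -i "$1" -vn -acodec copy "$2"; }

# "Make it smaller": re-encode at a lower bitrate (third argument, default 128k).
shrink() { ffmpeg -i "$1" -vn -b:a "${3:-128k}" "$2"; }

# "Speed up 1.5x": change tempo without changing pitch (third argument, default 1.5).
speed_up() { ffmpeg -i "$1" -af "atempo=${3:-1.5}" "$2"; }

# "Remove background noise": simple band-limiting with highpass and lowpass.
# The cutoff frequencies are assumptions; tune them by ear for each recording.
denoise_basic() { ffmpeg -i "$1" -af "highpass=f=80,lowpass=f=12000" "$2"; }

# Usage:
#   to_mp3 talk.wav talk.mp3
#   extract_audio clip.mp4 clip.m4a
#   shrink talk.mp3 talk-small.mp3 96k
```

Stream copy is the fastest option for extraction because nothing is decoded; fall back to re-encoding when the target container cannot hold the source codec.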
Format Quick Reference
| Format | Use Case | Quality |
|---|---|---|
| WAV | Master, editing | Lossless |
| FLAC | Archive, audiophile | Lossless compressed |
| MP3 | Universal sharing | Lossy, 128-320 kbps |
| AAC/M4A | Apple, podcasts | Lossy, efficient |
| OGG/Opus | WhatsApp, Discord | Lossy, very efficient |
Quality Defaults
- Podcast: -16 LUFS (Spotify), -19 LUFS (Apple)
- Music: -14 LUFS (Spotify), -16 LUFS (Apple Music)
- MP3 quality: VBR -q:a 2 (~190 kbps) or CBR -b:a 192k
- Sample rate: 44.1 kHz for music, 48 kHz for video sync
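One way to apply these targets is to look up the loudness value for a platform and pass it to the loudnorm filter. The sketch below assumes hypothetical platform labels such as `podcast-spotify`; the LUFS numbers are copied from the list above, and the TP and LRA values are the ones used elsewhere in this document. The output sample rate is set explicitly with `-ar` because loudnorm can change it.

```shell
# Map a platform label to an integrated-loudness target in LUFS.
# The labels are hypothetical; the numbers come from the defaults above.
loudness_target() {
  case "$1" in
    podcast-spotify) printf '%s\n' "-16" ;;
    podcast-apple)   printf '%s\n' "-19" ;;
    music-spotify)   printf '%s\n' "-14" ;;
    music-apple)     printf '%s\n' "-16" ;;
    *) echo "unknown platform: $1" >&2; return 1 ;;
  esac
}

# Single-pass loudnorm at the chosen target.
# $1 = platform label, $2 = input, $3 = output, $4 = sample rate (default 48000).
normalize_for() {
  target=$(loudness_target "$1") || return 1
  ffmpeg -i "$2" -af "loudnorm=I=${target}:TP=-1.5:LRA=11" -ar "${4:-48000}" "$3"
}

# Usage:
#   normalize_for podcast-spotify episode.wav episode-norm.wav
#   normalize_for music-spotify track.wav track-norm.wav 44100
```

Single-pass normalization is the simplest form. A two-pass run, or the ffmpeg-normalize tool mentioned earlier, measures the file first and usually lands closer to the target.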
Scope
This skill:
- Processes audio files the user explicitly provides
- Runs FFmpeg commands on user request
- Does NOT access cloud services without the user's knowledge
- Does NOT store files persistently (the user manages their own files)