🎵 Voice Note to MIDI

Transform your voice memos, humming, and melodic recordings into clean, quantized MIDI files ready for your DAW.

What It Does

This skill provides a complete audio-to-MIDI conversion pipeline that combines:

1. Stem Separation - Uses HPSS (Harmonic-Percussive Source Separation) to isolate melodic content from drums, noise, and background sounds
2. ML-Powered Pitch Detection - Leverages Spotify's Basic Pitch model for accurate fundamental frequency extraction
3. Key Detection - Automatically detects the musical key of your recording using Krumhansl-Kessler key profiles
4. Intelligent Quantization - Snaps notes to a configurable timing grid with optional key-aware pitch correction
5. Post-Processing - Applies octave pruning, overlap-based harmonic removal, and legato note merging for clean output

Pipeline Architecture

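Audio Input (WAV/M4A/MP3)
↓
┌─────────────────────────────────────┐
│ Step 1: Stem Separation (HPSS)      │
│ - Isolate harmonic content          │
│ - Remove drums/percussion           │
│ - Noise gating                      │
└─────────────────────────────────────┘
↓
┌─────────────────────────────────────┐
│ Step 2: Pitch Detection             │
│ - Basic Pitch ML model (Spotify)    │
│ - Polyphonic note detection         │
│ - Onset/offset estimation           │
└─────────────────────────────────────┘
↓
┌─────────────────────────────────────┐
│ Step 3: Analysis                    │
│ - Pitch class distribution          │
│ - Key detection                     │
│ - Tonic identification              │
└─────────────────────────────────────┘
↓
┌─────────────────────────────────────┐
│ Step 4: Quantization & Cleanup      │
│ - Timing grid alignment             │
│ - Key-aware pitch correction        │
│ - Octave pruning (harmonic removal) │
│ - Overlap-based pruning             │
│ - Note merging (legato)             │
│ - Velocity normalization            │
└─────────────────────────────────────┘
↓
MIDI Output (Standard MIDI File)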

Setup

Prerequisites

- Python 3.11+ (Python 3.14+ recommended)
- FFmpeg (for audio format support)
- pip

Installation

Quick Install (Recommended):

bash
cd /path/to/voice-note-to-midi
./setup.sh

This automated script will:

- Check that Python 3.11+ is installed
- Create the ~/melody-pipeline directory
- Set up the virtual environment
- Install all dependencies (basic-pitch, librosa, music21, etc.)
- Download and configure the hum2midi script
- Add melody-pipeline to your PATH

Manual Install:

If you prefer manual setup:

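bash
mkdir -p ~/melody-pipeline
cd ~/melody-pipeline
python3 -m venv venv-bp
source venv-bp/bin/activate
pip install basic-pitch librosa soundfile mido music21
# assumes the hum2midi script has already been placed in ~/melody-pipeline
chmod +x ~/melody-pipeline/hum2midi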

Add to your PATH (optional):

bash
echo 'export PATH="$HOME/melody-pipeline:$PATH"' >> ~/.bashrc
source ~/.bashrc

Verify Installation

bash
cd ~/melody-pipeline
./hum2midi --help

Usage

Basic Usage

Convert a voice memo to MIDI:

bash
./hum2midi my_humming.wav

This creates my_humming.mid with 16th-note quantization.
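
To make the idea of grid quantization concrete, here is a minimal sketch of snapping note boundaries to a timing grid. It is not the hum2midi implementation. The function names and the (start_tick, end_tick, pitch) tuple layout are hypothetical, and the default of 960 ticks per quarter note is an assumption inferred from the sample output, where a 1/16 grid is reported as 240 ticks.

```python
from fractions import Fraction


def grid_ticks(grid: str, ppq: int = 960) -> int:
    """Return the grid step in ticks for a grid such as '1/16'.

    The grid is read as a fraction of a whole note (4 quarter notes).
    ppq=960 is an assumption, not a documented hum2midi setting.
    """
    step = Fraction(grid) * 4 * ppq
    if step <= 0 or step.denominator != 1:
        raise ValueError(f"grid {grid!r} does not divide evenly at ppq={ppq}")
    return int(step)


def quantize(notes, grid: str = "1/16", ppq: int = 960):
    """Snap (start_tick, end_tick, pitch) tuples to the nearest grid line."""
    step = grid_ticks(grid, ppq)
    snapped = []
    for start, end, pitch in notes:
        q_start = (start + step // 2) // step * step  # round half up
        q_end = (end + step // 2) // step * step
        q_end = max(q_end, q_start + step)  # keep at least one grid step
        snapped.append((q_start, q_end, pitch))
    return snapped


notes = [(250, 470, 60), (10, 50, 62)]
print(quantize(notes, "1/16"))  # [(240, 480, 60), (0, 240, 62)]
```

Integer arithmetic is used for rounding so that notes falling exactly halfway between two grid lines always move to the later line, rather than depending on floating-point tie-breaking.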

Specify Output File

bash
./hum2midi input.wav output.mid

Command-Line Options

Option	Description	Default
`--grid <value>`	Quantization grid: `1/4`, `1/8`, `1/16`, `1/32`	`1/16`
`--min-note <ms>`

Usage Examples

Quantize to eighth notes

bash
./hum2midi melody.wav --grid 1/8

Key-aware quantization (recommended for tonal music)

bash
./hum2midi song.wav --key-aware
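
Key-aware pitch correction can be pictured as moving each out-of-scale note to the nearest scale tone. The sketch below is an illustration under stated assumptions, not the tool's actual logic: the function name, the choice of natural minor for the minor mode, and the rule that ties resolve downward are all hypothetical.

```python
MAJOR_STEPS = (0, 2, 4, 5, 7, 9, 11)
MINOR_STEPS = (0, 2, 3, 5, 7, 8, 10)  # natural minor, an assumption


def snap_to_scale(pitch: int, tonic_pc: int, mode: str = "major") -> int:
    """Return the MIDI pitch moved to the nearest tone of the given scale.

    tonic_pc is the tonic's pitch class (0 = C, 7 = G, ...).
    Ties between a lower and an upper neighbour resolve downward (assumption).
    """
    steps = MAJOR_STEPS if mode == "major" else MINOR_STEPS
    scale = {(tonic_pc + s) % 12 for s in steps}
    for offset in (0, -1, 1):  # every pitch is at most 1 semitone from a scale tone
        if (pitch + offset) % 12 in scale:
            return pitch + offset
    return pitch


# G major (tonic pitch class 7): F4 (65) is out of scale and moves to E4 (64).
print([snap_to_scale(p, 7) for p in (65, 66, 60, 61)])  # [64, 66, 60, 60]
```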

Require longer minimum notes

bash
./hum2midi humming.wav --min-note 100
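
The effect of a minimum note length can be sketched as a simple duration filter. This is a hypothetical helper, not code from hum2midi. The 50 ms default is an assumption based on the limitation noted elsewhere in this document that notes shorter than 50 ms may be filtered out.

```python
def drop_short_notes(notes, min_note_ms: float = 50.0):
    """Keep only (start_sec, end_sec, pitch) notes lasting at least min_note_ms."""
    min_sec = min_note_ms / 1000.0
    return [n for n in notes if (n[1] - n[0]) >= min_sec]


notes = [(0.0, 0.03, 60), (0.1, 0.4, 62), (0.5, 0.58, 64)]
print(len(drop_short_notes(notes, 50)))   # 2
print(len(drop_short_notes(notes, 100)))  # 1
```

Raising the threshold removes more notes, so a higher value cleans up glitches at the cost of dropping genuinely short notes.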

Skip analysis for faster processing

bash
./hum2midi quick.wav --no-analysis

Combine options

bash
./hum2midi recording.wav output.mid --grid 1/8 --key-aware --min-note 80

Processing MIDI Input

You can also process existing MIDI files through the quantization pipeline:

bash
./hum2midi input.mid output.mid --grid 1/16 --key-aware

This skips the audio processing steps and goes directly to analysis and quantization.

Sample Output

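═══════════════════════════════════════════════════════════════
hum2midi - Melody-to-MIDI Pipeline (Basic Pitch Edition)
[Key-aware mode enabled]
═══════════════════════════════════════════════════════════════

Input:  my_humming.wav
Output: my_humming.mid

→ Step 1: Stem Separation (HPSS)
Separating melodic content...
Loaded: 5.23s @ 44100Hz
✓ Melody stem extracted → 5.23s

→ Step 2: Audio-to-MIDI Conversion (Basic Pitch)
Running Spotify's Basic Pitch ML model on the melody stem...
✓ Raw MIDI generated (Basic Pitch)

→ Step 3: Pitch Analysis & Key Detection
Notes detected: 42 total, 7 unique pitches
Note range: C3 - G4
Pitch classes: C3, E3, G3, A3, C4, D4, G4
Tonic: G3 (23.8% of notes)
Detected key: G major

→ Step 4: Quantization & Cleanup
Octave pruning: removed 3 harmonic notes above 67 (median+12)
Overlap pruning: removed 2 harmonic notes at overlapping positions
Note merging: merged 5 staccato fragments into legato notes (gap <= 60 ticks)
Grid: 240 ticks (1/16)
Notes: 38 notes
Key: G major
Key-aware: 2 notes corrected to scale
Tempo: 120 BPM
✓ Quantized MIDI saved

═══════════════════════════════════════════════════════════════
✓ Done! Output: my_humming.mid
═══════════════════════════════════════════════════════════════

📊 Analysis Summary
─────────────────────────────────────────────────────────────
Detected notes: C3, E3, G3, A3, C4, D4, G4
Detected key: G major
Quantization: key-aware mode (notes snapped to scale)

MIDI info: 38 notes, 7 unique pitches, 120 BPM
Pitches: C3, E3, G3, A3, C4, D4, G4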

Notes & Limitations

Audio Quality Matters

- A clear, loud melody produces the best results
- Background noise can cause false note detection
- Reverb and effects may confuse pitch detection
- Close-mic'd vocals work significantly better than room recordings

Musical Considerations

- Monophonic sources work best (single melody line)
- Polyphonic audio (chords, multiple instruments) will produce messy results
- Vibrato and pitch bends may be quantized to stepped pitches
- Rapid note passages may be missed or merged
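
The legato merging step listed under post-processing also explains why rapid repeated notes can end up merged. The following is a minimal sketch rather than the pipeline's own code: the function name and tuple layout are hypothetical, and the 60-tick gap threshold is taken from the sample output.

```python
def merge_legato(notes, max_gap: int = 60):
    """Merge consecutive same-pitch notes separated by at most max_gap ticks.

    notes is a list of (start_tick, end_tick, pitch) tuples for a single
    melody line (monophonic input is assumed).
    """
    merged = []
    for start, end, pitch in sorted(notes):
        if merged:
            p_start, p_end, p_pitch = merged[-1]
            if pitch == p_pitch and start - p_end <= max_gap:
                # Extend the previous note instead of starting a new one.
                merged[-1] = (p_start, max(p_end, end), pitch)
                continue
        merged.append((start, end, pitch))
    return merged


notes = [(0, 200, 60), (240, 480, 60), (480, 720, 62), (900, 1000, 62)]
print(merge_legato(notes))
# [(0, 480, 60), (480, 720, 62), (900, 1000, 62)]
```

In this example the two C4 fragments are 40 ticks apart and become one note, while the two D4 notes are 180 ticks apart and stay separate.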

Technical Limitations

- The output tempo is fixed at 120 BPM (note timing in seconds is preserved, but you may need to set the actual tempo in your DAW)
- Note velocities are normalized but may need manual adjustment
- Very short notes (<50ms) may be filtered out by default
- Extreme pitch ranges may cause octave detection issues
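
Because the file is written at a fixed 120 BPM, tick positions only line up with bars and beats if the performance really was at 120 BPM. The sketch below shows the arithmetic for re-expressing note positions at a different tempo while keeping their wall-clock times. The function names are hypothetical, and 960 ticks per quarter note is an assumption inferred from the sample output (240 ticks for a 1/16 grid).

```python
def ticks_to_seconds(ticks: int, bpm: float = 120.0, ppq: int = 960) -> float:
    """Convert a tick position to seconds at the given tempo."""
    return ticks * 60.0 / (bpm * ppq)


def seconds_to_ticks(seconds: float, bpm: float, ppq: int = 960) -> int:
    """Convert seconds to the nearest tick position at the given tempo."""
    return round(seconds * bpm * ppq / 60.0)


def retime(notes, new_bpm: float, old_bpm: float = 120.0, ppq: int = 960):
    """Re-express (start_tick, end_tick, pitch) notes at new_bpm.

    Wall-clock times are unchanged; only the tick positions move.
    """
    out = []
    for start, end, pitch in notes:
        out.append((
            seconds_to_ticks(ticks_to_seconds(start, old_bpm, ppq), new_bpm, ppq),
            seconds_to_ticks(ticks_to_seconds(end, old_bpm, ppq), new_bpm, ppq),
            pitch,
        ))
    return out


# A note on beat 2 at 120 BPM starts 0.5 s in; at 90 BPM that is tick 720.
print(retime([(960, 1920, 60)], new_bpm=90))  # [(720, 1440, 60)]
```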

Post-Processing Recommendations

After generating MIDI, you may want to:

1. Import into your DAW and adjust tempo to match your original recording
2. Quantize further if stricter timing is needed
3. Adjust note velocities for dynamics
4. Apply swing/groove templates if the rigid grid sounds too mechanical
5. Edit individual notes that were misdetected (common with fast runs)

Supported Audio Formats

Input formats supported via FFmpeg:

- WAV, AIFF, FLAC (uncompressed or lossless, best quality)
- MP3, M4A, AAC (lossy compression, acceptable)
- OGG, OPUS (open formats)
- Most other formats FFmpeg supports

Troubleshooting

No notes detected

- Check that the input file isn't silent or corrupted
- Try lowering the --min-note threshold so that short notes are not filtered out
- Verify the audio has clear melodic content (not just noise)

Too many notes / messy output

- Confirm that octave pruning and overlap pruning are enabled (both are on by default)
- Use --key-aware to constrain notes to the detected scale
- Check for background noise in the source audio

Wrong key detected

- Key detection works best with at least 8-10 measures of music
- Chromatic passages may confuse the detector
- Manually review and adjust in your DAW if needed
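
To see why short or chromatic input can mislead key detection, it helps to look at how profile-based detection works in general: the pitch-class histogram of the notes is correlated against the Krumhansl-Kessler major and minor profiles rotated to each of the 12 tonics. The sketch below illustrates that general technique with the published profile values. It is not the pipeline's own code, and the function names are hypothetical.

```python
from math import sqrt

# Krumhansl-Kessler profiles, indexed by semitones above the tonic.
KK_MAJOR = [6.35, 2.23, 3.48, 2.33, 4.38, 4.09, 2.52, 5.19, 2.39, 3.66, 2.29, 2.88]
KK_MINOR = [6.33, 2.68, 3.52, 5.38, 2.60, 3.53, 2.54, 4.75, 3.98, 2.69, 3.34, 3.17]
NAMES = ["C", "C#", "D", "D#", "E", "F", "F#", "G", "G#", "A", "A#", "B"]


def pearson(x, y):
    """Pearson correlation of two equal-length sequences (0.0 if degenerate)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sqrt(sum((a - mx) ** 2 for a in x))
    sy = sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy) if sx and sy else 0.0


def detect_key(pitches):
    """Return (tonic_name, mode) for a list of MIDI pitches."""
    hist = [0.0] * 12
    for p in pitches:
        hist[p % 12] += 1  # unweighted note counts (a simplification)
    best = None
    for tonic in range(12):
        for mode, profile in (("major", KK_MAJOR), ("minor", KK_MINOR)):
            rotated = [profile[(pc - tonic) % 12] for pc in range(12)]
            score = pearson(hist, rotated)
            if best is None or score > best[0]:
                best = (score, NAMES[tonic], mode)
    return best[1], best[2]


melody = [60, 62, 64, 65, 67, 69, 71, 72, 67, 64, 60]  # C major scale and triad
print(detect_key(melody))  # ('C', 'major')
```

With only a handful of notes the histogram is sparse, and a few chromatic notes can shift the best correlation to a neighbouring key, which is consistent with the advice above to provide several measures of music.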

Notes in wrong octave

- Basic Pitch sometimes detects harmonics instead of fundamentals
- The pipeline includes pruning, but some stray notes may slip through
- Use your DAW's transpose function for simple octave shifts
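
The octave pruning mentioned above can be sketched as dropping notes that sit more than an octave above the median pitch, matching the "median+12" cutoff reported in the sample output. This is an illustrative reconstruction with a hypothetical function name, not the pipeline's actual implementation.

```python
from statistics import median


def prune_octave_outliers(notes, max_above: int = 12):
    """Drop (start, end, pitch) notes more than max_above semitones above the median pitch."""
    if not notes:
        return []
    cutoff = median(pitch for _, _, pitch in notes) + max_above
    return [n for n in notes if n[2] <= cutoff]


# Median pitch is 55 (G3), so the cutoff is 67: G4 (67) stays, G5 (79) is removed.
pitches = [55, 55, 57, 52, 55, 79, 67]
notes = [(i * 240, i * 240 + 240, p) for i, p in enumerate(pitches)]
print([p for _, _, p in prune_octave_outliers(notes)])  # [55, 55, 57, 52, 55, 67]
```

A cutoff like this cannot tell a detected harmonic from a genuine high note, which is one reason a melody with a wide range may still need manual correction.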

References

- Basic Pitch - Spotify's polyphonic pitch detection model
- librosa HPSS - Harmonic-Percussive Source Separation
- Krumhansl-Kessler Key Profiles - Key detection algorithm

License

This skill integrates Basic Pitch by Spotify, which is licensed under Apache 2.0. The pipeline script and documentation are provided under MIT license.

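🎵 Voice Note to MIDI

Transform your voice memos, humming, and melodic recordings into clean, quantized MIDI files ready for your digital audio workstation.

What It Does

This skill provides a complete audio-to-MIDI conversion pipeline that combines:

1. Stem Separation - Uses HPSS (Harmonic-Percussive Source Separation) to isolate melodic content from drums, noise, and background sounds
2. ML-Powered Pitch Detection - Leverages Spotify's Basic Pitch model for accurate fundamental frequency extraction
3. Key Detection - Automatically detects the musical key of your recording using Krumhansl-Kessler key profiles
4. Intelligent Quantization - Snaps notes to a configurable timing grid with optional key-aware pitch correction
5. Post-Processing - Applies octave pruning, overlap-based harmonic removal, and legato note merging for clean output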

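Pipeline Architecture

Audio Input (WAV/M4A/MP3)
↓
┌─────────────────────────────────────┐
│ Step 1: Stem Separation (HPSS)      │
│ - Isolate harmonic content          │
│ - Remove drums/percussion           │
│ - Noise gating                      │
└─────────────────────────────────────┘
↓
┌─────────────────────────────────────┐
│ Step 2: Pitch Detection             │
│ - Basic Pitch ML model (Spotify)    │
│ - Polyphonic note detection         │
│ - Onset/offset estimation           │
└─────────────────────────────────────┘
↓
┌─────────────────────────────────────┐
│ Step 3: Analysis                    │
│ - Pitch class distribution          │
│ - Key detection                     │
│ - Tonic identification              │
└─────────────────────────────────────┘
↓
┌─────────────────────────────────────┐
│ Step 4: Quantization & Cleanup      │
│ - Timing grid alignment             │
│ - Key-aware pitch correction        │
│ - Octave pruning (harmonic removal) │
│ - Overlap-based pruning             │
│ - Note merging (legato)             │
│ - Velocity normalization            │
└─────────────────────────────────────┘
↓
MIDI Output (Standard MIDI File)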

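Setup

Prerequisites

- Python 3.11+ (Python 3.14+ recommended)
- FFmpeg (for audio format support)
- pip

Installation

Quick Install (Recommended):

bash
cd /path/to/voice-note-to-midi
./setup.sh

This automated script will:

- Check that Python 3.11+ is installed
- Create the ~/melody-pipeline directory
- Set up the virtual environment
- Install all dependencies (basic-pitch, librosa, music21, etc.)
- Download and configure the hum2midi script
- Add melody-pipeline to your PATH

Manual Install:

If you prefer manual setup:

bash
mkdir -p ~/melody-pipeline
cd ~/melody-pipeline
python3 -m venv venv-bp
source venv-bp/bin/activate
pip install basic-pitch librosa soundfile mido music21
# assumes the hum2midi script has already been placed in ~/melody-pipeline
chmod +x ~/melody-pipeline/hum2midi

Add to your PATH (optional):

bash
echo 'export PATH="$HOME/melody-pipeline:$PATH"' >> ~/.bashrc
source ~/.bashrc

Verify Installation

bash
cd ~/melody-pipeline
./hum2midi --help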

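Usage

Basic Usage

Convert a voice memo to MIDI:

bash
./hum2midi my_humming.wav

This creates my_humming.mid with 16th-note quantization.

Specify Output File

bash
./hum2midi input.wav output.mid

Command-Line Options

Option	Description	Default
--grid <value>	Quantization grid: 1/4, 1/8, 1/16, 1/32	1/16
--min-note <ms>

Usage Examples

Quantize to eighth notes

bash
./hum2midi melody.wav --grid 1/8

Key-aware quantization (recommended for tonal music)

bash
./hum2midi song.wav --key-aware

Require longer minimum notes

bash
./hum2midi humming.wav --min-note 100

Skip analysis for faster processing

bash
./hum2midi quick.wav --no-analysis

Combine options

bash
./hum2midi recording.wav output.mid --grid 1/8 --key-aware --min-note 80

Processing MIDI Input

You can also process existing MIDI files through the quantization pipeline:

bash
./hum2midi input.mid output.mid --grid 1/16 --key-aware

This skips the audio processing steps and goes directly to analysis and quantization.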

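Sample Output

═══════════════════════════════════════════════════════════════
hum2midi - Melody-to-MIDI Pipeline (Basic Pitch Edition)
[Key-aware mode enabled]
═══════════════════════════════════════════════════════════════

Input:  my_humming.wav
Output: my_humming.mid

→ Step 1: Stem Separation (HPSS)
Separating melodic content...
Loaded: 5.23s @ 44100Hz
✓ Melody stem extracted → 5.23s

→ Step 2: Audio-to-MIDI Conversion (Basic Pitch)
Running Spotify's Basic Pitch ML model on the melody stem...
✓ Raw MIDI generated (Basic Pitch)

→ Step 3: Pitch Analysis & Key Detection
Notes detected: 42 total, 7 unique pitches
Note range: C3 - G4
Pitch classes: C3, E3, G3, A3, C4, D4, G4
Tonic: G3 (23.8% of notes)
Detected key: G major

→ Step 4: Quantization & Cleanup
Octave pruning: removed 3 harmonic notes above 67 (median+12)
Overlap pruning: removed 2 harmonic notes at overlapping positions
Note merging: merged 5 staccato fragments into legato notes (gap <= 60 ticks)
Grid: 240 ticks (1/16)
Notes: 38 notes
Key: G major
Key-aware: 2 notes corrected to scale
Tempo: 120 BPM
✓ Quantized MIDI saved

═══════════════════════════════════════════════════════════════
✓ Done! Output: my_humming.mid
═══════════════════════════════════════════════════════════════

📊 Analysis Summary
─────────────────────────────────────────────────────────────
Detected notes: C3, E3, G3, A3, C4, D4, G4
Detected key: G major
Quantization: key-aware mode (notes snapped to scale)

MIDI info: 38 notes, 7 unique pitches, 120 BPM
Pitches: C3, E3, G3, A3, C4, D4, G4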

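Notes & Limitations

Audio Quality Matters

- A clear, loud melody produces the best results
- Background noise can cause false note detection
- Reverb and effects may confuse pitch detection
- Close-mic'd vocals work significantly better than room recordings

Musical Considerations

- Monophonic sources work best (single melody line)
- Polyphonic audio (chords, multiple instruments) will produce messy results
- Vibrato and pitch bends may be quantized to stepped pitches
- Rapid note passages may be missed or merged

Technical Limitations

- The output tempo is fixed at 120 BPM (note timing in seconds is preserved, but you may need to set the actual tempo in your DAW)
- Note velocities are normalized but may need manual adjustment
- Very short notes (<50ms) may be filtered out by default
- Extreme pitch ranges may cause octave detection issues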
