Lyrics Transcription Skill
Transcribe audio files to timestamped lyrics (LRC/SRT/JSON) via OpenAI Whisper or ElevenLabs Scribe API.
API Key Setup Guide
Before transcribing, you MUST check whether the user's API key is configured. Run the following command to check:
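```bash
cd "{project_root}/{.claude or .codex}/skills/acestep-lyrics-transcription/" && bash ./scripts/acestep-lyrics-transcription.sh config --check-key
```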
This command only reports whether the active provider's API key is set or empty — it does NOT print the actual key value. NEVER read or display the user's API key content. Do not use config --get on key fields or read config.json directly. The config --list command is safe — it automatically masks API keys as *** in output.
If the command reports the key is empty, you MUST stop and guide the user to configure it before proceeding. Do NOT attempt transcription without a valid key — it will fail.
Use AskUserQuestion to ask the user to provide their API key, with the following options and guidance:
1. Tell the user which provider is currently active (openai or elevenlabs) and that its API key is not configured. Explain that transcription cannot proceed without it.
2. Provide clear instructions on where to obtain a key:
   - OpenAI: Get an API key at https://platform.openai.com/api-keys. This requires an OpenAI account with billing enabled. The Whisper API costs ~$0.006/min.
   - ElevenLabs: Get an API key at https://elevenlabs.io/app/settings/api-keys. This requires an ElevenLabs account. The free tier includes limited credits.
3. Also offer the option to switch to the other provider if they already have a key for it.
4. Once the user provides the key, configure it using:
   ```bash
   cd "{project_root}/{.claude or .codex}/skills/acestep-lyrics-transcription/" && bash ./scripts/acestep-lyrics-transcription.sh config --set <provider>.api_key <KEY>
   ```
5. If the user wants to switch providers, also run:
   ```bash
   cd "{project_root}/{.claude or .codex}/skills/acestep-lyrics-transcription/" && bash ./scripts/acestep-lyrics-transcription.sh config --set provider <provider_name>
   ```
6. After configuring, re-run config --check-key to verify the key is set before proceeding.
If the API key is already configured, proceed directly to transcription without asking.
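The gating step described above can be sketched in shell. This is an illustration only: the document does not show what config --check-key prints, so the sketch assumes (hypothetically) that the output contains the word "empty" when no key is set. The helper name key_is_configured is also invented for this example.

```bash
#!/usr/bin/env bash
# Hypothetical helper: decide from the text printed by `config --check-key`
# whether transcription may proceed.
# ASSUMPTION: the output contains the word "empty" when no key is set.
# The real wording is not shown in this document, so adjust the pattern.
key_is_configured() {
  local status="$1"
  case "$status" in
    "") return 1 ;;            # no output at all: treat as not configured
    *[Ee]mpty*) return 1 ;;    # assumed marker for a missing key
    *) return 0 ;;
  esac
}

# Usage sketch (run from the skill directory):
#   status="$(bash ./scripts/acestep-lyrics-transcription.sh config --check-key)"
#   if key_is_configured "$status"; then
#     echo "key set, proceeding to transcription"
#   else
#     echo "key missing, ask the user to configure it first"
#   fi
```

The helper only inspects the status text, so the key value itself is never read or displayed.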
Quick Start
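```bash
# 1. cd to this skill's directory
cd "{project_root}/{.claude or .codex}/skills/acestep-lyrics-transcription/"

# 2. Configure the API key (choose one)
./scripts/acestep-lyrics-transcription.sh config --set openai.api_key sk-...
# or
./scripts/acestep-lyrics-transcription.sh config --set elevenlabs.api_key ...
./scripts/acestep-lyrics-transcription.sh config --set provider elevenlabs

# 3. Transcribe
./scripts/acestep-lyrics-transcription.sh transcribe --audio /path/to/song.mp3 --language zh

# 4. Output is saved to: {project_root}/acestep_output/.lrc
```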
Prerequisites
- curl, jq, python3 (or python)
- An API key for OpenAI or ElevenLabs
Script Usage
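```bash
./scripts/acestep-lyrics-transcription.sh transcribe --audio [options]

Options:
  -a, --audio      Audio file path (required)
  -l, --language   Language code (zh, en, ja, etc.)
  -f, --format     Output format: lrc, srt, json (default: lrc)
  -p, --provider   API provider: openai, elevenlabs (overrides config)
  -o, --output     Output file path (default: acestep_output/.lrc)
```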
Post-Transcription Lyrics Correction (MANDATORY)
CRITICAL: After transcription, you MUST manually correct the LRC file before using it for MV rendering. Transcription models frequently produce errors on sung lyrics:
- Proper nouns: "ACE-Step" → "AC step", "Spotify" → "spot a fly"
- Similar-sounding words: "arrives" → "eyes", "open source" → "open sores"
- Merged/split words: "lighting up" → "lightin' nup"
Correction Workflow
1. Read the transcribed LRC file using the Read tool
2. Read the original lyrics from the ACE-Step output JSON file
3. Use the original lyrics as a whole reference: Do NOT attempt line-by-line alignment, because transcription often splits, merges, or reorders lines differently from the original. Instead, read the original lyrics in full to understand the correct wording, then scan each LRC line and fix any misrecognized words based on what the original lyrics say.
4. Fix transcription errors: Replace misrecognized words with the correct original words, keeping the timestamps intact
5. Write the corrected LRC back using the Write tool
What to Correct
- Replace misrecognized words with their correct original versions
- Keep all [MM:SS.cc] timestamps exactly as-is (timestamps from transcription are accurate)
- Do NOT add structure tags like [Verse] or [Chorus]; the LRC should only have timestamped text lines
Example
Transcribed (wrong):
```
[00:46.96]AC step alive,
[00:50.80]one point five eyes.
```
Original lyrics reference:
```
ACE-Step alive
One point five arrives
```
Corrected (right):
```
[00:46.96]ACE-Step alive,
[00:50.80]One point five arrives.
```
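The correction rules in this section (timestamps untouched, only timestamped text lines) can be spot-checked mechanically after editing. The following is a minimal sketch, not part of the skill's script: the function names lrc_timestamps and lrc_correction_ok are hypothetical, and the check assumes every lyric line starts with a two-digit [MM:SS.cc] timestamp as in the example above.

```bash
#!/usr/bin/env bash
# Print the leading [MM:SS.cc] timestamp of every line in an LRC file.
lrc_timestamps() {
  grep -oE '^\[[0-9]{2}:[0-9]{2}\.[0-9]{2}\]' "$1"
}

# Succeed only if:
#   1. both files carry the same timestamps in the same order, and
#   2. the corrected file has no non-blank line without a timestamp
#      (this catches added structure tags such as [Verse] or [Chorus]).
lrc_correction_ok() {
  local transcribed="$1" corrected="$2"
  [ "$(lrc_timestamps "$transcribed")" = "$(lrc_timestamps "$corrected")" ] || return 1
  if grep -vE '^\[[0-9]{2}:[0-9]{2}\.[0-9]{2}\]' "$corrected" | grep -q '[^[:space:]]'; then
    return 1
  fi
  return 0
}
```

For instance, `lrc_correction_ok transcribed.lrc corrected.lrc && echo "timestamps preserved"` (file names are placeholders) confirms that only the words changed. It does not judge whether the words themselves are right; that still requires reading against the original lyrics.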
Configuration
Config file: scripts/config.json
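```bash
# Switch provider
./scripts/acestep-lyrics-transcription.sh config --set provider openai
./scripts/acestep-lyrics-transcription.sh config --set provider elevenlabs

# Set API keys
./scripts/acestep-lyrics-transcription.sh config --set openai.api_key sk-...
./scripts/acestep-lyrics-transcription.sh config --set elevenlabs.api_key ...

# View config
./scripts/acestep-lyrics-transcription.sh config --list
```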
| Option | Default | Description |
|---|---|---|
| provider | openai | Active provider: openai or elevenlabs |
| output_format | lrc | Default output: lrc, srt, or json |
| openai.api_key | "" | OpenAI API key |
| openai.api_url | https://api.openai.com/v1 | OpenAI API base URL |
| openai.model | whisper-1 | OpenAI model (whisper-1 for word timestamps) |
| elevenlabs.api_key | "" | ElevenLabs API key |
| elevenlabs.api_url | https://api.elevenlabs.io/v1 | ElevenLabs API base URL |
| elevenlabs.model | scribe_v2 | ElevenLabs model |
Provider Notes
| Provider | Model | Word Timestamps | Pricing |
|---|---|---|---|
| OpenAI | whisper-1 | Yes (segment + word) | $0.006/min |
| ElevenLabs | scribe_v2 | Yes (word-level) | Varies by plan |

- OpenAI whisper-1 is the only OpenAI model supporting word-level timestamps
- ElevenLabs scribe_v2 returns word-level timestamps with type filtering
- Both support multilingual transcription
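To give a feel for the OpenAI pricing above, the cost of one song can be estimated from its length. This is a rough sketch using only the $0.006/min figure quoted in this document; the function name estimate_openai_cost is hypothetical, and it assumes cost scales linearly with duration, which may not match how billing actually rounds.

```bash
#!/usr/bin/env bash
# Estimate the OpenAI transcription cost in USD for a clip of the given
# length in seconds, using the $0.006/min rate quoted in this document.
# ASSUMPTION: linear pricing with no rounding to whole seconds or minutes.
estimate_openai_cost() {
  awk -v secs="$1" 'BEGIN { printf "%.4f\n", secs / 60 * 0.006 }'
}

estimate_openai_cost 210   # 3 min 30 s, prints 0.0210
```

At this rate a typical three to four minute song costs roughly two cents to transcribe.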
Examples
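```bash
# Basic transcription (uses config defaults)
./scripts/acestep-lyrics-transcription.sh transcribe --audio song.mp3

# Chinese song to LRC
./scripts/acestep-lyrics-transcription.sh transcribe --audio song.mp3 --language zh

# Use ElevenLabs, output SRT
./scripts/acestep-lyrics-transcription.sh transcribe --audio song.mp3 --provider elevenlabs --format srt

# Custom output path
./scripts/acestep-lyrics-transcription.sh transcribe --audio song.mp3 --output ./my_lyrics.lrc
```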
Lyrics Transcription Skill
Transcribe audio files to timestamped lyrics (LRC/SRT/JSON) via the OpenAI Whisper or ElevenLabs Scribe API.
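API Key Setup Guide
Before transcribing, you MUST check whether the user's API key is configured. Run the following command to check:
```bash
cd "{project_root}/{.claude or .codex}/skills/acestep-lyrics-transcription/" && bash ./scripts/acestep-lyrics-transcription.sh config --check-key
```
This command only reports whether the active provider's API key is set or empty; it does NOT print the actual key value. NEVER read or display the user's API key content. Do not use config --get on key fields or read config.json directly. The config --list command is safe: it automatically masks API keys as *** in output.
If the command reports the key is empty, you MUST stop and guide the user to configure it before proceeding. Attempting transcription without a valid key will fail.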
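Use AskUserQuestion to ask the user to provide their API key, with the following options and guidance:
1. Tell the user which provider is currently active (openai or elevenlabs) and that its API key is not configured. Explain that transcription cannot proceed without a key.
2. Provide clear instructions on where to obtain a key:
   - OpenAI: Get an API key at https://platform.openai.com/api-keys. This requires an OpenAI account with billing enabled. The Whisper API costs about $0.006/min.
   - ElevenLabs: Get an API key at https://elevenlabs.io/app/settings/api-keys. This requires an ElevenLabs account. The free tier includes limited credits.
3. Also offer the option to switch to the other provider if the user already has a key for it.
4. Once the user provides the key, configure it using:
   ```bash
   cd "{project_root}/{.claude or .codex}/skills/acestep-lyrics-transcription/" && bash ./scripts/acestep-lyrics-transcription.sh config --set <provider>.api_key <KEY>
   ```
5. If the user wants to switch providers, also run:
   ```bash
   cd "{project_root}/{.claude or .codex}/skills/acestep-lyrics-transcription/" && bash ./scripts/acestep-lyrics-transcription.sh config --set provider <provider_name>
   ```
6. After configuring, re-run config --check-key to verify the key is set before proceeding.

If the API key is already configured, proceed directly to transcription without asking.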
Quick Start
```bash
# 1. cd to this skill's directory
cd "{project_root}/{.claude or .codex}/skills/acestep-lyrics-transcription/"

# 2. Configure the API key (choose one)
./scripts/acestep-lyrics-transcription.sh config --set openai.api_key sk-...
# or
./scripts/acestep-lyrics-transcription.sh config --set elevenlabs.api_key ...
./scripts/acestep-lyrics-transcription.sh config --set provider elevenlabs

# 3. Transcribe
./scripts/acestep-lyrics-transcription.sh transcribe --audio /path/to/song.mp3 --language zh

# 4. Output is saved to: {project_root}/acestep_output/.lrc
```
Prerequisites
- curl, jq, python3 (or python)
- An API key for OpenAI or ElevenLabs
Script Usage
```bash
./scripts/acestep-lyrics-transcription.sh transcribe --audio [options]

Options:
  -a, --audio      Audio file path (required)
  -l, --language   Language code (zh, en, ja, etc.)
  -f, --format     Output format: lrc, srt, json (default: lrc)
  -p, --provider   API provider: openai, elevenlabs (overrides config)
  -o, --output     Output file path (default: acestep_output/.lrc)
```
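Post-Transcription Lyrics Correction (MANDATORY)
CRITICAL: After transcription, you MUST manually correct the LRC file before using it for MV rendering. Transcription models frequently make errors on sung lyrics:
- Proper nouns: "ACE-Step" → "AC step", "Spotify" → "spot a fly"
- Similar-sounding words: "arrives" → "eyes", "open source" → "open sores"
- Merged/split words: "lighting up" → "lightin nup"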
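Correction Workflow
1. Read the transcribed LRC file using the Read tool
2. Read the original lyrics from the ACE-Step output JSON file
3. Use the original lyrics as a whole reference: Do not attempt line-by-line alignment, because transcription often splits, merges, or reorders lines differently from the original. Instead, read the original lyrics in full to understand the correct wording, then scan each LRC line and fix any misrecognized words based on what the original lyrics say.
4. Fix transcription errors: Replace misrecognized words with the correct original words, keeping the timestamps intact
5. Write the corrected LRC back using the Write tool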
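What to Correct
- Replace misrecognized words with their correct original versions
- Keep all [MM:SS.cc] timestamps exactly as-is (timestamps from transcription are accurate)
- Do not add structure tags like [Verse] or [Chorus]; the LRC should only have timestamped text lines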
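When reviewing a corrected file against the audio, it can help to read a [MM:SS.cc] timestamp as plain seconds (minutes, seconds, and hundredths of a second). The helper below is a small sketch for that purpose. The name lrc_to_seconds is hypothetical and not part of the skill's script.

```bash
#!/usr/bin/env bash
# Convert one LRC timestamp such as "[00:46.96]" to seconds.
lrc_to_seconds() {
  local ts="$1"
  ts="${ts#\[}"   # drop the leading "["
  ts="${ts%\]}"   # drop the trailing "]"
  # Split "MM:SS.cc" on ":" and add minutes * 60 to the seconds part.
  awk -v ts="$ts" 'BEGIN { split(ts, p, ":"); printf "%.2f\n", p[1] * 60 + p[2] }'
}

lrc_to_seconds "[00:46.96]"   # prints 46.96
lrc_to_seconds "[01:05.50]"   # prints 65.50
```

The conversion is read-only: it is meant for inspection, and the timestamps in the LRC file itself should still be left untouched.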
Example
Transcribed (wrong):
```
[00:46.96]AC step alive,
[00:50.80]one point five eyes.
```
Original lyrics reference:
```
ACE-Step alive
One point five arrives
```
Corrected (right):
```
[00:46.96]ACE-Step alive,
[00:50.80]One point five arrives.
```
Configuration
Config file: scripts/config.json
```bash
# Switch provider
./scripts/acestep-lyrics-transcription.sh config --set provider openai
./scripts/acestep-lyrics-transcription.sh config --set provider elevenlabs

# Set API keys
./scripts/acestep-lyrics-transcription.sh config --set openai.api_key sk-...
./scripts/acestep-lyrics-transcription.sh config --set elevenlabs.api_key ...

# View config
./scripts/acestep-lyrics-transcription.sh config --list
```
| Option | Default | Description |
|---|---|---|
| provider | openai | Active provider: openai or elevenlabs |
| output_format | lrc | Default output format: lrc, srt, or json |
| openai.api_key | "" | OpenAI API key |
| openai.api_url | https://api.openai.com/v1 | OpenAI API base URL |
| openai.model | whisper-1 | OpenAI model (whisper-1 supports word-level timestamps) |
| elevenlabs.api_key | "" | ElevenLabs API key |
| elevenlabs.api_url | https://api.elevenlabs.io/v1 | ElevenLabs API base URL |
| elevenlabs.model | scribe_v2 | ElevenLabs model |
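Provider Notes
| Provider | Model | Word Timestamps | Pricing |
|---|---|---|---|
| OpenAI | whisper-1 | Yes (segment + word) | $0.006/min |
| ElevenLabs | scribe_v2 | Yes (word-level) | Varies by plan |

- OpenAI whisper-1 is the only OpenAI model that supports word-level timestamps
- ElevenLabs scribe_v2 returns word-level timestamps with type filtering
- Both support multilingual transcription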
Examples
```bash
# Basic transcription (uses config defaults)
./scripts/acestep-lyrics-transcription.sh transcribe --audio song.mp3

# Chinese song to LRC
./scripts/acestep-lyrics-transcription.sh transcribe --audio song.mp3 --language zh

# Use ElevenLabs, output SRT
./scripts/acestep-lyrics-transcription.sh transcribe --audio song.mp3 --provider elevenlabs --format srt

# Custom output path
./scripts/acestep-lyrics-transcription.sh transcribe --audio song.mp3 --output ./my_lyrics.lrc
```