Lyrics Transcription Skill

Transcribe audio files to timestamped lyrics (LRC/SRT/JSON) via OpenAI Whisper or ElevenLabs Scribe API.

API Key Setup Guide

Before transcribing, you MUST check whether the user's API key is configured. Run the following command to check:

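   cd "{project_root}/{.claude or .codex}/skills/acestep-lyrics-transcription/" && bash ./scripts/acestep-lyrics-transcription.sh config --check-key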

This command only reports whether the active provider's API key is set or empty — it does NOT print the actual key value. NEVER read or display the user's API key content. Do not use config --get on key fields or read config.json directly. The config --list command is safe — it automatically masks API keys as *** in output.

If the command reports the key is empty, you MUST stop and guide the user to configure it before proceeding. Do NOT attempt transcription without a valid key — it will fail.

Use AskUserQuestion to ask the user to provide their API key, with the following options and guidance:

1. Tell the user which provider is currently active (openai or elevenlabs) and that its API key is not configured. Explain that transcription cannot proceed without it.
2. Provide clear instructions on where to obtain a key:

   - OpenAI: Get an API key at https://platform.openai.com/api-keys (requires an OpenAI account with billing enabled). The Whisper API costs ~$0.006/min.
   - ElevenLabs: Get an API key at https://elevenlabs.io/app/settings/api-keys (requires an ElevenLabs account). Free tier includes limited credits.

3. Also offer the option to switch to the other provider if they already have a key for it.
4. Once the user provides the key, configure it using:

   cd "{project_root}/{.claude or .codex}/skills/acestep-lyrics-transcription/" && bash ./scripts/acestep-lyrics-transcription.sh config --set <provider>.api_key <KEY>

5. If the user wants to switch providers, also run:

   cd "{project_root}/{.claude or .codex}/skills/acestep-lyrics-transcription/" && bash ./scripts/acestep-lyrics-transcription.sh config --set provider <provider_name>

6. After configuring, re-run config --check-key to verify the key is set before proceeding.

If the API key is already configured, proceed directly to transcription without asking.

Quick Start

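   # 1. cd to this skill's directory
   cd {project_root}/{.claude or .codex}/skills/acestep-lyrics-transcription/

   # 2. Configure the API key (choose one)
   ./scripts/acestep-lyrics-transcription.sh config --set openai.api_key sk-...
   # or
   ./scripts/acestep-lyrics-transcription.sh config --set elevenlabs.api_key ...
   ./scripts/acestep-lyrics-transcription.sh config --set provider elevenlabs

   # 3. Transcribe
   ./scripts/acestep-lyrics-transcription.sh transcribe --audio /path/to/song.mp3 --language zh

   # 4. Output saved to: {project_root}/acestep_output/.lrc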

Prerequisites

- curl, jq, python3 (or python)
- An API key for OpenAI or ElevenLabs
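
The tool requirements above can be checked before the first run. The following is a minimal sketch, not part of the skill's script; the function names `missing_tools` and `have_python` are hypothetical helpers introduced here for illustration.

```shell
#!/usr/bin/env bash
# Print the names of any listed tools that are not on PATH, space-separated.
# Prints nothing when every tool is available.
missing_tools() {
  local tool missing=""
  for tool in "$@"; do
    command -v "$tool" >/dev/null 2>&1 || missing="${missing:+$missing }$tool"
  done
  printf '%s' "$missing"
}

# Either python3 or python satisfies the Python requirement.
have_python() {
  command -v python3 >/dev/null 2>&1 || command -v python >/dev/null 2>&1
}

missing="$(missing_tools curl jq)"
have_python || missing="${missing:+$missing }python3"

if [ -n "$missing" ]; then
  echo "Missing prerequisites: $missing" >&2
else
  echo "All prerequisites found."
fi
```

The check only confirms that the tools are on PATH; it does not verify versions or the API key, which is covered by `config --check-key`.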

Script Usage

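   ./scripts/acestep-lyrics-transcription.sh transcribe --audio /path/to/song.mp3 [options]

   Options:
     -a, --audio      Audio file path (required)
     -l, --language   Language code (zh, en, ja, etc.)
     -f, --format     Output format: lrc, srt, json (default: lrc)
     -p, --provider   API provider: openai, elevenlabs (overrides config)
     -o, --output     Output file path (default: acestep_output/.lrc)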

Post-Transcription Lyrics Correction (MANDATORY)

CRITICAL: After transcription, you MUST manually correct the LRC file before using it for MV rendering. Transcription models frequently produce errors on sung lyrics:

- Proper nouns: "ACE-Step" → "AC step", "Spotify" → "spot a fly"
- Similar-sounding words: "arrives" → "eyes", "open source" → "open sores"
- Merged/split words: "lighting up" → "lightin' nup"

Correction Workflow

1. Read the transcribed LRC file using the Read tool.
2. Read the original lyrics from the ACE-Step output JSON file.
3. Use the original lyrics as a whole reference: do NOT attempt line-by-line alignment, because transcription often splits, merges, or reorders lines differently from the original. Instead, read the original lyrics in full to understand the correct wording, then scan each LRC line and fix any misrecognized words based on what the original lyrics say.
4. Fix transcription errors: replace misrecognized words with the correct original words, keeping the timestamps intact.
5. Write the corrected LRC back using the Write tool.

What to Correct

- Replace misrecognized words with their correct original versions
- Keep all [MM:SS.cc] timestamps exactly as-is (timestamps from transcription are accurate)
- Do NOT add structure tags like [Verse] or [Chorus]; the LRC should only have timestamped text lines
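
After a correction pass, the two rules about timestamps and structure tags can be checked mechanically. The sketch below is an illustration only and is not part of the skill's script; the function names are hypothetical, and it assumes every lyric line starts with a two-digit `[MM:SS.cc]` timestamp as described above.

```shell
# Extract the leading [MM:SS.cc] timestamp of every LRC line, in file order.
lrc_timestamps() {
  grep -oE '^\[[0-9]{2}:[0-9]{2}\.[0-9]{2}\]' "$1"
}

# Succeed only if both files carry the same timestamps in the same order.
same_timestamps() {
  [ "$(lrc_timestamps "$1")" = "$(lrc_timestamps "$2")" ]
}

# Succeed only if no bracketed tag starting with a letter is present,
# e.g. [Verse] or [Chorus]. Note: this also flags LRC metadata tags
# such as [ar:...], which this sketch treats as unwanted as well.
no_structure_tags() {
  ! grep -qE '\[[A-Za-z][^]]*\]' "$1"
}

# Usage (file names are placeholders):
#   same_timestamps transcribed.lrc corrected.lrc && no_structure_tags corrected.lrc
```

Keeping a copy of the transcribed file before writing the corrected version is what makes the comparison possible.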

Example

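Transcribed (wrong):

   [00:46.96]AC step alive,
   [00:50.80]one point five eyes.

Original lyrics reference:

   ACE-Step alive
   One point five arrives

Corrected (right):

   [00:46.96]ACE-Step alive,
   [00:50.80]One point five arrives.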
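
The workflow in this section is a manual edit with the Read and Write tools. For misrecognitions that recur across many lines, a text substitution that never touches the bracketed prefix gives the same result. The following is a sketch under that assumption; `fix_lrc` is a hypothetical helper, and the two substitutions are the ones from this section's example ("AC step" and "one point five eyes").

```shell
# Apply word-level fixes to LRC text read from stdin.
# The patterns match lyric text only, never the [MM:SS.cc] prefix,
# so timestamps pass through unchanged.
fix_lrc() {
  sed -e 's/AC step/ACE-Step/g' \
      -e 's/one point five eyes/One point five arrives/g'
}

printf '%s\n' '[00:46.96]AC step alive,' '[00:50.80]one point five eyes.' | fix_lrc
# prints:
# [00:46.96]ACE-Step alive,
# [00:50.80]One point five arrives.
```

The substitution list has to be written per song from the original lyrics; it does not replace reading the whole LRC file for one-off errors.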

Configuration

Config file: scripts/config.json

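   # Switch provider
   ./scripts/acestep-lyrics-transcription.sh config --set provider openai
   ./scripts/acestep-lyrics-transcription.sh config --set provider elevenlabs

   # Set API keys
   ./scripts/acestep-lyrics-transcription.sh config --set openai.api_key sk-...
   ./scripts/acestep-lyrics-transcription.sh config --set elevenlabs.api_key ...

   # View config
   ./scripts/acestep-lyrics-transcription.sh config --list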

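| Option | Default | Description |
| --- | --- | --- |
| provider | openai | Active provider: openai or elevenlabs |
| output_format | lrc | Default output format: lrc, srt, or json |
| openai.api_key | | OpenAI API key |
| openai.api_url | https://api.openai.com/v1 | OpenAI API base URL |
| openai.model | whisper-1 | OpenAI model (whisper-1 supports word-level timestamps) |
| elevenlabs.api_key | | ElevenLabs API key |
| elevenlabs.api_url | https://api.elevenlabs.io/v1 | ElevenLabs API base URL |
| elevenlabs.model | scribe_v2 | ElevenLabs model |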

Provider Notes

| Provider | Model | Word Timestamps | Pricing |
| --- | --- | --- | --- |
| OpenAI | whisper-1 | Yes (segment + word) | $0.006/min |
| ElevenLabs | scribe_v2 | Yes (word-level) | Varies by plan |

- OpenAI whisper-1 is the only OpenAI model supporting word-level timestamps
- ElevenLabs scribe_v2 returns word-level timestamps with type filtering
- Both support multilingual transcription
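
As a rough worked example of the OpenAI price quoted above, the cost of one song is its duration in minutes times $0.006. The sketch below assumes simple proportional pricing with no rounding rules; `estimate_cost` is a hypothetical helper, not part of the skill's script.

```shell
# Estimate the OpenAI transcription cost in USD for a duration in seconds,
# using the $0.006/min figure from the table (proportional pricing assumed).
estimate_cost() {
  awk -v secs="$1" 'BEGIN { printf "%.4f\n", secs / 60 * 0.006 }'
}

estimate_cost 210   # a 3.5-minute song; prints 0.0210
```

ElevenLabs pricing varies by plan, so no comparable figure can be derived from this document.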

Examples

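   # Basic transcription (uses config defaults)
   ./scripts/acestep-lyrics-transcription.sh transcribe --audio song.mp3

   # Chinese song to LRC
   ./scripts/acestep-lyrics-transcription.sh transcribe --audio song.mp3 --language zh

   # Use ElevenLabs, output SRT
   ./scripts/acestep-lyrics-transcription.sh transcribe --audio song.mp3 --provider elevenlabs --format srt

   # Custom output path
   ./scripts/acestep-lyrics-transcription.sh transcribe --audio song.mp3 --output ./my_lyrics.lrc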
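
To transcribe several files in one go, the documented `transcribe` flags can be wrapped in a loop. This is a minimal sketch, not a feature of the script itself: `transcribe_all` is a hypothetical helper, and writing each LRC next to its audio file is a choice made here for illustration.

```shell
# Transcribe each audio file to an LRC file placed next to it.
# $1 is the path to acestep-lyrics-transcription.sh; the rest are audio files.
# Stops at the first failed transcription.
transcribe_all() {
  local script="$1" f
  shift
  for f in "$@"; do
    # ${f%.*} strips the last extension: song.mp3 -> song
    bash "$script" transcribe --audio "$f" --format lrc --output "${f%.*}.lrc" || return 1
  done
}

# Usage (paths are placeholders):
#   transcribe_all ./scripts/acestep-lyrics-transcription.sh a.mp3 b.wav
```

Each resulting LRC file still needs the mandatory correction pass described earlier.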

Lyrics Transcription Skill

Transcribe audio files to timestamped lyrics (LRC/SRT/JSON) via the OpenAI Whisper or ElevenLabs Scribe API.

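API Key Setup Guide

Before transcribing, you MUST check whether the user's API key is configured. Run the following command to check:

   cd "{project_root}/{.claude or .codex}/skills/acestep-lyrics-transcription/" && bash ./scripts/acestep-lyrics-transcription.sh config --check-key

This command only reports whether the active provider's API key is set or empty; it does NOT print the actual key value. NEVER read or display the user's API key content. Do not use config --get on key fields or read config.json directly. The config --list command is safe: it automatically masks API keys as *** in output.

If the command reports the key is empty, you MUST stop and guide the user to configure it before proceeding. Attempting transcription without a valid key will fail.

Use AskUserQuestion to ask the user to provide their API key, with the following options and guidance:

1. Tell the user which provider is currently active (openai or elevenlabs) and that its API key is not configured. Explain that transcription cannot proceed without it.
2. Provide clear instructions on where to obtain a key:

   - OpenAI: Get an API key at https://platform.openai.com/api-keys (requires an OpenAI account with billing enabled). The Whisper API costs ~$0.006/min.
   - ElevenLabs: Get an API key at https://elevenlabs.io/app/settings/api-keys (requires an ElevenLabs account). Free tier includes limited credits.

3. Also offer the option to switch to the other provider if they already have a key for it.
4. Once the user provides the key, configure it using:

   cd "{project_root}/{.claude or .codex}/skills/acestep-lyrics-transcription/" && bash ./scripts/acestep-lyrics-transcription.sh config --set <provider>.api_key <KEY>

5. If the user wants to switch providers, also run:

   cd "{project_root}/{.claude or .codex}/skills/acestep-lyrics-transcription/" && bash ./scripts/acestep-lyrics-transcription.sh config --set provider <provider_name>

6. After configuring, re-run config --check-key to verify the key is set before proceeding.

If the API key is already configured, proceed directly to transcription without asking.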

Quick Start

   # 1. cd to this skill's directory
   cd {project_root}/{.claude or .codex}/skills/acestep-lyrics-transcription/

   # 2. Configure the API key (choose one)
   ./scripts/acestep-lyrics-transcription.sh config --set openai.api_key sk-...
   # or
   ./scripts/acestep-lyrics-transcription.sh config --set elevenlabs.api_key ...
   ./scripts/acestep-lyrics-transcription.sh config --set provider elevenlabs

   # 3. Transcribe
   ./scripts/acestep-lyrics-transcription.sh transcribe --audio /path/to/song.mp3 --language zh

   # 4. Output saved to: {project_root}/acestep_output/.lrc

Prerequisites

- curl, jq, python3 (or python)
- An API key for OpenAI or ElevenLabs

Script Usage

   ./scripts/acestep-lyrics-transcription.sh transcribe --audio /path/to/song.mp3 [options]

   Options:
     -a, --audio      Audio file path (required)
     -l, --language   Language code (zh, en, ja, etc.)
     -f, --format     Output format: lrc, srt, json (default: lrc)
     -p, --provider   API provider: openai, elevenlabs (overrides config)
     -o, --output     Output file path (default: acestep_output/.lrc)

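Post-Transcription Lyrics Correction (MANDATORY)

CRITICAL: After transcription, you MUST manually correct the LRC file before using it for MV rendering. Transcription models frequently produce errors on sung lyrics:

- Proper nouns: "ACE-Step" → "AC step", "Spotify" → "spot a fly"
- Similar-sounding words: "arrives" → "eyes", "open source" → "open sores"
- Merged/split words: "lighting up" → "lightin' nup"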

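Correction Workflow

1. Read the transcribed LRC file using the Read tool.
2. Read the original lyrics from the ACE-Step output JSON file.
3. Use the original lyrics as a whole reference: do NOT attempt line-by-line alignment, because transcription often splits, merges, or reorders lines differently from the original. Instead, read the original lyrics in full to understand the correct wording, then scan each LRC line and fix any misrecognized words based on what the original lyrics say.
4. Fix transcription errors: replace misrecognized words with the correct original words, keeping the timestamps intact.
5. Write the corrected LRC back using the Write tool.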

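What to Correct

- Replace misrecognized words with their correct original versions
- Keep all [MM:SS.cc] timestamps exactly as-is (timestamps from transcription are accurate)
- Do NOT add structure tags like [Verse] or [Chorus]; the LRC should only have timestamped text lines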

Example

Transcribed (wrong):

   [00:46.96]AC step alive,
   [00:50.80]one point five eyes.

Original lyrics reference:

   ACE-Step alive
   One point five arrives

Corrected (right):

   [00:46.96]ACE-Step alive,
   [00:50.80]One point five arrives.

Configuration

Config file: scripts/config.json

   # Switch provider
   ./scripts/acestep-lyrics-transcription.sh config --set provider openai
   ./scripts/acestep-lyrics-transcription.sh config --set provider elevenlabs

   # Set API keys
   ./scripts/acestep-lyrics-transcription.sh config --set openai.api_key sk-...
   ./scripts/acestep-lyrics-transcription.sh config --set elevenlabs.api_key ...

   # View config
   ./scripts/acestep-lyrics-transcription.sh config --list

| Option | Default | Description |
| --- | --- | --- |
| provider | openai | Active provider: openai or elevenlabs |
| output_format | lrc | Default output format: lrc, srt, or json |
| openai.api_key | | OpenAI API key |
| openai.api_url | https://api.openai.com/v1 | OpenAI API base URL |
| openai.model | whisper-1 | OpenAI model (whisper-1 supports word-level timestamps) |
| elevenlabs.api_key | | ElevenLabs API key |
| elevenlabs.api_url | https://api.elevenlabs.io/v1 | ElevenLabs API base URL |
| elevenlabs.model | scribe_v2 | ElevenLabs model |

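Provider Notes

| Provider | Model | Word Timestamps | Pricing |
| --- | --- | --- | --- |
| OpenAI | whisper-1 | Yes (segment + word) | $0.006/min |
| ElevenLabs | scribe_v2 | Yes (word-level) | Varies by plan |

- OpenAI whisper-1 is the only OpenAI model supporting word-level timestamps
- ElevenLabs scribe_v2 returns word-level timestamps with type filtering
- Both support multilingual transcription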

Examples

   # Basic transcription (uses config defaults)
   ./scripts/acestep-lyrics-transcription.sh transcribe --audio song.mp3

   # Chinese song to LRC
   ./scripts/acestep-lyrics-transcription.sh transcribe --audio song.mp3 --language zh

   # Use ElevenLabs, output SRT
   ./scripts/acestep-lyrics-transcription.sh transcribe --audio song.mp3 --provider elevenlabs --format srt

   # Custom output path
   ./scripts/acestep-lyrics-transcription.sh transcribe --audio song.mp3 --output ./my_lyrics.lrc
