Whisper STT Skill

Free, local speech-to-text using OpenAI Whisper.

Prerequisites

Install dependencies (one-time setup):

    pip install openai-whisper torch
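One way to confirm the setup worked is to check that both modules can be found by Python. The following is a minimal sketch rather than part of the skill; the helper name `missing_modules` is hypothetical.

```python
import importlib.util

# Module names provided by the openai-whisper and torch packages.
REQUIRED_MODULES = ("whisper", "torch")


def missing_modules(names=REQUIRED_MODULES):
    """Return the module names that cannot be found on the current Python path."""
    return [name for name in names if importlib.util.find_spec(name) is None]


if __name__ == "__main__":
    missing = missing_modules()
    if missing:
        print("Missing modules:", ", ".join(missing))
    else:
        print("All required modules are importable.")
```

If anything is reported missing, rerunning the install command in the same Python environment is the usual fix.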

Optional: Install ffmpeg for broader format support:

- macOS: brew install ffmpeg
- Ubuntu: sudo apt install ffmpeg
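To see whether ffmpeg is already available before installing it, a short check against the system PATH is enough. This is a sketch using only the standard library; `find_ffmpeg` is a hypothetical helper name.

```python
import shutil


def find_ffmpeg():
    """Return the full path of the ffmpeg executable, or None if it is not on PATH."""
    return shutil.which("ffmpeg")


if __name__ == "__main__":
    path = find_ffmpeg()
    print(f"ffmpeg found at {path}" if path else "ffmpeg not found on PATH")
```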

Usage

Transcribe an audio file

    python ~/.openclaw/skills/whisper-stt/scripts/transcribe.py <audio_file>
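For readers who want to call Whisper from their own Python code instead of the script, the sketch below uses the openai-whisper package directly. It is an illustration under assumptions and not necessarily how transcribe.py is implemented; the function name `transcribe_file` is hypothetical.

```python
import sys


def transcribe_file(audio_path, model_name="base", language=None):
    """Transcribe one audio file and return Whisper's result dictionary."""
    import whisper  # imported lazily so this module loads even without the package

    model = whisper.load_model(model_name)
    # language=None lets Whisper detect the spoken language itself.
    return model.transcribe(audio_path, language=language)


if __name__ == "__main__" and len(sys.argv) > 1:
    result = transcribe_file(sys.argv[1])
    print(result["text"])
```

The returned dictionary carries the full text under `"text"` and the timed pieces under `"segments"`.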

Options

| Option | Description |
| --- | --- |
| --model | Model size: tiny, base, small, medium, large, large-v3-turbo (default: base) |
| --language, -l | Language code: zh, en, ja, etc. (auto-detected if not specified) |
| --output, -o | Output format: json, txt, srt, vtt (default: json) |
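The options above map naturally onto an `argparse` definition. The sketch below shows one way such a command line could be declared; it is an assumption for illustration, since the actual source of transcribe.py is not shown here.

```python
import argparse

OUTPUT_FORMATS = ["json", "txt", "srt", "vtt"]


def build_parser():
    """Build a parser with the documented options and defaults (hypothetical layout)."""
    parser = argparse.ArgumentParser(description="Transcribe an audio file with Whisper.")
    parser.add_argument("audio_file", help="path to the audio or video file")
    # The model name is left unrestricted because the examples also use large-v3.
    parser.add_argument("--model", default="base", help="model size (default: base)")
    parser.add_argument("--language", "-l", default=None,
                        help="language code such as zh, en, ja (default: auto-detect)")
    parser.add_argument("--output", "-o", choices=OUTPUT_FORMATS, default="json",
                        help="output format (default: json)")
    return parser


if __name__ == "__main__":
    # Parse a fixed example instead of sys.argv so the sketch runs on its own.
    args = build_parser().parse_args(["recording.m4a", "-l", "zh", "-o", "txt"])
    print(args)
```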

Examples

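Chinese audio to text:

    python ~/.openclaw/skills/whisper-stt/scripts/transcribe.py recording.m4a --language zh --output txt

Generate subtitles (SRT):

    python ~/.openclaw/skills/whisper-stt/scripts/transcribe.py video.mp4 --output srt > subtitles.srt

Use a faster model:

    python ~/.openclaw/skills/whisper-stt/scripts/transcribe.py audio.mp3 --model tiny --output txt

High accuracy (slower):

    python ~/.openclaw/skills/whisper-stt/scripts/transcribe.py audio.mp3 --model large-v3 --output txt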
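When the same command has to run from another program, it can be assembled as an argument list and passed to `subprocess`. This is a minimal sketch assuming the script lives at the path used in this document and writes its result to standard output; `build_command` and `transcribe_to_file` are hypothetical helper names.

```python
import subprocess
import sys
from pathlib import Path

# Script location as used in this document's examples.
SCRIPT = Path.home() / ".openclaw/skills/whisper-stt/scripts/transcribe.py"


def build_command(audio_file, model=None, language=None, output=None):
    """Return the argument list for one transcription run."""
    cmd = [sys.executable, str(SCRIPT), str(audio_file)]
    if model:
        cmd += ["--model", model]
    if language:
        cmd += ["--language", language]
    if output:
        cmd += ["--output", output]
    return cmd


def transcribe_to_file(audio_file, target, **options):
    """Run the script and write its standard output to target."""
    with open(target, "w", encoding="utf-8") as handle:
        subprocess.run(build_command(audio_file, **options), stdout=handle, check=True)


if __name__ == "__main__":
    # Only print the command; nothing is executed here.
    print(build_command("video.mp4", output="srt"))
```

Calling `transcribe_to_file("video.mp4", "subtitles.srt", output="srt")` would then mirror the shell redirection used for subtitles.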

Model Selection Guide

| Model | Speed | Accuracy | VRAM/RAM | Best For |
| --- | --- | --- | --- | --- |
| tiny | ~32x | Basic | ~1GB | Quick tests, low resource |
| base | ~16x | Good | ~1GB | Balanced speed/accuracy |
| small | ~6x | Better | ~2GB | Better accuracy |
| medium | ~2x | Very Good | ~5GB | High accuracy |
| large | 1x | Excellent | ~10GB | Best quality |
| large-v3-turbo | ~8x | Excellent | ~6GB | Fast + accurate (recommended) |
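The memory column of the table above can drive a simple selection rule: take the most accurate model that still fits. The sketch below encodes the table's approximate figures; the ordering and the helper name `pick_model` are assumptions made for illustration.

```python
# (model name, approximate memory in GB), ordered from least to most accurate
# according to the table; large-v3-turbo is placed before large.
MODEL_MEMORY_GB = [
    ("tiny", 1),
    ("base", 1),
    ("small", 2),
    ("medium", 5),
    ("large-v3-turbo", 6),
    ("large", 10),
]


def pick_model(available_gb):
    """Return the last model in table order that fits in available_gb, or None."""
    fitting = [name for name, needed in MODEL_MEMORY_GB if needed <= available_gb]
    return fitting[-1] if fitting else None


if __name__ == "__main__":
    for gb in (1, 4, 6, 16):
        print(gb, "GB ->", pick_model(gb))
```

This rule ignores speed, so a machine with plenty of memory would still get `large` even though the table recommends `large-v3-turbo` for most uses.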

Troubleshooting

"ModuleNotFoundError: No module named 'whisper'"
→ Run: pip install openai-whisper torch

"ffmpeg not found"
→ Install ffmpeg or convert audio to WAV format first

Slow transcription
→ Use a smaller model (tiny/base) or make sure a GPU is available (Apple Silicon MPS, NVIDIA CUDA)
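To find out which accelerator PyTorch can actually see, a check like the following may help. It is a sketch that falls back to "cpu" when torch is not installed; `pick_device` is a hypothetical helper name, and whether the skill's script selects devices this way is not stated in this document.

```python
def pick_device():
    """Return "cuda", "mps", or "cpu" depending on what PyTorch reports."""
    try:
        import torch
    except ImportError:
        return "cpu"  # without torch there is nothing to accelerate
    if torch.cuda.is_available():
        return "cuda"
    # Older torch builds have no MPS backend, so look it up defensively.
    mps = getattr(torch.backends, "mps", None)
    if mps is not None and mps.is_available():
        return "mps"
    return "cpu"


if __name__ == "__main__":
    print("Detected device:", pick_device())
```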

Poor accuracy on Chinese
→ Pass --language zh explicitly and consider a larger model (medium/large)

Output Formats

- json: Full result with segments, timestamps, and metadata
- txt: Plain text transcription only
- srt: SubRip subtitle format with timing
- vtt: WebVTT subtitle format for web players
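To make the relationship between the json and srt formats concrete, the sketch below turns a list of timed segments into SubRip text. It assumes each segment is a dictionary with `start`, `end` (seconds), and `text`, as in Whisper's result; the function names are hypothetical and the script's own formatter may differ.

```python
def srt_timestamp(seconds):
    """Format a time in seconds as HH:MM:SS,mmm for SubRip."""
    total_ms = int(round(seconds * 1000))
    hours, rest = divmod(total_ms, 3_600_000)
    minutes, rest = divmod(rest, 60_000)
    secs, millis = divmod(rest, 1000)
    return f"{hours:02d}:{minutes:02d}:{secs:02d},{millis:03d}"


def segments_to_srt(segments):
    """Render segments as numbered SubRip cues separated by blank lines."""
    blocks = []
    for index, seg in enumerate(segments, start=1):
        timing = f"{srt_timestamp(seg['start'])} --> {srt_timestamp(seg['end'])}"
        blocks.append(f"{index}\n{timing}\n{seg['text'].strip()}\n")
    return "\n".join(blocks)


if __name__ == "__main__":
    demo = [{"start": 0.0, "end": 1.5, "text": " Hello"}]
    print(segments_to_srt(demo))
```

WebVTT uses nearly the same cue layout, with a `WEBVTT` header line and a period instead of a comma before the milliseconds.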

Credits

Powered by OpenAI Whisper, an open-source speech recognition system.

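Whisper STT Skill

Free, local speech-to-text using OpenAI Whisper.

Prerequisites

Install dependencies (one-time setup):

    pip install openai-whisper torch

Optional: Install ffmpeg for broader format support:

- macOS: brew install ffmpeg
- Ubuntu: sudo apt install ffmpeg

Usage

Transcribe an audio file

    python ~/.openclaw/skills/whisper-stt/scripts/transcribe.py <audio_file>

Options

| Option | Description |
| --- | --- |
| --model | Model size: tiny, base, small, medium, large, large-v3-turbo (default: base) |
| --language, -l | Language code: zh, en, ja, etc. (auto-detected if not specified) |
| --output, -o | Output format: json, txt, srt, vtt (default: json) |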

Examples

Chinese audio to text:

    python ~/.openclaw/skills/whisper-stt/scripts/transcribe.py recording.m4a --language zh --output txt

Generate subtitles (SRT):

    python ~/.openclaw/skills/whisper-stt/scripts/transcribe.py video.mp4 --output srt > subtitles.srt

Use a faster model:

    python ~/.openclaw/skills/whisper-stt/scripts/transcribe.py audio.mp3 --model tiny --output txt

High accuracy (slower):

    python ~/.openclaw/skills/whisper-stt/scripts/transcribe.py audio.mp3 --model large-v3 --output txt

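Model Selection Guide

| Model | Speed | Accuracy | VRAM/RAM | Best For |
| --- | --- | --- | --- | --- |
| tiny | ~32x | Basic | ~1GB | Quick tests, low-resource environments |
| base | ~16x | Good | ~1GB | Balanced speed/accuracy |
| small | ~6x | Better | ~2GB | Better accuracy |
| medium | ~2x | Very Good | ~5GB | High accuracy |
| large | 1x | Excellent | ~10GB | Best quality |
| large-v3-turbo | ~8x | Excellent | ~6GB | Fast and accurate (recommended) |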

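Troubleshooting

"ModuleNotFoundError: No module named 'whisper'"
→ Run: pip install openai-whisper torch

"ffmpeg not found"
→ Install ffmpeg or convert the audio to WAV format first

Slow transcription
→ Use a smaller model (tiny/base) or make sure a GPU is available (Apple Silicon MPS, NVIDIA CUDA)

Poor accuracy on Chinese
→ Pass --language zh explicitly and consider a larger model (medium/large)

Output Formats

- json: Full result with segments, timestamps, and metadata
- txt: Plain text transcription only
- srt: SubRip subtitle format with timing
- vtt: WebVTT subtitle format for web players

Credits

Powered by OpenAI Whisper, an open-source speech recognition system.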
