macOS Local Voice

Fully local speech-to-text (STT) and text-to-speech (TTS) on macOS. No API keys, no network, no cloud. All processing happens on-device.

Requirements

- macOS (Apple Silicon recommended, Intel works too)
- yap CLI in PATH; install via brew install finnvoor/tools/yap
- ffmpeg in PATH (optional, needed for ogg/opus output); install via brew install ffmpeg
- say and osascript are macOS built-ins

Speech-to-Text (STT)

Transcribe an audio file to text using Apple's on-device speech recognition.

```bash
node {baseDir}/scripts/stt.mjs <audio_file> [locale]
```

- audio_file: path to the audio file (ogg, m4a, mp3, wav, etc.)
- locale: optional, e.g. zh_CN, en_US, ja_JP. If omitted, the system default is used.
- Prints the transcribed text to stdout.
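If you drive the skill from Node rather than a shell, a minimal STT call might look like the sketch below. It assumes {baseDir} resolves to the skill folder containing scripts/stt.mjs, and that the script prints only the transcript on stdout. The helper names sttArgs and transcribe are hypothetical.

```javascript
// Sketch: invoke scripts/stt.mjs from Node and return the transcript.
// Assumes baseDir is the skill folder and stt.mjs prints only the text on stdout.

// Build argv for: node <baseDir>/scripts/stt.mjs <audio_file> [locale]
function sttArgs(baseDir, audioFile, locale) {
  const args = [`${baseDir}/scripts/stt.mjs`, audioFile];
  if (locale) args.push(locale); // omitted: stt.mjs falls back to the system default locale
  return args;
}

// Spawn node with those args (no shell involved, so paths need no quoting).
async function transcribe(baseDir, audioFile, locale) {
  // Dynamic imports keep the sketch usable from both ESM and CommonJS files.
  const { execFile } = await import("node:child_process");
  const { promisify } = await import("node:util");
  const { stdout } = await promisify(execFile)("node", sttArgs(baseDir, audioFile, locale));
  return stdout.trim();
}
```

For example, await transcribe("/path/to/skill", "note.ogg", "zh_CN") would run the script with the Chinese locale; both paths are placeholders. Because execFile hands each argument over directly, file names containing spaces work without extra quoting.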

Supported STT locales

Use node {baseDir}/scripts/stt.mjs --locales to list all supported locales.

Key locales: en_US, en_GB, zh_CN, zh_TW, zh_HK, ja_JP, ko_KR, fr_FR, de_DE, es_ES, pt_BR, ru_RU, vi_VN, th_TH.

Language detection tips

- If the user's recent messages are in Chinese → use zh_CN
- If they are in English → use en_US
- If they are mixed or unclear → omit the locale (system default)
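You can approximate the locale tips above by counting Han and Latin letters in the user's recent messages. This sketch rests on two assumptions: the messages are joined into one string, and 0.8 is the threshold for "mostly one script". That threshold is an arbitrary choice, not part of the skill.

```javascript
// Heuristic locale picker for stt.mjs, following the tips above.
// Assumption: recent messages are joined into one string; the threshold is arbitrary.
function pickLocale(recentText, threshold = 0.8) {
  // Count Han (Chinese characters) and Latin letters; digits, spaces and punctuation are ignored.
  const han = (recentText.match(/\p{Script=Han}/gu) ?? []).length;
  const latin = (recentText.match(/\p{Script=Latin}/gu) ?? []).length;
  const total = han + latin;
  if (total === 0) return undefined;              // nothing to go on
  if (han / total >= threshold) return "zh_CN";   // mostly Chinese
  if (latin / total >= threshold) return "en_US"; // mostly English
  return undefined;                               // mixed: omit the locale
}
```

This heuristic cannot tell Japanese kanji from Chinese characters, so detecting ja_JP would need an extra check, such as looking for kana.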

Text-to-Speech (TTS)

Convert text to an audio file using macOS native TTS.

```bash
node {baseDir}/scripts/tts.mjs <text> [voice] [output_path]
```

- text: the text to speak
- voice: optional, e.g. Yue (Premium), Tingting, Ava (Premium). If omitted, the best available voice for the text's language is selected automatically.
- output_path: optional, defaults to a timestamped file in ~/.openclaw/media/outbound/
- Prints the generated audio file path to stdout.
- If ffmpeg is available, the output is ogg/opus (well suited to messaging platforms); otherwise it is AIFF.
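Here is a comparable TTS sketch. It again assumes {baseDir} is the skill folder and that tts.mjs prints only the output path. Because the arguments are positional, it also assumes an output path can only be given together with a voice. The helper names and the extension-based format check (.ogg/.opus vs .aiff) are assumptions, not documented behavior.

```javascript
// Sketch: call scripts/tts.mjs from Node and report the format of the file it wrote.
// Assumptions: stdout holds only the output path; the file extension reflects the format.

// Build argv for: node <baseDir>/scripts/tts.mjs <text> [voice] [output_path]
function ttsArgs(baseDir, text, voice, outputPath) {
  const args = [`${baseDir}/scripts/tts.mjs`, text];
  if (voice) args.push(voice);
  if (outputPath) {
    // Positional args: assume output_path cannot be given without a voice.
    if (!voice) throw new Error("pass a voice when specifying output_path");
    args.push(outputPath);
  }
  return args;
}

// ogg/opus when ffmpeg was available, AIFF otherwise (judged by extension only).
function audioFormat(path) {
  const ext = path.toLowerCase().split(".").pop();
  if (ext === "ogg" || ext === "opus") return "ogg/opus";
  if (ext === "aiff" || ext === "aif") return "aiff";
  return "unknown";
}

async function speak(baseDir, text, voice, outputPath) {
  const { execFile } = await import("node:child_process");
  const { promisify } = await import("node:util");
  const { stdout } = await promisify(execFile)("node", ttsArgs(baseDir, text, voice, outputPath));
  const path = stdout.trim(); // the generated file path
  return { path, format: audioFormat(path) };
}
```

Passing the voice as its own argv entry keeps names such as Ava (Premium) intact without any shell quoting.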

Sending as voice note

After generating the audio file, send it using the message tool:

```
message action=send media=<path from tts.mjs> asVoice=true
```

Voice Management

List available voices, check readiness, or find the best voice for a language:

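```bash
node {baseDir}/scripts/voices.mjs list [locale]    # list voices, optionally filtered by locale
node {baseDir}/scripts/voices.mjs check <name>     # check whether a voice is downloaded and ready
node {baseDir}/scripts/voices.mjs best <locale>    # get the best-quality voice for a locale
```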
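If you script the voice commands, a small helper can enforce the argument rules shown above before anything is spawned: list takes an optional locale, check requires a name, and best requires a locale. The helper is hypothetical. You would pass its result to execFile("node", args). How each subcommand formats its output is not described here, so parsing is left out.

```javascript
// Hypothetical helper: build argv for scripts/voices.mjs, enforcing the usage shown above.
function voicesArgs(baseDir, subcommand, arg) {
  const script = `${baseDir}/scripts/voices.mjs`;
  switch (subcommand) {
    case "list": // optional locale filter
      return arg ? [script, "list", arg] : [script, "list"];
    case "check": // required voice name
    case "best":  // required locale
      if (!arg) throw new Error(`voices.mjs ${subcommand} needs an argument`);
      return [script, subcommand, arg];
    default:
      throw new Error(`unknown voices.mjs subcommand: ${subcommand}`);
  }
}
```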

Quality levels

- 1 = compact (low quality, always available)
- 2 = enhanced (mid quality, may need download)
- 3 = premium (highest quality, needs download from System Settings)
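Picking the "best" voice amounts to keeping the usable voices for a locale and taking the highest quality level. The sketch below illustrates that idea only. Its voice record shape, { name, locale, quality, installed }, is hypothetical and is not the actual output of voices.mjs best.

```javascript
// Quality levels as listed above.
const QUALITY_NAMES = { 1: "compact", 2: "enhanced", 3: "premium" };

// Sketch: best usable voice for a locale = highest quality among installed voices.
// The record shape { name, locale, quality, installed } is hypothetical.
function bestVoice(voices, locale) {
  let best;
  for (const v of voices) {
    if (v.locale !== locale || !v.installed) continue; // wrong locale or not downloaded
    if (!best || v.quality > best.quality) best = v;
  }
  return best; // undefined when nothing usable is installed for the locale
}

function describeVoice(v) {
  return `${v.name} (${QUALITY_NAMES[v.quality] ?? "unknown"})`;
}
```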

If a voice is not available

Tell the user: "Voice X is not downloaded. Go to System Settings → Accessibility → Spoken Content → System Voice → Manage Voices to download it."

Notes

- The say command silently falls back to a default voice if the requested voice is not available (exit code 0, no error). Always use voices.mjs check before calling tts.mjs with a specific voice name.
- Premium voices (e.g. Yue (Premium), Ava (Premium)) sound significantly better but must be manually downloaded by the user.
- Siri voices are not accessible via the speech synthesis API.
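Because say falls back silently, the check has to happen before the voice name reaches tts.mjs. Below is a minimal sketch of that guard. isVoiceReady is a hypothetical function the caller supplies, for example one wrapping voices.mjs check <name>. It is left to the caller because the exact output of the check command is not specified here.

```javascript
// The user-facing message from "If a voice is not available", with the name filled in.
function missingVoiceMessage(name) {
  return `Voice ${name} is not downloaded. Go to System Settings → Accessibility → Spoken Content → System Voice → Manage Voices to download it.`;
}

// Guard against say's silent fallback: a named voice reaches tts.mjs only if it is ready.
// isVoiceReady is a hypothetical async callback supplied by the caller,
// e.g. a wrapper around `voices.mjs check <name>`.
async function resolveVoice(name, isVoiceReady) {
  if (!name) return { voice: undefined }; // no name: let tts.mjs auto-select
  if (await isVoiceReady(name)) return { voice: name };
  return { voice: undefined, userMessage: missingVoiceMessage(name) };
}
```

When userMessage is set, the caller can relay it to the user and then either stop or let tts.mjs auto-select a voice.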
