Telegram voice-to-voice (macOS Apple Silicon only)

This is an OpenClaw skill.

Requirements

- macOS on Apple Silicon.
- yap CLI available in PATH (Speech.framework transcription).
  - Project: https://github.com/finnvoor/yap (by finnvoor)
- ffmpeg available in PATH.

Compatibility note (important)

This skill is macOS-only (uses say + Speech.framework). The skill registry cannot enforce OS restrictions, so installing/running it on Linux/Windows will result in runtime failures.
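
Because the registry cannot enforce the OS restriction, a caller may want to check the platform and the required tools itself before doing any work. The following is a minimal sketch of such a preflight check in Python. The function name `missing_requirements` and its injectable parameters are hypothetical and not part of the skill; the tool list (yap, ffmpeg, say) is taken from the requirements and the note above.

```python
import platform
import shutil

# Tools named in the requirements and the compatibility note.
REQUIRED_TOOLS = ("yap", "ffmpeg", "say")


def missing_requirements(system=None, machine=None, which=shutil.which):
    """Return a list of human-readable problems; an empty list means OK.

    `system`, `machine` and `which` are injectable so the check can be
    exercised on any machine (hypothetical design, not the skill's own).
    """
    system = system if system is not None else platform.system()
    machine = machine if machine is not None else platform.machine()
    problems = []
    if system != "Darwin":
        problems.append(f"unsupported OS: {system} (macOS required)")
    elif machine != "arm64":
        # Note: a Python interpreter running under Rosetta reports x86_64
        # even on Apple Silicon hardware.
        problems.append(f"unsupported CPU: {machine} (Apple Silicon required)")
    for tool in REQUIRED_TOOLS:
        if which(tool) is None:
            problems.append(f"{tool} not found in PATH")
    return problems
```

Calling `missing_requirements()` with no arguments inspects the current machine; reporting the returned problems and stopping early gives a clearer failure than a crash inside a helper script.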

Persistent reply mode (voice vs text)

Store a small per-user preference file in the workspace:

- State file: voice_state/telegram.json
- Key: Telegram sender user id (string)
- Values:
  - "voice" (default): reply with a Telegram voice note
  - "text": reply with a single text message

If the file does not exist or the sender id is missing: assume "voice".
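
The state handling described above can be sketched as follows. This is an illustration, not the skill's own code: it assumes the state file (voice_state/telegram.json in the workspace) is a flat JSON object mapping sender id strings to "voice" or "text". The function names `load_mode` and `save_mode` are hypothetical, and treating an unreadable or malformed file like a missing one is an assumption.

```python
import json
from pathlib import Path

DEFAULT_MODE = "voice"
VALID_MODES = ("voice", "text")


def _read_state(state_file):
    """Read the JSON object; a missing or malformed file yields {} (assumption)."""
    try:
        state = json.loads(Path(state_file).read_text(encoding="utf-8"))
    except (FileNotFoundError, json.JSONDecodeError):
        return {}
    return state if isinstance(state, dict) else {}


def load_mode(state_file, sender_id):
    """Return the stored reply mode for a sender, defaulting to "voice"."""
    mode = _read_state(state_file).get(str(sender_id))
    return mode if mode in VALID_MODES else DEFAULT_MODE


def save_mode(state_file, sender_id, mode):
    """Persist the reply mode for one sender, keeping other senders' entries."""
    if mode not in VALID_MODES:
        raise ValueError(f"unknown mode: {mode!r}")
    state = _read_state(state_file)
    state[str(sender_id)] = mode
    path = Path(state_file)
    path.parent.mkdir(parents=True, exist_ok=True)
    path.write_text(json.dumps(state, indent=2), encoding="utf-8")
```

Sender ids are converted with `str()` on both paths, so a numeric Telegram id and its string form resolve to the same key.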

Toggle commands

If an inbound text message is exactly:

- /audio off → set state to "text" and confirm with a short text reply.
- /audio on → set state to "voice" and confirm with a short text reply.
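
As an illustration of the exact-match rule, the toggle commands can be modelled as a small lookup table. The name `parse_toggle` is hypothetical. The sketch deliberately does no trimming or case folding, since the rule above says the message must match exactly.

```python
# Exact inbound text -> new reply mode.
TOGGLE_COMMANDS = {
    "/audio off": "text",
    "/audio on": "voice",
}


def parse_toggle(text):
    """Return the new mode for an exact toggle command, or None otherwise."""
    return TOGGLE_COMMANDS.get(text)
```

A `None` result means the message is ordinary input and should go through the normal reply flow.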

Getting the inbound audio (.ogg)

Telegram voice notes often show up as <media:audio> in message text.
OpenClaw saves the attachment to disk (typically .ogg) under:

- ~/.openclaw/media/inbound/

Recommended approach:

1) If the inbound message context includes an attachment path, use it.
2) Otherwise, take the most recent *.ogg from ~/.openclaw/media/inbound/.
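
The two-step lookup above can be sketched as follows. The function name `resolve_inbound_ogg` is hypothetical. Falling back to the newest file when the given attachment path does not exist on disk is an assumption, as is using the file modification time to decide which file is "most recent".

```python
from pathlib import Path

# Inbound directory named in this document.
INBOUND_DIR = Path.home() / ".openclaw" / "media" / "inbound"


def resolve_inbound_ogg(attachment_path=None, inbound_dir=INBOUND_DIR):
    """Return the audio file to transcribe, or None if nothing is found."""
    # 1) Prefer the path from the inbound message context, if it exists.
    if attachment_path:
        candidate = Path(attachment_path).expanduser()
        if candidate.is_file():
            return candidate
    # 2) Otherwise pick the most recently modified *.ogg in the inbound dir.
    oggs = [p for p in Path(inbound_dir).glob("*.ogg") if p.is_file()]
    if not oggs:
        return None
    return max(oggs, key=lambda p: p.stat().st_mtime)
```

The fallback is only a heuristic: if several chats deliver voice notes at about the same time, the newest file may belong to a different sender, which is why the attachment path is tried first.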

Transcription

Default locale: macOS system locale.

Optional env:

- YAP_LOCALE — override the transcription locale (e.g. it-IT, en-US).

Preferred:

- yap transcribe --locale ${YAP_LOCALE:-}
  - If YAP_LOCALE is not set, the helper script uses the macOS system locale (from defaults read -g AppleLocale).

If transcription fails or is empty: ask the user to repeat or send text.

Helper script:

- scripts/transcribetelegramogg.sh [path.ogg]
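
For illustration, the locale resolution and the yap call could look like the sketch below. It is not the helper script: the function names are hypothetical, the conversion of an AppleLocale value such as en_US into en-US is an assumption about what yap expects, and so is reading the transcript from standard output. Whether the audio must first be converted with ffmpeg is not stated here, so the sketch passes the path through unchanged.

```python
import os
import subprocess


def normalize_locale(apple_locale):
    """Turn an AppleLocale value such as "en_US@currency=EUR" into "en-US"."""
    base = apple_locale.strip().split("@", 1)[0]
    return base.replace("_", "-")


def resolve_locale(env=None):
    """Use YAP_LOCALE if set, otherwise the macOS system locale."""
    env = os.environ if env is None else env
    override = env.get("YAP_LOCALE", "").strip()
    if override:
        return override
    result = subprocess.run(
        ["defaults", "read", "-g", "AppleLocale"],
        capture_output=True, text=True, check=True,
    )
    return normalize_locale(result.stdout)


def yap_command(audio_path, locale):
    """Build the yap invocation as an argument list (no shell quoting needed)."""
    return ["yap", "transcribe", "--locale", locale, str(audio_path)]


def transcribe(audio_path, env=None):
    """Return the transcript, or None when transcription fails or is empty."""
    try:
        result = subprocess.run(
            yap_command(audio_path, resolve_locale(env)),
            capture_output=True, text=True, check=True,
        )
    except (OSError, subprocess.CalledProcessError):
        return None
    return result.stdout.strip() or None
```

A `None` result corresponds to the failure case above, where the user is asked to repeat the message or send text instead.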

Reply behavior

Mode: voice (default)

Voice default: SYSTEM (uses the current macOS system voice). You can override by passing a specific voice name to the helper script.

1) Generate the reply text.
2) Convert reply text to an OGG/Opus voice note using:

- scripts/ttstelegramvoice.sh <reply text> [SYSTEM|voice name]

The script prints the generated .ogg path to stdout.

3) Send the .ogg back to Telegram as a voice note (not a generic audio file):

- Use the message tool with asVoice: true and media: pointing at the generated .ogg.
- Optionally set replyTo to thread the response.

Notes:

- Use SYSTEM to rely on the current macOS system voice (recommended).
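
The voice-mode steps above can be sketched as follows. This is an assumption about how such a helper might work, not a description of the actual script: it synthesizes an AIFF file with say, reading the text from standard input, then encodes it to OGG/Opus with ffmpeg, which requires an ffmpeg build that includes libopus. The function names, the 32k bitrate, the mono downmix, and the exact shape of the message tool arguments are all hypothetical; only the asVoice, media, and replyTo names come from this document.

```python
import subprocess
import tempfile
from pathlib import Path


def say_command(aiff_path, voice="SYSTEM"):
    """Build the `say` call; "SYSTEM" means no -v flag (current system voice)."""
    cmd = ["say"]
    if voice != "SYSTEM":
        cmd += ["-v", voice]
    # "-f -" reads the text from stdin, so text starting with "-" is safe.
    return cmd + ["-o", str(aiff_path), "-f", "-"]


def ffmpeg_command(aiff_path, ogg_path):
    """Build the ffmpeg call that encodes the AIFF file as mono OGG/Opus."""
    return [
        "ffmpeg", "-y", "-loglevel", "error",
        "-i", str(aiff_path),
        "-c:a", "libopus", "-b:a", "32k", "-ac", "1",
        str(ogg_path),
    ]


def synthesize_voice_note(text, voice="SYSTEM", out_dir=None):
    """Run say and ffmpeg, returning the path of the generated .ogg file."""
    out_dir = Path(out_dir) if out_dir else Path(tempfile.mkdtemp(prefix="tts-"))
    aiff_path = out_dir / "reply.aiff"
    ogg_path = out_dir / "reply.ogg"
    subprocess.run(say_command(aiff_path, voice), input=text, text=True, check=True)
    subprocess.run(ffmpeg_command(aiff_path, ogg_path), check=True)
    aiff_path.unlink()  # the intermediate AIFF is no longer needed
    return ogg_path


def voice_message_args(ogg_path, reply_to=None):
    """Arguments for the message tool (hypothetical shape)."""
    args = {"asVoice": True, "media": str(ogg_path)}
    if reply_to is not None:
        args["replyTo"] = reply_to
    return args
```

Keeping the command builders separate from the function that runs them makes it possible to inspect the exact arguments on any machine, even one without say or ffmpeg.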

Mode: text

Reply with a single text message:

- Transcript: <...>
- Reply: <...>
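
As a small illustration, the text-mode message could be assembled as below. The function name `format_text_reply` and the exact labels ("Transcript:" and "Reply:") are assumptions; the only requirement stated here is that both parts go out in a single message.

```python
def format_text_reply(transcript, reply):
    """Combine the transcript and the reply into one text message."""
    return f"Transcript: {transcript.strip()}\nReply: {reply.strip()}"
```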
