AudioClaw Skills Voice Reply

When to use

Use this skill when AudioClaw already has the final reply text and now needs a voice version that can change on demand.

Common triggers:

- A chat bot should answer in different tones such as calm, warm, cheerful, serious, or promo.
- The caller wants to switch voice_id or voice_family in a single request without editing code.
- The same AudioClaw workflow should support both free voices and paid or custom voices.
- The runtime should keep working even when a requested voice is not available to the current key.
- A user has already said "以后一直给我发语音" ("from now on, always send me voice messages"), and AudioClaw should remember that preference for future turns.
- AudioClaw needs a workspace-local file plus a stable way to deliver it as a Feishu audio message.
- The caller already has a cloned AudioClaw voice_id such as vc-... and wants the runtime to use it directly without falling back first.

Do not use this skill for ASR intake or long-form digest generation.

Workflow

1. Start from the final user-facing text. Do not pass hidden reasoning or raw markdown tables.
2. Build an AudioClaw voice request with:

- text
- optional scene
- optional voice_id
- optional voice_family
- optional emotion
- optional speed, pitch, volume

3. Run scripts/openclaw_voice_switchboard.py.

- For AudioClaw on Feishu or Lark, prefer scripts/picoclaw_voice_reply.py.

4. Let the script resolve the request in this order:

- exact voice_id
- voice_family plus matching emotion variant
- scene plus emotion preset
- validated fallback voice
- Important: if the exact voice_id is a clone-style id such as vc-..., this skill now tries that id directly first, even if it is not part of the built-in official voice catalog.

5. If preference_key is present, the script can remember:

- reply_mode
- default voice_id
- default emotion
- default scene

6. If the result should be sent through AudioClaw media upload, pass:

- --out inside the AudioClaw workspace
- --openclaw-workspace-root pointing at the workspace root
- --delivery-profile feishu_voice when the downstream channel prefers .ogg/.opus
- optional --chmod 644 if you want to be explicit, though this skill now defaults to 0644
- if --openclaw-workspace-root is set and --out is omitted, this skill now writes to workspace/state/audio/ automatically

7. Use the returned JSON manifest in AudioClaw to:

- prefer scripts/picoclaw_voice_reply.py for AudioClaw on Feishu
- let the wrapper send the Feishu audio message directly
- do not send the local path or the MEDIA:... line through the message tool
- only use send_file when you intentionally opt out of direct Feishu sending
- log trace_id
- persist the resolved voice choice for the next turn if desired

8. If the requested voice is not available to the current key, let the fallback stand unless the user explicitly requires strict failure.
9. If you want AudioClaw to remember a clone voice, either:

- set it as the user's default voice with --set-default-voice-id vc-...
- or register it explicitly with --register-clone-voice-id vc-...
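
The resolution order in step 4 and the fallback rule in step 8 can be sketched as follows. This is a minimal illustration, not the code in scripts/openclaw_voice_switchboard.py: the catalog dictionaries, the voice ids in them, the strategy labels, and the strict flag name are all hypothetical, and real access is still validated at synthesis time.

```python
from dataclasses import dataclass
from typing import Optional

# Hypothetical catalog data for illustration; real ids come from --list-voices.
FAMILY_VARIANTS = {"demo_family": {"calm": "demo_family_a", "promo": "demo_family_b"}}
SCENE_PRESETS = {("sales", "promo"): "demo_family_b"}
FALLBACK_VOICE = "demo_fallback"


@dataclass
class VoiceRequest:
    text: str
    scene: Optional[str] = None
    voice_id: Optional[str] = None
    voice_family: Optional[str] = None
    emotion: Optional[str] = None
    strict: bool = False  # assumed name for the "strict failure" option


def resolve_voice(req, available):
    """Return (voice_id, strategy) following the documented resolution order."""
    candidates = []
    if req.voice_id:
        candidates.append((req.voice_id, "exact_voice_id"))
    if req.voice_family and req.emotion:
        variant = FAMILY_VARIANTS.get(req.voice_family, {}).get(req.emotion)
        if variant:
            candidates.append((variant, "family_emotion_variant"))
    if req.scene and req.emotion:
        preset = SCENE_PRESETS.get((req.scene, req.emotion))
        if preset:
            candidates.append((preset, "scene_emotion_preset"))

    for voice_id, strategy in candidates:
        # A clone-style exact id is tried directly, even if it is not in the catalog.
        is_clone = strategy == "exact_voice_id" and voice_id.startswith("vc-")
        if is_clone or voice_id in available:
            return voice_id, strategy
        if req.strict:
            raise LookupError(f"voice {voice_id!r} is not available to this key")
    return FALLBACK_VOICE, "validated_fallback"
```

Returning the strategy next to the voice id makes it easy to log how a voice was chosen, which helps when a fallback silently replaces a paid voice.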

Voice discovery

When AudioClaw needs to find a usable voice or confirm a voice_id, use this order:

1. Check the local catalog first:

- run python3 scripts/openclaw_voice_switchboard.py --list-voices
- use this for fast lookup of built-in voices, known emotion variants, and clone voices already registered locally

2. If the user asks for the official public voice list, package availability, or a voice not found locally, check the official voice page:

- https://senseaudio.cn/docs/voice_api
- page title: API 音色服务说明 (API voice service guide)

3. Prefer the official page when AudioClaw needs to confirm:

- whether a voice_id is a free, VIP, or SVIP voice
- whether a named speaker has multiple emotion variants
- whether the request likely needs selected-voice purchase or custom voice authorization

4. After finding a likely voice_id, still let the runtime validate access at synthesis time, because account permissions can differ by key.

Practical rule:

- Local --list-voices is the first-stop runtime catalog.
- https://senseaudio.cn/docs/voice_api is the canonical fallback reference for official voice names, voice_id, and package tier notes.
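
The local-first lookup above might look like this in a caller. It is a sketch under assumptions: local_catalog is a hypothetical list of dictionaries (for example, parsed from the --list-voices output, whose actual format is not described here), and the name key is illustrative.

```python
OFFICIAL_VOICE_PAGE = "https://senseaudio.cn/docs/voice_api"


def find_voice(query, local_catalog):
    """Search the local catalog first; point to the official page when nothing matches."""
    needle = query.lower()
    matches = [
        voice
        for voice in local_catalog
        if needle in voice["voice_id"].lower() or needle in voice.get("name", "").lower()
    ]
    if matches:
        return {"source": "local", "matches": matches}
    # Nothing found locally: the official page is the fallback reference.
    return {"source": "official_page", "url": OFFICIAL_VOICE_PAGE, "matches": []}
```

Either result is only a candidate; the runtime still validates access when it synthesizes.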

AudioClaw rule

When this skill is used inside AudioClaw for Feishu or Lark voice replies:

1. Run scripts/picoclaw_voice_reply.py.
2. Let the wrapper upload the generated .ogg/.opus file to Feishu and send it as msg_type=audio.
3. Do not call the send_file tool for that audio unless you explicitly passed --skip-direct-send.
4. Do not call the message tool with the local path or the MEDIA:... reference. AudioClaw will send them as plain text.
5. After the audio is sent, prefer no extra text confirmation.
6. If the host runtime still requires one final assistant message to finish the turn, send one short natural Chinese line such as 我已经用语音回复你了。 ("I have replied to you by voice.") instead of leaving the turn empty.
7. Use media_reference only as debug metadata or future AudioClaw compatibility data.

This rule matters because this AudioClaw environment does not render MEDIA:... as media, and the generic send_file tool sends Feishu voice notes as plain files instead of audio messages. The reliable path here is direct Feishu upload plus msg_type=audio.
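
For orientation, the two Feishu requests behind "direct upload plus msg_type=audio" can be sketched as plain data. This follows my reading of the Feishu Open Platform file-upload and send-message APIs and is not taken from the wrapper's code; the host name, the helper names, and the optional duration field are assumptions to verify against the official docs.

```python
import json

FEISHU_BASE = "https://open.feishu.cn"  # assumed host; Lark tenants use a different domain


def build_upload_fields(file_name, duration_ms=None):
    """Form fields for POST /open-apis/im/v1/files (the audio itself goes in the multipart body)."""
    fields = {"file_type": "opus", "file_name": file_name}
    if duration_ms is not None:
        fields["duration"] = str(duration_ms)
    return fields


def build_audio_message(chat_id, file_key):
    """Request data for sending the uploaded file as an audio message."""
    return {
        "url": f"{FEISHU_BASE}/open-apis/im/v1/messages?receive_id_type=chat_id",
        "body": {
            "receive_id": chat_id,
            "msg_type": "audio",
            # The content field is a JSON-encoded string, not a nested object.
            "content": json.dumps({"file_key": file_key}),
        },
    }
```

Both requests also need a tenant access token in the Authorization header, which the wrapper obtains from the local Feishu app credentials.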

Runtime model

The official public TTS API exposes:

- voice_setting.voice_id
- voice_setting.speed
- voice_setting.vol
- voice_setting.pitch
- audio_setting.format
- audio_setting.sample_rate
- one HTTP endpoint with two modes:
  - non-stream with stream=false
  - SSE with stream=true

Important constraint:

- The public TTS API docs do not expose a standalone emotion request field.
- Emotion switching is therefore handled by choosing a matching voice_id when one exists, or by keeping the voice and shaping speed / pitch / vol.
- This skill requests final-file TTS in non-stream mode by default, because AudioClaw only needs the completed file and this avoids stream assembly edge cases.
- For this server-side HTTP TTS path, the official docs still use Authorization: Bearer API_KEY. The generated Public Key is not required by this skill.
- If the requested voice_id looks like a clone id such as vc-..., this skill now auto-routes TTS to SenseAudio-TTS-1.5 and records audio.model_used in the manifest.
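
A request built under these constraints could look roughly like the sketch below. The nested voice_setting and audio_setting layout, the top-level model and text field names, the default model placeholder, and the tuning values are assumptions for illustration; check them against the official TTS docs before relying on them.

```python
CLONE_MODEL = "SenseAudio-TTS-1.5"
DEFAULT_MODEL = "default-tts-model"  # placeholder; the real default model name is not given here

# Hypothetical tuning values used when no emotion-specific voice_id exists.
EMOTION_TUNING = {
    "calm": {"speed": 0.95, "pitch": 0, "vol": 1.0},
    "cheerful": {"speed": 1.05, "pitch": 1, "vol": 1.05},
}
NEUTRAL_TUNING = {"speed": 1.0, "pitch": 0, "vol": 1.0}


def build_tts_request(text, voice_id, api_key, emotion=None,
                      audio_format="mp3", sample_rate=32000):
    """Return (headers, payload) for a non-stream TTS call."""
    tuning = EMOTION_TUNING.get(emotion, NEUTRAL_TUNING)
    # Clone-style ids are routed to the clone-capable model.
    model = CLONE_MODEL if voice_id.startswith("vc-") else DEFAULT_MODEL
    payload = {
        "model": model,   # assumed field name
        "text": text,     # assumed field name
        "stream": False,  # final-file mode, no SSE assembly
        "voice_setting": {"voice_id": voice_id, **tuning},
        "audio_setting": {"format": audio_format, "sample_rate": sample_rate},
    }
    headers = {
        "Authorization": f"Bearer {api_key}",
        "Content-Type": "application/json",
    }
    return headers, payload
```

Because there is no emotion field to send, the emotion only influences which voice_id is chosen and how speed, pitch, and vol are set.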

API key lookup

This skill now treats SENSEAUDIO_API_KEY as the default API key source again.

Runtime rules:

- If the host app injects SENSEAUDIO_API_KEY as an AudioClaw login token such as v2.public..., the shared bootstrap will replace it with the real sk-... value from ~/.audioclaw/workspace/state/senseaudio_credentials.json before TTS starts.
- --api-key-env still works, but the default runtime path is SENSEAUDIO_API_KEY.
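
The token replacement described above can be pictured with a small sketch. The structure of senseaudio_credentials.json is not documented here, so the api_key field name is an assumption, and the real shared bootstrap may behave differently.

```python
import json
import os
from pathlib import Path

CREDENTIALS_PATH = Path.home() / ".audioclaw/workspace/state/senseaudio_credentials.json"


def resolve_api_key(env=None, credentials_path=CREDENTIALS_PATH,
                    env_name="SENSEAUDIO_API_KEY"):
    """Return an sk-... key, swapping a v2.public... login token for the stored key."""
    env = os.environ if env is None else env
    value = env.get(env_name, "")
    if value.startswith("sk-"):
        return value
    credentials_path = Path(credentials_path)
    if value.startswith("v2.public") and credentials_path.is_file():
        data = json.loads(credentials_path.read_text(encoding="utf-8"))
        stored = data.get("api_key", "")  # "api_key" is an assumed field name
        if stored.startswith("sk-"):
            return stored
    raise RuntimeError(f"no usable API key found via {env_name}")
```

Passing env_name mirrors the idea behind --api-key-env: the lookup logic stays the same while the variable name changes.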

If you need the exact same speaker timbre across many emotions, use a purchased multi-variant voice family or an authorized custom voice. Otherwise this skill will approximate the requested emotion with the best available voice and tuning.

Request contract

Minimal JSON request:

    {
      "text": "我们已经收到你的需求，今天下午会把结果发给你。",
      "scene": "assistant",
      "emotion": "calm"
    }

Full request:

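    {
      "text": "新品今晚八点开售，现在下单还有首发赠品。",
      "scene": "sales",
      "voice_id": "male_0027_b",
      "voice_family": "male_0027",
      "emotion": "promo",
      "speed": 1.08,
      "pitch": 1,
      "volume": 1.05,
      "audio_format": "mp3",
      "sample_rate": 32000,
      "preference_key": "..."
    }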

Clone voice example:

CODEBLOCK2

Supported emotion presets:

- INLINECODE90
- INLINECODE91
- INLINECODE92
- INLINECODE93
- INLINECODE94
- INLINECODE95
- INLINECODE96
- INLINECODE97
- INLINECODE98

Supported scene hints:

- INLINECODE99
- INLINECODE100
- INLINECODE101
- INLINECODE102
- INLINECODE103
- INLINECODE104
- INLINECODE105
- INLINECODE106
- INLINECODE107

AudioClaw integration pattern

Recommended handoff:

1. AudioClaw generates final reply text.
2. AudioClaw decides whether this turn should speak and what mood it wants.
3. AudioClaw calls scripts/openclaw_voice_switchboard.py with a request JSON.
4. The script returns a manifest with:

- requested voice
- resolved voice
- emotion strategy
- effective speed / pitch / volume
- delivery profile and file mode
- local audio path
- AudioClaw-friendly media reference when the output is under a workspace root
- trace_id

5. AudioClaw uploads the resulting file to Feishu or any downstream channel, or lets the AudioClaw wrapper do that directly for Feishu.
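
On the calling side, handling the returned manifest might look like the sketch below. Only trace_id, audio.model_used, and delivery.openclaw_media_reference are key names mentioned in this document; the resolved_voice and audio.path keys are hypothetical stand-ins for the "resolved voice" and "local audio path" items.

```python
import json
import logging

logger = logging.getLogger("voice_reply")


def summarize_manifest(manifest_json):
    """Extract the fields a caller usually logs or persists from the manifest."""
    manifest = json.loads(manifest_json)
    audio = manifest.get("audio", {})
    delivery = manifest.get("delivery", {})
    summary = {
        "trace_id": manifest.get("trace_id"),
        "model_used": audio.get("model_used"),
        "media_reference": delivery.get("openclaw_media_reference"),
        # Hypothetical key names for documented manifest items.
        "resolved_voice": manifest.get("resolved_voice"),
        "audio_path": audio.get("path"),
    }
    logger.info("voice reply trace_id=%s voice=%s",
                summary["trace_id"], summary["resolved_voice"])
    return summary
```

Using .get throughout keeps the caller tolerant of manifests that omit optional sections, such as a media reference when the output is outside the workspace root.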

Operational rules:

- Cache by resolved voice plus text plus audio settings.
- If a paid voice is unavailable, allow fallback unless the request is marked strict.
- Always log the resolved voice and trace_id.
- Force generated audio files to mode 0644 so the AudioClaw sender can read them reliably.
- When --openclaw-workspace-root is set and --out stays inside that root, expose delivery.openclaw_media_reference.
- When --delivery-profile feishu_voice is enabled, synthesize with AudioClaw first and then transcode to ogg/opus with system ffmpeg or imageio-ffmpeg.
- This publishable skill intentionally does not bundle ffmpeg. Install ffmpeg or run python3 -m pip install imageio-ffmpeg on the target machine.
- Avoid ad-hoc temp filenames under the workspace root. Prefer workspace/state/audio/, which this skill will now use automatically when --openclaw-workspace-root is given without --out.
- For AudioClaw on Feishu, scripts/picoclaw_voice_reply.py now uses the local Feishu app credentials from ~/.audioclaw/config.json, uploads the audio through the official /open-apis/im/v1/files endpoint, and sends it as msg_type=audio.
- The wrapper infers the active Feishu chat_id from the latest agent_main_feishu_direct_*.jsonl session log unless you pass --chat-id explicitly.
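
Several of these rules (cache key, default output location, file mode, and the ogg/opus transcode) can be illustrated together. This is a sketch under assumptions rather than the skill's implementation: the key format, the state/audio layout relative to the workspace root, and the exact ffmpeg arguments are illustrative choices.

```python
import hashlib
import json
import os
from pathlib import Path


def cache_key(resolved_voice, text, settings):
    """Stable key derived from resolved voice, text, and audio settings."""
    blob = json.dumps(
        {"voice": resolved_voice, "text": text, "settings": settings},
        sort_keys=True,
        ensure_ascii=False,
    )
    return hashlib.sha256(blob.encode("utf-8")).hexdigest()[:16]


def default_output_path(workspace_root, key, suffix=".ogg"):
    """Assumed layout: <workspace root>/state/audio/<cache key><suffix>."""
    return Path(workspace_root) / "state" / "audio" / f"{key}{suffix}"


def write_audio(path, data, mode=0o644):
    """Write audio bytes and force a mode the sender process can read."""
    path = Path(path)
    path.parent.mkdir(parents=True, exist_ok=True)
    path.write_bytes(data)
    os.chmod(path, mode)
    return path


def ffmpeg_opus_command(src, dst, ffmpeg="ffmpeg"):
    """Argument list for transcoding to Opus; run it with subprocess.run(cmd, check=True)."""
    return [ffmpeg, "-y", "-i", str(src), "-c:a", "libopus", str(dst)]
```

Naming files after the cache key avoids ad-hoc temp filenames and lets a repeated request reuse an existing file instead of synthesizing again.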

Commands

List voices:

    python3 scripts/openclaw_voice_switchboard.py --list-voices

Check the official voice catalog page:

CODEBLOCK4

List emotion presets:

CODEBLOCK5

Enable permanent voice reply for one user:

CODEBLOCK6

Enable permanent cloned-voice reply for one user:

CODEBLOCK7

CODEBLOCK8

Show registered clone voices:

CODEBLOCK9

Show current voice preference:

CODEBLOCK10

Generate one AudioClaw turn from a JSON request:

CODEBLOCK11

Direct CLI example:

CODEBLOCK12

AudioClaw Feishu one-step example:

CODEBLOCK13

Only generate, do not send:

CODEBLOCK14

Resources

- INLINECODE133
  - Small importable client for https://api.senseaudio.cn/v1/t2a_v2
  - Handles SSE chunks and writes audio bytes
- INLINECODE135
  - TTS capability summary plus the official voice catalog reference at https://senseaudio.cn/docs/voice_api
- INLINECODE137
  - Main runtime for AudioClaw
  - Resolves voice, emotion, fallback, caching, and output manifest
- INLINECODE138
  - AudioClaw-first wrapper
  - Generates audio and, by default, sends it to Feishu as a real audio message
- INLINECODE140
  - Direct Feishu sender for .ogg/.opus
  - Uses ~/.audioclaw/config.json app credentials by default, infers the active chat, uploads the file, and sends msg_type=audio
- INLINECODE144
  - Official docs summary, request examples, and AudioClaw capability boundaries
