AudioClaw Skills Voice Intake
When to use
Use this skill when the user sends a voice message and AudioClaw should understand the content before replying.
Common triggers:
- - A Feishu or chat bot receives an audio message instead of text.
- AudioClaw needs a transcript plus a clean user message payload.
- The workflow wants richer ASR features such as timestamps, sentiment, or speaker separation.
- The team wants one stable AudioClaw intake entrypoint instead of hand-written ASR requests.
- The channel stores inbound voice files as
.ogg or .opus, and AudioClaw still needs one stable ASR path.
Do not use this skill for speech output. Use $audioclaw-skills-voice-reply for TTS.
Workflow
- 1. Save the incoming audio file locally.
- Run
scripts/openclaw_voice_intake.py with the audio path. - Let the script choose the best model when no model is forced:
-
sense-asr-deepthink for normal single-speaker voice understanding
-
sense-asr when a language hint is provided
-
sense-asr-pro when timestamps, sentiment, speaker diarization, or punctuation are requested
-
sense-asr-lite when hotwords are requested
- 4. Use the JSON manifest it returns as the AudioClaw handoff:
-
transcript.normalized_text
-
openclaw.turn_payload
-
routing.selected_model
- 5. If
understanding.clarification_needed is true, ask the user to repeat or resend the audio.
Runtime model
Official HTTP ASR API:
- - Endpoint: INLINECODE13
- Content type: INLINECODE14
- File size limit: INLINECODE15
- Practical local input suffixes accepted by this skill:
.wav, .mp3, .ogg, .opus, .flac, .aac, .m4a, INLINECODE23
Supported response goals:
- - plain transcript
- richer raw response passthrough
- AudioClaw-ready turn payload
The skill keeps two layers separate:
- - ASR output from AudioClaw ASR
- AudioClaw packaging and clarification heuristics
API key lookup
This skill now treats SENSEAUDIO_API_KEY as the default API key source again.
Runtime rules:
- - If the host app injects
SENSEAUDIO_API_KEY as an AudioClaw login token such as v2.public..., the shared bootstrap will replace it with the real sk-... value from ~/.audioclaw/workspace/state/senseaudio_credentials.json before ASR starts. - INLINECODE29 still works, but the default runtime path is
SENSEAUDIO_API_KEY.
Commands
Basic voice intake:
CODEBLOCK0
Voice intake with richer AudioClaw structure:
CODEBLOCK1
Force a specific model:
CODEBLOCK2
AudioClaw integration pattern
Recommended handoff:
- 1. Channel adapter stores the inbound audio.
- AudioClaw calls
scripts/openclaw_voice_intake.py. - AudioClaw reads:
-
openclaw.turn_payload.role
-
openclaw.turn_payload.content
-
openclaw.turn_payload.metadata
- 4. The normal dialogue pipeline continues as if the user typed the recognized text.
Operational rules:
- - Keep the original audio path in metadata for debugging.
- Pass
language only when you are confident; otherwise let ASR auto-detect. - If you request timestamps, sentiment, or diarization, let the script choose
sense-asr-pro. - If transcript is empty, do not hallucinate a user intent. Ask for clarification.
Resources
- Multipart HTTP client for AudioClaw ASR
- Handles model routing validation and JSON or text responses
- Main runtime for AudioClaw
- Builds transcript, normalized user text, and turn payload
- Official ASR docs summary, model support notes, and AudioClaw payload examples
AudioClaw 技能语音输入
使用时机
当用户发送语音消息且 AudioClaw 需要在回复前理解内容时使用此技能。
常见触发场景:
- - 飞书或聊天机器人收到语音消息而非文本
- AudioClaw 需要转录文本及干净的用户消息负载
- 工作流需要更丰富的 ASR 功能,如时间戳、情感或说话人分离
- 团队希望使用稳定的 AudioClaw 语音输入入口,而非手写 ASR 请求
- 频道将传入的语音文件存储为 .ogg 或 .opus 格式,AudioClaw 仍需要稳定的 ASR 路径
请勿将此技能用于语音输出。TTS 请使用 $audioclaw-skills-voice-reply。
工作流程
- 1. 将传入的音频文件保存到本地
- 使用音频路径运行 scripts/openclawvoiceintake.py
- 当未强制指定模型时,让脚本选择最佳模型:
- sense-asr-deepthink:用于普通单人语音理解
- sense-asr:当提供语言提示时使用
- sense-asr-pro:当需要时间戳、情感、说话人分离或标点符号时使用
- sense-asr-lite:当需要热词时使用
- 4. 使用脚本返回的 JSON 清单作为 AudioClaw 交接数据:
- transcript.normalized_text
- openclaw.turn_payload
- routing.selected_model
- 5. 如果 understanding.clarification_needed 为 true,请让用户重复或重新发送音频
运行时模型
官方 HTTP ASR API:
- - 端点:https://api.senseaudio.cn/v1/audio/transcriptions
- 内容类型:multipart/form-data
- 文件大小限制:<=10MB
- 此技能支持的实际本地输入后缀:.wav、.mp3、.ogg、.opus、.flac、.aac、.m4a、.mp4
支持的响应目标:
- - 纯文本转录
- 更丰富的原始响应透传
- AudioClaw 就绪的对话负载
此技能保持两个层分离:
- - AudioClaw ASR 的 ASR 输出
- AudioClaw 打包和澄清启发式逻辑
API 密钥查找
此技能现在再次将 SENSEAUDIOAPIKEY 视为默认 API 密钥来源。
运行时规则:
- - 如果宿主应用注入的 SENSEAUDIOAPIKEY 是 AudioClaw 登录令牌(如 v2.public...),共享引导程序将在 ASR 开始前将其替换为 ~/.audioclaw/workspace/state/senseaudiocredentials.json 中的真实 sk-... 值
- --api-key-env 仍然可用,但默认运行时路径为 SENSEAUDIOAPI_KEY
命令
基础语音输入:
bash
python3 scripts/openclawvoiceintake.py \
--input /path/to/user_audio.mp3
带更丰富 AudioClaw 结构的语音输入:
bash
python3 scripts/openclawvoiceintake.py \
--input /path/to/meeting_clip.m4a \
--enable-punctuation \
--timestamp-granularity segment \
--enable-sentiment \
--out-json /tmp/openclawvoiceturn.json
强制指定特定模型:
bash
python3 scripts/openclawvoiceintake.py \
--input /path/to/user_audio.mp3 \
--model sense-asr-deepthink
AudioClaw 集成模式
推荐交接流程:
- 1. 频道适配器存储传入的音频
- AudioClaw 调用 scripts/openclawvoiceintake.py
- AudioClaw 读取:
- openclaw.turn_payload.role
- openclaw.turn_payload.content
- openclaw.turn_payload.metadata
- 4. 正常对话流程继续,如同用户键入了识别出的文本
操作规则:
- - 在元数据中保留原始音频路径以便调试
- 仅在确定时传递 language;否则让 ASR 自动检测
- 如果需要时间戳、情感或说话人分离,让脚本选择 sense-asr-pro
- 如果转录文本为空,不要臆测用户意图。请求澄清
资源
- - scripts/senseaudioasrclient.py
- AudioClaw ASR 的多部分 HTTP 客户端
- 处理模型路由验证和 JSON 或文本响应
- - scripts/openclawvoiceintake.py
- AudioClaw 的主要运行时
- 构建转录文本、规范化用户文本和对话负载
- - references/openclawvoiceintake.md
- 官方 ASR 文档摘要、模型支持说明和 AudioClaw 负载示例