Feishu Voice (TTS + STT)
Send voice messages and transcribe received voice messages on Feishu using ElevenLabs.
Prerequisites
- `sag` CLI (ElevenLabs TTS): `npm i -g sag` or `go install`
- `ffmpeg` / `ffprobe`: `brew install ffmpeg`
- ElevenLabs paid plan (required for library voices)
- Feishu app with `im:message:send_as_bot` and `im:file` permissions
Environment Variables
| Variable | Required | Description |
|---|---|---|
| `ELEVENLABS_API_KEY` | ✅ | ElevenLabs API key |
| `FEISHU_APP_ID` | ✅ (TTS) | Feishu app ID |
| `FEISHU_APP_SECRET` | ✅ (TTS) | Feishu app secret |
| `ELEVENLABS_VOICE_ID` | ✅ (TTS) | Voice ID (browse at elevenlabs.io/voice-library) |
| `ELEVENLABS_MODEL_ID` | ✅ (TTS) | Model ID (e.g. `eleven_multilingual_v2`, `eleven_v3`) |
| `ELEVENLABS_SPEED` | ❌ | Speech speed, 0.5-2.0 (default: 1.0) |
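If `FEISHU_APP_ID` / `FEISHU_APP_SECRET` are not set in the environment, extract them from the openclaw config:
```bash
export FEISHU_APP_ID=$(python3 -c "import json; print(json.load(open('$HOME/.openclaw/openclaw.json'))['channels']['feishu']['appId'])")
export FEISHU_APP_SECRET=$(python3 -c "import json; print(json.load(open('$HOME/.openclaw/openclaw.json'))['channels']['feishu']['appSecret'])")
```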
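The same fallback can be sketched in Python for callers that prefer not to shell out. This is a minimal illustration, not part of the skill: `load_feishu_credentials` is a hypothetical helper, and it assumes the config keeps the `channels.feishu.appId` / `appSecret` layout described above.

```python
import json
import os
from pathlib import Path


def load_feishu_credentials(env=None, config_path=None):
    """Return (app_id, app_secret), preferring env vars over the config file."""
    env = os.environ if env is None else env
    app_id = env.get("FEISHU_APP_ID")
    app_secret = env.get("FEISHU_APP_SECRET")
    if app_id and app_secret:
        return app_id, app_secret

    # Fall back to the openclaw config (assumed layout: channels.feishu.*).
    path = Path(config_path) if config_path else Path.home() / ".openclaw" / "openclaw.json"
    with open(path, encoding="utf-8") as fh:
        feishu = json.load(fh)["channels"]["feishu"]
    # Keep whichever value the environment did provide.
    return app_id or feishu["appId"], app_secret or feishu["appSecret"]
```

Environment values win over the file, so a partially configured environment is completed from the config rather than overwritten.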
Voice Selection
See `config/voice-config.example.json` for a curated voice list. Browse all voices at https://elevenlabs.io/voice-library or run `sag voices`.
Recommended models:
- `eleven_multilingual_v2`: best for Chinese and multilingual content
- `eleven_v3`: latest English-optimized model
Sending Voice Messages (TTS)
```bash
scripts/feishu-voice-send.sh <text> <receive_id> [receive_id_type] [speed]
```
- `receive_id`: target user `open_id` or group chat `chat_id`
- `receive_id_type`: `open_id` (default) or `chat_id`
- `speed`: speech speed multiplier, 0.5-2.0 (default: 1.0)
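When the send script is called from other code, it may help to validate the arguments first. The sketch below is an illustration only: `build_send_command` is a hypothetical helper, and it assumes the script takes its arguments in the order text, receive ID, receive ID type, speed.

```python
def build_send_command(text, receive_id, receive_id_type="open_id", speed=1.0,
                       script="scripts/feishu-voice-send.sh"):
    """Validate the arguments and return the argv list for the send script."""
    if not text.strip():
        raise ValueError("text must not be empty")
    if receive_id_type not in ("open_id", "chat_id"):
        raise ValueError(f"unsupported receive_id_type: {receive_id_type!r}")
    if not 0.5 <= speed <= 2.0:
        raise ValueError("speed must be between 0.5 and 2.0")
    # Positional order assumed from the documented usage of the script.
    return [script, text, receive_id, receive_id_type, str(speed)]
```

The returned list can be handed to `subprocess.run(..., check=True)`; passing a list rather than a shell string avoids quoting problems with spaces or punctuation in the text.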
Receiving Voice Messages (STT)
When OpenClaw delivers a Feishu voice message, it arrives as a media attachment (.ogg file). Transcribe with:
```bash
scripts/feishu-voice-stt.sh /path/to/audio.ogg
```
The script prints the recognized text to stdout. It uses the ElevenLabs `scribe_v1` model with automatic language detection.
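Because the transcript arrives on stdout, a caller can capture it with a thin wrapper. This is a sketch under assumptions: `transcribe` is a hypothetical helper, the script path is relative to the skill directory, and the `runner` parameter exists only so the call can be replaced in tests.

```python
import subprocess


def transcribe(audio_path, script="scripts/feishu-voice-stt.sh", runner=subprocess.run):
    """Run the STT script on audio_path and return the transcript from stdout."""
    # check=True raises CalledProcessError if the script exits with a non-zero status.
    result = runner([script, audio_path], capture_output=True, text=True, check=True)
    return result.stdout.strip()
```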
Fallback: Download via Feishu API
If the audio file is not delivered as an attachment (only file_key available):
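1. List recent messages: `GET /im/v1/messages?container_id_type=chat&container_id=CHAT_ID&page_size=5&sort_type=ByCreateTimeDesc`
2. Download the audio: `GET /im/v1/messages/{message_id}/resources/{file_key}?type=file`
3. Run the STT script on the downloaded file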
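The two request URLs in the steps above can be assembled as follows. This is a sketch only: the base URL `https://open.feishu.cn/open-apis` and both helper names are assumptions not stated in this document, and authentication (typically an access token in the `Authorization` header) is left out.

```python
from urllib.parse import quote, urlencode

# Assumed base URL for the Feishu Open API; not given in this document.
FEISHU_BASE = "https://open.feishu.cn/open-apis"


def list_messages_url(chat_id, page_size=5):
    """URL for listing the most recent messages in a chat, newest first."""
    query = urlencode({
        "container_id_type": "chat",
        "container_id": chat_id,
        "page_size": page_size,
        "sort_type": "ByCreateTimeDesc",
    })
    return f"{FEISHU_BASE}/im/v1/messages?{query}"


def audio_resource_url(message_id, file_key):
    """URL for downloading the audio resource attached to a message."""
    return (f"{FEISHU_BASE}/im/v1/messages/{quote(message_id, safe='')}"
            f"/resources/{quote(file_key, safe='')}?type=file")
```

The response body of the second request is the audio file itself, which can be saved to disk and passed to the STT script.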
Smart Reply Mode
When receiving messages, follow this pattern for natural conversation:
- Voice message received → transcribe with STT → understand → reply with voice (TTS)
- Text message received → understand → reply with text
- Override: the user can explicitly request a voice or text reply
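The pattern above reduces to a small decision function. This is an illustrative sketch: `choose_reply_mode` is a hypothetical name, and it assumes incoming voice messages are identified by the message type `"audio"`.

```python
def choose_reply_mode(incoming_type, override=None):
    """Return "voice" or "text" as the mode for the reply."""
    if override in ("voice", "text"):
        return override  # an explicit user request wins
    # Mirror the incoming modality: voice in, voice out; anything else gets text.
    return "voice" if incoming_type == "audio" else "text"
```

For example, `choose_reply_mode("audio", override="text")` yields a text reply even though the user sent a voice message.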
Important Notes
- Feishu `msg_type` must be `"audio"`, not `"media"` or `"file"`
- OpenClaw's `message` tool `asVoice` option does not work correctly for Feishu; use this script instead
- STT uses the ElevenLabs `scribe_v1` model, which supports Chinese, English, and 90+ languages
- On free ElevenLabs accounts, only premade voices work; library voices require a paid plan
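To make the `msg_type` note concrete, here is a sketch of a request body for a voice message. The shape is an assumption based on Feishu's send-message API, where `content` is a JSON-encoded string carrying the `file_key` of a previously uploaded audio file. `build_audio_message` is a hypothetical helper, and the send script's actual request is not shown in this document.

```python
import json


def build_audio_message(receive_id, file_key):
    """Return a request body for sending an uploaded audio file as a voice message."""
    return {
        "receive_id": receive_id,
        "msg_type": "audio",  # must be "audio", not "media" or "file"
        # Assumed shape: content is itself a JSON-encoded string.
        "content": json.dumps({"file_key": file_key}),
    }
```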