Discord Voice Plugin for Clawdbot

Real-time voice conversations in Discord voice channels. Join a voice channel, speak, and have your words transcribed, processed by Claude, and spoken back.

Features

- Join/Leave Voice Channels: Via slash commands, CLI, or agent tool
- Voice Activity Detection (VAD): Automatically detects when users are speaking
- Speech-to-Text: Whisper API (OpenAI), Deepgram, or Local Whisper (Offline)
- Streaming STT: Real-time transcription with Deepgram WebSocket (~1s latency reduction)
- Agent Integration: Transcribed speech is routed through the Clawdbot agent
- Text-to-Speech: OpenAI TTS, ElevenLabs, or Kokoro (Local/Offline)
- Audio Playback: Responses are spoken back in the voice channel
- Barge-in Support: Stops speaking immediately when a user starts talking
- Auto-reconnect: Automatic heartbeat monitoring and reconnection on disconnect

Requirements

- Discord bot with voice permissions (Connect, Speak, Use Voice Activity)
- API keys for STT and TTS providers
- System dependencies for voice:
  - ffmpeg (audio processing)
  - Native build tools for @discordjs/opus and sodium-native

Installation

1. Install System Dependencies

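```bash
# Ubuntu/Debian
sudo apt-get install ffmpeg build-essential python3

# Fedora/RHEL
sudo dnf install ffmpeg gcc-c++ make python3

# macOS
brew install ffmpeg
```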

2. Install via ClawdHub

```bash
clawdhub install discord-voice
```

Or manually:

```bash
cd ~/.clawdbot/extensions
git clone <repository-url> discord-voice
cd discord-voice
npm install
```

3. Configure in clawdbot.json

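```json5
{
  plugins: {
    entries: {
      "discord-voice": {
        enabled: true,
        config: {
          sttProvider: "local-whisper",
          ttsProvider: "openai",
          ttsVoice: "nova",
          vadSensitivity: "medium",
          allowedUsers: [], // empty array allows all users
          silenceThresholdMs: 1500,
          maxRecordingMs: 30000,
          openai: {
            apiKey: "sk-...", // or set the OPENAI_API_KEY environment variable
          },
        },
      },
    },
  },
}
```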

4. Discord Bot Setup

Ensure your Discord bot has these permissions:

- Connect: Join voice channels
- Speak: Play audio
- Use Voice Activity: Detect when users speak

Add these to your bot's OAuth2 URL or configure in Discord Developer Portal.

Configuration

Option	Type	Default	Description
enabled	boolean	true	Enable/disable the plugin
sttProvider

Provider Configuration

OpenAI (Whisper + TTS)

```json5
{
  openai: {
    apiKey: "sk-...",
    whisperModel: "whisper-1",
    ttsModel: "tts-1",
  },
}
```

ElevenLabs (TTS only)

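```json5
{
  elevenlabs: {
    apiKey: "...",
    voiceId: "21m00Tcm4TlvDq8ikWAM", // Rachel
    modelId: "eleven_multilingual_v2",
  },
}
```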

Deepgram (STT only)

```json5
{
  deepgram: {
    apiKey: "...",
    model: "nova-2",
  },
}
```

Usage

Slash Commands (Discord)

Once registered with Discord, use these commands:

- /discord_voice join <channel>: Join a voice channel
- /discord_voice leave: Leave the current voice channel
- /discord_voice status: Show voice connection status

CLI Commands

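```bash
# Join a voice channel
clawdbot discord_voice join <channelId>

# Leave a voice channel
clawdbot discord_voice leave --guild <guildId>

# Check status
clawdbot discord_voice status
```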

Agent Tool

The agent can use the discord_voice tool:

```
Join voice channel 1234567890
```

The tool supports actions:

- join: Join a voice channel (requires channelId)
- leave: Leave the voice channel
- speak: Speak text in the voice channel
- status: Get current voice status

How It Works

1. Join: The bot joins the specified voice channel
2. Listen: VAD detects when users start and stop speaking
3. Record: Audio is buffered while the user speaks
4. Transcribe: On silence, the audio is sent to the STT provider
5. Process: The transcribed text is sent to the Clawdbot agent
6. Synthesize: The agent response is converted to audio via TTS
7. Play: The audio is played back in the voice channel
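
The later steps (transcribe, process, synthesize, play) can be pictured as one async function per utterance. This is only a sketch: the `VoiceProviders` interface and `handleUtterance` are hypothetical names invented for illustration, and the plugin's real internals may be organized differently.

```typescript
// Hypothetical provider interface; not the plugin's actual API.
interface VoiceProviders {
  transcribe(audio: Uint8Array): Promise<string>; // STT provider
  askAgent(text: string): Promise<string>; // Clawdbot agent
  synthesize(text: string): Promise<Uint8Array>; // TTS provider
  play(audio: Uint8Array): Promise<void>; // voice channel playback
}

// Runs the transcribe -> process -> synthesize -> play steps for one
// buffered utterance. Returns the agent reply, or null when the
// transcript is empty and there is nothing to answer.
async function handleUtterance(
  audio: Uint8Array,
  providers: VoiceProviders,
): Promise<string | null> {
  const transcript = (await providers.transcribe(audio)).trim();
  if (transcript.length === 0) {
    return null;
  }
  const reply = await providers.askAgent(transcript);
  const speech = await providers.synthesize(reply);
  await providers.play(speech);
  return reply;
}
```

Passing the providers in as one object keeps the flow independent of which STT or TTS backend is configured, which is one plausible way to support the provider choices listed under Features.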

Streaming STT (Deepgram)

When using Deepgram as your STT provider, streaming mode is enabled by default. This provides:

- ~1 second faster end-to-end latency
- Real-time feedback with interim transcription results
- Automatic keep-alive to prevent connection timeouts
- Fallback to batch transcription if streaming fails

To use streaming STT:

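```json5
{
  sttProvider: "deepgram",
  streamingSTT: true, // default
  deepgram: {
    apiKey: "...",
    model: "nova-2",
  },
}
```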
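
The fallback to batch transcription can be thought of as a small wrapper around two transcribers. The sketch below assumes a hypothetical `Transcriber` function type and a `withBatchFallback` helper; neither name comes from the plugin, and the real streaming path works on a live WebSocket rather than a finished buffer.

```typescript
// Hypothetical type: turns a finished audio buffer into text.
type Transcriber = (audio: Uint8Array) => Promise<string>;

// Tries the streaming transcriber first and falls back to the batch
// transcriber if the streaming attempt rejects for any reason.
function withBatchFallback(
  streaming: Transcriber,
  batch: Transcriber,
): Transcriber {
  return async (audio: Uint8Array): Promise<string> => {
    try {
      return await streaming(audio);
    } catch {
      // Streaming failed (for example, the socket closed): use batch.
      return batch(audio);
    }
  };
}
```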

Barge-in Support

When enabled (default), the bot will immediately stop speaking if a user starts talking. This creates a more natural conversational flow where you can interrupt the bot.

To disable (let the bot finish speaking):

```json5
{
  bargeIn: false,
}
```
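
As a rough illustration of barge-in, the controller below stops the current playback when a speaking event arrives, unless the feature is disabled. `Player` and `BargeInController` are hypothetical names for this sketch and are not taken from the plugin.

```typescript
// Hypothetical minimal view of whatever is currently playing TTS audio.
interface Player {
  stop(): void;
}

class BargeInController {
  private current: Player | null = null;
  private readonly enabled: boolean;

  constructor(enabled: boolean = true) {
    this.enabled = enabled;
  }

  onPlaybackStart(player: Player): void {
    this.current = player;
  }

  onPlaybackEnd(): void {
    this.current = null;
  }

  // Called when VAD reports that a user started talking.
  // Returns true if playback was interrupted.
  onUserSpeaking(): boolean {
    if (!this.enabled || this.current === null) {
      return false;
    }
    this.current.stop();
    this.current = null;
    return true;
  }
}
```

With `enabled` set to false the controller never calls `stop()`, which corresponds to letting the bot finish speaking.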

Auto-reconnect

The plugin includes automatic connection health monitoring:

- Heartbeat checks every 30 seconds (configurable)
- Auto-reconnect on disconnect with exponential backoff
- Max 3 attempts before giving up

If the connection drops, you'll see logs like:

CODEBLOCK11
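
For readers unfamiliar with exponential backoff, here is a minimal sketch of a retry loop capped at 3 attempts. The 1000 ms base delay and the doubling factor are assumptions; the text above only states that backoff is exponential and that there are at most 3 attempts. `backoffDelay` and `reconnectWithBackoff` are hypothetical names.

```typescript
// Delay before the given attempt (0-based). Base delay and doubling
// are assumptions for illustration only.
function backoffDelay(attempt: number, baseMs: number = 1000): number {
  return baseMs * 2 ** attempt;
}

// Calls connect() up to maxAttempts times, waiting longer before each
// try. Resolves to true on the first success, false if all attempts fail.
async function reconnectWithBackoff(
  connect: () => Promise<void>,
  maxAttempts: number = 3,
  baseMs: number = 1000,
  sleep: (ms: number) => Promise<void> = (ms) =>
    new Promise<void>((resolve) => setTimeout(resolve, ms)),
): Promise<boolean> {
  for (let attempt = 0; attempt < maxAttempts; attempt++) {
    await sleep(backoffDelay(attempt, baseMs));
    try {
      await connect();
      return true;
    } catch {
      // Connection failed; fall through to the next attempt.
    }
  }
  return false;
}
```

Under these assumptions the waits are 1000 ms, 2000 ms, and 4000 ms. The `sleep` parameter is injectable so the loop can be exercised without real delays.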

VAD Sensitivity

- low: Picks up quiet speech, may trigger on background noise
- medium: Balanced (recommended)
- high: Requires louder, clearer speech
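
One way to read these levels is as energy thresholds: a lower setting accepts quieter frames as speech. The sketch below uses a simple RMS check on 16-bit PCM frames. The threshold values, `RMS_THRESHOLD`, `rms`, and `isSpeech` are all assumptions for illustration; the plugin's actual VAD may use a different method entirely.

```typescript
type VadSensitivity = "low" | "medium" | "high";

// Hypothetical thresholds on RMS amplitude normalized to [0, 1].
const RMS_THRESHOLD: Record<VadSensitivity, number> = {
  low: 0.01, // accepts quiet speech, and some background noise
  medium: 0.03,
  high: 0.08, // needs louder, clearer speech
};

// Root-mean-square amplitude of a 16-bit PCM frame, normalized to [0, 1].
function rms(samples: Int16Array): number {
  if (samples.length === 0) {
    return 0;
  }
  let sumSquares = 0;
  samples.forEach((raw) => {
    const s = raw / 32768;
    sumSquares += s * s;
  });
  return Math.sqrt(sumSquares / samples.length);
}

function isSpeech(samples: Int16Array, sensitivity: VadSensitivity): boolean {
  return rms(samples) >= RMS_THRESHOLD[sensitivity];
}
```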

Troubleshooting

"Discord client not available"

Ensure the Discord channel is configured and the bot is connected before using voice.

Opus/Sodium build errors

Install build tools:

```bash
npm install -g node-gyp
npm rebuild @discordjs/opus sodium-native
```

No audio heard

1. Check that the bot has Connect + Speak permissions
2. Check that the bot isn't server muted
3. Verify that the TTS API key is valid

Transcription not working

1. Check that the STT API key is valid
2. Check that audio is being recorded (see debug logs)
3. Try adjusting VAD sensitivity

Enable debug logging

CODEBLOCK13

Environment Variables

Variable	Description
INLINECODE43	Discord bot token (required)
INLINECODE44

Limitations

- Only one voice channel per guild at a time
- Maximum recording length: 30 seconds (configurable)
- Requires stable network for real-time audio
- TTS output may have slight delay due to synthesis

License

MIT

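Discord Voice Plugin for Clawdbot

Real-time voice conversations in Discord voice channels. Join a voice channel and speak; your words are transcribed, processed by Claude, and answered in speech.

Features

- Join/Leave Voice Channels: via slash commands, CLI, or agent tool
- Voice Activity Detection (VAD): automatically detects when users are speaking
- Speech-to-Text: Whisper API (OpenAI), Deepgram, or local Whisper (offline)
- Streaming STT: real-time transcription over Deepgram WebSocket (about 1 second lower latency)
- Agent Integration: transcribed speech is processed by the Clawdbot agent
- Text-to-Speech: OpenAI TTS, ElevenLabs, or Kokoro (local/offline)
- Audio Playback: responses are spoken back in the voice channel
- Barge-in Support: playback stops immediately when a user starts talking
- Auto-reconnect: automatic heartbeat monitoring and reconnection on disconnect

Requirements

- Discord bot with voice permissions (Connect, Speak, Use Voice Activity)
- API keys for STT and TTS providers
- System dependencies for voice:
  - ffmpeg (audio processing)
  - Native build tools for @discordjs/opus and sodium-native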

Installation

1. Install System Dependencies

```bash
# Ubuntu/Debian
sudo apt-get install ffmpeg build-essential python3

# Fedora/RHEL
sudo dnf install ffmpeg gcc-c++ make python3

# macOS
brew install ffmpeg
```

2. Install via ClawdHub

```bash
clawdhub install discord-voice
```

Or install manually:

```bash
cd ~/.clawdbot/extensions
git clone <repository-url> discord-voice
cd discord-voice
npm install
```

3. Configure in clawdbot.json

```json5
{
  plugins: {
    entries: {
      "discord-voice": {
        enabled: true,
        config: {
          sttProvider: "local-whisper",
          ttsProvider: "openai",
          ttsVoice: "nova",
          vadSensitivity: "medium",
          allowedUsers: [], // empty array allows all users
          silenceThresholdMs: 1500,
          maxRecordingMs: 30000,
          openai: {
            apiKey: "sk-...", // or set the OPENAI_API_KEY environment variable
          },
        },
      },
    },
  },
}
```
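
To make three of the sample options concrete, the sketch below shows one plausible reading of `allowedUsers`, `silenceThresholdMs`, and `maxRecordingMs`. The helper names `isUserAllowed` and `shouldFinalize` are hypothetical, and the assumption that a recording ends on either sustained silence or the length cap is an interpretation of the option names, not confirmed plugin behavior.

```typescript
interface RecordingLimits {
  silenceThresholdMs: number; // 1500 in the sample configuration
  maxRecordingMs: number; // 30000 in the sample configuration
}

// An empty allow-list means every user may talk to the bot.
function isUserAllowed(
  userId: string,
  allowedUsers: readonly string[],
): boolean {
  return allowedUsers.length === 0 || allowedUsers.indexOf(userId) !== -1;
}

// Assumed rule: stop recording once the user has been silent for
// silenceThresholdMs, or once the recording reaches maxRecordingMs.
function shouldFinalize(
  nowMs: number,
  startedAtMs: number,
  lastVoiceAtMs: number,
  limits: RecordingLimits,
): boolean {
  const silentForMs = nowMs - lastVoiceAtMs;
  const recordedForMs = nowMs - startedAtMs;
  return (
    silentForMs >= limits.silenceThresholdMs ||
    recordedForMs >= limits.maxRecordingMs
  );
}
```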

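4. Discord Bot Setup

Ensure your Discord bot has these permissions:

- Connect: Join voice channels
- Speak: Play audio
- Use Voice Activity: Detect when users speak

Add these permissions to your bot's OAuth2 URL, or configure them in the Discord Developer Portal.

Configuration

Option	Type	Default	Description
enabled	boolean	true	Enable/disable the plugin
sttProvider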

Provider Configuration

OpenAI (Whisper + TTS)

```json5
{
  openai: {
    apiKey: "sk-...",
    whisperModel: "whisper-1",
    ttsModel: "tts-1",
  },
}
```

ElevenLabs (TTS only)

```json5
{
  elevenlabs: {
    apiKey: "...",
    voiceId: "21m00Tcm4TlvDq8ikWAM", // Rachel
    modelId: "eleven_multilingual_v2",
  },
}
```

Deepgram (STT only)

```json5
{
  deepgram: {
    apiKey: "...",
    model: "nova-2",
  },
}
```

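Usage

Slash Commands (Discord)

Once registered with Discord, use these commands:

- /discord_voice join <channel>: Join a voice channel
- /discord_voice leave: Leave the current voice channel
- /discord_voice status: Show voice connection status

CLI Commands

```bash
# Join a voice channel
clawdbot discord_voice join <channelId>

# Leave a voice channel
clawdbot discord_voice leave --guild <guildId>

# Check status
clawdbot discord_voice status
```

Agent Tool

The agent can use the discord_voice tool:

```
Join voice channel 1234567890
```

The tool supports these actions:

- join: Join a voice channel (requires channelId)
- leave: Leave the voice channel
- speak: Speak text in the voice channel
- status: Get current voice status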

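How It Works

1. Join: The bot joins the specified voice channel
2. Listen: VAD detects when users start and stop speaking
3. Record: Audio is buffered while the user speaks
4. Transcribe: Once silence is detected, the audio is sent to the STT provider
5. Process: The transcribed text is sent to the Clawdbot agent
6. Synthesize: The agent response is converted to audio via TTS
7. Play: The audio is played back in the voice channel

Streaming STT (Deepgram)

When Deepgram is the STT provider, streaming mode is enabled by default. This provides:

- About 1 second lower end-to-end latency
- Real-time feedback with interim transcription results
- Automatic keep-alive to prevent connection timeouts
- Fallback to batch transcription if streaming fails

To use streaming STT:

```json5
{
  sttProvider: "deepgram",
  streamingSTT: true, // default
  deepgram: {
    apiKey: "...",
    model: "nova-2",
  },
}
```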

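Barge-in Support

When enabled (the default), the bot stops playback immediately if a user starts talking. This makes the conversation flow more naturally, because you can interrupt the bot.

To disable it (and let the bot finish speaking):

```json5
{
  bargeIn: false,
}
```

Auto-reconnect

The plugin includes automatic connection health monitoring:

- Heartbeat checks every 30 seconds (configurable)
- Auto-reconnect on disconnect with exponential backoff
- At most 3 attempts before giving up

If the connection drops, you'll see logs like:

```
[discord-voice] Disconnected from voice channel
[discord-voice] Reconnection attempt 1/3
[discord-voice] Reconnected successfully
```

VAD Sensitivity

- low: Picks up quiet speech, may trigger on background noise
- medium: Balanced (recommended)
- high: Requires louder, clearer speech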

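Troubleshooting

"Discord client not available"

Ensure the Discord channel is configured and the bot is connected before using voice.

Opus/Sodium build errors

Install build tools:

```bash
npm install -g node-gyp
npm rebuild @discordjs/opus sodium-native
```

No audio heard

1. Check that the bot has Connect + Speak permissions
2. Check that the bot isn't server muted
3. Verify that the TTS API key is valid

Transcription not working

1. Check the STT API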
