SuperCall
Make AI-powered phone calls with custom personas and goals using OpenAI Realtime API + Twilio.
Features
- Persona Calls: Define a persona, goal, and opening line for autonomous calls.
- Full Realtime Mode: GPT-4o-powered voice conversations with latency of about a second or less.
- DTMF / IVR Navigation: The AI automatically navigates automated phone menus ("press 1 for X", "enter your account number", etc.) by generating touch-tone digits and injecting them into the audio stream.
- Providers: Supports Twilio (full realtime) and a mock provider for testing.
- Streaming Audio: Bidirectional audio over WebSocket for real-time conversations.
- Limited Access: Unlike the standard voice_call plugin, the person on the call has no access to the gateway agent, which reduces the attack surface.
Credentials
Required
| Credential | Source | Purpose |
|---|---|---|
| `OPENAI_API_KEY` | OpenAI | Powers the realtime voice AI (GPT-4o) |
| `TWILIO_ACCOUNT_SID` | Twilio Console | Twilio account identifier |
| `TWILIO_AUTH_TOKEN` | Twilio Console | Twilio API authentication |
Optional
| Credential | Source | Purpose |
|---|---|---|
| `NGROK_AUTHTOKEN` | ngrok | ngrok tunnel auth (only needed if ngrok is the tunnel provider) |
Credentials can be set via environment variables or in the plugin config (config takes precedence).
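The precedence rule can be sketched as a tiny resolver. This is an illustration, not SuperCall's code: `resolveCredential` is a hypothetical helper, treating a blank config value as unset is an assumption, and the environment is passed in explicitly (instead of reading `process.env`) so the logic is easy to test.

```typescript
// Hypothetical helper: plugin config wins, then the environment variable, else fail loudly.
type Env = Record<string, string | undefined>;

function resolveCredential(
  configValue: string | undefined, // value from the plugin config, if any
  envName: string,                 // fallback variable, e.g. "TWILIO_AUTH_TOKEN"
  env: Env,                        // in a real plugin this would be process.env
): string {
  // A non-blank config value always takes precedence (blank is treated as unset: an assumption).
  if (configValue !== undefined && configValue.trim() !== "") return configValue;
  const fromEnv = env[envName];
  if (fromEnv !== undefined && fromEnv.trim() !== "") return fromEnv;
  throw new Error(`Missing credential: set it in the plugin config or via ${envName}`);
}

const env: Env = { TWILIO_AUTH_TOKEN: "from-env" };
resolveCredential("from-config", "TWILIO_AUTH_TOKEN", env); // "from-config"
resolveCredential(undefined, "TWILIO_AUTH_TOKEN", env);     // "from-env"
```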
Installation
1. Install the plugin via npm, or copy it to your OpenClaw extensions directory.
2. Enable hooks for call completion callbacks (required):
```json
{
  "hooks": {
    "enabled": true,
    "token": "your-secret-token"
  }
}
```
Generate a secure token with: `openssl rand -hex 24`
⚠️ Security: `hooks.token` is sensitive because it authenticates internal callbacks. Keep it secret, and rotate it if it is ever compromised.
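3. Configure the plugin in your OpenClaw config:
```json
{
  "plugins": {
    "entries": {
      "supercall": {
        "enabled": true,
        "config": {
          "provider": "twilio",
          "fromNumber": "+15551234567",
          "twilio": {
            "accountSid": "your-account-sid",
            "authToken": "your-auth-token"
          },
          "streaming": {
            "openaiApiKey": "your-openai-key"
          },
          "tunnel": {
            "provider": "ngrok",
            "ngrokDomain": "your-domain.ngrok.app"
          }
        }
      }
    }
  }
}
```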
Important: The hooks.token is required for call completion callbacks. Without it, the agent won't be notified when calls finish.
Tool: supercall
Make phone calls with custom personas:
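```
supercall(
  action: "persona_call",
  to: "+1234567890",
  persona: "The king's personal assistant",
  goal: "Confirm the callee's availability for dinner next week",
  openingLine: "Hi, this is Michael, Alex's assistant..."
)
```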
Actions
- `persona_call` - Start a new call with a persona
- `get_status` - Check call status and transcript
- `end_call` - End an active call
- `list_calls` - List active persona calls
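The four actions map naturally onto a discriminated union. The sketch below is hypothetical typing, not the plugin's published schema: the action names and `persona_call` fields come from this README, while the `callId` field for `get_status`/`end_call` and the optional `to` (falling back to the configured `toNumber`) are assumptions.

```typescript
// Hypothetical argument types for the supercall tool; `callId` and optional `to` are assumed.
type SupercallArgs =
  | {
      action: "persona_call";
      to?: string; // assumed to fall back to the configured toNumber when omitted
      persona: string;
      goal: string;
      openingLine: string;
    }
  | { action: "get_status"; callId: string }
  | { action: "end_call"; callId: string }
  | { action: "list_calls" };

// The compiler checks that every action is handled and that each branch
// only reads the fields its action actually carries.
function summarizeAction(args: SupercallArgs): string {
  switch (args.action) {
    case "persona_call":
      return `persona_call to ${args.to ?? "default toNumber"} as ${args.persona}`;
    case "get_status":
      return `get_status for ${args.callId}`;
    case "end_call":
      return `end_call for ${args.callId}`;
    case "list_calls":
      return "list_calls";
  }
}
```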
DTMF / IVR Navigation
The AI automatically handles automated phone menus (IVR systems) during calls. When it hears a prompt such as "press 1 for sales", it uses an internal `send_dtmf` tool to send touch-tone digits through the audio stream. This is fully automatic: no extra configuration or agent intervention is needed.
- Supported characters: `0-9`, `*`, `#`, `A-D`, `w` (500 ms pause)
- Example sequences: `1` (press 1), `1234567890#` (enter an account number followed by pound), `1w123#` (press 1, wait, then enter 123#)
- How it works: DTMF tones are generated as ITU-standard dual-frequency pairs, encoded to µ-law (8 kHz mono), and injected directly into the Twilio media stream. No external dependencies are required.
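The generation path described in the list above can be sketched in TypeScript. This is an illustration under stated assumptions, not SuperCall's implementation: the keypad frequencies are the standard ITU-T Q.23 pairs and the encoder is the textbook G.711 µ-law algorithm, but the 100 ms tone length, 50 ms inter-digit gap, and amplitude are assumed values; only the 500 ms `w` pause comes from this README. `frequenciesFor`, `linearToMulaw`, and `dtmfToMulaw` are hypothetical names.

```typescript
// Sketch: DTMF digits -> 8 kHz mono G.711 µ-law bytes.
const SAMPLE_RATE = 8000; // 8 kHz mono, as used by Twilio media streams
const TONE_MS = 100;      // assumed length of each digit's tone
const GAP_MS = 50;        // assumed silence after each digit
const PAUSE_MS = 500;     // "w" pause length documented in this README
const AMPLITUDE = 0.4;    // assumed per-tone level; the two summed tones stay below full scale

// Rows pick the low-group frequency, columns the high-group frequency (ITU-T Q.23).
const LOW_HZ = [697, 770, 852, 941];
const HIGH_HZ = [1209, 1336, 1477, 1633];
const KEYPAD = ["123A", "456B", "789C", "*0#D"];

function frequenciesFor(key: string): [number, number] {
  for (let row = 0; row < KEYPAD.length; row++) {
    const col = KEYPAD[row].indexOf(key);
    if (col !== -1) return [LOW_HZ[row], HIGH_HZ[col]];
  }
  throw new Error(`Unsupported DTMF character: ${key}`);
}

// Textbook G.711 µ-law encoder for one signed 16-bit PCM sample.
function linearToMulaw(pcm: number): number {
  const BIAS = 0x84;
  const CLIP = 32635;
  const sign = (pcm >> 8) & 0x80; // 0x80 for negative samples
  let magnitude = sign !== 0 ? -pcm : pcm;
  if (magnitude > CLIP) magnitude = CLIP;
  magnitude += BIAS;
  // Exponent = position of the highest set bit among bits 14..7.
  let exponent = 7;
  for (let mask = 0x4000; (magnitude & mask) === 0 && exponent > 0; mask >>= 1) {
    exponent--;
  }
  const mantissa = (magnitude >> (exponent + 3)) & 0x0f;
  return ~(sign | (exponent << 4) | mantissa) & 0xff;
}

function dtmfToMulaw(sequence: string): Uint8Array {
  const bytes: number[] = [];
  const samplesPerMs = SAMPLE_RATE / 1000;
  const silence = (ms: number): void => {
    for (let i = 0; i < ms * samplesPerMs; i++) bytes.push(0xff); // µ-law encoding of 0
  };
  for (let k = 0; k < sequence.length; k++) {
    const key = sequence[k];
    if (key === "w") {
      silence(PAUSE_MS);
      continue;
    }
    const [low, high] = frequenciesFor(key);
    for (let i = 0; i < TONE_MS * samplesPerMs; i++) {
      const t = i / SAMPLE_RATE;
      const level = AMPLITUDE * (Math.sin(2 * Math.PI * low * t) + Math.sin(2 * Math.PI * high * t));
      bytes.push(linearToMulaw(Math.round(level * 32767)));
    }
    silence(GAP_MS);
  }
  return new Uint8Array(bytes);
}
```

Summing two sines at 0.4 of full scale keeps the peak below clipping, and 0xFF is µ-law's encoding of silence, which is why gaps and `w` pauses are filled with it.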
This means persona calls can navigate phone trees end to end, e.g. "call the pharmacy, navigate through their menu, and check on my prescription status."
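Injecting audio into the Twilio media stream amounts to sending Media Streams `media` messages over the call's WebSocket, each carrying base64-encoded 8 kHz µ-law in `media.payload` together with the call's `streamSid`. A minimal framing sketch, given a buffer of µ-law bytes such as encoded DTMF tones: the 160-byte (20 ms) frame size is an assumption rather than documented SuperCall behavior, and `toMediaMessages` / `toBase64` are hypothetical names. In Node you would normally use `Buffer.from(bytes).toString("base64")`; a small encoder is inlined here only to keep the example dependency-free.

```typescript
const FRAME_BYTES = 160; // assumed chunk size: 20 ms of 8 kHz µ-law

interface TwilioMediaMessage {
  event: "media";
  streamSid: string;
  media: { payload: string }; // base64 audio/x-mulaw, 8000 Hz
}

const B64 = "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/";

// Standard base64 (RFC 4648) with "=" padding.
function toBase64(bytes: Uint8Array): string {
  let out = "";
  for (let i = 0; i < bytes.length; i += 3) {
    const b1 = i + 1 < bytes.length ? bytes[i + 1] : 0;
    const b2 = i + 2 < bytes.length ? bytes[i + 2] : 0;
    const n = (bytes[i] << 16) | (b1 << 8) | b2;
    out += B64[(n >> 18) & 63] + B64[(n >> 12) & 63];
    out += i + 1 < bytes.length ? B64[(n >> 6) & 63] : "=";
    out += i + 2 < bytes.length ? B64[n & 63] : "=";
  }
  return out;
}

// Split µ-law audio into JSON strings ready to send on the Media Streams WebSocket.
function toMediaMessages(streamSid: string, mulaw: Uint8Array): string[] {
  const messages: string[] = [];
  for (let offset = 0; offset < mulaw.length; offset += FRAME_BYTES) {
    const frame = mulaw.subarray(offset, offset + FRAME_BYTES);
    const msg: TwilioMediaMessage = { event: "media", streamSid, media: { payload: toBase64(frame) } };
    messages.push(JSON.stringify(msg));
  }
  return messages;
}
```

Each returned string would then be passed to the WebSocket's `send` method for the active stream.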
Configuration Options
| Option | Description | Default |
|---|---|---|
| `provider` | Voice provider (`twilio` or `mock`) | Required |
| `fromNumber` | Caller ID (E.164 format) | Required for real providers |
| `toNumber` | Default recipient number | - |
| `twilio.accountSid` | Twilio Account SID | `TWILIO_ACCOUNT_SID` env |
| `twilio.authToken` | Twilio Auth Token | `TWILIO_AUTH_TOKEN` env |
| `streaming.openaiApiKey` | OpenAI API key for realtime | `OPENAI_API_KEY` env |
| `streaming.silenceDurationMs` | VAD silence duration (ms) | `800` |
| `streaming.vadThreshold` | VAD threshold, 0-1 (higher = less sensitive) | `0.5` |
| `streaming.streamPath` | WebSocket path for the media stream | `/voice/stream` |
| `tunnel.provider` | Tunnel for webhooks (`ngrok`, `tailscale-serve`, `tailscale-funnel`) | `none` |
| `tunnel.ngrokDomain` | Fixed ngrok domain (recommended for production) | - |
| `tunnel.ngrokAuthToken` | ngrok auth token | `NGROK_AUTHTOKEN` env |
Full realtime requires an OpenAI API key.
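The defaults and constraints in the options table can be expressed as a small normalization step. This sketch is not SuperCall's code: `SupercallConfig` and `normalizeConfig` are hypothetical, only a subset of options is modeled, and the E.164 check is a common regex approximation (a `+`, a non-zero first digit, at most 15 digits).

```typescript
// Hypothetical config shape mirroring part of the options table.
interface SupercallConfig {
  provider: "twilio" | "mock";
  fromNumber?: string;
  toNumber?: string;
  streaming?: { silenceDurationMs?: number; vadThreshold?: number; streamPath?: string };
}

// Approximate E.164: "+", non-zero leading digit, 2 to 15 digits in total.
const E164 = /^\+[1-9]\d{1,14}$/;

function normalizeConfig(cfg: SupercallConfig) {
  // Caller ID is only required for real providers, i.e. anything but "mock".
  if (cfg.provider !== "mock") {
    if (!cfg.fromNumber || !E164.test(cfg.fromNumber)) {
      throw new Error("fromNumber must be an E.164 number such as +15551234567");
    }
  }
  const s: NonNullable<SupercallConfig["streaming"]> = cfg.streaming ?? {};
  const vadThreshold = s.vadThreshold ?? 0.5;
  if (vadThreshold < 0 || vadThreshold > 1) {
    throw new Error("vadThreshold must be between 0 and 1");
  }
  // Fill in the documented defaults for anything left unset.
  return {
    ...cfg,
    streaming: {
      silenceDurationMs: s.silenceDurationMs ?? 800,
      vadThreshold,
      streamPath: s.streamPath ?? "/voice/stream",
    },
  };
}
```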
Requirements
- Node.js 20+
- Twilio account for full realtime calls (media streams)
- ngrok or Tailscale for webhook tunneling (production)
- OpenAI API key for real-time features
Architecture
This is a fully standalone skill: it does not depend on the built-in voice-call plugin. All voice calling logic is self-contained.
Runtime Behavior and Security
This plugin is not instruction-only. It runs code, spawns processes, opens network listeners, and writes to disk. The following describes exactly what happens at runtime.
Process spawning
When tunnel.provider is set to ngrok, the plugin spawns the ngrok CLI binary via child_process.spawn. When set to tailscale-serve or tailscale-funnel, it spawns the tailscale CLI instead. These processes run for the lifetime of the plugin and are terminated on shutdown. If tunnel.provider is none (or a publicUrl is provided directly), no external processes are spawned.
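The provider-to-process mapping can be sketched as a pure function that decides what, if anything, to spawn. This is an assumption-laden illustration: SuperCall's exact CLI arguments are not documented here, so `ngrok http <port> --domain=<domain>`, `tailscale serve <port>`, and `tailscale funnel <port>` are plausible invocations rather than confirmed ones, and `tunnelCommand` is a hypothetical name.

```typescript
// Hypothetical mapping from tunnel.provider to the CLI a plugin might spawn.
type TunnelProvider = "ngrok" | "tailscale-serve" | "tailscale-funnel" | "none";

interface TunnelCommand {
  cmd: string;
  args: string[];
}

function tunnelCommand(
  provider: TunnelProvider,
  port: number,
  opts: { publicUrl?: string; ngrokDomain?: string } = {},
): TunnelCommand | null {
  // A directly supplied publicUrl means no external process is needed.
  if (opts.publicUrl) return null;
  switch (provider) {
    case "none":
      return null;
    case "ngrok": {
      const args = ["http", String(port)];
      if (opts.ngrokDomain) args.push(`--domain=${opts.ngrokDomain}`); // assumed flag usage
      return { cmd: "ngrok", args };
    }
    case "tailscale-serve":
      return { cmd: "tailscale", args: ["serve", String(port)] };
    case "tailscale-funnel":
      return { cmd: "tailscale", args: ["funnel", String(port)] };
  }
}
```

A caller could then start the process with `spawn(c.cmd, c.args, { stdio: "ignore" })` from `node:child_process` and call `kill()` on the returned child during shutdown, matching the lifecycle described above. Keeping the decision pure makes the "no process when `none` or `publicUrl`" rule easy to test.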
Network activity
- Local webhook server: The plugin opens an HTTP server (default `0.0.0.0:3335`) to receive Twilio webhook callbacks and WebSocket media streams.
- Startup self-test: On startup, the plugin sends an HTTP POST to its own public webhook URL with an `x-supercall-self-test` header to verify connectivity. If `publicUrl` is misconfigured to point at an unintended endpoint, the self-test token could be sent there. Always verify your `publicUrl` or tunnel configuration before starting.
- Outbound API calls: The plugin makes outbound requests to the OpenAI Realtime API (WebSocket) and the Twilio REST API during calls.
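The self-test token is only as private as the URL it is posted to, which is why the `publicUrl` check matters. A minimal sketch of a per-process token and its constant-time check, using Node's `node:crypto`; the 24-byte hex token (equivalent in strength to `openssl rand -hex 24`) and the helper names `selfTestHeaders` / `isSelfTest` are assumptions, not SuperCall's actual code.

```typescript
import { randomBytes, timingSafeEqual } from "node:crypto";

// Hypothetical per-process token, regenerated on every startup.
const SELF_TEST_HEADER = "x-supercall-self-test";
const selfTestToken = randomBytes(24).toString("hex"); // 48 hex characters

// Headers attached to the startup POST sent to the public webhook URL.
function selfTestHeaders(): Record<string, string> {
  return { [SELF_TEST_HEADER]: selfTestToken };
}

// Server side: Node lowercases incoming header names, so a lowercase lookup works.
function isSelfTest(headers: Record<string, string | undefined>): boolean {
  const received = headers[SELF_TEST_HEADER];
  if (received === undefined) return false;
  const a = Buffer.from(received);
  const b = Buffer.from(selfTestToken);
  // timingSafeEqual throws on length mismatch, so compare lengths first.
  return a.length === b.length && timingSafeEqual(a, b);
}
```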
Webhook verification
- Twilio calls: Verified using Twilio's `X-Twilio-Signature` header (HMAC-SHA1).
- Self-test requests: Authenticated with an internal token that is generated at startup and sent in the `x-supercall-self-test` header.
- ngrok free-tier relaxation: On free-tier ngrok domains (`.ngrok-free.app`, `.ngrok.io`), the reconstructed request URL may differ from the one Twilio signed because ngrok rewrites requests, so Twilio signature mismatches are logged but allowed through. Paid/custom ngrok domains (`.ngrok.app`) are verified strictly. This relaxation applies only to free-tier domains and does not affect Tailscale or direct `publicUrl` configurations.
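Twilio's documented signing scheme is: take the full request URL, append every POST parameter sorted by name as name followed by value with no separators, compute HMAC-SHA1 keyed with the auth token, and base64-encode the result. The sketch below implements that scheme plus the free-tier ngrok relaxation described above. The function names are hypothetical and this is not SuperCall's source.

```typescript
import { createHmac, timingSafeEqual } from "node:crypto";

// Expected X-Twilio-Signature for a POST webhook.
function twilioSignature(authToken: string, url: string, params: Record<string, string>): string {
  const data = Object.keys(params)
    .sort()
    .reduce((acc, key) => acc + key + params[key], url);
  return createHmac("sha1", authToken).update(data, "utf8").digest("base64");
}

// Mirrors the README's rule: relaxed only on free-tier ngrok hosts.
function isFreeTierNgrok(host: string): boolean {
  return host.endsWith(".ngrok-free.app") || host.endsWith(".ngrok.io");
}

function verifyTwilioRequest(
  authToken: string,
  url: string,
  params: Record<string, string>,
  signatureHeader: string | undefined,
): boolean {
  const expected = Buffer.from(twilioSignature(authToken, url, params));
  const received = Buffer.from(signatureHeader ?? "");
  // Constant-time comparison; lengths must match before timingSafeEqual.
  if (expected.length === received.length && timingSafeEqual(expected, received)) return true;
  const host = new URL(url).hostname;
  if (isFreeTierNgrok(host)) {
    console.warn(`Twilio signature mismatch on free-tier ngrok host ${host}; allowing`);
    return true;
  }
  return false;
}
```

Because `.ngrok.app` is not a suffix of `.ngrok-free.app`, paid domains fall through to strict rejection.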
Data at rest
Call transcripts are persisted to ~/clawd/supercall-logs. These logs may contain sensitive conversation content. Review and rotate logs periodically.
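Periodic rotation can be as simple as selecting transcripts older than a retention window. A minimal sketch, assuming file modification time is a good enough proxy for call date; `staleLogs` is a hypothetical helper, and the retention window is whatever policy you choose. It is kept pure so it can be tested without touching disk.

```typescript
// Hypothetical helper: pick log files whose last modification is older than the window.
interface LogEntry {
  name: string;
  mtimeMs: number; // last-modified time in epoch milliseconds
}

const DAY_MS = 24 * 60 * 60 * 1000;

function staleLogs(entries: LogEntry[], nowMs: number, maxAgeDays: number): string[] {
  const cutoff = nowMs - maxAgeDays * DAY_MS;
  return entries.filter((e) => e.mtimeMs < cutoff).map((e) => e.name);
}
```

In Node, the entries could come from `fs.readdirSync` on `path.join(os.homedir(), "clawd", "supercall-logs")` combined with `fs.statSync(file).mtimeMs`; whether matching files are archived or deleted is a local policy decision.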
Best practices
- Protect your credentials: Twilio and OpenAI keys grant access to paid services.
- Verify your public URL: before starting, ensure `publicUrl` or the tunnel config points where you expect.
- Rotate `hooks.token` periodically, and immediately if you suspect compromise.
- Review call logs: transcripts stored on disk may contain sensitive content.