返回顶部
a

azure-ai-voicelive-pyAzure语音AI实时构建

Build real-time voice AI applications using Azure AI Voice Live SDK (azure-ai-voicelive). Use this skill when creating Python applications that need real-time bidirectional audio communication with Azure AI, including voice assistants, voice-enabled chatbots, real-time speech-to-speech translation, voice-driven avatars, or any WebSocket-based audio streaming with AI models. Supports Server VAD (Voice Activity Detection), turn-based conversation, function calling, MCP tools, avatar integration, a

作者: admin | 来源: ClawHub
源自
ClawHub
版本
V 0.1.0
安全检测
已通过
2,085
下载量
免费
免费
2
收藏
概述
安装方式
版本历史

azure-ai-voicelive-py

Azure AI 语音实时 SDK

通过双向WebSocket通信构建实时语音AI应用程序。

安装

bash
pip install azure-ai-voicelive aiohttp azure-identity

环境变量

bash
AZURECOGNITIVESERVICES_ENDPOINT=https://<区域>.api.cognitive.microsoft.com

用于API密钥认证(生产环境不推荐)


AZURECOGNITIVESERVICES_KEY=

身份认证

DefaultAzureCredential(推荐)
python
from azure.ai.voicelive.aio import connect
from azure.identity.aio import DefaultAzureCredential

async with connect(
endpoint=os.environ[AZURECOGNITIVESERVICES_ENDPOINT],
credential=DefaultAzureCredential(),
model=gpt-4o-realtime-preview,
credential_scopes=[https://cognitiveservices.azure.com/.default]
) as conn:
...

API密钥
python
from azure.ai.voicelive.aio import connect
from azure.core.credentials import AzureKeyCredential

async with connect(
endpoint=os.environ[AZURECOGNITIVESERVICES_ENDPOINT],
credential=AzureKeyCredential(os.environ[AZURECOGNITIVESERVICES_KEY]),
model=gpt-4o-realtime-preview
) as conn:
...

快速开始

python
import asyncio
import os
from azure.ai.voicelive.aio import connect
from azure.identity.aio import DefaultAzureCredential

async def main():
async with connect(
endpoint=os.environ[AZURECOGNITIVESERVICES_ENDPOINT],
credential=DefaultAzureCredential(),
model=gpt-4o-realtime-preview,
credential_scopes=[https://cognitiveservices.azure.com/.default]
) as conn:
# 更新会话指令
await conn.session.update(session={
instructions: 你是一个有用的助手。,
modalities: [text, audio],
voice: alloy
})

# 监听事件
async for event in conn:
print(f事件:{event.type})
if event.type == response.audio_transcript.done:
print(f转录:{event.transcript})
elif event.type == response.done:
break

asyncio.run(main())

核心架构

连接资源

VoiceLiveConnection 公开以下资源:

资源用途关键方法
conn.session会话配置update(session=...)
conn.response
模型响应 | create(), cancel() |
| conn.inputaudiobuffer | 音频输入 | append(), commit(), clear() |
| conn.outputaudiobuffer | 音频输出 | clear() |
| conn.conversation | 对话状态 | item.create(), item.delete(), item.truncate() |
| conn.transcription_session | 转录配置 | update(session=...) |

会话配置

python
from azure.ai.voicelive.models import RequestSession, FunctionTool

await conn.session.update(session=RequestSession(
instructions=你是一个有用的语音助手。,
modalities=[text, audio],
voice=alloy, # 或 echo, shimmer, sage 等
inputaudioformat=pcm16,
outputaudioformat=pcm16,
turn_detection={
type: server_vad,
threshold: 0.5,
prefixpaddingms: 300,
silencedurationms: 500
},
tools=[
FunctionTool(
type=function,
name=get_weather,
description=获取当前天气,
parameters={
type: object,
properties: {
location: {type: string}
},
required: [location]
}
)
]
))

音频流

发送音频(Base64 PCM16)

python
import base64

读取音频块(16位PCM,24kHz单声道)

audiochunk = await readaudiofrommicrophone() b64audio = base64.b64encode(audiochunk).decode()

await conn.inputaudiobuffer.append(audio=b64_audio)

接收音频

python
async for event in conn:
if event.type == response.audio.delta:
audio_bytes = base64.b64decode(event.delta)
await playaudio(audiobytes)
elif event.type == response.audio.done:
print(音频完成)

事件处理

python
async for event in conn:
match event.type:
# 会话事件
case session.created:
print(f会话:{event.session})
case session.updated:
print(会话已更新)

# 音频输入事件
case inputaudiobuffer.speech_started:
print(f语音在{event.audiostartms}毫秒开始)
case inputaudiobuffer.speech_stopped:
print(f语音在{event.audioendms}毫秒停止)

# 转录事件
case conversation.item.inputaudiotranscription.completed:
print(f用户说:{event.transcript})
case conversation.item.inputaudiotranscription.delta:
print(f部分:{event.delta})

# 响应事件
case response.created:
print(f响应开始:{event.response.id})
case response.audio_transcript.delta:
print(event.delta, end=, flush=True)
case response.audio.delta:
audio = base64.b64decode(event.delta)
case response.done:
print(f响应完成:{event.response.status})

# 函数调用
case response.functioncallarguments.done:
result = handle_function(event.name, event.arguments)
await conn.conversation.item.create(item={
type: functioncalloutput,
callid: event.callid,
output: json.dumps(result)
})
await conn.response.create()

# 错误
case error:
print(f错误:{event.error.message})

常见模式

手动轮次模式(无VAD)

python
await conn.session.update(session={turn_detection: None})

手动控制轮次

await conn.inputaudiobuffer.append(audio=b64_audio) await conn.inputaudiobuffer.commit() # 用户轮次结束 await conn.response.create() # 触发响应

中断处理

python
async for event in conn:
if event.type == inputaudiobuffer.speech_started:
# 用户中断 - 取消当前响应
await conn.response.cancel()
await conn.outputaudiobuffer.clear()

对话历史

python

添加系统消息


await conn.conversation.item.create(item={
type: message,
role: system,
content: [{type: input_text, text: 请简洁回答。}]
})

添加用户消息

await conn.conversation.item.create(item={ type: message, role: user, content: [{type: input_text, text: 你好!}] })

await conn.response.create()

语音选项

语音描述
alloy中性,平衡
echo
温暖,对话式 | | shimmer | 清晰,专业 | | sage | 冷静,权威 | | coral | 友好,积极 | | ash | 深沉,稳重 | | ballad | 富有表现力 | | verse | 叙事风格 |

Azure语音:使用 AzureStandardVoice、AzureCustomVoice 或 AzurePersonalVoice 模型。

音频格式

格式采样率使用场景
pcm1624kHz默认,高质量
pcm16-8000hz
8kHz | 电话通信 | | pcm16-16000hz | 16kHz | 语音助手 | | g711_ulaw | 8kHz | 电话通信(美国) | | g711_alaw | 8kHz | 电话通信(欧洲) |

轮次检测选项

python

服务器端VAD(默认)


{type: servervad, threshold: 0.5, silenceduration_ms: 500}

Azure语义VAD(更智能的检测)

{

标签

skill ai

通过对话安装

该技能支持在以下平台通过对话安装:

OpenClaw WorkBuddy QClaw Kimi Claude

方式一:安装 SkillHub 和技能

帮我安装 SkillHub 和 azure-ai-voicelive-py-1776376203 技能

方式二:设置 SkillHub 为优先技能安装源

设置 SkillHub 为我的优先技能安装源,然后帮我安装 azure-ai-voicelive-py-1776376203 技能

通过命令行安装

skillhub install azure-ai-voicelive-py-1776376203

下载

⬇ 下载 azure-ai-voicelive-py v0.1.0(免费)

文件大小: 13.51 KB | 发布时间: 2026-4-17 14:29

v0.1.0 最新 2026-4-17 14:29
Initial release – enables building real-time voice AI apps using Azure AI Voice Live SDK for Python.

- Real-time bidirectional WebSocket audio streaming with Azure AI models.
- Supports Server VAD, turn-based conversation, function calls, tools, transcription, and avatar integration.
- Easy authentication via `DefaultAzureCredential` or API key.
- Provides structured resources for session, response, audio buffers, and conversation state.
- Includes example snippets for session config, event handling, audio streaming, and interrupt management.
- Supports voice selection, multiple audio formats, and manual or VAD turn-taking.

Archiver·手机版·闲社网·闲社论坛·羊毛社区· 多链控股集团有限公司 · 苏ICP备2025199260号-1

Powered by Discuz! X5.0   © 2024-2025 闲社网·线报更新论坛·羊毛分享社区·http://xianshe.com

p2p_official_large
返回顶部