Build real-time voice AI applications using Azure AI Voice Live SDK (azure-ai-voicelive). Use this skill when creating Python applications that need real-time bidirectional audio communication with Azure AI, including voice assistants, voice-enabled chatbots, real-time speech-to-speech translation, voice-driven avatars, or any WebSocket-based audio streaming with AI models. Supports Server VAD (Voice Activity Detection), turn-based conversation, function calling, MCP tools, avatar integration, a
Build real-time voice AI applications with bidirectional WebSocket communication.
```bash
pip install azure-ai-voicelive aiohttp azure-identity
```
```bash
AZURE_COGNITIVE_SERVICES_ENDPOINT=https://<region>.api.cognitive.microsoft.com
```
DefaultAzureCredential (recommended):
```python
import os

from azure.ai.voicelive.aio import connect
from azure.identity.aio import DefaultAzureCredential

# Run inside an async function
async with connect(
    endpoint=os.environ["AZURE_COGNITIVE_SERVICES_ENDPOINT"],
    credential=DefaultAzureCredential(),
    model="gpt-4o-realtime-preview",
    credential_scopes=["https://cognitiveservices.azure.com/.default"],
) as conn:
    ...
```
API key:
```python
import os

from azure.ai.voicelive.aio import connect
from azure.core.credentials import AzureKeyCredential

# Run inside an async function
async with connect(
    endpoint=os.environ["AZURE_COGNITIVE_SERVICES_ENDPOINT"],
    credential=AzureKeyCredential(os.environ["AZURE_COGNITIVE_SERVICES_KEY"]),
    model="gpt-4o-realtime-preview",
) as conn:
    ...
```
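Choosing between the two authentication paths can be driven by the environment. The following is a minimal sketch, assuming the variable names `AZURE_COGNITIVE_SERVICES_ENDPOINT` and `AZURE_COGNITIVE_SERVICES_KEY`; `VoiceLiveSettings`, `load_settings`, and the `auth_mode` labels are hypothetical helpers for illustration and are not part of the SDK.

```python
import os
from dataclasses import dataclass
from typing import Mapping, Optional


@dataclass(frozen=True)
class VoiceLiveSettings:
    """Hypothetical container for connection settings (not an SDK type)."""

    endpoint: str
    api_key: Optional[str] = None

    @property
    def auth_mode(self) -> str:
        # Use the key when one is configured, otherwise DefaultAzureCredential.
        return "api_key" if self.api_key else "default_credential"


def load_settings(env: Mapping[str, str] = os.environ) -> VoiceLiveSettings:
    """Read the endpoint (required) and API key (optional) from the environment."""
    endpoint = env.get("AZURE_COGNITIVE_SERVICES_ENDPOINT", "").strip()
    if not endpoint:
        raise RuntimeError("AZURE_COGNITIVE_SERVICES_ENDPOINT is not set")
    if not endpoint.startswith("https://"):
        raise ValueError(f"endpoint should be an https:// URL, got {endpoint!r}")
    key = env.get("AZURE_COGNITIVE_SERVICES_KEY", "").strip() or None
    return VoiceLiveSettings(endpoint=endpoint.rstrip("/"), api_key=key)
```

An application could call `load_settings()` once at startup and build either an `AzureKeyCredential` or a `DefaultAzureCredential` depending on `auth_mode`.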
```python
import asyncio
import os

from azure.ai.voicelive.aio import connect
from azure.identity.aio import DefaultAzureCredential


async def main():
    async with connect(
        endpoint=os.environ["AZURE_COGNITIVE_SERVICES_ENDPOINT"],
        credential=DefaultAzureCredential(),
        model="gpt-4o-realtime-preview",
        credential_scopes=["https://cognitiveservices.azure.com/.default"],
    ) as conn:
        # Update the session instructions
        await conn.session.update(session={
            "instructions": "You are a helpful assistant.",
            "modalities": ["text", "audio"],
            "voice": "alloy",
        })

        # Listen for events
        async for event in conn:
            print(f"Event: {event.type}")
            if event.type == "response.audio_transcript.done":
                print(f"Transcript: {event.transcript}")
            elif event.type == "response.done":
                break


asyncio.run(main())
```
VoiceLiveConnection exposes the following resources:
| Resource | Purpose | Key methods |
|---|---|---|
| conn.session | Session configuration | update(session=...) |
| conn.response |
```python
from azure.ai.voicelive.models import RequestSession, FunctionTool

await conn.session.update(session=RequestSession(
    instructions="You are a helpful voice assistant.",
    modalities=["text", "audio"],
    voice="alloy",  # or echo, shimmer, sage, etc.
    input_audio_format="pcm16",
    output_audio_format="pcm16",
    turn_detection={
        "type": "server_vad",
        "threshold": 0.5,
        "prefix_padding_ms": 300,
        "silence_duration_ms": 500,
    },
    tools=[
        FunctionTool(
            type="function",
            name="get_weather",
            description="Get the current weather",
            parameters={
                "type": "object",
                "properties": {
                    "location": {"type": "string"}
                },
                "required": ["location"],
            },
        )
    ],
))
```
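The `turn_detection` settings are plain key/value pairs, so they can be assembled and sanity-checked before being sent. The sketch below reuses the keys and default values from the configuration shown here; `server_vad_config` is a hypothetical helper, and the accepted ranges (a threshold between 0 and 1, non-negative durations) are assumptions rather than documented service limits.

```python
from typing import Any, Dict


def server_vad_config(
    threshold: float = 0.5,
    prefix_padding_ms: int = 300,
    silence_duration_ms: int = 500,
) -> Dict[str, Any]:
    """Build a server VAD `turn_detection` dict (hypothetical helper)."""
    # Assumed range: the threshold is treated as a value between 0 and 1.
    if not 0.0 <= threshold <= 1.0:
        raise ValueError("threshold is assumed to be between 0 and 1")
    if prefix_padding_ms < 0 or silence_duration_ms < 0:
        raise ValueError("durations must be non-negative milliseconds")
    return {
        "type": "server_vad",
        "threshold": threshold,
        "prefix_padding_ms": prefix_padding_ms,
        "silence_duration_ms": silence_duration_ms,
    }
```

The returned dict could be passed as the `turn_detection` value in a session update, for example with a longer `silence_duration_ms` for speakers who pause often.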
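```python
import base64

# audio_chunk is assumed to hold raw PCM16 bytes captured elsewhere
b64_audio = base64.b64encode(audio_chunk).decode("ascii")
await conn.input_audio_buffer.append(audio=b64_audio)
```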
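Audio is usually appended in small pieces rather than as one large buffer. The following sketch splits raw PCM16 bytes into base64-encoded chunks of a fixed duration; `pcm16_chunks_b64` is a hypothetical helper, and mono audio at 24 kHz with 100 ms chunks are assumed defaults, not SDK requirements.

```python
import base64
from typing import Iterator


def pcm16_chunks_b64(
    pcm: bytes, sample_rate: int = 24000, chunk_ms: int = 100
) -> Iterator[str]:
    """Yield base64 strings, each covering up to `chunk_ms` of mono PCM16 audio."""
    bytes_per_sample = 2  # 16-bit samples
    samples_per_chunk = sample_rate * chunk_ms // 1000
    if samples_per_chunk <= 0:
        raise ValueError("chunk_ms is too small for this sample rate")
    # Keep chunk boundaries aligned to whole samples.
    chunk_bytes = samples_per_chunk * bytes_per_sample
    for start in range(0, len(pcm), chunk_bytes):
        yield base64.b64encode(pcm[start:start + chunk_bytes]).decode("ascii")
```

Each yielded string could then be passed as the `audio` argument of the append call, one chunk per call.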
```python
import base64

async for event in conn:
    if event.type == "response.audio.delta":
        audio_bytes = base64.b64decode(event.delta)
        await play_audio(audio_bytes)  # play_audio is your own playback helper
    elif event.type == "response.audio.done":
        print("Audio complete")
```
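When live playback is not needed, the received deltas can be collected and saved for inspection. This is a minimal sketch using only the standard library; `deltas_to_wav` is a hypothetical helper, and it assumes the deltas are base64-encoded mono PCM16 at 24 kHz.

```python
import base64
import io
import wave
from typing import Iterable


def deltas_to_wav(deltas_b64: Iterable[str], sample_rate: int = 24000) -> bytes:
    """Decode base64 PCM16 deltas and wrap them in a WAV container."""
    pcm = b"".join(base64.b64decode(d) for d in deltas_b64)
    buf = io.BytesIO()
    with wave.open(buf, "wb") as wav:
        wav.setnchannels(1)  # mono is an assumption
        wav.setsampwidth(2)  # 16-bit samples
        wav.setframerate(sample_rate)
        wav.writeframes(pcm)
    return buf.getvalue()
```

Appending each `event.delta` to a list during the event loop and calling `deltas_to_wav` after the audio-done event would give bytes that can be written to a `.wav` file.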
```python
import base64
import json

async for event in conn:
    match event.type:
        # Session events
        case "session.created":
            print(f"Session: {event.session}")
        case "session.updated":
            print("Session updated")

        # Audio input events
        case "input_audio_buffer.speech_started":
            print(f"Speech started at {event.audio_start_ms} ms")
        case "input_audio_buffer.speech_stopped":
            print(f"Speech stopped at {event.audio_end_ms} ms")

        # Transcription events
        case "conversation.item.input_audio_transcription.completed":
            print(f"User said: {event.transcript}")
        case "conversation.item.input_audio_transcription.delta":
            print(f"Partial: {event.delta}")

        # Response events
        case "response.created":
            print(f"Response started: {event.response.id}")
        case "response.audio_transcript.delta":
            print(event.delta, end="", flush=True)
        case "response.audio.delta":
            audio = base64.b64decode(event.delta)
        case "response.done":
            print(f"Response complete: {event.response.status}")

        # Function calls (handle_function is your own dispatcher)
        case "response.function_call_arguments.done":
            result = handle_function(event.name, event.arguments)
            await conn.conversation.item.create(item={
                "type": "function_call_output",
                "call_id": event.call_id,
                "output": json.dumps(result),
            })
            await conn.response.create()

        # Errors
        case "error":
            print(f"Error: {event.error.message}")
```
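The function-call branch relies on a `handle_function` helper that the snippet does not define. One possible shape is sketched below, assuming the arguments arrive as a JSON string; the `TOOLS` registry, the `tool` decorator, and the placeholder `get_weather` body are all hypothetical and not part of the SDK.

```python
import json
from typing import Any, Callable, Dict

# Hypothetical registry: tool name -> Python callable taking keyword arguments.
TOOLS: Dict[str, Callable[..., Any]] = {}


def tool(fn: Callable[..., Any]) -> Callable[..., Any]:
    """Register a function under its own name."""
    TOOLS[fn.__name__] = fn
    return fn


@tool
def get_weather(location: str) -> Dict[str, Any]:
    # Placeholder result; a real tool would query a weather service.
    return {"location": location, "forecast": "unknown"}


def handle_function(name: str, arguments: str) -> Any:
    """Parse the JSON argument string and call the matching tool.

    Failures are returned as data so they can be sent back as the call output.
    """
    fn = TOOLS.get(name)
    if fn is None:
        return {"error": f"unknown function: {name}"}
    try:
        kwargs = json.loads(arguments or "{}")
        return fn(**kwargs)
    except (json.JSONDecodeError, TypeError) as exc:
        # TypeError covers missing, unexpected, or non-object arguments.
        return {"error": str(exc)}
```

Returning errors as ordinary values, rather than raising, keeps the event loop running and lets the model see what went wrong.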
```python
await conn.session.update(session={"turn_detection": None})
```
```python
async for event in conn:
    if event.type == "input_audio_buffer.speech_started":
        # User interrupted: cancel the current response
        await conn.response.cancel()
        await conn.output_audio_buffer.clear()
```
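Cancelling the response stops new audio from arriving, but chunks already queued for local playback would keep playing unless they are dropped as well. The sketch below shows one way to hold decoded audio so it can be discarded on interruption; `PlaybackQueue` is a hypothetical client-side helper and makes no assumption about the audio backend.

```python
import asyncio


class PlaybackQueue:
    """Holds decoded audio chunks waiting to be played locally."""

    def __init__(self) -> None:
        self._queue: "asyncio.Queue[bytes]" = asyncio.Queue()

    def put(self, chunk: bytes) -> None:
        """Queue a chunk for playback without blocking."""
        self._queue.put_nowait(chunk)

    async def get(self) -> bytes:
        """Wait for the next chunk to play."""
        return await self._queue.get()

    def clear(self) -> int:
        """Drop everything still queued; return how many chunks were discarded."""
        dropped = 0
        while True:
            try:
                self._queue.get_nowait()
            except asyncio.QueueEmpty:
                return dropped
            dropped += 1
```

In the interruption branch, calling `clear()` on such a queue alongside the cancel call would stop stale audio from being played after the user starts speaking.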
```python
await conn.response.create()
```
| Voice | Description |
|---|---|
| alloy | Neutral, balanced |
| echo |
Azure voices: use the AzureStandardVoice, AzureCustomVoice, or AzurePersonalVoice models.
| Format | Sample rate | Use case |
|---|---|---|
| pcm16 | 24 kHz | Default, high quality |
| pcm16-8000hz |
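This skill can be installed through conversation on the following platforms:
Help me install SkillHub and the azure-ai-voicelive-py-1776376203 skill
Set SkillHub as my preferred skill installation source, then help me install the azure-ai-voicelive-py-1776376203 skill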
skillhub install azure-ai-voicelive-py-1776376203
File size: 13.51 KB | Published: 2026-4-17 14:29