Azure AI Voice Live SDK

Build real-time voice AI applications with bidirectional WebSocket communication.

Installation

CODEBLOCK0

Environment Variables

CODEBLOCK1

Authentication

DefaultAzureCredential (preferred):
CODEBLOCK2

API Key:
CODEBLOCK3

Quick Start

CODEBLOCK4

Core Architecture

Connection Resources

The VoiceLiveConnection exposes these resources:

Resource	Purpose	Key Methods
INLINECODE1	Session configuration	INLINECODE2
INLINECODE3

Session Configuration

CODEBLOCK5

Audio Streaming

Send Audio (Base64 PCM16)

CODEBLOCK6

Receive Audio

CODEBLOCK7

Event Handling

CODEBLOCK8

Common Patterns

Manual Turn Mode (No VAD)

CODEBLOCK9

Interrupt Handling

CODEBLOCK10

Conversation History

CODEBLOCK11

Voice Options

Voice	Description
INLINECODE18	Neutral, balanced
INLINECODE19

Azure voices: Use AzureStandardVoice, AzureCustomVoice, or AzurePersonalVoice models.

Audio Formats

Format	Sample Rate	Use Case
INLINECODE29	24kHz	Default, high quality
INLINECODE30

Turn Detection Options

CODEBLOCK12

Error Handling

CODEBLOCK13

References

- Detailed API Reference: See references/api-reference.md
Complete Examples: See references/examples.md
All Models & Types: See references/models.md

Azure AI 语音实时 SDK

通过双向WebSocket通信构建实时语音AI应用程序。

安装

bash
pip install azure-ai-voicelive aiohttp azure-identity

环境变量

bash
AZURECOGNITIVESERVICES_ENDPOINT=https://<区域>.api.cognitive.microsoft.com

用于API密钥认证（生产环境不推荐）

AZURECOGNITIVESERVICES_KEY=

身份认证

DefaultAzureCredential（推荐）：
python
from azure.ai.voicelive.aio import connect
from azure.identity.aio import DefaultAzureCredential

async with connect(
endpoint=os.environ[AZURECOGNITIVESERVICES_ENDPOINT],
credential=DefaultAzureCredential(),
model=gpt-4o-realtime-preview,
credential_scopes=[https://cognitiveservices.azure.com/.default]
) as conn:
...

API密钥：
python
from azure.ai.voicelive.aio import connect
from azure.core.credentials import AzureKeyCredential

async with connect(
endpoint=os.environ[AZURECOGNITIVESERVICES_ENDPOINT],
credential=AzureKeyCredential(os.environ[AZURECOGNITIVESERVICES_KEY]),
model=gpt-4o-realtime-preview
) as conn:
...

快速开始

python
import asyncio
import os
from azure.ai.voicelive.aio import connect
from azure.identity.aio import DefaultAzureCredential

async def main():
async with connect(
endpoint=os.environ[AZURECOGNITIVESERVICES_ENDPOINT],
credential=DefaultAzureCredential(),
model=gpt-4o-realtime-preview,
credential_scopes=[https://cognitiveservices.azure.com/.default]
) as conn:
# 更新会话指令
await conn.session.update(session={
instructions: 你是一个有用的助手。,
modalities: [text, audio],
voice: alloy
})

# 监听事件
async for event in conn:
print(f事件：{event.type})
if event.type == response.audio_transcript.done:
print(f转录：{event.transcript})
elif event.type == response.done:
break

asyncio.run(main())

核心架构

连接资源

VoiceLiveConnection 公开以下资源：

资源	用途	关键方法
conn.session	会话配置	update(session=...)
conn.response

会话配置

python
from azure.ai.voicelive.models import RequestSession, FunctionTool

await conn.session.update(session=RequestSession(
instructions=你是一个有用的语音助手。,
modalities=[text, audio],
voice=alloy, # 或 echo, shimmer, sage 等
inputaudioformat=pcm16,
outputaudioformat=pcm16,
turn_detection={
type: server_vad,
threshold: 0.5,
prefixpaddingms: 300,
silencedurationms: 500
},
tools=[
FunctionTool(
type=function,
name=get_weather,
description=获取当前天气,
parameters={
type: object,
properties: {
location: {type: string}
},
required: [location]
}
)
]
))

音频流

发送音频（Base64 PCM16）

python
import base64

读取音频块（16位PCM，24kHz单声道）

audiochunk = await readaudiofrommicrophone() b64audio = base64.b64encode(audiochunk).decode()

await conn.inputaudiobuffer.append(audio=b64_audio)

接收音频

python
async for event in conn:
if event.type == response.audio.delta:
audio_bytes = base64.b64decode(event.delta)
await playaudio(audiobytes)
elif event.type == response.audio.done:
print(音频完成)

事件处理

python
async for event in conn:
match event.type:
# 会话事件
case session.created:
print(f会话：{event.session})
case session.updated:
print(会话已更新)

# 音频输入事件
case inputaudiobuffer.speech_started:
print(f语音在{event.audiostartms}毫秒开始)
case inputaudiobuffer.speech_stopped:
print(f语音在{event.audioendms}毫秒停止)

# 转录事件
case conversation.item.inputaudiotranscription.completed:
print(f用户说：{event.transcript})
case conversation.item.inputaudiotranscription.delta:
print(f部分：{event.delta})

# 响应事件
case response.created:
print(f响应开始：{event.response.id})
case response.audio_transcript.delta:
print(event.delta, end=, flush=True)
case response.audio.delta:
audio = base64.b64decode(event.delta)
case response.done:
print(f响应完成：{event.response.status})

# 函数调用
case response.functioncallarguments.done:
result = handle_function(event.name, event.arguments)
await conn.conversation.item.create(item={
type: functioncalloutput,
callid: event.callid,
output: json.dumps(result)
})
await conn.response.create()

# 错误
case error:
print(f错误：{event.error.message})

常见模式

手动轮次模式（无VAD）

python
await conn.session.update(session={turn_detection: None})

手动控制轮次

await conn.inputaudiobuffer.append(audio=b64_audio) await conn.inputaudiobuffer.commit() # 用户轮次结束 await conn.response.create() # 触发响应

中断处理

python
async for event in conn:
if event.type == inputaudiobuffer.speech_started:
# 用户中断 - 取消当前响应
await conn.response.cancel()
await conn.outputaudiobuffer.clear()

对话历史

python

添加系统消息

await conn.conversation.item.create(item={
type: message,
role: system,
content: [{type: input_text, text: 请简洁回答。}]
})

添加用户消息

await conn.conversation.item.create(item={ type: message, role: user, content: [{type: input_text, text: 你好！}] })

await conn.response.create()

语音选项

语音	描述
alloy	中性，平衡
echo

Azure语音：使用 AzureStandardVoice、AzureCustomVoice 或 AzurePersonalVoice 模型。

音频格式

格式	采样率	使用场景
pcm16	24kHz	默认，高质量
pcm16-8000hz

轮次检测选项

python

服务器端VAD（默认）

{type: servervad, threshold: 0.5, silenceduration_ms: 500}

Azure语义VAD（更智能的检测）

{

azure-ai-voicelive-pyAzure语音AI实时构建

azure-ai-voicelive-py

Azure AI Voice Live SDK

Installation

Environment Variables

Authentication

Quick Start

Core Architecture

Connection Resources

Session Configuration

Audio Streaming

Send Audio (Base64 PCM16)

Receive Audio

Event Handling

Common Patterns

Manual Turn Mode (No VAD)

Interrupt Handling

Conversation History

Voice Options

Audio Formats

Turn Detection Options

Error Handling

References

Azure AI 语音实时 SDK

安装

环境变量

用于API密钥认证（生产环境不推荐）

身份认证

快速开始

核心架构

连接资源

会话配置

音频流

发送音频（Base64 PCM16）

读取音频块（16位PCM，24kHz单声道）

接收音频

事件处理

常见模式

手动轮次模式（无VAD）

手动控制轮次

中断处理

对话历史

添加系统消息

添加用户消息

语音选项

音频格式

轮次检测选项

服务器端VAD（默认）

Azure语义VAD（更智能的检测）

标签

通过对话安装

方式一：安装 SkillHub 和技能

方式二：设置 SkillHub 为优先技能安装源

通过命令行安装

下载

相关推荐

self-improvement

self-improvement

self-improvement

self-improvement