57 changes: 55 additions & 2 deletions sdk/voicelive/azure-ai-voicelive/CHANGELOG.md
@@ -1,15 +1,68 @@
# Release History

## 1.2.0b6 (Unreleased)
## 1.2.0 (Unreleased)

### Features Added

- **Web Search & File Search**: Added support for built-in web search and file search tools:
- New item types: `ResponseWebSearchCallItem`, `ResponseFileSearchCallItem`
- New server events for web/file search lifecycle (`searching`, `in_progress`, `completed`)
- New models: `ActionFind`, `ActionOpenPage`, `ActionSearch`, `ActionSearchSource`, `FileSearchResult`
- New enum values: `ItemType.WEB_SEARCH_CALL`, `ItemType.FILE_SEARCH_CALL`
- New `SessionIncludeOption` enum for controlling what data is included in session responses
- **MCP (Model Context Protocol) Support**: Added comprehensive support for Model Context Protocol integration:
- `MCPServer` tool type for defining MCP server configurations with authorization, headers, and approval requirements
- `MCPTool` model for representing MCP tool definitions with input schemas and annotations
- `MCPApprovalType` enum for controlling approval workflows (`never`, `always`, or tool-specific)
- New item types for MCP approval and call workflows
- New server events for MCP tool listing, call lifecycle, and approval flows
- **Avatar Enhancements**:
- Added `AzureAvatarVoiceSyncVoice` for avatar voice sync configuration
- Added `ServerEventSessionAvatarSwitchToIdle` and `ServerEventSessionAvatarSwitchToSpeaking` events
- Added `ServerEventResponseVideoDelta` for avatar video frame streaming
- Added `ClientEventOutputAudioBufferClear` and `ServerEventOutputAudioBufferCleared` for output buffer management
- Added `AvatarConfigTypes` enum with support for `video-avatar` and `photo-avatar` types
- Added `AvatarOutputProtocol` enum for avatar streaming protocols (`webrtc`, `websocket`)
- Added `Scene` model for controlling avatar zoom, position, rotation, and movement amplitude
- Added `output_audit_audio` field to `AvatarConfig`
- **OpenTelemetry Tracing Support**: Added `VoiceLiveInstrumentor` for opt-in OpenTelemetry-based
tracing of VoiceLive WebSocket connections, following Azure SDK and GenAI semantic conventions.
- Enable via `AZURE_EXPERIMENTAL_ENABLE_GENAI_TRACING=true` environment variable
- Content recording controlled by `OTEL_INSTRUMENTATION_GENAI_CAPTURE_MESSAGE_CONTENT`
- Comprehensive session-level telemetry: session ID, audio format, first-token latency,
turn count, interruption count, audio bytes sent/received, message size
- Response & function call ID tracking for end-to-end tracing
- Agent v2 telemetry with agent identity and configuration tracking
- MCP telemetry with tool call and approval flow tracking
- **Agent Session Configuration**: Added `AgentSessionConfig` for configuring Azure AI Foundry agents
at connection time with `agent_name`, `project_name`, `agent_version`, `conversation_id`, and more
- **Transcription Improvements**:
- Added `TranscriptionPhrase` and `TranscriptionWord` models for detailed transcription data
- Added `ServerEventResponseAudioTranscriptAnnotationAdded` event
- Added `gpt-4o-transcribe-diarize` and `mai-transcribe-1` transcription model support
- **Interim Response Configuration**: Added `StaticInterimResponseConfig` and `LlmInterimResponseConfig`
for generating interim responses during latency or tool calls
- **Image Content Support**: Added `RequestImageContentPart` for image inputs in conversations
- **Reasoning Effort Control**: Added `reasoning_effort` field with `ReasoningEffort` enum
- **Response Metadata**: Added `metadata` field to `Response` and `ResponseCreateParams`
- **Server Warning Events**: Added `ServerEventWarning` for handling non-fatal warnings
- **Personal Voice Models**: Added `DragonHDOmniLatestNeural` and `MAI-Voice-1` model options
- **Enhanced OpenAI Voices**: Added `marin` and `cedar` voices to `OpenAIVoiceName` enum
- **Enhanced Azure Personal Voice**: Added `custom_lexicon_url`, `prefer_locales`, `locale`, `style`,
`pitch`, `rate`, and `volume` properties
- **Pre-generated Assistant Messages**: Added `pre_generated_assistant_message` in `ResponseCreateParams`
- **Explicit Null Values**: Enhanced `RequestSession` to properly serialize explicitly set `None` values

### Breaking Changes

- Removed `PersonalVoiceModels.PHOENIX_V2_NEURAL` enum value (replaced by `DRAGON_HD_OMNI_LATEST_NEURAL` and `MAI_VOICE1`)
- Removed Foundry Agent Tool classes (`FoundryAgentTool`, `ResponseFoundryAgentCallItem`, etc.);
  use `AgentSessionConfig` with `connect()` instead

### Bugs Fixed

### Other Changes

- Updated default API version to `2026-04-10`

## 1.2.0b5 (2026-04-06)

### Features Added
4 changes: 1 addition & 3 deletions sdk/voicelive/azure-ai-voicelive/MANIFEST.in
@@ -1,7 +1,5 @@
include *.md
include LICENSE
include azure/ai/voicelive/py.typed
include azure/py.typed
recursive-include tests *.py
recursive-include samples *.py *.md
include azure/__init__.py
include azure/ai/__init__.py
4 changes: 2 additions & 2 deletions sdk/voicelive/azure-ai-voicelive/_metadata.json
@@ -1,6 +1,6 @@
{
"apiVersion": "2026-01-01-preview",
"apiVersion": "2026-04-10",
"apiVersions": {
"VoiceLive": "2026-01-01-preview"
"VoiceLive": "2026-04-10"
}
}
79 changes: 51 additions & 28 deletions sdk/voicelive/azure-ai-voicelive/apiview-properties.json
@@ -1,6 +1,10 @@
{
"CrossLanguagePackageId": "VoiceLive",
"CrossLanguageDefinitionId": {
"azure.ai.voicelive.models.ActionFind": "VoiceLive.ActionFind",
"azure.ai.voicelive.models.ActionOpenPage": "VoiceLive.ActionOpenPage",
"azure.ai.voicelive.models.ActionSearch": "VoiceLive.ActionSearch",
"azure.ai.voicelive.models.ActionSearchSource": "VoiceLive.ActionSearchSource",
"azure.ai.voicelive.models.AgentConfig": "VoiceLive.AgentConfig",
"azure.ai.voicelive.models.Animation": "VoiceLive.Animation",
"azure.ai.voicelive.models.ConversationRequestItem": "VoiceLive.ConversationRequestItem",
@@ -11,6 +15,7 @@
"azure.ai.voicelive.models.AudioNoiseReduction": "VoiceLive.AudioNoiseReduction",
"azure.ai.voicelive.models.AvatarConfig": "VoiceLive.AvatarConfig",
"azure.ai.voicelive.models.AzureVoice": "VoiceLive.AzureVoice",
"azure.ai.voicelive.models.AzureAvatarVoiceSyncVoice": "VoiceLive.AzureAvatarVoiceSyncVoice",
"azure.ai.voicelive.models.AzureCustomVoice": "VoiceLive.AzureCustomVoice",
"azure.ai.voicelive.models.AzurePersonalVoice": "VoiceLive.AzurePersonalVoice",
"azure.ai.voicelive.models.EouDetection": "VoiceLive.EouDetection",
@@ -37,13 +42,15 @@
"azure.ai.voicelive.models.ClientEventInputAudioTurnCancel": "VoiceLive.ClientEventInputAudioTurnCancel",
"azure.ai.voicelive.models.ClientEventInputAudioTurnEnd": "VoiceLive.ClientEventInputAudioTurnEnd",
"azure.ai.voicelive.models.ClientEventInputAudioTurnStart": "VoiceLive.ClientEventInputAudioTurnStart",
"azure.ai.voicelive.models.ClientEventOutputAudioBufferClear": "VoiceLive.ClientEventOutputAudioBufferClear",
"azure.ai.voicelive.models.ClientEventResponseCancel": "VoiceLive.ClientEventResponseCancel",
"azure.ai.voicelive.models.ClientEventResponseCreate": "VoiceLive.ClientEventResponseCreate",
"azure.ai.voicelive.models.ClientEventSessionAvatarConnect": "VoiceLive.ClientEventSessionAvatarConnect",
"azure.ai.voicelive.models.ClientEventSessionUpdate": "VoiceLive.ClientEventSessionUpdate",
"azure.ai.voicelive.models.ContentPart": "VoiceLive.ContentPart",
"azure.ai.voicelive.models.ConversationItemBase": "VoiceLive.ConversationItemBase",
"azure.ai.voicelive.models.ErrorResponse": "VoiceLive.ErrorResponse",
"azure.ai.voicelive.models.FileSearchResult": "VoiceLive.FileSearchResult",
"azure.ai.voicelive.models.FunctionCallItem": "VoiceLive.FunctionCallItem",
"azure.ai.voicelive.models.FunctionCallOutputItem": "VoiceLive.FunctionCallOutputItem",
"azure.ai.voicelive.models.Tool": "VoiceLive.Tool",
@@ -73,6 +80,7 @@
"azure.ai.voicelive.models.ResponseCreateParams": "VoiceLive.ResponseCreateParams",
"azure.ai.voicelive.models.ResponseFailedDetails": "VoiceLive.ResponseFailedDetails",
"azure.ai.voicelive.models.ResponseItem": "VoiceLive.ResponseItem",
"azure.ai.voicelive.models.ResponseFileSearchCallItem": "VoiceLive.ResponseFileSearchCallItem",
"azure.ai.voicelive.models.ResponseFunctionCallItem": "VoiceLive.ResponseFunctionCallItem",
"azure.ai.voicelive.models.ResponseFunctionCallOutputItem": "VoiceLive.ResponseFunctionCallOutputItem",
"azure.ai.voicelive.models.ResponseIncompleteDetails": "VoiceLive.ResponseIncompleteDetails",
@@ -83,6 +91,7 @@
"azure.ai.voicelive.models.ResponseMessageItem": "VoiceLive.ResponseMessageItem",
"azure.ai.voicelive.models.ResponseSession": "VoiceLive.ResponseSession",
"azure.ai.voicelive.models.ResponseTextContentPart": "VoiceLive.ResponseTextContentPart",
"azure.ai.voicelive.models.ResponseWebSearchCallItem": "VoiceLive.ResponseWebSearchCallItem",
"azure.ai.voicelive.models.Scene": "VoiceLive.Scene",
"azure.ai.voicelive.models.ServerEvent": "VoiceLive.ServerEvent",
"azure.ai.voicelive.models.ServerEventConversationItemCreated": "VoiceLive.ServerEventConversationItemCreated",
@@ -101,6 +110,7 @@
"azure.ai.voicelive.models.ServerEventMcpListToolsCompleted": "VoiceLive.ServerEventMcpListToolsCompleted",
"azure.ai.voicelive.models.ServerEventMcpListToolsFailed": "VoiceLive.ServerEventMcpListToolsFailed",
"azure.ai.voicelive.models.ServerEventMcpListToolsInProgress": "VoiceLive.ServerEventMcpListToolsInProgress",
"azure.ai.voicelive.models.ServerEventOutputAudioBufferCleared": "VoiceLive.ServerEventOutputAudioBufferCleared",
"azure.ai.voicelive.models.ServerEventResponseAnimationBlendshapeDelta": "VoiceLive.ServerEventResponseAnimationBlendshapeDelta",
"azure.ai.voicelive.models.ServerEventResponseAnimationBlendshapeDone": "VoiceLive.ServerEventResponseAnimationBlendshapeDone",
"azure.ai.voicelive.models.ServerEventResponseAnimationVisemeDelta": "VoiceLive.ServerEventResponseAnimationVisemeDelta",
@@ -109,12 +119,16 @@
"azure.ai.voicelive.models.ServerEventResponseAudioDone": "VoiceLive.ServerEventResponseAudioDone",
"azure.ai.voicelive.models.ServerEventResponseAudioTimestampDelta": "VoiceLive.ServerEventResponseAudioTimestampDelta",
"azure.ai.voicelive.models.ServerEventResponseAudioTimestampDone": "VoiceLive.ServerEventResponseAudioTimestampDone",
"azure.ai.voicelive.models.ServerEventResponseAudioTranscriptAnnotationAdded": "VoiceLive.ServerEventResponseAudioTranscriptAnnotationAdded",
"azure.ai.voicelive.models.ServerEventResponseAudioTranscriptDelta": "VoiceLive.ServerEventResponseAudioTranscriptDelta",
"azure.ai.voicelive.models.ServerEventResponseAudioTranscriptDone": "VoiceLive.ServerEventResponseAudioTranscriptDone",
"azure.ai.voicelive.models.ServerEventResponseContentPartAdded": "VoiceLive.ServerEventResponseContentPartAdded",
"azure.ai.voicelive.models.ServerEventResponseContentPartDone": "VoiceLive.ServerEventResponseContentPartDone",
"azure.ai.voicelive.models.ServerEventResponseCreated": "VoiceLive.ServerEventResponseCreated",
"azure.ai.voicelive.models.ServerEventResponseDone": "VoiceLive.ServerEventResponseDone",
"azure.ai.voicelive.models.ServerEventResponseFileSearchCallCompleted": "VoiceLive.ServerEventResponseFileSearchCallCompleted",
"azure.ai.voicelive.models.ServerEventResponseFileSearchCallInProgress": "VoiceLive.ServerEventResponseFileSearchCallInProgress",
"azure.ai.voicelive.models.ServerEventResponseFileSearchCallSearching": "VoiceLive.ServerEventResponseFileSearchCallSearching",
"azure.ai.voicelive.models.ServerEventResponseFunctionCallArgumentsDelta": "VoiceLive.ServerEventResponseFunctionCallArgumentsDelta",
"azure.ai.voicelive.models.ServerEventResponseFunctionCallArgumentsDone": "VoiceLive.ServerEventResponseFunctionCallArgumentsDone",
"azure.ai.voicelive.models.ServerEventResponseMcpCallArgumentsDelta": "VoiceLive.ServerEventResponseMcpCallArgumentsDelta",
@@ -126,7 +140,13 @@
"azure.ai.voicelive.models.ServerEventResponseOutputItemDone": "VoiceLive.ServerEventResponseOutputItemDone",
"azure.ai.voicelive.models.ServerEventResponseTextDelta": "VoiceLive.ServerEventResponseTextDelta",
"azure.ai.voicelive.models.ServerEventResponseTextDone": "VoiceLive.ServerEventResponseTextDone",
"azure.ai.voicelive.models.ServerEventResponseVideoDelta": "VoiceLive.ServerEventResponseVideoDelta",
"azure.ai.voicelive.models.ServerEventResponseWebSearchCallCompleted": "VoiceLive.ServerEventResponseWebSearchCallCompleted",
"azure.ai.voicelive.models.ServerEventResponseWebSearchCallInProgress": "VoiceLive.ServerEventResponseWebSearchCallInProgress",
"azure.ai.voicelive.models.ServerEventResponseWebSearchCallSearching": "VoiceLive.ServerEventResponseWebSearchCallSearching",
"azure.ai.voicelive.models.ServerEventSessionAvatarConnecting": "VoiceLive.ServerEventSessionAvatarConnecting",
"azure.ai.voicelive.models.ServerEventSessionAvatarSwitchToIdle": "VoiceLive.ServerEventSessionAvatarSwitchToIdle",
"azure.ai.voicelive.models.ServerEventSessionAvatarSwitchToSpeaking": "VoiceLive.ServerEventSessionAvatarSwitchToSpeaking",
"azure.ai.voicelive.models.ServerEventSessionCreated": "VoiceLive.ServerEventSessionCreated",
"azure.ai.voicelive.models.ServerEventSessionUpdated": "VoiceLive.ServerEventSessionUpdated",
"azure.ai.voicelive.models.ServerEventWarning": "VoiceLive.ServerEventWarning",
@@ -138,38 +158,41 @@
"azure.ai.voicelive.models.TokenUsage": "VoiceLive.TokenUsage",
"azure.ai.voicelive.models.ToolChoiceSelection": "VoiceLive.ToolChoiceObject",
"azure.ai.voicelive.models.ToolChoiceFunctionSelection": "VoiceLive.ToolChoiceFunctionObject",
"azure.ai.voicelive.models.TranscriptionPhrase": "VoiceLive.TranscriptionPhrase",
"azure.ai.voicelive.models.TranscriptionWord": "VoiceLive.TranscriptionWord",
"azure.ai.voicelive.models.UserMessageItem": "VoiceLive.UserMessageItem",
"azure.ai.voicelive.models.VideoCrop": "VoiceLive.VideoCrop",
"azure.ai.voicelive.models.VideoParams": "VoiceLive.VideoParams",
"azure.ai.voicelive.models.VideoResolution": "VoiceLive.VideoResolution",
"azure.ai.voicelive.models.VoiceLiveErrorDetails": "VoiceLive.VoiceLiveErrorDetails",
"azure.ai.voicelive.models.ClientEventType": "VoiceLive.ClientEventType",
"azure.ai.voicelive.models.ItemType": "VoiceLive.ItemType",
"azure.ai.voicelive.models.ItemParamStatus": "VoiceLive.ItemParamStatus",
"azure.ai.voicelive.models.MessageRole": "VoiceLive.MessageRole",
"azure.ai.voicelive.models.ContentPartType": "VoiceLive.ContentPartType",
"azure.ai.voicelive.models.Modality": "VoiceLive.Modality",
"azure.ai.voicelive.models.OpenAIVoiceName": "VoiceLive.OAIVoice",
"azure.ai.voicelive.models.AzureVoiceType": "VoiceLive.AzureVoiceType",
"azure.ai.voicelive.models.PersonalVoiceModels": "VoiceLive.PersonalVoiceModels",
"azure.ai.voicelive.models.OutputAudioFormat": "VoiceLive.OutputAudioFormat",
"azure.ai.voicelive.models.ToolType": "VoiceLive.ToolType",
"azure.ai.voicelive.models.MCPApprovalType": "VoiceLive.MCPApprovalType",
"azure.ai.voicelive.models.ReasoningEffort": "VoiceLive.ReasoningEffort",
"azure.ai.voicelive.models.AnimationOutputType": "VoiceLive.AnimationOutputType",
"azure.ai.voicelive.models.InputAudioFormat": "VoiceLive.InputAudioFormat",
"azure.ai.voicelive.models.TurnDetectionType": "VoiceLive.TurnDetectionType",
"azure.ai.voicelive.models.EouThresholdLevel": "VoiceLive.EouThresholdLevel",
"azure.ai.voicelive.models.AvatarConfigTypes": "VoiceLive.AvatarConfigTypes",
"azure.ai.voicelive.models.PhotoAvatarBaseModes": "VoiceLive.PhotoAvatarBaseModes",
"azure.ai.voicelive.models.AvatarOutputProtocol": "VoiceLive.AvatarOutputProtocol",
"azure.ai.voicelive.models.AudioTimestampType": "VoiceLive.AudioTimestampType",
"azure.ai.voicelive.models.ToolChoiceLiteral": "VoiceLive.ToolChoiceLiteral",
"azure.ai.voicelive.models.InterimResponseConfigType": "VoiceLive.InterimResponseConfigType",
"azure.ai.voicelive.models.InterimResponseTrigger": "VoiceLive.InterimResponseTrigger",
"azure.ai.voicelive.models.ResponseStatus": "VoiceLive.ResponseStatus",
"azure.ai.voicelive.models.ResponseItemStatus": "VoiceLive.ResponseItemStatus",
"azure.ai.voicelive.models.RequestImageContentPartDetail": "VoiceLive.RequestImageContentPartDetail",
"azure.ai.voicelive.models.ServerEventType": "VoiceLive.ServerEventType"
"azure.models.ClientEventType": "VoiceLive.ClientEventType",
"azure.models.ItemType": "VoiceLive.ItemType",
"azure.models.ItemParamStatus": "VoiceLive.ItemParamStatus",
"azure.models.MessageRole": "VoiceLive.MessageRole",
"azure.models.ContentPartType": "VoiceLive.ContentPartType",
"azure.models.Modality": "VoiceLive.Modality",
"azure.models.OpenAIVoiceName": "VoiceLive.OAIVoice",
"azure.models.AzureVoiceType": "VoiceLive.AzureVoiceType",
"azure.models.PersonalVoiceModels": "VoiceLive.PersonalVoiceModels",
"azure.models.OutputAudioFormat": "VoiceLive.OutputAudioFormat",
"azure.models.ToolType": "VoiceLive.ToolType",
"azure.models.MCPApprovalType": "VoiceLive.MCPApprovalType",
"azure.models.ReasoningEffort": "VoiceLive.ReasoningEffort",
"azure.models.InterimResponseConfigType": "VoiceLive.InterimResponseConfigType",
"azure.models.InterimResponseTrigger": "VoiceLive.InterimResponseTrigger",
"azure.models.AnimationOutputType": "VoiceLive.AnimationOutputType",
"azure.models.InputAudioFormat": "VoiceLive.InputAudioFormat",
"azure.models.TurnDetectionType": "VoiceLive.TurnDetectionType",
"azure.models.EouThresholdLevel": "VoiceLive.EouThresholdLevel",
"azure.models.AvatarConfigTypes": "VoiceLive.AvatarConfigTypes",
"azure.models.PhotoAvatarBaseModes": "VoiceLive.PhotoAvatarBaseModes",
"azure.models.AvatarOutputProtocol": "VoiceLive.AvatarOutputProtocol",
"azure.models.AudioTimestampType": "VoiceLive.AudioTimestampType",
"azure.models.ToolChoiceLiteral": "VoiceLive.ToolChoiceLiteral",
"azure.models.SessionIncludeOption": "VoiceLive.SessionIncludeOption",
"azure.models.ResponseStatus": "VoiceLive.ResponseStatus",
"azure.models.ResponseItemStatus": "VoiceLive.ResponseItemStatus",
"azure.models.RequestImageContentPartDetail": "VoiceLive.RequestImageContentPartDetail",
"azure.models.ServerEventType": "VoiceLive.ServerEventType"
}
}
15 changes: 15 additions & 0 deletions sdk/voicelive/azure-ai-voicelive/azure/_types.py
@@ -0,0 +1,15 @@
# coding=utf-8
# --------------------------------------------------------------------------
# Copyright (c) Microsoft Corporation. All rights reserved.
# Licensed under the MIT License. See License.txt in the project root for license information.
# Code generated by Microsoft (R) Python Code Generator.
# Changes may cause incorrect behavior and will be lost if the code is regenerated.
# --------------------------------------------------------------------------

from typing import TYPE_CHECKING, Union

if TYPE_CHECKING:
    from .ai.voicelive import models as _models

Voice = Union[str, "_models.OpenAIVoiceName", "_models.OpenAIVoice", "_models.AzureVoice"]
InterimResponseConfig = Union["_models.StaticInterimResponseConfig", "_models.LlmInterimResponseConfig"]
ToolChoice = Union[str, "_models.ToolChoiceLiteral", "_models.ToolChoiceSelection"]
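These aliases let callers pass either a plain string or a model object. A minimal stand-alone sketch of how such a union is typically consumed follows; the class name is a hypothetical stand-in, not one of the real models:

```python
# Illustration of consuming a Union alias like Voice or ToolChoice.
# AzureVoiceStandIn is a made-up placeholder for a structured voice model.
from typing import Union


class AzureVoiceStandIn:
    def __init__(self, name: str) -> None:
        self.name = name


Voice = Union[str, AzureVoiceStandIn]


def resolve_voice(voice: Voice) -> str:
    # Accept either a bare voice name or a structured voice object.
    return voice if isinstance(voice, str) else voice.name


print(resolve_voice("alloy"))                       # alloy
print(resolve_voice(AzureVoiceStandIn("en-US-Ava")))  # en-US-Ava
```

The string forward references in the real aliases (e.g. `"_models.OpenAIVoiceName"`) keep the `models` import behind `TYPE_CHECKING`, so the alias module imposes no runtime import cost.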