57 changes: 55 additions & 2 deletions sdk/voicelive/azure-ai-voicelive/CHANGELOG.md
@@ -1,15 +1,68 @@
# Release History

## 1.2.0b6 (Unreleased)
## 1.2.0 (Unreleased)

### Features Added

- **Web Search & File Search**: Added support for built-in web search and file search tools:
- New item types: `ResponseWebSearchCallItem`, `ResponseFileSearchCallItem`
- New server events for web/file search lifecycle (`searching`, `in_progress`, `completed`)
- New models: `ActionFind`, `ActionOpenPage`, `ActionSearch`, `ActionSearchSource`, `FileSearchResult`
- New enum values: `ItemType.WEB_SEARCH_CALL`, `ItemType.FILE_SEARCH_CALL`
- New `SessionIncludeOption` enum for controlling what data is included in session responses
- **MCP (Model Context Protocol) Support**: Added comprehensive support for Model Context Protocol integration:
- `MCPServer` tool type for defining MCP server configurations with authorization, headers, and approval requirements
- `MCPTool` model for representing MCP tool definitions with input schemas and annotations
- `MCPApprovalType` enum for controlling approval workflows (`never`, `always`, or tool-specific)
- New item types for MCP approval and call workflows
- New server events for MCP tool listing, call lifecycle, and approval flows
- **Avatar Enhancements**:
- Added `AzureAvatarVoiceSyncVoice` for avatar voice sync configuration
- Added `ServerEventSessionAvatarSwitchToIdle` and `ServerEventSessionAvatarSwitchToSpeaking` events
- Added `ServerEventResponseVideoDelta` for avatar video frame streaming
- Added `ClientEventOutputAudioBufferClear` and `ServerEventOutputAudioBufferCleared` for output buffer management
- Added `AvatarConfigTypes` enum with support for `video-avatar` and `photo-avatar` types
- Added `AvatarOutputProtocol` enum for avatar streaming protocols (`webrtc`, `websocket`)
- Added `Scene` model for controlling avatar zoom, position, rotation, and movement amplitude
- Added `output_audit_audio` field to `AvatarConfig`
- **OpenTelemetry Tracing Support**: Added `VoiceLiveInstrumentor` for opt-in OpenTelemetry-based
tracing of VoiceLive WebSocket connections, following Azure SDK and GenAI semantic conventions.
- Enable via `AZURE_EXPERIMENTAL_ENABLE_GENAI_TRACING=true` environment variable
- Content recording controlled by `OTEL_INSTRUMENTATION_GENAI_CAPTURE_MESSAGE_CONTENT`
- Comprehensive session-level telemetry: session ID, audio format, first-token latency,
turn count, interruption count, audio bytes sent/received, message size
- Response & function call ID tracking for end-to-end tracing
- Agent v2 telemetry with agent identity and configuration tracking
- MCP telemetry with tool call and approval flow tracking
- **Agent Session Configuration**: Added `AgentSessionConfig` for configuring Azure AI Foundry agents
at connection time with `agent_name`, `project_name`, `agent_version`, `conversation_id`, and more
- **Transcription Improvements**:
- Added `TranscriptionPhrase` and `TranscriptionWord` models for detailed transcription data
- Added `ServerEventResponseAudioTranscriptAnnotationAdded` event
- Added `gpt-4o-transcribe-diarize` and `mai-transcribe-1` transcription model support
- **Interim Response Configuration**: Added `StaticInterimResponseConfig` and `LlmInterimResponseConfig`
for generating interim responses during latency or tool calls
- **Image Content Support**: Added `RequestImageContentPart` for image inputs in conversations
- **Reasoning Effort Control**: Added `reasoning_effort` field with `ReasoningEffort` enum
- **Response Metadata**: Added `metadata` field to `Response` and `ResponseCreateParams`
- **Server Warning Events**: Added `ServerEventWarning` for handling non-fatal warnings
- **Personal Voice Models**: Added `DragonHDOmniLatestNeural` and `MAI-Voice-1` model options
- **Enhanced OpenAI Voices**: Added `marin` and `cedar` voices to `OpenAIVoiceName` enum
- **Enhanced Azure Personal Voice**: Added `custom_lexicon_url`, `prefer_locales`, `locale`, `style`,
`pitch`, `rate`, and `volume` properties
- **Pre-generated Assistant Messages**: Added `pre_generated_assistant_message` in `ResponseCreateParams`
- **Explicit Null Values**: Enhanced `RequestSession` to properly serialize explicitly set `None` values

### Breaking Changes

- Removed `PersonalVoiceModels.PHOENIX_V2_NEURAL` enum value (replaced by `DRAGON_HD_OMNI_LATEST_NEURAL` and `MAI_VOICE1`)
- Removed Foundry Agent Tool classes (`FoundryAgentTool`, `ResponseFoundryAgentCallItem`, etc.);
  use `AgentSessionConfig` with `connect()` instead

### Bugs Fixed

### Other Changes

- Updated default API version to `2026-04-10`

## 1.2.0b5 (2026-04-06)

### Features Added
4 changes: 1 addition & 3 deletions sdk/voicelive/azure-ai-voicelive/MANIFEST.in
@@ -1,7 +1,5 @@
include *.md
include LICENSE
include azure/ai/voicelive/py.typed
include azure/py.typed
recursive-include tests *.py
recursive-include samples *.py *.md
include azure/__init__.py
include azure/ai/__init__.py
4 changes: 2 additions & 2 deletions sdk/voicelive/azure-ai-voicelive/_metadata.json
@@ -1,6 +1,6 @@
{
"apiVersion": "2026-01-01-preview",
"apiVersion": "2026-04-10",
"apiVersions": {
"VoiceLive": "2026-01-01-preview"
"VoiceLive": "2026-04-10"
}
}
79 changes: 51 additions & 28 deletions sdk/voicelive/azure-ai-voicelive/apiview-properties.json
@@ -1,6 +1,10 @@
{
"CrossLanguagePackageId": "VoiceLive",
"CrossLanguageDefinitionId": {
"azure.ai.voicelive.models.ActionFind": "VoiceLive.ActionFind",
"azure.ai.voicelive.models.ActionOpenPage": "VoiceLive.ActionOpenPage",
"azure.ai.voicelive.models.ActionSearch": "VoiceLive.ActionSearch",
"azure.ai.voicelive.models.ActionSearchSource": "VoiceLive.ActionSearchSource",
"azure.ai.voicelive.models.AgentConfig": "VoiceLive.AgentConfig",
"azure.ai.voicelive.models.Animation": "VoiceLive.Animation",
"azure.ai.voicelive.models.ConversationRequestItem": "VoiceLive.ConversationRequestItem",
@@ -11,6 +15,7 @@
"azure.ai.voicelive.models.AudioNoiseReduction": "VoiceLive.AudioNoiseReduction",
"azure.ai.voicelive.models.AvatarConfig": "VoiceLive.AvatarConfig",
"azure.ai.voicelive.models.AzureVoice": "VoiceLive.AzureVoice",
"azure.ai.voicelive.models.AzureAvatarVoiceSyncVoice": "VoiceLive.AzureAvatarVoiceSyncVoice",
"azure.ai.voicelive.models.AzureCustomVoice": "VoiceLive.AzureCustomVoice",
"azure.ai.voicelive.models.AzurePersonalVoice": "VoiceLive.AzurePersonalVoice",
"azure.ai.voicelive.models.EouDetection": "VoiceLive.EouDetection",
@@ -37,13 +42,15 @@
"azure.ai.voicelive.models.ClientEventInputAudioTurnCancel": "VoiceLive.ClientEventInputAudioTurnCancel",
"azure.ai.voicelive.models.ClientEventInputAudioTurnEnd": "VoiceLive.ClientEventInputAudioTurnEnd",
"azure.ai.voicelive.models.ClientEventInputAudioTurnStart": "VoiceLive.ClientEventInputAudioTurnStart",
"azure.ai.voicelive.models.ClientEventOutputAudioBufferClear": "VoiceLive.ClientEventOutputAudioBufferClear",
"azure.ai.voicelive.models.ClientEventResponseCancel": "VoiceLive.ClientEventResponseCancel",
"azure.ai.voicelive.models.ClientEventResponseCreate": "VoiceLive.ClientEventResponseCreate",
"azure.ai.voicelive.models.ClientEventSessionAvatarConnect": "VoiceLive.ClientEventSessionAvatarConnect",
"azure.ai.voicelive.models.ClientEventSessionUpdate": "VoiceLive.ClientEventSessionUpdate",
"azure.ai.voicelive.models.ContentPart": "VoiceLive.ContentPart",
"azure.ai.voicelive.models.ConversationItemBase": "VoiceLive.ConversationItemBase",
"azure.ai.voicelive.models.ErrorResponse": "VoiceLive.ErrorResponse",
"azure.ai.voicelive.models.FileSearchResult": "VoiceLive.FileSearchResult",
"azure.ai.voicelive.models.FunctionCallItem": "VoiceLive.FunctionCallItem",
"azure.ai.voicelive.models.FunctionCallOutputItem": "VoiceLive.FunctionCallOutputItem",
"azure.ai.voicelive.models.Tool": "VoiceLive.Tool",
@@ -73,6 +80,7 @@
"azure.ai.voicelive.models.ResponseCreateParams": "VoiceLive.ResponseCreateParams",
"azure.ai.voicelive.models.ResponseFailedDetails": "VoiceLive.ResponseFailedDetails",
"azure.ai.voicelive.models.ResponseItem": "VoiceLive.ResponseItem",
"azure.ai.voicelive.models.ResponseFileSearchCallItem": "VoiceLive.ResponseFileSearchCallItem",
"azure.ai.voicelive.models.ResponseFunctionCallItem": "VoiceLive.ResponseFunctionCallItem",
"azure.ai.voicelive.models.ResponseFunctionCallOutputItem": "VoiceLive.ResponseFunctionCallOutputItem",
"azure.ai.voicelive.models.ResponseIncompleteDetails": "VoiceLive.ResponseIncompleteDetails",
@@ -83,6 +91,7 @@
"azure.ai.voicelive.models.ResponseMessageItem": "VoiceLive.ResponseMessageItem",
"azure.ai.voicelive.models.ResponseSession": "VoiceLive.ResponseSession",
"azure.ai.voicelive.models.ResponseTextContentPart": "VoiceLive.ResponseTextContentPart",
"azure.ai.voicelive.models.ResponseWebSearchCallItem": "VoiceLive.ResponseWebSearchCallItem",
"azure.ai.voicelive.models.Scene": "VoiceLive.Scene",
"azure.ai.voicelive.models.ServerEvent": "VoiceLive.ServerEvent",
"azure.ai.voicelive.models.ServerEventConversationItemCreated": "VoiceLive.ServerEventConversationItemCreated",
@@ -101,6 +110,7 @@
"azure.ai.voicelive.models.ServerEventMcpListToolsCompleted": "VoiceLive.ServerEventMcpListToolsCompleted",
"azure.ai.voicelive.models.ServerEventMcpListToolsFailed": "VoiceLive.ServerEventMcpListToolsFailed",
"azure.ai.voicelive.models.ServerEventMcpListToolsInProgress": "VoiceLive.ServerEventMcpListToolsInProgress",
"azure.ai.voicelive.models.ServerEventOutputAudioBufferCleared": "VoiceLive.ServerEventOutputAudioBufferCleared",
"azure.ai.voicelive.models.ServerEventResponseAnimationBlendshapeDelta": "VoiceLive.ServerEventResponseAnimationBlendshapeDelta",
"azure.ai.voicelive.models.ServerEventResponseAnimationBlendshapeDone": "VoiceLive.ServerEventResponseAnimationBlendshapeDone",
"azure.ai.voicelive.models.ServerEventResponseAnimationVisemeDelta": "VoiceLive.ServerEventResponseAnimationVisemeDelta",
@@ -109,12 +119,16 @@
"azure.ai.voicelive.models.ServerEventResponseAudioDone": "VoiceLive.ServerEventResponseAudioDone",
"azure.ai.voicelive.models.ServerEventResponseAudioTimestampDelta": "VoiceLive.ServerEventResponseAudioTimestampDelta",
"azure.ai.voicelive.models.ServerEventResponseAudioTimestampDone": "VoiceLive.ServerEventResponseAudioTimestampDone",
"azure.ai.voicelive.models.ServerEventResponseAudioTranscriptAnnotationAdded": "VoiceLive.ServerEventResponseAudioTranscriptAnnotationAdded",
"azure.ai.voicelive.models.ServerEventResponseAudioTranscriptDelta": "VoiceLive.ServerEventResponseAudioTranscriptDelta",
"azure.ai.voicelive.models.ServerEventResponseAudioTranscriptDone": "VoiceLive.ServerEventResponseAudioTranscriptDone",
"azure.ai.voicelive.models.ServerEventResponseContentPartAdded": "VoiceLive.ServerEventResponseContentPartAdded",
"azure.ai.voicelive.models.ServerEventResponseContentPartDone": "VoiceLive.ServerEventResponseContentPartDone",
"azure.ai.voicelive.models.ServerEventResponseCreated": "VoiceLive.ServerEventResponseCreated",
"azure.ai.voicelive.models.ServerEventResponseDone": "VoiceLive.ServerEventResponseDone",
"azure.ai.voicelive.models.ServerEventResponseFileSearchCallCompleted": "VoiceLive.ServerEventResponseFileSearchCallCompleted",
"azure.ai.voicelive.models.ServerEventResponseFileSearchCallInProgress": "VoiceLive.ServerEventResponseFileSearchCallInProgress",
"azure.ai.voicelive.models.ServerEventResponseFileSearchCallSearching": "VoiceLive.ServerEventResponseFileSearchCallSearching",
"azure.ai.voicelive.models.ServerEventResponseFunctionCallArgumentsDelta": "VoiceLive.ServerEventResponseFunctionCallArgumentsDelta",
"azure.ai.voicelive.models.ServerEventResponseFunctionCallArgumentsDone": "VoiceLive.ServerEventResponseFunctionCallArgumentsDone",
"azure.ai.voicelive.models.ServerEventResponseMcpCallArgumentsDelta": "VoiceLive.ServerEventResponseMcpCallArgumentsDelta",
@@ -126,7 +140,13 @@
"azure.ai.voicelive.models.ServerEventResponseOutputItemDone": "VoiceLive.ServerEventResponseOutputItemDone",
"azure.ai.voicelive.models.ServerEventResponseTextDelta": "VoiceLive.ServerEventResponseTextDelta",
"azure.ai.voicelive.models.ServerEventResponseTextDone": "VoiceLive.ServerEventResponseTextDone",
"azure.ai.voicelive.models.ServerEventResponseVideoDelta": "VoiceLive.ServerEventResponseVideoDelta",
"azure.ai.voicelive.models.ServerEventResponseWebSearchCallCompleted": "VoiceLive.ServerEventResponseWebSearchCallCompleted",
"azure.ai.voicelive.models.ServerEventResponseWebSearchCallInProgress": "VoiceLive.ServerEventResponseWebSearchCallInProgress",
"azure.ai.voicelive.models.ServerEventResponseWebSearchCallSearching": "VoiceLive.ServerEventResponseWebSearchCallSearching",
"azure.ai.voicelive.models.ServerEventSessionAvatarConnecting": "VoiceLive.ServerEventSessionAvatarConnecting",
"azure.ai.voicelive.models.ServerEventSessionAvatarSwitchToIdle": "VoiceLive.ServerEventSessionAvatarSwitchToIdle",
"azure.ai.voicelive.models.ServerEventSessionAvatarSwitchToSpeaking": "VoiceLive.ServerEventSessionAvatarSwitchToSpeaking",
"azure.ai.voicelive.models.ServerEventSessionCreated": "VoiceLive.ServerEventSessionCreated",
"azure.ai.voicelive.models.ServerEventSessionUpdated": "VoiceLive.ServerEventSessionUpdated",
"azure.ai.voicelive.models.ServerEventWarning": "VoiceLive.ServerEventWarning",
@@ -138,38 +158,41 @@
"azure.ai.voicelive.models.TokenUsage": "VoiceLive.TokenUsage",
"azure.ai.voicelive.models.ToolChoiceSelection": "VoiceLive.ToolChoiceObject",
"azure.ai.voicelive.models.ToolChoiceFunctionSelection": "VoiceLive.ToolChoiceFunctionObject",
"azure.ai.voicelive.models.TranscriptionPhrase": "VoiceLive.TranscriptionPhrase",
"azure.ai.voicelive.models.TranscriptionWord": "VoiceLive.TranscriptionWord",
"azure.ai.voicelive.models.UserMessageItem": "VoiceLive.UserMessageItem",
"azure.ai.voicelive.models.VideoCrop": "VoiceLive.VideoCrop",
"azure.ai.voicelive.models.VideoParams": "VoiceLive.VideoParams",
"azure.ai.voicelive.models.VideoResolution": "VoiceLive.VideoResolution",
"azure.ai.voicelive.models.VoiceLiveErrorDetails": "VoiceLive.VoiceLiveErrorDetails",
"azure.ai.voicelive.models.ClientEventType": "VoiceLive.ClientEventType",
"azure.ai.voicelive.models.ItemType": "VoiceLive.ItemType",
"azure.ai.voicelive.models.ItemParamStatus": "VoiceLive.ItemParamStatus",
"azure.ai.voicelive.models.MessageRole": "VoiceLive.MessageRole",
"azure.ai.voicelive.models.ContentPartType": "VoiceLive.ContentPartType",
"azure.ai.voicelive.models.Modality": "VoiceLive.Modality",
"azure.ai.voicelive.models.OpenAIVoiceName": "VoiceLive.OAIVoice",
"azure.ai.voicelive.models.AzureVoiceType": "VoiceLive.AzureVoiceType",
"azure.ai.voicelive.models.PersonalVoiceModels": "VoiceLive.PersonalVoiceModels",
"azure.ai.voicelive.models.OutputAudioFormat": "VoiceLive.OutputAudioFormat",
"azure.ai.voicelive.models.ToolType": "VoiceLive.ToolType",
"azure.ai.voicelive.models.MCPApprovalType": "VoiceLive.MCPApprovalType",
"azure.ai.voicelive.models.ReasoningEffort": "VoiceLive.ReasoningEffort",
"azure.ai.voicelive.models.AnimationOutputType": "VoiceLive.AnimationOutputType",
"azure.ai.voicelive.models.InputAudioFormat": "VoiceLive.InputAudioFormat",
"azure.ai.voicelive.models.TurnDetectionType": "VoiceLive.TurnDetectionType",
"azure.ai.voicelive.models.EouThresholdLevel": "VoiceLive.EouThresholdLevel",
"azure.ai.voicelive.models.AvatarConfigTypes": "VoiceLive.AvatarConfigTypes",
"azure.ai.voicelive.models.PhotoAvatarBaseModes": "VoiceLive.PhotoAvatarBaseModes",
"azure.ai.voicelive.models.AvatarOutputProtocol": "VoiceLive.AvatarOutputProtocol",
"azure.ai.voicelive.models.AudioTimestampType": "VoiceLive.AudioTimestampType",
"azure.ai.voicelive.models.ToolChoiceLiteral": "VoiceLive.ToolChoiceLiteral",
"azure.ai.voicelive.models.InterimResponseConfigType": "VoiceLive.InterimResponseConfigType",
"azure.ai.voicelive.models.InterimResponseTrigger": "VoiceLive.InterimResponseTrigger",
"azure.ai.voicelive.models.ResponseStatus": "VoiceLive.ResponseStatus",
"azure.ai.voicelive.models.ResponseItemStatus": "VoiceLive.ResponseItemStatus",
"azure.ai.voicelive.models.RequestImageContentPartDetail": "VoiceLive.RequestImageContentPartDetail",
"azure.ai.voicelive.models.ServerEventType": "VoiceLive.ServerEventType"
"azure.models.ClientEventType": "VoiceLive.ClientEventType",
"azure.models.ItemType": "VoiceLive.ItemType",
"azure.models.ItemParamStatus": "VoiceLive.ItemParamStatus",
"azure.models.MessageRole": "VoiceLive.MessageRole",
"azure.models.ContentPartType": "VoiceLive.ContentPartType",
"azure.models.Modality": "VoiceLive.Modality",
"azure.models.OpenAIVoiceName": "VoiceLive.OAIVoice",
"azure.models.AzureVoiceType": "VoiceLive.AzureVoiceType",
"azure.models.PersonalVoiceModels": "VoiceLive.PersonalVoiceModels",
"azure.models.OutputAudioFormat": "VoiceLive.OutputAudioFormat",
"azure.models.ToolType": "VoiceLive.ToolType",
"azure.models.MCPApprovalType": "VoiceLive.MCPApprovalType",
"azure.models.ReasoningEffort": "VoiceLive.ReasoningEffort",
"azure.models.InterimResponseConfigType": "VoiceLive.InterimResponseConfigType",
"azure.models.InterimResponseTrigger": "VoiceLive.InterimResponseTrigger",
"azure.models.AnimationOutputType": "VoiceLive.AnimationOutputType",
"azure.models.InputAudioFormat": "VoiceLive.InputAudioFormat",
"azure.models.TurnDetectionType": "VoiceLive.TurnDetectionType",
"azure.models.EouThresholdLevel": "VoiceLive.EouThresholdLevel",
"azure.models.AvatarConfigTypes": "VoiceLive.AvatarConfigTypes",
"azure.models.PhotoAvatarBaseModes": "VoiceLive.PhotoAvatarBaseModes",
"azure.models.AvatarOutputProtocol": "VoiceLive.AvatarOutputProtocol",
"azure.models.AudioTimestampType": "VoiceLive.AudioTimestampType",
"azure.models.ToolChoiceLiteral": "VoiceLive.ToolChoiceLiteral",
"azure.models.SessionIncludeOption": "VoiceLive.SessionIncludeOption",
"azure.models.ResponseStatus": "VoiceLive.ResponseStatus",
"azure.models.ResponseItemStatus": "VoiceLive.ResponseItemStatus",
"azure.models.RequestImageContentPartDetail": "VoiceLive.RequestImageContentPartDetail",
"azure.models.ServerEventType": "VoiceLive.ServerEventType"
}
}
15 changes: 15 additions & 0 deletions sdk/voicelive/azure-ai-voicelive/azure/_types.py
@@ -0,0 +1,15 @@
# coding=utf-8
# --------------------------------------------------------------------------
# Copyright (c) Microsoft Corporation. All rights reserved.
# Licensed under the MIT License. See License.txt in the project root for license information.
# Code generated by Microsoft (R) Python Code Generator.
# Changes may cause incorrect behavior and will be lost if the code is regenerated.
# --------------------------------------------------------------------------

from typing import TYPE_CHECKING, Union

if TYPE_CHECKING:
    from .ai.voicelive import models as _models

Voice = Union[str, "_models.OpenAIVoiceName", "_models.OpenAIVoice", "_models.AzureVoice"]
InterimResponseConfig = Union["_models.StaticInterimResponseConfig", "_models.LlmInterimResponseConfig"]
ToolChoice = Union[str, "_models.ToolChoiceLiteral", "_models.ToolChoiceSelection"]
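These aliases let callers pass either a plain string or a model object. A minimal stand-alone sketch of how such a union is typically consumed follows; the class name is a hypothetical stand-in, not one of the real models:

```python
# Illustration of consuming a Union alias like Voice or ToolChoice.
# AzureVoiceStandIn is a made-up placeholder for a structured voice model.
from typing import Union


class AzureVoiceStandIn:
    def __init__(self, name: str) -> None:
        self.name = name


Voice = Union[str, AzureVoiceStandIn]


def resolve_voice(voice: Voice) -> str:
    # Accept either a bare voice name or a structured voice object.
    return voice if isinstance(voice, str) else voice.name


print(resolve_voice("alloy"))                       # alloy
print(resolve_voice(AzureVoiceStandIn("en-US-Ava")))  # en-US-Ava
```

The string forward references in the real aliases (e.g. `"_models.OpenAIVoiceName"`) keep the `models` import behind `TYPE_CHECKING`, so the alias module imposes no runtime import cost.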