2 changes: 2 additions & 0 deletions .vscode/cspell.json
@@ -1498,6 +1498,8 @@
"FOUNDRY",
"genai",
"GENAI",
"HDOMNI",
"SSML",
"Unpooled",
"viseme",
"VISEME",
2 changes: 1 addition & 1 deletion eng/versioning/version_client.txt
@@ -60,7 +60,7 @@ com.azure:azure-ai-textanalytics-perf;1.0.0-beta.1;1.0.0-beta.1
com.azure:azure-ai-translation-text;1.1.8;2.0.0-beta.2
com.azure:azure-ai-translation-document;1.0.7;1.1.0-beta.1
com.azure:azure-ai-vision-face;1.0.0-beta.2;1.0.0-beta.3
com.azure:azure-ai-voicelive;1.0.0-beta.6;1.0.0-beta.7
com.azure:azure-ai-voicelive;1.0.0-beta.6;1.0.0
com.azure:azure-analytics-defender-easm;1.0.0-beta.1;1.0.0-beta.2
com.azure:azure-analytics-purview-datamap;1.0.0-beta.2;1.0.0-beta.3
com.azure:azure-analytics-onlineexperimentation;1.0.0-beta.1;1.0.0-beta.2
30 changes: 28 additions & 2 deletions sdk/voicelive/azure-ai-voicelive/CHANGELOG.md
@@ -1,15 +1,41 @@
# Release History

## 1.0.0-beta.7 (Unreleased)
## 1.0.0 (Unreleased)

This is the first General Availability (GA) release of the Azure VoiceLive client library for Java.

### Features Added

- **Avatar voice synchronization** for video avatars:
- New `AzureVoiceType.AVATAR_VOICE_SYNC` and `AzureAvatarVoiceSyncVoice` class
- New server events `ServerEventSessionAvatarSwitchToSpeaking` / `ServerEventSessionAvatarSwitchToIdle`
- New `ServerEventResponseVideoDelta` for streaming avatar video frames
- New `ClientEventOutputAudioBufferClear` (`output_audio_buffer.clear`) and `ServerEventOutputAudioBufferCleared` (`output_audio_buffer.cleared`) for clearing the avatar output audio buffer
- **Web search and file search tool calls**:
- New `ItemType.WEB_SEARCH_CALL`, `ItemType.FILE_SEARCH_CALL`
- New `ResponseWebSearchCallItem` (with `ResponseWebSearchCallItemStatus`) and `ResponseFileSearchCallItem` (with `ResponseFileSearchCallItemStatus`, plus `FileSearchResult` results)
- New lifecycle server events: `ServerEventResponseWebSearchCall{Searching,InProgress,Completed}` and `ServerEventResponseFileSearchCall{Searching,InProgress,Completed}`
- **Transcription enhancements**:
- New transcription models on `AudioInputTranscriptionOptionsModel`: `GPT_4O_TRANSCRIBE_DIARIZE`, `MAI_TRANSCRIBE_1`
- New `TranscriptionPhrase` and `TranscriptionWord` types with timing/confidence information
- `SessionUpdateConversationItemInputAudioTranscriptionCompleted` now exposes `getLogprobs()` and `getPhrases()`
- New `ServerEventResponseAudioTranscriptAnnotationAdded` event
- **Session include options and metadata**:
- New `SessionIncludeOption` expandable enum for opting into additional response payloads (e.g. logprobs, phrases, file-search results)
- `VoiceLiveSessionOptions` and `VoiceLiveSessionResponse` now expose `include` (`List<SessionIncludeOption>`) and `metadata` (`Map<String,String>`, up to 16 entries)
- **Personal voice models**: added `PersonalVoiceModels.DRAGON_HDOMNI_LATEST_NEURAL` and `MAI_VOICE_1`
- **Reasoning token usage**: `OutputTokenDetails.getReasoningTokens()` exposes reasoning token counts
- **Interim response on response.create**: `ResponseCreateParams.setInterimResponse(BinaryData)` lets callers attach interim response config to a single response request
- Significantly improved Javadoc for `ServerVadTurnDetection`, `AzureCustomVoice`, `AzurePersonalVoice`, `AzureStandardVoice`, `AzureSemanticVadTurnDetection*`, and other model types

### Breaking Changes

- Removed `PersonalVoiceModels.PHOENIX_V2NEURAL` (no longer supported by the service). Use `PHOENIX_LATEST_NEURAL` or one of the new `DRAGON_*` / `MAI_VOICE_1` models instead.

### Bugs Fixed

### Other Changes

- Updated default service API version to track the latest TypeSpec spec.

## 1.0.0-beta.6 (2026-05-01)

### Features Added
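The `output_audio_buffer.clear` / `output_audio_buffer.cleared` handshake listed in the changelog (client requests a flush of pending avatar audio, server confirms) can be sketched with a toy session model. The event type strings come from the changelog; the `ToySession` class below is purely illustrative and is not the SDK's API.

```java
import java.util.ArrayDeque;
import java.util.Deque;

// Toy model of the avatar output-audio-buffer clear handshake:
// client sends "output_audio_buffer.clear", server drops any queued
// audio frames and confirms with "output_audio_buffer.cleared".
public class ClearBufferSketch {
    static final class ToySession {
        private final Deque<byte[]> outputAudio = new ArrayDeque<>();

        void enqueue(byte[] frame) {
            outputAudio.add(frame);
        }

        // Handle a client event type; return the server's confirmation event type.
        String handleClientEvent(String type) {
            if ("output_audio_buffer.clear".equals(type)) {
                outputAudio.clear();                  // drop pending avatar audio
                return "output_audio_buffer.cleared"; // server-side confirmation
            }
            throw new IllegalArgumentException("unhandled event: " + type);
        }

        int pendingFrames() {
            return outputAudio.size();
        }
    }

    public static void main(String[] args) {
        ToySession session = new ToySession();
        session.enqueue(new byte[320]);
        session.enqueue(new byte[320]);
        String ack = session.handleClientEvent("output_audio_buffer.clear");
        System.out.println(ack + " pending=" + session.pendingFrames());
    }
}
```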
4 changes: 2 additions & 2 deletions sdk/voicelive/azure-ai-voicelive/README.md
Expand Up @@ -27,7 +27,7 @@ Use the Azure VoiceLive client library for Java to:
<dependency>
<groupId>com.azure</groupId>
<artifactId>azure-ai-voicelive</artifactId>
<version>1.0.0-beta.2</version>
<version>1.0.0</version>
</dependency>
```
[//]: # ({x-version-update-end})
@@ -347,7 +347,7 @@ VoiceLiveSessionOptions options2 = new VoiceLiveSessionOptions()
.setVoice(BinaryData.fromObject(new AzureCustomVoice("myCustomVoice", "myEndpointId")));
// Azure Personal Voice - requires speaker profile ID and model
// Models: DRAGON_LATEST_NEURAL, PHOENIX_LATEST_NEURAL, PHOENIX_V2NEURAL
// Models: DRAGON_LATEST_NEURAL, DRAGON_HDOMNI_LATEST_NEURAL, PHOENIX_LATEST_NEURAL, MAI_VOICE_1
VoiceLiveSessionOptions options3 = new VoiceLiveSessionOptions()
.setVoice(BinaryData.fromObject(
new AzurePersonalVoice("speakerProfileId", PersonalVoiceModels.PHOENIX_LATEST_NEURAL)));
2 changes: 1 addition & 1 deletion sdk/voicelive/azure-ai-voicelive/pom.xml
@@ -14,7 +14,7 @@ Code generated by Microsoft (R) TypeSpec Code Generator.

<groupId>com.azure</groupId>
<artifactId>azure-ai-voicelive</artifactId>
<version>1.0.0-beta.7</version> <!-- {x-version-update;com.azure:azure-ai-voicelive;current} -->
<version>1.0.0</version> <!-- {x-version-update;com.azure:azure-ai-voicelive;current} -->
<packaging>jar</packaging>

<name>Microsoft Azure SDK for VoiceLive</name>
@@ -22,7 +22,7 @@ public final class AudioInputTranscriptionOptions implements JsonSerializable<Au
/*
* The transcription model to use. Supported values:
* 'whisper-1', 'gpt-4o-transcribe', 'gpt-4o-mini-transcribe',
* 'azure-speech'.
* 'mai-transcribe-1', 'azure-speech'.
*/
@Generated
private final AudioInputTranscriptionOptionsModel model;
@@ -59,7 +59,7 @@ public AudioInputTranscriptionOptions(AudioInputTranscriptionOptionsModel model)
/**
* Get the model property: The transcription model to use. Supported values:
* 'whisper-1', 'gpt-4o-transcribe', 'gpt-4o-mini-transcribe',
* 'azure-speech'.
* 'mai-transcribe-1', 'azure-speech'.
*
* @return the model value.
*/
@@ -68,4 +68,17 @@ public static AudioInputTranscriptionOptionsModel fromString(String name) {
public static Collection<AudioInputTranscriptionOptionsModel> values() {
return values(AudioInputTranscriptionOptionsModel.class);
}

/**
* Static value gpt-4o-transcribe-diarize for AudioInputTranscriptionOptionsModel.
*/
@Generated
public static final AudioInputTranscriptionOptionsModel GPT_4O_TRANSCRIBE_DIARIZE
= fromString("gpt-4o-transcribe-diarize");

/**
* Static value mai-transcribe-1 for AudioInputTranscriptionOptionsModel.
*/
@Generated
public static final AudioInputTranscriptionOptionsModel MAI_TRANSCRIBE_1 = fromString("mai-transcribe-1");
}
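The new constants above use the expandable-string-enum pattern (`fromString` plus predefined `static final` values), which lets unrecognized model names returned by the service still parse instead of failing. A minimal self-contained sketch of that pattern — the `TranscriptionModel` class below is illustrative and is not the actual azure-core `ExpandableStringEnum` implementation:

```java
import java.util.Collection;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Sketch of the expandable-string-enum pattern behind
// AudioInputTranscriptionOptionsModel: known names map to cached singletons,
// and unknown future names still round-trip instead of throwing.
final class TranscriptionModel {
    private static final Map<String, TranscriptionModel> VALUES = new ConcurrentHashMap<>();

    private final String name;

    private TranscriptionModel(String name) {
        this.name = name;
    }

    // Same name -> same cached instance, so values parsed from service JSON
    // compare equal (even by reference) to the predefined constants.
    static TranscriptionModel fromString(String name) {
        return VALUES.computeIfAbsent(name, TranscriptionModel::new);
    }

    static Collection<TranscriptionModel> values() {
        return VALUES.values();
    }

    static final TranscriptionModel MAI_TRANSCRIBE_1 = fromString("mai-transcribe-1");

    @Override
    public String toString() {
        return name;
    }
}

public class Demo {
    public static void main(String[] args) {
        TranscriptionModel parsed = TranscriptionModel.fromString("mai-transcribe-1");
        System.out.println(parsed == TranscriptionModel.MAI_TRANSCRIBE_1);
        // An unrecognized model name from a newer service version still parses.
        System.out.println(TranscriptionModel.fromString("some-future-model"));
    }
}
```

This is why adding `MAI_TRANSCRIBE_1` to the class is non-breaking: older clients without the constant would already have accepted the string via `fromString`.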