fix(openai): skip realtime truncate when no audio was played#6249

Open
C1-BA-B1-F3 wants to merge 1 commit into
livekit:mainfrom
C1-BA-B1-F3:fix/realtime-truncate-interrupt

Conversation

@C1-BA-B1-F3

Problem

When a RealtimeModel is interrupted in the narrow window after the model has declared an audio response but before the first audio frame is played, the plugin sends a conversation.item.truncate with audio_end_ms=0. The OpenAI Realtime API rejects it:

APIError('Only model output audio messages can be truncated')

Event sequence triggering the bug:

  1. response.created
  2. response.output_item.added: message_id assigned
  3. response.content_part.added: modalities resolve to ["audio", "text"]
  4. (user interrupts here, VAD fires) ← response.audio.delta has NOT happened yet
  5. Interruption path calls truncate(..., audio_end_ms=0) → server error
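The sequence above can be replayed with a small toy model to show why the interruption path ends up with audio_end_ms=0. This is only an illustrative sketch, not the plugin's code: `ToyGeneration`, `replay`, and the `"ms"` payload field are hypothetical names (a real response.audio.delta event carries encoded audio rather than a duration).

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class ToyGeneration:
    """Hypothetical client-side view of one in-flight response."""

    message_id: Optional[str] = None
    modalities: tuple = ()
    received_audio_ms: int = 0  # played audio can never exceed this


def replay(events):
    """Fold a list of (event_name, payload) pairs into a ToyGeneration."""
    gen = ToyGeneration()
    for name, payload in events:
        if name == "response.output_item.added":
            gen.message_id = payload["item_id"]
        elif name == "response.content_part.added":
            gen.modalities = ("audio", "text")
        elif name == "response.audio.delta":
            # "ms" is an assumed field used only for this illustration.
            gen.received_audio_ms += payload["ms"]
    return gen


# Steps 1-3 have happened; the user interrupts before any audio delta.
events_before_interrupt = [
    ("response.created", {}),
    ("response.output_item.added", {"item_id": "msg_1"}),
    ("response.content_part.added", {}),
]
gen = replay(events_before_interrupt)

# Upper bound on what could have been played at the moment of interruption.
max_played_ms = gen.received_audio_ms
```

At this point the message already has an id and audio modality, yet no audio exists to truncate, which is the state in which a truncate with audio_end_ms=0 is rejected.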

Fix

In RealtimeSession.truncate(), when audio_end_ms == 0:

  • If the server already holds the (unplayed) item β†’ delete it (prevents dangling items in remote chat ctx)
  • If the item was never committed to the server β†’ no-op (nothing to truncate or delete)

When audio_end_ms > 0, behavior is unchanged (truncate event sent as before).
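The three branches described above can be summarized as a pure function. This is a minimal sketch under stated assumptions, not the code in this PR: `plan_truncate` and `remote_item_ids` are hypothetical names, events are shown as plain dicts rather than the plugin's event classes, and `content_index=0` is assumed for illustration.

```python
from typing import Optional


def plan_truncate(
    message_id: str, audio_end_ms: int, remote_item_ids: set
) -> Optional[dict]:
    """Return the client event to send for an interruption, or None for a no-op."""
    if audio_end_ms > 0:
        # Some audio was played: truncate as before.
        return {
            "type": "conversation.item.truncate",
            "item_id": message_id,
            "content_index": 0,  # assumed for this sketch
            "audio_end_ms": audio_end_ms,
        }
    if message_id in remote_item_ids:
        # Nothing was played but the server holds the item: delete it.
        return {"type": "conversation.item.delete", "item_id": message_id}
    # Nothing was played and the server never saw the item: do nothing.
    return None
```

For example, `plan_truncate("msg_1", 0, {"msg_1"})` yields a delete event, while `plan_truncate("msg_1", 0, set())` yields `None`.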

Tests

Three new tests in test_openai_realtime_chat_ctx.py:

  1. test_truncate_deletes_item_when_no_audio_played: item on server → delete event
  2. test_truncate_noop_when_no_audio_and_item_not_on_server: item not on server → no-op
  3. test_truncate_sends_event_when_audio_played: audio_end_ms > 0 → truncate event (regression check)
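For readers who want to see the shape of these checks without opening the test file, here is a rough, self-contained sketch. It is not the content of test_openai_realtime_chat_ctx.py: `FakeSession` is a hypothetical stand-in that records outgoing events as tuples, and its `truncate` only mimics the behavior described in this PR.

```python
class FakeSession:
    """Hypothetical stand-in that records events instead of sending them."""

    def __init__(self, remote_item_ids):
        self.remote_item_ids = set(remote_item_ids)
        self.sent = []

    def truncate(self, *, message_id, audio_end_ms):
        if audio_end_ms > 0:
            self.sent.append(
                ("conversation.item.truncate", message_id, audio_end_ms)
            )
        elif message_id in self.remote_item_ids:
            self.sent.append(("conversation.item.delete", message_id))
        # Otherwise: no audio played and item unknown to the server, so no-op.


def test_truncate_deletes_item_when_no_audio_played():
    session = FakeSession(remote_item_ids={"msg_1"})
    session.truncate(message_id="msg_1", audio_end_ms=0)
    assert session.sent == [("conversation.item.delete", "msg_1")]


def test_truncate_noop_when_no_audio_and_item_not_on_server():
    session = FakeSession(remote_item_ids=set())
    session.truncate(message_id="msg_1", audio_end_ms=0)
    assert session.sent == []


def test_truncate_sends_event_when_audio_played():
    session = FakeSession(remote_item_ids={"msg_1"})
    session.truncate(message_id="msg_1", audio_end_ms=480)
    assert session.sent == [("conversation.item.truncate", "msg_1", 480)]
```

The functions follow pytest naming, so a file containing this sketch would be collected by pytest without extra configuration.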

Fixes #6157

When a RealtimeModel is interrupted before any audio frame has been
committed (audio_end_ms == 0), sending a conversation.item.truncate
causes the OpenAI Realtime API to reject with:
  'Only model output audio messages can be truncated'

Fix: when audio_end_ms is 0, check whether the server already holds the
item. If so, delete it (so it doesn't dangle in the remote chat ctx);
otherwise do nothing.

Fixes livekit#6157

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
@C1-BA-B1-F3 C1-BA-B1-F3 requested a review from a team as a code owner June 26, 2026 14:36

@devin-ai-integration devin-ai-integration Bot left a comment

Devin Review found 1 potential issue.

Comment on lines +1627 to +1642
else:
    # No audio was committed yet; truncating with audio_end_ms=0
    # causes the Realtime API to reject with "Only model output
    # audio messages can be truncated". If the server already holds
    # the (unplayed) item, delete it so it doesn't dangle.
    remote_ids = {
        item.id for item in self._remote_chat_ctx.to_chat_ctx().items
    }
    if message_id in remote_ids:
        self.send_event(
            ConversationItemDeleteEvent(
                type="conversation.item.delete",
                item_id=message_id,
                event_id=utils.shortuuid("chat_ctx_delete_"),
            )
        )

🚩 Potential stale remote context if update_chat_ctx races with the delete

After truncate() fires a delete event (lines 1636-1642), the item remains in _remote_chat_ctx until the server confirms with conversation.item.deleted. If update_chat_ctx (agent_activity.py:3654) runs before that confirmation arrives, its diff computation will still see the item in the remote context. This means it won't try to recreate it, even though the server has already deleted it, leaving the remote and local contexts out of sync. In practice this is unlikely to cause user-visible issues because: (1) the update_chat_ctx call at line 3652 requires any_skipped to be true, which is a different condition from the audio_end_ms==0 partial-play scenario; and (2) even if it does occur, the next full update_chat_ctx call would reconcile the difference. Still, this is a potential timing concern worth being aware of.
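The timing window can be illustrated with a toy model. This is a sketch under loud assumptions and does not reflect the actual update_chat_ctx implementation: `ToyMirror` and its methods are hypothetical, the server is modeled as applying the delete immediately, and the diff is reduced to a set difference.

```python
class ToyMirror:
    """Hypothetical model of a client mirror that lags behind the server."""

    def __init__(self, item_ids):
        self.server = set(item_ids)  # what the server actually holds
        self.mirror = set(item_ids)  # client's view of the server state

    def send_delete(self, item_id):
        # Assumed: the server applies the delete right away, but the
        # client mirror is only updated once the acknowledgement arrives.
        self.server.discard(item_id)

    def on_deleted_ack(self, item_id):
        self.mirror.discard(item_id)

    def items_to_create(self, desired_ids):
        # Simplified diff: create whatever the mirror does not list.
        return set(desired_ids) - self.mirror


ctx = ToyMirror({"msg_1"})
ctx.send_delete("msg_1")

# Diff computed inside the window: the mirror still lists msg_1, so
# nothing is recreated even though the server no longer has it.
during_window = ctx.items_to_create({"msg_1"})

ctx.on_deleted_ack("msg_1")

# Diff computed after the acknowledgement: msg_1 is seen as missing.
after_ack = ctx.items_to_create({"msg_1"})
```

In this model the stale view lasts only until the acknowledgement is processed, which is consistent with the comment's point that a later full sync would reconcile the difference.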

Development

Successfully merging this pull request may close these issues.

RealtimeModel: "Only model output audio messages can be truncated" when interrupted before first audio frame

1 participant