fix: detect and recover from audio input stream stalls (#6075) #6255

Open
C1-BA-B1-F3 wants to merge 1 commit into livekit:main from C1-BA-B1-F3:fix/audio-stall-detection

Conversation

@C1-BA-B1-F3

Summary

Fixes #6075. When a microphone cable becomes loose (partial disconnection), the audio stream silently stops producing frames and the agent never recovers.

Root Cause

_ParticipantInputStream._forward_task uses async for event in stream: which blocks indefinitely when the underlying audio source stops producing frames. No timeout, no error, no recovery.
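The blocking behaviour described above can be reproduced outside LiveKit. The following is a minimal sketch, not the library's code: `stalled_source` and `consume` are hypothetical names, and the stalled track is simulated with an `asyncio.Event` that is never set. The `async for` loop only ends because an external `asyncio.wait_for` cancels it.

```python
from __future__ import annotations

import asyncio
from typing import AsyncIterator, List, Tuple


async def stalled_source(frames_before_stall: int) -> AsyncIterator[int]:
    """Hypothetical stand-in for an audio stream that goes quiet."""
    for i in range(frames_before_stall):
        yield i
    # Simulate the stall: no further frames, no error, no end-of-stream.
    await asyncio.Event().wait()


async def consume(timeout: float) -> Tuple[List[int], bool]:
    """Read with a plain `async for`; report whether an outside timeout was needed."""
    received: List[int] = []

    async def reader() -> None:
        async for frame in stalled_source(3):
            received.append(frame)

    try:
        await asyncio.wait_for(reader(), timeout=timeout)
        return received, False
    except asyncio.TimeoutError:
        # Without this external timeout the reader would wait forever.
        return received, True


if __name__ == "__main__":
    print(asyncio.run(consume(0.1)))
```

The loop itself never raises or returns once the source goes quiet, which is why the fix needs a separate mechanism to notice the silence.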

Fix

Added a watchdog-based stall detection mechanism to _ParticipantInputStream:

  • A concurrent watchdog task monitors time since the last received frame
  • When no frames arrive within stall_timeout (default 10s), the watchdog logs a warning and closes the stalled stream
  • The _forward_task loop then attempts to reopen the stream from the same track (up to 3 retries)
  • If recovery fails or max retries are exhausted, the task exits cleanly

The stall detection is inherited by _ParticipantAudioInputStream through the super()._forward_task() call chain. The stall_timeout is configurable via the constructor.
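The watchdog and recovery loop described above could look roughly like the sketch below. It is an illustration under stated assumptions, not the code in this PR: `FakeStream`, `read_with_stall_detection`, `forward`, and `open_stream` are hypothetical names, and the sketch uses `time.monotonic()` rather than `time.time()` so that wall-clock adjustments cannot affect the stall measurement.

```python
from __future__ import annotations

import asyncio
import time
from typing import Callable, List, Optional, Tuple


class FakeStream:
    """Hypothetical stand-in for an audio stream: yields a few frames, then stalls."""

    def __init__(self, n_frames: int) -> None:
        self._remaining = n_frames
        self._closed = asyncio.Event()

    def __aiter__(self) -> "FakeStream":
        return self

    async def __anext__(self) -> int:
        if self._remaining > 0 and not self._closed.is_set():
            self._remaining -= 1
            await asyncio.sleep(0)  # let other tasks run between frames
            return self._remaining
        await self._closed.wait()  # stalled: blocks until aclose() is called
        raise StopAsyncIteration

    async def aclose(self) -> None:
        self._closed.set()


async def read_with_stall_detection(
    stream: FakeStream, on_frame: Callable[[int], None], stall_timeout: float
) -> bool:
    """Forward frames; return True if the watchdog closed the stream for stalling."""
    last_frame = time.monotonic()
    stalled = False

    async def watchdog() -> None:
        nonlocal stalled
        while True:
            await asyncio.sleep(stall_timeout / 4)
            if time.monotonic() - last_frame > stall_timeout:
                stalled = True
                await stream.aclose()  # unblocks the reader below
                return

    task = asyncio.create_task(watchdog())
    try:
        async for frame in stream:
            last_frame = time.monotonic()
            on_frame(frame)
    finally:
        task.cancel()
        await asyncio.gather(task, return_exceptions=True)
    return stalled


async def forward(
    open_stream: Callable[[], Optional[FakeStream]],
    on_frame: Callable[[int], None],
    stall_timeout: float = 10.0,
    max_recoveries: int = 3,
) -> int:
    """Recovery loop; returns the number of recovery attempts made."""
    recoveries = 0
    stream = open_stream()
    while stream is not None:
        stalled = await read_with_stall_detection(stream, on_frame, stall_timeout)
        if not stalled or recoveries >= max_recoveries:
            break
        recoveries += 1  # counts attempts, whether or not reopening succeeds
        stream = open_stream()  # None models "track no longer available"
    return recoveries


async def demo() -> Tuple[int, int]:
    frames: List[int] = []
    pending = [FakeStream(2), FakeStream(3)]

    def open_stream() -> Optional[FakeStream]:
        return pending.pop(0) if pending else None

    recoveries = await forward(open_stream, frames.append, stall_timeout=0.05)
    return len(frames), recoveries
```

In `demo`, both fake streams stall after delivering their frames, and the second reopen attempt finds no track, so the loop exits after two attempts with five frames forwarded.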

Changes

livekit-agents/livekit/agents/voice/room_io/_input.py:

  • Added stall_timeout: float = 10.0 parameter to _ParticipantInputStream.__init__
  • Refactored _forward_task into a recovery loop + _read_stream_with_stall_detection helper
  • Watchdog uses asyncio.wait_for with asyncio.shield on a cancellation event
  • Recovery re-creates the stream via _create_stream when the track is still available
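The diff itself is not shown on this page, so the exact shape of the watchdog is unknown. One common way to combine `asyncio.wait_for` with `asyncio.shield` on a stop event is sketched below, assuming a hypothetical `watchdog` coroutine with `last_frame_time` and `on_stall` callbacks. The shield keeps the long-lived `stop.wait()` future alive when `wait_for` times out, so the same future can be awaited again on the next poll.

```python
from __future__ import annotations

import asyncio
import time
from typing import Awaitable, Callable, List, Tuple


async def watchdog(
    stop: asyncio.Event,
    last_frame_time: Callable[[], float],
    on_stall: Callable[[], Awaitable[None]],
    stall_timeout: float = 10.0,
    poll_interval: float = 1.0,
) -> bool:
    """Return True if a stall was detected, False if `stop` was set first."""
    stop_waiter = asyncio.ensure_future(stop.wait())
    try:
        while True:
            try:
                # On timeout, wait_for cancels only the shield wrapper,
                # not stop_waiter itself.
                await asyncio.wait_for(asyncio.shield(stop_waiter), timeout=poll_interval)
                return False  # the reader asked the watchdog to stop
            except asyncio.TimeoutError:
                if time.monotonic() - last_frame_time() > stall_timeout:
                    await on_stall()
                    return True
    finally:
        stop_waiter.cancel()


async def demo() -> Tuple[bool, bool, List[str]]:
    stalls: List[str] = []

    async def on_stall() -> None:
        stalls.append("closed")  # a real caller would close the stream here

    frozen = time.monotonic()  # pretend no frame ever arrives after this point
    stalled = await watchdog(
        asyncio.Event(), lambda: frozen, on_stall, stall_timeout=0.05, poll_interval=0.02
    )

    stop = asyncio.Event()
    stop.set()
    stopped_first = not await watchdog(
        stop, time.monotonic, on_stall, stall_timeout=0.05, poll_interval=0.02
    )
    return stalled, stopped_first, stalls
```

Waiting on the stop event instead of a bare `asyncio.sleep` lets the watchdog exit promptly when the reader finishes, rather than lingering for up to one poll interval.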

tests/test_room_io.py:

  • test_stall_detection_and_recovery: verifies that a stall is detected and the stream is reopened
  • test_stall_detection_max_retries_exhausted: verifies graceful exit after 3 failed recovery attempts
  • test_no_stall_normal_stream: verifies that normal streams are unaffected

Testing

All 10 tests pass:

tests/test_room_io.py::test_stall_detection_and_recovery PASSED
tests/test_room_io.py::test_stall_detection_max_retries_exhausted PASSED
tests/test_room_io.py::test_no_stall_normal_stream PASSED

🤖 Generated with Claude Code

When a microphone cable becomes loose (partial disconnection), the audio
stream silently stops producing frames. The _forward_task loop blocked
indefinitely on the async iterator with no timeout or recovery mechanism.

Add a watchdog-based stall detection to _ParticipantInputStream that:
- Monitors time since last received frame via a concurrent watchdog task
- Logs a warning when no frames arrive within stall_timeout (default 10s)
- Closes the stalled stream and reopens it from the same track
- Retries up to 3 times before giving up

The stall detection is inherited by _ParticipantAudioInputStream through
the super()._forward_task() call chain.

Fixes livekit#6075

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
@C1-BA-B1-F3 C1-BA-B1-F3 requested a review from a team as a code owner June 26, 2026 18:42

@devin-ai-integration devin-ai-integration Bot left a comment

Devin Review found 3 potential issues.

🚩 Video streams get stall detection with default 10s timeout and no way to configure it

_ParticipantVideoInputStream.__init__ at livekit-agents/livekit/agents/voice/room_io/_input.py:476-485 does not accept or pass a stall_timeout parameter, so it inherits the default 10-second stall detection from the base class. This may be appropriate for audio but could cause false positives for video streams (e.g., screen share sources that go idle). Worth verifying whether stall detection is desired for video at all, or if it should be audio-only.

(Refers to lines 475-489)
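One way to address this would be to let `None` disable the watchdog and have each subclass choose its own default. The sketch below is hypothetical: the class names are illustrative stand-ins, not the real `_ParticipantInputStream` hierarchy, and treating `None` as "disabled" is an assumption rather than behaviour from this PR.

```python
from __future__ import annotations

from typing import Optional


class InputStreamSketch:
    """Hypothetical base class; not the real _ParticipantInputStream."""

    def __init__(self, *, stall_timeout: Optional[float] = 10.0) -> None:
        if stall_timeout is not None and stall_timeout <= 0:
            raise ValueError("stall_timeout must be positive or None")
        # Assumption: None means stall detection is turned off entirely.
        self._stall_timeout = stall_timeout

    def is_stalled(self, seconds_since_last_frame: float) -> bool:
        if self._stall_timeout is None:
            return False
        return seconds_since_last_frame > self._stall_timeout


class AudioInputStreamSketch(InputStreamSketch):
    def __init__(self, *, stall_timeout: Optional[float] = 10.0) -> None:
        super().__init__(stall_timeout=stall_timeout)


class VideoInputStreamSketch(InputStreamSketch):
    # Video defaults to "off" so an idle screen share is not treated as a
    # stall, but callers can still opt in with an explicit timeout.
    def __init__(self, *, stall_timeout: Optional[float] = None) -> None:
        super().__init__(stall_timeout=stall_timeout)
```

With these defaults, `VideoInputStreamSketch().is_stalled(3600)` is false, while `VideoInputStreamSketch(stall_timeout=30).is_stalled(31)` is true.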

Comment on lines +232 to +239
    async for event in stream:
        if not self._attached:
            # drop frames if the stream is detached
            continue
        last_frame_time = time.time()
        frame = cast(T, event.frame)
        self._process_frame(frame)
        await self._data_ch.send(frame)

🔴 Audio input falsely detected as stalled when listening is temporarily paused, permanently killing the audio stream

Received frames do not reset the stall timer when the stream is paused (continue at livekit-agents/livekit/agents/voice/room_io/_input.py:235 skips the timestamp update at line 236), so the watchdog closes and reopens the stream until all recovery attempts are exhausted and the forwarding loop exits permanently.

Impact: After audio input is disabled for longer than 10 seconds (e.g., during a warm transfer hold), re-enabling audio produces silence because no forwarding task is running.

Mechanism: detach flag blocks timestamp update, triggering false stall recovery

When set_audio_enabled(False) is called (e.g., warm_transfer.py:185), on_detached() sets self._attached = False. Inside _read_stream_with_stall_detection, the async for event in stream loop hits the if not self._attached: continue guard at lines 233-235, which correctly drops the frame but also skips last_frame_time = time.time() at line 236.

The watchdog (lines 206-227) compares time.time() - last_frame_time against self._stall_timeout. Since last_frame_time is never refreshed while detached, after stall_timeout seconds (default 10s) the watchdog calls await stream.aclose() at line 226.

The outer retry loop in _forward_task (lines 159-193) then tries to recover by creating a new stream. But the new stream also encounters the same detached state, so its watchdog also fires. After max_recoveries=3 failed attempts, the while loop exits and _forward_task returns.

When the warm transfer completes and _set_io_enabled(True) is called (warm_transfer.py:266), on_attached() sets self._attached = True, but there is no longer a running _forward_task to read frames from the track, so audio is permanently dead for this session.

Suggested change
-   async for event in stream:
-       if not self._attached:
-           # drop frames if the stream is detached
-           continue
-       last_frame_time = time.time()
-       frame = cast(T, event.frame)
-       self._process_frame(frame)
-       await self._data_ch.send(frame)
+   async for event in stream:
+       last_frame_time = time.time()
+       if not self._attached:
+           # drop frames if the stream is detached
+           continue
+       frame = cast(T, event.frame)
+       self._process_frame(frame)
+       await self._data_ch.send(frame)
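The effect of the reordering can be checked with a small deterministic model. This is a simplified sketch, not the PR's code: `watchdog_would_fire` is a hypothetical helper that replays frame arrival times and evaluates the watchdog condition at each arrival, whereas the real watchdog runs concurrently.

```python
from typing import Callable, Sequence


def watchdog_would_fire(
    frame_times: Sequence[float],
    is_attached: Callable[[float], bool],
    stall_timeout: float,
    refresh_before_check: bool,
) -> bool:
    """Replay frame arrivals and report whether the stall condition is ever met."""
    last_frame_time = frame_times[0]
    for now in frame_times:
        # What the watchdog would see at the moment this frame arrives.
        if now - last_frame_time > stall_timeout:
            return True
        if refresh_before_check:
            last_frame_time = now  # suggested order: refresh for every frame
        if not is_attached(now):
            continue  # frame is dropped while detached
        if not refresh_before_check:
            last_frame_time = now  # original order: refresh only when attached


    return False


# One frame per second for 30 s; input is detached from t=5 s until t=20 s.
FRAME_TIMES = [float(t) for t in range(30)]


def attached(t: float) -> bool:
    return not (5.0 <= t < 20.0)


original = watchdog_would_fire(FRAME_TIMES, attached, 10.0, refresh_before_check=False)
suggested = watchdog_would_fire(FRAME_TIMES, attached, 10.0, refresh_before_check=True)
```

In this model the original ordering trips the 10-second threshold during the 15-second detached window even though frames keep arriving, while the suggested ordering never does.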
Comment on lines +240 to +243
    except asyncio.CancelledError:
        raise
    except Exception:
        pass

🚩 Stream errors silently swallowed during stall detection

The except Exception: pass block at livekit-agents/livekit/agents/voice/room_io/_input.py:242-243 catches and silently discards all non-cancellation exceptions from the stream iteration. In the previous code, such exceptions would propagate up through _forward_task and be logged by the @log_exceptions decorator. Now they are invisible. This was likely added to handle exceptions raised when the watchdog closes the stream mid-iteration, but it also swallows legitimate stream errors (e.g., codec failures, protocol errors). Consider at minimum logging the exception at debug/warning level before suppressing it.
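A sketch of what logging before suppressing could look like follows. The names are hypothetical (`read_frames`, the `closed_by_watchdog` callback, and the logger name are not from the PR), and using the log level to separate expected post-close errors from unexpected ones is an assumption layered on top of the reviewer's suggestion.

```python
from __future__ import annotations

import asyncio
import logging
from typing import AsyncIterator, Callable

logger = logging.getLogger("stall_detection_sketch")  # hypothetical logger name


async def read_frames(
    stream: AsyncIterator[int],
    on_frame: Callable[[int], None],
    closed_by_watchdog: Callable[[], bool],
) -> None:
    """Forward frames; log stream errors instead of discarding them silently."""
    try:
        async for frame in stream:
            on_frame(frame)
    except asyncio.CancelledError:
        raise  # never swallow cancellation
    except Exception:
        if closed_by_watchdog():
            # Expected noise: the watchdog closed the stream mid-iteration.
            logger.debug("stream raised after watchdog close", exc_info=True)
        else:
            # Unexpected: keep it visible, as it was when the exception
            # propagated to the decorator that logs task failures.
            logger.warning("unexpected error while reading stream", exc_info=True)
```

Passing `exc_info=True` keeps the traceback in the log record, so a codec or protocol failure remains diagnosable even though the exception no longer propagates.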

Development

Successfully merging this pull request may close these issues.

Audio stops in LiveKit client after microphone cable looseness, but microphone hardware works fine and client doesn't crash
