fix(eot): tighten eot cancellation by speech activity #6274

Open
chenghao-mou wants to merge 1 commit into main from chenghao/fix/tighten-eot-cancel-by-speech
@chenghao-mou

Previously, end-of-turn (EOT) cancellation could also be triggered by the inference-done event, which background noise made flaky. This PR drops that path, so cancellation is now based only on the STT/VAD start-of-speech (SOS) signal.
@chenghao-mou chenghao-mou requested a review from a team as a code owner June 29, 2026 21:19

@devin-ai-integration devin-ai-integration Bot left a comment

Devin Review found 1 potential issue.

Comment on lines 1585 to 1593
  if self._end_of_turn_task is not None:
      # TODO(theomonnom): disallow cancel if the extra sleep is done
      self._end_of_turn_task.cancel()

- task_func = (
-     _bounce_eou_task_with_speaking_guard
-     if isinstance(self._turn_detector, _StreamingTurnDetector)
-     else _bounce_eou_task
- )
  # copy the last_speaking_time before awaiting (the value can change)
  self._end_of_turn_task = asyncio.create_task(
-     task_func(
+     _bounce_eou_task(
          self._last_speaking_time,
          self._last_final_transcript_time,
          self._user_turn_start,

🚩 Wider cancellation window when user resumes speech during endpointing

The removed _bounce_eou_task_with_speaking_guard raced _user_speaking_event.wait() against the bounce task. That event was set on INFERENCE_DONE with raw_accumulated_speech > 0 (audio_recognition.py:1246), which fires ~200-250ms before START_OF_SPEECH (gated by VAD's min_speech_duration). The new code relies solely on SOS cancelling _end_of_turn_task (audio_recognition.py:1236-1237), creating a timing window where the bounce could complete before SOS fires.

Concrete scenario: EOS at T=0 → bounce starts with min_delay=0.3s → user resumes at T=0.1 → INFERENCE_DONE detects speech at T=0.15 (old code cancels here) → bounce completes at T≈0.3 → SOS fires at T≈0.35 (too late in new code).
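The timing in this scenario can be reproduced with a small standalone asyncio sketch. It is not the code from audio_recognition.py: `bounce`, `run_scenario`, and `MIN_DELAY` are hypothetical names, the bounce task is reduced to a single sleep, and the 0.15s and 0.35s cancel times are taken from the scenario above rather than measured.

```python
import asyncio

MIN_DELAY = 0.3  # assumed endpointing min_delay, as in the scenario above


async def bounce(result: dict) -> None:
    # Hypothetical stand-in for the bounce task: wait min_delay,
    # then commit the end of turn.
    await asyncio.sleep(MIN_DELAY)
    result["committed"] = True


async def run_scenario(cancel_at: float) -> bool:
    """Return True if the turn was committed before the cancel signal arrived."""
    result = {"committed": False}
    task = asyncio.create_task(bounce(result))
    # The cancel signal (INFERENCE_DONE in the old code, SOS in the new code)
    # arrives cancel_at seconds after end of speech.
    await asyncio.sleep(cancel_at)
    task.cancel()  # has no effect if the bounce task already finished
    try:
        await task
    except asyncio.CancelledError:
        pass
    return result["committed"]


async def main() -> None:
    # Old path: cancel on INFERENCE_DONE at T=0.15, before the bounce completes.
    print("cancel at 0.15s, committed:", await run_scenario(0.15))
    # New path: cancel on SOS at T=0.35, after the bounce has completed.
    print("cancel at 0.35s, committed:", await run_scenario(0.35))


if __name__ == "__main__":
    asyncio.run(main())
```

With these assumed numbers, the first scenario cancels the bounce before it commits, while the second commits the turn even though the user has already resumed speaking.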

However, this appears intentional: (1) text-based turn detectors always used raw _bounce_eou_task with this same gap, so the PR makes behavior consistent; (2) the old _user_speaking_event had robustness issues with sub-threshold spikes getting stuck, which was the regression the old TestSubThresholdSpeakingSpike tests covered; (3) all _run_eou_detection call sites already ensure _speaking=False, making the entry guard redundant.

(Refers to lines 1585-1595)
