feat(agent_session): allow update_options(stt=, tts=) for runtime STT/TTS swap#6235
feat(agent_session): allow update_options(stt=, tts=) for runtime STT/TTS swap#6235SamarthUrs18 wants to merge 24 commits into
Conversation
The activity method previously only rewired the audio_recognition pipeline, leaving session._stt / agent._stt untouched. A caller invoking update_options on the activity directly would see activity.stt resolve to the OLD instance on the next read, so the per-call STT lookup inside stt_node kept using the old WebSocket. Mirror the swap onto self._session._stt / self._agent._stt (and the tts counterparts) so the method is symmetric regardless of whether it was invoked via AgentSession.update_options (the normal path) or directly. Also tightens the tts parameter type from NotGivenOr[object] to NotGivenOr[tts.TTS | None] for symmetry with stt. Addresses the asymmetry flagged by Devin Review on PR livekit#6235.
Two new regression tests for the asymmetry flagged by Devin Review on PR livekit#6235: when update_options is invoked on the AgentActivity directly (rather than going through AgentSession), it must still update session._stt / agent._stt / session._tts / agent._tts, not just rewire the audio_recognition pipeline. - test_update_options_directly_on_activity_swaps_state: session has its own STT/TTS, agent doesn't β verify session, agent, and activity all read the new instances after a direct activity.update_options call. - test_update_options_directly_on_activity_with_agent_bound_stt: agent has its own STT β verify the agent-level mirror also fires from a direct activity call.
|
you can achieve this by |
The activity method previously only rewired the audio_recognition pipeline, leaving session._stt / agent._stt untouched. A caller invoking update_options on the activity directly would see activity.stt resolve to the OLD instance on the next read, so the per-call STT lookup inside stt_node kept using the old WebSocket. Mirror the swap onto self._session._stt / self._agent._stt (and the tts counterparts) so the method is symmetric regardless of whether it was invoked via AgentSession.update_options (the normal path) or directly. Also tightens the tts parameter type from NotGivenOr[object] to NotGivenOr[tts.TTS | None] for symmetry with stt. Addresses the asymmetry flagged by Devin Review on PR livekit#6235.
Two new regression tests for the asymmetry flagged by Devin Review on PR livekit#6235: when update_options is invoked on the AgentActivity directly (rather than going through AgentSession), it must still update session._stt / agent._stt / session._tts / agent._tts, not just rewire the audio_recognition pipeline. - test_update_options_directly_on_activity_swaps_state: session has its own STT/TTS, agent doesn't β verify session, agent, and activity all read the new instances after a direct activity.update_options call. - test_update_options_directly_on_activity_with_agent_bound_stt: agent has its own STT β verify the agent-level mirror also fires from a direct activity call.
β¦m new STT/TTS Long's review note: update_options should not reach across into agent-owned plugin state. This drops the agent._stt / agent._tts mutations that were previously added in 1210eb3, so the activity-level swap only writes to session._stt / session._tts. Agent-bound state stays untouched and continues to take precedence via the existing activity.stt / activity.tts resolution order; callers wanting a full redirect can use session.update_agent. While in there, picked up Devin's prewarm gap on the same path: prewarm() is now called on the new STT/TTS instances during the swap so providers that eagerly open connections (Sarvam TTS via its connection pool, OpenAI TTS, inference.TTS) don't pay first-call handshake latency after a swap. Base-class prewarm() is a no-op, so this is free for plugins that don't override it.
Long's review note: update_options should only manage session-owned state, never reach across into agent-owned plugins. The agent._stt / agent._tts mirrors that were previously added in 3b34430 are removed here; the docstring is updated to make the precedence rule explicit. This mirrors the same change in AgentActivity.update_options so both entry points stay consistent. Agent-bound state is now never overwritten by update_options on either path.
β¦d prewarms new instances Renames and flips test_update_options_stt_mirrors_to_agent_when_agent_has_own_stt to test_update_options_does_not_overwrite_agent_stt β the new contract is that agent-bound STT is preserved across update_options (Long's review note). Updates test_update_options_directly_on_activity_with_agent_bound_stt with the same flipped assertion. Adds two new tests for the prewarm behavior introduced in the previous commit: - test_update_options_prewarms_new_stt_and_tts uses Spy subclasses to assert prewarm() is called once on each new instance and not on the old ones. - test_update_options_prewarm_skipped_when_stt_none asserts passing stt=None skips prewarm (would otherwise AttributeError on None). Both tests follow the manual-wiring pattern used elsewhere in this file to avoid the leaked _SegmentSynchronizerImpl tasks that session.start() triggers in the absence of real audio input.
Devin Review on PR livekit#6235: when the agent was constructed with its own STT (agent._stt is set), the session-level swap was triggering an unnecessary audio_recognition.update_stt call. The pipeline restart tears down the in-progress stream, clears the transcript buffer, and resets interruption state β all for no functional change, because activity.stt still resolves to agent._stt (unchanged by the swap). Adds a guard: only rewire the pipeline when the agent does NOT own its STT. When the agent owns its STT, the swap is invisible at the activity layer (per Long's review note on the agent-bound contract) and a rewire would just disrupt the live pipeline. Also documents that the framework does not take ownership of STT/TTS instances β the caller is responsible for closing old instances if desired (consistent with AgentSession's constructor contract).
β¦hen agent owns STT Regression test for the agent-bound guard added in the previous commit. Spies on audio_recognition.update_stt and asserts it is NOT called when the agent was constructed with its own STT and a session-level swap is performed (because the effective STT is unchanged).
d77f4f2 to
a4a4fea
Compare
β¦gent-bound state When the current agent was constructed with its own stt (e.g. explicitly Agent(stt=None) or Agent(stt=SomeSTT)), session-level update_options(stt=X) silently stores X on session._stt but activity.stt continues to resolve to agent._stt β the swap is invisible to the activity. The previous fix (dropped agent-mirror per Long's review) preserved the agent-bound contract but introduced a silent failure mode for callers who expected the swap to take effect. Log a WARNING when the caller passes a non-None stt and the agent-bound contract shadows it, pointing them at session.update_agent(...) or constructing the agent without explicit stt. The no-op case (agent_owns_stt and resolved_stt is None) is left silent: consistent β caller is disabling STT and the agent-bound value still wins.
β¦ swap Regression tests for the agent-bound-STT warning path: - test_update_options_does_not_rewire_pipeline_when_agent_owns_stt now asserts the warning fires (agent has explicit non-None STT). - test_update_options_warns_when_agent_stt_is_explicit_none covers the Agent(stt=None) edge case explicitly. - test_update_options_no_warning_when_agent_has_no_stt asserts the common path stays silent (no warning when agent has no stt). Tests use unittest.mock.patch on agent_activity.logger to capture the warning call without modifying live framework state.
Co-authored-by: devin-ai-integration[bot] <158243242+devin-ai-integration[bot]@users.noreply.github.com>
β¦t-bound state Apply the symmetric agent_owns_tts guard to the TTS branch: when the agent binds its own TTS, the session-level swap is silently shadowed by the agent-bound value on the next read. Log a warning (mirror of the STT one) and skip prewarm so providers like Sarvam TTS that eagerly open WebSockets don't waste a pool connection on an instance that will never be used. Also flattens the STT branch nesting (single combined condition replacing the awkward 'if x: if y: ...' shape) and moves stt.prewarm() inside the safe else-branch. The earlier remote fix attempted this but left prewarm outside the if/elif, so a blocked STT swap still opened provider connections. Closing that gap. Devin Review on PR livekit#6235: 3 flags addressed: 1. TTS branch missing agent-owns-tts guard 2. Prewarm called on new STT even when agent-bound STT shadows it 3. TTS swap silently fails without warning when agent owns its TTS
β¦ent-bound state Adds test_update_options_tts_warns_and_skips_prewarm_when_agent_owns_tts. Mirrors the existing STT-side regression test but for TTS: - asserts activity.tts is unchanged (agent's TTS keeps precedence) - asserts prewarm() is NOT called on the unused new instance - asserts the warning includes 'tts' and 'no-op' Closes the test coverage gap from Devin's symmetric-TTS review flag on PR livekit#6235.
β¦ent_owns_stt and resolved_stt is not None Devin Review on PR livekit#6235: pipeline teardown when agent owns STT and session passes stt=None. Bug: when agent_owns_stt=True and resolved_stt=None, the old guard condition 'agent_owns_stt and resolved_stt is not None' evaluated False, falling through to the elif which called self._audio_recognition.update_stt(None). That call cancels the consumer task and tears down the STT pipeline. But activity.stt still resolves to the agent's _stt (still configured), so speech recognition would silently stop for the rest of the session. Fix: split the agent_owns_stt guard from the warning condition. When the agent owns its STT, skip the rewire unconditionally β the caller can never redirect activity.stt via session-level swap. Warning only fires when the caller passes a non-None STT (the 'you tried to set' case), since the stt=None case is consistent with the resolution order. Symmetric TTS branch was already correct (no pipeline to tear down, only prewarm waste).
β¦ pipeline Regression test for the silent speech-recognition-stop bug caught by Devin Review on PR livekit#6235. Confirms that when agent was constructed with its own STT and session.update_options(stt=None) is called, the pipeline rewire is skipped entirely (no update_stt(None) call), activity.stt still resolves to agent._stt, and no warning is logged (the caller disabled STT, agent wins, that's consistent with the resolution order β warning only fires when caller passes a non-None STT). Verified failing-on-buggy-code: with the production fix reverted, the test fails with 'assert [None] == []' (update_stt(None) was called, proving the bug existed).
β¦ STT and TTS swaps Devin Review on PR livekit#6235: 'Metrics and error events silently stop after swapping the speech-to-text or text-to-speech component at runtime.' The bug: _start_session (lines ~906-908, ~912-914) attaches metrics_collected/error listeners to whichever STT/TTS instance is current at start time. _stop_session (lines ~1188-1194) detaches from whichever instance is current at teardown time. update_options previously rewired the pipeline and updated session._stt/_tts but never moved the listeners, so after a runtime swap: 1. The new instance never emits metrics_collected or error events (silently dropping metrics and error forwarding). 2. The old instance retains orphaned listener references (preventing GC and routing stale events). 3. _stop_session calls .off() on the new instance (no-op) instead of the old one, leaking the old listeners. The fix has two parts: 1. AgentSession.update_options no longer writes self._stt / self._tts directly. It delegates to AgentActivity.update_options, which now reads the prior session._stt/_tts, mutates it, and migrates listeners in one method. This ordering is required because the listener migration captures the old instance before mutating β and if the session pre-mutates, the captured reference is already the new one (caught via testing β the regression test failed without this ordering). 2. AgentActivity.update_options adds listener migration on the common-path (no-agent-STT) branch for both STT and TTS: - capture old before mutation - rewire / prewarm as before - .off() metrics_collected and error on old (if stt.STT / tts.TTS) - .on() the same on new (if it isn't the same object) Skipped on the agent_owns_stt / agent_owns_tts branches where the effective instance doesn't change. Verified: 17/17 update_options tests pass (including 3 new regression tests for listener migration). With the listener migration reverted, the new tests fail with 'metrics_collected not in new_on'.
β¦rate on swap
Three new tests for the listener-migration fix:
- test_update_options_stt_migrates_metrics_collected_and_error_listeners:
asserts after session.update_options(stt=new_stt), new_stt has both
listeners attached and old_stt has both detached.
- test_update_options_tts_migrates_metrics_collected_and_error_listeners:
mirror of the above for TTS.
- test_update_options_no_listener_swap_when_agent_owns_stt: ensures
the agent-bound path does NOT migrate listeners (since the effective
instance doesn't change).
Each test uses a no-op spy_update_stt on the audio_recognition to
avoid leaking the _stt_pump consumer task under the manually-wired
fixture.
Co-authored-by: devin-ai-integration[bot] <158243242+devin-ai-integration[bot]@users.noreply.github.com>
β¦ pre-start swap Two Devin Review fixes on PR livekit#6235: 1. Symmetric TTS agent-owns-tts guard (analogous to the STT guard from the prior commit). When the agent is constructed with its own TTS, session-level update_options(tts=X) used to: - run the listener migration on the resolved_tts=None path, because the old guard 'agent_owns_tts and resolved_tts is not None' fell through to the common-path else branch. - call new_tts.on() again, double-attaching listeners that _start_session already attached at startup. - never clean them up, leaking a set of listeners on every call. Fix: the TTS guard is now 'if agent_owns_tts:' with the warning nested inside (mirrors the STT branch structure). The listener migration lives in the else branch and is unreachable when the agent owns TTS. 2. AgentSession.update_options now writes self._stt / self._tts even when self._activity is None (pre-agent-connect swaps or swaps during a handoff gap). Previously the writes were entirely deferred to the activity, so calls before start() silently dropped the swap and the new STT/TTS would never take effect. The post-call write is harmless when activity exists (activity already wrote the same value); it's required when activity doesn't exist. Verified: 19/19 update_options tests pass; two new regression tests (test_update_options_tts_none_with_agent_bound_tts_does_not_double_attach, test_update_options_pre_start_writes_session_state) fail without the fixes and pass with them.
β¦t swap
Two regression tests for the symmetric TTS double-attach bug and the
pre-start swap fix from the prior commit:
- test_update_options_tts_none_with_agent_bound_tts_does_not_double_attach:
asserts that when agent is constructed with its own TTS and the
caller passes tts=None, NO listener migration runs (neither .on()
nor .off() on the agent's TTS). _start_session already attached
listeners at startup, and the previous code would re-attach them,
double-delivering events and leaking listeners that _stop_session
never cleans up.
- test_update_options_pre_start_writes_session_state:
asserts that session.update_options called BEFORE session.start()
still writes to session._stt / session._tts, so the next start()
picks up the swapped values.
Regression test for Devin Review on PR livekit#6235: 'update_stt(None) leaves old pipeline and consumer task running'. The flag was based on an incomplete read of AudioRecognition.update_stt (lines 708-723 only) which showed the new-pipeline path but missed the teardown branch at lines 725-736. The actual code DOES tear down the consumer task and aclose the pipeline when stt=None. This test exercises the real update_stt(None) path with a fake consumer task and confirms aio.cancel_and_wait is invoked on it (firing cancel() and add_done_callback). This documents the contract and guards against future regressions where the teardown logic might be removed.
6faa3c6 to
6cc0534
Compare
Devin Review on PR livekit#6235: 'Activity mutates session._stt directly, AgentSession relies on this side effect.' The previous code only wrote self._stt/self._tts when _activity was None, relying on AgentActivity.update_options to mutate session._stt as a side effect when _activity existed. This implicit coupling meant a refactor of activity.update_options that removed the session write would silently break pre-start swaps. Fix: Session now explicitly writes self._stt/self._tts unconditionally. Activity still writes session._stt for its internal listener-migration logic, but the session owns the public state contract.
SummaryAdd Designed for Sarvam language switching where the WebSocket must be torn down and reconnected on language change (e.g., MotivationPreviously, changing STT/TTS mid-call required Changes (25 commits)
Verification
Devin Review Items Addressed
Breaking ChangesNone. Pure additions to public API. Existing callers of Test Plan# Unit tests
uv run pytest tests/test_agent_session.py -k update_options -v
# Full test suite
uv run pytest tests/test_agent_session.py
# Type check + lint
make check
# Manual E2E (requires .env with LIVEKIT_*, SARVAM_API_KEY, GROQ_API_KEY)
uv run python examples/dev/sarvam_language_swap.py console
# Example Usage
session = AgentSession(stt=sarvam.STT(language="hi-IN"), tts=sarvam.TTS(target_language_code="hi-IN"))
await session.start(agent=MyAgent(), room=ctx.room)
# Later, mid-call:
await session.update_options(
stt=sarvam.STT(language="kn-IN"),
tts=sarvam.TTS(target_language_code="kn-IN")
) |
|
@longcw β all Devin flags resolved, agent mirror dropped per your earlier feedback. Ready for another look when you have cycles. |
Motivation
There is currently no public API to swap STT/TTS at runtime. The workaround
requires reaching into private internals:
update_stt()already exists and is already the right mechanism β this PRjust exposes it through the existing
update_options()surface.Use case
Mid-call language switching. User says "speak in Kannada" β agent swaps the
Sarvam STT language code and the TTS voice without dropping the call.
Verified end-to-end with
livekit-plugins-sarvamagainst a local LiveKitserver in console mode: the Sarvam STT WebSocket closes and reconnects
with the new language code on
update_options(stt=...); TTS takes effecton the next synthesis call.
A reference agent that uses this as an LLM-callable tool lives in
examples/dev/sarvam_language_swap.py(gitignored underexamples/dev/*). It exposes aswitch_language(language_code)tool andswaps STT/TTS based on the user's request.
Changes
AgentSession.update_optionsgainsstt=andtts=kwargs.AgentActivity.update_optionsforwards them and callsaudio_recognition.update_stt(...)for STT; TTS takes effect on the nextsynthesis call because
tts_nodereadsactivity.ttsper call.mirrored onto
agent._stt/agent._ttssoactivity.stt/activity.ttscontinue to prefer the agent-bound instance (matching theexisting resolution order).
The
stt/ttsparameter names shadow the same-named module imports insidethe method body. Since the body never references the modules after the
parameter list, no aliasing is required; this is documented in a Note
block in the docstring.
Tests
Five new tests in
tests/test_agent_session.py:test_update_options_stt_swaps_session_and_rewires_activityβ verifiessession._sttis swapped,activity.sttresolves to the new STT, andaudio_recognition.update_sttis called with the agent's boundstt_node.test_update_options_tts_swaps_session_and_agent_resolves_to_new_ttsβTTS swap mirrors the session-level behavior.
test_update_options_stt_mirrors_to_agent_when_agent_has_own_sttβconfirms the agent-level mirror when the agent has its own STT.
test_update_options_stt_none_disables_pipelineβstt=Noneclearsthe pipeline.
test_update_options_stt_unchanged_when_not_providedβ passing onlyendpointing_optsleaves STT/TTS untouched.All 53 tests in
test_agent_session.pypass;make checkis clean.Backwards compatibility
update_optionswas already sync (not async), so the new kwargs arepurely additive. All existing callers continue to work unchanged. The
parameter list keeps
*-only style and usesNotGivenOr[T | None]tomatch the rest of the codebase's "explicit None to disable" convention.