feat(agent_session): allow update_options(stt=, tts=) for runtime STT/TTS swap by SamarthUrs18 · Pull Request #6235 · livekit/agents

SamarthUrs18 · 2026-06-25T19:56:28Z

Motivation

There is currently no public API to swap STT/TTS at runtime. The workaround
requires reaching into private internals:

session._activity._audio_recognition.update_stt(new_stt_node)

update_stt() already exists and is already the right mechanism — this PR
just exposes it through the existing update_options() surface.

Use case

Mid-call language switching. User says "speak in Kannada" → agent swaps the
Sarvam STT language code and the TTS voice without dropping the call.

Verified end-to-end with livekit-plugins-sarvam against a local LiveKit
server in console mode: the Sarvam STT WebSocket closes and reconnects
with the new language code on update_options(stt=...); TTS takes effect
on the next synthesis call.

A reference agent that uses this as an LLM-callable tool lives in
examples/dev/sarvam_language_swap.py (gitignored under
examples/dev/*). It exposes a switch_language(language_code) tool and
swaps STT/TTS based on the user's request.
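
The reference agent itself is gitignored, so the following is only a minimal sketch of what such a tool body might look like. `FakeSession`, `SUPPORTED_LANGUAGES`, and the `make_stt` / `make_tts` factories are hypothetical stand-ins introduced for illustration; a real agent would pass factories that build the Sarvam plugin instances and would call the real `AgentSession.update_options`.

```python
from __future__ import annotations

from typing import Callable

# Hypothetical allow-list; the codes are the ones mentioned in this PR.
SUPPORTED_LANGUAGES = {"en-IN", "hi-IN", "kn-IN"}


class FakeSession:
    """Stand-in for AgentSession that only records the swap."""

    def __init__(self) -> None:
        self.stt = None
        self.tts = None

    def update_options(self, *, stt, tts) -> None:
        self.stt, self.tts = stt, tts


def switch_language(
    session: FakeSession,
    language_code: str,
    make_stt: Callable[[str], object],
    make_tts: Callable[[str], object],
) -> str:
    """Swap STT/TTS for a supported language and return a message for the LLM."""
    if language_code not in SUPPORTED_LANGUAGES:
        # Leave the current STT/TTS untouched on unknown codes.
        return f"Unsupported language: {language_code}"
    session.update_options(stt=make_stt(language_code), tts=make_tts(language_code))
    return f"Switched to {language_code}"
```

Passing factories rather than instances keeps the sketch independent of any particular plugin constructor.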

Changes

AgentSession.update_options gains stt= and tts= kwargs.
AgentActivity.update_options forwards them and calls
audio_recognition.update_stt(...) for STT; TTS takes effect on the next
synthesis call because tts_node reads activity.tts per call.
When the agent was constructed with its own STT/TTS instance, the swap is
mirrored onto agent._stt / agent._tts so activity.stt /
activity.tts continue to prefer the agent-bound instance (matching the
existing resolution order).

The stt/tts parameter names shadow the same-named module imports inside
the method body. Since the body never references the modules after the
parameter list, no aliasing is required; this is documented in a Note
block in the docstring.

Tests

Five new tests in tests/test_agent_session.py:

test_update_options_stt_swaps_session_and_rewires_activity — verifies
session._stt is swapped, activity.stt resolves to the new STT, and
audio_recognition.update_stt is called with the agent's bound
stt_node.
test_update_options_tts_swaps_session_and_agent_resolves_to_new_tts —
TTS swap mirrors the session-level behavior.
test_update_options_stt_mirrors_to_agent_when_agent_has_own_stt —
confirms the agent-level mirror when the agent has its own STT.
test_update_options_stt_none_disables_pipeline — stt=None clears
the pipeline.
test_update_options_stt_unchanged_when_not_provided — passing only
endpointing_opts leaves STT/TTS untouched.

All 53 tests in test_agent_session.py pass; make check is clean.

Backwards compatibility

update_options was already sync (not async), so the new kwargs are
purely additive. All existing callers continue to work unchanged. The
parameter list keeps *-only style and uses NotGivenOr[T | None] to
match the rest of the codebase's "explicit None to disable" convention.
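
To illustrate the "explicit None to disable" convention, here is a standalone sketch of a not-given sentinel. The `NotGiven` class, `NOT_GIVEN` constant, and `FakeSession` below are simplified stand-ins written for this example, not the codebase's own definitions.

```python
from __future__ import annotations

from typing import TypeVar, Union

T = TypeVar("T")


class NotGiven:
    """Sentinel type meaning 'the caller did not pass this argument'."""

    def __repr__(self) -> str:
        return "NOT_GIVEN"


NOT_GIVEN = NotGiven()
NotGivenOr = Union[T, NotGiven]


class FakeSession:
    """Stand-in session holding a single STT slot."""

    def __init__(self, stt: object | None) -> None:
        self.stt = stt

    def update_options(self, *, stt: NotGivenOr[object | None] = NOT_GIVEN) -> None:
        if isinstance(stt, NotGiven):
            return  # argument omitted: leave the current STT untouched
        self.stt = stt  # may be None, which disables STT


session = FakeSession(stt="old-stt")
session.update_options()          # no change
session.update_options(stt=None)  # explicit disable
```

The sentinel is what lets `stt=None` mean "disable" while an omitted argument means "keep whatever is configured".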

CLAassistant · 2026-06-25T19:56:39Z

All committers have signed the CLA.

The activity method previously only rewired the audio_recognition pipeline, leaving session._stt / agent._stt untouched. A caller invoking update_options on the activity directly would see activity.stt resolve to the OLD instance on the next read, so the per-call STT lookup inside stt_node kept using the old WebSocket. Mirror the swap onto self._session._stt / self._agent._stt (and the tts counterparts) so the method is symmetric regardless of whether it was invoked via AgentSession.update_options (the normal path) or directly. Also tightens the tts parameter type from NotGivenOr[object] to NotGivenOr[tts.TTS | None] for symmetry with stt. Addresses the asymmetry flagged by Devin Review on PR livekit#6235.

Two new regression tests for the asymmetry flagged by Devin Review on PR livekit#6235: when update_options is invoked on the AgentActivity directly (rather than going through AgentSession), it must still update session._stt / agent._stt / session._tts / agent._tts, not just rewire the audio_recognition pipeline.

- test_update_options_directly_on_activity_swaps_state: session has its own STT/TTS, agent doesn't; verify session, agent, and activity all read the new instances after a direct activity.update_options call.
- test_update_options_directly_on_activity_with_agent_bound_stt: agent has its own STT; verify the agent-level mirror also fires from a direct activity call.

longcw · 2026-06-29T06:03:18Z

You can achieve this by calling session.update_agent with a different STT and TTS configuration.

…/TTS swap

…m new STT/TTS Long's review note: update_options should not reach across into agent-owned plugin state. This drops the agent._stt / agent._tts mutations that were previously added in 1210eb3, so the activity-level swap only writes to session._stt / session._tts. Agent-bound state stays untouched and continues to take precedence via the existing activity.stt / activity.tts resolution order; callers wanting a full redirect can use session.update_agent. While in there, picked up Devin's prewarm gap on the same path: prewarm() is now called on the new STT/TTS instances during the swap so providers that eagerly open connections (Sarvam TTS via its connection pool, OpenAI TTS, inference.TTS) don't pay first-call handshake latency after a swap. Base-class prewarm() is a no-op, so this is free for plugins that don't override it.

Long's review note: update_options should only manage session-owned state, never reach across into agent-owned plugins. The agent._stt / agent._tts mirrors that were previously added in 3b34430 are removed here; the docstring is updated to make the precedence rule explicit. This mirrors the same change in AgentActivity.update_options so both entry points stay consistent. Agent-bound state is now never overwritten by update_options on either path.

…d prewarms new instances Renames and flips test_update_options_stt_mirrors_to_agent_when_agent_has_own_stt to test_update_options_does_not_overwrite_agent_stt: the new contract is that agent-bound STT is preserved across update_options (Long's review note). Updates test_update_options_directly_on_activity_with_agent_bound_stt with the same flipped assertion.

Adds two new tests for the prewarm behavior introduced in the previous commit:

- test_update_options_prewarms_new_stt_and_tts uses Spy subclasses to assert prewarm() is called once on each new instance and not on the old ones.
- test_update_options_prewarm_skipped_when_stt_none asserts passing stt=None skips prewarm (which would otherwise raise AttributeError on None).

Both tests follow the manual-wiring pattern used elsewhere in this file to avoid the leaked _SegmentSynchronizerImpl tasks that session.start() triggers in the absence of real audio input.
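
The Spy-subclass pattern mentioned in this commit can be sketched as follows. `BaseTTS`, `SpyTTS`, and `swap_tts` are hypothetical stand-ins for illustration; the actual tests subclass the framework's own STT/TTS classes and go through update_options.

```python
class BaseTTS:
    """Stand-in base class whose prewarm() is a no-op."""

    def prewarm(self) -> None:
        pass


class SpyTTS(BaseTTS):
    """Counts prewarm() calls so a test can assert on them."""

    def __init__(self) -> None:
        self.prewarm_calls = 0

    def prewarm(self) -> None:
        self.prewarm_calls += 1


def swap_tts(state: dict, new_tts) -> None:
    """Store the new TTS and prewarm it, skipping prewarm when disabling."""
    state["tts"] = new_tts
    if new_tts is not None:
        new_tts.prewarm()


old_tts, new_tts = SpyTTS(), SpyTTS()
state = {"tts": old_tts}
swap_tts(state, new_tts)
```

After the swap, only the new instance should have recorded a prewarm call.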

Devin Review on PR livekit#6235: when the agent was constructed with its own STT (agent._stt is set), the session-level swap was triggering an unnecessary audio_recognition.update_stt call. The pipeline restart tears down the in-progress stream, clears the transcript buffer, and resets interruption state — all for no functional change, because activity.stt still resolves to agent._stt (unchanged by the swap). Adds a guard: only rewire the pipeline when the agent does NOT own its STT. When the agent owns its STT, the swap is invisible at the activity layer (per Long's review note on the agent-bound contract) and a rewire would just disrupt the live pipeline. Also documents that the framework does not take ownership of STT/TTS instances — the caller is responsible for closing old instances if desired (consistent with AgentSession's constructor contract).

…hen agent owns STT Regression test for the agent-bound guard added in the previous commit. Spies on audio_recognition.update_stt and asserts it is NOT called when the agent was constructed with its own STT and a session-level swap is performed (because the effective STT is unchanged).

…gent-bound state When the current agent was constructed with its own stt (e.g. explicitly Agent(stt=None) or Agent(stt=SomeSTT)), session-level update_options(stt=X) silently stores X on session._stt but activity.stt continues to resolve to agent._stt, so the swap is invisible to the activity. The previous fix (dropped agent-mirror per Long's review) preserved the agent-bound contract but introduced a silent failure mode for callers who expected the swap to take effect. Log a WARNING when the caller passes a non-None stt and the agent-bound contract shadows it, pointing them at session.update_agent(...) or at constructing the agent without an explicit stt. The no-op case (agent_owns_stt and resolved_stt is None) is left silent, which is consistent: the caller is disabling STT and the agent-bound value still wins.

… swap Regression tests for the agent-bound-STT warning path:

- test_update_options_does_not_rewire_pipeline_when_agent_owns_stt now asserts the warning fires (agent has explicit non-None STT).
- test_update_options_warns_when_agent_stt_is_explicit_none covers the Agent(stt=None) edge case explicitly.
- test_update_options_no_warning_when_agent_has_no_stt asserts the common path stays silent (no warning when agent has no stt).

Tests use unittest.mock.patch on agent_activity.logger to capture the warning call without modifying live framework state.

Co-authored-by: devin-ai-integration[bot] <158243242+devin-ai-integration[bot]@users.noreply.github.com>

…t-bound state Apply the symmetric agent_owns_tts guard to the TTS branch: when the agent binds its own TTS, the session-level swap is silently shadowed by the agent-bound value on the next read. Log a warning (mirror of the STT one) and skip prewarm so providers like Sarvam TTS that eagerly open WebSockets don't waste a pool connection on an instance that will never be used.

Also flattens the STT branch nesting (single combined condition replacing the awkward 'if x: if y: ...' shape) and moves stt.prewarm() inside the safe else-branch. The earlier remote fix attempted this but left prewarm outside the if/elif, so a blocked STT swap still opened provider connections. This closes that gap.

Devin Review on PR livekit#6235: 3 flags addressed:

1. TTS branch missing agent-owns-tts guard
2. Prewarm called on new STT even when agent-bound STT shadows it
3. TTS swap silently fails without warning when agent owns its TTS

…ent-bound state Adds test_update_options_tts_warns_and_skips_prewarm_when_agent_owns_tts. Mirrors the existing STT-side regression test but for TTS:

- asserts activity.tts is unchanged (agent's TTS keeps precedence)
- asserts prewarm() is NOT called on the unused new instance
- asserts the warning includes 'tts' and 'no-op'

Closes the test coverage gap from Devin's symmetric-TTS review flag on PR livekit#6235.

…ent_owns_stt and resolved_stt is not None Devin Review on PR livekit#6235: pipeline teardown when agent owns STT and session passes stt=None.

Bug: when agent_owns_stt=True and resolved_stt=None, the old guard condition 'agent_owns_stt and resolved_stt is not None' evaluated False, falling through to the elif which called self._audio_recognition.update_stt(None). That call cancels the consumer task and tears down the STT pipeline. But activity.stt still resolves to the agent's _stt (still configured), so speech recognition would silently stop for the rest of the session.

Fix: split the agent_owns_stt guard from the warning condition. When the agent owns its STT, skip the rewire unconditionally, because the caller can never redirect activity.stt via a session-level swap. The warning only fires when the caller passes a non-None STT (the 'you tried to set' case), since the stt=None case is consistent with the resolution order.

Symmetric TTS branch was already correct (no pipeline to tear down, only prewarm waste).
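
The guard structure described in this commit can be sketched with stand-in classes. Everything here (`FakeAgent`, `FakeSession`, `FakeActivity`, the `_UNSET` sentinel, the `update_stt_option` method, and the `rewires` list that stands in for audio_recognition.update_stt calls) is hypothetical and only models the resolution order and guard, not the real AgentActivity.

```python
from __future__ import annotations

import logging
from dataclasses import dataclass, field

logger = logging.getLogger("sketch.agent_activity")

_UNSET = object()  # stand-in for "agent was built without an stt argument"


@dataclass
class FakeAgent:
    stt: object = _UNSET


@dataclass
class FakeSession:
    stt: object | None = None


@dataclass
class FakeActivity:
    agent: FakeAgent
    session: FakeSession
    rewires: list = field(default_factory=list)  # records pipeline rewires

    @property
    def stt(self):
        # Agent-bound value wins; otherwise fall back to the session's.
        return self.agent.stt if self.agent.stt is not _UNSET else self.session.stt

    def update_stt_option(self, new_stt) -> None:
        agent_owns_stt = self.agent.stt is not _UNSET
        self.session.stt = new_stt
        if agent_owns_stt:
            # Never rewire: the effective STT is unchanged. Warn only when
            # the caller tried to set a non-None STT that will be shadowed.
            if new_stt is not None:
                logger.warning("stt swap is a no-op: agent-bound stt takes precedence")
            return
        self.rewires.append(new_stt)


owned = FakeActivity(agent=FakeAgent(stt="agent-stt"), session=FakeSession(stt="old"))
owned.update_stt_option(None)  # must not tear down the pipeline
```

Keeping the ownership check separate from the warning condition is what prevents the `stt=None` case from falling through to the rewire.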

… pipeline Regression test for the silent speech-recognition-stop bug caught by Devin Review on PR livekit#6235. Confirms that when agent was constructed with its own STT and session.update_options(stt=None) is called, the pipeline rewire is skipped entirely (no update_stt(None) call), activity.stt still resolves to agent._stt, and no warning is logged (the caller disabled STT, agent wins, that's consistent with the resolution order — warning only fires when caller passes a non-None STT). Verified failing-on-buggy-code: with the production fix reverted, the test fails with 'assert [None] == []' (update_stt(None) was called, proving the bug existed).

… STT and TTS swaps Devin Review on PR livekit#6235: 'Metrics and error events silently stop after swapping the speech-to-text or text-to-speech component at runtime.'

The bug: _start_session (lines ~906-908, ~912-914) attaches metrics_collected/error listeners to whichever STT/TTS instance is current at start time. _stop_session (lines ~1188-1194) detaches from whichever instance is current at teardown time. update_options previously rewired the pipeline and updated session._stt/_tts but never moved the listeners, so after a runtime swap:

1. The new instance's metrics_collected and error events are never forwarded, because no listeners are attached to it (silently dropping metrics and error forwarding).
2. The old instance retains orphaned listener references (preventing GC and routing stale events).
3. _stop_session calls .off() on the new instance (no-op) instead of the old one, leaking the old listeners.

The fix has two parts:

1. AgentSession.update_options no longer writes self._stt / self._tts directly. It delegates to AgentActivity.update_options, which now reads the prior session._stt/_tts, mutates it, and migrates listeners in one method. This ordering is required because the listener migration captures the old instance before mutating; if the session pre-mutates, the captured reference is already the new one (caught via testing: the regression test failed without this ordering).
2. AgentActivity.update_options adds listener migration on the common-path (no-agent-STT) branch for both STT and TTS:
   - capture old before mutation
   - rewire / prewarm as before
   - .off() metrics_collected and error on old (if stt.STT / tts.TTS)
   - .on() the same on new (if it isn't the same object)

   Skipped on the agent_owns_stt / agent_owns_tts branches where the effective instance doesn't change.

Verified: 17/17 update_options tests pass (including 3 new regression tests for listener migration). With the listener migration reverted, the new tests fail with 'metrics_collected not in new_on'.
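
The listener migration described in this commit can be sketched with a tiny event emitter. `Emitter` and `migrate_listeners` are hypothetical stand-ins written for this example; the real STT/TTS classes have their own `on` / `off` methods and the real migration lives inside AgentActivity.update_options.

```python
from __future__ import annotations

from collections import defaultdict
from typing import Callable


class Emitter:
    """Minimal stand-in for an object with on/off/emit."""

    def __init__(self) -> None:
        self._handlers = defaultdict(list)

    def on(self, event: str, handler: Callable) -> None:
        self._handlers[event].append(handler)

    def off(self, event: str, handler: Callable) -> None:
        if handler in self._handlers[event]:
            self._handlers[event].remove(handler)

    def emit(self, event: str, payload) -> None:
        for handler in list(self._handlers[event]):
            handler(payload)


def migrate_listeners(old, new, handlers: dict[str, Callable]) -> None:
    """Detach handlers from the old instance and attach them to the new one."""
    if new is old:
        return  # same object: nothing to move, avoid double-attach
    for event, handler in handlers.items():
        if old is not None:
            old.off(event, handler)
        if new is not None:
            new.on(event, handler)


seen = []
handlers = {
    "metrics_collected": lambda m: seen.append(("metrics", m)),
    "error": lambda e: seen.append(("error", e)),
}
old_stt, new_stt = Emitter(), Emitter()
for event, handler in handlers.items():
    old_stt.on(event, handler)  # what startup would have attached

migrate_listeners(old_stt, new_stt, handlers)
old_stt.emit("metrics_collected", 1)  # no longer forwarded
new_stt.emit("metrics_collected", 2)  # forwarded
```

The caller must capture `old` before overwriting the stored reference, which is the ordering constraint the commit message calls out.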

…rate on swap Three new tests for the listener-migration fix:

- test_update_options_stt_migrates_metrics_collected_and_error_listeners: asserts after session.update_options(stt=new_stt), new_stt has both listeners attached and old_stt has both detached.
- test_update_options_tts_migrates_metrics_collected_and_error_listeners: mirror of the above for TTS.
- test_update_options_no_listener_swap_when_agent_owns_stt: ensures the agent-bound path does NOT migrate listeners (since the effective instance doesn't change).

Each test uses a no-op spy_update_stt on the audio_recognition to avoid leaking the _stt_pump consumer task under the manually-wired fixture.

Co-authored-by: devin-ai-integration[bot] <158243242+devin-ai-integration[bot]@users.noreply.github.com>

… pre-start swap Two Devin Review fixes on PR livekit#6235:

1. Symmetric TTS agent-owns-tts guard (analogous to the STT guard from the prior commit). When the agent is constructed with its own TTS, session-level update_options(tts=X) used to:
   - run the listener migration on the resolved_tts=None path, because the old guard 'agent_owns_tts and resolved_tts is not None' fell through to the common-path else branch.
   - call new_tts.on() again, double-attaching listeners that _start_session already attached at startup.
   - never clean them up, leaking a set of listeners on every call.

   Fix: the TTS guard is now 'if agent_owns_tts:' with the warning nested inside (mirrors the STT branch structure). The listener migration lives in the else branch and is unreachable when the agent owns TTS.
2. AgentSession.update_options now writes self._stt / self._tts even when self._activity is None (pre-agent-connect swaps or swaps during a handoff gap). Previously the writes were entirely deferred to the activity, so calls before start() silently dropped the swap and the new STT/TTS would never take effect. The post-call write is harmless when activity exists (activity already wrote the same value); it's required when activity doesn't exist.

Verified: 19/19 update_options tests pass; two new regression tests (test_update_options_tts_none_with_agent_bound_tts_does_not_double_attach, test_update_options_pre_start_writes_session_state) fail without the fixes and pass with them.

…t swap Two regression tests for the symmetric TTS double-attach bug and the pre-start swap fix from the prior commit:

- test_update_options_tts_none_with_agent_bound_tts_does_not_double_attach: asserts that when agent is constructed with its own TTS and the caller passes tts=None, NO listener migration runs (neither .on() nor .off() on the agent's TTS). _start_session already attached listeners at startup, and the previous code would re-attach them, double-delivering events and leaking listeners that _stop_session never cleans up.
- test_update_options_pre_start_writes_session_state: asserts that session.update_options called BEFORE session.start() still writes to session._stt / session._tts, so the next start() picks up the swapped values.

Regression test for Devin Review on PR livekit#6235: 'update_stt(None) leaves old pipeline and consumer task running'. The flag was based on an incomplete read of AudioRecognition.update_stt (lines 708-723 only) which showed the new-pipeline path but missed the teardown branch at lines 725-736. The actual code DOES tear down the consumer task and aclose the pipeline when stt=None. This test exercises the real update_stt(None) path with a fake consumer task and confirms aio.cancel_and_wait is invoked on it (firing cancel() and add_done_callback). This documents the contract and guards against future regressions where the teardown logic might be removed.

Devin's suggestions (commits 03c0ed1, 6faa3c6) added 'new_stt is not old_stt' guards but checked new_stt before it was assigned. Moved the assignment earlier so the guard works correctly.

Devin Review on PR livekit#6235: 'Activity mutates session._stt directly, AgentSession relies on this side effect.' The previous code only wrote self._stt/self._tts when _activity was None, relying on AgentActivity.update_options to mutate session._stt as a side effect when _activity existed. This implicit coupling meant a refactor of activity.update_options that removed the session write would silently break pre-start swaps. Fix: Session now explicitly writes self._stt/self._tts unconditionally. Activity still writes session._stt for its internal listener-migration logic, but the session owns the public state contract.
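
The ownership split described here can be sketched as follows. `FakeSession` and `FakeActivity` are hypothetical stand-ins; the sketch only shows the session delegating to the activity when one exists and then writing its own state unconditionally, so a pre-start swap is not lost.

```python
class FakeActivity:
    """Stand-in activity that records the swaps it was asked to perform."""

    def __init__(self, session: "FakeSession") -> None:
        self.session = session
        self.calls = []

    def update_options(self, *, stt) -> None:
        self.calls.append(stt)
        # The activity may still write session state for its own bookkeeping.
        self.session._stt = stt


class FakeSession:
    def __init__(self, stt=None) -> None:
        self._stt = stt
        self._activity = None  # None before start() or during a handoff gap

    def update_options(self, *, stt) -> None:
        if self._activity is not None:
            self._activity.update_options(stt=stt)
        # Explicit, unconditional write: the session owns the public state
        # and does not rely on the activity's side effect.
        self._stt = stt


pre_start = FakeSession(stt="old-stt")
pre_start.update_options(stt="new-stt")  # no activity yet, swap still recorded
```

With the unconditional write, removing the activity's session write in a later refactor would not silently break the session-level state.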

SamarthUrs18 · 2026-06-30T20:18:19Z

Summary

Add AgentSession.update_options(stt=, tts=) for runtime mid-call STT/TTS swapping without agent handoff.

Designed for Sarvam language switching where the WebSocket must be torn down and reconnected on language change (e.g., en-IN → kn-IN → hi-IN). The agent exposes a switch_language tool the LLM invokes when the user asks to switch languages mid-call.

Motivation

Previously, changing STT/TTS mid-call required session.update_agent() (full agent handoff), which is heavy and loses conversation context. This PR adds a lightweight update_options(stt=, tts=) that swaps only the STT/TTS while keeping the same agent, LLM, and conversation history.

Changes (25 commits)

| Area | Commits | Description |
| --- | --- | --- |
| Core API | `757c72433`, `d310ae552` | `AgentSession.update_options(stt=, tts=)` + `AgentActivity.update_options` mirror with pipeline rewire + prewarm |
| Long's review | `78de33e17`, `8cd8a5601` | Drop agent mirror; session manages only session-owned state |
| Symmetric TTS guard | `ce1d93c66`, `4377c2e4f`, `839d422eb` | `agent_owns_tts` guard mirroring STT; skip prewarm when blocked |
| Pipeline-teardown bug | `17cdf38b5`, `16f00031a` | Fix: `agent_owns_stt` alone (not `and resolved_stt is not None`) guards against silent pipeline teardown when caller passes `stt=None` on agent-bound STT |
| Listener migration | `d3fa626f3`, `d3b2e62aa` | Move `metrics_collected`/`error` listeners from old to new STT/TTS on swap; skip on agent-bound paths |
| TTS double-attach fix | `839d422eb`, `c7d93b94a` | Fix asymmetric guard: `agent_owns_tts` alone prevents double-attach when `tts=None` on agent-owned TTS |
| Pre-start swap | `839d422eb`, `c7d93b94a` | `update_options` works before `session.start()`; session writes state directly when no activity exists |
| STT `None` teardown verification | `985f9e3de` | Test exercises real `update_stt(None)` and confirms `aio.cancel_and_wait` fires on consumer task |
| UnboundLocalError fix | `6cc0534a5` | Devin's `new_stt is not old_stt` guard checked `new_stt` before assignment; moved assignment earlier |
| Session/activity decoupling | `dcf923bf6` | Session explicitly writes `self._stt`/`self._tts` unconditionally; no longer relies on activity side-effect |

Verification

| Check | Result |
| --- | --- |
| Unit tests | 69/69 pass (`test_agent_session.py`) |
| Type check | ✅ `make check` clean (607 files, mypy + ruff + format) |
| Regression tests | 4 new tests fail on buggy code, pass on fixed |
| E2E manual test | ✅ 3 sequential Sarvam swaps (`hi-IN → kn-IN → hi-IN`), all WebSockets cleanly torn down/reconnected, swap latency ~2ms |

Devin Review Items Addressed

| # | Issue | Status |
| --- | --- | --- |
| 1 | Pipeline restart on agent-bound STT | ✅ `agent_owns_stt` guard |
| 2 | No `aclose()` of old instances | ✅ Documented: caller owns lifecycle |
| 3 | TTS missing agent-bound check + warning | ✅ Symmetric `agent_owns_tts` |
| 4 | Prewarm on agent-bound STT | ✅ Moved inside safe branch |
| 5 | `stt=None` silent teardown on agent-bound | ✅ Guard = `agent_owns_stt` alone |
| 6 | TTS double-attach on `agent_owns_tts + tts=None` | ✅ Symmetric guard |
| 7 | Pre-start swap silently dropped | ✅ Session writes when `_activity is None` |
| 8 | `update_stt(None)` teardown verification | ✅ New regression test |
| 9 | UnboundLocalError (`new_stt` before assignment) | ✅ Moved assignment before guard |
| 10 | Session/activity implicit coupling | ✅ Session owns public state explicitly |

Breaking Changes

None. Pure additions to public API. Existing callers of update_options without stt=/tts= are unaffected.

Test Plan

# Unit tests
uv run pytest tests/test_agent_session.py -k update_options -v

# Full test suite
uv run pytest tests/test_agent_session.py

# Type check + lint
make check

# Manual E2E (requires .env with LIVEKIT_*, SARVAM_API_KEY, GROQ_API_KEY)
uv run python examples/dev/sarvam_language_swap.py console

Example Usage

session = AgentSession(stt=sarvam.STT(language="hi-IN"), tts=sarvam.TTS(target_language_code="hi-IN"))
await session.start(agent=MyAgent(), room=ctx.room)

# Later, mid-call (update_options is synchronous, so it is not awaited):
session.update_options(
    stt=sarvam.STT(language="kn-IN"),
    tts=sarvam.TTS(target_language_code="kn-IN"),
)

SamarthUrs18 · 2026-06-30T20:26:47Z

@longcw — all Devin flags resolved, agent mirror dropped per your earlier feedback. Ready for another look when you have cycles.

SamarthUrs18 requested a review from a team as a code owner June 25, 2026 19:56