Skip to content

feat(agent_session): allow update_options(stt=, tts=) for runtime STT/TTS swap#6235

Open
SamarthUrs18 wants to merge 24 commits into
livekit:mainfrom
SamarthUrs18:feat/update-options-stt-tts
Open

feat(agent_session): allow update_options(stt=, tts=) for runtime STT/TTS swap#6235
SamarthUrs18 wants to merge 24 commits into
livekit:mainfrom
SamarthUrs18:feat/update-options-stt-tts

Conversation

@SamarthUrs18

Copy link
Copy Markdown

Motivation

There is currently no public API to swap STT/TTS at runtime. The workaround
requires reaching into private internals:

session._activity._audio_recognition.update_stt(new_stt_node)

update_stt() already exists and is already the right mechanism β€” this PR
just exposes it through the existing update_options() surface.

Use case

Mid-call language switching. User says "speak in Kannada" β†’ agent swaps the
Sarvam STT language code and the TTS voice without dropping the call.

Verified end-to-end with livekit-plugins-sarvam against a local LiveKit
server in console mode: the Sarvam STT WebSocket closes and reconnects
with the new language code on update_options(stt=...); TTS takes effect
on the next synthesis call.

A reference agent that uses this as an LLM-callable tool lives in
examples/dev/sarvam_language_swap.py (gitignored under
examples/dev/*). It exposes a switch_language(language_code) tool and
swaps STT/TTS based on the user's request.

Changes

  • AgentSession.update_options gains stt= and tts= kwargs.
  • AgentActivity.update_options forwards them and calls
    audio_recognition.update_stt(...) for STT; TTS takes effect on the next
    synthesis call because tts_node reads activity.tts per call.
  • When the agent was constructed with its own STT/TTS instance, the swap is
    mirrored onto agent._stt / agent._tts so activity.stt /
    activity.tts continue to prefer the agent-bound instance (matching the
    existing resolution order).

The stt/tts parameter names shadow the same-named module imports inside
the method body. Since the body never references the modules after the
parameter list, no aliasing is required; this is documented in a Note
block in the docstring.

Tests

Five new tests in tests/test_agent_session.py:

  • test_update_options_stt_swaps_session_and_rewires_activity β€” verifies
    session._stt is swapped, activity.stt resolves to the new STT, and
    audio_recognition.update_stt is called with the agent's bound
    stt_node.
  • test_update_options_tts_swaps_session_and_agent_resolves_to_new_tts β€”
    TTS swap mirrors the session-level behavior.
  • test_update_options_stt_mirrors_to_agent_when_agent_has_own_stt β€”
    confirms the agent-level mirror when the agent has its own STT.
  • test_update_options_stt_none_disables_pipeline β€” stt=None clears
    the pipeline.
  • test_update_options_stt_unchanged_when_not_provided β€” passing only
    endpointing_opts leaves STT/TTS untouched.

All 53 tests in test_agent_session.py pass; make check is clean.

Backwards compatibility

update_options was already sync (not async), so the new kwargs are
purely additive. All existing callers continue to work unchanged. The
parameter list keeps *-only style and uses NotGivenOr[T | None] to
match the rest of the codebase's "explicit None to disable" convention.

@SamarthUrs18 SamarthUrs18 requested a review from a team as a code owner June 25, 2026 19:56
@CLAassistant

CLAassistant commented Jun 25, 2026

Copy link
Copy Markdown

CLA assistant check
All committers have signed the CLA.

devin-ai-integration[bot]

This comment was marked as resolved.

SamarthUrs18 added a commit to SamarthUrs18/agents that referenced this pull request Jun 25, 2026
The activity method previously only rewired the audio_recognition pipeline,
leaving session._stt / agent._stt untouched. A caller invoking
update_options on the activity directly would see activity.stt resolve to
the OLD instance on the next read, so the per-call STT lookup inside
stt_node kept using the old WebSocket.

Mirror the swap onto self._session._stt / self._agent._stt (and the tts
counterparts) so the method is symmetric regardless of whether it was
invoked via AgentSession.update_options (the normal path) or directly.
Also tightens the tts parameter type from NotGivenOr[object] to
NotGivenOr[tts.TTS | None] for symmetry with stt.

Addresses the asymmetry flagged by Devin Review on PR livekit#6235.
SamarthUrs18 added a commit to SamarthUrs18/agents that referenced this pull request Jun 25, 2026
Two new regression tests for the asymmetry flagged by Devin Review on
PR livekit#6235: when update_options is invoked on the AgentActivity directly
(rather than going through AgentSession), it must still update
session._stt / agent._stt / session._tts / agent._tts, not just rewire
the audio_recognition pipeline.

- test_update_options_directly_on_activity_swaps_state: session has its
  own STT/TTS, agent doesn't β€” verify session, agent, and activity all
  read the new instances after a direct activity.update_options call.
- test_update_options_directly_on_activity_with_agent_bound_stt: agent
  has its own STT β€” verify the agent-level mirror also fires from a
  direct activity call.
@longcw

longcw commented Jun 29, 2026

Copy link
Copy Markdown
Contributor

you can achieve this by session.update_agent with different stt and tts configuration.

Comment thread livekit-agents/livekit/agents/voice/agent_activity.py Outdated
devin-ai-integration[bot]

This comment was marked as resolved.

The activity method previously only rewired the audio_recognition pipeline,
leaving session._stt / agent._stt untouched. A caller invoking
update_options on the activity directly would see activity.stt resolve to
the OLD instance on the next read, so the per-call STT lookup inside
stt_node kept using the old WebSocket.

Mirror the swap onto self._session._stt / self._agent._stt (and the tts
counterparts) so the method is symmetric regardless of whether it was
invoked via AgentSession.update_options (the normal path) or directly.
Also tightens the tts parameter type from NotGivenOr[object] to
NotGivenOr[tts.TTS | None] for symmetry with stt.

Addresses the asymmetry flagged by Devin Review on PR livekit#6235.
Two new regression tests for the asymmetry flagged by Devin Review on
PR livekit#6235: when update_options is invoked on the AgentActivity directly
(rather than going through AgentSession), it must still update
session._stt / agent._stt / session._tts / agent._tts, not just rewire
the audio_recognition pipeline.

- test_update_options_directly_on_activity_swaps_state: session has its
  own STT/TTS, agent doesn't β€” verify session, agent, and activity all
  read the new instances after a direct activity.update_options call.
- test_update_options_directly_on_activity_with_agent_bound_stt: agent
  has its own STT β€” verify the agent-level mirror also fires from a
  direct activity call.
…m new STT/TTS

Long's review note: update_options should not reach across into agent-owned
plugin state. This drops the agent._stt / agent._tts mutations that were
previously added in 1210eb3, so the activity-level swap only writes to
session._stt / session._tts. Agent-bound state stays untouched and continues
to take precedence via the existing activity.stt / activity.tts resolution
order; callers wanting a full redirect can use session.update_agent.

While in there, picked up Devin's prewarm gap on the same path: prewarm()
is now called on the new STT/TTS instances during the swap so providers
that eagerly open connections (Sarvam TTS via its connection pool,
OpenAI TTS, inference.TTS) don't pay first-call handshake latency after
a swap. Base-class prewarm() is a no-op, so this is free for plugins that
don't override it.
Long's review note: update_options should only manage session-owned state,
never reach across into agent-owned plugins. The agent._stt / agent._tts
mirrors that were previously added in 3b34430 are removed here; the
docstring is updated to make the precedence rule explicit.

This mirrors the same change in AgentActivity.update_options so both
entry points stay consistent. Agent-bound state is now never overwritten
by update_options on either path.
…d prewarms new instances

Renames and flips test_update_options_stt_mirrors_to_agent_when_agent_has_own_stt
to test_update_options_does_not_overwrite_agent_stt β€” the new contract is that
agent-bound STT is preserved across update_options (Long's review note).

Updates test_update_options_directly_on_activity_with_agent_bound_stt with the
same flipped assertion.

Adds two new tests for the prewarm behavior introduced in the previous commit:
- test_update_options_prewarms_new_stt_and_tts uses Spy subclasses to assert
  prewarm() is called once on each new instance and not on the old ones.
- test_update_options_prewarm_skipped_when_stt_none asserts passing stt=None
  skips prewarm (would otherwise AttributeError on None).

Both tests follow the manual-wiring pattern used elsewhere in this file to
avoid the leaked _SegmentSynchronizerImpl tasks that session.start() triggers
in the absence of real audio input.
Devin Review on PR livekit#6235: when the agent was constructed with its own
STT (agent._stt is set), the session-level swap was triggering an
unnecessary audio_recognition.update_stt call. The pipeline restart
tears down the in-progress stream, clears the transcript buffer, and
resets interruption state β€” all for no functional change, because
activity.stt still resolves to agent._stt (unchanged by the swap).

Adds a guard: only rewire the pipeline when the agent does NOT own its
STT. When the agent owns its STT, the swap is invisible at the activity
layer (per Long's review note on the agent-bound contract) and a rewire
would just disrupt the live pipeline.

Also documents that the framework does not take ownership of STT/TTS
instances β€” the caller is responsible for closing old instances if
desired (consistent with AgentSession's constructor contract).
…hen agent owns STT

Regression test for the agent-bound guard added in the previous commit.
Spies on audio_recognition.update_stt and asserts it is NOT called when
the agent was constructed with its own STT and a session-level swap is
performed (because the effective STT is unchanged).
@SamarthUrs18 SamarthUrs18 force-pushed the feat/update-options-stt-tts branch from d77f4f2 to a4a4fea Compare June 29, 2026 12:19
devin-ai-integration[bot]

This comment was marked as resolved.

…gent-bound state

When the current agent was constructed with its own stt (e.g. explicitly
Agent(stt=None) or Agent(stt=SomeSTT)), session-level update_options(stt=X)
silently stores X on session._stt but activity.stt continues to resolve to
agent._stt β€” the swap is invisible to the activity.

The previous fix (dropped agent-mirror per Long's review) preserved the
agent-bound contract but introduced a silent failure mode for callers who
expected the swap to take effect. Log a WARNING when the caller passes a
non-None stt and the agent-bound contract shadows it, pointing them at
session.update_agent(...) or constructing the agent without explicit stt.

The no-op case (agent_owns_stt and resolved_stt is None) is left silent:
consistent β€” caller is disabling STT and the agent-bound value still wins.
… swap

Regression tests for the agent-bound-STT warning path:
- test_update_options_does_not_rewire_pipeline_when_agent_owns_stt now
  asserts the warning fires (agent has explicit non-None STT).
- test_update_options_warns_when_agent_stt_is_explicit_none covers the
  Agent(stt=None) edge case explicitly.
- test_update_options_no_warning_when_agent_has_no_stt asserts the common
  path stays silent (no warning when agent has no stt).

Tests use unittest.mock.patch on agent_activity.logger to capture the
warning call without modifying live framework state.
devin-ai-integration[bot]

This comment was marked as resolved.

Co-authored-by: devin-ai-integration[bot] <158243242+devin-ai-integration[bot]@users.noreply.github.com>
devin-ai-integration[bot]

This comment was marked as resolved.

…t-bound state

Apply the symmetric agent_owns_tts guard to the TTS branch: when the
agent binds its own TTS, the session-level swap is silently shadowed
by the agent-bound value on the next read. Log a warning (mirror of
the STT one) and skip prewarm so providers like Sarvam TTS that
eagerly open WebSockets don't waste a pool connection on an instance
that will never be used.

Also flattens the STT branch nesting (single combined condition
replacing the awkward 'if x: if y: ...' shape) and moves stt.prewarm()
inside the safe else-branch. The earlier remote fix attempted this but
left prewarm outside the if/elif, so a blocked STT swap still opened
provider connections. Closing that gap.

Devin Review on PR livekit#6235: 3 flags addressed:
  1. TTS branch missing agent-owns-tts guard
  2. Prewarm called on new STT even when agent-bound STT shadows it
  3. TTS swap silently fails without warning when agent owns its TTS
…ent-bound state

Adds test_update_options_tts_warns_and_skips_prewarm_when_agent_owns_tts.
Mirrors the existing STT-side regression test but for TTS:

  - asserts activity.tts is unchanged (agent's TTS keeps precedence)
  - asserts prewarm() is NOT called on the unused new instance
  - asserts the warning includes 'tts' and 'no-op'

Closes the test coverage gap from Devin's symmetric-TTS review flag
on PR livekit#6235.
…ent_owns_stt and resolved_stt is not None

Devin Review on PR livekit#6235: pipeline teardown when agent owns STT and
session passes stt=None.

Bug: when agent_owns_stt=True and resolved_stt=None, the old guard
condition 'agent_owns_stt and resolved_stt is not None' evaluated
False, falling through to the elif which called
self._audio_recognition.update_stt(None). That call cancels the
consumer task and tears down the STT pipeline. But activity.stt
still resolves to the agent's _stt (still configured), so speech
recognition would silently stop for the rest of the session.

Fix: split the agent_owns_stt guard from the warning condition.
When the agent owns its STT, skip the rewire unconditionally β€”
the caller can never redirect activity.stt via session-level
swap. Warning only fires when the caller passes a non-None STT
(the 'you tried to set' case), since the stt=None case is
consistent with the resolution order.

Symmetric TTS branch was already correct (no pipeline to tear
down, only prewarm waste).
… pipeline

Regression test for the silent speech-recognition-stop bug caught
by Devin Review on PR livekit#6235.

Confirms that when agent was constructed with its own STT and
session.update_options(stt=None) is called, the pipeline rewire
is skipped entirely (no update_stt(None) call), activity.stt still
resolves to agent._stt, and no warning is logged (the caller
disabled STT, agent wins, that's consistent with the resolution
order β€” warning only fires when caller passes a non-None STT).

Verified failing-on-buggy-code: with the production fix reverted,
the test fails with 'assert [None] == []' (update_stt(None) was
called, proving the bug existed).
devin-ai-integration[bot]

This comment was marked as resolved.

… STT and TTS swaps

Devin Review on PR livekit#6235: 'Metrics and error events silently stop
after swapping the speech-to-text or text-to-speech component at
runtime.'

The bug: _start_session (lines ~906-908, ~912-914) attaches
metrics_collected/error listeners to whichever STT/TTS instance is
current at start time. _stop_session (lines ~1188-1194) detaches
from whichever instance is current at teardown time. update_options
previously rewired the pipeline and updated session._stt/_tts but
never moved the listeners, so after a runtime swap:

  1. The new instance never emits metrics_collected or error events
     (silently dropping metrics and error forwarding).
  2. The old instance retains orphaned listener references (preventing
     GC and routing stale events).
  3. _stop_session calls .off() on the new instance (no-op) instead of
     the old one, leaking the old listeners.

The fix has two parts:

1. AgentSession.update_options no longer writes self._stt / self._tts
   directly. It delegates to AgentActivity.update_options, which now
   reads the prior session._stt/_tts, mutates it, and migrates listeners
   in one method. This ordering is required because the listener
   migration captures the old instance before mutating β€” and if the
   session pre-mutates, the captured reference is already the new
   one (caught via testing β€” the regression test failed without this
   ordering).

2. AgentActivity.update_options adds listener migration on the
   common-path (no-agent-STT) branch for both STT and TTS:
     - capture old before mutation
     - rewire / prewarm as before
     - .off() metrics_collected and error on old (if stt.STT / tts.TTS)
     - .on() the same on new (if it isn't the same object)
   Skipped on the agent_owns_stt / agent_owns_tts branches where the
   effective instance doesn't change.

Verified: 17/17 update_options tests pass (including 3 new regression
tests for listener migration). With the listener migration reverted,
the new tests fail with 'metrics_collected not in new_on'.
…rate on swap

Three new tests for the listener-migration fix:

  - test_update_options_stt_migrates_metrics_collected_and_error_listeners:
    asserts after session.update_options(stt=new_stt), new_stt has both
    listeners attached and old_stt has both detached.
  - test_update_options_tts_migrates_metrics_collected_and_error_listeners:
    mirror of the above for TTS.
  - test_update_options_no_listener_swap_when_agent_owns_stt: ensures
    the agent-bound path does NOT migrate listeners (since the effective
    instance doesn't change).

Each test uses a no-op spy_update_stt on the audio_recognition to
avoid leaking the _stt_pump consumer task under the manually-wired
fixture.
devin-ai-integration[bot]

This comment was marked as resolved.

Co-authored-by: devin-ai-integration[bot] <158243242+devin-ai-integration[bot]@users.noreply.github.com>
devin-ai-integration[bot]

This comment was marked as resolved.

… pre-start swap

Two Devin Review fixes on PR livekit#6235:

1. Symmetric TTS agent-owns-tts guard (analogous to the STT guard from
   the prior commit). When the agent is constructed with its own TTS,
   session-level update_options(tts=X) used to:
   - run the listener migration on the resolved_tts=None path, because
     the old guard 'agent_owns_tts and resolved_tts is not None' fell
     through to the common-path else branch.
   - call new_tts.on() again, double-attaching listeners that
     _start_session already attached at startup.
   - never clean them up, leaking a set of listeners on every call.
   Fix: the TTS guard is now 'if agent_owns_tts:' with the warning nested
   inside (mirrors the STT branch structure). The listener migration lives
   in the else branch and is unreachable when the agent owns TTS.

2. AgentSession.update_options now writes self._stt / self._tts even
   when self._activity is None (pre-agent-connect swaps or swaps during
   a handoff gap). Previously the writes were entirely deferred to the
   activity, so calls before start() silently dropped the swap and the
   new STT/TTS would never take effect. The post-call write is harmless
   when activity exists (activity already wrote the same value); it's
   required when activity doesn't exist.

Verified: 19/19 update_options tests pass; two new regression tests
(test_update_options_tts_none_with_agent_bound_tts_does_not_double_attach,
test_update_options_pre_start_writes_session_state) fail without the
fixes and pass with them.
…t swap

Two regression tests for the symmetric TTS double-attach bug and the
pre-start swap fix from the prior commit:

  - test_update_options_tts_none_with_agent_bound_tts_does_not_double_attach:
    asserts that when agent is constructed with its own TTS and the
    caller passes tts=None, NO listener migration runs (neither .on()
    nor .off() on the agent's TTS). _start_session already attached
    listeners at startup, and the previous code would re-attach them,
    double-delivering events and leaking listeners that _stop_session
    never cleans up.

  - test_update_options_pre_start_writes_session_state:
    asserts that session.update_options called BEFORE session.start()
    still writes to session._stt / session._tts, so the next start()
    picks up the swapped values.
devin-ai-integration[bot]

This comment was marked as resolved.

Regression test for Devin Review on PR livekit#6235: 'update_stt(None) leaves
old pipeline and consumer task running'.

The flag was based on an incomplete read of AudioRecognition.update_stt
(lines 708-723 only) which showed the new-pipeline path but missed the
teardown branch at lines 725-736. The actual code DOES tear down the
consumer task and aclose the pipeline when stt=None.

This test exercises the real update_stt(None) path with a fake consumer
task and confirms aio.cancel_and_wait is invoked on it (firing cancel()
and add_done_callback). This documents the contract and guards against
future regressions where the teardown logic might be removed.
devin-ai-integration[bot]

This comment was marked as resolved.

Devin's suggestions (commits 03c0ed1, 6faa3c6) added 'new_stt is not old_stt'
guards but checked new_stt before it was assigned. Moved the assignment
earlier so the guard works correctly.
devin-ai-integration[bot]

This comment was marked as resolved.

@SamarthUrs18 SamarthUrs18 force-pushed the feat/update-options-stt-tts branch from 6faa3c6 to 6cc0534 Compare June 29, 2026 21:07
Devin Review on PR livekit#6235: 'Activity mutates session._stt directly,
AgentSession relies on this side effect.'

The previous code only wrote self._stt/self._tts when _activity was None,
relying on AgentActivity.update_options to mutate session._stt as a side
effect when _activity existed. This implicit coupling meant a refactor of
activity.update_options that removed the session write would silently break
pre-start swaps.

Fix: Session now explicitly writes self._stt/self._tts unconditionally.
Activity still writes session._stt for its internal listener-migration
logic, but the session owns the public state contract.
@SamarthUrs18

Copy link
Copy Markdown
Author

Summary

Add AgentSession.update_options(stt=, tts=) for runtime mid-call STT/TTS swapping without agent handoff.

Designed for Sarvam language switching where the WebSocket must be torn down and reconnected on language change (e.g., en-IN β†’ kn-IN β†’ hi-IN). The agent exposes a switch_language tool the LLM invokes when the user asks to switch languages mid-call.

Motivation

Previously, changing STT/TTS mid-call required session.update_agent() (full agent handoff), which is heavy and loses conversation context. This PR adds a lightweight update_options(stt=, tts=) that swaps only the STT/TTS while keeping the same agent, LLM, and conversation history.


Changes (25 commits)

Area Commits Description
Core API 757c72433, d310ae552 AgentSession.update_options(stt=, tts=) + AgentActivity.update_options mirror with pipeline rewire + prewarm
Long's review 78de33e17, 8cd8a5601 Drop agent mirror β€” session manages only session-owned state
Symmetric TTS guard ce1d93c66, 4377c2e4f, 839d422eb agent_owns_tts guard mirroring STT; skip prewarm when blocked
Pipeline-teardown bug 17cdf38b5, 16f00031a Fix: agent_owns_stt alone (not and resolved_stt is not None) guards against silent pipeline teardown when caller passes stt=None on agent-bound STT
Listener migration d3fa626f3, d3b2e62aa Move metrics_collected/error listeners from old to new STT/TTS on swap; skip on agent-bound paths
TTS double-attach fix 839d422eb, c7d93b94a Fix asymmetric guard: agent_owns_tts alone prevents double-attach when tts=None on agent-owned TTS
Pre-start swap 839d422eb, c7d93b94a update_options works before session.start() β€” session writes state directly when no activity exists
STT None teardown verification 985f9e3de Test exercises real update_stt(None) and confirms aio.cancel_and_wait fires on consumer task
UnboundLocalError fix 6cc0534a5 Devin's new_stt is not old_stt guard checked new_stt before assignment β€” moved assignment earlier
Session/activity decoupling dcf923bf6 Session explicitly writes self._stt/self._tts unconditionally; no longer relies on activity side-effect

Verification

Check Result
Unit tests 69/69 pass (test_agent_session.py)
Type check βœ… make check clean (607 files, mypy + ruff + format)
Regression tests 4 new tests fail on buggy code, pass on fixed
E2E manual test βœ… 3 sequential Sarvam swaps (hi-IN β†’ kn-IN β†’ hi-IN), all WebSockets cleanly torn down/reconnected, swap latency ~2ms

Devin Review Items Addressed

# Issue Status
1 Pipeline restart on agent-bound STT βœ… agent_owns_stt guard
2 No aclose() of old instances βœ… Documented: caller owns lifecycle
3 TTS missing agent-bound check + warning βœ… Symmetric agent_owns_tts
4 Prewarm on agent-bound STT βœ… Moved inside safe branch
5 stt=None silent teardown on agent-bound βœ… Guard = agent_owns_stt alone
6 TTS double-attach on agent_owns_tt + tts=None βœ… Symmetric guard
7 Pre-start swap silently dropped βœ… Session writes when _activity is None
8 update_stt(None) teardown verification βœ… New regression test
9 UnboundLocalError (new_stt before assignment) βœ… Moved assignment before guard
10 Session/activity implicit coupling βœ… Session owns public state explicitly

Breaking Changes

None. Pure additions to public API. Existing callers of update_options without stt=/tts= are unaffected.


Test Plan

# Unit tests
uv run pytest tests/test_agent_session.py -k update_options -v

# Full test suite
uv run pytest tests/test_agent_session.py

# Type check + lint
make check

# Manual E2E (requires .env with LIVEKIT_*, SARVAM_API_KEY, GROQ_API_KEY)
uv run python examples/dev/sarvam_language_swap.py console

# Example Usage

session = AgentSession(stt=sarvam.STT(language="hi-IN"), tts=sarvam.TTS(target_language_code="hi-IN"))
await session.start(agent=MyAgent(), room=ctx.room)

# Later, mid-call:
await session.update_options(
    stt=sarvam.STT(language="kn-IN"),
    tts=sarvam.TTS(target_language_code="kn-IN")
)

@SamarthUrs18 SamarthUrs18 requested a review from longcw June 30, 2026 20:24
@SamarthUrs18

Copy link
Copy Markdown
Author

@longcw β€” all Devin flags resolved, agent mirror dropped per your earlier feedback. Ready for another look when you have cycles.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants