feat(webchat): add voice reply (TTS) support to ChatUI by jayzen33 · Pull Request #8728 · AstrBotDevs/AstrBot

jayzen33 · 2026-06-11T12:06:35Z

Adds "Voice Reply" support to ChatUI: users can turn on the Voice Reply toggle in the chat interface so the bot replies with TTS audio, accompanied by a text transcript.

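Modifications

Backend

webchat_event.py: record message segments now pass through the text caption, so the frontend can render the transcript below the voice bubble.
routes/chat.py:
- Added an enable_tts request parameter; the client's preference is persisted as the session's TTS state;
- When TTS is available, streaming output is disabled so the result-decorate stage can synthesize audio; when TTS is unavailable and the client explicitly requested voice, a localizable reason code is returned via a tts_notice SSE event (applies to both streaming and non-streaming requests);
- TTS gating is aligned with the pipeline (a trigger_probability of 0 is treated as disabled, and streaming is not given up);
- When a turn has only agent_stats/refs left, they are merged into the previous record to avoid an empty bubble; if there is no record to attach to, it falls back to saving a metadata-only record;
- _save_bot_message returns (record, content) to eliminate duplicate construction; thread UMO construction reuses _build_webchat_umo.
session_llm_manager.py: the session TTS state write is skipped when the value is unchanged, avoiding one redundant DB write per message.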
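
The write-skipping described for session_llm_manager.py can be sketched roughly as below. This is an illustration only, not the PR's code: SessionConfigStore, set_tts_enabled, the session ID string, and the in-memory dict standing in for the database are all hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class SessionConfigStore:
    """Hypothetical stand-in for the session config table (in-memory dict)."""

    _tts_state: dict[str, bool] = field(default_factory=dict)
    write_count: int = 0  # counts simulated DB writes

    def set_tts_enabled(self, session_id: str, enabled: bool) -> bool:
        """Persist the TTS state; return True only if a write happened."""
        if self._tts_state.get(session_id) == enabled:
            # Value unchanged: skip the redundant write.
            return False
        self._tts_state[session_id] = enabled
        self.write_count += 1
        return True


store = SessionConfigStore()
store.set_tts_enabled("session-1", True)   # first message: writes
store.set_tts_enabled("session-1", True)   # later messages: skipped
store.set_tts_enabled("session-1", False)  # toggle changed: writes
print(store.write_count)  # 2
```

Comparing against the stored value before writing keeps the per-message cost at one read when the toggle has not moved.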

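Frontend

Added the AudioMessagePart.vue voice player: waveform visualization, pointer/keyboard seeking (with a11y), text transcript, and autoplay for new voice replies; waveform decoding is lazy (the audio is downloaded only on first playback) and a single AudioContext is shared globally, so opening a history session does not download every audio clip.
Chat.vue adds a Voice Reply toggle (persisted in localStorage); tts_notice reason codes are explicitly mapped to localized toasts (en-US / zh-CN / ru-RU).
This is NOT a breaking change.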

Screenshots or Test Results

pytest tests/test_webchat_tts_replies.py tests/test_conversation_checkpoint.py: 18 passed
vue-tsc --noEmit: passed
ruff check / ruff format --check: passed

Checklist

😊 If there are new features added in the PR, I have discussed it with the authors through issues/emails, etc.
👀 My changes have been well-tested, and "Verification Steps" and "Screenshots" have been provided above.
🤓 I have ensured that no new dependencies are introduced.
😮 My changes do not introduce malicious code.

Title: feat(webchat): add voice reply (TTS) support to ChatUI

Summary by Sourcery

Introduce end-to-end voice reply support for webchat sessions, wiring ChatUI preferences through to backend TTS enablement, enriching audio messages with transcripts, and improving audio playback UX.

New Features:

Add configurable voice reply (TTS) support in webchat, including a ChatUI toggle and autoplaying audio replies with transcripts.

Bug Fixes:

Prevent empty bot message bubbles by merging trailing metadata-only turns into the previous record when possible.

Enhancements:

Persist per-session TTS preferences and align TTS enablement checks with backend pipeline behavior, including global, session, and provider availability.
Reuse unified message origin construction helpers and avoid redundant session config writes when TTS state is unchanged.
Replace native audio elements with a custom accessible audio player featuring waveform visualization and keyboard/pointer seeking.

Tests:

Add backend tests covering webchat TTS session IDs, enablement decisions, and provider selection behavior.

Add a per-client "Voice Reply" toggle to ChatUI. When enabled, the chat route persists the preference as the session's TTS state, disables streaming so the result-decorate stage can synthesize audio, and emits a tts_notice SSE event when TTS was requested but cannot run so the client can show a localized hint.

Backend:
- Pass the Record component's text caption through the webchat queue so the UI can render a transcript under the audio bubble.
- Mirror the result-decorate stage's trigger_probability in the route's TTS gate: probability 0 keeps streaming instead of waiting for audio that will never be synthesized.
- Resolve TTS (and emit tts_notice) for non-streaming requests too, not only when streaming was requested.
- Attach trailing agent_stats/refs to the previously saved record instead of inserting an empty bubble; fall back to a metadata-only record when there is nothing to attach to.
- Return the built content from _save_bot_message so flush no longer rebuilds it; skip the session TTS state write when unchanged.
- Reuse _build_webchat_umo for thread UMOs.

Frontend:
- New AudioMessagePart player with waveform, seek (pointer + keyboard), caption, and autoplay for freshly streamed voice replies. Waveform decoding is lazy (first playback) and shares a single AudioContext so loading a history full of voice messages does not fetch every clip.
- Map all tts_notice codes to localized toasts with a generic fallback (en-US / zh-CN / ru-RU).

Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
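
The TTS gate described in this commit message might look roughly like the sketch below. It is not the route's actual code: the function name resolve_tts, the reason-code strings, the default probability of 1.0, and the choice to emit a notice for probability 0 are all assumptions made for illustration. Only the "enable" and trigger_probability keys appear in the PR discussion.

```python
from typing import Optional


def resolve_tts(
    client_wants_tts: bool,
    tts_settings: Optional[dict],
    provider_available: bool,
) -> tuple[bool, Optional[str]]:
    """Return (use_tts, notice_code).

    use_tts=True means streaming should be turned off so audio can be
    synthesized once the full reply is known. notice_code is set only when
    the client asked for voice but TTS cannot run. The code strings are
    made up for this sketch.
    """
    settings = tts_settings or {}
    if not client_wants_tts:
        return False, None
    if not settings.get("enable"):
        return False, "tts_disabled"
    if not provider_available:
        return False, "tts_provider_unavailable"
    # Probability 0 means audio would never be synthesized: keep streaming.
    if float(settings.get("trigger_probability", 1.0)) <= 0:
        return False, "tts_probability_zero"
    return True, None


use_tts, notice = resolve_tts(True, {"enable": True, "trigger_probability": 0}, True)
print(use_tts, notice)  # False tts_probability_zero
```

Returning the decision and the reason together lets the same call serve streaming and non-streaming requests, which is the behavior the third backend bullet asks for.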

gemini-code-assist

Code Review

This pull request introduces a "Voice Reply" (Text-to-Speech) feature to the chat interface, including a new custom AudioMessagePart.vue component with a waveform visualizer, backend support for TTS resolution and session status management, and localization updates. The review feedback highlights several important issues: in AudioMessagePart.vue, the sharedAudioCtx is incorrectly scoped inside <script setup> preventing it from being shared globally, and there are missing defensive checks for zero-width elements, non-2xx fetch responses, and zero-channel audio buffers. On the backend, potential runtime errors could occur if provider_tts_settings is configured as null, and static type-checking warnings may arise from unsafe access to last_saved_content.

gemini-code-assist · 2026-06-11T12:09:12Z

+
+// One AudioContext shared by every player instance: browsers cap concurrent
+// contexts, and decoding is the only thing we need it for.
+let sharedAudioCtx: AudioContext | null = null;


In Vue <script setup>, all top-level variables are compiled inside the component's setup() function. This means sharedAudioCtx is instantiated per component instance rather than being shared globally across all player instances.

To truly share a single AudioContext globally (as intended by the comment), sharedAudioCtx should be declared in a separate, standard <script> block or imported from a shared utility module.

gemini-code-assist · 2026-06-11T12:09:12Z

+        tts_settings = self.core_lifecycle.astrbot_config.get(
+            "provider_tts_settings", {}
+        )


If provider_tts_settings is explicitly configured as null (or None in Python) in the configuration file, self.core_lifecycle.astrbot_config.get("provider_tts_settings", {}) will return None instead of the default {}. This will lead to an AttributeError: 'NoneType' object has no attribute 'get' when calling tts_settings.get("enable").

Using or {} instead of the default argument in .get() ensures robust defensive handling.

Suggested change

-        tts_settings = self.core_lifecycle.astrbot_config.get(
-            "provider_tts_settings", {}
-        )
+        tts_settings = self.core_lifecycle.astrbot_config.get(
+            "provider_tts_settings"
+        ) or {}
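
The difference between the two forms is easy to reproduce with a plain dict. The snippet below is a standalone illustration; the config dict is a stand-in, not the real astrbot_config object.

```python
config = {"provider_tts_settings": None}  # key present, value explicitly null

# The default argument only applies when the key is missing, so this is None.
with_default = config.get("provider_tts_settings", {})

# `or {}` also replaces a present-but-None value with an empty dict.
with_or = config.get("provider_tts_settings") or {}

print(with_default)           # None
print(with_or.get("enable"))  # None, and no AttributeError
```

Note that `or {}` also replaces other falsy values such as an empty string or 0, which is harmless here because the setting is expected to be a mapping.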

gemini-code-assist · 2026-06-11T12:09:12Z

+                # to the previously saved record instead of inserting an empty bubble.
+                # With no prior record to attach to, fall through and persist a
+                # metadata-only record so stats/refs are not silently dropped.
+                if not message_parts_to_save and last_saved_record is not None:


last_saved_content is typed as dict | None and initialized to None. Although logically it is populated whenever last_saved_record is not None, static type checkers (like mypy or pyright) cannot infer this correlation and will flag last_saved_content.get(...) as unsafe.

Adding an explicit None check for last_saved_content satisfies type checkers and prevents potential runtime errors.

Suggested change

-                if not message_parts_to_save and last_saved_record is not None:
+                if not message_parts_to_save and last_saved_record is not None and last_saved_content is not None:
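
A reduced sketch of the narrowing issue, assuming a helper shaped like the one below. The function name attach_trailing_metadata, its parameters, and the "agent_stats" key layout are hypothetical; only the three variable names in the condition come from the diff.

```python
from typing import Optional


def attach_trailing_metadata(
    message_parts_to_save: list,
    last_saved_record: Optional[object],
    last_saved_content: Optional[dict],
    agent_stats: dict,
) -> bool:
    """Merge trailing stats into the previous content; return True if merged."""
    if (
        not message_parts_to_save
        and last_saved_record is not None
        and last_saved_content is not None
    ):
        # Both checks are spelled out, so a type checker narrows
        # last_saved_content from Optional[dict] to dict in this branch.
        last_saved_content["agent_stats"] = agent_stats
        return True
    # The caller falls back to saving a metadata-only record.
    return False


content: dict = {"message": ["hello"]}
merged = attach_trailing_metadata([], object(), content, {"tokens": 12})
print(merged, content["agent_stats"])  # True {'tokens': 12}
```

Without the third condition the function behaves the same at runtime whenever the two variables are set together, but mypy and pyright cannot prove that correlation and report the access as possibly operating on None.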

gemini-code-assist · 2026-06-11T12:09:12Z

+  const rect = wave.getBoundingClientRect();
+  const ratio = Math.min(1, Math.max(0, (clientX - rect.left) / rect.width));


If the waveform element is hidden or not fully rendered, rect.width can be 0. The division then yields Infinity (which the clamp turns into 0 or 1) or, when the numerator is also 0, NaN. A NaN ratio makes time NaN, and assigning a non-finite value to el.currentTime throws a TypeError.

Adding a defensive check to return early if rect.width is 0 prevents this issue.

  const rect = wave.getBoundingClientRect();
  if (rect.width === 0) return;
  const ratio = Math.min(1, Math.max(0, (clientX - rect.left) / rect.width));

gemini-code-assist · 2026-06-11T12:09:12Z

+    const resp = await fetch(url);
+    const arrayBuffer = await resp.arrayBuffer();


fetch(url) does not throw an error on non-2xx HTTP status codes (such as 404 Not Found or 500 Internal Server Error). It will proceed to call resp.arrayBuffer() and attempt to decode invalid data, which eventually throws in decodeAudioData.

Checking resp.ok first avoids unnecessary processing and provides clearer error handling.

    const resp = await fetch(url);
    if (!resp.ok) throw new Error("Failed to fetch audio: " + resp.status);
    const arrayBuffer = await resp.arrayBuffer();

gemini-code-assist · 2026-06-11T12:09:12Z

+    const audioBuffer = await ctx.decodeAudioData(arrayBuffer);
+    if (token !== decodeToken) return;
+
+    const channel = audioBuffer.getChannelData(0);


If a corrupted or empty audio file is decoded, audioBuffer might have 0 channels. Calling getChannelData(0) on an empty buffer will throw an IndexSizeError DOMException.

Adding a guard clause to check numberOfChannels ensures robust error handling.

    if (audioBuffer.numberOfChannels === 0) return;
    const channel = audioBuffer.getChannelData(0);

sourcery-ai

Hey - I've found 1 issue, and left some high level feedback:

In AudioMessagePart.vue, several user-visible strings (e.g. aria-labels Play / Pause / Seek) are hard-coded in English; consider wiring these through the existing i18n layer so they can be localized alongside the rest of the chat UI.
The waveform decoding in AudioMessagePart.vue uses a token to ignore stale results but does not cancel the in-flight fetch / decodeAudioData; if users scrub quickly between messages, consider adding an AbortController or equivalent to avoid unnecessary downloads/decodes for audio that will never be shown.

Prompt for AI Agents

Please address the comments from this code review:

## Overall Comments
- In `AudioMessagePart.vue`, several user-visible strings (e.g. aria-labels `Play` / `Pause` / `Seek`) are hard-coded in English; consider wiring these through the existing i18n layer so they can be localized alongside the rest of the chat UI.
- The waveform decoding in `AudioMessagePart.vue` uses a token to ignore stale results but does not cancel the in-flight `fetch` / `decodeAudioData`; if users scrub quickly between messages, consider adding an `AbortController` or equivalent to avoid unnecessary downloads/decodes for audio that will never be shown.

## Individual Comments

### Comment 1
<location path="dashboard/src/components/chat/message_list_comps/AudioMessagePart.vue" line_range="218-219" />
<code_context>
+// downloads the full file, so it only runs lazily on first playback — a chat
+// history full of voice messages must not fetch every clip on mount.
+let decodeToken = 0;
+let waveformStarted = false;
+function ensureWaveform() {
+  if (waveformStarted) return;
+  waveformStarted = true;
</code_context>
<issue_to_address>
**issue (bug_risk):** waveformStarted is shared across all component instances, so only one audio message ever builds a decoded waveform

Since `waveformStarted` is module-scoped, all `AudioMessagePart` instances share it. After the first call to `ensureWaveform()` sets it to true, later instances never decode their own waveform and stay on the fallback bars. Please make this flag instance-specific (e.g., a `ref(false)` or deriving from whether `bars` is still the fallback), so each clip can lazily decode its own waveform while still sharing the `AudioContext`.
</issue_to_address>

sourcery-ai · 2026-06-11T12:09:59Z

+let waveformStarted = false;
+function ensureWaveform() {


issue (bug_risk): waveformStarted is shared across all component instances, so only one audio message ever builds a decoded waveform

Since waveformStarted is module-scoped, all AudioMessagePart instances share it. After the first call to ensureWaveform() sets it to true, later instances never decode their own waveform and stay on the fallback bars. Please make this flag instance-specific (e.g., a ref(false) or deriving from whether bars is still the fallback), so each clip can lazily decode its own waveform while still sharing the AudioContext.

- Move the shared AudioContext to a plain <script> block: top-level <script setup> state is per component instance, so the previous "shared" context was actually created once per audio player.
- Guard waveform seeking against a zero-width element (NaN currentTime).
- Check resp.ok before decoding and skip zero-channel buffers in the waveform builder for clearer failure handling.
- Treat an explicit null provider_tts_settings config as empty dict.
- Add an explicit None check for last_saved_content alongside last_saved_record to satisfy static type checkers.

Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>

dosubot Bot added the size:XL (this PR changes 500-999 lines, ignoring generated files) and feature:chatui (the bug / feature is about AstrBot's ChatUI, webchat) labels on Jun 11, 2026

gemini-code-assist Bot reviewed Jun 11, 2026

sourcery-ai Bot reviewed Jun 11, 2026

jayzen33 wants to merge 2 commits into AstrBotDevs:master from jayzen33:feat/webui-tts-replies
