feat(webchat): add voice reply (TTS) support to ChatUI #8728
Add a per-client "Voice Reply" toggle to ChatUI. When enabled, the chat route persists the preference as the session's TTS state, disables streaming so the result-decorate stage can synthesize audio, and emits a tts_notice SSE event when TTS was requested but cannot run so the client can show a localized hint.

Backend:
- Pass the Record component's text caption through the webchat queue so the UI can render a transcript under the audio bubble.
- Mirror the result-decorate stage's trigger_probability in the route's TTS gate: probability 0 keeps streaming instead of waiting for audio that will never be synthesized.
- Resolve TTS (and emit tts_notice) for non-streaming requests too, not only when streaming was requested.
- Attach trailing agent_stats/refs to the previously saved record instead of inserting an empty bubble; fall back to a metadata-only record when there is nothing to attach to.
- Return the built content from _save_bot_message so flush no longer rebuilds it; skip the session TTS state write when unchanged.
- Reuse _build_webchat_umo for thread UMOs.

Frontend:
- New AudioMessagePart player with waveform, seek (pointer + keyboard), caption, and autoplay for freshly streamed voice replies. Waveform decoding is lazy (first playback) and shares a single AudioContext so loading a history full of voice messages does not fetch every clip.
- Map all tts_notice codes to localized toasts with a generic fallback (en-US / zh-CN / ru-RU).

Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
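The TTS gate described in the commit message can be sketched in isolation. This is a minimal illustration, not the code in routes/chat.py: the function name `resolve_tts`, the `TtsDecision` type, the reason codes, and the default probability of 1.0 are all assumptions made for the example; only the `enable` and `trigger_probability` settings keys come from the PR.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class TtsDecision:
    use_tts: bool           # synthesize audio for this reply
    streaming: bool         # whether the response is still streamed
    notice: Optional[str]   # hypothetical reason code for a tts_notice event


def resolve_tts(
    enable_tts: bool,
    streaming_requested: bool,
    tts_settings: Optional[dict],
) -> TtsDecision:
    """Decide whether a request should wait for synthesized audio."""
    settings = tts_settings or {}  # an explicit null config counts as empty

    # The client did not ask for voice: leave streaming untouched, no notice.
    if not enable_tts:
        return TtsDecision(False, streaming_requested, None)

    # Voice was requested but TTS is off globally: keep streaming and
    # report why (reason code is made up for this sketch).
    if not settings.get("enable"):
        return TtsDecision(False, streaming_requested, "tts_disabled")

    # Probability 0 means audio would never be synthesized, so do not
    # give up streaming to wait for it (default of 1.0 is an assumption).
    if float(settings.get("trigger_probability", 1.0)) <= 0:
        return TtsDecision(False, streaming_requested, "tts_probability_zero")

    # TTS can run: disable streaming so the decorate stage sees the full text.
    return TtsDecision(True, False, None)
```

Because the decision does not depend on whether streaming was requested, the same notice is produced for streaming and non-streaming requests, which matches the third backend bullet.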
Code Review
This pull request introduces a "Voice Reply" (Text-to-Speech) feature to the chat interface, including a new custom AudioMessagePart.vue component with a waveform visualizer, backend support for TTS resolution and session status management, and localization updates. The review feedback highlights several important issues: in AudioMessagePart.vue, the sharedAudioCtx is incorrectly scoped inside <script setup> preventing it from being shared globally, and there are missing defensive checks for zero-width elements, non-2xx fetch responses, and zero-channel audio buffers. On the backend, potential runtime errors could occur if provider_tts_settings is configured as null, and static type-checking warnings may arise from unsafe access to last_saved_content.
// One AudioContext shared by every player instance: browsers cap concurrent
// contexts, and decoding is the only thing we need it for.
let sharedAudioCtx: AudioContext | null = null;
In Vue <script setup>, all top-level variables are compiled inside the component's setup() function. This means sharedAudioCtx is instantiated per component instance rather than being shared globally across all player instances.
To truly share a single AudioContext globally (as intended by the comment), sharedAudioCtx should be declared in a separate, standard <script> block or imported from a shared utility module.
tts_settings = self.core_lifecycle.astrbot_config.get(
    "provider_tts_settings", {}
)
If provider_tts_settings is explicitly configured as null (or None in Python) in the configuration file, self.core_lifecycle.astrbot_config.get("provider_tts_settings", {}) will return None instead of the default {}. This will lead to an AttributeError: 'NoneType' object has no attribute 'get' when calling tts_settings.get("enable").
Using or {} instead of the default argument in .get() ensures robust defensive handling.
- tts_settings = self.core_lifecycle.astrbot_config.get(
-     "provider_tts_settings", {}
- )
+ tts_settings = self.core_lifecycle.astrbot_config.get(
+     "provider_tts_settings"
+ ) or {}
# to the previously saved record instead of inserting an empty bubble.
# With no prior record to attach to, fall through and persist a
# metadata-only record so stats/refs are not silently dropped.
if not message_parts_to_save and last_saved_record is not None:
last_saved_content is typed as dict | None and initialized to None. Although logically it is populated whenever last_saved_record is not None, static type checkers (like mypy or pyright) cannot infer this correlation and will flag last_saved_content.get(...) as unsafe.
Adding an explicit None check for last_saved_content satisfies type checkers and prevents potential runtime errors.
- if not message_parts_to_save and last_saved_record is not None:
+ if not message_parts_to_save and last_saved_record is not None and last_saved_content is not None:
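A reduced example of why the second check helps: two Optional values that are always set together still need to be narrowed separately, because a type checker tracks each variable on its own. The function `attach_stats` and the `agent_stats` merge below are hypothetical stand-ins, not the route's actual code.

```python
from typing import Optional


def attach_stats(
    last_saved_record: Optional[dict],
    last_saved_content: Optional[dict],
    stats: dict,
) -> bool:
    """Attach trailing stats to the previous record's content if there is one."""
    # Checking only last_saved_record would leave last_saved_content typed as
    # Optional[dict], so the .get() call below would be flagged as unsafe.
    if last_saved_record is not None and last_saved_content is not None:
        merged = dict(last_saved_content.get("agent_stats") or {})
        merged.update(stats)
        last_saved_content["agent_stats"] = merged
        return True
    # Nothing to attach to: the caller falls back to a metadata-only record.
    return False
```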
const rect = wave.getBoundingClientRect();
const ratio = Math.min(1, Math.max(0, (clientX - rect.left) / rect.width));
If the waveform element is hidden or not fully rendered, rect.width can be 0. This will result in a division by zero, causing ratio and time to become NaN, which can throw errors when assigned to el.currentTime.
Adding a defensive check to return early if rect.width is 0 prevents this issue.
const rect = wave.getBoundingClientRect();
if (rect.width === 0) return;
const ratio = Math.min(1, Math.max(0, (clientX - rect.left) / rect.width));
const resp = await fetch(url);
const arrayBuffer = await resp.arrayBuffer();
fetch(url) does not throw an error on non-2xx HTTP status codes (such as 404 Not Found or 500 Internal Server Error). It will proceed to call resp.arrayBuffer() and attempt to decode invalid data, which eventually throws in decodeAudioData.
Checking resp.ok first avoids unnecessary processing and provides clearer error handling.
const resp = await fetch(url);
if (!resp.ok) throw new Error("Failed to fetch audio: " + resp.status);
const arrayBuffer = await resp.arrayBuffer();
const audioBuffer = await ctx.decodeAudioData(arrayBuffer);
if (token !== decodeToken) return;

const channel = audioBuffer.getChannelData(0);
If a corrupted or empty audio file is decoded, audioBuffer might have 0 channels. Calling getChannelData(0) on an empty buffer will throw an IndexSizeError DOMException.
Adding a guard clause to check numberOfChannels ensures robust error handling.
if (audioBuffer.numberOfChannels === 0) return;
const channel = audioBuffer.getChannelData(0);
Hey - I've found 1 issue, and left some high level feedback:
- In AudioMessagePart.vue, several user-visible strings (e.g. aria-labels Play / Pause / Seek) are hard-coded in English; consider wiring these through the existing i18n layer so they can be localized alongside the rest of the chat UI.
- The waveform decoding in AudioMessagePart.vue uses a token to ignore stale results but does not cancel the in-flight fetch / decodeAudioData; if users scrub quickly between messages, consider adding an AbortController or equivalent to avoid unnecessary downloads/decodes for audio that will never be shown.
Prompt for AI Agents
Please address the comments from this code review:
## Overall Comments
- In `AudioMessagePart.vue`, several user-visible strings (e.g. aria-labels `Play` / `Pause` / `Seek`) are hard-coded in English; consider wiring these through the existing i18n layer so they can be localized alongside the rest of the chat UI.
- The waveform decoding in `AudioMessagePart.vue` uses a token to ignore stale results but does not cancel the in-flight `fetch` / `decodeAudioData`; if users scrub quickly between messages, consider adding an `AbortController` or equivalent to avoid unnecessary downloads/decodes for audio that will never be shown.
## Individual Comments
### Comment 1
<location path="dashboard/src/components/chat/message_list_comps/AudioMessagePart.vue" line_range="218-219" />
<code_context>
+// downloads the full file, so it only runs lazily on first playback — a chat
+// history full of voice messages must not fetch every clip on mount.
+let decodeToken = 0;
+let waveformStarted = false;
+function ensureWaveform() {
+ if (waveformStarted) return;
+ waveformStarted = true;
</code_context>
<issue_to_address>
**issue (bug_risk):** waveformStarted is shared across all component instances, so only one audio message ever builds a decoded waveform
Since `waveformStarted` is module-scoped, all `AudioMessagePart` instances share it. After the first call to `ensureWaveform()` sets it to true, later instances never decode their own waveform and stay on the fallback bars. Please make this flag instance-specific (e.g., a `ref(false)` or deriving from whether `bars` is still the fallback), so each clip can lazily decode its own waveform while still sharing the `AudioContext`.
</issue_to_address>
- Move the shared AudioContext to a plain <script> block: top-level <script setup> state is per component instance, so the previous "shared" context was actually created once per audio player.
- Guard waveform seeking against a zero-width element (NaN currentTime).
- Check resp.ok before decoding and skip zero-channel buffers in the waveform builder for clearer failure handling.
- Treat an explicit null provider_tts_settings config as empty dict.
- Add an explicit None check for last_saved_content alongside last_saved_record to satisfy static type checkers.

Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
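Adds "Voice Reply" support to ChatUI: users can turn on the Voice Reply toggle in the chat interface so the bot replies with TTS speech, accompanied by a text transcript.
Modifications
Backend
- webchat_event.py: the record message segment passes its text caption through, so the frontend can render the transcript below the voice bubble.
- routes/chat.py: an enable_tts request parameter persists the client preference as the session TTS state; a tts_notice SSE event returns a localizable reason code (for both streaming and non-streaming requests); a trigger_probability of 0 is treated as disabled, so streaming is not given up; _save_bot_message returns (record, content) to eliminate the duplicate build; thread UMO construction reuses _build_webchat_umo.
- session_llm_manager.py: the TTS session state write is skipped when the value is unchanged, avoiding one redundant DB write per message.
Frontend
- New AudioMessagePart.vue voice player: waveform visualization, pointer/keyboard seeking (with a11y), text transcript, and autoplay for new voice replies; waveform decoding is lazy (the download happens only on first playback) and shares a single global AudioContext, so opening a history session does not download every audio clip.
- Chat.vue adds the Voice Reply toggle (persisted in localStorage); tts_notice reason codes are explicitly mapped to localized toasts (en-US / zh-CN / ru-RU).
This is NOT a breaking change.
Screenshots or Test Results
- pytest tests/test_webchat_tts_replies.py tests/test_conversation_checkpoint.py: 18 passed
- vue-tsc --noEmit: passed
- ruff check / ruff format --check: passed
Checklist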
Summary by Sourcery
Introduce end-to-end voice reply support for webchat sessions, wiring ChatUI preferences through to backend TTS enablement, enriching audio messages with transcripts, and improving audio playback UX.
New Features:
Bug Fixes:
Enhancements:
Tests: