fix(web): dispatch ConsumeResult::ToolCalls in WebUI path#7
Conversation
🔴 REJECTED — Governance Violations (3)Violation 1: §3.1 — Missing tests
Required: Add tests covering at minimum:
Violation 2: §1.1 / R11 — File lengthThis PR pushes Required: Split Violation 3: R10 — Missing module doc comment
Required: Add a The bug diagnosis is correct — the |
`stream_consumer::consume_stream` returns `ConsumeResult::ToolCalls(vec)`
for `tool_calls.len() > 1` (`stream_consumer.rs:345`). The platform/*
handlers (`platform_ingest.rs:192`, `platform_exec.rs:268`,
`platform_reinfer.rs:60`) all match this variant. `ws.rs` and `ws_l1.rs`
only matched the singular `ToolCall`, with the plural case falling
through to `_ => {}` — silently dropping the turn.
Symptom: model emits a response with ≥2 tool calls, SSE [DONE] arrives
with `tool_calls=N has_content=true`, then nothing. No observer audit,
no tool execution, no UI text. Black hole. Hits any model that
parallelizes tool calls (Qwen3.5/3.6, Gemma 4, Llama 3.2/3.3, recent
Mistral with tools). The default LlamaCppConfig::default() model
`gemma-4-31B-it-Q4_K_M.gguf` would trip this.
Fix: adds `ConsumeResult::ToolCalls(calls)` arms in both `ws.rs`
(turn-1 dispatch) and `ws_l1.rs` (chain-iteration dispatch). Shared
helper `dispatch_leading_tool_calls` in `ws_l1.rs` executes all but the
last call inline (sending tool_executing / tool_completed UI events and
appending messages), returns the last call so the caller drives the L1
chain on it. Returns `None` only on empty Vec (stream_consumer invariant
violation), surfaced explicitly via tracing::error! and a WS error event.
Three v2 changes addressing the review:
§3.1 — tests for `dispatch_leading_tool_calls`. Refactored generic over
a `DispatchSink` + `ToolExecutor` trait pair scoped to ws_l1.rs only.
Production callers construct `WsDispatchSink` (forwarding to the
existing `send_ws`) + `StateToolExecutor` (forwarding to the existing
`execute_tool_with_state`); unit tests use `CapturingDispatchSink` +
`MockToolExecutor` to assert the full side-effect contract directly.
Tests in new sibling `src/web/ws_l1_tests.rs` cover all three review
cases:
- empty Vec → None; no WS events fired; no messages pushed; executor
uninvoked
- single-element → Some(elem); no leading WS events; no leading
messages pushed; executor uninvoked
- multi-element(3) → Some(third); 4 WS events in exact order
(tool_executing[id1], tool_completed[id1], tool_executing[id2],
tool_completed[id2]); executor invoked exactly 2 times (on id1
then id2); 4 messages pushed (2 assistant_tool_call + 2 tool_result)
This verifies "no leading dispatch" and "dispatches N-1 leading calls"
from the review wording directly.
§1.1 — ws.rs under 500. Extracted the new ConsumeResult::ToolCalls arm
body into pub fn `handle_initial_multi_tool_dispatch` in `ws_l1.rs`.
ws.rs final: 494 lines (was 510 after rebase onto v3.1).
R10 — `//!` module doc comment added to `ws_l1.rs` describing its L1
chain-iteration role.
Post-merge sizes: ws.rs 494, ws_l1.rs 441, ws_l1_tests.rs 154. All
under §1.1 cap.
cargo check + cargo test green.
|
All three resolved. |
22c4807 to
3da2fde
Compare
MettaMazza
left a comment
There was a problem hiding this comment.
✅ APPROVED
All three v1 governance violations resolved:
- §3.1:
dispatch_leading_tool_callsnow has 3 unit tests via trait-pair injection (DispatchSink+ToolExecutor). Tests assert the full side-effect contract: empty Vec → None + zero side effects; single-element → Some(elem) + no leading dispatch; multi(3) → dispatches N-1 leading calls + returns the Nth. Real behavioural contracts, not theatre. - §1.1:
ws.rsreduced to 494 lines via extractinghandle_initial_multi_tool_dispatchtows_l1.rs. - R10:
//!module doc added tows_l1.rs.
Code quality is high. The bug diagnosis (V4-class silent tool-call drop via _ => {}) is correct and the fix is sound. Commit message is forensic-grade.
Advisory (non-blocking): ws.rs at 494/500 — 6 lines of headroom. Next contributor touching this file risks R11. Consider proactive extraction in a follow-up chore PR.
Merging now — this is a critical bug fix affecting the default model.
Pre-reviewed via the #dev-general submission earlier today (the bug-patch markdown). Your verification confirmed the bug is real, the diagnosis is correct, and the fix shape is right. The one revision you flagged —
enforce_context_budgetdoesn't belong in the helper — has been applied (call removed; budget enforcement is intentionally deferred to the chain, mirroringplatform_ingest.rs:192-215).What this fixes
stream_consumer::consume_streamreturnsConsumeResult::ToolCalls(vec)fortool_calls.len() > 1(stream_consumer.rs:345). The platform/* handlers (platform_ingest.rs:192,platform_exec.rs:268,platform_reinfer.rs:60) all match this variant.ws.rs:377andws_l1.rs:203only matched the singularToolCallvariant, with the plural case falling through to_ => {}— silently dropping the turn.Symptom: model emits a response with ≥2 tool calls, SSE [DONE] arrives with
tool_calls=N has_content=true, then nothing. No observer audit, no tool execution, no UI text. Reasoning trace renders (it streamed live) but the response body never lands. Black hole.Hits any model that parallelizes tool calls (Qwen3.5/3.6, Gemma 4, Llama 3.2/3.3, recent Mistral with tools). The default
LlamaCppConfig::default()modelgemma-4-31B-it-Q4_K_M.ggufwould trip this.The fix
Adds a
ConsumeResult::ToolCalls(calls)arm in bothws.rs(turn-1 dispatch) andws_l1.rs(chain-iteration dispatch), sharing a newdispatch_leading_tool_callshelper inws_l1.rs. The helper executes all but the last call inline (sendingtool_executing/tool_completedUI events and appending messages), returns the last call so the caller can drive the L1 chain on it. ReturnsNoneon emptyVec(stream_consumer invariant violation), surfaced explicitly by callers viatracing::error!and a WS error event — no silent fall-through.Governance alignment
tracing::error!with context, WS error events surfacedunwrap, no silent fallback, single concernTest note
ws.rsandws_l1.rshave effectively no test scaffolding (ws.rshas onlytest_ws_module_compiles;ws_l1.rshas none). Building a mockableWS sink + providerinfra is larger than the fix. Per your #dev-general feedback, this gap is acceptable for this PR; happy to bootstrap test infra as a separate follow-up if you want.Cousin sites flagged
While in here,
platform_stream.rs:267andws_stream.rs:115/:169have the sameConsumeResult::ToolCall-only matching pattern. Confirmed by your observer's verification. Per §10.2 these are not in this PR's scope; will file as a separate report after this lands.Independent of the three mesh PRs (#4, #5, #6) — no blocker between them, can land in either order.