Skip to content

DeviceCommand seam: one module for server→firmware MCP calls (kills 3 audit bug classes)#160

Open
BrettKinny wants to merge 2 commits into
mainfrom
feat/device-command-seam
Open

DeviceCommand seam: one module for server→firmware MCP calls (kills 3 audit bug classes)#160
BrettKinny wants to merge 2 commits into
mainfrom
feat/device-command-seam

Conversation

@BrettKinny

Copy link
Copy Markdown
Owner

What

Candidate 3 from the 2026-06-11 pre-release architecture review. Twelve call sites — the five /xiaozhi/admin/* MCP handlers in http_server.py and seven helpers in receiveAudioHandle.py — each hand-rolled the same MCP JSON-RPC envelope, so every shared defect was twelve defects. All server→firmware tool calls now go through one module, custom-providers/xiaozhi-patches/device_command.py, mounted at core/utils/device_command.py (same pattern as textUtils.py) so both patch surfaces import it.

Audit bug classes killed

Finding (AUDIT-REPORT 2026-06-06) Fix
MCP request ids int(time.time()*1000) % 0x7FFFFFFF collide within a millisecond (×12 sites) Monotonic per-connection counter, collision-free for the connection's lifetime; a test guards the old pattern from resurfacing in either file
Admin handlers call conn.websocket.send() concurrently with each other / play-asset's frame stream (websockets forbids interleaved sends) Per-connection asyncio.Lock; every send routed through the seam is mutually exclusive — including play-asset's opus frames and tts lifecycle frames, so a concurrent head-turn can't interleave mid-frame
_dotty_inject_text reads conn.headers without the or {} guard used elsewhere All eight handlers now share _dotty_resolve_conn() / _dotty_conn_device_id() — one copy of the lookup instead of eight

Honest scope notes:

  • Upstream xiaozhi's own chat-path writer does not take the lock (that would mean patching upstream send sites) — admin-vs-admin and admin-vs-play-asset interleaving is fixed; admin-vs-chat is narrowed to the seam where it can be finished.
  • Reply correlation is deliberately not implemented — device-side MCP replies stay fire-and-forget. call_tool returns the request id so a correlation layer can be added behind this interface without touching the twelve callers again (the audit's "device-side failures invisible" finding remains open, now one-module-sized).

Handlers shrink from ~50 lines to ~15; net −280 lines of copy-paste. Both compose files gain the bind mount (volume parity kept — the all-in-one's header rule).

Test plan

  • ruff check . clean; full suite 320 passing (107 root/pi_voice + 213 behaviour), combined coverage 69.1% (gate 56%); both compose files YAML-validated
  • New tests/test_device_command.py (8 cases): monotonic + per-connection ids under a 1000-call burst, wire-shape parity with the old hand-rolled envelope, send serialization under concurrency (max-overlap == 1), http_server handler wiring (ids 1,2 on the wire), 503/fallback conn resolution, and the no-% 0x7FFFFFFF-anywhere guard
  • The two existing http_server harnesses (test_admin_auth_middleware.py, test_play_asset_route.py) now inject the real seam module instead of a MagicMock
  • Bench-pending before release sign-off: live-device smoke — LED pips on reconnect, sound-turner head-turn, take-photo, play-asset purr

Refs the AUDIT-REPORT 2026-06-06 findings (ms-truncated ids / concurrent sends / inject-text headers nit). Independent of #158 and #159 except for a trivial CHANGELOG-adjacency conflict with whichever merges first (keep all entries).

AI assistance: drafted by Claude Fable 5 (analysis, implementation, tests, and this PR body); human review pending per AI_TRANSPARENCY.md.

🤖 Generated with Claude Code

BrettKinny and others added 2 commits June 11, 2026 19:43
…ge labels, domain docs)

Adds docs/agents/{issue-tracker,triage-labels,domain}.md and an
## Agent skills section in CLAUDE.md so the Matt Pocock engineering
skills know: issues live on GitHub (BrettKinny/dotty-stackchan via gh),
the five triage roles map to default label strings (orthogonal to the
existing status:/area: labels), and domain docs are single-context.

Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
Twelve call sites — five /xiaozhi/admin/* handlers in http_server.py
and seven helpers in receiveAudioHandle.py — each hand-rolled the same
MCP JSON-RPC envelope, so every shared defect was twelve defects
(2026-06-06 audit): request ids were int(time.time()*1000)%0x7FFFFFFF
(same-millisecond calls collided), and every site fired
conn.websocket.send() uncoordinated against the other senders on the
same ServerConnection (the websockets library forbids interleaved
sends).

New module custom-providers/xiaozhi-patches/device_command.py, mounted
at core/utils/device_command.py so both patch surfaces import it:

- monotonic per-connection request ids (collision-free for the life of
  the connection; stored on the conn so state dies with it)
- the MCP envelope in one place
- per-connection serialized sends via an asyncio.Lock; play-asset's
  opus frame stream + tts lifecycle frames route through the same lock
  so a concurrent admin call (e.g. a sound-turner head-turn) can't
  interleave mid-frame. (Upstream's own chat-path writer doesn't take
  the lock yet — this seam is where that lands.)

Reply correlation is deliberately NOT implemented: call_tool returns
the request id so a correlation layer can be added behind this
interface without touching the twelve callers again.

http_server.py also gains _dotty_resolve_conn(): one copy of the
named-or-first-device lookup instead of eight, which fixes the audit's
inject-text missing-`or {}` headers guard as a side effect. Handlers
shrink from ~50 lines to ~15.

Both compose files gain the new bind mount (volume/env parity kept).

Tests: tests/test_device_command.py (8 cases — monotonic/per-conn ids,
wire-shape parity with the old envelope, send serialization under
concurrency, http_server wiring, and a guard that the ms-truncated id
pattern never resurfaces in either file); the two existing http_server
harnesses now inject the real seam module. Full suite 320 passing,
coverage 69.1% (gate 56%).

Bench-pending: live-device smoke (LED pips, head-turn, take-photo,
play-asset) before release sign-off.

Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant