From 5a9bcb7542e2d7d9693213435c8aab3c63669fba Mon Sep 17 00:00:00 2001
From: Mickael Farina
Date: Tue, 9 Jun 2026 22:46:49 +0200
Subject: [PATCH] feat(voice): flash / default / think modes + consent-gated gmail send
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

Three modes on the live voice-to-voice chat, same interface
(docs/VOICE-MODES-DESIGN.md, operator-approved):

- flash: prefill-surgery for snappy turns — context trim 20→8 turns,
  skip per-turn memory + observer injections, max_tokens 2000→400,
  one-sentence rule, TTS 1.25x. Same local Qwen 35B.
- default: byte-identical to previous behavior.
- think: live multi-step tool calling — utterances route through a
  voice-scoped codec_agents.Agent over a curated allowlist (hue, music,
  chrome, web search/fetch, calendar, gmail, imessage, timers,
  reminders, weather, notes). Each tool call is narrated over TTS
  (extended-wait keepalive); speaking interrupts the loop; 6-tool +
  120s budgets. Terminal/python_exec/file_write etc. are hard-excluded
  and config-immune. The single-skill fast path still fires first.

Switching: voice command ("flash/think/normal mode", terse-utterance
guard so "i think mode collapse…" can't hijack) + three UI pills on
/voice + additive WS {"type":"mode"} protocol. Kill switch
VOICE_MODES_ENABLED. New audit event voice_mode_changed (AGENTS.md §6
updated); think tool calls ride the existing run_with_hooks audit
chokepoint.

gmail: new consent-gated `send` action (think-mode "send an email"
demo). Strict Step-3 consent — CODEC reads the draft, only the literal
spoken word "send" sends; timeout/disabled/generic-yes fail CLOSED.
gmail.modify scope already covers send (no re-auth). Manifest
regenerated (PR-1A).

Tests (TDD red-then-green): tests/test_voice_modes.py (11) +
tests/test_gmail_send.py (6). Full suite 2391 passed.

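Illustrative note (a sketch, not the code in this patch): the
terse-utterance guard and the fail-closed `send` consent described above
could look roughly like this. The function names, the 4-word cutoff, the
accepted phrasings, and the punctuation stripping are assumptions made
for illustration only; codec_voice.py and skills/google_gmail.py may
implement them differently.

```python
import re
from typing import Optional

# Hypothetical names and thresholds; none of these come from the patch.
_MODE_ALIASES = {"flash": "flash", "think": "think",
                 "normal": "default", "default": "default"}
_MODE_RE = re.compile(r"^(?:switch to |go to )?(flash|think|normal|default) mode$")
MAX_COMMAND_WORDS = 4  # assumed cutoff for what counts as a "terse" utterance


def _normalize(text: str) -> str:
    """Lowercase, drop punctuation, collapse whitespace."""
    text = re.sub(r"[^\w\s]", "", text.lower())
    return re.sub(r"\s+", " ", text).strip()


def parse_mode_command(utterance: str) -> Optional[str]:
    """Return the requested mode, or None if this is ordinary speech."""
    text = _normalize(utterance)
    if len(text.split()) > MAX_COMMAND_WORDS:
        return None  # long utterances never switch modes
    match = _MODE_RE.match(text)  # whole utterance must be the command
    return _MODE_ALIASES[match.group(1)] if match else None


def consent_allows_send(reply: Optional[str], consent_enabled: bool = True) -> bool:
    """Fail closed: only the literal word "send" authorizes sending."""
    if not consent_enabled or reply is None:  # disabled, or timed out
        return False
    return _normalize(reply) == "send"
```

Anchoring the pattern to the whole utterance is what keeps a sentence
such as "i think mode collapse…" from switching modes, and comparing the
reply against the single word "send" means a generic "yes" is rejected.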
Co-Authored-By: Claude Fable 5
---
 AGENTS.md                  |   8 ++
 codec_voice.html           |  45 +++++++
 codec_voice.py             | 268 ++++++++++++++++++++++++++++++++-----
 docs/VOICE-MODES-DESIGN.md |  91 +++++++++++++
 skills/.manifest.json      |   2 +-
 skills/google_gmail.py     |  64 ++++++++-
 tests/test_gmail_send.py   | 121 +++++++++++++++++
 tests/test_voice_modes.py  | 199 +++++++++++++++++++++++++++
 8 files changed, 758 insertions(+), 40 deletions(-)
 create mode 100644 docs/VOICE-MODES-DESIGN.md
 create mode 100644 tests/test_gmail_send.py
 create mode 100644 tests/test_voice_modes.py

diff --git a/AGENTS.md b/AGENTS.md
index 48f2126..037ce9b 100644
--- a/AGENTS.md
+++ b/AGENTS.md
@@ -562,6 +562,14 @@ Three event names, all info-level. `agent_message_sent` and `agent_message_recei
 
 `PHASE3_STEP10_EVENTS` frozenset exposed.
 
+#### Voice modes event (flash / default / think — docs/VOICE-MODES-DESIGN.md)
+
+| Event | Source | level | extra fields |
+|---|---|---|---|
+| `voice_mode_changed` | `codec-voice` | info | `mode` (`flash` \| `default` \| `think`), `via` (`voice` \| `ui`) |
+
+Single-emit, fresh or session cid. Think-mode tool calls need no new events — they route through the skill `Tool` wrappers → `run_with_hooks` → existing `tool_call`/`tool_result` envelope.
+
 ### Notifications (`~/.codec/notifications.json`)
 
 Four sources can produce notifications: scheduler (crew completion), heartbeat (threshold alert), autopilot (ambient trigger), and Phase 1 Step 3's AskUserQuestion (`type="question"`). All write through `routes/_shared.py:51-127` except AskUserQuestion which writes via `codec_ask_user._write_question_notification`.
diff --git a/codec_voice.html b/codec_voice.html
index 69489bf..9d81075 100644
--- a/codec_voice.html
+++ b/codec_voice.html
@@ -455,6 +455,20 @@
   .modal-overlay.show { display:flex; }
   .modal-overlay img { max-width:100%; max-height:90vh; border-radius:var(--radius); border:1px solid var(--border); }
   .modal-close { position:fixed; top:16px; right:16px; background:var(--surface); color:var(--text); border:1px solid var(--border); width:32px; height:32px; border-radius:50%; font-size:18px; cursor:pointer; z-index:101; display:flex; align-items:center; justify-content:center; }
+  /* ── Voice mode pills (flash / default / think) ── */
+  .mode-row { display:flex; gap:8px; justify-content:center; margin:10px 0 2px; }
+  .mode-pill {
+    background:var(--surface-2,#1f1f22); border:1px solid var(--border,#2a2a2f);
+    color:var(--text-dim,#6a6a70); font-size:11px; font-weight:700;
+    letter-spacing:1px; padding:5px 14px; border-radius:999px; cursor:pointer;
+    transition:all .15s; text-transform:uppercase;
+  }
+  .mode-pill:hover { color:var(--text,#ececec); border-color:var(--border-hover,#3a3a40); }
+  .mode-pill.active {
+    color:#fff; background:var(--accent,#E8711A); border-color:var(--accent,#E8711A);
+    box-shadow:0 0 12px rgba(232,113,26,.35);
+  }
+  .mode-pill.think.active { box-shadow:0 0 12px rgba(232,113,26,.5); }
@@ -514,6 +528,13 @@

CODEC

Tap to start voice call
+ +
+ + + +
+