feat(amd): add MACHINE_SCREENING category for carrier call-screening services #1386
karan-dhir wants to merge 2 commits into
Adds a 6th `AMDCategory` for carrier-injected call-screening prompts
(Google Pixel Call Screen, iOS 18 Call Screening, and similar). Today
these prompts are most often classified as `MACHINE_VM` because they
sound like a TTS message after pickup, but they're not voicemail —
the callee is reachable, the caller is being asked to record a brief
identification.
Surface contract:
- `result.category === MACHINE_SCREENING` on detection.
- `result.isMachine === true` so consumers' "did a machine answer?"
checks behave intuitively.
- `interruptOnMachine: true` does NOT auto-interrupt on screening.
Callers handling screening typically need to play a short
identification greeting in response, which an automatic interrupt
would cancel. Implementation note: auto-interrupt now gates on
`isMachineCategory(result.category)` (the narrower set excluding
SCREENING), while `result.isMachine` uses the wider
`isMachineResult` set including SCREENING. This is the intended
asymmetry.
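The asymmetry described above can be sketched as two sets with one helper each. This is a minimal illustration, not the code in `amd.ts`: the enum member names other than `MACHINE_VM` and `MACHINE_SCREENING`, every string value other than `'machine-screening'`, the membership of the narrow set, and the helpers `buildResult` and `shouldAutoInterrupt` are all assumptions made for the example.

```typescript
// Sketch only. Member names and string values not quoted in the PR text,
// and the exact contents of the narrow set, are assumptions.
enum AMDCategory {
  HUMAN = 'human',
  MACHINE_IVR = 'machine-ivr',
  MACHINE_VM = 'machine-vm',
  MACHINE_UNAVAILABLE = 'machine-unavailable',
  MACHINE_SCREENING = 'machine-screening',
  UNCERTAIN = 'uncertain',
}

// Narrow set: categories that trigger auto-interrupt.
const MACHINE_CATEGORIES = new Set<AMDCategory>([
  AMDCategory.MACHINE_IVR,
  AMDCategory.MACHINE_VM,
  AMDCategory.MACHINE_UNAVAILABLE,
]);

// Wide set: the narrow set plus screening; drives result.isMachine.
const MACHINE_RESULT_CATEGORIES = new Set<AMDCategory>([
  ...Array.from(MACHINE_CATEGORIES),
  AMDCategory.MACHINE_SCREENING,
]);

const isMachineCategory = (c: AMDCategory): boolean => MACHINE_CATEGORIES.has(c);
const isMachineResult = (c: AMDCategory): boolean => MACHINE_RESULT_CATEGORIES.has(c);

interface AMDResult {
  category: AMDCategory;
  isMachine: boolean;
}

// Hypothetical helper: fills the public field from the wide set.
function buildResult(category: AMDCategory): AMDResult {
  return { category, isMachine: isMachineResult(category) };
}

// Hypothetical helper: the auto-interrupt gate checks the narrow set,
// not result.isMachine, so screening never triggers an interrupt.
function shouldAutoInterrupt(result: AMDResult, interruptOnMachine: boolean): boolean {
  return interruptOnMachine && isMachineCategory(result.category);
}
```

Under these assumptions, `buildResult(AMDCategory.MACHINE_SCREENING)` reports `isMachine: true` while `shouldAutoInterrupt(...)` stays `false` for it, which is the behaviour the new test case is described as asserting.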
Documented in the enum JSDoc + with examples in the LLM classification
prompt (`AMD_PROMPT`).
Test: new case asserts SCREENING → `isMachine: true` AND `interrupt`
NOT called when `interruptOnMachine: true`. Existing 4 test cases
unchanged. Vitest, lint, and tsc all clean on the changed files.
Motivated by an outbound voice agent (Woflow) where Pixel Call Screen
prompts were being classified as voicemail and the agent dumped a full
voicemail script into Pixel's screened transcript shown to the
recipient. With MACHINE_SCREENING as a distinct verdict, callers can
respond appropriately without false-flagging as voicemail.
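On the consumer side, a distinct verdict makes the response choice a simple branch. The sketch below is hypothetical: the `Verdict` shape mirrors only the two fields named in this PR, the action names are invented for illustration, and `'machine-vm'` is an assumed string value.

```typescript
// Hypothetical consumer-side policy; action names are illustrative only.
type Action = 'short-identification' | 'voicemail-script' | 'continue';

interface Verdict {
  category: string; // 'machine-screening' is from the PR; other values are assumed
  isMachine: boolean;
}

function chooseAction(verdict: Verdict): Action {
  // Branch on the category before the boolean: screening also has
  // isMachine === true, so a bare isMachine check cannot tell them apart.
  if (verdict.category === 'machine-screening') return 'short-identification';
  if (verdict.category === 'machine-vm') return 'voicemail-script';
  return 'continue';
}
```

Checking `category` first matters because `isMachine` alone is `true` for both screening and voicemail.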
…-amd-category # Conflicts: # agents/src/voice/amd.ts
🔴 `save_prediction` tool uses `isMachineCategory` instead of `isMachineResult`, so `MACHINE_SCREENING` gets `isMachine: false`
The `save_prediction` tool's `execute` callback at line 994 uses `isMachineCategory(normalized)` to compute `isMachine`. Since `MACHINE_SCREENING` is intentionally excluded from `MACHINE_CATEGORIES` (to prevent auto-interrupt), this returns `false`. However, the design intent of this PR, documented in the enum comment, the `MACHINE_RESULT_CATEGORIES` set, and the other two code paths, is that screening should have `isMachine: true`. The fallback plain-JSON path (`agents/src/voice/amd.ts:1104`) and the silence-timer path (`agents/src/voice/amd.ts:732`) both correctly use `isMachineResult()`. This means the tool-call path produces `isMachine: false` while every other path produces `isMachine: true` for the same category.
(Refers to line 994)
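A compact way to see the discrepancy is to compare the flagged computation with the suggested one. This is a sketch under assumptions: `executeBefore` and `executeAfter` are hypothetical stand-ins for the tool's `execute` callback, and every label string except `'machine-screening'` is assumed.

```typescript
// Assumed label strings; only 'machine-screening' is quoted in the PR.
const INTERRUPT_LABELS = new Set<string>([
  'machine-ivr',
  'machine-vm',
  'machine-unavailable',
]);
const RESULT_LABELS = new Set<string>([
  'machine-ivr',
  'machine-vm',
  'machine-unavailable',
  'machine-screening',
]);

const isMachineCategory = (label: string): boolean => INTERRUPT_LABELS.has(label);
const isMachineResult = (label: string): boolean => RESULT_LABELS.has(label);

// Hypothetical stand-in for the callback as flagged: uses the narrow set.
function executeBefore(normalized: string) {
  return { category: normalized, isMachine: isMachineCategory(normalized) };
}

// Hypothetical stand-in with the suggested fix: uses the wide set,
// matching the plain-JSON and silence-timer paths.
function executeAfter(normalized: string) {
  return { category: normalized, isMachine: isMachineResult(normalized) };
}
```

The two versions differ only for `'machine-screening'`; for every other label in this sketch they return the same `isMachine` value.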
🔴 `save_prediction` tool schema omits `MACHINE_SCREENING`, preventing LLM from classifying screening via tool call
The `save_prediction` tool's `z.enum` at lines 974-980 lists only 5 categories and omits `AMDCategory.MACHINE_SCREENING`. The system prompt (`agents/src/voice/amd.ts:186`) tells the LLM that `machine-screening` is a valid category, but the tool schema, which is sent to the LLM as the function signature, does not include it. Since `toolChoice: 'required'` is set at `agents/src/voice/amd.ts:1046`, the LLM must respond with a tool call. Without `machine-screening` in the schema, compliant LLMs will never call `save_prediction` with that label; they may misclassify as one of the other categories or fall back to plain JSON content (which does handle screening correctly via `parseDetection`). The fix is to add `AMDCategory.MACHINE_SCREENING` to the enum.
(Refers to lines 974-980)
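The effect of the missing label can be illustrated without pulling in zod. The sketch below is a dependency-free stand-in for the tool schema's enum, not the actual `z.enum` call: `TOOL_LABELS` and `isToolLabel` are hypothetical names, and all label strings except `'machine-screening'` are assumed.

```typescript
// Dependency-free stand-in for the tool schema's label enum.
// In the real code the equivalent list would be the argument to z.enum.
const TOOL_LABELS = [
  'human',
  'machine-ivr',
  'machine-vm',
  'machine-unavailable',
  'machine-screening', // the entry the review says is missing
  'uncertain',
] as const;

type ToolLabel = (typeof TOOL_LABELS)[number];

// Hypothetical type guard playing the role of schema validation:
// a label absent from the list can never pass, whatever the prompt says.
function isToolLabel(value: unknown): value is ToolLabel {
  return (
    typeof value === 'string' &&
    (TOOL_LABELS as readonly string[]).indexOf(value) !== -1
  );
}
```

Keeping the prompt's category list and the schema's label list derived from one source would avoid this kind of drift, though whether that fits the existing code is a design call for the maintainers.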
Closes #1374 (issue filed first per CONTRIBUTING.md; opening PR for review now that the proposal has been written up there).
Summary
Adds a 6th `AMDCategory` for carrier-injected call-screening prompts (Google Pixel Call Screen, iOS 18 Call Screening, and similar TTS services that intercept the call and ask the caller to identify themselves before reaching the human owner).
Today these prompts most often classify as `MACHINE_VM` because they sound like a TTS message after pickup, but they're not voicemail. The callee is reachable; the caller is being asked to record a brief identification. Distinguishing them at the AMD layer lets consumers respond appropriately (e.g. play a short identification greeting in response, instead of a voicemail-message script).
Motivated by an outbound voice agent (Woflow) where Pixel Call Screen prompts were classifying as voicemail and the agent was dumping a full voicemail script into Pixel's screened transcript shown to the recipient. See #1374 for the full motivation.
Surface contract
For `MACHINE_SCREENING`:
- `result.category`: `'machine-screening'`
- `result.isMachine`: `true` (consumers' "did a machine answer?" checks behave intuitively)
- `interruptOnMachine: true`: does NOT auto-interrupt on screening
The asymmetry between `result.isMachine` and `interruptOnMachine` is implemented by splitting the existing machine-categories set into two:
- `MACHINE_CATEGORIES`: drives auto-interrupt. Excludes `MACHINE_SCREENING`.
- `MACHINE_RESULT_CATEGORIES`: drives `result.isMachine`. Includes `MACHINE_SCREENING`.
Auto-interrupt at `finish()` now gates on `isMachineCategory(result.category)` directly instead of `result.isMachine`, preserving the asymmetry without making the field-level contract surprising.
Implementation
5 changes in `agents/src/voice/amd.ts` + 1 new test in `agents/src/voice/amd.test.ts`:
- Add `MACHINE_SCREENING` to the `AMDCategory` enum (with JSDoc explaining the contract).
- Add the `MACHINE_RESULT_CATEGORIES` set + `isMachineResult` helper next to the existing `MACHINE_CATEGORIES` / `isMachineCategory`. Doc-comments explain the asymmetry.
- Switch the `isMachine: isMachineCategory(category)` callsites to use `isMachineResult` instead.
- Update `finish()` to use `isMachineCategory(result.category)` directly.
- Extend `AMD_PROMPT` with the new category description + 4 example prompts (Pixel + iOS + generic).
Test added: `should classify call screening as machine without auto-interrupt`, which asserts both `isMachine: true` AND `session.interrupt` not called when `interruptOnMachine: true`.
Risk / breakage
- `interruptOnMachine` semantics for HUMAN / IVR / VM / UNAVAILABLE / UNCERTAIN are unchanged byte-for-byte.
- Consumers that do not receive `MACHINE_SCREENING` from the LLM yet (because they have a custom prompt, or are running on an older model) get the same behaviour as before.
Verification done locally
- `pnpm exec vitest run agents/src/voice/amd.test.ts`: 5/5 pass (4 existing + 1 new)
- `pnpm exec tsc --noEmit -p agents/tsconfig.json`: clean
- `pnpm format:write`: clean (no diff after format)
- `pnpm lint --filter=@livekit/agents`: zero warnings on changed files (pre-existing 120 warnings on other files unaffected)
Python parity
The JS classifier doc-comments reference `python classifier.py` patterns, suggesting a sibling implementation in `livekit/agents`. Happy to mirror this change there once the JS direction is settled; let me know which order you'd prefer.
Diff stats
`+77 / -4` LOC across 2 files. Single commit.
Notes for reviewers
- `MACHINE_SCREENING` was chosen for parallelism with the existing `MACHINE_*` prefix and the fact that screening is technically a machine (TTS) intercepting the call. Open to alternatives if there's a project convention I'm missing.
- The asymmetry between `isMachine` (true) and `interruptOnMachine` (no interrupt) was deliberate; happy to switch to either:
  - `isMachine: false` for screening (consumers' machine-checks ignore screening) + auto-interrupt off: simpler but less informative
  - `isMachine: true` + auto-interrupt on: matches existing pattern but breaks the screening use case (the greeting gets cancelled)
  - `isMachine: true` + auto-interrupt off (this PR): informative + functional
Thanks for the review!