Commit 79b5dc0

docs(learnings): add stopSpeakingPlan.voiceSeconds 0.5 max constraint
The Vapi API rejects voiceSeconds > 0.5 with a 400. Surfaced during PRISM-481 voicemail-triage tuning on Mudflap when widening barge-in tolerance to 0.75 to suppress Soniox partial-driven LLM clearing during request-start TTS playback. Schema cap is undocumented in the API reference but enforced server-side.
1 parent f72b89e commit 79b5dc0

1 file changed

Lines changed: 11 additions & 0 deletions

docs/learnings/assistants.md

@@ -496,6 +496,17 @@ These are complementary, not alternatives.

`numWords: 2` means the user must speak 2 words before the assistant stops talking. Lower values make the assistant more interruptible.

### `voiceSeconds` maximum is 0.5

The Vapi API enforces a hard maximum of **0.5 seconds** on `stopSpeakingPlan.voiceSeconds`. Setting a higher value (e.g., `0.75` or `1.0`) fails at push time with:

```
PATCH /assistant/<id> → 400
stopSpeakingPlan.voiceSeconds must not be greater than 0.5
```

The cap is undocumented in the schema reference but enforced server-side. When widening barge-in tolerance for assistants that handle continuous speech (voicemail prompts, IVR menus, fast personas), `numWords` is the load-bearing knob: `voiceSeconds` can only be raised as far as the cap (default 0.2 → max 0.5). For example, on a Soniox-transcribed classifier handling voicemail audio, `numWords: 5` does most of the work; `voiceSeconds: 0.5` adds only the small margin the cap allows.
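Because the cap only surfaces as a 400 at push time, a local pre-push check can catch it earlier. The following is a minimal sketch, not part of any Vapi SDK: `check_stop_speaking_plan` and `VOICE_SECONDS_MAX` are hypothetical names, and the only rule it encodes is the 0.5 limit from the error message above.

```python
# Hypothetical pre-push check; not part of any Vapi SDK.
# The 0.5 limit is taken from the 400 response quoted above.
VOICE_SECONDS_MAX = 0.5


def check_stop_speaking_plan(plan: dict) -> list[str]:
    """Return problems likely to fail the PATCH; an empty list means none found."""
    problems = []
    voice_seconds = plan.get("voiceSeconds")
    if voice_seconds is not None and voice_seconds > VOICE_SECONDS_MAX:
        problems.append(
            "stopSpeakingPlan.voiceSeconds must not be greater than "
            f"{VOICE_SECONDS_MAX} (got {voice_seconds})"
        )
    return problems


# Widen tolerance through numWords; keep voiceSeconds at or below the cap.
rejected = {"numWords": 5, "voiceSeconds": 0.75}
accepted = {"numWords": 5, "voiceSeconds": 0.5}

print(check_stop_speaking_plan(rejected))  # one problem reported
print(check_stop_speaking_plan(accepted))  # []
```

The check deliberately reports rather than silently clamps, so a config that asks for `0.75` is not quietly pushed as `0.5`.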
### `numWords: 2` produces a 500–800ms TTS overlap window

**Why this matters for transcript quality, not just feel:** While the assistant waits for the second word to land before stopping, both speakers are talking simultaneously. That overlap window is typically **500–800ms** at conversational pace. STT confidence drops sharply during overlap, so the customer's first sentence after a barge-in often arrives garbled — wrong words, dropped clauses, or low-confidence transcripts that get filtered out (see `confidenceThreshold` above).
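As a rough sanity check on the 500–800ms figure, the overlap can be approximated as the time a caller needs to say `numWords` words. This sketch ignores STT and network latency, and the speaking rates (150 and 240 words per minute) are illustrative assumptions rather than measurements from these notes.

```python
def overlap_window_ms(num_words: int, words_per_minute: float) -> float:
    """Approximate time in milliseconds for a caller to speak num_words words."""
    return num_words * 60_000 / words_per_minute


# Assumed speaking rates, for illustration only.
slower_caller = overlap_window_ms(2, 150)  # 800.0
faster_caller = overlap_window_ms(2, 240)  # 500.0

print(slower_caller, faster_caller)
```

Under the same assumptions, `numWords: 5` would put the estimate between 1250ms and 2000ms, so a higher word threshold buys tolerance at the cost of a longer overlap.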
