docs(learnings): add .vapi-ignore lifecycle, pronunciation decision tree, name 40-char cap, PATCH semantics, ElevenLabs phoneme model compatibility
Six wiki additions + AGENTS.md and CLAUDE.md routing fix to surface them:
- yaml-conventions.md: 'Working with .vapi-ignore' — recovery flow
('was that not in the .vapi-ignore?'), cardinal rule against
silent edits, anti-pattern of editing .vapi-ignore to suppress
unexpected drift instead of resolving the cause.
- assistants.md: 'Choosing the right pronunciation layer' — symptom
-> layer decision tree (word misheard = transcriber, word
mispronounced = TTS), with diagnostic question and forward/back
cross-links from Transcriber Configuration and Pronunciation
dictionaries (TTS-level) sections.
- assistants.md: 'Assistant top-level name is limited to 1-40
characters' — separate enforcement site from structuredOutput.name,
not surfaced in the public schema reference.
- assistants.md: 'PATCH /assistant/:id semantics: shallow replacement
at the top-level field' — wholesale replacement of object/array
subtrees; safe-append pattern is GET -> mutate -> PATCH; explicit
contrast with assistantOverrides which deep-merges per multilingual.md.
- voice-providers.md: 'Pronunciation dictionary support: per-provider
field shapes' — Cartesia (pronunciationDictId, sonic-3 only),
ElevenLabs (pronunciationDictionaryLocators, dictionaryName upstream
field NOT name), Vapi voices (schema-level support; dashboard UI in
active PRISM-474 rollout; runtime needs call-test verification).
Public-docs out-of-date callout.
- voice-providers.md: 'ElevenLabs phoneme rule model compatibility' —
alias rules universal; phoneme rules silently no-op'd on the default
eleven_turbo_v2_5 and other current models. Customer impact: zero
benefit, zero signal. Workarounds: alias-only authoring or pin to
eleven_flash_v2.
- AGENTS.md + CLAUDE.md: add yaml-conventions.md to the Learnings &
recipes routing table (was missing, making any yaml-conventions.md
content invisible to agents).
Cross-checks performed:
- multilingual.md:148-160 deep-merge claim verified to be about
assistantOverrides, NOT PATCH. No wiki contradiction.
- Customer-name + UUID scrubs clean against all additions.
- Engine state re-verified: candidates already superseded by recent
main commits were dropped (vapi sync scoping, covered by the
org-scoping/dry-run/drift commits; non-transactional pushes, covered by
snapshot-on-push #21 + validate #17). Active platform bug PRISM-641
dropped per user instruction.
Skipping code-reviewer per docs-only carve-out in the always-apply
rule. All cross-references manually verified; anchor slugs match
GitHub heading-to-slug conventions.

docs/learnings/assistants.md: 59 additions & 1 deletion
@@ -127,8 +127,27 @@ voice:

---

## Choosing the right pronunciation layer

Pronunciation problems live in two unrelated layers — picking the wrong one wastes a debugging cycle. Reproduce the failure first, then map symptom to layer.

| Symptom | Fix on | How |
|---|---|---|
| Word **misheard** by the agent (e.g. STT decodes "VAT" as "that") | Transcriber (input side) | `customVocabulary` (Soniox), `keyterm` (Deepgram). See [Transcriber Configuration](#transcriber-configuration) for syntax. |
| Word **mispronounced** by the agent (e.g. TTS reads "VAT" as "vee-ay-tee") | Voice / TTS (output side) | `pronunciationDictId` (Cartesia), `pronunciationDictionaryLocators` (ElevenLabs). See [Pronunciation dictionaries (TTS-level)](#pronunciation-dictionaries-tts-level) for the per-provider config. |

**Diagnostic question:** Did the transcript record what the user actually said?
- **No** — the STT got it wrong. Fix on the transcriber.
- **Yes, but the agent then said it wrong** — the TTS is mispronouncing. Fix on the voice.

Don't try both layers at once. They shape independent halves of the call and the wrong layer adds config noise without addressing the failure. For per-provider voice-side field shapes (Cartesia vs ElevenLabs vs Vapi), see [voice-providers.md → Pronunciation dictionary support](voice-providers.md#pronunciation-dictionary-support-per-provider-field-shapes).
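The decision tree above is small enough to encode directly, for example in a triage script or a config lint. This is a hypothetical sketch: the function names and the provider keys (`soniox`, `deepgram`, `cartesia`, `elevenlabs`) are illustrative labels, not Vapi identifiers; only the field names come from the table.

```python
from typing import Optional

# Field names from the table; the provider keys are illustrative labels only.
TRANSCRIBER_FIELDS = {"soniox": "customVocabulary", "deepgram": "keyterm"}
VOICE_FIELDS = {"cartesia": "pronunciationDictId", "elevenlabs": "pronunciationDictionaryLocators"}


def pronunciation_layer(transcript_matches_speech: bool, agent_said_it_wrong: bool) -> Optional[str]:
    """Answer the diagnostic question and return the one layer to change."""
    if not transcript_matches_speech:
        return "transcriber"  # STT decoded the word wrong: fix the input side
    if agent_said_it_wrong:
        return "voice"  # transcript is right but the TTS said it wrong: fix the output side
    return None  # neither symptom reproduced; change nothing yet


def field_to_edit(layer: str, provider: str) -> str:
    """Look up the config field for the chosen layer and provider (KeyError if unknown)."""
    fields = {"transcriber": TRANSCRIBER_FIELDS, "voice": VOICE_FIELDS}[layer]
    return fields[provider]
```

Returning `None` for the third case is deliberate: if neither the transcript nor the spoken output is wrong, editing either layer only adds config noise.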

---

## Transcriber Configuration

> **If a word is being misheard by the agent**, this is the right layer to fix it (input side). If a word is being mispronounced by the agent, fix the voice/TTS layer instead — see [Choosing the right pronunciation layer](#choosing-the-right-pronunciation-layer).

### Provider recommendations by language

| Language | Recommended Provider |
@@ -282,12 +301,14 @@ startSpeakingPlan:

### Pronunciation dictionaries (TTS-level)

> **If a word is being mispronounced by the agent**, this is the right layer to fix it (output side). If a word is being misheard, fix the transcriber instead — see [Choosing the right pronunciation layer](#choosing-the-right-pronunciation-layer). For per-provider voice-side field shapes, see [voice-providers.md → Pronunciation dictionary support](voice-providers.md#pronunciation-dictionary-support-per-provider-field-shapes).

Pronunciation dictionaries control how TTS voices say specific words. They are **provider-specific**:

| Provider | Support | Config field | Model requirement |
|---|---|---|---|
| **Cartesia** | Full IPA + sounds-like across all languages | `pronunciationDictId` on voice config | `sonic-3` only |
-| **ElevenLabs** | Phoneme rules (IPA/CMU, English only) + alias rules (all languages) | `pronunciationDictionaryLocators` on voice config | Phoneme: `eleven_turbo_v2`, `eleven_flash_v2`. Alias: all models |
+| **ElevenLabs** | Phoneme rules (IPA/CMU, English only) + alias rules (all languages) | `pronunciationDictionaryLocators` on voice config | Alias: all models. Phoneme: model-dependent and silently no-op'd on most current models — see [voice-providers.md → ElevenLabs phoneme rule model compatibility](voice-providers.md#elevenlabs-phoneme-rule-model-compatibility). |
| **Vapi built-in** | None | N/A | N/A |

**Pronunciation dictionaries** are created via the Vapi API, then referenced by ID in the voice config. This is the same pattern as `credentialId` — the provider resource lives outside gitops, the reference is gitops-managed.
@@ -425,6 +446,19 @@ If a hook references a `toolId` that doesn't exist, Vapi logs a warning and cont

`customer.speech.timeout` (hook) and `silenceTimeoutSeconds` (assistant) are separate mechanisms. The hook fires an action; the timeout ends the call. Configure them independently.

### Assistant top-level `name` is limited to 1-40 characters

The Vapi API enforces a hard 40-character maximum on the top-level `name` field of an assistant resource. Push-time error:

```
PATCH /assistant/<id> → 400
name must be shorter than or equal to 40 characters
```

This is **a separate field from `structuredOutput.name`** — both share the 40-char cap, but the enforcement sites are independent (see [structured-outputs.md](structured-outputs.md#structuredoutputname-is-limited-to-1-40-characters)). The constraint is not surfaced in the public schema reference; it's only enforced server-side at PATCH/POST time.

**Recommendation:** when generating descriptive assistant names from templates ("Triage Classifier — Multilingual Classic Variant" = 48 chars), trim before push or use shorter abbreviations. Put descriptive nuance in a YAML comment or in the system prompt body, not in the `name` field.

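A pre-push check turns the server's 400 into a local error. This is a sketch that assumes the server counts length the way Python's `len()` does (Unicode code points); for characters outside the Basic Multilingual Plane the server may count differently, so treat the cap as approximate there. `check_assistant_name` and `trim_assistant_name` are hypothetical helpers, not part of the gitops engine.

```python
MAX_ASSISTANT_NAME = 40  # server-side cap on the top-level assistant `name`


def check_assistant_name(name: str) -> str:
    """Raise locally instead of waiting for the 400 from PATCH/POST."""
    if not 1 <= len(name) <= MAX_ASSISTANT_NAME:
        raise ValueError(
            f"assistant name is {len(name)} chars; must be 1-{MAX_ASSISTANT_NAME}: {name!r}"
        )
    return name


def trim_assistant_name(name: str) -> str:
    """Hard-truncate to the cap and drop any trailing whitespace the cut leaves behind."""
    return name[:MAX_ASSISTANT_NAME].rstrip()
```

Truncating the 48-character example keeps its first 40 characters, ending at "Classic". Blind truncation is only a fallback; a hand-picked abbreviation usually reads better.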
### `silenceTimeoutSeconds` minimum is 10

The Vapi API enforces a hard minimum of **10 seconds** on `silenceTimeoutSeconds`. Setting this field to anything less than 10 (e.g., `5` or `8`) will fail at push time with:
@@ -448,6 +482,30 @@ The minimum is not documented in the gitops engine README and is only surfaced w

---

## PATCH /assistant/:id semantics: shallow replacement at the top-level field

`PATCH /assistant/:id` performs a partial update at the **top level only**: fields not in the request body stay untouched. But within each field you DO send, replacement is **wholesale, NOT deep-merged**. `PATCH { hooks: [oneNewHook] }` leaves the assistant with exactly one hook even if it had three before.

The same shallow-replace rule applies to `model.messages`, `analysisPlan`, `voice`, `transcriber`, `messagePlan`, `serverMessages`, and any other object or array field. Whatever subtree you send overwrites the entire subtree on the resource.
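To reason about what a PATCH body will do before sending it, the rule can be modeled locally. This is a hypothetical model of the behavior described above, not Vapi code; the hook and voice values are placeholders.

```python
import copy
from typing import Any, Dict


def shallow_patch(stored: Dict[str, Any], body: Dict[str, Any]) -> Dict[str, Any]:
    """Model of PATCH /assistant/:id: each top-level key in `body` replaces the stored value."""
    updated = copy.deepcopy(stored)
    for key, value in body.items():
        updated[key] = copy.deepcopy(value)  # no recursion: nested objects/arrays are overwritten
    return updated  # keys absent from `body` keep their stored values


assistant = {
    "name": "Support",
    "hooks": [{"id": "hook-a"}, {"id": "hook-b"}, {"id": "hook-c"}],  # placeholder hooks
    "voice": {"provider": "cartesia", "voiceId": "voice-1"},  # placeholder voice
}

after_hooks = shallow_patch(assistant, {"hooks": [{"id": "hook-d"}]})
print(after_hooks["hooks"])  # [{'id': 'hook-d'}]: three hooks became one
after_voice = shallow_patch(assistant, {"voice": {"voiceId": "voice-2"}})
print(after_voice["voice"])  # {'voiceId': 'voice-2'}: "provider" is gone, not merged
```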

**Safe-append pattern** — GET → mutate the returned array/object → PATCH the full structure back:

```yaml
# 1. GET /assistant/:id, capture existing.hooks
# 2. Append your new hook locally
# 3. PATCH with the full hooks array (existing + new)
hooks:
- { ...existing hook 1 }
- { ...existing hook 2 }
- { ...new hook you wanted to add }
```
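The three steps can be scripted with the standard library. This is a sketch under stated assumptions: `https://api.vapi.ai` as the REST base URL and Bearer-token auth with a private API key are assumptions to check against the API reference, and `append_hook` / `build_hooks_patch` are hypothetical helper names. Only `build_hooks_patch` is pure, so it is the part worth unit-testing.

```python
import json
import urllib.request
from typing import Any, Dict, List, Optional

API_BASE = "https://api.vapi.ai"  # assumption: Vapi REST base URL


def build_hooks_patch(current: Dict[str, Any], new_hook: Dict[str, Any]) -> Dict[str, List[Dict[str, Any]]]:
    """Step 2: append locally and return the FULL hooks array as the PATCH body."""
    existing = list(current.get("hooks") or [])  # tolerate a missing or null field
    return {"hooks": existing + [new_hook]}


def _request(method: str, url: str, api_key: str, body: Optional[Dict[str, Any]] = None) -> Dict[str, Any]:
    data = json.dumps(body).encode("utf-8") if body is not None else None
    req = urllib.request.Request(url, data=data, method=method)
    req.add_header("Authorization", f"Bearer {api_key}")
    req.add_header("Content-Type", "application/json")
    with urllib.request.urlopen(req) as resp:
        return json.loads(resp.read().decode("utf-8"))


def append_hook(assistant_id: str, new_hook: Dict[str, Any], api_key: str) -> Dict[str, Any]:
    url = f"{API_BASE}/assistant/{assistant_id}"
    current = _request("GET", url, api_key)  # step 1: read the stored assistant
    body = build_hooks_patch(current, new_hook)  # step 2: mutate locally
    return _request("PATCH", url, api_key, body)  # step 3: send the whole array back
```

This is read-modify-write, so an edit someone else makes between the GET and the PATCH is still overwritten. Keep that window short.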

**Important distinction:** this is the REST API PATCH semantic. It is **different** from `assistantOverrides` in squad configs, which **deep-merges** partial nested objects per [multilingual.md → What Can Be Overridden](multilingual.md#what-can-be-overridden). When working through `assistantOverrides`, partial subtrees compose with the base assistant's config; when working through PATCH, partial subtrees replace.
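For contrast, the deep-merge behavior the linked section describes for `assistantOverrides` looks roughly like this. Assumption: nested objects merge key by key, and any non-object value in the override (including arrays) replaces the base value; how `assistantOverrides` treats arrays is not stated here, so verify before relying on it. `deep_merge` is a hypothetical name.

```python
import copy
from typing import Any, Dict


def deep_merge(base: Dict[str, Any], override: Dict[str, Any]) -> Dict[str, Any]:
    """Recursive merge: dicts combine key by key; any other value in `override` wins."""
    merged = copy.deepcopy(base)
    for key, value in override.items():
        if isinstance(value, dict) and isinstance(merged.get(key), dict):
            merged[key] = deep_merge(merged[key], value)  # compose nested objects
        else:
            merged[key] = copy.deepcopy(value)  # scalars and (by assumption) arrays replace
    return merged


base = {"name": "Base", "voice": {"provider": "cartesia", "voiceId": "voice-1"}}
merged = deep_merge(base, {"voice": {"voiceId": "voice-2"}})
print(merged["voice"])  # {'provider': 'cartesia', 'voiceId': 'voice-2'}: provider survives
```

The same partial `voice` body that drops `provider` under PATCH keeps it here, which is the whole distinction in one example.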

See also: [fallbacks.md](fallbacks.md#phone-number-fallback-hook) for the same gotcha applied to phone-number hooks.

docs/learnings/voice-providers.md: 72 additions & 0 deletions
@@ -95,3 +95,75 @@ If a customer changes the provider on the dashboard and your local YAML still ha
## Adding a new provider

If you find yourself reaching for a provider not in the table above, append a row here in the same PR. The cheat-sheet only stays useful if it grows with the platform.

---

## Pronunciation dictionary support: per-provider field shapes

Pronunciation dictionaries do not share a field shape across voice providers. Same conceptual feature, three different surfaces.

> **Public-docs note:** As of 2026-05-08 the public Vapi docs state pronunciation dictionaries are "exclusive to ElevenLabs voices." This is out of date — Cartesia has been confirmed in production deployments and Vapi-voice schema-level support is in active rollout (PRISM-474). Treat this wiki as the more current source.

### Cartesia

- **Field**: `voice.pronunciationDictId` — single string ID on the voice config.
- **Model requirement**: `model: sonic-3` only. Other Cartesia models silently ignore the field.
- **Upstream resource shape**: the Cartesia dictionary resource exposes a `name` field.
- **Full config example**: see [assistants.md → Pronunciation dictionaries (TTS-level)](assistants.md#pronunciation-dictionaries-tts-level).

### ElevenLabs

- **Field**: `voice.pronunciationDictionaryLocators` — array of `{ pronunciationDictionaryId, versionId? }`.
- **Model requirement**: alias rules work on all ElevenLabs models. **Phoneme rules are silently no-op'd** on `eleven_turbo_v2_5` (Vapi's default), `eleven_flash_v2_5`, `eleven_multilingual_v2`, and `eleven_v3`. See [ElevenLabs phoneme rule model compatibility](#elevenlabs-phoneme-rule-model-compatibility) below for the full breakdown.
- **Upstream resource shape**: the ElevenLabs dictionary resource exposes a `dictionaryName` field — **NOT `name`**. This trips up wrappers that fetch dictionaries via API and surface them in tools that also handle Cartesia.

### Vapi voices

- **Schema-level**: accepts pronunciation dictionary configs at the API.
- **Dashboard UI surface**: in active rollout (PRISM-474, Q2 2026). Schema acceptance does **not** guarantee that the runtime TTS engine honors the dictionary.
- **Recommendation**: verify runtime behavior with a call test before depending on it for production Vapi-voice deployments.
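Putting the three subsections together, a helper that attaches a dictionary ID in each provider's own field shape might look like this. Assumptions: the voice `provider` values `cartesia` and `11labs` as used in Vapi voice configs, and the helper name, which is hypothetical. Vapi voices are rejected on purpose because runtime support is unverified, and a Cartesia voice without `model: sonic-3` is rejected rather than silently ignored.

```python
from typing import Any, Dict, Optional


def with_pronunciation_dict(
    voice: Dict[str, Any], dict_id: str, version_id: Optional[str] = None
) -> Dict[str, Any]:
    """Return a copy of `voice` with the dictionary attached in the provider's field shape."""
    updated = dict(voice)  # shallow copy; the caller's dict is left alone
    provider = voice.get("provider")
    if provider == "cartesia":
        # Other Cartesia models silently ignore the field, so fail loudly instead.
        if voice.get("model") != "sonic-3":
            raise ValueError("pronunciationDictId is only honored with model sonic-3")
        updated["pronunciationDictId"] = dict_id  # single string ID
    elif provider == "11labs":
        locator: Dict[str, str] = {"pronunciationDictionaryId": dict_id}
        if version_id is not None:
            locator["versionId"] = version_id  # optional
        existing = list(voice.get("pronunciationDictionaryLocators") or [])
        updated["pronunciationDictionaryLocators"] = existing + [locator]  # array of locators
    else:
        # Vapi voices and others: schema acceptance is not proof of runtime support.
        raise ValueError(f"no verified pronunciation-dictionary field for provider {provider!r}")
    return updated
```

Because the ElevenLabs field is an array, the helper appends to any existing locators rather than replacing them; that matters when the result is sent through PATCH, which replaces the whole `voice` object.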

### Field shape gotcha

The three provider families do NOT use the same field name on the upstream pronunciation-dictionary resource:

| Provider | Upstream display-name field |
|---|---|
| Cartesia | `name` |
| ElevenLabs | `dictionaryName` |
| Vapi voices | shape pending finalization |

If you're authoring a wrapper or migration tool that handles all three, handle the divergence explicitly. A single `name`-only path will silently render ElevenLabs dictionaries with empty labels.
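One way to handle the divergence explicitly is to read the display name per provider and raise when it is missing, rather than rendering an empty label. The provider labels and the function name are hypothetical; the field names come from the table above.

```python
from typing import Any, Dict


def dictionary_display_name(provider: str, resource: Dict[str, Any]) -> str:
    """Read an upstream dictionary's display name using that provider's own field."""
    if provider == "cartesia":
        name = resource.get("name")
    elif provider == "elevenlabs":
        name = resource.get("dictionaryName")  # NOT "name"
    else:
        # Vapi-voice shape is pending finalization: accept either field for now.
        name = resource.get("name") or resource.get("dictionaryName")
    if not name:
        # Fail loudly instead of rendering an empty label.
        raise KeyError(f"no display name for {provider!r} dictionary; keys: {sorted(resource)}")
    return name
```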
137
+
138
+
### ElevenLabs phoneme rule model compatibility
139
+
140
+
ElevenLabs splits pronunciation rules into two types:
141
+
142
+
- **Alias rules** — word substitution ("MyBrand" → "my-brand"). **Work universally** on all ElevenLabs models.
- `eleven_turbo_v2_5`— Vapi's default ElevenLabs model
147
+
- `eleven_flash_v2_5`
148
+
- `eleven_multilingual_v2`
149
+
- `eleven_v3`
150
+
151
+
**Confirmed supported:**
152
+
- `eleven_flash_v2`
153
+
- Likely `eleven_monolingual_v1` (ElevenLabs docs disagree across pages on the exact set — verify before depending on it)
154
+
155
+
**Silent-skip behavior:** when a phoneme rule is sent to an unsupported model, ElevenLabs does NOT error. It bypasses the rule and uses standard pronunciation. **Customer impact:** attaching a phoneme-only dict to the default voice gets zero benefit with no signal — the call sounds exactly like the no-dict baseline.
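Because the failure is silent, a pre-push lint is one way to surface it. This sketch assumes each dictionary rule carries a type of `alias` or `phoneme` (an assumption about the rule shape) and uses only the confirmed-supported list from this section; `eleven_monolingual_v1` is left out because its support is only "likely". The function name is hypothetical.

```python
from typing import Iterable

# Confirmed-supported list from this section; "likely" entries are excluded on purpose.
PHONEME_SUPPORTED_MODELS = frozenset({"eleven_flash_v2"})


def phoneme_rules_ignored(model_id: str, rule_types: Iterable[str]) -> bool:
    """True when the dictionary has phoneme rules that this model would silently skip."""
    has_phoneme_rules = any(rule_type == "phoneme" for rule_type in rule_types)
    return has_phoneme_rules and model_id not in PHONEME_SUPPORTED_MODELS


if phoneme_rules_ignored("eleven_turbo_v2_5", ["alias", "phoneme"]):
    print("warning: phoneme rules will be skipped; use alias rules or pin eleven_flash_v2")
```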

**Workarounds:**
1. **Author dict as alias rules** — they work everywhere. Trade phoneme precision for portability.
2. **Pin to `eleven_flash_v2`** — explicit model lock if phoneme accuracy matters more than the latency profile of `eleven_turbo_v2_5` / `eleven_flash_v2_5`.

docs/learnings/yaml-conventions.md: 23 additions & 0 deletions
@@ -183,6 +183,29 @@ The blank line after `---` is conventional; the strict requirement is just that

---

## Working with `.vapi-ignore`

`.vapi-ignore` lives at `resources/<org>/.vapi-ignore` and excludes specific resources from pull and push so the dashboard stays the source of truth for them. See `AGENTS.md` (line 13) for the basic gitignore-style syntax.

The recovery flow when a sync surfaces "drift" you didn't expect, typically prompted by "was that not in the .vapi-ignore?":

1. **Inspect first**, don't edit. Diff the file against `main` to see whether the path was already ignored.
2. **If a dashboard-only asset is genuinely missing from `.vapi-ignore`**, add the pattern. Otherwise stop here; the asset belongs in yaml.
3. **Dry-run before applying** to confirm only the intended assets will change:
   ```bash
   npm run push -- <org> --dry-run
   ```
4. **Apply** once the dry-run is clean: `npm run push -- <org>`.
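Step 1 can be supplemented with a quick local check of whether a resource path is covered by the current patterns. The sketch that follows is a rough approximation of gitignore-style matching using Python's `fnmatch`, not the engine's own matcher: `*` also matches `/` here, and directory patterns and `**` are not handled. The function name, the patterns, and the resource paths are hypothetical, and paths are assumed to be relative to `resources/<org>/`.

```python
import fnmatch
from typing import Iterable


def is_ignored(path: str, patterns: Iterable[str]) -> bool:
    """Approximate gitignore-style check: the last matching pattern wins, `!` re-includes."""
    ignored = False
    for raw in patterns:
        pattern = raw.strip()
        if not pattern or pattern.startswith("#"):
            continue  # skip blank lines and comments
        negate = pattern.startswith("!")
        if negate:
            pattern = pattern[1:]
        # Patterns containing "/" are matched against the whole path; others against the basename.
        target = path if "/" in pattern else path.rsplit("/", 1)[-1]
        if fnmatch.fnmatchcase(target, pattern.lstrip("/")):
            ignored = not negate
    return ignored


patterns = [
    "# dashboard-owned resources",
    "assistants/legacy-*.yml",
    "*.draft.yml",
    "!keep.draft.yml",
]
print(is_ignored("assistants/legacy-intake.yml", patterns))
```

The check is read-only by design, in keeping with the rules that follow: it tells you whether a path is already covered without touching `.vapi-ignore`.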

**Cardinal rule:** don't edit `.vapi-ignore` without explicit user direction. The file encodes intentional dashboard-vs-yaml ownership splits that the user knows about, or that an earlier customer-engagement decision put in place. Removing a pattern silently reclaims an asset for gitops control, which can blow away dashboard-only edits on the next push.

**Anti-pattern:** editing `.vapi-ignore` because a sync surfaced an unexpected diff is *removing the protection*, not fixing the cause. The cause is usually upstream: the asset was edited in both places, or a new asset that should be dashboard-owned was created via gitops. Resolve at the source, then leave `.vapi-ignore` alone.