
Commit d6679c9

docs(learnings): add .vapi-ignore lifecycle, pronunciation decision tree, name 40-char cap, PATCH semantics, ElevenLabs phoneme model compatibility
Six wiki additions, plus an AGENTS.md and CLAUDE.md routing fix to surface them:

- yaml-conventions.md: 'Working with .vapi-ignore': recovery flow ('was that not in the .vapi-ignore?'), cardinal rule against silent edits, and the anti-pattern of editing .vapi-ignore to suppress unexpected drift instead of resolving the cause.
- assistants.md: 'Choosing the right pronunciation layer': symptom -> layer decision tree (word misheard = transcriber, word mispronounced = TTS), with a diagnostic question and forward/back cross-links from the Transcriber Configuration and Pronunciation dictionaries (TTS-level) sections.
- assistants.md: 'Assistant top-level name is limited to 1-40 characters': separate enforcement site from structuredOutput.name, not surfaced in the public schema reference.
- assistants.md: 'PATCH /assistant/:id semantics: shallow replacement at the top-level field': wholesale replacement of object/array subtrees; the safe-append pattern is GET -> mutate -> PATCH; explicit contrast with assistantOverrides, which deep-merges per multilingual.md.
- voice-providers.md: 'Pronunciation dictionary support: per-provider field shapes': Cartesia (pronunciationDictId, sonic-3 only), ElevenLabs (pronunciationDictionaryLocators; the upstream field is dictionaryName, NOT name), Vapi voices (schema-level support; dashboard UI in active PRISM-474 rollout; runtime needs call-test verification). Includes a callout that the public docs are out of date.
- voice-providers.md: 'ElevenLabs phoneme rule model compatibility': alias rules are universal; phoneme rules are silently no-op'd on the default eleven_turbo_v2_5 and other current models. Customer impact: zero benefit, zero signal. Workarounds: alias-only authoring or pinning to eleven_flash_v2.
- AGENTS.md + CLAUDE.md: add yaml-conventions.md to the Learnings & recipes routing table (it was missing, which made any yaml-conventions.md content invisible to agents).

Cross-checks performed:

- multilingual.md:148-160 deep-merge claim verified to be about assistantOverrides, NOT PATCH. No wiki contradiction.
- Customer-name and UUID scrubs clean against all additions.
- Engine state re-verified: candidates superseded by recent main commits (vapi sync scoping by org-scoping/dry-run/drift; non-transactional pushes by snapshot-on-push #21 + validate #17) were dropped. Active platform bug PRISM-641 dropped per user instruction.

Skipping code-reviewer per the docs-only carve-out in the always-apply rule. All cross-references manually verified; anchor slugs match GitHub heading-to-slug conventions.
1 parent 79b5dc0 commit d6679c9

5 files changed

Lines changed: 156 additions & 1 deletion


AGENTS.md

Lines changed: 1 addition & 0 deletions
@@ -32,6 +32,7 @@ This project manages **Vapi voice agent configurations** as code. All resources
| Voicemail detection / VM vs human classification | `docs/learnings/voicemail-detection.md` |
| Enforcing call time limits / graceful call ending | `docs/learnings/call-duration.md` |
| Voice provider field cheat-sheet (Cartesia vs 11labs vs OpenAI etc.) | `docs/learnings/voice-providers.md` |
| YAML authoring conventions, .vapi-ignore lifecycle | `docs/learnings/yaml-conventions.md` |

---
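The routing fix exists because a learnings page can silently drop out of the routing tables. Below is a minimal sketch of a check that would catch this. It assumes the `docs/learnings/*.md` layout shown in this commit. `missing_routes` is a hypothetical helper, not part of the repo's tooling.

```bash
#!/usr/bin/env bash
# Hypothetical lint: report every docs/learnings/*.md page whose full path
# never appears in a routing file (AGENTS.md or CLAUDE.md).
missing_routes() {
  local learnings_dir=$1 routing_file=$2 page rc=0
  for page in "$learnings_dir"/*.md; do
    [[ -e $page ]] || continue            # glob matched nothing
    local route="docs/learnings/${page##*/}"
    if ! grep -qF -- "$route" "$routing_file"; then
      echo "$routing_file: no route to $route"
      rc=1
    fi
  done
  return "$rc"
}
```

Run it once per routing file, e.g. `missing_routes docs/learnings AGENTS.md; missing_routes docs/learnings CLAUDE.md`. It only checks that the path appears somewhere in the file, not that it sits in the right table.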

CLAUDE.md

Lines changed: 1 addition & 0 deletions
@@ -25,6 +25,7 @@ When both files exist, follow both. If guidance overlaps, treat `AGENTS.md` as t
- Multilingual agents → `docs/learnings/multilingual.md`
- WebSocket transport → `docs/learnings/websocket.md`
- Call time limits / graceful ending → `docs/learnings/call-duration.md`
- YAML authoring conventions, .vapi-ignore lifecycle → `docs/learnings/yaml-conventions.md`

## Improvements log

docs/learnings/assistants.md

Lines changed: 59 additions & 1 deletion
@@ -127,8 +127,27 @@ voice:

---

## Choosing the right pronunciation layer

Pronunciation problems live in two unrelated layers — picking the wrong one wastes a debugging cycle. Reproduce the failure first, then map symptom to layer.

| Symptom | Fix on | How |
|---|---|---|
| Word **misheard** by the agent (e.g. STT decodes "VAT" as "that") | Transcriber (input side) | `customVocabulary` (Soniox), `keyterm` (Deepgram). See [Transcriber Configuration](#transcriber-configuration) for syntax. |
| Word **mispronounced** by the agent (e.g. TTS reads "VAT" as "vee-ay-tee") | Voice / TTS (output side) | `pronunciationDictId` (Cartesia), `pronunciationDictionaryLocators` (ElevenLabs). See [Pronunciation dictionaries (TTS-level)](#pronunciation-dictionaries-tts-level) for the per-provider config. |

**Diagnostic question:** Did the transcript record what the user actually said?

- **No** — the STT got it wrong. Fix on the transcriber.
- **Yes, but the agent then said it wrong** — the TTS is mispronouncing. Fix on the voice.

Don't try both layers at once. They shape independent halves of the call and the wrong layer adds config noise without addressing the failure. For per-provider voice-side field shapes (Cartesia vs ElevenLabs vs Vapi), see [voice-providers.md → Pronunciation dictionary support](voice-providers.md#pronunciation-dictionary-support-per-provider-field-shapes).
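The diagnostic question can be sketched as a small shell helper. It assumes you have already reproduced a pronunciation failure for a known word and have the transcript text for that turn. `pronunciation_layer` is hypothetical, and a whole-word match is only a rough proxy for "the transcript recorded what the user said".

```bash
#!/usr/bin/env bash
# Hypothetical helper encoding the diagnostic question.
# $1 = the word the user actually said, $2 = the transcript text for that turn.
# Prints which layer to fix: "transcriber" (input) or "voice" (output).
pronunciation_layer() {
  local word=$1 transcript=$2
  # Case-insensitive, whole-word, fixed-string match against the transcript.
  if grep -qiwF -- "$word" <<<"$transcript"; then
    # The STT recorded the word correctly, so the failure is on the TTS side.
    echo "voice"
  else
    echo "transcriber"
  fi
}
```

For example, `pronunciation_layer VAT "I want to reclaim that"` points at the transcriber, because the word never reached the transcript.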

---

## Transcriber Configuration

> **If a word is being misheard by the agent**, this is the right layer to fix it (input side). If a word is being mispronounced by the agent, fix the voice/TTS layer instead — see [Choosing the right pronunciation layer](#choosing-the-right-pronunciation-layer).

### Provider recommendations by language

| Language | Recommended Provider |
@@ -282,12 +301,14 @@ startSpeakingPlan:

### Pronunciation dictionaries (TTS-level)

> **If a word is being mispronounced by the agent**, this is the right layer to fix it (output side). If a word is being misheard, fix the transcriber instead — see [Choosing the right pronunciation layer](#choosing-the-right-pronunciation-layer). For per-provider voice-side field shapes, see [voice-providers.md → Pronunciation dictionary support](voice-providers.md#pronunciation-dictionary-support-per-provider-field-shapes).

Pronunciation dictionaries control how TTS voices say specific words. They are **provider-specific**:

| Provider | Support | Config field | Model requirement |
|----------|---------|-------------|-------------------|
| **Cartesia** | Full IPA + sounds-like across all languages | `pronunciationDictId` on voice config | `sonic-3` only |
| **ElevenLabs** | Phoneme rules (IPA/CMU, English only) + alias rules (all languages) | `pronunciationDictionaryLocators` on voice config | Alias: all models. Phoneme: model-dependent and silently no-op'd on most current models — see [voice-providers.md → ElevenLabs phoneme rule model compatibility](voice-providers.md#elevenlabs-phoneme-rule-model-compatibility). |
| **Vapi built-in** | None | N/A | N/A |

**Pronunciation dictionaries** are created via the Vapi API, then referenced by ID in the voice config. This is the same pattern as `credentialId` — the provider resource lives outside gitops, the reference is gitops-managed.
@@ -425,6 +446,19 @@ If a hook references a `toolId` that doesn't exist, Vapi logs a warning and cont

`customer.speech.timeout` (hook) and `silenceTimeoutSeconds` (assistant) are separate mechanisms. The hook fires an action; the timeout ends the call. Configure them independently.

### Assistant top-level `name` is limited to 1-40 characters

The Vapi API enforces a hard 40-character maximum on the top-level `name` field of an assistant resource. Push-time error:

```
PATCH /assistant/<id> → 400
name must be shorter than or equal to 40 characters
```

This is **a separate field from `structuredOutput.name`**. Both share the 40-character cap, but the enforcement sites are independent (see [structured-outputs.md](structured-outputs.md#structuredoutputname-is-limited-to-1-40-characters)). The constraint is not surfaced in the public schema reference; it is enforced only server-side at PATCH/POST time.

**Recommendation:** when generating descriptive assistant names from templates ("Triage Classifier — Multilingual Classic Variant" is 48 characters), trim before pushing or use shorter abbreviations. Put descriptive nuance in a YAML comment or in the system prompt body, not in the `name` field.
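Below is a minimal sketch of a pre-push guard for the cap. `check_assistant_name` and `trim_assistant_name` are hypothetical helpers. The sketch assumes the server counts characters rather than bytes, which this page does not confirm for non-ASCII names.

```bash
#!/usr/bin/env bash
# Hypothetical pre-push check for the assistant top-level `name` cap.
# ${#name} counts characters in a UTF-8 locale but bytes in the C locale,
# so run it under a UTF-8 locale if names may contain non-ASCII text.
readonly MAX_ASSISTANT_NAME=40

check_assistant_name() {
  local name=$1
  local len=${#name}
  if (( len < 1 || len > MAX_ASSISTANT_NAME )); then
    echo "name is $len characters; must be 1-$MAX_ASSISTANT_NAME: $name" >&2
    return 1
  fi
}

# Last-resort truncation to the first 40 characters.
trim_assistant_name() {
  local name=$1
  printf '%s\n' "${name:0:MAX_ASSISTANT_NAME}"
}
```

Truncation is only a fallback. A hand-picked abbreviation usually keeps the name more meaningful in the dashboard.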

### `silenceTimeoutSeconds` minimum is 10

The Vapi API enforces a hard minimum of **10 seconds** on `silenceTimeoutSeconds`. Setting this field to anything less than 10 (e.g., `5` or `8`) will fail at push time with:
@@ -448,6 +482,30 @@ The minimum is not documented in the gitops engine README and is only surfaced w

---

## PATCH /assistant/:id semantics: shallow replacement at the top-level field

`PATCH /assistant/:id` is a partial update at the **top level only**: fields not in the request body stay untouched. But within each field you DO send, replacement is **wholesale, NOT deep-merged**. `PATCH { hooks: [oneNewHook] }` leaves the assistant with exactly one hook, even if it had three before.

The same shallow-replace rule applies to `model.messages`, `analysisPlan`, `voice`, `transcriber`, `messagePlan`, `serverMessages`, and any other object or array field. Whatever subtree you send overwrites the entire subtree on the resource.

**Safe-append pattern.** GET → mutate the returned array/object → PATCH the full structure back:

```yaml
# 1. GET /assistant/:id, capture existing.hooks
# 2. Append your new hook locally
# 3. PATCH with the full hooks array (existing + new)
hooks:
  - { ...existing hook 1 }
  - { ...existing hook 2 }
  - { ...new hook you wanted to add }
```
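Here are the same three steps as a shell sketch. It rests on these assumptions:

- `curl` and `jq` are installed.
- The API is reachable at `https://api.vapi.ai`, with a bearer token in a `VAPI_API_KEY` variable.
- Hook objects are opaque JSON.

The function names are hypothetical. Only `build_hooks_patch` touches the data, so it can be checked offline.

```bash
#!/usr/bin/env bash
# Sketch of the GET -> mutate -> PATCH safe-append pattern.
set -euo pipefail

# Pure step: read the current assistant JSON on stdin, append one hook, and
# emit a PATCH body carrying the FULL hooks array (existing + new).
build_hooks_patch() {
  local new_hook_json=$1
  jq -c --argjson hook "$new_hook_json" '{hooks: ((.hooks // []) + [$hook])}'
}

# Network wrapper (assumed endpoint and auth): GET, mutate, PATCH it back.
append_assistant_hook() {
  local assistant_id=$1 new_hook_json=$2
  local url="https://api.vapi.ai/assistant/$assistant_id"
  curl -sSf -H "Authorization: Bearer $VAPI_API_KEY" "$url" \
    | build_hooks_patch "$new_hook_json" \
    | curl -sSf -X PATCH \
        -H "Authorization: Bearer $VAPI_API_KEY" \
        -H "Content-Type: application/json" \
        --data-binary @- "$url"
}
```

This is a read-modify-write, not an atomic append. If someone else edits `hooks` between the GET and the PATCH, the array you send overwrites their change.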

**Important distinction:** this is the REST API PATCH semantic. It is **different** from `assistantOverrides` in squad configs, which **deep-merges** partial nested objects per [multilingual.md → What Can Be Overridden](multilingual.md#what-can-be-overridden). When working through `assistantOverrides`, partial subtrees compose with the base assistant's config; when working through PATCH, partial subtrees replace.

See also: [fallbacks.md](fallbacks.md#phone-number-fallback-hook) for the same gotcha applied to phone-number hooks.
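jq's `+` and `*` operators show the difference concretely. `+` is shallow: the right side wins per top-level key. `*` merges nested objects recursively. This only illustrates the two semantics described above; it is not Vapi's implementation. It uses object fields only, because jq's `*` replaces arrays instead of merging them, and this page doesn't say how `assistantOverrides` treats arrays. The `voiceId` value is a placeholder.

```bash
#!/usr/bin/env bash
# Illustration only: jq operators standing in for the two merge behaviours.
base='{"voice":{"provider":"11labs","voiceId":"abc","model":"eleven_turbo_v2_5"}}'
partial='{"voice":{"model":"eleven_flash_v2"}}'

# PATCH-style: the `voice` subtree in `partial` replaces the whole subtree.
patch_result=$(jq -cn --argjson a "$base" --argjson b "$partial" '$a + $b')

# assistantOverrides-style: nested objects are merged key by key.
override_result=$(jq -cn --argjson a "$base" --argjson b "$partial" '$a * $b')

echo "PATCH:     $patch_result"     # voice keeps only `model`
echo "overrides: $override_result"  # voice also keeps provider and voiceId
```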

---

## Idle Messages (messagePlan)

### Defaults

docs/learnings/voice-providers.md

Lines changed: 72 additions & 0 deletions
@@ -95,3 +95,75 @@ If a customer changes the provider on the dashboard and your local YAML still ha
## Adding a new provider

If you find yourself reaching for a provider not in the table above, append a row here in the same PR. The cheat-sheet only stays useful if it grows with the platform.

---

## Pronunciation dictionary support: per-provider field shapes

Pronunciation dictionaries do not share a field shape across voice providers. Same conceptual feature, three different surfaces.

> **Public-docs note:** As of 2026-05-08 the public Vapi docs state pronunciation dictionaries are "exclusive to ElevenLabs voices." This is out of date — Cartesia has been confirmed in production deployments and Vapi-voice schema-level support is in active rollout (PRISM-474). Treat this wiki as the more current source.

### Cartesia

- **Field**: `voice.pronunciationDictId` — single string ID on the voice config.
- **Model requirement**: `model: sonic-3` only. Other Cartesia models silently ignore the field.
- **Upstream resource shape**: the Cartesia dictionary resource exposes a `name` field.
- **Full config example**: see [assistants.md → Pronunciation dictionaries (TTS-level)](assistants.md#pronunciation-dictionaries-tts-level).

### ElevenLabs

- **Field**: `voice.pronunciationDictionaryLocators` — array of `{ pronunciationDictionaryId, versionId? }`.
- **Model requirement**: alias rules work on all ElevenLabs models. **Phoneme rules are silently no-op'd** on `eleven_turbo_v2_5` (Vapi's default), `eleven_flash_v2_5`, `eleven_multilingual_v2`, and `eleven_v3`. See [ElevenLabs phoneme rule model compatibility](#elevenlabs-phoneme-rule-model-compatibility) below for the full breakdown.
- **Upstream resource shape**: the ElevenLabs dictionary resource exposes a `dictionaryName` field — **NOT `name`**. This trips up wrappers that fetch dictionaries via API and surface them in tools that also handle Cartesia.

### Vapi voices

- **Schema-level**: accepts pronunciation dictionary configs at the API.
- **Dashboard UI surface**: in active rollout (PRISM-474, Q2 2026). Schema acceptance does **not** guarantee that the runtime TTS engine honors the dictionary.
- **Recommendation**: verify runtime behavior with a call test before depending on it for production Vapi-voice deployments.

### Field shape gotcha

The three provider families do NOT use the same field name on the upstream pronunciation-dictionary resource:

| Provider | Upstream display-name field |
|---|---|
| Cartesia | `name` |
| ElevenLabs | `dictionaryName` |
| Vapi voices | shape pending finalization |

If you're authoring a wrapper or migration tool that handles all three, handle the divergence explicitly. A single `name`-only path will silently render ElevenLabs dictionaries with empty labels.
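Here is a minimal sketch of that handling for a shell-based wrapper. `dict_display_name` is hypothetical. It reads one upstream dictionary resource as JSON on stdin and assumes only the field names in the table above. Because the Vapi-voice shape is not final, anything carrying neither field gets a visible placeholder instead of an empty label.

```bash
#!/usr/bin/env bash
# Hypothetical helper: print a dictionary's display name whichever provider
# shape it uses. Cartesia -> .name, ElevenLabs -> .dictionaryName. Requires jq.
dict_display_name() {
  # `//` falls through to the next expression when the left side is null.
  jq -r '.name // .dictionaryName // "<unnamed dictionary>"'
}
```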

### ElevenLabs phoneme rule model compatibility

ElevenLabs splits pronunciation rules into two types:

- **Alias rules** — word substitution ("MyBrand" → "my-brand"). **Work universally** on all ElevenLabs models.
- **Phoneme rules** — exact pronunciation via IPA / CMU Arpabet. **Model-dependent.**

**Confirmed unsupported (silent no-op):**

- `eleven_turbo_v2_5` — Vapi's default ElevenLabs model
- `eleven_flash_v2_5`
- `eleven_multilingual_v2`
- `eleven_v3`

**Confirmed supported:**

- `eleven_flash_v2`
- Likely `eleven_monolingual_v1` (ElevenLabs docs disagree across pages on the exact set — verify before depending on it)

**Silent-skip behavior:** when a phoneme rule is sent to an unsupported model, ElevenLabs does NOT error. It bypasses the rule and uses standard pronunciation. **Customer impact:** attaching a phoneme-only dict to the default voice gets zero benefit with no signal — the call sounds exactly like the no-dict baseline.
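Because the skip is silent, a pre-push lint is the only early signal. Below is a hypothetical sketch that encodes the model lists from this section. The lists will drift as ElevenLabs ships new models, so treat `unverified` as "confirm with a call test".

```bash
#!/usr/bin/env bash
# Hypothetical lint helper: classify an ElevenLabs model by whether phoneme
# rules take effect on it, using the lists in this section.
phoneme_rule_support() {
  case "$1" in
    eleven_flash_v2)
      echo "supported" ;;
    eleven_turbo_v2_5 | eleven_flash_v2_5 | eleven_multilingual_v2 | eleven_v3)
      echo "ignored" ;;     # phoneme rules silently no-op'd; alias rules still apply
    *)
      echo "unverified" ;;  # includes eleven_monolingual_v1; test before relying on it
  esac
}
```

A push script could, for example, warn whenever a voice config references a phoneme-rule dictionary and `phoneme_rule_support "$model"` prints anything other than `supported`.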

**Workarounds:**

1. **Author dict as alias rules** — they work everywhere. Trade phoneme precision for portability.
2. **Pin to `eleven_flash_v2`** — explicit model lock if phoneme accuracy matters more than the latency profile of `eleven_turbo_v2_5` / `eleven_flash_v2_5`.

```yaml
# Phoneme-rule-dependent — pin the model
voice:
  provider: 11labs
  model: eleven_flash_v2
  voiceId: <your-voice-id>
  pronunciationDictionaryLocators:
    - pronunciationDictionaryId: <your-dict-id>
```

docs/learnings/yaml-conventions.md

Lines changed: 23 additions & 0 deletions
@@ -183,6 +183,29 @@ The blank line after `---` is conventional; the strict requirement is just that

---

## Working with `.vapi-ignore`

`.vapi-ignore` lives at `resources/<org>/.vapi-ignore` and excludes specific resources from pull and push so the dashboard stays the source of truth for them. See `AGENTS.md` (line 13) for the basic gitignore-style syntax.

The recovery flow when a sync surfaces "drift" you didn't expect (typically prompted by "was that not in the .vapi-ignore?"):

1. **Inspect first**, don't edit. Diff the file against `main` to see whether a pattern covering the path was added or removed locally:
   ```bash
   git diff origin/main -- resources/<org>/.vapi-ignore
   ```
2. **If a dashboard-only asset is genuinely missing from `.vapi-ignore`**, add the pattern. Otherwise stop here: the asset belongs in yaml.
3. **Dry-run before applying** to confirm only the intended assets will change:
   ```bash
   npm run push -- <org> --dry-run
   ```
4. **Apply** once the dry-run is clean: `npm run push -- <org>`.
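For step 1, it can also help to check whether any pattern covers the drifting path. Here is a rough sketch using a hypothetical `vapi_ignore_covers` helper. It uses bash glob matching, where `*` also matches `/`. It does not model negation or the other gitignore rules. It also assumes paths are relative to `resources/<org>/`.

```bash
#!/usr/bin/env bash
# Hypothetical helper: read .vapi-ignore patterns on stdin and print the first
# pattern that covers the resource path in $1 (exit 1 if none does).
vapi_ignore_covers() {
  local path=$1 pattern
  while IFS= read -r pattern || [[ -n $pattern ]]; do
    pattern=${pattern%$'\r'}                  # tolerate CRLF line endings
    # Skip blank lines, comments, and negations (not modelled here).
    [[ -z $pattern || $pattern == '#'* || $pattern == '!'* ]] && continue
    pattern=${pattern#/}                      # treat a leading / as relative
    pattern=${pattern%/}                      # "dir/" behaves like "dir"
    # Unquoted right-hand side: matched as a glob, not as a literal string.
    if [[ $path == $pattern || $path == $pattern/* ]]; then
      echo "$pattern"
      return 0
    fi
  done
  return 1
}
```

To check the version on `main` rather than your working copy, pipe it in with `git show origin/main:resources/<org>/.vapi-ignore | vapi_ignore_covers <path>`.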

**Cardinal rule:** don't edit `.vapi-ignore` without explicit user direction. The file encodes intentional dashboard-vs-yaml ownership splits that the user (or an earlier customer-engagement decision) knows about. Removing a pattern silently reclaims an asset for gitops control, which can blow away dashboard-only edits on the next push.

**Anti-pattern:** editing `.vapi-ignore` because a sync surfaced an unexpected diff treats the symptom, not the cause. Adding a pattern hides the drift, and removing one strips the protection. The cause is usually upstream: the asset was edited in both places, or a new asset that should be dashboard-owned was created via gitops. Resolve it at the source, then leave `.vapi-ignore` alone.

---

## Cross-references

- `docs/learnings/assistants.md` — assistant-specific frontmatter authoring
