Commit f72b89e

docs(learnings): document Soniox customVocabulary and multilingual support
Soniox stt-rt-v4 supports customVocabulary: [...] (the equivalent of Deepgram keyterm) and languages: [en, es, ...] for code-switching on a single universal model. Add Soniox row to the multilingual transcriber comparison, add a sample config, and call it out as the strongest pick when code-switching detection AND vocabulary boosting are both needed.
1 parent d908e9f commit f72b89e

2 files changed

Lines changed: 35 additions & 2 deletions

docs/learnings/assistants.md

Lines changed: 16 additions & 0 deletions
@@ -167,6 +167,22 @@ transcriber:
     - technical-acronym
 ```
 
+**Soniox supports the same idea as `customVocabulary`.** Soniox `stt-rt-v4` (a single universal model that handles all 60+ languages) accepts `customVocabulary: [...]` — an array of strings that biases recognition toward domain-specific terms. This is the Soniox equivalent of Deepgram `keyterm`, and unlike Deepgram nova-3, it works in multilingual mode without the English-bias caveat documented in [multilingual.md](multilingual.md). Pair with `languages: [en, es]` for code-switching plus vocabulary boost in the same call.
+
+```yaml
+transcriber:
+  provider: soniox
+  model: stt-rt-v4
+  language: en
+  languages: [en, es] # optional; omit for single-language
+  customVocabulary:
+    - your-brand-name
+    - industry-specific-term
+    - product-name
+    - tarjeta de combustible # non-English equivalents are fine
+  confidenceThreshold: 0.3
+```
+
 ### Deepgram Flux: end-of-turn detection knobs
 
 Vapi exposes all four of Deepgram Flux's end-of-turn detection parameters on the `transcriber` schema. They only apply when `model` starts with `flux-`.
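
The inline comment in the added sample notes that `languages` is optional. A minimal single-language sketch, assuming the remaining fields behave as they do in the committed sample (field names are copied from the diff and not verified against the transcriber schema; the vocabulary entries are placeholders):

```yaml
transcriber:
  provider: soniox
  model: stt-rt-v4
  language: en            # single language, so no `languages` hint set
  customVocabulary:       # placeholder terms; replace with real domain vocabulary
    - your-brand-name
    - product-name
  confidenceThreshold: 0.3  # value copied from the committed sample
```

Dropping `languages` leaves the vocabulary boost in place, which is the case the diff describes as "omit for single-language".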

docs/learnings/multilingual.md

Lines changed: 19 additions & 2 deletions
@@ -25,8 +25,9 @@ Use one assistant with a multilingual transcriber and voice.
 
 | Provider | Config | How it works |
 |----------|--------|--------------|
-| **Deepgram** (recommended) | `language: "multi"` on nova-3 | Auto-detects per-utterance across supported languages |
+| **Deepgram** (recommended) | `language: "multi"` on nova-3 | Auto-detects per-utterance across supported languages. Supports `keyterm` for vocabulary boosting (since Nov 2025). |
 | **Gladia** | `languageBehaviour: "automatic multiple languages"` | V2 solaria model with native code-switching |
+| **Soniox** | `model: stt-rt-v4`, `languages: [en, es]` | Single universal model handles 60+ languages — no per-language model swap. Supports `customVocabulary: [...]` for vocabulary boosting and `maxEndpointDelayMs` for turn-taking tuning. |
 | **Speechmatics** | `language: "en_es"` | Bilingual mode for Spanish+English |
 | **AssemblyAI** | `language: "multi"` | Universal streaming multilingual model |
 
@@ -263,10 +264,26 @@ voice: { provider: eleven-labs, voiceId: your-spanish-voice }
 
 This is most visible when a Spanish-only customer is misrecognized as English on their first utterance, which then cascades — the assistant responds in English, the customer gets confused, and the loop continues.
 
-**Recommendation for code-switching customers:** Use **Gladia Solaria** (`provider: gladia`, `languageBehaviour: automatic multiple languages`) instead of Deepgram `language: multi`. Solaria is built around code-switching as a first-class case and isn't biased by `keyterm` content the same way. See [Approach 1](#approach-1-single-static-agent) for the full transcriber comparison.
+**Recommendation for code-switching customers:** Use **Gladia Solaria** (`provider: gladia`, `languageBehaviour: automatic multiple languages`) or **Soniox** (`provider: soniox`, `model: stt-rt-v4`, `languages: [en, es]`) instead of Deepgram `language: multi`. Both are built around code-switching as a first-class case and aren't biased by vocabulary-boost content the same way. Soniox is the strongest pick when you need code-switching detection AND vocabulary boosting in the same call — it supports `customVocabulary: [...]` (the equivalent of Deepgram's `keyterm`) on the same single universal model that handles all 60+ languages. See [Approach 1](#approach-1-single-static-agent) for the full transcriber comparison.
 
 **If you must stay on Deepgram multi:** Keep `keyterm` short (under 20 entries), include the customer's expected non-English equivalents, and avoid English-only acronyms that have no foreign-language form.
 
+### Soniox sample config (multilingual + vocabulary boost)
+
+```yaml
+transcriber:
+  provider: soniox
+  model: stt-rt-v4
+  language: en # primary / default language (ISO 639-1)
+  languages: [en, es] # hint set for code-switching; omit for single-language
+  customVocabulary:
+    - your-brand-name
+    - domain-specific-term
+    - non-English-equivalent
+  confidenceThreshold: 0.3
+  # maxEndpointDelayMs: 800 # optional turn-taking tuning
+```
+
 ---
 
 ## Further Reading
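
The hunk above names two alternatives to the Soniox sample without showing them as config: Gladia Solaria, and a trimmed-down Deepgram `multi` setup. A rough sketch of both, assuming they follow the same `transcriber` layout as the Soniox block. `provider: deepgram` and the list form of `keyterm` are assumptions inferred from the surrounding docs, not confirmed by this commit, and the vocabulary entries are placeholders:

```yaml
# Option A: Gladia Solaria (values quoted in the recommendation paragraph)
transcriber:
  provider: gladia
  languageBehaviour: automatic multiple languages
---
# Option B: staying on Deepgram multi (assumed layout)
transcriber:
  provider: deepgram        # assumed provider key, by analogy with gladia/soniox
  model: nova-3
  language: multi
  keyterm:                  # keep this list short (under 20 entries)
    - your-brand-name
    - tarjeta de combustible  # expected non-English equivalent
    # avoid English-only acronyms with no foreign-language form
```

The two documents are separated by `---` only to keep them in one block; an assistant would use one or the other.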
