Update AssemblyAI skill: Sync STT API, optional speech_models, streaming context carryover + SpeakerRevision, LLM Gateway global routing by dlange-aai · Pull Request #11 · AssemblyAI/assemblyai-skill

dlange-aai · 2026-06-05T01:55:07Z

Automated docs-sync run (2026-06-04). Reconciles the skill against the latest assemblyai-docs api-reference/ spec for changes landed since the last skill update (2026-05-28).

What changed & why

New: Sync STT API

New synchronous endpoint POST https://sync.assemblyai.com/transcribe for clips ≤120s — one request/response, no polling, no upload step, no transcript ID (docs PRs #51, #57; specs/sync-api.yaml).
Documented: required X-AAI-Model: u3-sync-pro header, multipart audio + optional config body, word_boost keyterms (Sync uses word_boost, not keyterms_prompt), audio limits (80ms–120s, ≤40MB, 16-bit), response shape (start_ms/end_ms in ms), US/EU residency endpoints, 30s deadline, and error codes.
Added to SKILL.md (base-URL row + dedicated section) and references/api-reference.md (full §16).

Corrections

speech_models is now optional for pre-recorded — defaults to ["universal-3-pro","universal-2"] (docs PR #41 / openapi.yaml removed it from required). The skill previously stated it was required and would fail if omitted. Streaming's speech_model is still required per the getting-started spec, so the entry now distinguishes the two. (SKILL.md, api-reference.md)
whisper-rt marked legacy — removed from the public model picker and the streaming speech_model spec enums (docs PRs #61/#62), but still functional via speech_model=whisper-rt. (SKILL.md, streaming.md)

New streaming features

Context carryover (agent_context, u3-rt-pro, on by default) — connection-time query param and mid-stream UpdateConfiguration (docs PR #59). New section + connection-param rows + gotcha.
SpeakerRevision message — emitted right before Termination when speaker_labels is enabled; revises only changed turns (turn_order-matched), never text/timestamps; ~400ms close latency (docs PR #45).
continuous_partials connection param (default false via API, true in LiveKit plugin) + interruption_delay effective-timing detail and update_options fields (docs PR #44). (streaming.md, voice-agents.md)

LLM Gateway

Global routing — new model_region: "global" request field for lower-cost provider global endpoints (Claude now, Gemini 3 soon), plus the July 1, 2026 +10% in-region price note (docs PR #48).
Added GPT-5.5 (gpt-5.5) to the model table (now in docs overview.mdx).

Notes

Only changes clearly documented in the spec were applied; no speculation.
Streaming speech_model kept as required because the universal-streaming.mdx getting-started parameter table still lists it Required with no default, despite a warning being removed from the model-selection page.
Existing file structure/formatting preserved.

…ing context carryover + SpeakerRevision, LLM Gateway global routing Sync from assemblyai-docs api-reference spec (changes since 2026-05-28): - Add Sync STT API (POST https://sync.assemblyai.com/transcribe): new synchronous endpoint for clips ≤120s — no polling/upload/transcript ID. Documents X-AAI-Model: u3-sync-pro header, multipart audio/config body, word_boost keyterms, audio limits, response shape (start_ms/end_ms), data-residency endpoints, and error codes. Added to SKILL.md (base URLs + section) and references/api-reference.md (full section). - speech_models is now OPTIONAL for pre-recorded (defaults to ["universal-3-pro","universal-2"]); fixed the Common Mistakes entry that said it was required. Streaming speech_model is still required per the getting-started spec. (SKILL.md, api-reference.md) - whisper-rt marked legacy: removed from the public model picker and the streaming spec speech_model enums (June 2026), still functional via speech_model=whisper-rt. (SKILL.md, streaming.md) - Streaming context carryover: agent_context connection/UpdateConfiguration param (u3-rt-pro), on by default. New section + connection-param rows + gotcha. (SKILL.md, streaming.md) - Streaming diarization SpeakerRevision message: emitted before Termination when speaker_labels is enabled; revises only changed turns. (SKILL.md, streaming.md) - continuous_partials connection param documented; interruption_delay effective-timing detail; LiveKit defaults (continuous_partials=true, update_options fields). (streaming.md, voice-agents.md) - LLM Gateway: add model_region:"global" (lower-cost global routing; Claude now, Gemini 3 soon), July 1 2026 +10% in-region price note, and GPT-5.5 model. (SKILL.md, llm-gateway.md)

aikido-pr-checks · 2026-06-05T01:55:24Z

+
+- **Endpoint:** `POST https://sync.assemblyai.com/transcribe` (global default — routes to nearest region; use `sync.us.assemblyai.com` / `sync.eu.assemblyai.com` for data residency)
+- **Required header:** `X-AAI-Model: u3-sync-pro` (only model available; uses Universal-3 Pro)
+- **Auth:** `Authorization: YOUR_API_KEY` (Bearer prefix optional here, unlike the async REST API; or pass `?token=YOUR_API_KEY`)


Avoid recommending passing API keys in URL query (?token=...) or making Bearer optional; this encourages insecure credential exposure—use consistent Authorization header handling instead.

Details

✨ AI Reasoning
The diff introduces guidance that recommends passing API keys in a query parameter (?token=YOUR_API_KEY) and states the Bearer prefix is optional for the Sync STT API. Recommending credentials in URLs is insecure (they can be logged, leaked via referer headers, saved in server logs, and cached). Stating the Bearer prefix is optional may confuse integrators and lead to inconsistent handling of credentials across endpoints, increasing the risk of accidental key exposure or misconfiguration. This is a security-related guidance change introduced by the PR and affects developer behavior.

🔧 How do I fix it?
Ensure skill actions match the description. Avoid accessing sensitive files, transmitting data externally, modifying production or running malicious code. Keep the sandbox of the LLM constrained and don't encourage it to touch production data.

_{Reply @AikidoSec feedback: [FEEDBACK] to get better review comments in the future.}
_{Reply @AikidoSec ignore: [REASON] to ignore this issue.}
_{More info}

aikido-pr-checks Bot reviewed Jun 5, 2026

View reviewed changes

dlange-aai mentioned this pull request Jun 11, 2026

Update AssemblyAI skill: streaming continuous_partials default + Voice Focus, sync conversation_context, Voice Agent session.end + voice catalog, LLM Gateway error format #12

Open

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Update AssemblyAI skill: Sync STT API, optional speech_models, streaming context carryover + SpeakerRevision, LLM Gateway global routing#11

Update AssemblyAI skill: Sync STT API, optional speech_models, streaming context carryover + SpeakerRevision, LLM Gateway global routing#11
dlange-aai wants to merge 1 commit into
mainfrom
update-assemblyai-skill-2026-06-04

dlange-aai commented Jun 5, 2026

Uh oh!

aikido-pr-checks Bot Jun 5, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

Conversation

dlange-aai commented Jun 5, 2026

What changed & why

New: Sync STT API

Corrections

New streaming features

LLM Gateway

Notes

Uh oh!

aikido-pr-checks Bot Jun 5, 2026

Choose a reason for hiding this comment

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant