Update AssemblyAI skill: Sync STT API, optional speech_models, streaming context carryover + SpeakerRevision, LLM Gateway global routing #11
…ing context carryover + SpeakerRevision, LLM Gateway global routing
Sync from assemblyai-docs api-reference spec (changes since 2026-05-28):
- Add Sync STT API (POST https://sync.assemblyai.com/transcribe): new synchronous endpoint for clips ≤120s, with no polling, upload, or transcript ID. Documents X-AAI-Model: u3-sync-pro header, multipart audio/config body, word_boost keyterms, audio limits, response shape (start_ms/end_ms), data-residency endpoints, and error codes. Added to SKILL.md (base URLs + section) and references/api-reference.md (full section).
- speech_models is now OPTIONAL for pre-recorded (defaults to ["universal-3-pro","universal-2"]); fixed the Common Mistakes entry that said it was required. Streaming speech_model is still required per the getting-started spec. (SKILL.md, api-reference.md)
- whisper-rt marked legacy: removed from the public model picker and the streaming spec speech_model enums (June 2026), still functional via speech_model=whisper-rt. (SKILL.md, streaming.md)
- Streaming context carryover: agent_context connection/UpdateConfiguration param (u3-rt-pro), on by default. New section + connection-param rows + gotcha. (SKILL.md, streaming.md)
- Streaming diarization SpeakerRevision message: emitted before Termination when speaker_labels is enabled; revises only changed turns. (SKILL.md, streaming.md)
- continuous_partials connection param documented; interruption_delay effective-timing detail; LiveKit defaults (continuous_partials=true, update_options fields). (streaming.md, voice-agents.md)
- LLM Gateway: add model_region:"global" (lower-cost global routing; Claude now, Gemini 3 soon), July 1 2026 +10% in-region price note, and GPT-5.5 model. (SKILL.md, llm-gateway.md)
- **Endpoint:** `POST https://sync.assemblyai.com/transcribe` (global default, routes to nearest region; use `sync.us.assemblyai.com` / `sync.eu.assemblyai.com` for data residency)
- **Required header:** `X-AAI-Model: u3-sync-pro` (only model available; uses Universal-3 Pro)
- **Auth:** `Authorization: YOUR_API_KEY` (Bearer prefix optional here, unlike the async REST API; or pass `?token=YOUR_API_KEY`)
Avoid recommending that API keys be passed in the URL query (`?token=...`) or that the Bearer prefix be treated as optional. This encourages insecure credential exposure; use consistent Authorization header handling instead.
✨ AI Reasoning
The diff introduces guidance that recommends passing API keys in a query parameter (?token=YOUR_API_KEY) and states the Bearer prefix is optional for the Sync STT API. Recommending credentials in URLs is insecure (they can be logged, leaked via referer headers, saved in server logs, and cached). Stating the Bearer prefix is optional may confuse integrators and lead to inconsistent handling of credentials across endpoints, increasing the risk of accidental key exposure or misconfiguration. This is a security-related guidance change introduced by the PR and affects developer behavior.
🔧 How do I fix it?
Ensure skill actions match the description. Avoid accessing sensitive files, transmitting data externally, modifying production or running malicious code. Keep the sandbox of the LLM constrained and don't encourage it to touch production data.
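One way to act on the review's concern is to keep the key out of the URL entirely and reject any URL that carries a `token` query parameter. The sketch below is illustrative only: `sync_headers` and `require_credential_free_url` are hypothetical helper names, and sending the raw key as the `Authorization` value follows the diff line under review rather than a confirmed API contract.

```python
from urllib.parse import parse_qs, urlsplit

# Endpoint and model header value are taken from the diff under review.
SYNC_URL = "https://sync.assemblyai.com/transcribe"
SYNC_MODEL = "u3-sync-pro"


def sync_headers(api_key: str) -> dict[str, str]:
    """Build request headers that carry the key in Authorization only.

    Assumption: the raw key is sent as the header value, as written in the
    diff; switch to a "Bearer " prefix if the project standardizes on it.
    """
    if not api_key:
        raise ValueError("api_key must not be empty")
    return {"Authorization": api_key, "X-AAI-Model": SYNC_MODEL}


def require_credential_free_url(url: str) -> str:
    """Return the URL unchanged, or raise if it carries ?token=..."""
    query = parse_qs(urlsplit(url).query, keep_blank_values=True)
    if "token" in query:
        raise ValueError("pass the API key in the Authorization header, not in the URL")
    return url


if __name__ == "__main__":
    url = require_credential_free_url(SYNC_URL)
    headers = sync_headers("YOUR_API_KEY")
    print(url, sorted(headers))
```

Keeping the check in one helper means every call site goes through the same rule, so a URL with an embedded key fails fast instead of reaching logs or proxies.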
Automated docs-sync run (2026-06-04). Reconciles the skill against the latest assemblyai-docs api-reference/spec for changes landed since the last skill update (2026-05-28).

What changed & why

New: Sync STT API
- `POST https://sync.assemblyai.com/transcribe` for clips ≤120s: one request/response, no polling, no upload step, no transcript ID (docs PRs #51, #57; `specs/sync-api.yaml`).
- Documents the `X-AAI-Model: u3-sync-pro` header, multipart `audio` + optional `config` body, `word_boost` keyterms (Sync uses `word_boost`, not `keyterms_prompt`), audio limits (80ms–120s, ≤40MB, 16-bit), response shape (`start_ms`/`end_ms` in ms), US/EU residency endpoints, 30s deadline, and error codes.
- Added to `SKILL.md` (base-URL row + dedicated section) and `references/api-reference.md` (full §16).

Corrections
- `speech_models` is now optional for pre-recorded: it defaults to `["universal-3-pro","universal-2"]` (docs PR #41 / `openapi.yaml` removed it from `required`). The skill previously stated it was required and would fail if omitted. Streaming's `speech_model` is still required per the getting-started spec, so the entry now distinguishes the two. (`SKILL.md`, `api-reference.md`)
- `whisper-rt` marked legacy: removed from the public model picker and the streaming `speech_model` spec enums (docs PRs #61/#62), but still functional via `speech_model=whisper-rt`. (`SKILL.md`, `streaming.md`)

New streaming features
- Context carryover (`agent_context`, u3-rt-pro, on by default): connection-time query param and mid-stream `UpdateConfiguration` (docs PR #59). New section + connection-param rows + gotcha.
- Diarization `SpeakerRevision` message: emitted before `Termination` when `speaker_labels` is enabled; revises only changed turns (`turn_order`-matched), never text/timestamps; ~400ms close latency (docs PR #45).
- `continuous_partials` connection param (default `false` via API, `true` in LiveKit plugin) + `interruption_delay` effective-timing detail and `update_options` fields (docs PR #44). (`streaming.md`, `voice-agents.md`)

LLM Gateway
- `model_region: "global"` request field for lower-cost provider global endpoints (Claude now, Gemini 3 soon), plus the July 1, 2026 +10% in-region price note (docs PR #48).
- Added GPT-5.5 (`gpt-5.5`) to the model table (now in docs `overview.mdx`).

Notes
- Streaming `speech_model` kept as required because the `universal-streaming.mdx` getting-started parameter table still lists it Required with no default, despite a warning being removed from the model-selection page.
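
The Sync STT audio limits listed above (80ms–120s, ≤40MB, 16-bit) lend themselves to a client-side pre-flight check. The following is a rough sketch, not part of the skill: it assumes the clip is a WAV file so the standard-library `wave` module can read it, it treats 40MB as 40 × 1024 × 1024 bytes, and `sync_audio_problems` and `make_silent_wav` are hypothetical names used only for illustration.

```python
import io
import wave

# Limits quoted in the PR description; the MB interpretation is an assumption.
MIN_SECONDS = 0.080
MAX_SECONDS = 120.0
MAX_BYTES = 40 * 1024 * 1024
REQUIRED_SAMPLE_WIDTH = 2  # bytes per sample, i.e. 16-bit audio


def sync_audio_problems(wav_bytes: bytes) -> list[str]:
    """Return human-readable reasons the clip would violate the stated limits."""
    problems = []
    if len(wav_bytes) > MAX_BYTES:
        problems.append("file is larger than 40MB")
    with wave.open(io.BytesIO(wav_bytes), "rb") as wav:
        duration = wav.getnframes() / wav.getframerate()
        sample_width = wav.getsampwidth()
    if sample_width != REQUIRED_SAMPLE_WIDTH:
        problems.append(f"audio is {sample_width * 8}-bit, expected 16-bit")
    if duration < MIN_SECONDS:
        problems.append("clip is shorter than 80ms")
    elif duration > MAX_SECONDS:
        problems.append("clip is longer than 120s")
    return problems


def make_silent_wav(seconds: float, sample_rate: int = 16000, sample_width: int = 2) -> bytes:
    """Create a mono WAV of silence in memory, used here only to exercise the check."""
    buffer = io.BytesIO()
    with wave.open(buffer, "wb") as wav:
        wav.setnchannels(1)
        wav.setsampwidth(sample_width)
        wav.setframerate(sample_rate)
        wav.writeframes(b"\x00" * (int(seconds * sample_rate) * sample_width))
    return buffer.getvalue()


if __name__ == "__main__":
    print(sync_audio_problems(make_silent_wav(1.0)))
    print(sync_audio_problems(make_silent_wav(0.05)))
```

Running a check like this before the request avoids spending the 30s deadline on a clip the endpoint would reject anyway; the server's own validation remains authoritative.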