Skip to content

Update AssemblyAI skill: Sync STT API, optional speech_models, streaming context carryover + SpeakerRevision, LLM Gateway global routing#11

Open
dlange-aai wants to merge 1 commit into
mainfrom
update-assemblyai-skill-2026-06-04
Open

Update AssemblyAI skill: Sync STT API, optional speech_models, streaming context carryover + SpeakerRevision, LLM Gateway global routing#11
dlange-aai wants to merge 1 commit into
mainfrom
update-assemblyai-skill-2026-06-04

Conversation

@dlange-aai

Copy link
Copy Markdown
Collaborator

Automated docs-sync run (2026-06-04). Reconciles the skill against the latest assemblyai-docs api-reference/ spec for changes landed since the last skill update (2026-05-28).

What changed & why

New: Sync STT API

  • New synchronous endpoint POST https://sync.assemblyai.com/transcribe for clips ≤120s — one request/response, no polling, no upload step, no transcript ID (docs PRs #51, #57; specs/sync-api.yaml).
  • Documented: required X-AAI-Model: u3-sync-pro header, multipart audio + optional config body, word_boost keyterms (Sync uses word_boost, not keyterms_prompt), audio limits (80ms–120s, ≤40MB, 16-bit), response shape (start_ms/end_ms in ms), US/EU residency endpoints, 30s deadline, and error codes.
  • Added to SKILL.md (base-URL row + dedicated section) and references/api-reference.md (full §16).

Corrections

  • speech_models is now optional for pre-recorded — defaults to ["universal-3-pro","universal-2"] (docs PR #41 / openapi.yaml removed it from required). The skill previously stated it was required and would fail if omitted. Streaming's speech_model is still required per the getting-started spec, so the entry now distinguishes the two. (SKILL.md, api-reference.md)
  • whisper-rt marked legacy — removed from the public model picker and the streaming speech_model spec enums (docs PRs #61/#62), but still functional via speech_model=whisper-rt. (SKILL.md, streaming.md)

New streaming features

  • Context carryover (agent_context, u3-rt-pro, on by default) — connection-time query param and mid-stream UpdateConfiguration (docs PR #59). New section + connection-param rows + gotcha.
  • SpeakerRevision message — emitted right before Termination when speaker_labels is enabled; revises only changed turns (turn_order-matched), never text/timestamps; ~400ms close latency (docs PR #45).
  • continuous_partials connection param (default false via API, true in LiveKit plugin) + interruption_delay effective-timing detail and update_options fields (docs PR #44). (streaming.md, voice-agents.md)

LLM Gateway

  • Global routing — new model_region: "global" request field for lower-cost provider global endpoints (Claude now, Gemini 3 soon), plus the July 1, 2026 +10% in-region price note (docs PR #48).
  • Added GPT-5.5 (gpt-5.5) to the model table (now in docs overview.mdx).

Notes

  • Only changes clearly documented in the spec were applied; no speculation.
  • Streaming speech_model kept as required because the universal-streaming.mdx getting-started parameter table still lists it Required with no default, despite a warning being removed from the model-selection page.
  • Existing file structure/formatting preserved.

…ing context carryover + SpeakerRevision, LLM Gateway global routing

Sync from assemblyai-docs api-reference spec (changes since 2026-05-28):

- Add Sync STT API (POST https://sync.assemblyai.com/transcribe): new
  synchronous endpoint for clips ≤120s — no polling/upload/transcript ID.
  Documents X-AAI-Model: u3-sync-pro header, multipart audio/config body,
  word_boost keyterms, audio limits, response shape (start_ms/end_ms),
  data-residency endpoints, and error codes. Added to SKILL.md (base URLs
  + section) and references/api-reference.md (full section).
- speech_models is now OPTIONAL for pre-recorded (defaults to
  ["universal-3-pro","universal-2"]); fixed the Common Mistakes entry that
  said it was required. Streaming speech_model is still required per the
  getting-started spec. (SKILL.md, api-reference.md)
- whisper-rt marked legacy: removed from the public model picker and the
  streaming spec speech_model enums (June 2026), still functional via
  speech_model=whisper-rt. (SKILL.md, streaming.md)
- Streaming context carryover: agent_context connection/UpdateConfiguration
  param (u3-rt-pro), on by default. New section + connection-param rows +
  gotcha. (SKILL.md, streaming.md)
- Streaming diarization SpeakerRevision message: emitted before Termination
  when speaker_labels is enabled; revises only changed turns. (SKILL.md,
  streaming.md)
- continuous_partials connection param documented; interruption_delay
  effective-timing detail; LiveKit defaults (continuous_partials=true,
  update_options fields). (streaming.md, voice-agents.md)
- LLM Gateway: add model_region:"global" (lower-cost global routing; Claude
  now, Gemini 3 soon), July 1 2026 +10% in-region price note, and GPT-5.5
  model. (SKILL.md, llm-gateway.md)

- **Endpoint:** `POST https://sync.assemblyai.com/transcribe` (global default — routes to nearest region; use `sync.us.assemblyai.com` / `sync.eu.assemblyai.com` for data residency)
- **Required header:** `X-AAI-Model: u3-sync-pro` (only model available; uses Universal-3 Pro)
- **Auth:** `Authorization: YOUR_API_KEY` (Bearer prefix optional here, unlike the async REST API; or pass `?token=YOUR_API_KEY`)

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Avoid recommending passing API keys in URL query (?token=...) or making Bearer optional; this encourages insecure credential exposure—use consistent Authorization header handling instead.

Details

✨ AI Reasoning
​The diff introduces guidance that recommends passing API keys in a query parameter (?token=YOUR_API_KEY) and states the Bearer prefix is optional for the Sync STT API. Recommending credentials in URLs is insecure (they can be logged, leaked via referer headers, saved in server logs, and cached). Stating the Bearer prefix is optional may confuse integrators and lead to inconsistent handling of credentials across endpoints, increasing the risk of accidental key exposure or misconfiguration. This is a security-related guidance change introduced by the PR and affects developer behavior.

🔧 How do I fix it?
Ensure skill actions match the description. Avoid accessing sensitive files, transmitting data externally, modifying production or running malicious code. Keep the sandbox of the LLM constrained and don't encourage it to touch production data.

Reply @AikidoSec feedback: [FEEDBACK] to get better review comments in the future.
Reply @AikidoSec ignore: [REASON] to ignore this issue.
More info

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant