Merged
Commits
41 commits
9d2bb81
docs: design for agent-framework init template
alexkroman-assembly Jun 15, 2026
88a32f2
docs: implementation plan for agent-framework template
alexkroman-assembly Jun 15, 2026
b71fc29
feat(init): register agent-framework template + inject TTS host
alexkroman-assembly Jun 15, 2026
7f432cf
feat(agent-framework): skeleton + shared static assets
alexkroman-assembly Jun 15, 2026
43a8945
feat(agent-framework): settings module
alexkroman-assembly Jun 15, 2026
78bd66e
test(agent-framework): assert settings defaults (mutation coverage)
alexkroman-assembly Jun 15, 2026
e5dbecf
feat(agent-framework): cascade pure helpers
alexkroman-assembly Jun 15, 2026
2667921
test(agent-framework): assert speech_model param (mutation coverage)
alexkroman-assembly Jun 15, 2026
58671d6
feat(agent-framework): cascade orchestrator
alexkroman-assembly Jun 15, 2026
d21ade0
test(agent-framework): cover live-client adapters + cancel/skip branches
alexkroman-assembly Jun 15, 2026
c515e18
feat(agent-framework): FastAPI app + websocket adapter
alexkroman-assembly Jun 15, 2026
0d78e7f
test(agent-framework): assert ws handler wires real settings into Dep…
alexkroman-assembly Jun 15, 2026
409d2d9
feat(agent-framework): frontend (cascade UI + /ws client)
alexkroman-assembly Jun 15, 2026
d998b75
feat(agent-framework): deploy, docs, and dependency scaffold
alexkroman-assembly Jun 15, 2026
b347f6c
test(init): refresh --help snapshot for agent-framework template
alexkroman-assembly Jun 15, 2026
e7a0df5
test(agent-framework): satisfy mypy on dynamic template-module imports
alexkroman-assembly Jun 15, 2026
9e9e2f3
test(agent-framework): allow None queue elements in FakeBrowser inbound
alexkroman-assembly Jun 15, 2026
2d6b099
test(agent-framework): make async-generator fakes fully reachable (vu…
alexkroman-assembly Jun 15, 2026
d99d277
test(agent-framework): split oversized test file under the 500-line gate
alexkroman-assembly Jun 15, 2026
da30455
chore(docs-gate): allowlist ASSEMBLYAI_TTS_HOST (template-deployment …
alexkroman-assembly Jun 15, 2026
abc6a9f
chore: ignore BLE001 for init templates instead of inline noqa
alexkroman-assembly Jun 15, 2026
7be5ad0
refactor(agent-framework): type with Protocols, drop Any from cascade…
alexkroman-assembly Jun 15, 2026
c2b77d8
fix(agent-framework): TTS flush message is 'Flush', not 'ForceFlushTe…
alexkroman-assembly Jun 15, 2026
fc93c92
feat(live-captions): mint streaming token via the assemblyai SDK
alexkroman-assembly Jun 15, 2026
f151a6b
refactor(live-captions): rename shadowing local + align assemblyai pi…
alexkroman-assembly Jun 15, 2026
9c6c9a5
fix(agent-framework): use a valid streaming-TTS preset voice (jane, n…
alexkroman-assembly Jun 15, 2026
a0ca813
feat(agent-framework): show only final transcripts + TaskGroup orches…
alexkroman-assembly Jun 15, 2026
acb581f
fix(agent-framework): skip LLM gateway chunks with no choices
alexkroman-assembly Jun 15, 2026
82f00d5
feat(init): per-template picker descriptions + TTS-readable agent prompt
alexkroman-assembly Jun 15, 2026
ea2707d
test(init): record picker choices on the fake namespace (pyright-clean)
alexkroman-assembly Jun 15, 2026
1e1662a
feat(agent-framework): conversation memory + per-sentence TTS streaming
alexkroman-assembly Jun 15, 2026
f3c8802
chore(pyright): match editor to gate for tests + add .vscode interpreter
alexkroman-assembly Jun 16, 2026
5aee6ff
refactor(init): rename template dirs to legal underscore names; decou…
alexkroman-assembly Jun 16, 2026
53d3ad8
refactor(init): make templates importable packages (relative imports …
alexkroman-assembly Jun 16, 2026
379e262
fix(template-gate): skip __pycache__ now that template dirs are packages
alexkroman-assembly Jun 16, 2026
fae0d49
refactor(init): type-check templates in-tree (drop mypy/pyright exclu…
alexkroman-assembly Jun 16, 2026
2800de9
chore: add validate-pyproject gate step + quiet stale TOML-schema edi…
alexkroman-assembly Jun 16, 2026
c075417
chore(pyright): re-list default excludes; point editor Mypy at the venv
alexkroman-assembly Jun 16, 2026
7607a9e
update
alexkroman-assembly Jun 16, 2026
0a36203
Fix greeting-audio null deref and quiet CodeQL Protocol-stub notes
alexkroman-assembly Jun 16, 2026
644513c
Fix greeting/barge-in/history correctness in agent-framework cascade
alexkroman-assembly Jun 16, 2026
8 changes: 8 additions & 0 deletions .vscode/settings.json
@@ -0,0 +1,8 @@
{
  "python.defaultInterpreterPath": "${workspaceFolder}/.venv/bin/python",
  "python.testing.pytestEnabled": true,
  "python.testing.unittestEnabled": false,
  "evenBetterToml.schema.enabled": false,
  "mypy-type-checker.importStrategy": "fromEnvironment",
  "mypy-type-checker.preferDaemon": true
}
10 changes: 9 additions & 1 deletion aai_cli/app/init_exec.py
@@ -61,7 +61,11 @@ def _pick_template() -> str:
     choice = questionary.select(
         "Pick a template",
         choices=[
-            questionary.Choice(title=templates.title_for(t), value=t)
+            questionary.Choice(
+                title=templates.title_for(t),
+                value=t,
+                description=templates.description_for(t),
+            )
             for t in templates.TEMPLATE_ORDER
         ],
     ).ask()
@@ -101,6 +105,10 @@ def _active_env_vars() -> dict[str, str]:
         "ASSEMBLYAI_STREAMING_HOST": env.streaming_host,
         # Voice Agent host mirrors the streaming host's naming across environments.
         "ASSEMBLYAI_AGENTS_HOST": env.streaming_host.replace("streaming", "agents", 1),
+        # Streaming-TTS host for the cascade (agent-framework) template. Empty in
+        # production, where streaming TTS has no host; that template then refuses to
+        # run and points at --sandbox.
+        "ASSEMBLYAI_TTS_HOST": env.streaming_tts_host,
     }
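
A minimal sketch of the host wiring in this hunk, for readers skimming the diff: the agents host is derived by replacing only the first occurrence of "streaming" in the streaming host, and an empty TTS host is the signal that streaming TTS is unavailable. The hostnames and the `require_tts_host` helper are hypothetical; the template's actual refusal logic is not shown in this diff.

```python
def derive_agents_host(streaming_host: str) -> str:
    # Same substitution as the hunk: only the first "streaming" is replaced.
    return streaming_host.replace("streaming", "agents", 1)


def require_tts_host(env: dict[str, str]) -> str:
    # Hypothetical guard: an empty ASSEMBLYAI_TTS_HOST means there is no
    # streaming-TTS host, so the app stops and points at --sandbox.
    host = env.get("ASSEMBLYAI_TTS_HOST", "")
    if not host:
        raise SystemExit("Streaming TTS has no host here; re-scaffold with --sandbox.")
    return host


# Placeholder hostnames, for illustration only.
print(derive_agents_host("streaming.example.test"))
print(require_tts_host({"ASSEMBLYAI_TTS_HOST": "tts.example.test"}))
```

Limiting the replacement count to 1 keeps a host that happens to contain "streaming" twice from being rewritten in both places.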


13 changes: 10 additions & 3 deletions aai_cli/init/scaffold.py
@@ -39,7 +39,7 @@ def _template_root(template: str) -> Traversable:
         )
     # Navigate from the `aai_cli.init` package (templates/ has no __init__.py, so it
     # is not itself an importable package).
-    root = resources.files("aai_cli.init") / "templates" / template
+    root = resources.files("aai_cli.init") / "templates" / templates.dir_for(template)
     # Defense in depth: the registry should only list shipped templates, but if it ever
     # drifts ahead of the on-disk directories, fail cleanly instead of with a traceback.
     if not root.is_dir():
@@ -76,18 +76,25 @@ def existing_env_key(target: Path) -> str | None:
     return None


-def _copy_tree(node: Traversable, dest: Path) -> None:
+def _copy_tree(node: Traversable, dest: Path, *, top_level: bool = True) -> None:
     for child in node.iterdir():
         if child.name in _SKIP_NAMES or child.name.endswith(".pyc"):
             continue
+        # The template dir is an importable package in-repo (so it can be type-checked),
+        # but its root __init__.py is just that in-repo marker — not part of the shipped
+        # app. Skip it so the scaffolded project root doesn't become a stray package.
+        # (api/'s own __init__.py is one level down and IS copied — the shipped app's
+        # `from . import settings` needs it.)
+        if top_level and child.name == "__init__.py":
+            continue
         name = _DOTFILE_RENAMES.get(child.name, child.name)
         out = dest / name
         if child.is_dir():
             # parents=True is an equivalent mutant here: the walk always creates a
             # node's parent before descending, so `dest` (and `out.parent`) already
             # exists. exist_ok is exercised by the idempotent re-scaffold test.
             out.mkdir(parents=True, exist_ok=True)  # pragma: no mutate
-            _copy_tree(child, out)
+            _copy_tree(child, out, top_level=False)
         else:
             out.parent.mkdir(parents=True, exist_ok=True)  # pragma: no mutate
             out.write_bytes(child.read_bytes())
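
The effect of the new `top_level` flag can be checked in isolation. The sketch below is a simplified stand-in, not the CLI's code: it walks `pathlib.Path` objects instead of `Traversable`, omits the dotfile renames, and assumes `__pycache__` is among the skipped names (the real `_SKIP_NAMES` is not shown in this diff).

```python
import tempfile
from pathlib import Path

SKIP_NAMES = {"__pycache__"}  # assumed contents, for illustration only


def copy_tree(node: Path, dest: Path, *, top_level: bool = True) -> None:
    for child in node.iterdir():
        if child.name in SKIP_NAMES or child.name.endswith(".pyc"):
            continue
        # Drop only the template root's __init__.py; nested ones are kept.
        if top_level and child.name == "__init__.py":
            continue
        out = dest / child.name
        if child.is_dir():
            out.mkdir(parents=True, exist_ok=True)
            copy_tree(child, out, top_level=False)
        else:
            out.write_bytes(child.read_bytes())


def demo() -> tuple[bool, bool]:
    # Build a tiny template with a root marker and a nested package, then copy it.
    with tempfile.TemporaryDirectory() as tmp:
        src, dst = Path(tmp) / "template", Path(tmp) / "project"
        (src / "api").mkdir(parents=True)
        dst.mkdir()
        (src / "__init__.py").write_text("")
        (src / "api" / "__init__.py").write_text("")
        copy_tree(src, dst)
        return (dst / "__init__.py").exists(), (dst / "api" / "__init__.py").exists()


print(demo())
```

`demo()` reports whether the root marker and the nested `api/__init__.py` exist in the scaffolded copy; the root marker should be absent and the nested one present.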
24 changes: 0 additions & 24 deletions aai_cli/init/templates.py

This file was deleted.

50 changes: 50 additions & 0 deletions aai_cli/init/templates/__init__.py
@@ -0,0 +1,50 @@
from __future__ import annotations

# id -> human-facing title shown in the picker. Ids are Vercel-style
# project/example slugs rather than CLI command names.
#
# Every id here MUST have a directory under templates/<id>/ (a test enforces both
# directions) — the picker must never advertise a template that would crash on scaffold.
TEMPLATES: dict[str, str] = {
    "audio-transcription": "Audio Transcription",
    "live-captions": "Live Captions",
    "voice-agent": "Voice Agent",
    "agent-framework": "Agent Framework",
}

# Display order for the picker and `--help`.
TEMPLATE_ORDER: tuple[str, ...] = (
    "audio-transcription",
    "live-captions",
    "voice-agent",
    "agent-framework",
)


# One-line description shown beside each title in the interactive picker. Keys must
# match TEMPLATES exactly (a test enforces both directions).
DESCRIPTIONS: dict[str, str] = {
    "audio-transcription": "Transcribe audio & video files, URLs, and YouTube — speaker labels and audio intelligence",
    "live-captions": "Live real-time captions from your microphone over the Streaming API",
    "voice-agent": "Full-duplex voice agent (speech in, LLM reply, speech out) via the Voice Agent API",
    "agent-framework": "Cascaded voice agent you orchestrate: Streaming STT, the LLM Gateway, and sandbox TTS",
}


def dir_for(name: str) -> str:
    """The on-disk template directory for an id: kebab id -> underscore package dir."""
    return name.replace("-", "_")


def is_template(name: str) -> bool:
    return name in TEMPLATES


def title_for(name: str) -> str:
    """The human title for a template id, or the raw id if unknown."""
    return TEMPLATES.get(name, name)


def description_for(name: str) -> str:
    """The one-line picker description for a template id, or '' when unknown."""
    return DESCRIPTIONS.get(name, "")
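
The comments in this module mention tests that enforce the registry "in both directions". Those tests are not part of this excerpt, so the following is only a sketch of what such a check might look like. `registry_problems` is a hypothetical helper, the descriptions are shortened placeholders, and the on-disk directory check is replaced by a test that each derived directory name is a legal Python identifier.

```python
TEMPLATES = {
    "audio-transcription": "Audio Transcription",
    "live-captions": "Live Captions",
    "voice-agent": "Voice Agent",
    "agent-framework": "Agent Framework",
}
TEMPLATE_ORDER = ("audio-transcription", "live-captions", "voice-agent", "agent-framework")
# Shortened placeholder descriptions; only the keys matter for this check.
DESCRIPTIONS = {name: f"{title} starter" for name, title in TEMPLATES.items()}


def dir_for(name: str) -> str:
    return name.replace("-", "_")


def registry_problems(
    templates: dict[str, str], order: tuple[str, ...], descriptions: dict[str, str]
) -> list[str]:
    """Return human-readable mismatches between the three registries."""
    problems: list[str] = []
    if set(order) != set(templates) or len(order) != len(set(order)):
        problems.append("TEMPLATE_ORDER must list every template id exactly once")
    if set(descriptions) != set(templates):
        problems.append("DESCRIPTIONS keys must match TEMPLATES keys")
    for name in templates:
        # A package directory must be importable, hence a valid identifier.
        if not dir_for(name).isidentifier():
            problems.append(f"{name!r} does not map to a legal package name")
    return problems


print(registry_problems(TEMPLATES, TEMPLATE_ORDER, DESCRIPTIONS))
```

Returning a list of problems rather than raising on the first one lets a single test run report every drift between the registries.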
37 changes: 37 additions & 0 deletions aai_cli/init/templates/agent_framework/AGENTS.md
@@ -0,0 +1,37 @@
# Agent Notes

This is a buildless FastAPI + browser starter for a **cascaded** voice agent
(Streaming STT -> LLM Gateway -> streaming TTS), orchestrated server-side. Run it with:

```sh
assembly dev
```

## Map

- `api/settings.py`: API key, hosts, model, voice, system prompt, greeting, sample rates.
- `api/cascade.py`: the orchestrator — STT/TTS socket helpers, the LLM stream, turn
detection, barge-in, and the `/ws` browser adapter. Built with injected `Deps` so it
is tested against fakes.
- `api/index.py`: FastAPI app — serves the page/assets and the `/ws` WebSocket.
- `static/app.js`: WebSocket lifecycle, mic capture, UI state, and event handling
(`_CONFIG` block at the top is the primary edit point).
- `static/audio.js`: microphone pipeline, PCM conversion, playback queue, barge-in.
- `static/styles.css`: visual styling only; the top `:root` block is the theme edit point.
- `static/index.html`: page structure and static asset links.

## Change Points

- Model, voice, prompt, greeting, sample rates: edit `api/settings.py`.
- Cascade behavior (turn detection, barge-in, LLM->TTS piping): edit `api/cascade.py`.
- Transcript log rendering: edit `addTurn` in `static/app.js`.
- Playback, barge-in, or PCM conversion: edit `static/audio.js`.

## Invariants

- Never expose `ASSEMBLYAI_API_KEY` or any server secret in `static/`.
- Streaming TTS is sandbox-only; keep this app pointed at the sandbox hosts.
- `reply.audio` carries base64 PCM on the `data` field.
- The browser <-> backend event protocol matches the `voice-agent` template — keep it
stable so `static/audio.js` and the UI stay reusable.
- Keep the app buildless unless the user explicitly asks for a frontend toolchain.
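
To make the `reply.audio` invariant concrete, here is a minimal sketch of packing and unpacking one audio chunk. Only the event name and the base64 PCM `data` field come from the notes above; the JSON envelope and the `type` discriminator key are assumptions, and both helper names are hypothetical.

```python
import base64
import json


def reply_audio_event(pcm: bytes) -> str:
    # "type" as the discriminator key is an assumption; the notes only state
    # that reply.audio carries base64 PCM on the `data` field.
    payload = {"type": "reply.audio", "data": base64.b64encode(pcm).decode("ascii")}
    return json.dumps(payload)


def pcm_from_event(message: str) -> bytes:
    # Inverse of reply_audio_event: recover the raw PCM bytes from `data`.
    event = json.loads(message)
    return base64.b64decode(event["data"])


chunk = b"\x00\x01\x02\x03"  # four bytes standing in for 16-bit PCM samples
print(pcm_from_event(reply_audio_event(chunk)) == chunk)
```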
49 changes: 49 additions & 0 deletions aai_cli/init/templates/agent_framework/README.md
@@ -0,0 +1,49 @@
# Talk to a cascaded voice agent — AssemblyAI agent-framework starter

Click connect and talk. Unlike the `voice-agent` template (which uses AssemblyAI's
all-in-one Voice Agent API), this app is a **cascade your own backend orchestrates**:
Streaming STT transcribes you, the LLM Gateway generates a reply, and streaming TTS
speaks it back — with turn detection and barge-in handled server-side. The browser
holds one WebSocket to your backend, so your API key never reaches the client.

## Sandbox-only

Streaming TTS has no production host, so the whole cascade runs against the AssemblyAI
sandbox with a sandbox key. Scaffold it that way:

```sh
assembly --sandbox init agent-framework
```

That pins the sandbox hosts in `.env`. Running against production exits with a hint.

## Run locally

```sh
assembly dev # opens http://localhost:3000 (allow microphone access; headphones recommended)
```

`ASSEMBLYAI_API_KEY` is read from `.env` (created for you by `assembly init`).

## Deploy

This app keeps a **long-running WebSocket**, so it needs a persistent process — not
Vercel's serverless functions. Use the shipped `Procfile`/`Dockerfile` on Render,
Railway, Fly.io, or Google Cloud Run (`gcloud run deploy --source .`):

```sh
uvicorn api.index:app --host 0.0.0.0 --port $PORT
```

Set `ASSEMBLYAI_API_KEY` and the three sandbox host vars (`ASSEMBLYAI_STREAMING_HOST`,
`ASSEMBLYAI_TTS_HOST`, `ASSEMBLYAI_LLM_GATEWAY_URL`) in the platform's environment.

## Ideas to extend

- Change the `MODEL`, `VOICE`, `SYSTEM_PROMPT`, `GREETING`, or `MAX_HISTORY` in
`api/settings.py`.
- Replies already stream into TTS sentence-by-sentence as the LLM produces them
(`_generate_reply` flushes on each `.`/`!`/`?`), and a sliding window of
`MAX_HISTORY` messages gives the agent memory of the conversation. Tune the
sentence boundary or `MAX_HISTORY` to trade latency, cost, and recall.
- Add tools (function calling) on the LLM leg so the agent can look things up.
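
The two tuning knobs in the list above can be sketched independently of the template. This is not the template's `_generate_reply`: `sentences` and `trim_history` are hypothetical helpers, the `MAX_HISTORY` value is illustrative, and whether the real window counts the system prompt is not stated in this README. The boundary rule is deliberately naive, so abbreviations and ellipses also trigger a flush.

```python
from collections.abc import Iterable, Iterator

SENTENCE_END = (".", "!", "?")
MAX_HISTORY = 4  # illustrative value, not the template's default


def sentences(chunks: Iterable[str]) -> Iterator[str]:
    """Yield a sentence as soon as a streamed chunk completes it."""
    buffer = ""
    for chunk in chunks:
        for char in chunk:
            buffer += char
            if char in SENTENCE_END:
                if buffer.strip():
                    yield buffer.strip()
                buffer = ""
    # Flush whatever is left when the stream ends without a terminator.
    if buffer.strip():
        yield buffer.strip()


def trim_history(history: list[dict[str, str]], limit: int = MAX_HISTORY) -> list[dict[str, str]]:
    # Sliding window: keep only the most recent `limit` messages.
    return history[-limit:] if limit > 0 else []


print(list(sentences(["Hi the", "re. How are", " you? Fine"])))
```

Each yielded sentence is what would be handed to TTS, so a stricter boundary means fewer, longer synthesis requests, while a smaller `MAX_HISTORY` shrinks the prompt at the cost of recall.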