The AssemblyAI CLI (assembly) brings speech AI directly into your terminal: transcribe files, URLs, and YouTube/podcast pages, stream live audio, talk to a two-way voice agent, prompt the LLM Gateway, benchmark speech models, and scaffold ready-to-deploy starter apps.
Learn more about the platform in the AssemblyAI docs.
Install on macOS or Linux with Homebrew:
brew tap assemblyai/cli https://github.com/AssemblyAI/cli
brew trust assemblyai/cli # only needed when HOMEBREW_REQUIRE_TAP_TRUST is set; harmless otherwise
brew install assembly

Sign in (stores your API key in the OS keyring) and run your first transcription:

assembly login
assembly transcribe --sample

That's it. Run `assembly onboard` for a guided tour, or see Installation for pipx/uv and other options.
- 🎯 One command for everything: transcription, real-time streaming, voice agents, LLM prompts, and WER benchmarking, with no SDK boilerplate.
- 🔌 Built for pipelines: data goes to stdout, errors to stderr, `--json` gives stable machine-readable output, and `-` reads audio from stdin.
- 🔐 Secure by default: your API key lives in the OS keyring, never in a dotfile, and run commands have no `--api-key` flag, so keys can't leak into `ps` or shell history.
- 🛠️ From demo to deployed app: `assembly init` scaffolds a runnable FastAPI starter, `assembly dev` / `share` / `deploy` run, tunnel, and ship it, and `--show-code` prints the equivalent Python SDK script for any run command.
- 🤖 Agent-ready: `assembly setup install` wires your coding agent up with the AssemblyAI docs MCP server and skills.
- 📖 Open source: MIT licensed.
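Because results arrive on stdout and diagnostics on stderr, a script can treat the CLI as a subprocess and parse the `--json` output directly. The following is a minimal Python sketch under stated assumptions, not part of the CLI: the helper names `build_transcribe_command` and `run_transcribe` are hypothetical, a non-zero exit status on failure is assumed, and the shape of the JSON document is not specified in this README, so the result is returned as parsed without relying on any field names.

```python
import json
import subprocess
from typing import Any


def build_transcribe_command(audio: str) -> list[str]:
    """Return the argv for a JSON transcription run (flags as shown in this README)."""
    return ["assembly", "transcribe", audio, "--json"]


def run_transcribe(audio: str, runner=subprocess.run) -> Any:
    """Run the CLI and parse stdout as JSON; stderr stays separate for diagnostics."""
    result = runner(
        build_transcribe_command(audio),
        capture_output=True,  # stdout and stderr are captured as separate streams
        text=True,
        check=False,
    )
    if result.returncode != 0:
        # Assumption: failures exit non-zero and explain themselves on stderr.
        message = result.stderr.strip() or f"assembly exited with {result.returncode}"
        raise RuntimeError(message)
    return json.loads(result.stdout)
```

The `runner` parameter exists only so the subprocess call can be replaced with a stub in tests; in normal use, `run_transcribe("call.mp3")` is enough.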
| Command | What it does |
|---|---|
| `assembly transcribe` | Transcribe files, URLs, YouTube/podcast pages, directories, globs, or bucket storage (`s3://`, `gs://`, `az://`), with speaker labels, PII redaction, summarization, SRT/VTT captions, and resumable batch runs |
| `assembly stream` | Real-time transcription from your microphone, a file, or a URL; on macOS it can capture system audio too |
| `assembly dictate` | Push-to-talk dictation: press Enter to record, Enter again for instant text (Sync STT API, up to 120 s per utterance) |
| `assembly agent` | Full-duplex spoken conversation with a voice agent, right in your terminal |
| `assembly speak` | Synthesize text to speech over the streaming-TTS WebSocket (sandbox-only) |
| `assembly llm` | Prompt the LLM Gateway over a transcript, stdin, or a live stream |
| `assembly clip` | Cut audio/video with ffmpeg by diarized speaker, text match, LLM pick, or time range (`--video` keeps the picture for URL sources); clip boundaries snap into nearby silence |
| `assembly dub` | Re-voice an audio/video file or URL in another language: transcription, LLM translation, per-speaker TTS, ffmpeg track-swap (sandbox-only) |
| `assembly caption` | Burn always-visible captions into a video: transcribe (or reuse a transcript), fetch SRT, and burn it in with ffmpeg, leaving the audio untouched |
| `assembly eval` | Benchmark WER against Hugging Face datasets (built-in aliases: librispeech, tedlium, …) or local manifests |
| `assembly webhooks listen` | Open a public dev URL that prints webhook deliveries and can forward them to your local app |
| `assembly init` / `dev` / `share` / `deploy` | Scaffold a FastAPI + HTML starter app, run it locally, expose it on a public URL, and ship it to Vercel / Railway / Fly.io |
| `assembly setup` | Wire a coding agent up with the AssemblyAI docs MCP server and skills |
| `assembly doctor` | Check your environment: API key, network, ffmpeg, microphone |
| `assembly transcripts` / `sessions` | Browse and fetch past transcripts and streaming sessions |
| `assembly keys` / `balance` / `usage` / `limits` / `audit` | Account self-service via browser login |

Add `--show-code` to `transcribe` / `stream` / `agent` to print the equivalent Python SDK script instead of running. It is the built-in path from CLI experiment to SDK code.
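For readers new to the metric that `assembly eval` reports, word error rate (WER) is the number of word-level substitutions, deletions, and insertions needed to turn a transcript into the reference, divided by the number of reference words. The sketch below is a textbook implementation for illustration only: the function name is hypothetical, and the normalization used here (lowercasing and whitespace splitting) is an assumption that may differ from what `assembly eval` applies, so its scores are not directly comparable.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """WER = (substitutions + deletions + insertions) / number of reference words."""
    ref = reference.lower().split()
    hyp = hypothesis.lower().split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] is the edit distance between the reference prefix seen so far and hyp[:j].
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, hyp_word in enumerate(hyp, start=1):
            cost = 0 if ref_word == hyp_word else 1
            curr[j] = min(
                prev[j] + 1,         # deletion: reference word missing from hypothesis
                curr[j - 1] + 1,     # insertion: extra word in hypothesis
                prev[j - 1] + cost,  # match (cost 0) or substitution (cost 1)
            )
        prev = curr
    return prev[-1] / len(ref)
```

For example, `word_error_rate("the cat sat on the mat", "the cat sat on mat")` counts one deletion against six reference words, roughly 0.167.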
A few one-liners that show what assembly can do. The everyday basics live under Getting started below.
Note
speak and dub are sandbox-only today — that's why the examples below pass --sandbox.
Recreate a scene with synthetic voices. Transcribe and diarize a YouTube clip, then pipe it straight into TTS with a different voice per speaker:

assembly transcribe "https://www.youtube.com/watch?v=awmCtXzFsJo" --speaker-labels \
  | assembly --sandbox speak --voice A=jane --voice B=mary --out scene.wav

`speak` auto-detects `Speaker A:` labels, merges each speaker's turns, and rotates voices.

Dub a video into another language. This exercises the whole platform in one command: transcription with utterance timestamps, per-utterance LLM translation, TTS for each line (one voice per speaker), and ffmpeg laying the new track over the original video. A great demo is "Me at the zoo", the first YouTube video ever. It is 19 seconds long, has a single clear English speaker, and is instantly recognizable, so the dub finishes fast and the before/after is obvious:

assembly --sandbox dub "https://www.youtube.com/watch?v=jNQXAC9IVRw" -l de --video

The video stream is copied untouched; each dubbed line lands at its original start time.
Turn a podcast into audio. Apple and Spotify podcast pages work too (yt-dlp ingestion):

assembly transcribe "https://podcasts.apple.com/us/podcast/id1516093381" --speaker-labels \
  | assembly --sandbox speak --out episode.wav

Cut the highlight reel from a speech. `clip` downloads the video (`--video`; omit it for audio-only clips), transcribes it, has an LLM pick the windows, and cuts each one into its own file with ffmpeg (here: Steve Jobs' Stanford commencement address):

assembly clip "https://www.youtube.com/watch?v=UF8uR6Z6KLc" --video \
  --llm "the most quotable 20-40 seconds from each of the stories" \
  --padding 0.5 --out-dir .

Burn karaoke subtitles into a music video. `caption` transcribes the video and burns the captions straight into the picture with ffmpeg; `--chars-per-caption` keeps the lines short so they flip with the vocals:
assembly caption video.mp4 --chars-per-caption 24 --font-size 28

Keep a live to-do list from your mic. `llm -f` re-runs the prompt over the growing transcript, updating in place:

assembly stream -o text | assembly llm -f "summarize my to-dos as I talk"

Caption a meeting from system audio (macOS). This captures app/system audio alongside your mic as separate diarized speakers:

assembly stream --system-audio --speaker-labels -o text

Get pinged when your name comes up in a live meeting:

assembly stream -o text | grep --line-buffered -i alex \
  | while read -r _; do afplay /System/Library/Sounds/Glass.aiff; done

Chain LLM prompts over a transcript. Each prompt runs on the finished transcript:
assembly transcribe --sample --llm "summarize" --llm "translate the summary to French"

Talk to a voice agent in your terminal (full-duplex, around 20 voices):

assembly agent --voice ivy --system-prompt "you're a helpful interviewer"

Graduate to the SDK. `--show-code` prints the equivalent Python script for any transcribe/stream/agent run instead of executing it:

assembly agent --system-prompt "you're a story generator" --show-code > story.py

Scaffold and deploy a voice agent (templates: voice-agent, audio-transcription, live-captions):

assembly init voice-agent && assembly deploy --prod

Benchmark WER against public datasets, with built-in aliases for LibriSpeech, TEDLIUM, and more:

assembly eval librispeech --speech-model universal-3-pro --limit 50

Requires Python 3.12+ (Homebrew brings its own; for pipx/uv see the `--python` hint below).
Warning
The assemblyai-cli package on PyPI is not this project — install with one of the
commands below, not pip install assemblyai-cli.
brew tap assemblyai/cli https://github.com/AssemblyAI/cli
brew trust assemblyai/cli # only needed when HOMEBREW_REQUIRE_TAP_TRUST is set; harmless otherwise
brew install assembly

Homebrew pulls in ffmpeg and portaudio, so every command works out of the box.
pipx install "git+https://github.com/AssemblyAI/cli.git"
# or
uv tool install "git+https://github.com/AssemblyAI/cli.git"

If your default interpreter is older than Python 3.12, add `--python python3.12` (pipx) or `--python 3.12` (uv) to the install command.
System dependencies for the live-audio commands (pipx/uv installs only)
Only the live-audio commands need anything extra: stream, dictate, and agent use PortAudio for
microphone capture and ffmpeg on PATH to stream non-WAV audio.
Plain transcribe uploads your file directly and needs neither.
- Debian/Ubuntu: `sudo apt-get install libportaudio2 ffmpeg`
- Fedora: `sudo dnf install portaudio ffmpeg`
- macOS (Homebrew): `brew install portaudio ffmpeg`
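Before a pipx/uv install, it can help to confirm that the prerequisites named above are visible on the machine. This is a rough standard-library sketch, not how `assembly doctor` works internally: the function name is hypothetical, the Python version check applies only to the interpreter running the script, and `ctypes.util.find_library` may miss a PortAudio build installed in a non-standard location.

```python
import ctypes.util
import shutil
import sys


def check_live_audio_prereqs() -> dict[str, bool]:
    """Report which documented prerequisites are visible to this interpreter."""
    return {
        "python>=3.12": sys.version_info >= (3, 12),
        "ffmpeg on PATH": shutil.which("ffmpeg") is not None,
        # find_library returns None when no PortAudio shared library can be located.
        "portaudio library": ctypes.util.find_library("portaudio") is not None,
    }


if __name__ == "__main__":
    for name, ok in check_live_audio_prereqs().items():
        print(f"{'ok' if ok else 'missing':<8}{name}")
```

A "missing" entry for ffmpeg or PortAudio only matters for `stream`, `dictate`, and `agent`; plain `transcribe` needs neither.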
New to AssemblyAI? Create a free account at assemblyai.com/dashboard to get an API key.
✨ Best for: day-to-day use on your own machine.
Browser login stores your API key in the OS keyring (Keychain / Credential Manager / Secret Service) — nothing lands in a dotfile, and it unlocks the account commands (keys, balance, usage, limits, sessions, audit):
assembly login

✨ Best for: CI, containers, and anywhere a browser isn't an option.
The environment variable is checked before the keyring, and nothing is written to disk:

export ASSEMBLYAI_API_KEY="YOUR_API_KEY"

Sign in, run a first transcription, start building:

assembly onboard

assembly transcribe --sample # transcribe the hosted sample file
assembly transcribe call.mp3 # then your own audio
assembly stream --sample # live streaming, no microphone needed
assembly stream # stream your microphone (Ctrl-C to stop)
assembly agent # talk to a voice agent (use headphones)
assembly init # scaffold a starter app

Pull exactly the output you need:
assembly transcribe call.mp3 -o text # just the text
assembly transcribe video.mp4 -o srt # captions
assembly transcribe call.mp3 --speaker-labels --summarization --json

Transcribe in batches (a directory, a glob, or a piped list), resumable on re-run:
assembly transcribe ./recordings
assembly transcribe "s3://bucket/calls/*.mp3" # needs: pip install s3fs
find . -name "*.wav" | assembly transcribe --from-stdin

Compose with other tools (audio in, text out):
ffmpeg -i talk.mp4 -f wav - | assembly transcribe -
git log --oneline -30 | assembly llm "write release notes grouped by feature/fix"

- Run `assembly --help` or `assembly <command> --help` for flags and examples.
- Run `assembly doctor` to check your environment (API key, network, ffmpeg, microphone).
- AssemblyAI docs — guides for every model and feature.
- API reference — the REST and streaming APIs the CLI drives.
- Dashboard — manage your account and API keys.
This project uses uv:
uv sync # create/refresh the venv
uv run assembly --help # run the CLI from the locked environment
./scripts/check.sh # the full gate CI runs

See AGENTS.md for development conventions and architecture notes.
- License: released under the MIT license.
- Privacy: AssemblyAI privacy policy. The CLI's anonymous usage telemetry is opt-out (`assembly telemetry disable`, `AAI_TELEMETRY_DISABLED=1`, or `DO_NOT_TRACK=1`).
- Terms: AssemblyAI terms of service.
