Support batch transcript fetching from piped stdin by alexkroman · Pull Request #180 · AssemblyAI/cli

alexkroman · 2026-06-16T17:19:59Z

Enable the transcripts get command to read multiple transcript IDs from piped stdin, composing naturally with transcripts list --json and other upstream commands.

Summary

This change makes transcripts get accept one or more transcript IDs from stdin when no positional argument is provided, enabling pipelines like:

assembly transcripts list --json | assembly transcripts get -o text

The command maintains backward compatibility: a positional ID still works as before, and the output shape differs between single-fetch (unchanged) and batch modes (NDJSON with --json, plain text otherwise).

Key Changes

- _resolve_ids() helper: Determines whether to fetch a single positional ID or read multiple IDs from stdin. Raises UsageError if neither is provided or stdin contains no valid IDs, preventing hangs on interactive invocation.
- _emit_transcript() helper: Centralizes transcript output logic, handling four distinct modes:
  - Single-fetch with -o field: raw field output (pipeline-friendly)
  - Single-fetch with --json: full SDK payload (matches transcribe --json)
  - Batch with --json: NDJSON records tagged with "type": "transcript" (CLI-wide discriminator)
  - Batch without --json: plain text (one transcript per line)
- parse_transcript_ids() in client.py: Flexible stdin parser that falls back gracefully between formats and deduplicates while preserving order. It accepts:
  - JSON array from transcripts list --json (extracts id field)
  - Single transcript JSON object (extracts id field)
  - Plain text with one ID per line (e.g., from jq -r '.[].id')
- Argument change: transcript_id is now optional (str | None), with help text clarifying stdin fallback.
- Test coverage: Added tests for ID parsing, batch fetching, error cases (no ID + no stdin, empty stdin), and output format validation (single-fetch JSON excludes "type" wrapper; batch JSON includes it).
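
To make the three accepted stdin shapes concrete, here is a minimal sketch of a parser with the behavior described above. It is an illustration written for this summary, not the code in client.py: the function body, the fallback order, and the handling of non-string entries are all assumptions.

```python
import json


def parse_transcript_ids(text: str) -> list[str]:
    """Illustrative sketch only; the real client.py parser may differ."""
    stripped = text.strip()
    if not stripped:
        return []

    try:
        payload = json.loads(stripped)
    except json.JSONDecodeError:
        payload = None  # not JSON, so fall back to plain text below

    if isinstance(payload, dict):
        payload = [payload]  # a single transcript JSON object

    if isinstance(payload, list):
        # JSON array: take the "id" field of objects, keep bare strings as-is.
        candidates = [
            item.get("id") if isinstance(item, dict) else item for item in payload
        ]
    else:
        # Plain text: one ID per line.
        candidates = stripped.splitlines()

    seen: set[str] = set()
    ids: list[str] = []
    for candidate in candidates:
        if not isinstance(candidate, str):
            continue  # skip missing or non-string ids
        candidate = candidate.strip()
        if candidate and candidate not in seen:
            seen.add(candidate)  # deduplicate while preserving first-seen order
            ids.append(candidate)
    return ids
```

Trying JSON first and falling back to line splitting is one way to let the same command work with or without jq in the pipeline.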

Implementation Details

- The batch flag returned by _resolve_ids() controls output formatting, ensuring NDJSON wrapping only appears in true batch mode.
- Validation happens early (before API calls) for both single and batch IDs.
- Error handling distinguishes between usage errors (missing ID/stdin) and API errors (failed transcript fetch).
- Snapshot tests updated to reflect the optional argument syntax.
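
The resolution step can be sketched roughly as follows. This is a hedged illustration, not the PR's _resolve_ids(): the local UsageError class stands in for whatever the CLI framework provides, the injectable stdin parameter is hypothetical, and the isatty() check is only one plausible way to avoid hanging on an interactive terminal.

```python
import sys
from typing import Optional, TextIO


class UsageError(Exception):
    """Stand-in for the CLI framework's usage error (assumption)."""


def resolve_ids(
    transcript_id: Optional[str], stdin: Optional[TextIO] = None
) -> tuple[list[str], bool]:
    """Return (ids, batch). Sketch only; names and checks are assumptions."""
    if transcript_id is not None:
        # A positional ID always means a single fetch with the existing output shape.
        return [transcript_id], False

    stream = sys.stdin if stdin is None else stdin
    if stream.isatty():
        # Nothing is piped: fail fast instead of blocking on a terminal read.
        raise UsageError("Provide a transcript ID or pipe IDs on stdin.")

    # Simplified to one ID per line here; the PR's parser accepts more shapes.
    ids = [line.strip() for line in stream.read().splitlines() if line.strip()]
    if not ids:
        raise UsageError("No transcript IDs found on stdin.")
    return ids, True
```

Returning the batch flag alongside the IDs keeps the decision in one place, so a single ID piped on stdin would still be treated as batch mode under this sketch.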

https://claude.ai/code/session_01GdpgpKDUJCNvECP5ttowst

`transcripts get` now fetches multiple transcripts when ids are piped on stdin and the positional id is omitted, so the collection commands compose as a Unix pipeline:

assembly transcripts list --json | assembly transcripts get -o text

A new `client.parse_transcript_ids` accepts the shapes a pipeline naturally produces (the JSON array from `transcripts list --json`, a single transcript JSON object, or plain text with one id per line, as from `jq -r '.[].id'`), so the pipeline works with or without jq.

In stdin (batch) mode `--json` emits one NDJSON `{"type": "transcript", ...}` record per id, mirroring `transcribe` batch, while a single positional fetch keeps its existing output shape.

https://claude.ai/code/session_01GdpgpKDUJCNvECP5ttowst
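
As a rough illustration of the difference between the single-fetch and batch `--json` shapes, the sketch below formats one transcript payload either way. The function name, the flattening of the payload next to the "type" key, and the use of a "text" field for plain output are assumptions; the actual record layout is whatever the CLI emits.

```python
import json
from typing import Any


def format_transcript(payload: dict[str, Any], *, batch: bool, as_json: bool) -> str:
    """Hypothetical formatter showing the output shapes described above."""
    if as_json and batch:
        # Batch: one compact line per transcript, tagged with the discriminator.
        return json.dumps({"type": "transcript", **payload})
    if as_json:
        # Single fetch: the payload as-is, without the "type" wrapper.
        return json.dumps(payload, indent=2)
    # Human mode (assumed here to print the transcript text).
    return str(payload.get("text", ""))
```

Keeping each batch record on a single line is what makes the stream valid NDJSON, so a downstream consumer can parse it line by line and branch on the "type" field.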


Mirrors the map-reduce LLM vocabulary `transcribe` gained in #179, completing the pipeline `transcripts list | transcripts get` so fetched transcripts can be summarized or aggregated without a second tool:

# map: summarize each transcript in a piped list
assembly transcripts list --json | assembly transcripts get --llm "Summarize this call"

# reduce: one ranking across all of them
assembly transcripts list --json | assembly transcripts get --llm-reduce "Rank these worst-to-best"

`--llm` runs a per-transcript chain (server-injected by id via `llm.run_chain_steps`); `--llm-reduce` runs one chain over all fetched transcripts (`llm.run_chain`), emitting the same additive `{"type":"reduce",...}` NDJSON record transcribe does. A single positional id folds the reduce prompts into the `--llm` chain (nothing to aggregate), matching transcribe's single-source behavior.

Human reduce keeps stdout clean by suppressing the per-transcript output; --json keeps the per-id stream and appends the reduce record.

Reuses core/llm.py and transcribe's render_transform_steps, with no new engine. The get/list pipeline tests move to tests/test_transcripts_pipeline.py to keep both test modules under the 500-line gate.

https://claude.ai/code/session_01GdpgpKDUJCNvECP5ttowst
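
The output rule for map and reduce can be sketched with plain callables standing in for the LLM chains. Everything here is hypothetical (the function name, the callables, and the choice to feed the per-transcript results into the reduce step); it only illustrates which lines reach stdout in human mode versus `--json` mode.

```python
from typing import Callable, Iterable, Optional


def run_map_reduce(
    transcripts: Iterable[str],
    map_step: Optional[Callable[[str], str]],
    reduce_step: Optional[Callable[[list[str]], str]],
    *,
    as_json: bool,
) -> list[str]:
    """Return the lines that would be printed. Sketch under stated assumptions."""
    # A pending reduce in human mode suppresses the per-transcript output.
    suppress = reduce_step is not None and not as_json

    printed: list[str] = []
    results: list[str] = []
    for text in transcripts:
        result = map_step(text) if map_step is not None else text
        results.append(result)  # always collected so the reduce can use it
        if not suppress:
            printed.append(result)

    if reduce_step is not None:
        # One aggregate over everything fetched, appended after the stream.
        printed.append(reduce_step(results))
    return printed
```

For example, with `str.upper` as the map step and `" | ".join` as the reduce step, human mode yields only the aggregate line, while `as_json=True` yields each per-transcript result followed by the aggregate.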

aikido-pr-checks · 2026-06-16T17:54:40Z

+    if output_field is not None:
+        # -o wins over the chain, matching `transcribe` deliver_result precedence.
+        output.emit_text(
+            client.select_transcript_field(
+                transcript, output_field, chars_per_caption=chars_per_caption
+            )
+        )


_deliver_transcript ignores suppress when output_field is set, so per-transcript output still prints during reduce-suppressed runs, contradicting the function’s suppression logic.

Suggested change

-    if output_field is not None:
-        # -o wins over the chain, matching `transcribe` deliver_result precedence.
-        output.emit_text(
-            client.select_transcript_field(
-                transcript, output_field, chars_per_caption=chars_per_caption
-            )
-        )
+    if output_field is not None:
+        if not suppress:
+            output.emit_text(
+                client.select_transcript_field(
+                    transcript, output_field, chars_per_caption=chars_per_caption
+                )
+            )

✨ AI Reasoning
The function is designed to optionally skip per-transcript emission when a reduce step will follow. That behavior is implemented for the transform and default transcript branches, but not for the single-field branch. As a result, when reduce mode intends to suppress per-item output, selecting a field still prints each item, violating the function’s own contract and making the suppression path inconsistent.


`_deliver_transcript`'s `-o` branch ignored the `suppress` flag, so `transcripts get -o text --llm-reduce …` in human mode printed each transcript's field *and* the reduce result. Guard the field emit with `if not suppress:` so it matches the transform/default branches: under a pending human reduce only the aggregate reaches stdout, while each field still feeds the reduce. (Flagged in review.) https://claude.ai/code/session_01GdpgpKDUJCNvECP5ttowst
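
A small, self-contained sketch of the guarded branch shows the intended behavior: the selected field is still computed and returned so it can feed the reduce, but it is emitted only when output is not suppressed. The function name, the dict-based transcript, and the injected emit callable are hypothetical stand-ins, not the CLI's real `_deliver_transcript` or `output.emit_text`.

```python
from typing import Callable, Optional


def deliver_field(
    transcript: dict,
    output_field: Optional[str],
    *,
    suppress: bool,
    emit: Callable[[str], None],
) -> Optional[str]:
    """Hypothetical stand-in for the -o branch, with the suppress guard applied."""
    if output_field is None:
        return None  # other branches (transform, default) would handle this
    value = str(transcript.get(output_field, ""))
    if not suppress:
        emit(value)  # printed only when no human-mode reduce is pending
    return value  # still available as input to the reduce
```

Passing `emit=printed.append` with `suppress=True` leaves `printed` empty while the function still returns the field value.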

claude added 3 commits June 16, 2026 17:07

Merge remote-tracking branch 'origin/main' into claude/admiring-gauss…

2e59c52

…-496nhb

aikido-pr-checks Bot reviewed Jun 16, 2026

alexkroman added this pull request to the merge queue Jun 16, 2026

alexkroman removed this pull request from the merge queue due to a manual request Jun 16, 2026

alexkroman enabled auto-merge June 16, 2026 18:23

alexkroman added this pull request to the merge queue Jun 16, 2026

Merged via the queue into main with commit 0bdec6c Jun 16, 2026
19 checks passed

alexkroman deleted the claude/admiring-gauss-496nhb branch June 16, 2026 18:30

alexkroman merged 4 commits into main from claude/admiring-gauss-496nhb


2 participants
