Support batch transcript fetching from piped stdin #180
`transcripts get` now fetches multiple transcripts when ids are piped on
stdin and the positional id is omitted, so the collection commands compose
as a Unix pipeline:

    assembly transcripts list --json | assembly transcripts get -o text

A new `client.parse_transcript_ids` accepts the shapes a pipeline naturally
produces — the JSON array from `transcripts list --json`, a single transcript
JSON object, or plain text with one id per line (`jq -r '.[].id'`) — so the
pipeline works with or without jq. In stdin (batch) mode `--json` emits one
NDJSON `{"type": "transcript", ...}` record per id, mirroring `transcribe`
batch, while a single positional fetch keeps its existing output shape.
https://claude.ai/code/session_01GdpgpKDUJCNvECP5ttowst
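For readers who want to see how the three stdin shapes might be normalized, here is a minimal sketch. It is not the actual `client.parse_transcript_ids`; the name `parse_ids_sketch` and the fallback rules are assumptions made for illustration, based only on the shapes listed in the commit message above.

```python
import json


def parse_ids_sketch(raw: str) -> list[str]:
    """Hypothetical sketch: extract transcript ids from JSON or plain text."""
    text = raw.strip()
    if not text:
        return []
    try:
        data = json.loads(text)
    except json.JSONDecodeError:
        data = None  # not JSON, fall through to the plain-text path

    if isinstance(data, dict):
        data = [data]  # a single transcript object behaves like a one-item list

    if isinstance(data, list):
        ids = []
        for item in data:
            if isinstance(item, dict) and item.get("id"):
                ids.append(str(item["id"]))  # object carrying an "id" field
            elif isinstance(item, str) and item.strip():
                ids.append(item.strip())  # assumption: bare id strings are accepted
        return ids

    # Plain text: one id per line, blank lines ignored (e.g. `jq -r '.[].id'`).
    return [line.strip() for line in text.splitlines() if line.strip()]
```

Trying JSON first and falling back to line splitting is one way to let the same command work with or without `jq` in the pipeline.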
Mirrors the map-reduce LLM vocabulary `transcribe` gained in #179, completing the
pipeline `transcripts list | transcripts get` so fetched transcripts can be
summarized or aggregated without a second tool:

    # map: summarize each transcript in a piped list
    assembly transcripts list --json | assembly transcripts get --llm "Summarize this call"

    # reduce: one ranking across all of them
    assembly transcripts list --json | assembly transcripts get --llm-reduce "Rank these worst-to-best"

`--llm` runs a per-transcript chain (server-injected by id via
`llm.run_chain_steps`); `--llm-reduce` runs one chain over all fetched
transcripts (`llm.run_chain`), emitting the same additive
`{"type":"reduce",...}` NDJSON record `transcribe` does. A single positional id
folds the reduce prompts into the `--llm` chain (nothing to aggregate), matching
`transcribe`'s single-source behavior. Human reduce keeps stdout clean by
suppressing the per-transcript output; `--json` keeps the per-id stream and
appends the reduce record.

Reuses `core/llm.py` and `transcribe`'s `render_transform_steps`, so there is no
new engine. The get/list pipeline tests move to
`tests/test_transcripts_pipeline.py` to keep both test modules under the
500-line gate.

https://claude.ai/code/session_01GdpgpKDUJCNvECP5ttowst
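The `--json` stream described above (one record per id, then an appended reduce record) can be sketched as follows. This is an illustration only: `ndjson_records`, the `id` and `output` keys, and the callable stand-ins for the LLM chains are all hypothetical. Only the `"type": "transcript"` and `"type": "reduce"` discriminators come from the commit message.

```python
import json
from typing import Any, Callable, Iterator, Optional


def ndjson_records(
    transcripts: list[dict],
    map_fn: Optional[Callable[[dict], Any]] = None,
    reduce_fn: Optional[Callable[[list[dict]], Any]] = None,
) -> Iterator[str]:
    """Hypothetical sketch of the batch --json stream."""
    for transcript in transcripts:
        record = {"type": "transcript", "id": transcript["id"]}
        if map_fn is not None:
            # Stand-in for the per-transcript --llm chain.
            record["output"] = map_fn(transcript)
        yield json.dumps(record)

    if reduce_fn is not None:
        # Stand-in for the single --llm-reduce chain over all fetched transcripts.
        # The reduce record is additive: it follows the per-id records.
        yield json.dumps({"type": "reduce", "output": reduce_fn(transcripts)})
```

Because the reduce record is appended rather than substituted, a consumer that only understands `"type": "transcript"` lines can skip the final line and keep working.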
    if output_field is not None:
        # -o wins over the chain, matching `transcribe` deliver_result precedence.
        output.emit_text(
            client.select_transcript_field(
                transcript, output_field, chars_per_caption=chars_per_caption
            )
        )
`_deliver_transcript` ignores `suppress` when `output_field` is set, so per-transcript output still prints during reduce-suppressed runs, contradicting the function's own suppression logic.
Suggested fix:

    -if output_field is not None:
    -    # -o wins over the chain, matching `transcribe` deliver_result precedence.
    -    output.emit_text(
    -        client.select_transcript_field(
    -            transcript, output_field, chars_per_caption=chars_per_caption
    -        )
    -    )
    +if output_field is not None:
    +    if not suppress:
    +        output.emit_text(
    +            client.select_transcript_field(
    +                transcript, output_field, chars_per_caption=chars_per_caption
    +            )
    +        )
✨ AI Reasoning
The function is designed to optionally skip per-transcript emission when a reduce step will follow. That behavior is implemented for the transform and default transcript branches, but not for the single-field branch. As a result, when reduce mode intends to suppress per-item output, selecting a field still prints each item, violating the function’s own contract and making the suppression path inconsistent.
`_deliver_transcript`'s `-o` branch ignored the `suppress` flag, so `transcripts get -o text --llm-reduce …` in human mode printed each transcript's field *and* the reduce result. Guard the field emit with `if not suppress:` so it matches the transform/default branches: under a pending human reduce only the aggregate reaches stdout, while each field still feeds the reduce. (Flagged in review.)

https://claude.ai/code/session_01GdpgpKDUJCNvECP5ttowst
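The contract this fix restores can be pinned down with a small regression-style sketch. It does not reproduce the real `_deliver_transcript`; `deliver_sketch`, its `emit` parameter, and the dict-shaped transcript are assumptions chosen so the example runs on its own.

```python
from typing import Callable, Optional


def deliver_sketch(
    transcript: dict,
    output_field: Optional[str] = None,
    suppress: bool = False,
    emit: Callable[[str], None] = print,
) -> str:
    """Hypothetical sketch: emit one transcript's value unless suppressed."""
    # Under -o the selected field is delivered; otherwise fall back to the text.
    key = output_field if output_field is not None else "text"
    value = str(transcript[key])

    # A single guard covers every branch, so -o cannot bypass suppression.
    if not suppress:
        emit(value)

    # The value is returned either way so it can still feed a later reduce step.
    return value
```

A test along these lines would pass `emit=captured.append` with `suppress=True` and assert that nothing was captured while the returned value is still available.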
Enable the `transcripts get` command to read multiple transcript IDs from piped stdin, composing naturally with `transcripts list --json` and other upstream commands.

## Summary

This change makes `transcripts get` accept one or more transcript IDs from stdin when no positional argument is provided, enabling pipelines like:

    assembly transcripts list --json | assembly transcripts get -o text

The command maintains backward compatibility: a positional ID still works as before, and the output shape differs between single-fetch (unchanged) and batch modes (NDJSON with `--json`, plain text otherwise).

## Key Changes

- `_resolve_ids()` helper: Determines whether to fetch a single positional ID or read multiple IDs from stdin. Raises `UsageError` if neither is provided or stdin contains no valid IDs, preventing hangs on interactive invocation.
- `_emit_transcript()` helper: Centralizes transcript output logic, handling four distinct modes:
  - `-o field`: raw field output (pipeline-friendly)
  - single fetch with `--json`: full SDK payload (matches `transcribe --json`)
  - batch with `--json`: NDJSON records tagged with `"type": "transcript"` (CLI-wide discriminator)
  - without `--json`: plain text (one transcript per line)
- `parse_transcript_ids()` in `client.py`: Flexible stdin parser accepting:
  - the JSON array from `transcripts list --json` (extracts the `id` field)
  - a single transcript JSON object (extracts the `id` field)
  - plain text with one id per line (`jq -r '.[].id'`)
- Argument change: `transcript_id` is now optional (`str | None`), with help text clarifying stdin fallback.
- Test coverage: Added tests for ID parsing, batch fetching, error cases (no ID + no stdin, empty stdin), and output format validation (single-fetch JSON excludes the `"type"` wrapper; batch JSON includes it).

## Implementation Details

- The `batch` flag returned by `_resolve_ids()` controls output formatting, ensuring NDJSON wrapping only appears in true batch mode.

https://claude.ai/code/session_01GdpgpKDUJCNvECP5ttowst
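The positional-versus-stdin decision described under Key Changes could look roughly like the sketch below. It is an assumption-laden illustration, not the PR's `_resolve_ids()`: the `UsageError` class here is a local stand-in, the `stdin` parameter exists only to make the function testable, and ids are read one per line rather than through `parse_transcript_ids()`.

```python
import io
import sys
from typing import Optional, TextIO


class UsageError(Exception):
    """Local stand-in for the CLI's usage error type (hypothetical)."""


def resolve_ids_sketch(
    transcript_id: Optional[str], stdin: Optional[TextIO] = None
) -> tuple[list[str], bool]:
    """Return (ids, batch); batch is True only when ids came from stdin."""
    if transcript_id is not None:
        return [transcript_id], False  # single fetch keeps its existing shape

    stream = sys.stdin if stdin is None else stdin
    if stream.isatty():
        # Nothing is piped: fail fast instead of blocking on interactive input.
        raise UsageError("provide a transcript id or pipe ids on stdin")

    ids = [line.strip() for line in stream.read().splitlines() if line.strip()]
    if not ids:
        raise UsageError("no transcript ids found on stdin")
    return ids, True


if __name__ == "__main__":
    print(resolve_ids_sketch("abc123"))
    print(resolve_ids_sketch(None, io.StringIO("a1\nb2\n")))
```

Returning the `batch` flag alongside the ids keeps the output decision in one place: callers wrap records in NDJSON only when the flag is true.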