Support batch transcript fetching from piped stdin#180

Merged: alexkroman merged 4 commits into main from claude/admiring-gauss-496nhb on Jun 16, 2026

@alexkroman

Enable the transcripts get command to read multiple transcript IDs from piped stdin, composing naturally with transcripts list --json and other upstream commands.

Summary

This change makes transcripts get read one or more transcript IDs from stdin when no positional argument is provided, enabling pipelines like:

assembly transcripts list --json | assembly transcripts get -o text

The command remains backward compatible: a positional ID works as before and keeps its existing output shape. Batch mode, where IDs come from stdin, emits NDJSON with --json and plain text otherwise.

Key Changes

  • _resolve_ids() helper: Determines whether to fetch a single positional ID or read multiple IDs from stdin. Raises UsageError if neither is provided or stdin contains no valid IDs, preventing hangs on interactive invocation.

  • _emit_transcript() helper: Centralizes transcript output logic, handling four distinct modes:

    • Single-fetch with -o field: raw field output (pipeline-friendly)
    • Single-fetch with --json: full SDK payload (matches transcribe --json)
    • Batch with --json: NDJSON records tagged with "type": "transcript" (CLI-wide discriminator)
    • Batch without --json: plain text (one transcript per line)
  • parse_transcript_ids() in client.py: Flexible stdin parser accepting:

    • JSON array from transcripts list --json (extracts id field)
    • Single transcript JSON object (extracts id field)
    • Plain text with one ID per line (e.g., from jq -r '.[].id')
    • Falls back gracefully between formats; deduplicates while preserving order
  • Argument change: transcript_id is now optional (str | None), with help text clarifying stdin fallback.

  • Test coverage: Added comprehensive tests for ID parsing, batch fetching, error cases (no ID + no stdin, empty stdin), and output format validation (single-fetch JSON excludes "type" wrapper; batch JSON includes it).
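For illustration, the stdin parsing described for parse_transcript_ids() could look roughly like the sketch below. This is not the code from client.py: the function body, the return type, and the choice to return nothing for a JSON object without an id field are all assumptions made for the example.

```python
import json


def parse_transcript_ids(raw: str) -> list[str]:
    """Extract transcript ids from piped stdin text (illustrative sketch only)."""
    text = raw.strip()
    if not text:
        return []

    try:
        payload = json.loads(text)
    except json.JSONDecodeError:
        payload = None  # not JSON, so treat the input as plain text below

    if isinstance(payload, list):
        # JSON array, e.g. from `transcripts list --json`: take each object's "id".
        candidates = [
            item["id"] for item in payload if isinstance(item, dict) and "id" in item
        ]
    elif isinstance(payload, dict):
        # Single transcript object. Assumption: an object without "id" yields nothing.
        candidates = [payload["id"]] if "id" in payload else []
    else:
        # Plain text with one id per line, e.g. from `jq -r '.[].id'`.
        candidates = [line.strip() for line in text.splitlines() if line.strip()]

    # dict.fromkeys removes duplicates while keeping first-seen order.
    return list(dict.fromkeys(str(c) for c in candidates))
```

Trying JSON first and falling back to line splitting is what would let the same command accept input with or without jq in the pipeline.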

Implementation Details

  • The batch flag returned by _resolve_ids() controls output formatting, ensuring NDJSON wrapping only appears in true batch mode.
  • Validation happens early (before API calls) for both single and batch IDs.
  • Error handling distinguishes between usage errors (missing ID/stdin) and API errors (failed transcript fetch).
  • Snapshot tests updated to reflect the optional argument syntax.
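A minimal sketch of how a batch flag can gate the NDJSON wrapper follows. The names resolve_ids and json_line, the local UsageError class, and the convention that stdin_text is None for an interactive terminal are hypothetical stand-ins, not the PR's actual helpers. The sketch also assumes that any stdin input counts as batch mode.

```python
from __future__ import annotations

import json


class UsageError(Exception):
    """Stand-in for the CLI's usage error type (hypothetical)."""


def resolve_ids(
    transcript_id: str | None, stdin_text: str | None
) -> tuple[list[str], bool]:
    """Return (ids, batch). stdin_text is None when nothing is piped in."""
    if transcript_id is not None:
        # A positional id always means single-fetch mode.
        return [transcript_id], False
    if stdin_text is None:
        # Fail fast rather than block waiting on an interactive terminal.
        raise UsageError("Provide a transcript id or pipe ids on stdin.")
    ids = [line.strip() for line in stdin_text.splitlines() if line.strip()]
    if not ids:
        raise UsageError("No transcript ids found on stdin.")
    return ids, True


def json_line(transcript: dict, batch: bool) -> str:
    """Tag records with "type" in batch mode only; single-fetch is unchanged."""
    record = {"type": "transcript", **transcript} if batch else transcript
    return json.dumps(record)
```

Because both errors are raised before any ID is returned, no API call would be attempted for an invocation that cannot succeed.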

https://claude.ai/code/session_01GdpgpKDUJCNvECP5ttowst

claude added 3 commits June 16, 2026 17:07
`transcripts get` now fetches multiple transcripts when ids are piped on
stdin and the positional id is omitted, so the collection commands compose
as a Unix pipeline:

    assembly transcripts list --json | assembly transcripts get -o text

A new `client.parse_transcript_ids` accepts the shapes a pipeline naturally
produces — the JSON array from `transcripts list --json`, a single transcript
JSON object, or plain text with one id per line (`jq -r '.[].id'`) — so the
pipeline works with or without jq. In stdin (batch) mode `--json` emits one
NDJSON `{"type": "transcript", ...}` record per id, mirroring `transcribe`
batch, while a single positional fetch keeps its existing output shape.

https://claude.ai/code/session_01GdpgpKDUJCNvECP5ttowst

Mirrors the map-reduce LLM vocabulary `transcribe` gained in #179, completing
the pipeline `transcripts list | transcripts get` so fetched transcripts can be
summarized or aggregated without a second tool:

    # map: summarize each transcript in a piped list
    assembly transcripts list --json | assembly transcripts get --llm "Summarize this call"

    # reduce: one ranking across all of them
    assembly transcripts list --json | assembly transcripts get --llm-reduce "Rank these worst-to-best"

`--llm` runs a per-transcript chain (server-injected by id via
`llm.run_chain_steps`); `--llm-reduce` runs one chain over all fetched
transcripts (`llm.run_chain`), emitting the same additive
`{"type":"reduce",...}` NDJSON record transcribe does. A single positional id
folds the reduce prompts into the `--llm` chain (nothing to aggregate), matching
transcribe's single-source behavior. Human reduce keeps stdout clean by
suppressing the per-transcript output; --json keeps the per-id stream and
appends the reduce record. Reuses core/llm.py and transcribe's
render_transform_steps — no new engine.

The get/list pipeline tests move to tests/test_transcripts_pipeline.py to keep
both test modules under the 500-line gate.

https://claude.ai/code/session_01GdpgpKDUJCNvECP5ttowst
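
The map and reduce flow in this commit message can be sketched as below. Everything here is hypothetical: apply_prompts, map_reduce, and the LLMCall stand-in are invented for illustration and do not reflect core/llm.py. The sketch also assumes the reduce chain consumes the per-transcript outputs joined together, which the commit message does not spell out.

```python
from typing import Callable, Optional

# Stand-in for an LLM call: (prompt, text) -> response. Hypothetical.
LLMCall = Callable[[str, str], str]


def apply_prompts(prompts: list[str], text: str, llm: LLMCall) -> str:
    """Feed text through each prompt in order, passing each output onward."""
    for prompt in prompts:
        text = llm(prompt, text)
    return text


def map_reduce(
    texts: list[str],
    llm_prompts: list[str],
    reduce_prompts: list[str],
    llm: LLMCall,
    *,
    single: bool,
) -> tuple[list[str], Optional[str]]:
    """Return (per-transcript outputs, reduce output or None)."""
    if single:
        # Single positional id: nothing to aggregate, so the reduce prompts
        # become extra steps of the per-transcript chain.
        llm_prompts = [*llm_prompts, *reduce_prompts]
        reduce_prompts = []
    mapped = [apply_prompts(llm_prompts, text, llm) for text in texts]
    reduced = (
        apply_prompts(reduce_prompts, "\n\n".join(mapped), llm)
        if reduce_prompts
        else None
    )
    return mapped, reduced
```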
Comment thread aai_cli/commands/transcripts.py Outdated
Comment on lines +152 to +158
    if output_field is not None:
        # -o wins over the chain, matching `transcribe` deliver_result precedence.
        output.emit_text(
            client.select_transcript_field(
                transcript, output_field, chars_per_caption=chars_per_caption
            )
        )

_deliver_transcript ignores suppress when output_field is set, so per-transcript output still prints during reduce-suppressed runs, contradicting the function’s suppression logic.

Suggested change

    - if output_field is not None:
    -     # -o wins over the chain, matching `transcribe` deliver_result precedence.
    -     output.emit_text(
    -         client.select_transcript_field(
    -             transcript, output_field, chars_per_caption=chars_per_caption
    -         )
    -     )
    + if output_field is not None:
    +     if not suppress:
    +         output.emit_text(
    +             client.select_transcript_field(
    +                 transcript, output_field, chars_per_caption=chars_per_caption
    +             )
    +         )

✨ AI Reasoning
The function is designed to optionally skip per-transcript emission when a reduce step will follow. That behavior is implemented for the transform and default transcript branches, but not for the single-field branch. As a result, when reduce mode intends to suppress per-item output, selecting a field still prints each item, violating the function's own contract and making the suppression path inconsistent.

@alexkroman alexkroman added this pull request to the merge queue Jun 16, 2026
@alexkroman alexkroman removed this pull request from the merge queue due to a manual request Jun 16, 2026

`_deliver_transcript`'s `-o` branch ignored the `suppress` flag, so
`transcripts get -o text --llm-reduce …` in human mode printed each
transcript's field *and* the reduce result. Guard the field emit with
`if not suppress:` so it matches the transform/default branches: under a
pending human reduce only the aggregate reaches stdout, while each field
still feeds the reduce. (Flagged in review.)

https://claude.ai/code/session_01GdpgpKDUJCNvECP5ttowst
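
The guard described in this commit can be illustrated with a small sketch. The deliver_field function and its emit parameter are hypothetical and much simpler than `_deliver_transcript`. They only show the pattern: suppression skips printing, but the value is still returned so a later reduce step can consume it.

```python
from typing import Callable


def deliver_field(
    field_value: str, *, suppress: bool, emit: Callable[[str], None]
) -> str:
    """Emit the selected field unless suppressed; always return it."""
    if not suppress:
        # Normal case: the field goes to stdout (or wherever emit writes).
        emit(field_value)
    # Returned either way so a pending reduce still receives the value.
    return field_value


printed: list[str] = []
kept = deliver_field("hello", suppress=True, emit=printed.append)
# With suppress=True, `printed` stays empty while `kept` holds "hello".
```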
@alexkroman alexkroman enabled auto-merge June 16, 2026 18:23
@alexkroman alexkroman added this pull request to the merge queue Jun 16, 2026
Merged via the queue into main with commit 0bdec6c Jun 16, 2026
19 checks passed
@alexkroman alexkroman deleted the claude/admiring-gauss-496nhb branch June 16, 2026 18:30