Separate chunker from batcher #625

Merged
frankmcsherry merged 1 commit into TimelyDataflow:master-next from antiguru:explicit_chunker
May 28, 2026
Conversation

@antiguru
Member

@antiguru antiguru commented Jul 15, 2025

The chunker was part of the batcher and responsible for transforming input data into the batcher's chunk format. Hence, the batcher needed to be aware of its input types, although it would not otherwise use this information.

This change drops the `Input` and `C` type parameters from `MergeBatcher`, and the `Input` associated type plus `push_container` method from the `Batcher` trait. Batchers now accept chunks via `PushInto<Self::Output>`. Chunking moves into `arrange_core`, which gains a `Chu: ContainerBuilder` type parameter so callers can supply a chunker that maps the stream's input container into the batcher's output container.
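
To illustrate the new division of labour, here is a minimal, self-contained sketch. It does not use the real crates: `PushInto`, `Chunker`, `FixedChunker`, `ToyBatcher`, and `feed` are hypothetical stand-ins defined locally (the actual change uses timely's `ContainerBuilder` as the chunker bound, whose exact methods are not reproduced here). The point is only the shape: the operator drives the chunker, and the batcher sees nothing but finished chunks.

```rust
/// Hypothetical stand-in for a push trait, defined locally so the sketch compiles alone.
trait PushInto<T> {
    fn push_into(&mut self, item: T);
}

/// Hypothetical chunker interface: maps input containers to output chunks.
trait Chunker<In> {
    type Output;
    /// Absorb one input container.
    fn push(&mut self, input: In);
    /// Return a completed chunk, if one is ready.
    fn extract(&mut self) -> Option<Self::Output>;
    /// Drain any partially filled chunk.
    fn finish(&mut self) -> Option<Self::Output>;
}

/// Toy chunker that regroups `Vec<T>` input into `Vec<T>` chunks of `size` elements.
struct FixedChunker<T> {
    size: usize,
    pending: Vec<T>,
}

impl<T> FixedChunker<T> {
    fn new(size: usize) -> Self {
        assert!(size > 0, "chunk size must be positive");
        Self { size, pending: Vec::new() }
    }
}

impl<T> Chunker<Vec<T>> for FixedChunker<T> {
    type Output = Vec<T>;
    fn push(&mut self, input: Vec<T>) {
        self.pending.extend(input);
    }
    fn extract(&mut self) -> Option<Vec<T>> {
        if self.pending.len() >= self.size {
            // Keep the tail as the new pending buffer, return the first `size` elements.
            let rest = self.pending.split_off(self.size);
            Some(std::mem::replace(&mut self.pending, rest))
        } else {
            None
        }
    }
    fn finish(&mut self) -> Option<Vec<T>> {
        if self.pending.is_empty() {
            None
        } else {
            Some(std::mem::take(&mut self.pending))
        }
    }
}

/// Toy batcher: knows only its chunk type `C` and stores each chunk as its own chain.
struct ToyBatcher<C> {
    chains: Vec<Vec<C>>,
}

impl<C> PushInto<C> for ToyBatcher<C> {
    fn push_into(&mut self, chunk: C) {
        self.chains.push(vec![chunk]);
    }
}

/// Operator-side driver: push input, extract ready chunks, then drain before sealing.
fn feed<In, Chu, Ba>(inputs: Vec<In>, chunker: &mut Chu, batcher: &mut Ba)
where
    Chu: Chunker<In>,
    Ba: PushInto<Chu::Output>,
{
    for input in inputs {
        chunker.push(input);
        while let Some(chunk) = chunker.extract() {
            batcher.push_into(chunk);
        }
    }
    while let Some(chunk) = chunker.finish() {
        batcher.push_into(chunk);
    }
}
```

In this sketch the batcher's only bound is `PushInto<Chu::Output>`, so it carries no type parameter for the stream's input container, which is the decoupling described above.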

The `Arrange` trait constrains `Ba::Output = C` (same-type chunker) and hardcodes `ContainerChunker<C>` internally, so `.arrange::<Ba, Bu, Tr>()` callsites for `Vec`-based collections are unchanged. Callers that need a cross-container chunker (columnar layouts, interactive) drop to `arrange_core` directly.
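
The relationship between the convenience entry point and the core entry point can be sketched as follows. Every name here (`PushInto`, `Chunker`, `PassThrough`, `Columnize`, `ToyBatcher`, `arrange_sketch`, `arrange_core_sketch`) is a hypothetical, locally defined stand-in rather than the crate's API. `PassThrough<C>` plays the role of a same-type chunker, and the bound `Ba: PushInto<C>` is a loose analogue of the `Ba::Output = C` constraint.

```rust
use std::collections::VecDeque;

/// Hypothetical stand-in for a push trait, defined locally.
trait PushInto<T> {
    fn push_into(&mut self, item: T);
}

/// Hypothetical chunker interface; `Default` lets the core function construct it.
trait Chunker<In>: Default {
    type Output;
    fn push(&mut self, input: In);
    fn extract(&mut self) -> Option<Self::Output>;
}

/// Same-type chunker: input and output containers are both `C`.
struct PassThrough<C> {
    ready: VecDeque<C>,
}

impl<C> Default for PassThrough<C> {
    fn default() -> Self {
        Self { ready: VecDeque::new() }
    }
}

impl<C> Chunker<C> for PassThrough<C> {
    type Output = C;
    fn push(&mut self, input: C) {
        self.ready.push_back(input);
    }
    fn extract(&mut self) -> Option<C> {
        self.ready.pop_front()
    }
}

/// Cross-container chunker: row-oriented input becomes a pair of columns.
struct Columnize<K, V> {
    ready: VecDeque<(Vec<K>, Vec<V>)>,
}

impl<K, V> Default for Columnize<K, V> {
    fn default() -> Self {
        Self { ready: VecDeque::new() }
    }
}

impl<K, V> Chunker<Vec<(K, V)>> for Columnize<K, V> {
    type Output = (Vec<K>, Vec<V>);
    fn push(&mut self, input: Vec<(K, V)>) {
        let columns: (Vec<K>, Vec<V>) = input.into_iter().unzip();
        self.ready.push_back(columns);
    }
    fn extract(&mut self) -> Option<Self::Output> {
        self.ready.pop_front()
    }
}

/// Toy batcher that records the chunks it receives.
struct ToyBatcher<C>(Vec<C>);

impl<C> PushInto<C> for ToyBatcher<C> {
    fn push_into(&mut self, chunk: C) {
        self.0.push(chunk);
    }
}

/// Core entry point: the caller chooses the chunker type `Chu`.
fn arrange_core_sketch<In, Chu, Ba>(inputs: Vec<In>, batcher: &mut Ba)
where
    Chu: Chunker<In>,
    Ba: PushInto<Chu::Output>,
{
    let mut chunker = Chu::default();
    for input in inputs {
        chunker.push(input);
        while let Some(chunk) = chunker.extract() {
            batcher.push_into(chunk);
        }
    }
}

/// Convenience entry point: hardcodes the same-type chunker, so callers name no chunker.
fn arrange_sketch<C, Ba: PushInto<C>>(inputs: Vec<C>, batcher: &mut Ba) {
    arrange_core_sketch::<C, PassThrough<C>, Ba>(inputs, batcher)
}
```

Under these assumptions, a caller with matching container types uses `arrange_sketch` and never mentions a chunker, while a caller whose batcher expects a different container names one explicitly, for example `arrange_core_sketch::<_, Columnize<&str, i32>, _>(rows, &mut batcher)`.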

Also updates `chainless_batcher::Batcher` to the new `Batcher` trait shape.

@antiguru antiguru requested a review from frankmcsherry July 15, 2025 12:39
@antiguru antiguru changed the base branch from master to master-next April 17, 2026 14:45
The chunker was part of the batcher and responsible for transforming input
data into the batcher's chunk format. Hence, the batcher needed to be aware
of its input types, although it would not otherwise use this information.

Drop the `Input` associated type and `push_container` method from the
`Batcher` trait; batchers now accept already-chunked input via
`PushInto<Self::Output>`. The vec `MergeBatcher` loses its `Input` and `C`
(chunker) type parameters, and the columnar `MergeBatcher` loses its internal
`TrieChunker`. Both now expose `PushInto` that inserts a chunk directly as a
chain.

Chunking moves into `arrange_core`, which gains a `Chu: ContainerBuilder`
type parameter so callers supply a chunker that maps the stream's input
container into the batcher's output container. The operator drives the
chunker (push, extract, and a `finish` drain before sealing) where the
batcher previously did.

The `Arrange` trait constrains `Ba::Output = C` and hardcodes
`ContainerChunker<C>` internally, so `.arrange::<Ba, Bu, Tr>()` callsites for
`Vec`-based collections are unchanged. Callers needing a cross-container
chunker (columnar layouts, interactive, spill) drop to `arrange_core`
directly and pass an explicit `ValChunker`.

Signed-off-by: Moritz Hoffmann <antiguru@gmail.com>

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
@antiguru antiguru force-pushed the explicit_chunker branch from 523a501 to 5fc079f Compare May 28, 2026 20:10
@frankmcsherry frankmcsherry merged commit 5c9f72d into TimelyDataflow:master-next May 28, 2026
6 checks passed
@github-actions github-actions Bot mentioned this pull request May 29, 2026