Skip to content

feat: add ListLayout#8071

Draft
mhk197 wants to merge 37 commits into
developfrom
mk/add-list-layout
Draft

feat: add ListLayout#8071
mhk197 wants to merge 37 commits into
developfrom
mk/add-list-layout

Conversation

@mhk197

@mhk197 mhk197 commented May 22, 2026

Copy link
Copy Markdown
Contributor

Adds ListLayout, a structured layout for DType::List columns that stores elements, offsets, and (when nullable) validity as independent child layouts, each with its own configurable strategy.

ListLayoutStrategy canonicalizes input to ListViewArray, rebuilds it via list_from_list_view, and writes each child concurrently.

ListReader fetches offsets and validity concurrently. Then uses offsets[0]..offsets[-1] to derive a bounded elements range, rebase the offsets to start at zero, fetch only that slice of elements, and assemble the ListArray. This means partial-range reads only fetch the elements they actually need rather than the whole elements buffer.

@mhk197 mhk197 linked an issue May 27, 2026 that may be closed by this pull request
1 task
@mhk197 mhk197 removed a link to an issue May 27, 2026
1 task
@mhk197 mhk197 force-pushed the mk/add-list-layout branch from 2dafcdb to b267f99 Compare May 28, 2026 13:53
@mhk197 mhk197 added the changelog/feature A new feature label May 28, 2026
@mhk197 mhk197 linked an issue May 28, 2026 that may be closed by this pull request
1 task
@mhk197 mhk197 removed a link to an issue May 28, 2026
1 task
@mhk197 mhk197 marked this pull request as ready for review May 28, 2026 14:09
@connortsui20 connortsui20 requested review from gatesn and robert3005 and removed request for connortsui20 May 28, 2026 14:14
Comment thread vortex-layout/src/layouts/list/reader.rs
session: VortexSession,
elements: LayoutReaderRef,
offsets: LayoutReaderRef,
validity: Option<LayoutReaderRef>,

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Are we sure this needs a layout?

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You can probably can by with this being a segment

offsets,
validity,
..
} = list_from_list_view(array.execute::<ListViewArray>(&mut exec_ctx)?)?.into_data_parts();

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why to execute to a ListArray?

Comment on lines +34 to +42
/// Strategy for writing list-typed arrays.
///
/// Single-chunk only. The strategy:
/// 1. Canonicalizes the input chunk into a [`ListViewArray`].
/// 2. Calls [`list_from_list_view`] to rebuild it into zero-copy-to-list form
/// (sorted, gapless, non-overlapping offsets) and produce a [`ListArray`].
/// 3. Writes the `elements`, `offsets`, and (when nullable) `validity` columns into
/// separately configurable downstream strategies, producing a single [`ListLayout`].
///

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

How does the sizing of the splits works, what is it based on?

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We just talked through this. This is the layout that really highlights why row-batches suck... and really why engines should be pushing down unnest operations into the source. But... they don't.

For now, the best we can do is register the validity and offsets splits, so we just delegate to the child.

@connortsui20

connortsui20 commented May 29, 2026

Copy link
Copy Markdown
Member

Is there a tracking issue that explains why we do not want to store sizes as a (potentially optional) layout child?

It would be nice to see some sort of design doc (or if that already exists, please attach it in the PR description).

@joseph-isaacs joseph-isaacs added the action/benchmark Trigger full benchmarks to run on this PR label May 29, 2026
@github-actions github-actions Bot removed the action/benchmark Trigger full benchmarks to run on this PR label May 29, 2026
@github-actions

github-actions Bot commented May 29, 2026

Copy link
Copy Markdown
Contributor

Polar Signals Profiling Results

Latest Run

Status Commit Job Attempt Link
🟢 Done fa581c4 1 Explore Profiling Data
Previous Runs (3)
Status Commit Job Attempt Link
🟢 Done e94f42c 1 Explore Profiling Data
🟢 Done 4256223 2 Explore Profiling Data
🟢 Done 4256223 1 Explore Profiling Data

Powered by Polar Signals Cloud

@connortsui20 connortsui20 left a comment

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Mostly nits, didn't get a chance to read through writer.rs though

Comment thread vortex-layout/src/layouts/list/mod.rs Outdated
Comment thread vortex-layout/src/layouts/list/mod.rs Outdated
Comment on lines +266 to +270
pub fn new(offsets_ptype: PType) -> Self {
let mut metadata = Self::default();
metadata.set_offsets_ptype(offsets_ptype);
metadata
}

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This seems like a strange constructor? Why not just do Self { offsets_ptype }?

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

conversion from enum to i32 in prost needs this apparently, see DictLayout

Comment thread vortex-layout/src/layouts/list/mod.rs Outdated
Comment thread vortex-layout/src/layouts/list/reader.rs Outdated
// Bound the elements read using offsets[0] and offsets[-1]
let elements_range = calculate_elements_range(&offsets, &session)?;

// Rebase the offsets so they start at zero

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nit: prose comments should end in periods (ask an llm to do this for you!).

Comment on lines +241 to +245
let array = if !mask.all_true() {
array.filter(mask)?
} else {
array
};

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@joseph-isaacs I feel like we should have this optimization in the .filter method for arrays instead of constructing the filter array and then immediately optimizing it away?

Basically for the "built-in" array methods we might as well do the optimizations immediately.

@github-actions github-actions Bot added the stale This PR is stale and will be auto-closed soon label Jun 13, 2026
@mhk197 mhk197 requested a review from a team June 17, 2026 00:26
@mhk197 mhk197 marked this pull request as draft June 17, 2026 00:27
@mhk197 mhk197 force-pushed the mk/add-list-layout branch 2 times, most recently from fcd1059 to e94f42c Compare June 17, 2026 04:24
@mhk197 mhk197 added the action/benchmark Trigger full benchmarks to run on this PR label Jun 18, 2026
@github-actions github-actions Bot removed the action/benchmark Trigger full benchmarks to run on this PR label Jun 18, 2026
@mhk197 mhk197 added action/benchmark-sql Trigger SQL benchmarks to run on this PR and removed stale This PR is stale and will be auto-closed soon labels Jun 18, 2026
mhk197 added 29 commits June 26, 2026 14:45
Signed-off-by: Matt Katz <mhkatz97@gmail.com>
Signed-off-by: Matt Katz <mhkatz97@gmail.com>
Signed-off-by: Matt Katz <mhkatz97@gmail.com>
Signed-off-by: Matt Katz <mhkatz97@gmail.com>
Signed-off-by: Matt Katz <mhkatz97@gmail.com>
Signed-off-by: Matt Katz <mhkatz97@gmail.com>
Signed-off-by: Matt Katz <mhkatz97@gmail.com>
Signed-off-by: Matt Katz <mhkatz97@gmail.com>
Signed-off-by: Matt Katz <mhkatz97@gmail.com>
Signed-off-by: Matt Katz <mhkatz97@gmail.com>
Signed-off-by: Matt Katz <mhkatz97@gmail.com>
Signed-off-by: Matt Katz <mhkatz97@gmail.com>
Signed-off-by: Matt Katz <mhkatz97@gmail.com>
Signed-off-by: Matt Katz <mhkatz97@gmail.com>
Signed-off-by: Matt Katz <mhkatz97@gmail.com>
Signed-off-by: Matt Katz <mhkatz97@gmail.com>
Signed-off-by: Matt Katz <mhkatz97@gmail.com>
Signed-off-by: Matt Katz <mhkatz97@gmail.com>
Signed-off-by: Matt Katz <mhkatz97@gmail.com>
Signed-off-by: Matt Katz <mhkatz97@gmail.com>
Signed-off-by: Matt Katz <mhkatz97@gmail.com>
Signed-off-by: Matt Katz <mhkatz97@gmail.com>
…hange

Signed-off-by: Matt Katz <mhkatz97@gmail.com>
Signed-off-by: Matt Katz <mhkatz97@gmail.com>
Signed-off-by: Matt Katz <mhkatz97@gmail.com>
Signed-off-by: Matt Katz <mhkatz97@gmail.com>
Signed-off-by: Matt Katz <mhkatz97@gmail.com>
Signed-off-by: Matt Katz <mhkatz97@gmail.com>
Signed-off-by: Matt Katz <mhkatz97@gmail.com>
@mhk197 mhk197 force-pushed the mk/add-list-layout branch from fa581c4 to af54f25 Compare June 29, 2026 15:31
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

changelog/feature A new feature

Projects

None yet

Development

Successfully merging this pull request may close these issues.

5 participants