perf: avoid expanding Mosaic row selections #421

Open
QuakeWang wants to merge 1 commit into apache:main from QuakeWang:perf/mosaic-row-ranges

@QuakeWang

Member

Purpose

Mosaic row selection previously expanded each selected row range into a UInt64Array of per-row indices before applying take. For large contiguous ranges, this makes memory and CPU cost proportional to the selected row count even though the input is already range-based.
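
To make the cost difference concrete, here is a minimal sketch using only the standard library. It is not the code in this PR: `expand_to_indices` and `to_slices` are hypothetical names, and the selection is assumed to arrive as half-open `Range<u64>` values local to one row group.

```rust
use std::ops::Range;

/// Old approach (illustrative): one u64 index per selected row.
/// The output length equals the total number of selected rows.
fn expand_to_indices(ranges: &[Range<u64>]) -> Vec<u64> {
    ranges.iter().flat_map(|r| r.clone()).collect()
}

/// Range-based approach (illustrative): one (offset, len) pair per range.
/// The output length equals the number of non-empty ranges.
fn to_slices(ranges: &[Range<u64>]) -> Vec<(usize, usize)> {
    ranges
        .iter()
        .filter(|r| r.start < r.end) // skip empty ranges
        .map(|r| (r.start as usize, (r.end - r.start) as usize))
        .collect()
}
```

For a selection such as `0..1_000_000` plus `2_000_000..2_000_010`, the first function allocates 1,000,010 indices while the second returns two pairs.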

Brief change log

  • Keep Mosaic row selections as row-group-local (offset, len) slices.
  • Apply a single selected range with RecordBatch::slice.
  • Apply multiple selected ranges with concat_batches.
  • Preserve row counts for empty/all-missing projections.
  • Add coverage to ensure large selections stay range-based.
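
The dispatch described in the list above could look roughly like the following sketch. It deliberately avoids the Arrow crate so it stays self-contained: `Batch`, `concat`, and `apply_selection` are hypothetical stand-ins for `RecordBatch`, `concat_batches`, and whatever helper the PR actually uses, and the stand-in `slice` copies data, whereas `RecordBatch::slice` is zero-copy.

```rust
/// Hypothetical stand-in for a record batch: column-major i64 values.
/// `num_rows` is stored explicitly so that an empty projection
/// (no columns) can still report how many rows were selected.
#[derive(Debug, Clone, PartialEq)]
struct Batch {
    columns: Vec<Vec<i64>>,
    num_rows: usize,
}

impl Batch {
    /// Keep `len` rows starting at `offset` (copies here; Arrow would not).
    fn slice(&self, offset: usize, len: usize) -> Batch {
        Batch {
            columns: self
                .columns
                .iter()
                .map(|c| c[offset..offset + len].to_vec())
                .collect(),
            num_rows: len,
        }
    }
}

/// Concatenate batches that share the same number of columns.
fn concat(parts: &[Batch], num_columns: usize) -> Batch {
    let mut columns = vec![Vec::new(); num_columns];
    let mut num_rows = 0;
    for part in parts {
        for (dst, src) in columns.iter_mut().zip(&part.columns) {
            dst.extend_from_slice(src);
        }
        num_rows += part.num_rows;
    }
    Batch { columns, num_rows }
}

/// Apply row-group-local (offset, len) slices without per-row indices.
fn apply_selection(batch: &Batch, slices: &[(usize, usize)]) -> Batch {
    match slices {
        // Nothing selected: empty batch with the same columns.
        [] => batch.slice(0, 0),
        // Single range: one slice, no concatenation needed.
        [(offset, len)] => batch.slice(*offset, *len),
        // Multiple ranges: slice each one, then concatenate.
        many => {
            let parts: Vec<Batch> =
                many.iter().map(|&(o, l)| batch.slice(o, l)).collect();
            concat(&parts, batch.columns.len())
        }
    }
}
```

With a six-row batch and slices `[(0, 2), (4, 2)]`, the result has four rows. The same holds when `columns` is empty, which is the empty-projection case the row count has to survive.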

Tests

  • cargo clippy --all-targets --workspace --features fulltext,vortex,mosaic -- -D warnings
  • cargo test -p paimon --all-targets --features fulltext,vortex,mosaic test_mosaic
  • cargo test -p paimon --all-targets --features fulltext,vortex,mosaic arrow::format::mosaic::tests

API and Format

Documentation

Keep Mosaic row selections as row-group-local slices instead of materializing one UInt64 index per selected row. Apply selected ranges with RecordBatch slicing and concat while preserving empty projection row counts.

Signed-off-by: QuakeWang <wangfuzheng0814@foxmail.com>