Skip to content

fix: handle non-contiguous RowRanges when resolving global row IDs#383

Open
zhf999 wants to merge 23 commits into
alibaba:mainfrom
zhf999:fix-rowid
Open

fix: handle non-contiguous RowRanges when resolving global row IDs#383
zhf999 wants to merge 23 commits into
alibaba:mainfrom
zhf999:fix-rowid

Conversation

@zhf999

@zhf999 zhf999 commented Jun 26, 2026

Copy link
Copy Markdown
Contributor

Purpose

  • Fix a correctness bug where upper layers assumed returned batch rows are continuous and derived file row IDs as previous_batch_start + offset.
  • Introduce unified usage of GetPreviousBatchFileRowId(batch_row_id) to resolve the file row ID for a row index inside the current batch.
  • In PrefetchFileBatchReaderImpl, cache the actual file row IDs for each returned batch and keep row-id mapping aligned when a batch is sliced by read_range.
  • For Parquet, add target row-range union and per-batch row mapping logic:
    • Merge filtered target ranges into a file target row set.
    • Build batch_row_id -> file_row_id mapping in NextBatch().
    • Keep GetPreviousBatchFileRowId() correct under non-contiguous rows caused by predicate + bitmap filtering.
  • Update upper-layer consumers to query per-row file IDs directly (deletion vectors, bitmap index filtering, _ROW_ID field conversion, KeyValue iteration positions).

Tests

  • Updated and adapted reader tests to the new interface and semantics:
    • src/paimon/format/avro/avro_file_batch_reader_test.cpp
    • src/paimon/format/blob/blob_file_batch_reader_test.cpp
    • src/paimon/format/lance/lance_format_reader_writer_test.cpp
    • src/paimon/format/orc/orc_file_batch_reader_test.cpp
    • src/paimon/format/parquet/parquet_file_batch_reader_test.cpp
    • src/paimon/common/reader/prefetch_file_batch_reader_impl_test.cpp
  • Added/extended coverage in Parquet with TestRowMapping to validate file row mapping across non-contiguous ranges.
  • Remaining changes are interface migration plus consistency updates of call sites and assertions.

API and Format

  • Public reader API change: FileBatchReader and implementations now use GetPreviousBatchFileRowId(uint64_t batch_row_id).
  • Semantic change: returns the file row ID for the batch_row_id inside the current batch (instead of deriving by batch start + offset under contiguous assumptions).
  • Storage format and on-disk protocol are unchanged.

Documentation

No.

Generative AI tooling

gpt-5.3-codex

Copilot AI review requested due to automatic review settings June 26, 2026 02:55

Copilot AI left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Copilot was unable to review this pull request because the user who requested the review has reached their quota limit.

Comment thread include/paimon/reader/file_batch_reader.h
Result<uint64_t> PrefetchFileBatchReaderImpl::GetPreviousBatchFirstRowNumber() const {
return previous_batch_first_row_num_;
Result<uint64_t> PrefetchFileBatchReaderImpl::GetPreviousBatchGlobalRowId(
uint64_t batch_row_id) const {

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why can’t we just return previous_batch_first_row_num_ + batch_row_id directly?

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The PrefetchFileBatchReaderImpl may hold ParquetFileBatchReader, which may return contenation of two discontinuous batch. Should we fallback PrefetchFileBatchReaderImpl::GetPreviousBatchGlobalRowId to simply return previous_batch_first_row_num_ + batch_row_id like LanceFileBatchReader or BlobFileBatchReader?

Comment thread src/paimon/common/reader/prefetch_file_batch_reader_impl_test.cpp Outdated
Comment thread src/paimon/common/reader/prefetch_file_batch_reader_impl_test.cpp
Comment thread src/paimon/core/io/complete_row_tracking_fields_reader.cpp
Comment thread src/paimon/testing/mock/mock_file_batch_reader.h
Comment thread src/paimon/testing/utils/read_result_collector.h Outdated

static Result<std::shared_ptr<arrow::ChunkedArray>> CollectResultOneBatch(
BatchReader* batch_reader, int64_t max_data_processing_time_in_us) {
int64_t seed = DateTimeUtils::GetCurrentUTCTimeUs();

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can CollectResultOneBatch just return an Arrow array directly? It doesn’t look like we need a ChunkedArray here.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

CollectResultOneBatch is designed to align with CollectResult. Should we implement CollectResultOneBatch with a different return value?

Comment thread src/paimon/format/parquet/parquet_file_batch_reader.cpp
PrepareOrcFileBatchReader(file_name, &read_schema, batch_size, natural_read_size);
ASSERT_EQ(std::numeric_limits<uint64_t>::max(),
orc_batch_reader->GetPreviousBatchFirstRowNumber().value());
ASSERT_EQ(orc_batch_reader->GetPreviousBatchFileRowId(0).value(), -1);

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

why -1 here? Status::Invalid?

return CollectResultOneBatch(batch_reader, /*max_simulated_data_processing_time*/ 0);
}

static Result<std::shared_ptr<arrow::ChunkedArray>> CollectResultOneBatch(

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's very similar with CollectResult, can you refactor to extract the common parts?

Result<uint64_t> GetPreviousBatchFirstRowNumber() const override {
assert(reader_);
return reader_->GetPreviousBatchFirstRowNumber();
Result<uint64_t> GetPreviousBatchFileRowId(uint64_t batch_row_id) const override {

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

change the interface in FileReaderWrapper together

Comment on lines +217 to +222
target_row_groups.emplace_back(
/*rg_index=*/rg_id,
/*is_partially_matched=*/false,
/*ranges=*/
RowRanges(Range(0, reader_->GetAllRowGroupRanges()[rg_id].second -
reader_->GetAllRowGroupRanges()[rg_id].first - 1)));

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

format it

Suggested change
target_row_groups.emplace_back(
/*rg_index=*/rg_id,
/*is_partially_matched=*/false,
/*ranges=*/
RowRanges(Range(0, reader_->GetAllRowGroupRanges()[rg_id].second -
reader_->GetAllRowGroupRanges()[rg_id].first - 1)));
target_row_groups.emplace_back(
/*rg_index=*/rg_id, /*is_partially_matched=*/false, /*ranges=*/
RowRanges(Range(0, reader_->GetAllRowGroupRanges()[rg_id].second -
reader_->GetAllRowGroupRanges()[rg_id].first - 1)));

ReadResultCollector::CollectResult(
reader.get(), /*max simulated data processing time*/ 100));
ASSERT_EQ(reader->GetPreviousBatchFirstRowNumber().value(), 101);
ASSERT_NOK(reader->GetPreviousBatchFileRowId(0));

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Now that the interface has been modified, these calls will always trigger ASSERT_NOK, so there's no point in testing them anymore, right? There seem to be similar issues in other test files as well.

/// Get the row number of the first row in the previously read batch.
virtual Result<uint64_t> GetPreviousBatchFirstRowNumber() const = 0;
/// Get the global row number of the row in the previously read batch.
virtual Result<uint64_t> GetPreviousBatchFileRowId(uint64_t batch_row_id) const = 0;

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Are there any explicit semantic constraints on this interface before reading begins and after EOF is reached? As per previous discussions, it returns Status::Invalid before reading starts. However, after reaching EOF, the behavior currently varies wildly—some continue to accumulate, while others return errors. Should we impose some constraints on this? cc @lxy-9602

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants