fix: align duplicate-key replacement semantics with Java Paimon#389
Open
lucasfang wants to merge 2 commits into
Open
fix: align duplicate-key replacement semantics with Java Paimon#389lucasfang wants to merge 2 commits into
lucasfang wants to merge 2 commits into
Conversation
Contributor
There was a problem hiding this comment.
Pull request overview
Note
Copilot couldn't run its full agentic review because no GitHub Actions runner was available. Make sure your repository has a runner available to run Copilot's review, or add a copilot-setup-steps.yml file specifying one with the runs-on attribute. See the docs for more details.
Aligns C++ duplicate-key map update behavior with Java Paimon semantics (“last write wins”) across snapshot deletion file mapping, deletion-vector metadata, and index manifest combining.
Changes:
- Switch several map update paths from
insert/emplaceto replacement semantics viainsert_or_assign. - Adjust
BucketedCombinerprocessing to preserve Java-like removed/added handling and allow overwriting duplicate added entries. - Add targeted unit tests for duplicate-key overwrite behavior in snapshot reader, deletion vectors, deletion file writer, and index manifest combining.
Reviewed changes
Copilot reviewed 10 out of 10 changed files in this pull request and generated 3 comments.
Show a summary per file
| File | Description |
|---|---|
| src/paimon/core/table/source/snapshot/snapshot_reader_test.cpp | Adds regression test ensuring later DV index metadata overwrites earlier for same data file. |
| src/paimon/core/table/source/snapshot/snapshot_reader.cpp | Updates DV meta → index meta mapping to overwrite on duplicate keys. |
| src/paimon/core/manifest/index_manifest_file_handler_test.cpp | Adds tests around combiner overwrite behavior; adds DV entry helper. |
| src/paimon/core/manifest/index_manifest_file_handler.h | Updates combiner docstrings to reflect combining keys used. |
| src/paimon/core/manifest/index_manifest_file_handler.cpp | Changes combiner maps to overwrite duplicates (insert_or_assign) and adjusts removed/added handling. |
| src/paimon/core/deletionvectors/deletion_vector_test.cpp | Adds coverage for duplicate data file name overwrite in deletion-file map creation. |
| src/paimon/core/deletionvectors/deletion_vector.cpp | Uses overwrite semantics when building deletion-file map. |
| src/paimon/core/deletionvectors/deletion_file_writer_test.cpp | Adds test verifying duplicate writes overwrite DV metadata for same data file name. |
| src/paimon/core/deletionvectors/deletion_file_writer.cpp | Overwrites DV metadata on duplicate key (insert_or_assign). |
| src/paimon/CMakeLists.txt | Includes new snapshot reader unit test in test build. |
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
Comment on lines
+125
to
+129
| auto iter = dv_ranges->find("data-file-1"); | ||
| ASSERT_NE(iter, dv_ranges->end()); | ||
| ASSERT_GT(iter->second.GetOffset(), 1); | ||
| ASSERT_EQ(iter->second.GetCardinality(), std::optional<int64_t>(2)); | ||
| } |
Comment on lines
+45
to
+46
| dv_metas_.insert_or_assign(key, DeletionVectorMeta(key, static_cast<int32_t>(start), length, | ||
| deletion_vector->GetCardinality())); |
|
|
||
| ASSERT_EQ(deletion_files.size(), 1); | ||
| ASSERT_TRUE(deletion_files[0].has_value()); | ||
| EXPECT_EQ(deletion_files[0]->path, dir_->Str() + "/index/index-second"); |
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
Title: fix: align duplicate-key replacement semantics with Java Paimon
Purpose
Linked issue: close #xxx
This PR aligns several duplicate-key map update paths with Java Paimon semantics.
Java
Map.put/LinkedHashMap.putreplaces the existing value when the key already exists, while C++insert/emplacekeeps the old value. This caused C++ behavior to diverge from Java in duplicate key scenarios.The affected paths are:
BUCKET_UNAWAREdeletion vector mode unsupported.BucketedCombiner.SnapshotReader::GetDeletionFileswith JavaSnapshotReaderImpl.toDeletionFiles, where later deletion file metadata for the same data file wins.DeletionVector::CreateDeletionFileMapwith JavaDeletionFile.factory, where laterDeletionFilevalues overwrite earlier values for the same data file name.DeletionFileWriter::Writewith JavaDeletionFileWriter.write, where later DV metadata overwrites earlier metadata for the same data file name.Tests
Unit tests:
DeletionFileWriterTest.WriteCloseAndReadBackDeletionFileWriterTest.GetResultWithoutCloseShouldFailDeletionFileWriterTest.WriteOverwritesDuplicateDataFileNameDeletionFileWriterTest.ExternalPathInResultDeletionVectorTest.CreateDeletionFileMapIndexManifestFileHandlerTest.GlobalCombinerDeletesThenAddsByFileNameIndexManifestFileHandlerTest.BucketedCombinerUsesPartitionBucketAndIndexTypeIndexManifestFileHandlerTest.BucketedCombinerOverwritesDuplicateAddedEntriesIndexManifestFileHandlerTest.GlobalCombinerOverwritesDuplicateAddedEntriesIndexManifestFileHandlerTest.DvWithBucketUnawareModeReturnsNotImplementedSnapshotReaderTest.GetDeletionFilesOverwritesDuplicateDataFileNameAPI and Format
No public API change.
No storage format or protocol change.
This only changes runtime duplicate-key handling to match Java Paimon replacement semantics.
Documentation
No new feature is introduced.
Generative AI tooling
Generated-by: GitHub Copilot