
Align TableCommit with FileStoreCommit commit semantics#422

Open
JingsongLi wants to merge 7 commits into main from codex/table-commit-parity

JingsongLi (Contributor) commented Jun 29, 2026

Summary

Align Rust TableCommit more closely with Python/Java FileStoreCommit commit behavior where the underlying Rust writer/commit contracts match. This adds manifest rolling and minor compaction, overwrite retry caching, richer latest-snapshot conflict checks, row-tracking parity for dedicated files, and column-aware global index update handling.

Changes

  • Add manifest options and write-path support for rolling manifest files by target size, plus minor compaction of small existing manifest files.
  • Reuse overwrite base entries across retries when the intervening snapshots are safe; rebuild them on target-partition changes, non-APPEND commits, or whole-table overwrite.
  • Extend commit conflict handling with delete existence checks and row-id existence/range checks against the current resolved base entries.
  • Fill manifest min_row_id/max_row_id metadata when all entries carry row ids.
  • Align row-tracking assignment with Python/Java behavior for missing file_source, blob files, and vector dedicated files.
  • Make global index updates column-aware with default rejection and DROP_PARTITION_INDEX support, including extra_field_ids.
  • Keep overwrite changelog handling Java-aligned: overwrite commits ignore changelog-only input.
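The first change above can be sketched as two small planning steps: rolling new manifest entries into files capped at a target size, and picking small existing manifest files for minor compaction. This is a minimal Rust sketch, not the code in this PR; `Entry`, `ManifestMeta`, `roll_by_target_size`, `select_minor_compaction`, `small_threshold`, and `min_count` are hypothetical names, and the selection rule (rewrite small files only when at least `min_count` of them exist) is an assumed simplification.

```rust
/// Hypothetical stand-in for a manifest entry; only its estimated encoded size matters here.
#[derive(Debug, Clone, PartialEq)]
struct Entry {
    name: String,
    size: u64,
}

/// Hypothetical stand-in for an existing manifest file's metadata.
#[derive(Debug, Clone, PartialEq)]
struct ManifestMeta {
    file_name: String,
    file_size: u64,
}

/// Groups entries into manifest files of at most `target_size` bytes each.
/// An entry that alone exceeds the target still gets a file of its own.
fn roll_by_target_size(entries: Vec<Entry>, target_size: u64) -> Vec<Vec<Entry>> {
    let mut files = Vec::new();
    let mut current: Vec<Entry> = Vec::new();
    let mut current_size = 0u64;
    for entry in entries {
        // Roll to a new file when this entry would push the current one past the target.
        if !current.is_empty() && current_size + entry.size > target_size {
            files.push(std::mem::take(&mut current));
            current_size = 0;
        }
        current_size += entry.size;
        current.push(entry);
    }
    if !current.is_empty() {
        files.push(current);
    }
    files
}

/// Splits existing manifests into (kept as-is, selected for rewrite).
/// Files below `small_threshold` are rewritten only if at least `min_count` of them exist.
fn select_minor_compaction(
    metas: &[ManifestMeta],
    small_threshold: u64,
    min_count: usize,
) -> (Vec<ManifestMeta>, Vec<ManifestMeta>) {
    let (small, large): (Vec<ManifestMeta>, Vec<ManifestMeta>) =
        metas.iter().cloned().partition(|m| m.file_size < small_threshold);
    if small.len() >= min_count {
        (large, small)
    } else {
        // Too few small files to be worth rewriting: keep everything.
        (metas.to_vec(), Vec::new())
    }
}
```

Returning groups instead of writing files keeps the sizing decision testable apart from any I/O.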
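The overwrite retry cache rule reads naturally as a predicate over the snapshots committed since the cached base was built. In this hedged sketch, `CommitKind`, `InterveningSnapshot`, and `can_reuse_overwrite_base` are hypothetical, and "target-partition changes" is interpreted as an intervening snapshot touching any target partition; the PR's actual check may use different inputs.

```rust
use std::collections::HashSet;

/// Hypothetical commit kind recorded on each snapshot.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum CommitKind {
    Append,
    Compact,
    Overwrite,
}

/// Hypothetical summary of a snapshot committed after the cached base was built.
struct InterveningSnapshot {
    kind: CommitKind,
    /// Serialized partition keys whose files this snapshot added or removed.
    touched_partitions: HashSet<String>,
}

/// Returns true if cached overwrite base entries can be reused on retry.
/// `target_partitions == None` stands for a whole-table overwrite.
fn can_reuse_overwrite_base(
    target_partitions: Option<&HashSet<String>>,
    intervening: &[InterveningSnapshot],
) -> bool {
    let Some(targets) = target_partitions else {
        // Whole-table overwrite: always rebuild the base.
        return false;
    };
    intervening.iter().all(|s| {
        // Only APPEND commits that stay outside the target partitions are safe.
        s.kind == CommitKind::Append && s.touched_partitions.is_disjoint(targets)
    })
}
```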
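The manifest row-id metadata and the new conflict checks can be illustrated over plain slices of file entries. This sketch assumes a hypothetical `FileEntry` with `first_row_id` and `row_count`, treats `max_row_id` as inclusive, and models the row-id range check as "each added file's row range must lie inside one base file's range"; the real checks in crates/paimon/src/table/table_commit.rs may differ.

```rust
use std::collections::HashSet;

/// Hypothetical data-file entry; `first_row_id` is None when row tracking did not assign one.
#[derive(Debug, Clone)]
struct FileEntry {
    file_name: String,
    first_row_id: Option<i64>,
    row_count: i64, // assumed >= 1
}

/// Manifest-level (min_row_id, max_row_id), or None unless every entry carries a row id.
/// Treats the max as inclusive: first_row_id + row_count - 1.
fn manifest_row_id_range(entries: &[FileEntry]) -> Option<(i64, i64)> {
    let mut range: Option<(i64, i64)> = None;
    for e in entries {
        let first = e.first_row_id?; // one entry without a row id leaves the metadata unset
        let last = first + e.row_count - 1;
        range = Some(match range {
            None => (first, last),
            Some((lo, hi)) => (lo.min(first), hi.max(last)),
        });
    }
    range
}

/// Delete existence check: every file this commit deletes must still be in the resolved base.
fn check_deletes_exist(base: &[FileEntry], deletes: &[FileEntry]) -> Result<(), String> {
    let live: HashSet<&str> = base.iter().map(|e| e.file_name.as_str()).collect();
    match deletes.iter().find(|d| !live.contains(d.file_name.as_str())) {
        Some(d) => Err(format!("conflict: {} was already removed", d.file_name)),
        None => Ok(()),
    }
}

/// Row-id range check: an added file that carries row ids must fall inside one base file's range.
fn check_row_ids_exist(base: &[FileEntry], added: &[FileEntry]) -> Result<(), String> {
    for a in added {
        let Some(first) = a.first_row_id else { continue };
        let last = first + a.row_count - 1;
        let covered = base.iter().any(|b| match b.first_row_id {
            Some(b_first) => b_first <= first && last <= b_first + b.row_count - 1,
            None => false,
        });
        if !covered {
            return Err(format!("conflict: rows {first}..={last} of {} not in base", a.file_name));
        }
    }
    Ok(())
}
```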

Testing

  • cargo fmt --all
  • cargo test -p paimon table::table_commit --no-fail-fast -- --nocapture
  • cargo test -p paimon --lib --no-fail-fast
  • git diff --check

Notes

  • table_commit coverage was expanded with Python-inspired regression tests for overwrite cache reuse/rebuild, manifest rolling/minor compaction, global index extra-field updates, and default-partition null overwrite behavior.
  • Python's check_from_snapshot path is intentionally not mirrored here because Rust currently does not carry the same read-snapshot contract into CommitMessage; keeping a public field without a reliable producer would imply stronger conflict detection than Rust actually has.

@JingsongLi JingsongLi changed the title Align TableCommit with FileStoreCommit commit semantics [WIP] Align TableCommit with FileStoreCommit commit semantics Jun 29, 2026
@JingsongLi JingsongLi changed the title [WIP] Align TableCommit with FileStoreCommit commit semantics Align TableCommit with FileStoreCommit commit semantics Jun 29, 2026
Review comment thread on crates/paimon/src/table/table_commit.rs (outdated)

leaves12138 left a comment

LGTM. Reviewed the latest head, including the manifest rolling and compression follow-up commits. I also ran cargo fmt --all -- --check and cargo test -p paimon table::table_commit --no-fail-fast locally on 6f41f30.


3 participants