Core: Include row lineage and key-id in snapshot value methods #17015

Open
manuzhang wants to merge 1 commit into apache:main from manuzhang:codex/snapshot-row-lineage-equality

@manuzhang manuzhang commented Jun 30, 2026

Member

Summary

BaseSnapshot.equals, hashCode, and toString now include the snapshot row-lineage fields firstRowId and addedRows, plus the manifest-list encryption keyId.

This keeps snapshot comparisons and hashes aligned with metadata that changes row ID assignment or encrypted manifest-list key selection. It also makes those values visible in snapshot debug output.
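To illustrate the idea, here is a minimal sketch of value methods that fold row-lineage fields and a key ID into equality, hashing, and debug output. `SnapshotSketch` is a hypothetical, simplified stand-in and not the actual `BaseSnapshot`: its field set, the nullability of `firstRowId`, `addedRows`, and `keyId`, and the `key-id` label in `toString` are all assumptions made for the example.

```java
import java.util.Objects;

// Hypothetical, simplified stand-in for a snapshot value class.
// It is NOT the actual BaseSnapshot; the field set and toString format are assumptions.
final class SnapshotSketch {
  private final long snapshotId;
  private final String manifestListLocation;
  private final Long firstRowId; // assumed nullable when row lineage is absent
  private final Long addedRows; // assumed nullable when row lineage is absent
  private final String keyId; // assumed nullable when the manifest list is not encrypted

  SnapshotSketch(
      long snapshotId, String manifestListLocation, Long firstRowId, Long addedRows, String keyId) {
    this.snapshotId = snapshotId;
    this.manifestListLocation = manifestListLocation;
    this.firstRowId = firstRowId;
    this.addedRows = addedRows;
    this.keyId = keyId;
  }

  @Override
  public boolean equals(Object o) {
    if (this == o) {
      return true;
    }
    if (!(o instanceof SnapshotSketch)) {
      return false;
    }
    SnapshotSketch other = (SnapshotSketch) o;
    // Objects.equals is null-safe, so absent row-lineage or key values compare cleanly.
    return snapshotId == other.snapshotId
        && Objects.equals(manifestListLocation, other.manifestListLocation)
        && Objects.equals(firstRowId, other.firstRowId)
        && Objects.equals(addedRows, other.addedRows)
        && Objects.equals(keyId, other.keyId);
  }

  @Override
  public int hashCode() {
    // Hash exactly the fields used by equals to keep the two methods consistent.
    return Objects.hash(snapshotId, manifestListLocation, firstRowId, addedRows, keyId);
  }

  @Override
  public String toString() {
    return "SnapshotSketch{id=" + snapshotId
        + ", manifest-list=" + manifestListLocation
        + ", first-row-id=" + firstRowId
        + ", added-rows=" + addedRows
        + ", key-id=" + keyId + "}";
  }

  public static void main(String[] args) {
    SnapshotSketch base = new SnapshotSketch(1L, "snap-1.avro", 0L, 10L, "key-a");
    SnapshotSketch shiftedRows = new SnapshotSketch(1L, "snap-1.avro", 10L, 10L, "key-a");
    SnapshotSketch otherKey = new SnapshotSketch(1L, "snap-1.avro", 0L, 10L, "key-b");

    System.out.println(base.equals(shiftedRows)); // differs only in firstRowId
    System.out.println(base.equals(otherKey)); // differs only in keyId
    System.out.println(base);
  }
}
```

Using the same field list in `equals` and `hashCode` preserves the usual Java contract that equal objects have equal hash codes.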

The regression coverage is in TestSnapshot, alongside the snapshot behavior tests.

Testing

  • env GRADLE_USER_HOME=/tmp/gradle ./gradlew :iceberg-core:test --tests org.apache.iceberg.TestSnapshot.snapshotValueMethodsIncludeMetadataFields --console=plain
  • env GRADLE_USER_HOME=/tmp/gradle ./gradlew :iceberg-core:spotlessJavaCheck --console=plain
  • git diff --check

AI Disclosure

  • Model: GPT-5
  • Platform/Tool: Codex
  • Human Oversight: partially reviewed
  • Prompt Summary: Update Apache Iceberg snapshot value methods to include row-lineage metadata and manifest-list key IDs, with focused snapshot test coverage.

@github-actions github-actions Bot added the core label Jun 30, 2026
@manuzhang manuzhang requested a review from RussellSpitzer June 30, 2026 04:15
@manuzhang manuzhang force-pushed the codex/snapshot-row-lineage-equality branch from 31752ae to 7891e9f Compare June 30, 2026 04:19
@manuzhang manuzhang marked this pull request as ready for review June 30, 2026 04:21
@manuzhang manuzhang marked this pull request as draft June 30, 2026 04:35
@manuzhang manuzhang requested a review from Copilot June 30, 2026 04:53

Copilot AI left a comment

Contributor

Pull request overview

This PR updates Iceberg Core’s BaseSnapshot identity semantics so snapshot equality, hashing, and debug output reflect row-lineage state (firstRowId, addedRows). This prevents snapshots that differ only in assigned row ID ranges from comparing equal and makes those values visible when logging/inspecting snapshots.

Changes:

  • Extend BaseSnapshot.equals and hashCode to include firstRowId and addedRows.
  • Extend BaseSnapshot.toString to include first-row-id and added-rows.
  • Add a focused unit test asserting equality/hash behavior and toString visibility for row-lineage fields.
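The practical effect of the identity change shows up in hash-based collections. The sketch below is illustrative only: it models a snapshot's identity as a plain list of field values, and the "legacy" key (snapshot ID plus manifest-list location) is an assumed subset chosen for the example, not the real set of fields `BaseSnapshot` compared before this PR. All names and the sample location string are hypothetical.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Hypothetical demo: identity is modeled as a list of field values.
final class RowLineageIdentityDemo {

  // Builds an identity key with or without the row-lineage fields.
  static List<Object> key(
      long snapshotId,
      String manifestList,
      Long firstRowId,
      Long addedRows,
      boolean includeRowLineage) {
    return includeRowLineage
        ? Arrays.<Object>asList(snapshotId, manifestList, firstRowId, addedRows)
        : Arrays.<Object>asList(snapshotId, manifestList);
  }

  // Counts how many distinct identities a HashSet sees for two snapshots
  // that differ only in their assigned row ID range.
  static int distinctCount(boolean includeRowLineage) {
    Set<List<Object>> keys = new HashSet<>();
    keys.add(key(1L, "snap-1.avro", 0L, 100L, includeRowLineage));
    keys.add(key(1L, "snap-1.avro", 100L, 100L, includeRowLineage));
    return keys.size();
  }

  public static void main(String[] args) {
    System.out.println("without row lineage: " + distinctCount(false));
    System.out.println("with row lineage: " + distinctCount(true));
  }
}
```

Without the row-lineage fields in the key, the two entries collapse into one; with them, both are retained.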

Reviewed changes

Copilot reviewed 2 out of 2 changed files in this pull request and generated no comments.

| File | Description |
| --- | --- |
| core/src/main/java/org/apache/iceberg/BaseSnapshot.java | Includes row-lineage fields in equals, hashCode, and toString for correct snapshot identity/debugging. |
| core/src/test/java/org/apache/iceberg/TestRowLineageMetadata.java | Adds coverage verifying that snapshot equality, hash, and toString change with/without row-lineage values. |

@singhpk234 singhpk234 left a comment

Contributor

Thanks for the change, @manuzhang.
I understand this focuses on row lineage (RL), but while we are at it, we could also include key-id.

@manuzhang manuzhang marked this pull request as ready for review June 30, 2026 11:44
@manuzhang manuzhang force-pushed the codex/snapshot-row-lineage-equality branch from 7891e9f to 08d0447 Compare June 30, 2026 11:54
@manuzhang manuzhang changed the title from "Core: Include row lineage in snapshot equality" to "Core: Include row lineage and key-id in snapshot value methods" Jun 30, 2026
Include snapshot row-lineage fields and manifest list key IDs in BaseSnapshot equals/hashCode and toString. This keeps comparisons and hashes aligned with metadata that changes row ID assignment or manifest-list encryption key selection, and makes those values visible in debug output.

Co-authored-by: Codex <codex@openai.com>
@manuzhang manuzhang force-pushed the codex/snapshot-row-lineage-equality branch from 08d0447 to 34e733c Compare June 30, 2026 11:57
@manuzhang manuzhang requested a review from singhpk234 June 30, 2026 11:58

3 participants