HDDS-14945. Implement Iceberg position delete file rewrite for path migration #10306

Draft
sreejasahithi wants to merge 3 commits into apache:master from sreejasahithi:HDDS-14945

@sreejasahithi
Contributor

What changes were proposed in this pull request?

Position delete files of an Iceberg table contain the absolute paths of the data files that hold the deleted rows.
These position delete files must therefore be rewritten as part of path migration. For each selected position delete file, we use Iceberg's RewriteTablePathUtil to replace sourcePrefix with targetPrefix in every data file path it references, and write the rewritten position delete file to a staging location.
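
For readers unfamiliar with the idea, here is a minimal, self-contained sketch of the prefix substitution described above. It is not the code in this patch and does not call Iceberg's RewriteTablePathUtil; the class `PathPrefixRewriteSketch`, the `PositionDelete` holder, and the example paths are all hypothetical and exist only for illustration.

```java
import java.util.ArrayList;
import java.util.List;

/** Illustrative only: a hypothetical stand-in, not Iceberg's or this PR's implementation. */
public class PathPrefixRewriteSketch {

  /** Hypothetical holder for one position delete record: data file path plus row position. */
  static final class PositionDelete {
    final String filePath;
    final long pos;

    PositionDelete(String filePath, long pos) {
      this.filePath = filePath;
      this.pos = pos;
    }
  }

  /** Appends a trailing slash so that "/tbl" does not accidentally match "/tbl2/...". */
  static String withTrailingSlash(String prefix) {
    return prefix.endsWith("/") ? prefix : prefix + "/";
  }

  /** Replaces sourcePrefix with targetPrefix in a single absolute data file path. */
  static String rewritePath(String path, String sourcePrefix, String targetPrefix) {
    String src = withTrailingSlash(sourcePrefix);
    if (!path.startsWith(src)) {
      throw new IllegalArgumentException(
          "Path " + path + " does not start with " + sourcePrefix);
    }
    return withTrailingSlash(targetPrefix) + path.substring(src.length());
  }

  /** Rewrites the data file path of every record; row positions are left unchanged. */
  static List<PositionDelete> rewriteAll(
      List<PositionDelete> deletes, String sourcePrefix, String targetPrefix) {
    List<PositionDelete> rewritten = new ArrayList<>();
    for (PositionDelete d : deletes) {
      rewritten.add(
          new PositionDelete(rewritePath(d.filePath, sourcePrefix, targetPrefix), d.pos));
    }
    return rewritten;
  }

  public static void main(String[] args) {
    // Example prefixes are assumptions chosen for the demo, not taken from the patch.
    List<PositionDelete> deletes = new ArrayList<>();
    deletes.add(new PositionDelete("hdfs://nn/warehouse/tbl/data/f1.parquet", 7L));
    for (PositionDelete d
        : rewriteAll(deletes, "hdfs://nn/warehouse/tbl", "ofs://om/vol/bucket/tbl")) {
      System.out.println(d.filePath + " @ " + d.pos);
    }
  }
}
```

In the real rewrite the records would be read from and written back to Avro, Parquet, or ORC files in a staging location; the sketch only covers the path substitution step.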

Introduce OzonePositionDeleteReaderWriter, which implements Iceberg’s PositionDeleteReaderWriter, to perform format-specific reads and writes for Avro, Parquet, and ORC.

Also add test coverage for position delete files and manifest files.

What is the link to the Apache JIRA?

HDDS-14945

How was this patch tested?

Updated test cases.
Green CI: https://github.com/sreejasahithi/ozone/actions/runs/26036208134

@sreejasahithi marked this pull request as ready for review on May 19, 2026 09:47
@sreejasahithi
Contributor Author

@ashishkumar50 could you please review this patch?

Member

@peterxcli peterxcli left a comment

LGTM +1

Contributor

@adoroszlai adoroszlai left a comment

Thanks @sreejasahithi for the patch. I would like to know how ozone-iceberg.jar is used in practice.

  • Are you supposed to copy all 14 new dependency jars to make it work? If so, its usage is getting cumbersome, and we should provide a fat jar, like ozone-filesystem-hadoop3 for Hadoop environments.
  • Or are these already available in the Iceberg environment? If so, the dependencies should be added with provided scope and shouldn't be part of the Ozone binary distribution.

@sreejasahithi marked this pull request as draft on May 22, 2026 17:39