Skip to content

Core: Add ManifestFile adapter for v4 tracked files#16867

Draft
anoopj wants to merge 9 commits into
apache:mainfrom
anoopj:manifestfile-adapter
Draft

Core: Add ManifestFile adapter for v4 tracked files#16867
anoopj wants to merge 9 commits into
apache:mainfrom
anoopj:manifestfile-adapter

Conversation

@anoopj

@anoopj anoopj commented Jun 18, 2026

Copy link
Copy Markdown
Member

Add TrackedFileAdapters.asManifestFile to wrap a DATA_MANIFEST or DELETE_MANIFEST TrackedFile as a ManifestFile.

The ManifestFile adapter will only be used by tests. Note that ManifestFile doesn't support manifest DVs. So in the actual manifest read path, the reader resolves the MDV first and produces the resulting TrackedFiles, which are then adapted to data/delete files.

Closes #16227

@anoopj anoopj moved this to In review in V4: metadata tree Jun 18, 2026
@github-actions github-actions Bot added the core label Jun 18, 2026
@anoopj anoopj force-pushed the manifestfile-adapter branch 2 times, most recently from b50410f to 718ae34 Compare June 19, 2026 00:42
Add TrackedFileAdapters.asManifestFile to wrap a DATA_MANIFEST or
DELETE_MANIFEST TrackedFile as a ManifestFile.

Closes apache#16227
@anoopj anoopj force-pushed the manifestfile-adapter branch from 718ae34 to 93368cb Compare June 19, 2026 00:47

@wombatu-kun wombatu-kun left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Code looks correct: field delegation, equals/hashCode (consistent with GenericManifestFile), constructor preconditions, and the test coverage are all solid.

Non-blocking discussion point: partitionSpecId() throws UnsupportedOperationException. That is defensible since a v4 manifest is not bound to a single spec. But @rdblue pushed back on #16100 against these adapters exposing an invalid or arbitrary spec ID, so it may be worth confirming with reviewers that a throw is the intended contract for any future caller that reads partitionSpecId().

Comment thread core/src/main/java/org/apache/iceberg/TrackedFileAdapters.java
Comment thread core/src/main/java/org/apache/iceberg/TrackedFileAdapters.java
NO_PARTITION,
8L,
2048L);
file.set(MANIFEST_INFO_ORDINAL, manifestInfo);

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

we can use the builder from @gaborkaszab 's PR: #16769

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sounds good. I can make the change if #16769 gets merged first or do this as a followup PR if this gets merged first.

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

FYI, in the meantime the builder got merged

@gaborkaszab gaborkaszab left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you for the PR, @anoopj !
I took a Quick Look and left some minor comment.


private TrackedManifestFile(TrackedFile file) {
Preconditions.checkArgument(
file.tracking() != null, "Cannot create manifest file: no tracking");

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: other adapters don't check if tracking is null. Not sure we have to here. It's a required field anyway

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Removed.

private TrackedManifestFile(TrackedFile file) {
Preconditions.checkArgument(
file.tracking() != null, "Cannot create manifest file: no tracking");
Preconditions.checkArgument(

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As per the spec proposal data seq num is 'Optional for leaf manifests, required for root.'. I think it's valid to have it as null here. Seems also optional in ManifestFile

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This PR is for root-level manifest tracking where null is truly an invalid state. We can either validate or just have it fail with an NPE.

Seems also optional in ManifestFile

It is required for a manifest file: ManifestFile.sequenceNumber() returns a primitive

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

just to add on top of what Anoop said. this is for leaf manifest file entry (TrackedFile) in the root manifest file. Hence dataSequenceNumber should be required

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the explanation @anoopj and @stevenzwu !

Comment thread core/src/main/java/org/apache/iceberg/TrackedFileAdapters.java
}

@Override
public boolean equals(Object other) {

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The other adapters don't override equals and hashCode. Is there a particular reason it's needed here?

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm trying to be consistent with the semantics of GenericManifestFile.

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do we actually use equals and hashCode? We try to avoid implementing those methods where there are different interpretations of what "equality" might mean.

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I didn't add any code that uses it. I was trying to be consistent with the behavior of the GenericManifestFile which defines equality with the location. I have removed it.

@anoopj anoopj requested review from gaborkaszab and stevenzwu June 22, 2026 17:38

@stevenzwu stevenzwu left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM.

I will merge Gabor's TrackedFileBuilder once CI passed. then let's rebase this one and use the builder as applicable.

private TrackedManifestFile(TrackedFile file) {
Preconditions.checkArgument(
file.tracking() != null, "Cannot create manifest file: no tracking");
Preconditions.checkArgument(

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

just to add on top of what Anoop said. this is for leaf manifest file entry (TrackedFile) in the root manifest file. Hence dataSequenceNumber should be required

private TrackedManifestFile(TrackedFile file) {
Preconditions.checkArgument(
file.tracking().dataSequenceNumber() != null,
"Cannot create manifest file: no data sequence number");

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nit: I think we prefer the slightly more specific "Invalid data sequence number: null" because "no X" doesn't necessarily mean "x is null".

Also, even though the attempt is to create a manifest file, that's not the actual problem. It's debatable whether to include what is being attempted, "Cannot create manifest file", but I would opt to leave it out because that's clear from other context (that the check is in asManifestFile and the TrackedManifestFile constructor).

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done

@Override
public int partitionSpecId() {
throw new UnsupportedOperationException(
"Tracked manifests are not bound to a single partition spec");

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think that it is correct to say "Tracked manifests": what is a "tracked" manifest and why is one from a manifest list not "tracked"?

I think it is better to say "v4 manifests are not bound to a single partition spec".

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixed. I was trying to avoid having too many v4 references.

public ManifestContent content() {
return file.contentType() == FileContent.DATA_MANIFEST
? ManifestContent.DATA
: ManifestContent.DELETES;

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Minor: I think it is safer to avoid a dependency between this and the constructor's verification. I would recommend using a switch statement:

switch (file.contentType()) {
  case DATA_MANIFEST:
    return ManifestContent.DATA;
  case DELETE_MANIFEST:
    return ManifestContent.DELETES;
  default:
    throw new UnsupportedOperationException("Unsupported content type for manifests: " + file.contentType());
}

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done

public boolean equals(Object other) {
if (this == other) {
return true;
} else if (!(other instanceof TrackedManifestFile)) {

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think that the implementation of GenericManifestFile is very helpful. It considers two files equal if they have the same path (which I think is reasonable) but it also excludes any implementation other than GenericManifestFile, which means that they aren't actually interchangeable as the interface suggests.

In addition, this creates a situation where a.equals(b) may be true, but b.equals(a) can be false for reasonable implementations.

I'd prefer to remove this and fix places where it is assumed or used. Good catch, @gaborkaszab!

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Removed.

"Cannot create manifest file: no data sequence number");
Preconditions.checkArgument(
file.tracking().snapshotId() != null, "Cannot create manifest file: no snapshot ID");
Preconditions.checkArgument(

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This also seems an internal invariant for TrackedFile. The new TrackedFileBuilder guarantees that manifestInfo is set for manifest entries.

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That makes sense. Removed.

NO_PARTITION,
8L,
2048L);
file.set(MANIFEST_INFO_ORDINAL, manifestInfo);

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

FYI, in the meantime the builder got merged

private TrackedManifestFile(TrackedFile file) {
Preconditions.checkArgument(
file.tracking() != null, "Cannot create manifest file: no tracking");
Preconditions.checkArgument(

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the explanation @anoopj and @stevenzwu !

void testManifestFileAdapterDelegation() {
Tracking tracking = createTracking();
ManifestInfo manifestInfo = createManifestInfo();
TrackedFileStruct file =

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: Can we hide the construction of the input TrackedFile into a separate function? Like dataManifestFile() it could get ManifestInfo, FileContent, and/or Tracking parameters if that matters for the test.

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done. extracted manifestBuilder(FileContent) / manifestFile(FileContent, ManifestInfo) and now tests build through them.

}

@Test
void testManifestFileAdapterDelegation() {

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: maybe parameterize the test for data + delete manifest. The testDeleteManifestContentMapsToDeletes test could be dropped then

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

good suggesiton. Done.

@anoopj anoopj requested a review from rdblue June 23, 2026 18:36

@gaborkaszab gaborkaszab left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

just some minor comments. Other than them, LGTM!

.manifestInfo(manifestInfo)
.keyMetadata(ByteBuffer.wrap(new byte[] {7, 8, 9}))
.build();
populateTrackingFields(file);

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just FYI: in the follow-up for the TrackedFileBuilder PR, we decided to drop populateTrackingFields and went to call the constructor of TrackedFile instead using the createTracking (revived in the follow-up PR) method.

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ack. I can rebase when the followup PR is merged.

manifestBuilder(FileContent.DATA_MANIFEST).manifestInfo(createManifestInfo()).build();
// Inheritance fills the data sequence number, which the adapter requires; first row ID and
// key metadata stay unset.
((TrackingStruct) file.tracking()).set(DATA_SEQUENCE_NUMBER_ORDINAL, DATA_SEQUENCE_NUMBER);

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

instead of using the ordinal, we can use the inheritFrom(some_manifest) method to do that.

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Routing through inheritFrom() will require building a parent Tzracking with several nulls. I prefer to keep it since this is simpler.

Comment thread core/src/test/java/org/apache/iceberg/TestTrackedFileAdapters.java Outdated
Comment thread core/src/test/java/org/apache/iceberg/TestTrackedFileAdapters.java Outdated
/** Adapts {@link TrackedFile} entries to the {@link DataFile} and {@link DeleteFile} APIs. */
/**
* Adapts {@link TrackedFile} entries to the {@link DataFile}, {@link DeleteFile}, and {@link
* ManifestFile} APIs.

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If comments require regular maintenance when touching the file, I prefer to rewrite them to minimize churn. This doesn't need to list everything that can be produced. Maybe edit this so that it say for example, {@link DataFile}?

return new TrackedManifestFile(file);
}

/** Shared base for all tracked file adapters. */

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is no longer true. How about "Shared base for data and delete file adapters"?

Comment thread core/src/main/java/org/apache/iceberg/TrackedFileAdapters.java Outdated
Comment thread core/src/main/java/org/apache/iceberg/TrackedFileAdapters.java Outdated
tracking.set(MANIFEST_POS_ORDINAL, MANIFEST_POS);
}

private static ManifestInfo createManifestInfo() {

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this should be a constant, not a method. We don't need to create a manifest info struct every time, and a method gives no guarantees about the values produced. It could create random values every time.

If we use a constant, then you can write assertions like assertThat(manifest.existingFilesCount()).isEqualTo(manifestInfo.existingFilesCount()).


// Builder for a manifest entry with the required non-tracking fields set. Callers add the
// manifest info and any optional fields before building.
private static TrackedFileBuilder manifestBuilder(FileContent contentType) {

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think that the complexity here to use the builder is a good thing.

This creates indirection and makes tests harder to read, while gaining almost nothing. There is hardly any readability benefit to using the builder when you're already setting constants like FORMAT_VERSION_V4. All this does is move the initialization here, so people have to find this method to validate that the assertions above are correct, like assertThat(manifest.path()).isEqualTo(MANIFEST_FILE_LOCATION). I would rather see MANIFEST_FILE_LOCATION passed in and then validated.

I think this should be removed and the constructor should be used instead. All these tests need to verify is that these values are passed through correctly.

.manifestInfo(manifestInfo)
.keyMetadata(ByteBuffer.wrap(new byte[] {7, 8, 9}))
.build();
populateTrackingFields(file);

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please remove this and either use a Tracking constant or create one inline. This indirection makes the tests unreadable.

file.tracking().snapshotId() != null, "Invalid snapshot ID: null");
Preconditions.checkArgument(
file.manifestInfo().dv() == null,
"Cannot adapt manifest with a deletion vector to ManifestFile: %s",

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is correct and calls into question whether this adapter is going to be useful. We expect manifest DVs to be common and if we can't adapt a manifest with a DV back to ManifestFile, then what is the purpose of this adapter?

ManifestFile is mostly below the API level. It is exposed in some places (accessed primarily from Snapshot), but reading and writing for correctness (scan planning and commit operations) are all done within Table APIs and even snapshot change access is moving to SnapshotChanges. You can read manifests directly using utils in core, but I don't think that is very common.

I think we may want to avoid adding this for now, even though it seems like a reasonable addition. In order for this to be valuable, we would need to continue using ManifestFile and expose a DV through that interface. Since we don't know if that's going to be helpful, I think we should keep this as a draft for now.

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

OK. I will convert this into a draft and we can revisit this if it's needed as we build the reader and writer. cc @stevenzwu

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In the write path, I also proposed new additions to the ManifestFile interface. I added 3 new methods in my next PR #16936 . We can add more (including manifest DV) as we go.

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If we add the DV, let me know and I can revive this PR.

anoopj and others added 2 commits June 30, 2026 21:31
@anoopj anoopj marked this pull request as draft July 1, 2026 04:34
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

Projects

Status: In review

Development

Successfully merging this pull request may close these issues.

v4: add wrapper for ManifestFile

5 participants