Core, Spark: Fix incorrect partial commit failure count in rewrite data files by john801205 · Pull Request #16970 · apache/iceberg

john801205 · 2026-06-26T06:34:46Z

Summary

The rewrite data files procedure reported partial commit failures even when all commits succeeded, because it derived the total commit count as min(totalGroupCount, maxCommits). When groups are batched into groupsPerCommit = ceil(totalGroupCount / maxCommits) per commit, the actual number of commits is ceil(totalGroupCount / groupsPerCommit), which can be smaller. The mismatch made totalCommits - succeededCommits positive even with zero failures.

Compute totalCommits as ceil(totalGroupCount / groupsPerCommit) so the failed commit count is correct. Also make succeededCommits an AtomicInteger since it is incremented from multiple rewrite threads.

Issues

Rewrite data files reports partial commit failure even when all commits succeed #16971

…ta files The rewrite data files procedure reported partial commit failures even when all commits succeeded, because it derived the total commit count as min(totalGroupCount, maxCommits). When groups are batched into groupsPerCommit = ceil(totalGroupCount / maxCommits) per commit, the actual number of commits is ceil(totalGroupCount / groupsPerCommit), which can be smaller. The mismatch made totalCommits - succeededCommits positive even with zero failures. Compute totalCommits as ceil(totalGroupCount / groupsPerCommit) so the failed commit count is correct. Also make succeededCommits an AtomicInteger since it is incremented from multiple rewrite threads. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>

Copilot

Pull request overview

Fixes incorrect “partial commit failure” reporting in Spark rewrite_data_files when partial progress is enabled by computing the true number of commits executed (based on groupsPerCommit) rather than min(totalGroupCount, maxCommits). Also makes commit-success tracking thread-safe in the shared commit service and adds regression coverage across supported Spark variants.

Changes:

Compute totalCommits as ceil(totalGroupCount / groupsPerCommit) to avoid phantom failed-commit counts.
Make BaseCommitService.succeededCommits an AtomicInteger to safely increment from multiple threads.
Add regression tests covering the “actual commits < maxCommits” scenario in Spark 3.5, 4.0, and 4.1 test suites.

Reviewed changes

Copilot reviewed 7 out of 7 changed files in this pull request and generated 3 comments.

Show a summary per file

File	Description
core/src/main/java/org/apache/iceberg/actions/BaseCommitService.java	Make succeeded commit counter atomic for concurrent increments during partial-progress commits.
spark/v3.5/spark/src/main/java/org/apache/iceberg/spark/actions/RewriteDataFilesSparkAction.java	Correct total commit counting logic used to infer failed commits.
spark/v4.0/spark/src/main/java/org/apache/iceberg/spark/actions/RewriteDataFilesSparkAction.java	Correct total commit counting logic used to infer failed commits.
spark/v4.1/spark/src/main/java/org/apache/iceberg/spark/actions/RewriteDataFilesSparkAction.java	Correct total commit counting logic used to infer failed commits.
spark/v3.5/spark/src/test/java/org/apache/iceberg/spark/actions/TestRewriteDataFilesAction.java	Add regression test ensuring no false failures when actual commits are fewer than maxCommits.
spark/v4.0/spark/src/test/java/org/apache/iceberg/spark/actions/TestRewriteDataFilesAction.java	Add regression test ensuring no false failures when actual commits are fewer than maxCommits.
spark/v4.1/spark/src/test/java/org/apache/iceberg/spark/actions/TestRewriteDataFilesAction.java	Add regression test ensuring no false failures when actual commits are fewer than maxCommits.

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

+    // With 10 original groups and max commits of 7, we have 2 groups per commit.
+    // Hence, there will be 1 + 5 commits in total


+    // With 10 original groups and max commits of 7, we have 2 groups per commit.
+    // Hence, there will be 1 + 5 commits in total


Co-authored-by: Copilot Autofix powered by AI <175728472+Copilot@users.noreply.github.com>

github-actions Bot added spark core labels Jun 26, 2026

john801205 force-pushed the partial_commit_failure branch 2 times, most recently from 268928a to beac357 Compare June 26, 2026 12:20

add integration tests

c4c6538

john801205 force-pushed the partial_commit_failure branch from beac357 to c4c6538 Compare June 26, 2026 12:26

wombatu-kun approved these changes Jun 29, 2026

View reviewed changes

manuzhang requested a review from Copilot June 30, 2026 04:21

Copilot started reviewing on behalf of manuzhang June 30, 2026 04:22 View session

Copilot AI reviewed Jun 30, 2026

View reviewed changes

Update comment for clarity

f3805c2

Co-authored-by: Copilot Autofix powered by AI <175728472+Copilot@users.noreply.github.com>

john801205 force-pushed the partial_commit_failure branch from 6092eb4 to f3805c2 Compare June 30, 2026 13:06

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

Core, Spark: Fix incorrect partial commit failure count in rewrite data files#16970

Core, Spark: Fix incorrect partial commit failure count in rewrite data files#16970
john801205 wants to merge 3 commits into
apache:mainfrom
john801205:partial_commit_failure

john801205 commented Jun 26, 2026 •

edited

Loading

Uh oh!

Copilot AI left a comment

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

3 participants

		// With 10 original groups and max commits of 7, we have 2 groups per commit.
		// Hence, there will be 1 + 5 commits in total

Uh oh!

Conversation

john801205 commented Jun 26, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Summary

Issues

Uh oh!

Copilot AI left a comment

Choose a reason for hiding this comment

Pull request overview

Reviewed changes

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

3 participants

john801205 commented Jun 26, 2026 •

edited

Loading