Skip to content

Core, Spark: Fix incorrect partial commit failure count in rewrite data files#16970

Open
john801205 wants to merge 3 commits into
apache:mainfrom
john801205:partial_commit_failure
Open

Core, Spark: Fix incorrect partial commit failure count in rewrite data files#16970
john801205 wants to merge 3 commits into
apache:mainfrom
john801205:partial_commit_failure

Conversation

@john801205

@john801205 john801205 commented Jun 26, 2026

Copy link
Copy Markdown

Summary

The rewrite data files procedure reported partial commit failures even when all commits succeeded, because it derived the total commit count as min(totalGroupCount, maxCommits). When groups are batched into groupsPerCommit = ceil(totalGroupCount / maxCommits) per commit, the actual number of commits is ceil(totalGroupCount / groupsPerCommit), which can be smaller. The mismatch made totalCommits - succeededCommits positive even with zero failures.

Compute totalCommits as ceil(totalGroupCount / groupsPerCommit) so the failed commit count is correct. Also make succeededCommits an AtomicInteger since it is incremented from multiple rewrite threads.

Issues

…ta files

The rewrite data files procedure reported partial commit failures even
when all commits succeeded, because it derived the total commit count as
min(totalGroupCount, maxCommits). When groups are batched into
groupsPerCommit = ceil(totalGroupCount / maxCommits) per commit, the
actual number of commits is ceil(totalGroupCount / groupsPerCommit),
which can be smaller. The mismatch made totalCommits - succeededCommits
positive even with zero failures.

Compute totalCommits as ceil(totalGroupCount / groupsPerCommit) so the
failed commit count is correct. Also make succeededCommits an
AtomicInteger since it is incremented from multiple rewrite threads.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
@john801205 john801205 force-pushed the partial_commit_failure branch 2 times, most recently from 268928a to beac357 Compare June 26, 2026 12:20
@john801205 john801205 force-pushed the partial_commit_failure branch from beac357 to c4c6538 Compare June 26, 2026 12:26

Copilot AI left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Fixes incorrect “partial commit failure” reporting in Spark rewrite_data_files when partial progress is enabled by computing the true number of commits executed (based on groupsPerCommit) rather than min(totalGroupCount, maxCommits). Also makes commit-success tracking thread-safe in the shared commit service and adds regression coverage across supported Spark variants.

Changes:

  • Compute totalCommits as ceil(totalGroupCount / groupsPerCommit) to avoid phantom failed-commit counts.
  • Make BaseCommitService.succeededCommits an AtomicInteger to safely increment from multiple threads.
  • Add regression tests covering the “actual commits < maxCommits” scenario in Spark 3.5, 4.0, and 4.1 test suites.

Reviewed changes

Copilot reviewed 7 out of 7 changed files in this pull request and generated 3 comments.

Show a summary per file
File Description
core/src/main/java/org/apache/iceberg/actions/BaseCommitService.java Make succeeded commit counter atomic for concurrent increments during partial-progress commits.
spark/v3.5/spark/src/main/java/org/apache/iceberg/spark/actions/RewriteDataFilesSparkAction.java Correct total commit counting logic used to infer failed commits.
spark/v4.0/spark/src/main/java/org/apache/iceberg/spark/actions/RewriteDataFilesSparkAction.java Correct total commit counting logic used to infer failed commits.
spark/v4.1/spark/src/main/java/org/apache/iceberg/spark/actions/RewriteDataFilesSparkAction.java Correct total commit counting logic used to infer failed commits.
spark/v3.5/spark/src/test/java/org/apache/iceberg/spark/actions/TestRewriteDataFilesAction.java Add regression test ensuring no false failures when actual commits are fewer than maxCommits.
spark/v4.0/spark/src/test/java/org/apache/iceberg/spark/actions/TestRewriteDataFilesAction.java Add regression test ensuring no false failures when actual commits are fewer than maxCommits.
spark/v4.1/spark/src/test/java/org/apache/iceberg/spark/actions/TestRewriteDataFilesAction.java Add regression test ensuring no false failures when actual commits are fewer than maxCommits.

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

Comment on lines +1472 to +1473
// With 10 original groups and max commits of 7, we have 2 groups per commit.
// Hence, there will be 1 + 5 commits in total
Comment on lines +1471 to +1472
// With 10 original groups and max commits of 7, we have 2 groups per commit.
// Hence, there will be 1 + 5 commits in total
Co-authored-by: Copilot Autofix powered by AI <175728472+Copilot@users.noreply.github.com>
@john801205 john801205 force-pushed the partial_commit_failure branch from 6092eb4 to f3805c2 Compare June 30, 2026 13:06
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants