Core: Retry Hadoop version hint before metadata scan #17013
6df81ce to 527ddb8
Retry reading version-hint.text when the metadata directory exists before falling back to listing metadata files. Keep missing metadata directories on the fast path, use non-conflicting Hadoop configuration keys, cap exponential retry backoff, and surface retry interruptions.
527ddb8 to ab95836
private Integer retryReadVersionHint(FileSystem fs, Path versionHintFile) {
retryReadVersionHint and sleepBeforeVersionHintRetry reimplement retry, capped exponential backoff, and re-interrupt-then-throw handling that org.apache.iceberg.util.Tasks already provides: exponentialBackoff(min, max, totalTimeout, scale), plus the sleep/interrupt handling in runSingleThreaded, which does the same Thread.currentThread().interrupt(); throw new RuntimeException(...). AGENTS.md lists Tasks.foreach as the standard retry utility, and this package already uses it in HadoopFileIO.
Consider Tasks.foreach(versionHintFile).retry(numRetries).exponentialBackoff(initialWaitMs, maxWaitMs, totalTimeoutMs, 2.0).onlyRetryOn(IOException.class).run(...), falling back to the metadata listing in the catch block when retries are exhausted. onlyRetryOn(IOException.class) also avoids retrying a deterministic parse failure (a NumberFormatException from a corrupt hint), which the current catch (Exception) retries with full backoff.
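To illustrate the selective-retry point in the comment above, here is a minimal, dependency-free sketch in plain Java. It does not use Iceberg's Tasks utility; the class SelectiveRetrySketch, the HintReader interface, and readWithRetry are hypothetical names made up for this example, and backoff between attempts is left out to keep the focus on which exceptions are retried.

```java
import java.io.IOException;

public class SelectiveRetrySketch {
  /** Hypothetical stand-in for "open version-hint.text and parse its contents". */
  public interface HintReader {
    int read() throws IOException;
  }

  /**
   * Calls the reader up to numRetries + 1 times. Only IOException is treated as
   * transient; unchecked failures such as NumberFormatException propagate on the
   * first attempt. Returns the fallback value once all attempts have failed.
   */
  public static int readWithRetry(HintReader reader, int numRetries, int fallback) {
    for (int attempt = 0; attempt <= numRetries; attempt++) {
      try {
        return reader.read();
      } catch (IOException e) {
        // Transient failure: loop and try again (backoff omitted in this sketch).
      }
    }
    // Retries exhausted: the caller would fall back to listing metadata files here.
    return fallback;
  }
}
```

With this shape, a corrupt hint fails after one read instead of being re-read through every backoff step, while a hint that is briefly missing still gets its retries.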
Summary
Retry reading Hadoop version-hint.text briefly before falling back to scanning the metadata directory.
Problem
HadoopTableOperations updates version-hint.text as a best-effort pointer after committing a new metadata file. On object stores such as OSS/S3/GCS, the delete/rename sequence can make the hint file briefly unavailable or not yet visible to readers.
Today, a transient read failure immediately falls back to listing the metadata directory. For tables with many metadata files, especially on object-store-backed metadata directories, that listing can be significantly more expensive than retrying the small hint file read.
Fix
Add configurable retries for reading version-hint.text before metadata directory listing.
Defaults:
The retry uses exponential backoff capped by the configured max wait. If the metadata directory does not exist, findVersion() keeps the existing fast path and returns 0 without retrying. If retry sleep is interrupted, the interrupt is surfaced instead of falling through to metadata listing.
The configuration keys intentionally avoid the iceberg.hadoop.* prefix to prevent ambiguity with integrations that use iceberg.hadoop.* as a pass-through prefix for Hadoop configuration.
Test plan
Attempted:
Gradle did not reach test execution because plugin dependency downloads failed with TLS handshake errors from the Gradle plugin repository.
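For reference, the backoff policy described under Fix (exponential growth capped by a max wait, with interrupts surfaced rather than swallowed) can be sketched in plain Java as follows. This is an illustration under assumptions, not the code in this PR: CappedBackoffSketch, delayMs, and sleepBeforeRetry are hypothetical names, the doubling factor of 2 is assumed, and the wait values in the usage note are arbitrary rather than the PR's defaults.

```java
public class CappedBackoffSketch {
  /**
   * Delay before the given retry attempt (0-based): initialWaitMs doubled once per
   * prior attempt, never exceeding maxWaitMs. The doubling is guarded so the value
   * cannot overflow a long.
   */
  public static long delayMs(int attempt, long initialWaitMs, long maxWaitMs) {
    long delay = initialWaitMs;
    for (int i = 0; i < attempt && delay < maxWaitMs; i++) {
      delay = delay > maxWaitMs / 2 ? maxWaitMs : delay * 2;
    }
    return Math.min(delay, maxWaitMs);
  }

  /**
   * Sleeps for the given delay. If the thread is interrupted, the interrupt flag is
   * restored and a RuntimeException is thrown, so the caller does not silently
   * continue to the slower fallback path.
   */
  public static void sleepBeforeRetry(long delayMs) {
    try {
      Thread.sleep(delayMs);
    } catch (InterruptedException e) {
      Thread.currentThread().interrupt();
      throw new RuntimeException("Interrupted while waiting to retry", e);
    }
  }
}
```

With an initial wait of 100 ms and a max wait of 1000 ms, successive attempts wait 100, 200, 400, 800, and then 1000 ms for every later attempt.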