[core] Support OR/nested partition predicate pruning in format table scan #8367
Open
Zouxxyy wants to merge 2 commits into
…scan

Format table scan pruned partition directories via a per-field Map<String,Predicate> built with splitAnd, which dropped any predicate referencing more than one partition field. A cross-field OR such as (dt='a' AND hour<'16') OR (dt='b' AND hour>='16') was therefore dropped entirely, falling back to listing every partition directory.

Replace the per-field model with incremental partial evaluation of the whole partition predicate during directory descent (mightMatch): a leaf is decided only once all the fields it references are bound along the current path; otherwise it is treated as possibly matching. This prunes AND/OR/nested/cross-field predicates uniformly and never prunes a directory that could contain a match.

Works for both the default key=value layout and format-table.partition-path-only-value.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
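The partial evaluation described in this commit can be illustrated with a minimal, self-contained sketch. This is not the code in the PR: the class `PartitionPruningSketch`, the `PartitionPredicate` interface, and the `leaf`/`and`/`or` helpers are hypothetical names chosen for illustration, partition values are modelled as plain strings, and negation is left out because the description does not cover it.

```java
import java.util.Map;
import java.util.Set;
import java.util.function.Predicate;

/** Hypothetical sketch of "might match" partial evaluation; not the PR's classes. */
final class PartitionPruningSketch {

    /** Returns false only if no partition under the bound path can match. */
    interface PartitionPredicate {
        boolean mightMatch(Map<String, String> bound);
    }

    /** A leaf is decided only once every field it references is bound. */
    static PartitionPredicate leaf(Set<String> fields, Predicate<Map<String, String>> test) {
        return bound -> !bound.keySet().containsAll(fields) || test.test(bound);
    }

    /** AND might match only if every child might match. */
    static PartitionPredicate and(PartitionPredicate... children) {
        return bound -> {
            for (PartitionPredicate child : children) {
                if (!child.mightMatch(bound)) {
                    return false;
                }
            }
            return true;
        };
    }

    /** OR might match if at least one child might match. */
    static PartitionPredicate or(PartitionPredicate... children) {
        return bound -> {
            for (PartitionPredicate child : children) {
                if (child.mightMatch(bound)) {
                    return true;
                }
            }
            return false;
        };
    }

    /** (dt='a' AND hour<'16') OR (dt='b' AND hour>='16'), compared as strings. */
    static PartitionPredicate example() {
        return or(
                and(
                        leaf(Set.of("dt"), b -> b.get("dt").equals("a")),
                        leaf(Set.of("hour"), b -> b.get("hour").compareTo("16") < 0)),
                and(
                        leaf(Set.of("dt"), b -> b.get("dt").equals("b")),
                        leaf(Set.of("hour"), b -> b.get("hour").compareTo("16") >= 0)));
    }
}
```

With this sketch, a path that has bound only `dt=c` is pruned before any `hour` directory is listed, while `dt=a` is kept because the `hour` leaf is still undecided. Treating an undecided leaf as possibly matching is what keeps the pruning conservative.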
In format-table.partition-path-only-value mode, the default partition name (__DEFAULT_PARTITION__ by default) starts with '_' and was treated as a hidden path by PartitionPathUtils. This caused null/default partitions to be skipped entirely during partition discovery.

Allow the configured default partition directory name through the hidden path check in only-value mode, and pass defaultPartName into FormatTableScan.listPartitionEntries so that listPartitionEntries uses the same semantics as findPartitions.

Also add regression tests for null/default partitions in OR pruning and listPartitionEntries, and keep the prefix+OR edge case covered.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
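The hidden-path exemption in this commit can be sketched as follows. The class `HiddenPathSketch` and the method `isHiddenPartitionDir` are hypothetical and do not reflect the actual signatures in PartitionPathUtils. Treating a leading '.' as hidden is also an assumption here; the commit message only mentions '_'.

```java
/** Hypothetical sketch of the hidden-path exemption; not the PR's code. */
final class HiddenPathSketch {

    /**
     * Decides whether a directory name should be skipped during partition discovery.
     *
     * @param dirName name of the directory being inspected
     * @param onlyValueMode whether directories are named by value only (no key= prefix)
     * @param defaultPartName configured default partition name
     */
    static boolean isHiddenPartitionDir(
            String dirName, boolean onlyValueMode, String defaultPartName) {
        // In only-value mode the default partition directory is named exactly
        // like the default partition name, so it must not be treated as hidden.
        if (onlyValueMode && dirName.equals(defaultPartName)) {
            return false;
        }
        // Assumption: names starting with '_' or '.' are considered hidden.
        return dirName.startsWith("_") || dirName.startsWith(".");
    }
}
```

In the key=value layout the directory would be named something like `dt=__DEFAULT_PARTITION__`, which does not start with '_', so the exemption only matters in only-value mode.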
Purpose
Support generic partition-directory pruning for format table scans when partition predicates contain OR/nested expressions.
Previously pruning used a per-field `Map<String, Predicate>` built from `splitAnd`, so any predicate referencing more than one partition field (for example `(dt='20260625' AND hour<'16') OR (dt='20260624' AND hour>='16')`) was dropped and the scan fell back to listing every partition directory.
This change:
Tests