Add inverted-index-based distinct operator with runtime cost heuristic#17872
xiangfu0 wants to merge 2 commits into apache:master
Codecov Report
Coverage diff:
| Metric | master | #17872 | +/- |
|---|---|---|---|
| Coverage | 63.29% | 63.27% | -0.02% |
| Complexity | 1525 | 1542 | +17 |
| Files | 3194 | 3197 | +3 |
| Lines | 193645 | 193977 | +332 |
| Branches | 29787 | 29864 | +77 |
| Hits | 122559 | 122748 | +189 |
| Misses | 61466 | 61595 | +129 |
| Partials | 9620 | 9634 | +14 |
Force-pushed from 80749f8 to 8e47159.
Pull request overview
Adds a new execution path for single-column DISTINCT on dictionary + inverted-index columns by introducing InvertedIndexDistinctOperator, with a runtime heuristic to choose between inverted-index bitmap intersection vs scan-based distinct.
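A minimal sketch of how such a runtime heuristic could decide between the two paths, assuming the rule quoted in the PR summary further down (`dictCard × costRatio ≤ filteredDocs` selects the inverted-index path). The class name, enum, method name, and the cost ratio of 10 are hypothetical and are not taken from the PR's code.

```java
public class CostHeuristicSketch {
  enum Path { INVERTED_INDEX, SCAN }

  /**
   * Picks the inverted-index path when checking every dictionary entry
   * (weighted by costRatio) is estimated to be no more expensive than
   * scanning the filtered docs.
   */
  static Path choosePath(int dictCardinality, long filteredDocs, double costRatio) {
    return dictCardinality * costRatio <= filteredDocs ? Path.INVERTED_INDEX : Path.SCAN;
  }

  public static void main(String[] args) {
    // Low cardinality, many filtered docs: 100 * 10 = 1,000 <= 100,000.
    System.out.println(choosePath(100, 100_000L, 10.0));   // INVERTED_INDEX
    // High cardinality, few filtered docs: 100,000 * 10 = 1,000,000 > 1,000.
    System.out.println(choosePath(100_000, 1_000L, 10.0)); // SCAN
  }
}
```

Setting the ratio very low forces the index path and setting it very high forces the scan path, which is presumably how the cost-ratio override is used in tests.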
Changes:
- Introduces `InvertedIndexDistinctOperator` with cost-based runtime path selection and bitmap `intersects()` usage.
- Updates `DistinctPlanNode` to opt into the new operator via the `useInvertedIndexDistinct` query option (plus a cost-ratio override option).
- Adds unit/integration tests and JMH benchmarks to validate correctness and performance.
Reviewed changes
Copilot reviewed 8 out of 8 changed files in this pull request and generated 6 comments.
| File | Description |
|---|---|
| pinot-spi/src/main/java/org/apache/pinot/spi/utils/CommonConstants.java | Adds query option keys for enabling inverted-index distinct and overriding heuristic cost ratio. |
| pinot-common/src/main/java/org/apache/pinot/common/utils/config/QueryOptionsUtils.java | Adds helpers to parse new query options. |
| pinot-core/src/main/java/org/apache/pinot/core/plan/DistinctPlanNode.java | Wires plan construction to use InvertedIndexDistinctOperator when eligible and opted in. |
| pinot-core/src/main/java/org/apache/pinot/core/operator/query/InvertedIndexDistinctOperator.java | New operator implementing inverted-index-based distinct with scan fallback and heuristic. |
| pinot-core/src/test/java/org/apache/pinot/queries/InvertedIndexDistinctCostHeuristicTest.java | New unit tests for heuristic behavior and path selection. |
| pinot-integration-tests/src/test/java/org/apache/pinot/integration/tests/OfflineClusterIntegrationTest.java | Adds integration coverage for DISTINCT with inverted index (with/without filter). |
| pinot-perf/src/main/java/org/apache/pinot/perf/BenchmarkInvertedIndexDistinct.java | New JMH benchmark comparing inverted-index vs scan distinct paths. |
| pinot-perf/src/main/java/org/apache/pinot/perf/BenchmarkBitmapIntersectionVsAnd.java | New JMH benchmark comparing bitmap intersects() vs full and() intersection. |
Force-pushed from 3f376a8 to 6ed2aee.
Pull request overview
Adds an opt-in (OPTION(useInvertedIndexDistinct=true)) single-segment DISTINCT execution path that can answer single-column DISTINCT from dictionary + inverted index (or sorted forward index) without going through the scan/projection pipeline, choosing between sorted/bitmap/scan paths via a runtime heuristic.
Changes:
- Introduces `InvertedIndexDistinctOperator` with sorted-index, bitmap-inverted-index, and scan-fallback execution paths plus a cost heuristic (and override).
- Wires the new operator into `DistinctPlanNode` behind a query option, and adds query option parsing/constants.
- Adds unit/integration tests plus a JMH benchmark for the new operator/heuristic.
Reviewed changes
Copilot reviewed 10 out of 10 changed files in this pull request and generated 4 comments.
| File | Description |
|---|---|
| pinot-core/src/main/java/org/apache/pinot/core/operator/query/InvertedIndexDistinctOperator.java | New DISTINCT operator implementing sorted/bitmap/scan paths and heuristic selection. |
| pinot-core/src/main/java/org/apache/pinot/core/plan/DistinctPlanNode.java | Constructs the new operator when eligible and opted-in. |
| pinot-common/src/main/java/org/apache/pinot/common/utils/config/QueryOptionsUtils.java | Adds parsing helpers for the new query options. |
| pinot-spi/src/main/java/org/apache/pinot/spi/utils/CommonConstants.java | Adds query option keys for enabling/tuning inverted-index DISTINCT. |
| pinot-core/src/test/java/org/apache/pinot/queries/InvertedIndexDistinctCostHeuristicTest.java | Tests heuristic boundary/override behavior and correctness parity vs scan. |
| pinot-core/src/test/java/org/apache/pinot/queries/InvertedIndexDistinctMultiValueTest.java | Tests MV column correctness and path selection behavior. |
| pinot-core/src/test/java/org/apache/pinot/queries/InvertedIndexDistinctSortedColumnTest.java | Tests sorted-index path selection and correctness parity vs scan. |
| pinot-core/src/test/java/org/apache/pinot/queries/InvertedIndexDistinctStringTest.java | Tests STRING column correctness and path selection parity vs scan. |
| pinot-integration-tests/src/test/java/org/apache/pinot/integration/tests/OfflineClusterIntegrationTest.java | Adds an integration test covering DISTINCT with the opt-in option. |
| pinot-perf/src/main/java/org/apache/pinot/perf/BenchmarkInvertedIndexDistinct.java | Adds a JMH benchmark comparing the three execution paths. |
Force-pushed from cded79e to 0339dfb.
Pull request overview
Adds an opt-in execution path for single-column DISTINCT queries that can bypass scan/projection by using dictionary + inverted index (with a runtime heuristic), plus tests and benchmarks to validate/measure the new behavior.
Changes:
- Introduces `InvertedIndexDistinctOperator` with sorted-index, bitmap-inverted-index, and scan-fallback paths plus a cost heuristic and null handling.
- Wires the operator into `DistinctPlanNode` behind `useInvertedIndexDistinct=true` and adds `invertedIndexDistinctCostRatio` parsing + query option keys.
- Adds unit/integration tests and a JMH benchmark for the new distinct execution paths.
Reviewed changes
Copilot reviewed 10 out of 10 changed files in this pull request and generated 3 comments.
| File | Description |
|---|---|
| pinot-spi/src/main/java/org/apache/pinot/spi/utils/CommonConstants.java | Adds new query option keys for enabling/tuning inverted-index DISTINCT. |
| pinot-common/src/main/java/org/apache/pinot/common/utils/config/QueryOptionsUtils.java | Parses the new query options (useInvertedIndexDistinct, invertedIndexDistinctCostRatio). |
| pinot-core/src/main/java/org/apache/pinot/core/plan/DistinctPlanNode.java | Constructs InvertedIndexDistinctOperator when eligible and opted-in. |
| pinot-core/src/main/java/org/apache/pinot/core/operator/query/InvertedIndexDistinctOperator.java | New operator implementing the distinct logic and heuristic. |
| pinot-core/src/test/java/org/apache/pinot/queries/InvertedIndexDistinctCostHeuristicTest.java | Unit tests for heuristic behavior and scan-vs-inverted correctness. |
| pinot-core/src/test/java/org/apache/pinot/queries/InvertedIndexDistinctMultiValueTest.java | Unit tests for MV column behavior and ORDER BY / LIMIT interactions. |
| pinot-core/src/test/java/org/apache/pinot/queries/InvertedIndexDistinctSortedColumnTest.java | Unit tests for sorted-index path selection and correctness. |
| pinot-core/src/test/java/org/apache/pinot/queries/InvertedIndexDistinctStringTest.java | Unit tests for STRING column behavior and ordering/limits. |
| pinot-integration-tests/src/test/java/org/apache/pinot/integration/tests/OfflineClusterIntegrationTest.java | Adds an integration test covering opt-in DISTINCT with/without filter. |
| pinot-perf/src/main/java/org/apache/pinot/perf/BenchmarkInvertedIndexDistinct.java | Adds a JMH benchmark comparing sorted vs bitmap-inverted vs scan paths. |
Force-pushed from 0339dfb to c200936.
Pull request overview
Adds an opt-in, inverted-index-driven execution path for single-column DISTINCT queries to avoid scan/projection where possible, with a runtime heuristic to choose between sorted-index, bitmap-inverted-index, and scan fallback paths.
Changes:
- Introduces `InvertedIndexDistinctOperator` and wires it into `DistinctPlanNode` when `useInvertedIndexDistinct=true`.
- Adds query options for enabling the operator and tuning the cost heuristic (`invertedIndexDistinctCostRatio`).
- Adds unit/integration tests and a JMH benchmark covering sorted/bitmap/scan paths, MV, STRING, and null handling.
Reviewed changes
Copilot reviewed 11 out of 11 changed files in this pull request and generated 1 comment.
| File | Description |
|---|---|
| pinot-spi/src/main/java/org/apache/pinot/spi/utils/CommonConstants.java | Adds query option keys for enabling inverted-index distinct and setting the cost ratio. |
| pinot-common/src/main/java/org/apache/pinot/common/utils/config/QueryOptionsUtils.java | Adds parsing helpers for the new query options. |
| pinot-core/src/main/java/org/apache/pinot/core/plan/DistinctPlanNode.java | Constructs InvertedIndexDistinctOperator when eligible and opted-in. |
| pinot-core/src/main/java/org/apache/pinot/core/operator/query/InvertedIndexDistinctOperator.java | Implements sorted-index, bitmap-inverted-index, and scan-fallback execution paths with null handling and heuristic selection. |
| pinot-core/src/test/java/org/apache/pinot/queries/InvertedIndexDistinctCostHeuristicTest.java | Unit tests for heuristic boundaries and forced-path behavior via cost ratio override. |
| pinot-core/src/test/java/org/apache/pinot/queries/InvertedIndexDistinctMultiValueTest.java | Unit tests for MV columns (correctness, ORDER BY, LIMIT, empty result). |
| pinot-core/src/test/java/org/apache/pinot/queries/InvertedIndexDistinctNullHandlingTest.java | Unit tests for null handling behavior and scan-vs-inverted equivalence. |
| pinot-core/src/test/java/org/apache/pinot/queries/InvertedIndexDistinctSortedColumnTest.java | Unit tests for sorted-index path selection and correctness vs scan. |
| pinot-core/src/test/java/org/apache/pinot/queries/InvertedIndexDistinctStringTest.java | Unit tests for STRING distinct via inverted index and correctness vs scan. |
| pinot-integration-tests/src/test/java/org/apache/pinot/integration/tests/OfflineClusterIntegrationTest.java | Adds an integration test exercising DISTINCT with useInvertedIndexDistinct=true. |
| pinot-perf/src/main/java/org/apache/pinot/perf/BenchmarkInvertedIndexDistinct.java | Adds JMH benchmark comparing sorted vs bitmap-inverted vs scan distinct paths. |
Force-pushed from 6f54657 to 13acf80.
Force-pushed from ace9d3b to 3dc88f8.
Pull request overview
Adds a new opt-in execution path for single-column DISTINCT queries that can leverage dictionary + inverted index (and a sorted-index fast path) to avoid the scan/projection pipeline, selected at runtime via a cost heuristic.
Changes:
- Introduces `InvertedIndexDistinctOperator` with sorted-index, bitmap-inverted-index, and scan-fallback paths (plus cost heuristic and null handling).
- Updates `DistinctPlanNode` to select the new operator when `useInvertedIndexDistinct=true` and the column is eligible.
- Adds query option keys/parsing plus new unit/integration tests and a JMH benchmark.
Reviewed changes
Copilot reviewed 11 out of 11 changed files in this pull request and generated 1 comment.
| File | Description |
|---|---|
| pinot-core/src/main/java/org/apache/pinot/core/operator/query/InvertedIndexDistinctOperator.java | New distinct operator with runtime path selection and null handling. |
| pinot-core/src/main/java/org/apache/pinot/core/plan/DistinctPlanNode.java | Wires in the operator behind a query option and eligibility checks. |
| pinot-common/src/main/java/org/apache/pinot/common/utils/config/QueryOptionsUtils.java | Adds query option parsing helpers for enable flag + heuristic override. |
| pinot-spi/src/main/java/org/apache/pinot/spi/utils/CommonConstants.java | Adds query option keys for the new operator and heuristic override. |
| pinot-core/src/test/java/org/apache/pinot/queries/InvertedIndexDistinctCostHeuristicTest.java | Tests heuristic selection and correctness vs scan. |
| pinot-core/src/test/java/org/apache/pinot/queries/InvertedIndexDistinctMultiValueTest.java | Tests MV-column correctness and scan parity. |
| pinot-core/src/test/java/org/apache/pinot/queries/InvertedIndexDistinctNullHandlingTest.java | Tests null handling behavior and scan parity. |
| pinot-core/src/test/java/org/apache/pinot/queries/InvertedIndexDistinctSortedColumnTest.java | Tests sorted-column path selection and correctness. |
| pinot-core/src/test/java/org/apache/pinot/queries/InvertedIndexDistinctStringTest.java | Tests STRING-column correctness and ordering/limit behavior. |
| pinot-integration-tests/src/test/java/org/apache/pinot/integration/tests/OfflineClusterIntegrationTest.java | Adds an integration test for DISTINCT using the opt-in operator. |
| pinot-perf/src/main/java/org/apache/pinot/perf/BenchmarkInvertedIndexDistinct.java | Adds a JMH benchmark to compare sorted/bitmap/scan paths. |
Force-pushed from 16c2d8d to 179e9f6.
Pull request overview
Adds a new single-segment execution operator that can answer single-column SELECT DISTINCT using dictionary + inverted index (and a sorted-index fast path when available), selected at runtime via a cost heuristic and gated behind a query option.
Changes:
- Introduces `InvertedIndexDistinctOperator` with sorted-index / bitmap-inverted-index / scan-fallback paths and a runtime cost heuristic (plus filter-bitmap caching and null handling support).
- Wires the operator into `DistinctPlanNode` behind `useIndexBasedDistinctOperator=true`, and adds a new tuning option `invertedIndexDistinctCostRatio`.
- Adds unit/integration tests and a JMH benchmark covering path selection and correctness (SV/MV, ORDER BY, LIMIT, null handling).
Reviewed changes
Copilot reviewed 11 out of 11 changed files in this pull request and generated 2 comments.
| File | Description |
|---|---|
| pinot-spi/src/main/java/org/apache/pinot/spi/utils/CommonConstants.java | Adds query option key invertedIndexDistinctCostRatio and expands useIndexBasedDistinctOperator doc to cover both distinct operators. |
| pinot-common/src/main/java/org/apache/pinot/common/utils/config/QueryOptionsUtils.java | Parses invertedIndexDistinctCostRatio query option. |
| pinot-core/src/main/java/org/apache/pinot/core/plan/DistinctPlanNode.java | Constructs InvertedIndexDistinctOperator when eligible and opted-in. |
| pinot-core/src/main/java/org/apache/pinot/core/operator/query/InvertedIndexDistinctOperator.java | New operator implementing sorted/bitmap/scan paths with heuristic selection and null handling. |
| pinot-core/src/test/java/org/apache/pinot/queries/InvertedIndexDistinctCostHeuristicTest.java | Unit tests for heuristic boundary behavior and override. |
| pinot-core/src/test/java/org/apache/pinot/queries/InvertedIndexDistinctMultiValueTest.java | Unit tests for MV columns (filter/order/limit/empty result) and scan parity. |
| pinot-core/src/test/java/org/apache/pinot/queries/InvertedIndexDistinctNullHandlingTest.java | Unit tests for null inclusion/exclusion and scan parity under null handling. |
| pinot-core/src/test/java/org/apache/pinot/queries/InvertedIndexDistinctSortedColumnTest.java | Unit tests verifying sorted-index path selection and correctness. |
| pinot-core/src/test/java/org/apache/pinot/queries/InvertedIndexDistinctStringTest.java | Unit tests for STRING columns with inverted index and scan parity. |
| pinot-integration-tests/src/test/java/org/apache/pinot/integration/tests/OfflineClusterIntegrationTest.java | Adds an integration test exercising DISTINCT on an inverted-indexed column with/without filter. |
| pinot-perf/src/main/java/org/apache/pinot/perf/BenchmarkInvertedIndexDistinct.java | Adds JMH benchmark comparing sorted vs inverted-index vs scan behavior. |
Force-pushed from 179e9f6 to 861a9ce.
Force-pushed from 861a9ce to 1a07567.
Force-pushed from 1a07567 to 1734ec6.
    // ==================== Scan Path (Fallback) ====================

    private DistinctResultsBlock executeScanPath() {
Since we already have the filtered bitmap, short-circuit the project operator and directly read the filtered docs
Force-pushed from 1749c73 to 65a4371.
Updated JMH Benchmark Results
Setup: 1M docs, 4 cardinalities × 5 selectivities × 3 paths
Sorted index path (μs/op, lower is better)
Bitmap inverted index path (μs/op)
Scan path (μs/op)
Cost heuristic crossover validation
Key takeaways
Adds a new execution path for single-column DISTINCT queries that leverages dictionary + inverted index to avoid the scan/projection pipeline entirely. Three paths chosen at runtime: sorted index (merge-iterate filter vs contiguous ranges), bitmap inverted index (intersects() per dictionary entry), and scan fallback.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
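The bitmap inverted index path mentioned in this commit message can be sketched as follows. This is an illustration only: it uses `java.util.BitSet` as a stand-in for the RoaringBitmap types the operator works with, and the class and method names are hypothetical. Each dictionary entry has a posting list of doc IDs, and a value is part of the DISTINCT result if its posting list shares at least one doc with the filter.

```java
import java.util.ArrayList;
import java.util.BitSet;
import java.util.List;

public class IntersectsDistinctSketch {
  /**
   * Returns the dictIds whose posting list overlaps the filtered doc set.
   * postingLists[dictId] holds the docIds that contain that dictionary value.
   */
  static List<Integer> distinctDictIds(BitSet[] postingLists, BitSet filteredDocIds) {
    List<Integer> matching = new ArrayList<>();
    for (int dictId = 0; dictId < postingLists.length; dictId++) {
      // intersects() only answers "is there any overlap?", so it can return
      // as soon as it finds one and never materializes the full AND result.
      if (postingLists[dictId].intersects(filteredDocIds)) {
        matching.add(dictId);
      }
    }
    return matching;
  }

  /** Small helper to build a bit set from doc IDs. */
  static BitSet bits(int... docIds) {
    BitSet b = new BitSet();
    for (int d : docIds) {
      b.set(d);
    }
    return b;
  }

  public static void main(String[] args) {
    BitSet[] postingLists = {bits(0, 3), bits(1, 4), bits(2, 5)};
    BitSet filter = bits(3, 5);
    System.out.println(distinctDictIds(postingLists, filter)); // [0, 2]
  }
}
```

The work here scales with dictionary cardinality rather than with the number of filtered docs, which is why a cost heuristic is needed to decide when this path beats a scan.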
Force-pushed from 65a4371 to bb6f548.
Exploit sorted dictionary ordering to enable early termination in both the inverted index and sorted index paths. For ORDER BY ASC, iterate forward and stop after finding LIMIT matching values. For ORDER BY DESC, iterate backward. This avoids scanning all dictionary entries when only the top-N values are needed. Benchmark shows 10x-100,000x speedup for LIMIT queries depending on dictionary cardinality and filter selectivity (e.g., 100K cardinality at 100% selectivity: 4,007us -> 0.037us with LIMIT 10).

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
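A minimal sketch of the early-termination idea, assuming a sorted dictionary so that dictId order equals value order. As before, `java.util.BitSet` stands in for the real bitmap types, and all names are hypothetical rather than taken from the operator.

```java
import java.util.ArrayList;
import java.util.BitSet;
import java.util.List;

public class EarlyTerminationSketch {
  /**
   * Walks dictIds in the requested order and stops once `limit` matches are found.
   * Because dictId order equals value order, the first `limit` matches are
   * exactly the top-N values for ORDER BY ... LIMIT N.
   */
  static List<Integer> topDictIds(BitSet[] postingLists, BitSet filteredDocIds,
      boolean ascending, int limit) {
    List<Integer> result = new ArrayList<>();
    int n = postingLists.length;
    for (int i = 0; i < n && result.size() < limit; i++) {
      // ASC walks 0..n-1, DESC walks n-1..0.
      int dictId = ascending ? i : n - 1 - i;
      if (postingLists[dictId].intersects(filteredDocIds)) {
        result.add(dictId);
      }
    }
    return result;
  }

  static BitSet bits(int... docIds) {
    BitSet b = new BitSet();
    for (int d : docIds) {
      b.set(d);
    }
    return b;
  }

  public static void main(String[] args) {
    // Five dictionary values, each present in exactly one doc.
    BitSet[] postingLists = {bits(0), bits(1), bits(2), bits(3), bits(4)};
    BitSet filter = bits(1, 2, 4);
    System.out.println(topDictIds(postingLists, filter, true, 2));  // [1, 2]
    System.out.println(topDictIds(postingLists, filter, false, 2)); // [4, 2]
  }
}
```

With a small LIMIT and a permissive filter the loop exits after a handful of dictionary entries, which is consistent with the large speedups the commit message reports at high cardinality.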
    // Process null handling: exclude null docs from filter and determine if nulls are present
    NullFilterResult nullResult = processNullDocs(filteredDocIds);
    ImmutableRoaringBitmap nonNullFilteredDocIds = nullResult._nonNullFilteredDocIds;
When null handling is enabled and the filtered docs contain nulls, this operator never reserves a slot for null while collecting dictIds (DictIdDistinctTable never calls addNull()). If the query has a LIMIT and no ORDER BY, this can fill the table to LIMIT dictIds and later cause the broker-side toResultTableWithoutOrderBy() logic to drop null (it only includes null when numValues < limit). Consider calling dictIdTable.addNull() as soon as nullResult indicates nulls are present (before addDictId loop), or otherwise ensure at most (limit-1) non-null values are collected when null is present.
    if (nullResult._hasNull) {
      // Reserve one slot in the distinct table for null so that at most (limit - 1) non-null values are collected
      dictIdTable.addNull();
    }
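The effect of the suggestion can be illustrated with a small standalone sketch. It does not use `DictIdDistinctTable`; the class and method below are hypothetical and only model the counting rule the reviewer describes: when null is present, at most `limit - 1` non-null values are collected so that null still fits under the limit.

```java
import java.util.ArrayList;
import java.util.List;

public class NullSlotSketch {
  /**
   * Collects up to `limit` distinct entries from already-matching dictIds.
   * When hasNull is true, one slot is reserved for null (modelled as a Java null).
   */
  static List<Integer> collect(int[] matchingDictIds, boolean hasNull, int limit) {
    List<Integer> out = new ArrayList<>();
    if (limit <= 0) {
      return out;
    }
    // Reserve the null slot up front instead of hoping there is room left later.
    int capacityForValues = hasNull ? limit - 1 : limit;
    for (int dictId : matchingDictIds) {
      if (out.size() >= capacityForValues) {
        break;
      }
      out.add(dictId);
    }
    if (hasNull) {
      out.add(null);
    }
    return out;
  }

  public static void main(String[] args) {
    int[] dictIds = {0, 1, 2};
    System.out.println(collect(dictIds, true, 3));  // [0, 1, null]
    System.out.println(collect(dictIds, false, 3)); // [0, 1, 2]
  }
}
```

Without the reservation, the first call would fill all three slots with dictIds and null would be dropped, which is the failure mode described in the review comment.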
    @Test
    public void testNullIncludedWithWideFilter() {
      _activeSegment = _nullSegment;

      BaseOperator<DistinctResultsBlock> op = getOperator(
          "SELECT DISTINCT intColumn FROM testTable WHERE filterColumn >= 0 LIMIT 1000 "
              + OPT + ", invertedIndexDistinctCostRatio=1, enableNullHandling=true)");
      DistinctTable table = op.nextBlock().getDistinctTable();
The null-handling tests validate server-side DistinctTable contents, but they don’t exercise broker-side limiting behavior (DistinctTable#toResultTableWithoutOrderBy only includes null when numValues < limit). Please add a regression test that calls toResultTable() (or otherwise verifies final results) for a query with nulls, no ORDER BY, and LIMIT equal to the number of non-null distinct values so that null must be preserved under the limit.
Summary
Adds `InvertedIndexDistinctOperator`, a new execution path for single-column `DISTINCT` queries that leverages dictionary + inverted index to avoid the scan/projection pipeline entirely. Enabled via the existing `useIndexBasedDistinctOperator=true` query option (shared with `JsonIndexDistinctOperator`).

Sample queries

Three execution paths, chosen automatically at runtime:
- Sorted index path
- Bitmap inverted index path (chosen when `dictCard × costRatio ≤ filteredDocs`): per-dictionary-entry `intersects()` checks
- Scan fallback

Key design decisions
- `intersects()` over `and().isEmpty()`: short-circuits on the first common element and avoids full bitmap allocation.
- Sorted index path: uses `PeekableIntIterator.advanceIfNeeded()` to merge-iterate the filter bitmap against contiguous doc ranges, so no bitmap intersection is needed.
- Early termination for ORDER BY + LIMIT: `ORDER BY col ASC LIMIT N` only needs the first N matching dictIds (forward iteration), and `ORDER BY col DESC LIMIT N` iterates backward. This avoids scanning all dictionary entries, yielding 10x–100,000x speedups on LIMIT queries.
- Filter bitmap reuse: `buildFilteredDocIds()` materializes the filter bitmap upfront. When the scan fallback is chosen, the bitmap is wrapped in a `BitmapBasedFilterOperator` and passed to the `ProjectPlanNode`, avoiding redundant filter re-evaluation.
- Sorted dictionary required (`isSorted()` check in `DistinctPlanNode`): excludes mutable/consuming segments whose unsorted dictionaries would break `DictIdDistinctTable`'s assumption that dictId order = value order (required for correct ORDER BY pruning).
- Null handling: when `nullHandlingEnabled` is set, excludes null placeholder values from dictionary iteration and checks the null value vector separately against the filter bitmap to include null in results.
- DictId accumulation: collects results in `DictIdDistinctTable` (integer dictIds with dictionary-order comparator for ORDER BY), then converts to typed values once at the end, the same pattern as `DictionaryBasedSingleColumnDistinctExecutor`.

Query options
- `useIndexBasedDistinctOperator=true`: enables the operator (opt-in, shared with `JsonIndexDistinctOperator`)
- `invertedIndexDistinctCostRatio=N`: overrides the per-cardinality default for the bitmap path heuristic

Changes
- `InvertedIndexDistinctOperator.java`
- `DistinctPlanNode.java`
- `DictIdDistinctTable.java`
- `QueryOptionsUtils.java`: `invertedIndexDistinctCostRatio` query option
- `CommonConstants.java`: `invertedIndexDistinctCostRatio` query option key
- `BenchmarkInvertedIndexDistinct.java`

JMH benchmark results
Setup: 1M docs, 4 cardinalities × 5 selectivities × 3 paths, `@Fork(value=2, warmups=0)`, `@Warmup(iterations=2)`, `@Measurement(iterations=3)`, JDK 17

Sorted index path (μs/op, lower is better)
Bitmap inverted index path (μs/op)
Scan path (μs/op)
ORDER BY + LIMIT 10 early termination (μs/op)
Exploits dictId order = value order: ASC iterates forward, DESC iterates backward, both stop after finding 10 matching values.

Key takeaways
- The gain is largest when `filteredDocs >> dictCard` (e.g., `dictCard=100` at 10%+ selectivity: 1.5 μs vs 2,310 μs = 1,540x faster).

Test plan
- `InvertedIndexDistinctOperatorTest`: 17 tests covering cost heuristic boundaries, inverted vs scan correctness, MV columns, STRING columns, sorted column path, ORDER BY/LIMIT, null handling, and all data types
- `OfflineClusterIntegrationTest.testDistinctWithInvertedIndex`: integration test
- `BenchmarkInvertedIndexDistinct` (3 paths × 20 param combos + 3 LIMIT variants)

🤖 Generated with Claude Code
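For the sorted index path described in the summary, the merge-iteration can be sketched as below. This is an assumption-laden illustration: on a sorted column each dictId is taken to own one contiguous doc range, `java.util.BitSet.nextSetBit()` stands in for RoaringBitmap's `PeekableIntIterator.advanceIfNeeded()`, and the class, method, and array names are hypothetical.

```java
import java.util.ArrayList;
import java.util.BitSet;
import java.util.List;

public class SortedRangeMergeSketch {
  /**
   * rangeStart[dictId]..rangeEnd[dictId] (inclusive) is the contiguous doc range
   * of each dictionary value; ranges are ascending and non-overlapping.
   * Returns the dictIds whose range contains at least one filtered doc.
   */
  static List<Integer> distinctDictIds(int[] rangeStart, int[] rangeEnd, BitSet filteredDocIds) {
    List<Integer> matching = new ArrayList<>();
    int doc = filteredDocIds.nextSetBit(0); // current position in the filter
    for (int dictId = 0; dictId < rangeStart.length && doc >= 0; dictId++) {
      // Skip filter docs that fall before this range (the "advance if needed" step).
      if (doc < rangeStart[dictId]) {
        doc = filteredDocIds.nextSetBit(rangeStart[dictId]);
      }
      // If the next filtered doc lands inside the range, the value is present.
      if (doc >= 0 && doc <= rangeEnd[dictId]) {
        matching.add(dictId);
      }
    }
    return matching;
  }

  public static void main(String[] args) {
    int[] start = {0, 3, 6};
    int[] end = {2, 5, 9};
    BitSet filter = new BitSet();
    filter.set(4);
    filter.set(7);
    System.out.println(distinctDictIds(start, end, filter)); // [1, 2]
  }
}
```

Both the ranges and the filter are consumed in a single forward pass, so no per-value bitmap intersection is performed.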