Support consuming tier overrides for realtime consuming segments#18480
Support consuming tier overrides for realtime consuming segments#18480xiangfu0 wants to merge 1 commit into
Conversation
0de8a19 to
ee9297c
Compare
Codecov Report❌ Patch coverage is Additional details and impacted files@@ Coverage Diff @@
## master #18480 +/- ##
============================================
+ Coverage 63.68% 63.70% +0.01%
Complexity 1685 1685
============================================
Files 3265 3266 +1
Lines 199745 199935 +190
Branches 31013 31050 +37
============================================
+ Hits 127215 127367 +152
- Misses 62395 62414 +19
- Partials 10135 10154 +19
Flags with carried forward coverage won't be shown. Click here to find out more. ☔ View full report in Codecov by Sentry. 🚀 New features to boost your workflow:
|
ee9297c to
952a398
Compare
There was a problem hiding this comment.
Pull request overview
Note
Copilot was unable to run its full agentic suite in this review.
Adds support for applying tierOverwrites.consuming as a synthetic tier when constructing mutable realtime consuming segments, without changing immutable/committed segment tier semantics.
Changes:
- Build
RealtimeSegmentConfigfor consuming segments from a tier-overwritten table-config view whentierOverwrites.consumingis present andconsumingis not a real storage tier. - Prevent synthetic consuming overrides from affecting storage-tier index loading (
IndexLoadingConfig). - Add unit/integration tests plus a tools example config + README walkthrough.
Reviewed changes
Copilot reviewed 11 out of 11 changed files in this pull request and generated 5 comments.
Show a summary per file
| File | Description |
|---|---|
| pinot-tools/src/main/resources/examples/stream/consumingSegmentTierOverride/userEvents_realtime_table_config.json | Adds a sample realtime table config demonstrating consuming-tier overrides. |
| pinot-tools/src/main/resources/examples/stream/consumingSegmentTierOverride/README.md | Documents how to use tierOverwrites.consuming and its constraints. |
| pinot-segment-local/src/test/java/org/apache/pinot/segment/local/utils/TableConfigUtilsTest.java | Ensures consuming remains valid as a real storage tier name. |
| pinot-segment-local/src/test/java/org/apache/pinot/segment/local/utils/TableConfigConsumingSegmentTierOverrideTest.java | Adds unit tests for consuming-tier override behavior and validation rules. |
| pinot-segment-local/src/test/java/org/apache/pinot/segment/local/segment/index/loader/IndexLoadingConfigTest.java | Tests synthetic consuming tier is not applied as a storage tier, while real tiers still are. |
| pinot-segment-local/src/main/java/org/apache/pinot/segment/local/utils/TableConfigUtils.java | Implements consuming-tier override detection, validation, and config builder for consuming segments. |
| pinot-segment-local/src/main/java/org/apache/pinot/segment/local/segment/index/loader/IndexLoadingConfig.java | Skips tier overwrite application for the synthetic consuming tier during storage-tier loading. |
| pinot-segment-local/src/main/java/org/apache/pinot/segment/local/realtime/writer/StatelessRealtimeSegmentWriter.java | Keeps stateless reingestion on the non-consuming override path (comment clarifies rationale). |
| pinot-integration-tests/src/test/java/org/apache/pinot/integration/tests/custom/CustomDataQueryClusterIntegrationTest.java | Exposes shared server starters to support segment inspection in new tests. |
| pinot-integration-tests/src/test/java/org/apache/pinot/integration/tests/custom/ConsumingSegmentTierOverrideRealtimeTest.java | End-to-end test verifying consuming segments get extra indexes while immutable segments keep persisted shape. |
| pinot-core/src/main/java/org/apache/pinot/core/data/manager/realtime/RealtimeSegmentDataManager.java | Switches consuming segment builder creation to the new helper that applies synthetic consuming overrides. |
155c851 to
534d356
Compare
534d356 to
dd7fe99
Compare
xiangfu0
left a comment
There was a problem hiding this comment.
Found one high-signal correctness issue; see inline comment.
| .setAggregateMetrics(indexingConfig.isAggregateMetrics()) | ||
| .setIngestionAggregationConfigs(IngestionConfigUtils.getAggregationConfigs(tableConfig)) | ||
| .setDefaultNullHandlingEnabled(_defaultNullHandlingEnabled) | ||
| .setAggregateMetrics(consumingSegmentIndexingConfig.isAggregateMetrics()) |
There was a problem hiding this comment.
This now treats tierOverwrites.consuming.aggregateMetrics as a mutable-segment setting, but aggregateMetrics changes row semantics, not just index layout. MutableSegmentImpl will collapse rows while the segment is consuming, and RealtimeSegmentConverter then writes those already-collapsed rows into the immutable segment even though the persisted table config still has aggregateMetrics=false. A consuming-only override can therefore permanently change query results after commit; this path should reject or ignore non-index knobs like aggregateMetrics.
Summary
tierOverwritesmechanism for mutable realtime consuming segments by applying the synthetictierOverwrites.consumingview when building the consumingRealtimeSegmentConfig.completedalways maps to the base table config, whileconsumingis the synthetic mutable-consuming view unless a legacy real storage tier already owns that name.tierConfigsentry. UnknowntierOverwrites.<tier>keys are rejected, andtierOverwrites.completedis rejected because completed segments use the base table config.consumingview, including field index configs, null handling, aggregation, multi-column text index config, and segment partition config.User Manual
Configure the completed/committed segment shape as the normal table config, then add
tierOverwrites.consumingwhere the mutable consuming segment should differ.Example: keep
userIdRAW and without an inverted index after commit, but use dictionary encoding plus an inverted index while the segment is consuming:{ "tableIndexConfig": { "noDictionaryColumns": ["userId"], "tierOverwrites": { "consuming": { "noDictionaryColumns": [] } } }, "fieldConfigList": [ { "name": "userId", "encodingType": "RAW", "tierOverwrites": { "consuming": { "encodingType": "DICTIONARY", "indexes": { "inverted": { "enabled": true } } } } } ] }Query example:
Notes:
tierOverwrites.completed.tierConfigs.tierConfigsentry namedconsumingfor this feature.consumingas a real storage tier name, Pinot preserves that existing storage-tier behavior and does not treattierOverwrites.consumingas the synthetic mutable-consuming override for that table.tableIndexConfig.noDictionaryColumnsornoDictionaryConfig, clear that setting undertableIndexConfig.tierOverwrites.consumingso the consuming view can enable dictionary encoding.Validation
./mvnw -pl pinot-segment-local -Dtest=IndexLoadingConfigTest,TableConfigConsumingSegmentTierOverrideTest,TableConfigUtilsTest -Dsurefire.failIfNoSpecifiedTests=false test./mvnw -pl pinot-segment-local spotless:apply license:format checkstyle:check license:check./mvnw -pl pinot-integration-tests -am -Dskip.npm=true -Dtest=ConsumingSegmentTierOverrideRealtimeTest -Dsurefire.failIfNoSpecifiedTests=false -Dmaven.build.cache.enabled=false testgit diff --check