Support consuming tier overrides for realtime consuming segments #18480

Open
xiangfu0 wants to merge 1 commit into apache:master from xiangfu0:codex/consuming-tier-overrides-master

@xiangfu0 xiangfu0 commented May 12, 2026

Summary

  • Reuse the existing tierOverwrites mechanism for mutable realtime consuming segments by applying the synthetic tierOverwrites.consuming view when building the consuming RealtimeSegmentConfig.
  • Treat lifecycle names explicitly: completed always maps to the base table config, while consuming is the synthetic mutable-consuming view unless a legacy real storage tier already owns that name.
  • Apply real storage-tier overrides only when the segment tier matches a configured tierConfigs entry. Unknown tierOverwrites.<tier> keys are rejected, and tierOverwrites.completed is rejected because completed segments use the base table config.
  • Source mutable-segment settings from the effective consuming view, including field index configs, null handling, aggregation, multi-column text index config, and segment partition config.
  • Keep committed realtime segments, immutable segment load, and offline tables on the persisted table config unless an actual configured storage tier applies its own override.
  • Validate tier overrides generically: collect every overridden tier name, apply the override first, then run the normal table config validation path on the effective table config. There is no separate consuming-tier validation allowlist.
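
The naming rules above can be sketched as a small classifier. This is a minimal illustration under stated assumptions, not the code in TableConfigUtils: the class `TierOverrideNames`, the enum `Kind`, and the method `classify` are hypothetical names, and the set of storage tier names stands in for the names found in tierConfigs.

```java
import java.util.Set;

public class TierOverrideNames {
  /** How a tierOverwrites key is interpreted in this sketch. */
  public enum Kind { STORAGE_TIER, SYNTHETIC_CONSUMING }

  /**
   * Classifies one tierOverwrites key, or throws if the key is not allowed.
   * storageTierNames is assumed to hold the names configured in tierConfigs.
   */
  public static Kind classify(String overrideName, Set<String> storageTierNames) {
    // "completed" always means the base table config, so an override is rejected.
    if ("completed".equals(overrideName)) {
      throw new IllegalArgumentException("tierOverwrites.completed is not allowed");
    }
    // A configured storage tier owns its name, including a legacy tier named "consuming".
    if (storageTierNames.contains(overrideName)) {
      return Kind.STORAGE_TIER;
    }
    // Otherwise "consuming" is the synthetic mutable-consuming view.
    if ("consuming".equals(overrideName)) {
      return Kind.SYNTHETIC_CONSUMING;
    }
    throw new IllegalArgumentException("Unknown tier in tierOverwrites: " + overrideName);
  }

  public static void main(String[] args) {
    Set<String> tiers = Set.of("coldTier");
    System.out.println(classify("consuming", tiers));
    System.out.println(classify("coldTier", tiers));
  }
}
```

Checking `completed` before the storage-tier lookup mirrors the statement that completed always maps to the base table config; the real implementation may order or report these checks differently.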

User Manual

Configure the completed/committed segment shape as the normal table config, then add tierOverwrites.consuming where the mutable consuming segment should differ.

Example: keep userId RAW-encoded with no inverted index after commit, but use dictionary encoding plus an inverted index while the segment is consuming:

{
  "tableIndexConfig": {
    "noDictionaryColumns": ["userId"],
    "tierOverwrites": {
      "consuming": {
        "noDictionaryColumns": []
      }
    }
  },
  "fieldConfigList": [
    {
      "name": "userId",
      "encodingType": "RAW",
      "tierOverwrites": {
        "consuming": {
          "encodingType": "DICTIONARY",
          "indexes": {
            "inverted": {
              "enabled": true
            }
          }
        }
      }
    }
  ]
}

Query example:

SELECT COUNT(*)
FROM userEvents
WHERE userId = 'u123';

Notes:

  • Use the base table config for completed/committed segment behavior. Do not configure tierOverwrites.completed.
  • Other storage-tier override names must match entries in tierConfigs.
  • You normally do not need a tierConfigs entry named consuming for this feature.
  • If a table already uses consuming as a real storage tier name, Pinot preserves that existing storage-tier behavior and does not treat tierOverwrites.consuming as the synthetic mutable-consuming override for that table.
  • The persisted table config remains the source of truth for committed realtime segments, immutable segment load, and offline tables unless a real storage tier later applies its own override.
  • Both the persisted table config and every effective tier-overwritten view must be valid under normal table config validation.
  • If a column is listed in the persisted tableIndexConfig.noDictionaryColumns or noDictionaryConfig, clear that setting under tableIndexConfig.tierOverwrites.consuming so the consuming view can enable dictionary encoding.
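
To make the last note concrete, here is a toy model of how an effective view could be derived from the persisted config. It assumes a shallow overlay where each key under tierOverwrites.<tier> replaces the same key in the base section; the class `EffectiveViewSketch` and the method `effectiveView` are hypothetical, and Pinot's actual merge logic may differ in detail.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class EffectiveViewSketch {
  /**
   * Returns a copy of the base section with the named tier's override keys
   * replacing the base keys. The base map is not modified. Dropping the
   * tierOverwrites key from the result is a choice made for this sketch.
   */
  @SuppressWarnings("unchecked")
  public static Map<String, Object> effectiveView(Map<String, Object> base, String tier) {
    Map<String, Object> view = new LinkedHashMap<>(base);
    Object overwrites = view.remove("tierOverwrites");
    if (overwrites instanceof Map) {
      Object forTier = ((Map<String, Object>) overwrites).get(tier);
      if (forTier instanceof Map) {
        view.putAll((Map<String, Object>) forTier);
      }
    }
    return view;
  }

  public static void main(String[] args) {
    // Mirrors the tableIndexConfig fragment from the example above.
    Map<String, Object> tableIndexConfig = Map.of(
        "noDictionaryColumns", List.of("userId"),
        "tierOverwrites", Map.of("consuming", Map.of("noDictionaryColumns", List.of())));
    System.out.println(effectiveView(tableIndexConfig, "consuming"));
  }
}
```

With the example config, the consuming view ends up with an empty noDictionaryColumns list, so the fieldConfigList override to DICTIONARY does not conflict with it, while any other tier name still sees userId as a no-dictionary column.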

Validation

  • ./mvnw -pl pinot-segment-local -Dtest=IndexLoadingConfigTest,TableConfigConsumingSegmentTierOverrideTest,TableConfigUtilsTest -Dsurefire.failIfNoSpecifiedTests=false test
  • ./mvnw -pl pinot-segment-local spotless:apply license:format checkstyle:check license:check
  • ./mvnw -pl pinot-integration-tests -am -Dskip.npm=true -Dtest=ConsumingSegmentTierOverrideRealtimeTest -Dsurefire.failIfNoSpecifiedTests=false -Dmaven.build.cache.enabled=false test
  • git diff --check
  • Local agent code review: no critical/high/major issues found.

@xiangfu0 xiangfu0 force-pushed the codex/consuming-tier-overrides-master branch from 0de8a19 to ee9297c Compare May 12, 2026 22:37
codecov-commenter commented May 12, 2026

Codecov Report

❌ Patch coverage is 83.06452% with 21 lines in your changes missing coverage. Please review.
✅ Project coverage is 63.70%. Comparing base (c35dee5) to head (dd7fe99).
⚠️ Report is 6 commits behind head on master.

| Files with missing lines | Patch % | Lines |
|---|---|---|
| ...he/pinot/segment/local/utils/TableConfigUtils.java | 82.40% | 6 Missing and 13 partials ⚠️ |
| ...ealtime/writer/StatelessRealtimeSegmentWriter.java | 0.00% | 2 Missing ⚠️ |

Additional details and impacted files
@@             Coverage Diff              @@
##             master   #18480      +/-   ##
============================================
+ Coverage     63.68%   63.70%   +0.01%     
  Complexity     1685     1685              
============================================
  Files          3265     3266       +1     
  Lines        199745   199935     +190     
  Branches      31013    31050      +37     
============================================
+ Hits         127215   127367     +152     
- Misses        62395    62414      +19     
- Partials      10135    10154      +19     

| Flag | Coverage Δ |
|---|---|
| custom-integration1 | 100.00% <ø> (ø) |
| integration | 100.00% <ø> (ø) |
| integration1 | 100.00% <ø> (ø) |
| integration2 | 0.00% <ø> (ø) |
| java-21 | 63.70% <83.06%> (+0.01%) ⬆️ |
| temurin | 63.70% <83.06%> (+0.01%) ⬆️ |
| unittests | 63.70% <83.06%> (+0.01%) ⬆️ |
| unittests1 | 55.83% <36.29%> (+0.05%) ⬆️ |
| unittests2 | 34.98% <81.45%> (+0.04%) ⬆️ |


@xiangfu0 xiangfu0 force-pushed the codex/consuming-tier-overrides-master branch from ee9297c to 952a398 Compare May 13, 2026 01:47
@xiangfu0 xiangfu0 added the configuration and real-time labels May 13, 2026

Copilot AI left a comment


Pull request overview

Note

Copilot was unable to run its full agentic suite in this review.

Adds support for applying tierOverwrites.consuming as a synthetic tier when constructing mutable realtime consuming segments, without changing immutable/committed segment tier semantics.

Changes:

  • Build RealtimeSegmentConfig for consuming segments from a tier-overwritten table-config view when tierOverwrites.consuming is present and consuming is not a real storage tier.
  • Prevent synthetic consuming overrides from affecting storage-tier index loading (IndexLoadingConfig).
  • Add unit/integration tests plus a tools example config + README walkthrough.

Reviewed changes

Copilot reviewed 11 out of 11 changed files in this pull request and generated 5 comments.

| File | Description |
|---|---|
| pinot-tools/src/main/resources/examples/stream/consumingSegmentTierOverride/userEvents_realtime_table_config.json | Adds a sample realtime table config demonstrating consuming-tier overrides. |
| pinot-tools/src/main/resources/examples/stream/consumingSegmentTierOverride/README.md | Documents how to use tierOverwrites.consuming and its constraints. |
| pinot-segment-local/src/test/java/org/apache/pinot/segment/local/utils/TableConfigUtilsTest.java | Ensures consuming remains valid as a real storage tier name. |
| pinot-segment-local/src/test/java/org/apache/pinot/segment/local/utils/TableConfigConsumingSegmentTierOverrideTest.java | Adds unit tests for consuming-tier override behavior and validation rules. |
| pinot-segment-local/src/test/java/org/apache/pinot/segment/local/segment/index/loader/IndexLoadingConfigTest.java | Tests synthetic consuming tier is not applied as a storage tier, while real tiers still are. |
| pinot-segment-local/src/main/java/org/apache/pinot/segment/local/utils/TableConfigUtils.java | Implements consuming-tier override detection, validation, and config builder for consuming segments. |
| pinot-segment-local/src/main/java/org/apache/pinot/segment/local/segment/index/loader/IndexLoadingConfig.java | Skips tier overwrite application for the synthetic consuming tier during storage-tier loading. |
| pinot-segment-local/src/main/java/org/apache/pinot/segment/local/realtime/writer/StatelessRealtimeSegmentWriter.java | Keeps stateless reingestion on the non-consuming override path (comment clarifies rationale). |
| pinot-integration-tests/src/test/java/org/apache/pinot/integration/tests/custom/CustomDataQueryClusterIntegrationTest.java | Exposes shared server starters to support segment inspection in new tests. |
| pinot-integration-tests/src/test/java/org/apache/pinot/integration/tests/custom/ConsumingSegmentTierOverrideRealtimeTest.java | End-to-end test verifying consuming segments get extra indexes while immutable segments keep persisted shape. |
| pinot-core/src/main/java/org/apache/pinot/core/data/manager/realtime/RealtimeSegmentDataManager.java | Switches consuming segment builder creation to the new helper that applies synthetic consuming overrides. |

@xiangfu0 xiangfu0 force-pushed the codex/consuming-tier-overrides-master branch 4 times, most recently from 155c851 to 534d356 Compare May 13, 2026 08:46
@xiangfu0 xiangfu0 force-pushed the codex/consuming-tier-overrides-master branch from 534d356 to dd7fe99 Compare May 13, 2026 23:39

@xiangfu0 xiangfu0 left a comment


Found one high-signal correctness issue; see inline comment.

.setAggregateMetrics(indexingConfig.isAggregateMetrics())
.setIngestionAggregationConfigs(IngestionConfigUtils.getAggregationConfigs(tableConfig))
.setDefaultNullHandlingEnabled(_defaultNullHandlingEnabled)
.setAggregateMetrics(consumingSegmentIndexingConfig.isAggregateMetrics())

This now treats tierOverwrites.consuming.aggregateMetrics as a mutable-segment setting, but aggregateMetrics changes row semantics, not just index layout. MutableSegmentImpl will collapse rows while the segment is consuming, and RealtimeSegmentConverter then writes those already-collapsed rows into the immutable segment even though the persisted table config still has aggregateMetrics=false. A consuming-only override can therefore permanently change query results after commit; this path should reject or ignore non-index knobs like aggregateMetrics.
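
A toy illustration of the concern, not Pinot code: the class `AggregateMetricsSketch`, its `Row` type, and the `clicks` metric are hypothetical stand-ins. It assumes that aggregation merges rows sharing the same dimension value by summing the metric, which is enough to show that the stored row count, and therefore COUNT(*), changes.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class AggregateMetricsSketch {
  /** One ingested row: a single dimension (userId) and a single metric (clicks). */
  public static final class Row {
    public final String userId;
    public final long clicks;

    public Row(String userId, long clicks) {
      this.userId = userId;
      this.clicks = clicks;
    }
  }

  /** Returns the rows a toy consuming segment would store for the given input. */
  public static List<Row> ingest(List<Row> incoming, boolean aggregateMetrics) {
    if (!aggregateMetrics) {
      return new ArrayList<>(incoming);
    }
    // Rows with the same dimension value collapse into one row with a summed metric.
    Map<String, Long> sums = new LinkedHashMap<>();
    for (Row row : incoming) {
      sums.merge(row.userId, row.clicks, Long::sum);
    }
    List<Row> stored = new ArrayList<>();
    sums.forEach((userId, clicks) -> stored.add(new Row(userId, clicks)));
    return stored;
  }

  /** Equivalent of SELECT COUNT(*) ... WHERE userId = ? over the stored rows. */
  public static long countWhereUserId(List<Row> stored, String userId) {
    return stored.stream().filter(row -> row.userId.equals(userId)).count();
  }

  public static void main(String[] args) {
    List<Row> input = List.of(new Row("u123", 1), new Row("u123", 2), new Row("u456", 5));
    System.out.println(countWhereUserId(ingest(input, false), "u123"));
    System.out.println(countWhereUserId(ingest(input, true), "u123"));
  }
}
```

In this model the metric sum for u123 is unchanged, but the row count drops from 2 to 1, and anything that copies the stored rows into a committed segment inherits the collapsed shape.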


Labels

configuration: Config changes (addition/deletion/change in behavior)
real-time: Related to realtime table ingestion and serving

3 participants