Default Stats Send Mode SAFE -> ALWAYS #18367

Merged
Jackie-Jiang merged 4 commits into apache:master from satwik-pachigolla:patch-1
May 13, 2026

Conversation

@satwik-pachigolla
Contributor

@satwik-pachigolla satwik-pachigolla commented Apr 29, 2026

Summary

  • Pinot 1.5 is out
  • Pinot clusters should not be upgraded more than one minor version at a time: they should not jump from < 1.4 straight to > 1.4, and must upgrade through 1.4 first
  • ALWAYS is strictly better than SAFE unless involving servers < 1.4

Improvements

This further mitigates #15890, which was only partially mitigated in #15895 (comment): big clusters whose operators haven't come across this setting still risk ZK instability.

Detailed Explanation

AI powered summary here.

In short,

ALWAYS with no break: All queries optimized > SAFE (no stats)
ALWAYS with break: Average(optimized, slow) > SAFE (no stats)
So yes, ALWAYS is strictly better than SAFE in 1.4.0+ even during a rolling upgrade!
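
To make the comparison concrete, here is a minimal sketch of the two modes. It is a simplified model and not Pinot's actual SendStatsPredicate: the class name, the method, and the representation of server versions as 1.x minor numbers are all hypothetical, and it assumes SAFE suppresses stats whenever any server older than 1.4 is present.

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical, simplified model of the two modes; NOT Pinot's SendStatsPredicate.
final class SendStatsSketch {
  enum Mode { SAFE, ALWAYS }

  // Assumption: 1.4 is the first minor version that handles upstream stats safely.
  static final int FIRST_COMPATIBLE_MINOR = 4;

  // serverMinorVersions holds the minor number of each 1.x server, e.g. 3 for 1.3.x.
  static boolean shouldSendStats(Mode mode, List<Integer> serverMinorVersions) {
    switch (mode) {
      case ALWAYS:
        // No cluster-wide version check at all.
        return true;
      case SAFE:
        // Send only when every known server is on a compatible version.
        return serverMinorVersions.stream().allMatch(v -> v >= FIRST_COMPATIBLE_MINOR);
      default:
        throw new IllegalArgumentException("Unknown mode: " + mode);
    }
  }

  public static void main(String[] args) {
    List<Integer> upgradedThrough14 = Arrays.asList(4, 5, 5);
    List<Integer> skipped14 = Arrays.asList(3, 5, 5);

    System.out.println(shouldSendStats(Mode.SAFE, upgradedThrough14)); // true
    System.out.println(shouldSendStats(Mode.SAFE, skipped14));         // false
    System.out.println(shouldSendStats(Mode.ALWAYS, skipped14));       // true
  }
}
```

The last case is the one this PR leaves unprotected by default: with ALWAYS, stats still go out when a pre-1.4 server is in the cluster, which is why the one-minor-version-at-a-time rule matters.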

Compatibility

  • Unsafe to upgrade from Pinot servers with versions < 1.4 (this commit is on 1.5); doing so may lead to query errors during the upgrade
  • Safe to upgrade from Pinot servers with versions >= 1.4
  • This only changes the default, avoiding affecting any explicit configurations.
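
As a rough illustration of the last point, the sketch below resolves the mode from a config map with a fallback default. The key and the new default value are the ones touched by this PR; the class name, the `resolveMode` helper, and the use of `java.util.Properties` as a stand-in for Pinot's own configuration classes are assumptions made for the example.

```java
import java.util.Properties;

// Hypothetical stand-in for Pinot's config lookup; only the key and default come from this PR.
final class StatsModeConfigSketch {
  static final String KEY_OF_SEND_STATS_MODE = "pinot.query.mse.stats.mode";
  static final String DEFAULT_SEND_STATS_MODE = "ALWAYS";

  // An explicitly configured value wins; the default applies only when the key is absent.
  static String resolveMode(Properties props) {
    return props.getProperty(KEY_OF_SEND_STATS_MODE, DEFAULT_SEND_STATS_MODE);
  }

  public static void main(String[] args) {
    Properties unset = new Properties();

    Properties pinned = new Properties();
    pinned.setProperty(KEY_OF_SEND_STATS_MODE, "SAFE");

    System.out.println(resolveMode(unset));  // ALWAYS
    System.out.println(resolveMode(pinned)); // SAFE
  }
}
```

Under this model, an operator who must still roll through servers older than 1.4 could keep the old behavior by setting `pinot.query.mse.stats.mode` to `SAFE` before upgrading.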

@satwik-pachigolla satwik-pachigolla marked this pull request as ready for review April 29, 2026 03:43
@satwik-pachigolla
Contributor Author

cc @dang-stripe

@satwik-pachigolla
Contributor Author

@gortiz @suvodeep-pyne please add the upgrade-incompat label as a signal, in case someone does intend to upgrade more than one minor version at once

@codecov-commenter

codecov-commenter commented Apr 29, 2026

Codecov Report

✅ All modified and coverable lines are covered by tests.
✅ Project coverage is 63.69%. Comparing base (5b0b38c) to head (a138d5e).
⚠️ Report is 75 commits behind head on master.

Additional details and impacted files
@@             Coverage Diff              @@
##             master   #18367      +/-   ##
============================================
+ Coverage     63.43%   63.69%   +0.25%     
- Complexity     1683     1684       +1     
============================================
  Files          3253     3262       +9     
  Lines        198841   199835     +994     
  Branches      30795    31034     +239     
============================================
+ Hits         126136   127282    +1146     
+ Misses        62625    62407     -218     
- Partials      10080    10146      +66     
Flag Coverage Δ
custom-integration1 100.00% <ø> (ø)
integration 100.00% <ø> (ø)
integration1 100.00% <ø> (ø)
integration2 0.00% <ø> (ø)
java-21 63.69% <ø> (+0.25%) ⬆️
temurin 63.69% <ø> (+0.25%) ⬆️
unittests 63.69% <ø> (+0.25%) ⬆️
unittests1 55.77% <ø> (+0.41%) ⬆️
unittests2 34.95% <ø> (-0.05%) ⬇️

Contributor

@xiangfu0 xiangfu0 left a comment

Found one high-signal compatibility issue; see inline comment.

  /// running 1.3.0 may fail, which breaks backward compatibility.
  public static final String KEY_OF_SEND_STATS_MODE = "pinot.query.mse.stats.mode";
- public static final String DEFAULT_SEND_STATS_MODE = "SAFE";
+ public static final String DEFAULT_SEND_STATS_MODE = "ALWAYS";
Contributor

Changing the default here bypasses the SAFE compatibility check for every cluster that never set pinot.query.mse.stats.mode. SendStatsPredicate still documents that 1.3.x and lower can return incorrect stats or fail when unexpected upstream stats arrive, so this turns mixed-version rollouts into a behavior-breaking default change. This needs an explicit migration boundary or rollout plan instead of flipping the default constant.

Contributor Author

@satwik-pachigolla satwik-pachigolla Apr 30, 2026

I don't think there's a safe way to do this as an explicit migration boundary. It would require coordination between nodes on old versions (which we can't change) and nodes on new versions. Otherwise we'd need to rely on more stable mechanisms, such as ZK metadata that already existed as of <= 1.3, none of which I think are suitable here.

#15890 is a documented case of how using ZK watchers led to more instability and the partial fix PR comments also mention that we should go to default ALWAYS eventually.

I think the existing ZK risk >> the risk of an MSE user upgrading from <= 1.3 to >= 1.5 without seeing this PR if we label it

I updated the PR description to make this more clear.

cc @Jackie-Jiang

@satwik-pachigolla
Contributor Author

@Jackie-Jiang @xiangfu0 Is there any further discussion needed here or can we merge?

Contributor

@gortiz gortiz left a comment

I think we can merge this change. My only recommendation is to remove the @Deprecated annotation on SAFE. It should not be the default, but it may be needed in the future.

Anyway, I'm working on another way to send the stats that should be more resilient (see #18458)

@Jackie-Jiang Jackie-Jiang added the upgrade-incompat label (PR may introduce incompatibility during upgrade of an installation) May 12, 2026
@satwik-pachigolla satwik-pachigolla requested a review from gortiz May 12, 2026 20:05
@Jackie-Jiang Jackie-Jiang merged commit e9518a3 into apache:master May 13, 2026
10 of 11 checks passed
@xiangfu0
Contributor

Docs update merged: pinot-contrib/pinot-docs#806 documents the new pinot.query.mse.stats.mode default and the mixed-version upgrade caveat.

Labels

upgrade-incompat PR may introduce incompatibility during upgrade of an installation

5 participants