Default Stats Send Mode SAFE -> ALWAYS#18367
Conversation
|
cc @dang-stripe |
|
@gortiz @suvodeep-pyne please add the |
Codecov Report✅ All modified and coverable lines are covered by tests. Additional details and impacted files@@ Coverage Diff @@
## master #18367 +/- ##
============================================
+ Coverage 63.43% 63.69% +0.25%
- Complexity 1683 1684 +1
============================================
Files 3253 3262 +9
Lines 198841 199835 +994
Branches 30795 31034 +239
============================================
+ Hits 126136 127282 +1146
+ Misses 62625 62407 -218
- Partials 10080 10146 +66
Flags with carried forward coverage won't be shown. Click here to find out more. ☔ View full report in Codecov by Sentry. 🚀 New features to boost your workflow:
|
xiangfu0
left a comment
There was a problem hiding this comment.
Found one high-signal compatibility issue; see inline comment.
| /// running 1.3.0 may fail, which breaks backward compatibility. | ||
| public static final String KEY_OF_SEND_STATS_MODE = "pinot.query.mse.stats.mode"; | ||
| public static final String DEFAULT_SEND_STATS_MODE = "SAFE"; | ||
| public static final String DEFAULT_SEND_STATS_MODE = "ALWAYS"; |
There was a problem hiding this comment.
Changing the default here bypasses the SAFE compatibility check for every cluster that never set pinot.query.mse.stats.mode. SendStatsPredicate still documents that 1.3.x and lower can return incorrect stats or fail when unexpected upstream stats arrive, so this turns mixed-version rollouts into a behavior-breaking default change. This needs an explicit migration boundary or rollout plan instead of flipping the default constant.
There was a problem hiding this comment.
I don't think there's a safe way do this as an explicit migration boundary. This requires coordination between nodes on old versions (which we can't change) and new versions. Otherwise we'd need to use more stable mechanisms of relying on ZK metadata that would have existed as of <=1.3, none of which I think are suitable here.
#15890 is a documented case of how using ZK watchers led to more instability and the partial fix PR comments also mention that we should go to default ALWAYS eventually.
I think the existing ZK risk >> the risk of an MSE user upgrading from <= 1.3 to >= 1.5 without seeing this PR if we label it
I updated the PR description to make this more clear.
|
@Jackie-Jiang @xiangfu0 Is there any further discussion needed here or can we merge? |
gortiz
left a comment
There was a problem hiding this comment.
I think we can merge this change. My only recommendation is to remove the @Deprecated annotation on SAFE. It should not be the default, but it may be needed in the future.
Anyway, I'm working on another way to send the stats that should be more resilient (see #18458)
|
Docs update merged: pinot-contrib/pinot-docs#806 documents the new |
Summary
Improvements
This will further mitigate #15890 which was only partially mitigated in #15895 (comment) where there is still the risk of ZK instability risk for big clusters where operators haven't come across this setting.
Detailed Explanation
AI powered summary here.
In short,
Compatability