Skip to content

[pinot-server/ proactive-query-killing] (2/2) integrate query scan cost based instrumentation and killing in server operators#18475

Open
anuragrai16 wants to merge 6 commits into
apache:masterfrom
anuragrai16:queryKillIntegrationProduction
Open

[pinot-server/ proactive-query-killing] (2/2) integrate query scan cost based instrumentation and killing in server operators#18475
anuragrai16 wants to merge 6 commits into
apache:masterfrom
anuragrai16:queryKillIntegrationProduction

Conversation

@anuragrai16
Copy link
Copy Markdown
Contributor

@anuragrai16 anuragrai16 commented May 12, 2026

Follow up for SPI Changes - #18102

Summary

Scan cost instrumentation across all operators
Instruments 11 query operators and DocIdSetOperator to push numDocsScanned,
numEntriesScannedInFilter, and numEntriesScannedPostFilter deltas into the per-query
QueryScanCostContext on every block iteration, enabling cooperative scan-based query killing
at block boundaries. The getScanCostContext() helper is consolidated into BaseOperator as a
protected static method (removing 11 duplicate copies), and checkScanBasedKilling() is
called from both checkTermination() and checkTerminationAndSampleUsage() so combine-path
operators are also covered.

Per-table scanKillingMode override
Adds a scanKillingMode field to QueryConfig (validated at construction, not silently at
query time) allowing individual tables to override the cluster-level kill mode. The resolved
mode is stored as a volatile ScanKillingMode on QueryExecutionContext during query
initialization and applied in QueryKillingManager.checkAndKillWithStrategy() ahead of the
cluster config — enabling patterns like cluster=logOnly + table=enforce for targeted
enforcement without a cluster-wide mode change. Invalid mode strings fail fast at table config
submission rather than degrading silently at query time.

QueryKillingManager hardening
Wires QueryKillingManager into BaseServerStarter for singleton initialization and ZK
cluster config change listener registration, so scan killing thresholds and mode can be tuned
live without server restart. Adds synchronized onChange() for atomic config/strategy
rebuilds, a local currentStrategy snapshot in checkAndKillIfNeeded to eliminate a TOCTOU
race on config changes, and a warn log when an unexpected type is found in the cached strategy
slot.

Design: Two-path kill check

The scan-based killing integration has two cooperating paths:

  1. Cost accumulation (per-operator): Each leaf operator pushes scan cost deltas into the shared QueryScanCostContext after each block.
  2. Kill enforcement (BaseOperator.checkTermination() / checkTerminationAndSampleUsage()): On every nextBlock() call across the operator tree, the killing manager evaluates whether accumulated cost exceeds thresholds and terminates the query if so.

Test plan

  • Unit tests pass (54 tests): QueryKillReportTest (4), QueryKillingManagerTest (20, including 6 new), QueryMonitorConfigScanKillingTest (9), CompositeQueryKillingStrategyTest (8), ScanEntriesThresholdStrategyTest (13)

Functional testing plan on quick start for all scenarios

@codecov-commenter
Copy link
Copy Markdown

codecov-commenter commented May 12, 2026

Codecov Report

❌ Patch coverage is 69.35484% with 76 lines in your changes missing coverage. Please review.
✅ Project coverage is 63.68%. Comparing base (5fd3183) to head (87f0332).
⚠️ Report is 17 commits behind head on master.

Files with missing lines Patch % Lines
.../pinot/core/query/killing/QueryKillingManager.java 83.14% 7 Missing and 8 partials ⚠️
...core/query/executor/ServerQueryExecutorV1Impl.java 51.85% 7 Missing and 6 partials ⚠️
...a/org/apache/pinot/core/operator/BaseOperator.java 70.58% 1 Missing and 4 partials ⚠️
...killing/strategy/ScanEntriesThresholdStrategy.java 78.26% 0 Missing and 5 partials ⚠️
...pinot/core/operator/query/AggregationOperator.java 20.00% 3 Missing and 1 partial ⚠️
...re/operator/query/FilteredAggregationOperator.java 20.00% 3 Missing and 1 partial ⚠️
...t/core/operator/query/FilteredGroupByOperator.java 20.00% 3 Missing and 1 partial ⚠️
...che/pinot/core/operator/query/GroupByOperator.java 20.00% 3 Missing and 1 partial ⚠️
...uery/SelectionPartiallyOrderedByDescOperation.java 20.00% 3 Missing and 1 partial ⚠️
...ery/SelectionPartiallyOrderedByLinearOperator.java 20.00% 3 Missing and 1 partial ⚠️
... and 4 more
Additional details and impacted files
@@            Coverage Diff             @@
##             master   #18475    +/-   ##
==========================================
  Coverage     63.68%   63.68%            
- Complexity     1682     1684     +2     
==========================================
  Files          3262     3266     +4     
  Lines        199826   200057   +231     
  Branches      31031    31073    +42     
==========================================
+ Hits         127255   127415   +160     
- Misses        62421    62468    +47     
- Partials      10150    10174    +24     
Flag Coverage Δ
custom-integration1 100.00% <ø> (ø)
integration 100.00% <ø> (ø)
integration1 100.00% <ø> (ø)
integration2 0.00% <ø> (ø)
java-21 63.68% <69.35%> (+<0.01%) ⬆️
temurin 63.68% <69.35%> (+<0.01%) ⬆️
unittests 63.68% <69.35%> (+<0.01%) ⬆️
unittests1 55.83% <62.29%> (+0.05%) ⬆️
unittests2 34.95% <18.14%> (+<0.01%) ⬆️

Flags with carried forward coverage won't be shown. Click here to find out more.

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

🚀 New features to boost your workflow:
  • 📦 JS Bundle Analysis: Save yourself from yourself by tracking and limiting bundle sizes in JS merges.

"Query '%s' (requestId=%d) on table '%s' was killed because '%s' (%,d) exceeded the threshold (%,d) "
+ "configured in %s. "
+ "At kill time: entriesScannedInFilter=%,d, docsScanned=%,d, "
+ "entriesScannedPostFilter=%,d, elapsedMs=%d. "
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

[HIGH] Reporting entriesScannedPostFilter here makes the feature look wired end-to-end, but this PR never actually makes that metric meaningful. None of the new operator instrumentation calls QueryScanCostContext.addEntriesScannedPostFilter(), and the default ScanEntriesThresholdStrategy still ignores the corresponding cluster/table thresholds. Any deployment that sets maxEntriesScannedPostFilter will therefore get a silent no-op guardrail while this message still prints the field. Please either plumb post-filter accounting through the operators and strategy, or drop the advertised threshold from this change.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Have updated the logic to instrument maxEntriesScannedPostFilter measurement and killing check as well

@anuragrai16 anuragrai16 changed the title [pinot-server/ proactive-query-killing] (2/2) integrate query scan cost based instrumentation and killing in server operators [Draft][pinot-server/ proactive-query-killing] (2/2) integrate query scan cost based instrumentation and killing in server operators May 12, 2026
@anuragrai16 anuragrai16 changed the title [Draft][pinot-server/ proactive-query-killing] (2/2) integrate query scan cost based instrumentation and killing in server operators [pinot-server/ proactive-query-killing] (2/2) integrate query scan cost based instrumentation and killing in server operators May 13, 2026
anuragrai16 and others added 3 commits May 14, 2026 10:28
Extends QueryConfig with a scanKillingMode field (disabled/logOnly/enforce)
that lets individual tables override the cluster-level scan killing mode.
Stores the resolved mode as a volatile field on QueryExecutionContext during
query init and applies it in QueryKillingManager.checkAndKillWithStrategy()
ahead of the cluster config — enabling patterns like cluster=logOnly +
table=enforce for targeted enforcement without a cluster-wide mode change.
Invalid mode strings are rejected at QueryConfig construction time via
Preconditions. Includes 4 new manager tests and 7 new QueryConfig tests.

Co-Authored-By: Claude Sonnet 4.6 (1M context) <noreply@anthropic.com>
QueryKillingManager.onChange() received raw ZK keys with the full
"pinot.query.scheduler." prefix but passed them directly to
QueryMonitorConfig's update constructor, which checks for keys
without that prefix (e.g. "accounting.scan.based.killing.mode").
The mismatch caused all dynamic config changes to be silently ignored,
requiring a server restart for scan-killing config to take effect.

Strip the prefix before passing to QueryMonitorConfig, matching the
key space the init constructor uses. Add early return when no relevant
keys changed, and log the applied config values after rebuild.

Co-Authored-By: Claude Sonnet 4.6 (1M context) <noreply@anthropic.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants