feat: Add mergeBuffer/maxSpillProximity metric for groupBy spill diagnosis by aho135 · Pull Request #19627 · apache/druid

aho135 · 2026-06-24T18:23:11Z

Description

When a groupBy query runs, ConcurrentGrouper divides the single acquired merge buffer into druid.processing.numThreads equal slices (sliceSize = capacity / numThreads) and gives one slice to each processing thread. A query spills to disk as soon as its fullest single slice fills — at roughly sizeBytes / numThreads, which can be far below the configured druid.processing.buffer.sizeBytes.

The existing metrics do not let an operator see this. mergeBuffer/maxBytesUsed is a per-query sum across slices, further discounted by the hash-table load factor, so it never approaches sizeBytes even while queries are actively spilling — making it impossible to compare against druid.processing.buffer.sizeBytes or to reason about spill pressure.

Concretely, an operator with sizeBytes = 125 MiB and numThreads = 240 (slices ≈ 533 KiB, about 546 KB) saw groupBy/spilledQueries climbing while mergeBuffer/maxBytesUsed sat at roughly 60 MB, which looks contradictory until you account for slicing.
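The slicing arithmetic for this example can be sketched as follows. This is an illustration only, not code from the PR: the class and method names are hypothetical, and it assumes that usable slice capacity scales linearly with the default 0.7 load factor.

```java
// Hypothetical helper for illustration; not part of Druid or of this PR.
final class SliceArithmetic {
  // Bytes given to each processing thread when one merge buffer is split evenly.
  static long sliceSizeBytes(long bufferSizeBytes, int numThreads) {
    return bufferSizeBytes / numThreads;
  }

  // Approximate bytes a slice can hold before it spills.
  // Assumption: usable capacity is sliceSize * maxLoadFactor.
  static long spillThresholdBytes(long sliceSizeBytes, double maxLoadFactor) {
    return (long) (sliceSizeBytes * maxLoadFactor);
  }

  public static void main(String[] args) {
    long sizeBytes = 125L * 1024 * 1024; // druid.processing.buffer.sizeBytes = 125 MiB
    int numThreads = 240;                // druid.processing.numThreads
    long slice = sliceSizeBytes(sizeBytes, numThreads);
    long threshold = spillThresholdBytes(slice, 0.7);

    System.out.println("slice bytes: " + slice);
    System.out.println("approx. spill threshold per slice: " + threshold);
    // Even if every slice sat exactly at its threshold, the per-query SUM
    // would stay near 0.7 * sizeBytes, never reaching sizeBytes itself.
    System.out.println("sum if all slices are at threshold: " + threshold * numThreads);
  }
}
```

Under these assumptions a single skewed slice can spill after a few hundred kilobytes, while a summed metric of about 60 MB simply reflects many partially filled slices.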

Change

This PR adds mergeBuffer/maxSpillProximity, a dimensionless gauge in [0.0, 1.0]:

maxSpillProximity = maxSliceUsedBytes / (sliceSize × maxLoadFactor),  clamped to [0, 1]
                    ↑ MAX across a query's slices, then MAX across queries

It is computed per-slice and maxed (never summed), because the fullest slice is what actually triggers a spill.
The denominator is sliceSize × maxLoadFactor (default load factor 0.7), because a BufferHashGrouper spills when the fraction of occupied hash-table buckets reaches the load factor, not when the slice is byte-full. This makes 1.0 correspond to the real spill point.
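A minimal sketch of the formula above, assuming per-slice peak usage is already available as an array of byte counts. The class and method names here are hypothetical and do not reflect the PR's actual signatures; only the arithmetic follows the description.

```java
// Hypothetical illustration of the maxSpillProximity formula; not the PR's code.
final class SpillProximity {
  // Proximity of one slice: used bytes over the slice's spill threshold, clamped to [0, 1].
  static double sliceProximity(long usedBytes, long sliceSizeBytes, double maxLoadFactor) {
    double threshold = sliceSizeBytes * maxLoadFactor;
    if (threshold <= 0) {
      return 0.0; // no meaningful threshold, report no pressure
    }
    return Math.max(0.0, Math.min(1.0, usedBytes / threshold));
  }

  // Proximity of a query: the MAX over its slices, never the sum.
  static double maxProximity(long[] sliceUsedBytes, long sliceSizeBytes, double maxLoadFactor) {
    double max = 0.0;
    for (long used : sliceUsedBytes) {
      max = Math.max(max, sliceProximity(used, sliceSizeBytes, maxLoadFactor));
    }
    return max;
  }

  public static void main(String[] args) {
    // Three slices of 1000 bytes each; the second one has reached its threshold.
    long[] used = {100, 700, 350};
    System.out.println(maxProximity(used, 1000, 0.7));
  }
}
```

Taking the maximum means one saturated slice is enough to push the gauge to 1.0, even when the other slices are nearly empty.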

Operators can read mergeBuffer/maxSpillProximity alongside groupBy/spilledQueries: a value near 1.0 means slices are saturating, and the fix is to widen each slice by raising druid.processing.buffer.sizeBytes or lowering druid.processing.numThreads.

Changed files

GroupByStatsProvider — track per-slice max used bytes and the per-slice spill threshold; add getSpillProximity() (clamped to [0,1]); aggregate as a max across queries.
SpillingGrouper — report each slice's peak usage against its spill threshold in close().
BufferHashGrouper — expose resolveMaxLoadFactor() so the metric denominator matches the grouper's actual spill decision (including the default-resolution rule).
GroupByStatsMonitor — emit mergeBuffer/maxSpillProximity.
docs/operations/metrics.md — document the new metric and clarify the slicing semantics of mergeBuffer/bytesUsed and mergeBuffer/maxBytesUsed.

This PR has:

been self-reviewed.
added documentation for new or modified features or behaviors.
a release note entry in the PR description. (New metric mergeBuffer/maxSpillProximity; no behavior or config changes.)
added unit tests or modified existing tests to cover new code paths, ensuring the threshold for code coverage is met.
added integration tests. (N/A — metric is covered by unit tests.)
been tested in a test Druid cluster.

…nosis

The merge buffer is sliced into druid.processing.numThreads slices by ConcurrentGrouper, and a groupBy query spills as soon as its single fullest slice fills (~ sizeBytes/numThreads).

Existing metrics could not explain this: mergeBuffer/maxBytesUsed is a per-query SUM across slices, discounted by the hash-table load factor, so it never approaches sizeBytes even while queries spill, making it impossible to compare against druid.processing.buffer.sizeBytes.

Add mergeBuffer/maxSpillProximity, a dimensionless gauge in [0.0, 1.0]: the fullest single slice's used bytes divided by its spill threshold (sliceSize * maxLoadFactor), tracked as a max across slices and across queries. 1.0 means a query reached the point at which a slice spills to disk.

- GroupByStatsProvider: track per-slice max used bytes and spill threshold; expose getSpillProximity() (clamped to [0,1]); aggregate as a max.
- SpillingGrouper: report each slice's peak usage against its threshold.
- BufferHashGrouper: expose resolveMaxLoadFactor() so the denominator matches the grouper's actual spill decision.
- GroupByStatsMonitor: emit mergeBuffer/maxSpillProximity.
- Clarify mergeBuffer/bytesUsed and maxBytesUsed docs (slicing semantics).

Existing emitted metric names and values are unchanged.

aho135 · 2026-06-24T18:26:54Z

Hey @GWphua! We were using the mergeBuffer/maxBytesUsed metric for tuning mergeBuffer size but found that it wasn't a very accurate indicator for spilling. Let me know if you have any thoughts on this. Thanks!

FrankChen021

Severity	Findings
P0	0
P1	0
P2	1
P3	0
Total	1

Reviewed 7 of 7 changed files.

This is an automated review by Codex GPT-5.5

FrankChen021 · 2026-06-25T12:16:43Z

    {
-      maxMergeBufferUsedBytes.addAndGet(bytes);
+      maxSliceUsedBytes.accumulateAndGet(usedBytes, Math::max);
+      sliceSpillThresholdBytes.accumulateAndGet(spillThresholdBytes, Math::max);


[P2] Track spill proximity as a per-slice ratio

This stores the maximum used bytes and maximum threshold independently, which breaks when one query reports groupers with different thresholds. Nested/subtotal processing can pass the same PerQueryStats through a sliced ConcurrentGrouper and later full-buffer SpillingGroupers; if a small slice reaches its spill threshold, the larger full-buffer threshold can be retained here and getSpillProximity() will divide by that larger value, under-reporting the slice spill as roughly 1 / concurrencyHint. Please track the maximum usedBytes / spillThresholdBytes per sliceUsage call, or otherwise keep the used/threshold pair together, so mixed grouper sizes still report the true max proximity.
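The difference between the two tracking strategies can be sketched as below. This is an illustration of the reviewer's point under assumed numbers, not the PR's implementation: the class names are hypothetical, and `sliceUsage` is borrowed from the comment as a method name without knowing its real signature.

```java
import java.util.concurrent.atomic.AtomicLong;
import java.util.concurrent.atomic.DoubleAccumulator;

// Hypothetical comparison of the two approaches; not code from the PR.
final class ProximityTracking {
  // Approach in the diff: used bytes and threshold are maxed independently.
  static final class IndependentMax {
    private final AtomicLong maxUsed = new AtomicLong();
    private final AtomicLong maxThreshold = new AtomicLong();

    void sliceUsage(long usedBytes, long spillThresholdBytes) {
      maxUsed.accumulateAndGet(usedBytes, Math::max);
      maxThreshold.accumulateAndGet(spillThresholdBytes, Math::max);
    }

    double proximity() {
      long threshold = maxThreshold.get();
      return threshold <= 0 ? 0.0 : Math.min(1.0, (double) maxUsed.get() / threshold);
    }
  }

  // Suggested approach: compute the ratio per call and keep the largest ratio.
  static final class RatioMax {
    private final DoubleAccumulator maxRatio = new DoubleAccumulator(Math::max, 0.0);

    void sliceUsage(long usedBytes, long spillThresholdBytes) {
      if (spillThresholdBytes > 0) {
        maxRatio.accumulate(Math.min(1.0, (double) usedBytes / spillThresholdBytes));
      }
    }

    double proximity() {
      return maxRatio.get();
    }
  }

  public static void main(String[] args) {
    IndependentMax independent = new IndependentMax();
    RatioMax ratio = new RatioMax();
    // A small slice hits its threshold (700 of 700 bytes) ...
    independent.sliceUsage(700, 700);
    ratio.sliceUsage(700, 700);
    // ... then a full-buffer grouper reports a 4x larger threshold, barely used.
    independent.sliceUsage(100, 2800);
    ratio.sliceUsage(100, 2800);

    System.out.println("independent max: " + independent.proximity());
    System.out.println("per-call ratio max: " + ratio.proximity());
  }
}
```

With a full-buffer threshold four times the slice threshold, the independent maxima report a quarter of the true proximity, which matches the 1 / concurrencyHint under-reporting described above.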

github-actions Bot added the Area - Documentation label Jun 24, 2026

aho135 changed the title ~~Add mergeBuffer/maxSpillProximity metric for groupBy spill diagnosis~~ feat: Add mergeBuffer/maxSpillProximity metric for groupBy spill diagnosis Jun 24, 2026

aho135 requested a review from GWphua June 24, 2026 18:23

FrankChen021 reviewed Jun 25, 2026

aho135 wants to merge 1 commit into apache:master from aho135:mergeBuffer-metrics


2 participants
