feat: Add mergeBuffer/maxSpillProximity metric for groupBy spill diagnosis#19627
aho135 wants to merge 1 commit into
…nosis

The merge buffer is sliced into druid.processing.numThreads slices by ConcurrentGrouper, and a groupBy query spills as soon as its single fullest slice fills (~ sizeBytes/numThreads). Existing metrics could not explain this: mergeBuffer/maxBytesUsed is a per-query SUM across slices, discounted by the hash-table load factor, so it never approaches sizeBytes even while queries spill, making it impossible to compare against druid.processing.buffer.sizeBytes.

Add mergeBuffer/maxSpillProximity, a dimensionless gauge in [0.0, 1.0]: the fullest single slice's used bytes divided by its spill threshold (sliceSize * maxLoadFactor), tracked as a max across slices and across queries. 1.0 means a query reached the point at which a slice spills to disk.

- GroupByStatsProvider: track per-slice max used bytes and spill threshold; expose getSpillProximity() (clamped to [0,1]); aggregate as a max.
- SpillingGrouper: report each slice's peak usage against its threshold.
- BufferHashGrouper: expose resolveMaxLoadFactor() so the denominator matches the grouper's actual spill decision.
- GroupByStatsMonitor: emit mergeBuffer/maxSpillProximity.
- Clarify mergeBuffer/bytesUsed and maxBytesUsed docs (slicing semantics).

Existing emitted metric names and values are unchanged.
Hey @GWphua! We were using the |
FrankChen021
left a comment
| Severity | Findings |
|---|---|
| P0 | 0 |
| P1 | 0 |
| P2 | 1 |
| P3 | 0 |
| Total | 1 |
Reviewed 7 of 7 changed files.
This is an automated review by Codex GPT-5.5

    {
      maxMergeBufferUsedBytes.addAndGet(bytes);
      maxSliceUsedBytes.accumulateAndGet(usedBytes, Math::max);
      sliceSpillThresholdBytes.accumulateAndGet(spillThresholdBytes, Math::max);

[P2] Track spill proximity as a per-slice ratio
This stores the maximum used bytes and maximum threshold independently, which breaks when one query reports groupers with different thresholds. Nested/subtotal processing can pass the same PerQueryStats through a sliced ConcurrentGrouper and later full-buffer SpillingGroupers; if a small slice reaches its spill threshold, the larger full-buffer threshold can be retained here and getSpillProximity() will divide by that larger value, under-reporting the slice spill as roughly 1 / concurrencyHint. Please track the maximum usedBytes / spillThresholdBytes per sliceUsage call, or otherwise keep the used/threshold pair together, so mixed grouper sizes still report the true max proximity.
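The difference between the two tracking strategies can be sketched in a few lines of Java. This is a minimal illustration, not the PR's code: the class names `IndependentMax` and `PairedRatio`, the `sliceUsage` signature, and the byte values are all assumptions chosen to mirror the scenario in the comment (a small slice that reaches its threshold, followed by a full-buffer grouper with a threshold eight times larger).

```java
import java.util.concurrent.atomic.AtomicLong;
import java.util.concurrent.atomic.DoubleAccumulator;

class SpillProximitySketch {
    /** Hypothetical stand-in for the reviewed approach: used bytes and threshold are maxed independently. */
    static final class IndependentMax {
        private final AtomicLong maxUsed = new AtomicLong();
        private final AtomicLong maxThreshold = new AtomicLong();

        void sliceUsage(long usedBytes, long spillThresholdBytes) {
            maxUsed.accumulateAndGet(usedBytes, Math::max);
            maxThreshold.accumulateAndGet(spillThresholdBytes, Math::max);
        }

        double getSpillProximity() {
            long threshold = maxThreshold.get();
            // The numerator and denominator may come from different groupers.
            return threshold <= 0 ? 0.0 : Math.min(1.0, (double) maxUsed.get() / threshold);
        }
    }

    /** Hypothetical alternative: compute the ratio per call and keep only the largest ratio. */
    static final class PairedRatio {
        private final DoubleAccumulator maxProximity = new DoubleAccumulator(Math::max, 0.0);

        void sliceUsage(long usedBytes, long spillThresholdBytes) {
            if (spillThresholdBytes > 0) {
                // Used bytes and threshold stay paired, so mixed grouper sizes cannot distort the ratio.
                maxProximity.accumulate(Math.min(1.0, (double) usedBytes / spillThresholdBytes));
            }
        }

        double getSpillProximity() {
            return maxProximity.get();
        }
    }

    public static void main(String[] args) {
        long sliceThreshold = 1_000; // assumed threshold of one small slice
        long fullThreshold = 8_000;  // assumed threshold of a full-buffer grouper (8x larger)

        IndependentMax independent = new IndependentMax();
        PairedRatio paired = new PairedRatio();

        // The small slice reaches its spill threshold ...
        independent.sliceUsage(1_000, sliceThreshold);
        paired.sliceUsage(1_000, sliceThreshold);
        // ... then the same per-query stats object sees a lightly used full-buffer grouper.
        independent.sliceUsage(500, fullThreshold);
        paired.sliceUsage(500, fullThreshold);

        System.out.println(independent.getSpillProximity()); // 0.125, the spill is under-reported
        System.out.println(paired.getSpillProximity());      // 1.0, the spill is visible
    }
}
```

With these assumed numbers the independent-max variant reports 1/8, matching the "roughly 1 / concurrencyHint" under-reporting described above, while the paired variant keeps the true maximum.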
Description
When a groupBy query runs, `ConcurrentGrouper` divides the single acquired merge buffer into `druid.processing.numThreads` equal slices (`sliceSize = capacity / numThreads`) and gives one slice to each processing thread. A query spills to disk as soon as its fullest single slice fills, at roughly `sizeBytes / numThreads`, which can be far below the configured `druid.processing.buffer.sizeBytes`.

The existing metrics do not let an operator see this. `mergeBuffer/maxBytesUsed` is a per-query sum across slices, further discounted by the hash-table load factor, so it never approaches `sizeBytes` even while queries are actively spilling. That makes it impossible to compare against `druid.processing.buffer.sizeBytes` or to reason about spill pressure.

Concretely, an operator with `sizeBytes = 125 MiB` and `numThreads = 240` (slices ≈ 546 KB, about 533 KiB) saw `groupBy/spilledQueries` climbing while `mergeBuffer/maxBytesUsed` sat at around 60 MB, which looks contradictory until you account for slicing.

Change

This PR adds `mergeBuffer/maxSpillProximity`, a dimensionless gauge in `[0.0, 1.0]`: the fullest single slice's used bytes divided by its spill threshold, tracked as a max across slices and across queries.

The spill threshold is `sliceSize × maxLoadFactor` (default load factor `0.7`), because a `BufferHashGrouper` spills when its occupied bucket count reaches the load-factor limit, not when the slice is byte-full. This makes `1.0` correspond to the real spill point.

Operators can read `mergeBuffer/maxSpillProximity` alongside `groupBy/spilledQueries`: a value near `1.0` means slices are saturating, and the fix is to widen each slice by raising `druid.processing.buffer.sizeBytes` or lowering `druid.processing.numThreads`.

Changed files

- `GroupByStatsProvider`: track per-slice max used bytes and the per-slice spill threshold; add `getSpillProximity()` (clamped to `[0,1]`); aggregate as a max across queries.
- `SpillingGrouper`: report each slice's peak usage against its spill threshold in `close()`.
- `BufferHashGrouper`: expose `resolveMaxLoadFactor()` so the metric denominator matches the grouper's actual spill decision (including the default-resolution rule).
- `GroupByStatsMonitor`: emit `mergeBuffer/maxSpillProximity`.
- `docs/operations/metrics.md`: document the new metric and clarify the slicing semantics of `mergeBuffer/bytesUsed` and `mergeBuffer/maxBytesUsed`.

This PR has:
mergeBuffer/maxSpillProximity; no behavior or config changes.)
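
The slice arithmetic behind the operator example in the description can be worked through as a rough Java sketch. The helper names (`sliceSize`, `spillThreshold`, `proximity`) are hypothetical, and expressing the threshold directly in bytes is a simplifying assumption: the description states that the grouper's actual spill decision is made on bucket counts, so the figures here are approximations for reasoning about tuning, not Druid's exact values.

```java
class SliceMath {
    /** Per-thread slice size, assuming an even split of the merge buffer across processing threads. */
    static long sliceSize(long bufferSizeBytes, int numThreads) {
        return bufferSizeBytes / numThreads;
    }

    /** Approximate byte-level spill threshold of one slice (simplification of the bucket-based rule). */
    static long spillThreshold(long sliceSizeBytes, double maxLoadFactor) {
        return (long) (sliceSizeBytes * maxLoadFactor);
    }

    /** Proximity of one slice to its spill point, clamped to [0.0, 1.0]. */
    static double proximity(long usedBytes, long spillThresholdBytes) {
        if (spillThresholdBytes <= 0) {
            return 0.0;
        }
        return Math.max(0.0, Math.min(1.0, (double) usedBytes / spillThresholdBytes));
    }

    public static void main(String[] args) {
        long sizeBytes = 125L * 1024 * 1024; // druid.processing.buffer.sizeBytes = 125 MiB
        int numThreads = 240;                // druid.processing.numThreads
        double maxLoadFactor = 0.7;          // default load factor cited in the description

        long slice = sliceSize(sizeBytes, numThreads);
        long threshold = spillThreshold(slice, maxLoadFactor);
        System.out.println("slice bytes: " + slice); // 546133
        System.out.println("approx. spill threshold bytes: " + threshold);

        // Halving numThreads doubles the slice, which is one of the two tuning options mentioned.
        System.out.println("slice bytes at 120 threads: " + sliceSize(sizeBytes, 120)); // 1092266
    }
}
```

Under these assumptions a slice starts spilling after only a few hundred kilobytes of hash-table data, even though the configured buffer is 125 MiB.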