feat: enhance U-shape idle prediction for scale-down scenarios #19562
FrankChen021
left a comment
I have reviewed the code for correctness, edge cases, concurrency, and integration risks; no issues found.
Reviewed 3 of 3 changed files.
This is an automated review by Codex GPT-5.5
 * Exponent (< 1) for sublinear busy redistribution in the idle projection: busy grows as
 * {@code (currentTaskCount / proposedTaskCount)^EXPONENT}, not linearly. Calibrated as log2(1.25) ~= 0.32.
 */
static final double IDLE_SUBLINEARITY_EXPONENT = 0.32;
There was a problem hiding this comment.
Why this specific value?
I feel that these constants somehow make the behaviour of the auto-scaler more difficult to reason about as well as control via the weights.
We should probably not be trying to predict the idleness in this manner anyway.
Instead, as an alternative, we should probably consider using the avg. processing rate * task count * idle ratio as a measure of total work that needs to be done. Assuming that the processing rate remains the same, we can find the new idle ratio for the new task count.
There was a problem hiding this comment.
> Why this specific value?
This is already explained in the PR description: we want to redistribute projected busyness sublinearly, which makes the idle cost U-shaped instead of V-shaped, as it was previously. It is also supported by the theoretical math behind it (see plots).
Regarding an alternative -- I'd encourage you to elaborate more, please.
There was a problem hiding this comment.
Ah, thanks!
I guess you mean,
> In reality, busy grows sublinearly (an observed 2× consolidation raised busy ~1.25×, not 2×).
Do you mean that when task count is halved, idle ratio increases by 1.25x?
There was a problem hiding this comment.
For the alternative, I meant something along these lines:

current workload
  = total records per second
  = current processing rate * current task count * idle ratio

assumptions:
  target workload = current workload
  predicted processing rate = current processing rate

predicted idle ratio
  = target workload / (target task count * predicted processing rate)
  = current workload / (target task count * current processing rate)

Let me know if that makes sense.
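The arithmetic in this proposal can be sketched in Java as below. This is only an illustration of the formulas as written in the comment, not code from the PR: the class, method, and parameter names are hypothetical, and the ratio term is used exactly as the comment states it.

```java
/** Hypothetical sketch of the workload-based projection proposed in the review comment. */
final class WorkloadBasedProjectionSketch
{
  /** current workload = current processing rate * current task count * ratio (as written in the comment). */
  static double currentWorkload(double processingRatePerTask, int currentTaskCount, double ratio)
  {
    return processingRatePerTask * currentTaskCount * ratio;
  }

  /**
   * predicted ratio = current workload / (target task count * current processing rate),
   * assuming the workload and the per-task processing rate stay the same.
   */
  static double predictedRatio(
      double processingRatePerTask,
      int currentTaskCount,
      double ratio,
      int targetTaskCount
  )
  {
    double workload = currentWorkload(processingRatePerTask, currentTaskCount, ratio);
    return workload / (targetTaskCount * processingRatePerTask);
  }

  public static void main(String[] args)
  {
    // Illustrative numbers only: 8 tasks at 1000 records/s each, ratio 0.3, consolidated to 4 tasks.
    System.out.println(predictedRatio(1000.0, 8, 0.3, 4));
  }
}
```

Because the processing rate is assumed constant, it cancels out, and the prediction reduces to ratio * currentTaskCount / targetTaskCount. In other words, this formulation scales linearly with the task ratio.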
kfaraz
left a comment
Approving this for now since the experiments suggest that this helps with idle prediction.
/**
 * Exponent (< 1) for sublinear busy redistribution in the idle projection: busy grows as
 * {@code (currentTaskCount / proposedTaskCount)^EXPONENT}, not linearly. Calibrated as log2(1.25) ~= 0.32.
There was a problem hiding this comment.
Please clarify here that this value was determined empirically: it was observed that when the task count is halved, the busy fraction increases roughly 1.25×.
Description
The cost-based supervisor autoscaler wouldn't scale down a healthy, over-provisioned supervisor - one above the ideal idle ratio with low lag stayed pinned at its current task count.
Root cause. The idle projection was linear:
rawIdle = 1.0 - busyFraction / taskRatio; // taskRatio = proposed / current
This assumes busy time is fully conserved when work moves onto fewer tasks, so a reasonable consolidation projects negative idle (e.g. 1 − 0.6/0.5 = −0.2). That clamps to 0 (the worst point of the U-shaped idle cost) and turns an overrun into phantom virtual lag, pinning the task count even at ~0 real lag. In reality, busy grows sublinearly (an observed 2× consolidation raised busy ~1.25×, not 2×).
Fix. Redistribute busy sublinearly: busy grows as (currentTaskCount / proposedTaskCount)^IDLE_SUBLINEARITY_EXPONENT rather than linearly.
IDLE_SUBLINEARITY_EXPONENT = 0.32 (≈ log₂(1.25)) is a tuned constant, derived from testing and the underlying theoretical model.
A healthy consolidation now lands near the ideal idle ratio instead of going negative, so the supervisor scales down; the exponent stays > 0, so extreme over-consolidation still diverges and is blocked.
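The difference between the two projections can be sketched as follows. This is a minimal illustration, not the PR's actual code: the class and method names are hypothetical, the sublinear form is inferred from the Javadoc on IDLE_SUBLINEARITY_EXPONENT, and clamping to [0, 1] is an assumption (the description only mentions that negative idle clamps to 0).

```java
/** Hypothetical sketch comparing the linear and sublinear idle projections. */
final class IdleProjectionSketch
{
  // Value taken from the PR; log2(1.25) is roughly 0.32.
  static final double IDLE_SUBLINEARITY_EXPONENT = 0.32;

  /** Old projection: busy time is fully conserved, so busy scales by current / proposed. */
  static double linearIdle(double busyFraction, int currentTaskCount, int proposedTaskCount)
  {
    double taskRatio = (double) proposedTaskCount / currentTaskCount;
    return clamp(1.0 - busyFraction / taskRatio);
  }

  /** New projection (assumed form): busy scales by (current / proposed)^EXPONENT. */
  static double sublinearIdle(double busyFraction, int currentTaskCount, int proposedTaskCount)
  {
    double busyGrowth = Math.pow((double) currentTaskCount / proposedTaskCount, IDLE_SUBLINEARITY_EXPONENT);
    return clamp(1.0 - busyFraction * busyGrowth);
  }

  // Assumption: projected idle is clamped to [0, 1].
  static double clamp(double value)
  {
    return Math.max(0.0, Math.min(1.0, value));
  }

  public static void main(String[] args)
  {
    // Halving the task count at 60% busy: the linear form clamps to 0, the sublinear form stays positive.
    System.out.println(linearIdle(0.6, 128, 64));
    System.out.println(sublinearIdle(0.6, 128, 64));
    // Extreme consolidation (128 tasks to 1) still drives the sublinear projection to 0.
    System.out.println(sublinearIdle(0.6, 128, 1));
  }
}
```

With the exponent at 0.32, halving the task count multiplies projected busy by about 1.25, which matches the observed consolidation quoted in the description.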
Validation (plots under hood)
Optimal task count vs. observed poll-idle ratio, across realistic configs (rate = total cluster throughput, split per task):
The old version stays pinned at 128 until idle ~0.55, while the new version consolidates from ~0.32.
Safe under load: new version consolidates earlier on the high-idle side, but at low idle both still jump to max — lag-driven scale-up is unaffected.
The existing version is flat (pinned at max by the phantom overrun); new version consolidates and holds more tasks as lag weight rises.
Release note
Fixed an issue where the cost-based supervisor autoscaler would not scale down an over-provisioned supervisor running above its ideal idle ratio with low lag.