feat: enhance U-shape idle prediction for scale-down scenarios #19562

Merged
Fly-Style merged 2 commits into apache:master from Fly-Style:cba-enhance-ushape
Jun 25, 2026
@Fly-Style Fly-Style commented Jun 5, 2026

Contributor

Description

The cost-based supervisor autoscaler would not scale down a healthy, over-provisioned supervisor: one above the ideal idle ratio with low lag stayed pinned at its current task count.

Root cause. The idle projection was linear:

rawIdle = 1.0 - busyFraction / taskRatio; // taskRatio = proposed / current

This assumes busy time is fully conserved when work moves onto fewer tasks, so even a reasonable consolidation projects negative idle (e.g. 1 − 0.6/0.5 = −0.2). The negative value clamps to 0 (the worst point of the U-shaped idle cost), and the overrun is converted into phantom virtual lag, which pins the task count even at ~0 real lag. In reality, busy grows sublinearly (an observed 2× consolidation raised busy ~1.25×, not 2×).
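
To make the negative projection concrete, here is a minimal sketch of the old linear formula. The class and method names are hypothetical and not taken from the changed files, and the lower clamp is written as a plain `Math.max`, which is an assumption about how the clamping to 0 is done.

```java
// Hypothetical sketch of the linear idle projection described above.
final class LinearIdleProjection {

    /** Linear model: busy time is assumed to be fully conserved across tasks. */
    static double rawIdle(double busyFraction, int currentTaskCount, int proposedTaskCount) {
        // taskRatio = proposed / current, as in the formula above
        double taskRatio = (double) proposedTaskCount / currentTaskCount;
        return 1.0 - busyFraction / taskRatio;
    }

    /** Assumed clamp: a negative projection is floored at 0. */
    static double clampedIdle(double busyFraction, int currentTaskCount, int proposedTaskCount) {
        return Math.max(0.0, rawIdle(busyFraction, currentTaskCount, proposedTaskCount));
    }

    public static void main(String[] args) {
        // Halving 128 tasks at busy = 0.6 projects roughly -0.2 idle, which clamps to 0.
        System.out.printf("raw=%.3f clamped=%.3f%n",
            rawIdle(0.6, 128, 64), clampedIdle(0.6, 128, 64));
    }
}
```

With busy at 0.6, any proposal below 60% of the current task count goes negative under this model, which is why moderate scale-down proposals all landed on the worst point of the idle cost.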

Fix. Redistribute busy sublinearly:

projectedBusy = busyFraction * (currentTaskCount / proposedTaskCount) ^ IDLE_SUBLINEARITY_EXPONENT;  // 0.32
rawIdle = 1.0 - projectedBusy;

IDLE_SUBLINEARITY_EXPONENT = 0.32 (≈ log₂(1.25)) is a tuned constant: with this exponent, halving the task count multiplies projected busy by 2^0.32 ≈ 1.25, which matches the consolidation observed in testing.
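
As a worked check of the calibration, using the same busy fraction of 0.6 and a 2× consolidation as in the example above (the symbols $b$, $n_{\text{cur}}$, $n_{\text{prop}}$ and $\alpha$ are shorthand introduced here for busyFraction, currentTaskCount, proposedTaskCount and the exponent):

```latex
\begin{aligned}
\alpha &= \log_2 1.25 \approx 0.3219 \approx 0.32, \\
\text{projectedBusy} &= b \left(\frac{n_{\text{cur}}}{n_{\text{prop}}}\right)^{\alpha}
  = 0.6 \cdot 2^{0.32} \approx 0.6 \cdot 1.248 \approx 0.749, \\
\text{rawIdle} &= 1 - \text{projectedBusy} \approx 0.251 .
\end{aligned}
```

The same inputs under the linear model give $1 - 0.6 \cdot 2 = -0.2$.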

A healthy consolidation now lands near the ideal idle ratio instead of going negative, so the supervisor scales down. The exponent stays > 0, so projected busy still grows as tasks are removed, and extreme over-consolidation still diverges and is rejected.
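
The behaviour at different consolidation levels can be sketched as follows. This is an illustrative sketch only: it assumes `Math.pow` for the `^` in the formula above, and the class and method names are hypothetical rather than the ones used in the changed files.

```java
// Hypothetical sketch of the sublinear idle projection described above.
final class SublinearIdleProjection {

    // Value stated in the PR description (approximately log2(1.25)).
    static final double IDLE_SUBLINEARITY_EXPONENT = 0.32;

    /** Busy fraction after moving the same work onto proposedTaskCount tasks. */
    static double projectedBusy(double busyFraction, int currentTaskCount, int proposedTaskCount) {
        double consolidation = (double) currentTaskCount / proposedTaskCount;
        return busyFraction * Math.pow(consolidation, IDLE_SUBLINEARITY_EXPONENT);
    }

    /** Unclamped idle projection: may still go negative for extreme consolidation. */
    static double rawIdle(double busyFraction, int currentTaskCount, int proposedTaskCount) {
        return 1.0 - projectedBusy(busyFraction, currentTaskCount, proposedTaskCount);
    }

    public static void main(String[] args) {
        // Same starting point as the linear example: 128 tasks at busy = 0.6.
        for (int proposed : new int[] {128, 64, 32, 16}) {
            System.out.printf("proposed=%d rawIdle=%.3f%n", proposed, rawIdle(0.6, 128, proposed));
        }
    }
}
```

In this sketch a 2× consolidation keeps projected idle positive, while an 8× consolidation still projects an overrun, so the guard against over-consolidation is preserved.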

Validation (plots collapsed)

Optimal task count vs. observed poll-idle ratio, across realistic configs (rate = total cluster throughput, split per task):

[Plot: cost_based_scaledown_medium_7Mpm]

The old version stays pinned at 128 until idle ~0.55, while the new version consolidates from ~0.32.

[Plot: cost_based_scaledown_large_30Mpm]

Safe under load: the new version consolidates earlier on the high-idle side, but at low idle both still jump to max, so lag-driven scale-up is unaffected.

[Plot: cost_based_v1_vs_v2_large_30Mpm_amp0 35]

The existing version is flat (pinned at max by the phantom overrun); the new version consolidates and holds more tasks as lag weight rises.

Release note

Fixed an issue where the cost-based supervisor autoscaler would not scale down an over-provisioned supervisor running above its ideal idle ratio with low lag.

  • self-reviewed.
  • added comments explaining the "why".
  • added/updated unit tests.

@Fly-Style Fly-Style self-assigned this Jun 5, 2026
@Fly-Style Fly-Style requested a review from kfaraz June 5, 2026 13:46

@FrankChen021 FrankChen021 left a comment

Member

I have reviewed the code for correctness, edge cases, concurrency, and integration risks; no issues found.

Reviewed 3 of 3 changed files.


This is an automated review by Codex GPT-5.5

@kfaraz kfaraz left a comment

Contributor

Left a suggestion.

/**
 * Exponent (< 1) for sublinear busy redistribution in the idle projection: busy grows as
 * {@code (currentTaskCount / proposedTaskCount)^EXPONENT}, not linearly. Calibrated as log2(1.25) ~= 0.32.
 */
static final double IDLE_SUBLINEARITY_EXPONENT = 0.32;

Contributor

Why this specific value?
I feel that these constants somehow make the behaviour of the auto-scaler more difficult to reason about as well as control via the weights.

We should probably not be trying to predict the idleness in this manner anyway.

Instead, as an alternative, we should probably consider using the avg. processing rate * task count * idle ratio as a measure of total work that needs to be done. Assuming that the processing rate remains the same, we can find the new idle ratio for the new task count.

@Fly-Style Fly-Style Jun 24, 2026

Contributor Author

Why this specific value?

This is already explained in the PR description: we want to redistribute projected busyness sublinearly, which makes the idle cost U-shaped (instead of V-shaped, as it was previously). It is also supported by the underlying math (see plots).

Regarding the alternative: could you elaborate, please?

Contributor

Ah, thanks!

I guess you mean,

In reality, busy grows sublinearly (an observed 2× consolidation raised busy ~1.25×, not 2×).

Do you mean that when task count is halved, idle ratio increases by 1.25x?

@kfaraz kfaraz Jun 24, 2026

Contributor

For the alternative, I meant something along these lines

current workload
= total records per second
= current processing rate * current task count * idle ratio

assumptions:
target workload = current workload
predicted processing rate = current processing rate

predicted idle ratio
= target workload / (target task count * predicted processing rate)
= current workload / (target task count * current processing rate)

Let me know if that makes sense.
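
For reference, the arithmetic in this proposal can be transcribed literally as below. This is only a sketch of the formula as written in the comment: the class, method, and parameter names are hypothetical, and `ratio` stands for whichever per-task ratio enters the workload product (the comment uses the idle ratio).

```java
// Hypothetical transcription of the work-conservation formula from the comment above.
final class WorkConservationSketch {

    /**
     * current workload = processing rate * current task count * ratio
     * predicted ratio  = current workload / (target task count * processing rate)
     */
    static double predictedRatio(double processingRate, int currentTaskCount,
                                 double ratio, int targetTaskCount) {
        double currentWorkload = processingRate * currentTaskCount * ratio;
        // Assumption from the comment: the per-task processing rate does not change.
        return currentWorkload / (targetTaskCount * processingRate);
    }

    public static void main(String[] args) {
        // 128 -> 64 tasks with a measured ratio of 0.4 (illustrative numbers only).
        System.out.printf("predicted=%.3f%n", predictedRatio(1000.0, 128, 0.4, 64));
    }
}
```

Because the processing rate appears in both the numerator and the denominator, it cancels, and the prediction reduces to `ratio * currentTaskCount / targetTaskCount`, i.e. a linear scaling of the measured ratio with the task-count change.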

@kfaraz kfaraz left a comment

Contributor

Approving this for now since the experiments suggest that this helps with idle prediction.


/**
* Exponent (< 1) for sublinear busy redistribution in the idle projection: busy grows as
* {@code (currentTaskCount / proposedTaskCount)^EXPONENT}, not linearly. Calibrated as log2(1.25) ~= 0.32.

Contributor

Please clarify here that this value was determined empirically: when the task count is halved, the busy ratio was seen to increase by roughly 1.25x.

Contributor Author

Done

@Fly-Style Fly-Style merged commit d4afc1f into apache:master Jun 25, 2026
38 checks passed
@Fly-Style Fly-Style deleted the cba-enhance-ushape branch June 25, 2026 11:41
@github-actions github-actions Bot added this to the 38.0.0 milestone Jun 25, 2026