JIT: fix edge likelihoods for loop cloning #129646
We were over-estimating the edge likelihoods leading to the fast clone, causing profiles to inflate. The old code was not aware of the total number of checks that need to pass to reach the fast clone, so each edge's likelihood was slightly too high. This shows up prominently when we clone a set of nested loops. Fix by counting how many conditional branches we must pass through before reaching the fast clone, and using that count to compute the proper likelihoods.
Tagging subscribers to this area: @JulieLeeMSFT, @jakobbotsch
Pull request overview
This PR adjusts how loop cloning assigns edge likelihoods for the fast-path gating condition chain so that the cumulative probability of reaching the fast clone matches the intended fastPathWeightScaleFactor, even when the condition chain is constructed across multiple CondToStmtInBlock calls (e.g., block/deref conditions plus cloning conditions).
Changes:
- Extend LoopCloneContext::CondToStmtInBlock to accept totalCondsInChain and use it to compute the per-conditional likelihood (Nth-root model) against the full chain length.
- In Compiler::optInsertLoopChoiceConditions, pre-count total condition blocks to be inserted and pass that count to each CondToStmtInBlock call.
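
The Nth-root model described above can be illustrated with a small standalone sketch. This is not the JIT's code: `PerCondLikelihood` and `CumulativeLikelihood` are hypothetical helpers, plain `double` and `unsigned` stand in for the JIT's own types, and the target value 0.99 in the usage note is only a placeholder for fastPathWeightScaleFactor. The point is that when every branch in the chain is assigned the Nth root of the target, with N taken over the whole chain rather than over one group of conditions, the product across all branches comes back to the target.

```cpp
#include <cassert>
#include <cmath>
#include <vector>

// Hypothetical helper (illustration only, not the JIT's implementation):
// the likelihood each conditional branch must carry so that passing all
// `totalCondsInChain` branches has cumulative probability `targetFastLikelihood`.
double PerCondLikelihood(double targetFastLikelihood, unsigned totalCondsInChain)
{
    assert(totalCondsInChain > 0);
    return std::pow(targetFastLikelihood, 1.0 / totalCondsInChain);
}

// Hypothetical helper: the chain is emitted in several groups (for example,
// one group per call that inserts conditions). Pre-count the total first,
// then apply the same per-branch likelihood to every branch in every group.
double CumulativeLikelihood(double targetFastLikelihood, const std::vector<unsigned>& groupSizes)
{
    unsigned total = 0;
    for (unsigned n : groupSizes)
    {
        total += n; // pre-count across all groups
    }

    const double perCond    = PerCondLikelihood(targetFastLikelihood, total);
    double       cumulative = 1.0;

    for (unsigned n : groupSizes)
    {
        for (unsigned i = 0; i < n; i++)
        {
            cumulative *= perCond; // probability of also passing this branch
        }
    }

    return cumulative;
}
```

For example, `CumulativeLikelihood(0.99, {2, 3})` assigns each of the five branches the fifth root of 0.99, so the probability of reaching the end of the chain is 0.99 up to floating-point rounding.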
Reviewed changes
Copilot reviewed 2 out of 2 changed files in this pull request and generated no comments.
| File | Description |
|---|---|
| src/coreclr/jit/loopcloning.h | Updates the CondToStmtInBlock declaration and documents the new totalCondsInChain parameter. |
| src/coreclr/jit/loopcloning.cpp | Uses the full chain condition count to compute per-branch likelihoods; counts and threads the total through optInsertLoopChoiceConditions. |
@jakobbotsch PTAL

Definitely affects perf scores in some cases. Still evaluating diffs, but what we were doing before was wrong. benchmarks.run_pgo.windows has only ~200 code diffs locally, but a perf score diff of −301,305,052 (−41.67% of base). The diffs look to be mostly layout changes.