[gpl] Include internal power in MBFF clustering cost#10075
[gpl] Include internal power in MBFF clustering cost#10075openroad-ci wants to merge 1 commit intoThe-OpenROAD-Project:masterfrom
Conversation
|
@mikesinouye please test this on your internal designs. I'm seeing improvement on some of our designs. |
There was a problem hiding this comment.
Code Review
This pull request integrates clock pin internal power into the MBFF cost metric, ensuring more accurate power estimation during clustering. The changes include adding a method to calculate clock pin energy, updating the cost calculation logic, and caching clock periods. I have identified a potential division-by-zero risk in the cost normalization logic, a performance concern regarding redundant STA lookups within the SetVars loop, and a violation of the style guide regarding the use of magic numbers for clock activity calculations.
|
clang-tidy review says "All clean, LGTM! 👍" |
src/gpl/src/mbff.cpp
Outdated
| // Include clock pin internal power in the cost metric. | ||
| // Clock activity = 2 transitions / period (known precisely from SDC). | ||
| const float clock_activity | ||
| = (clock_period_ > 0) ? (2.0 / clock_period_) : 0.0; |
There was a problem hiding this comment.
Shouldn't you grab the 2 / clock_period from the waveform duty cycle definition from OpenSTA. Clock::waveform
I know this is equal rise equal fall by default, but since it is possible to specify it I feel you should take it into account here.
There was a problem hiding this comment.
The power only depends on transition activity not duty cycle.
There was a problem hiding this comment.
But you can specify multiple transitions in the wave form. So my assumption was you would need to replace the 2.0 with waveform()->getSize()
There was a problem hiding this comment.
I suppose it is legal but I've never seen it used. In any case sta doesn't handle it either:
float
Power::clockDuty(const Clock *clk)
{
if (clk->isGenerated()) {
const Clock *master = clk->masterClk();
if (master == nullptr)
return 0.5; // punt
else
return clockDuty(master);
}
else {
const FloatSeq *waveform = clk->waveform();
float rise_time = (*waveform)[0];
float fall_time = (*waveform)[1];
float duty = (fall_time - rise_time) / clk->period();
return duty;
}
}
```
The MBFF algorithm previously used only leakage power to decide whether to replace single-bit flip-flops with multi-bit cells. For flip-flops, internal power dominates total power, and MBFF cells share scan (SE/SI) and clock structures across bits, giving 40-90% savings on those pins. Ignoring internal power caused clustering to increase total power on some PDKs. Add getInternalEnergy() which sums average internal energy across all pins (CK, D, Q, SE, SI) from Liberty internal_power tables. Use this alongside leakage in three places: - SetRatios: norm_power_ uses total estimated power (leakage + internal_energy * clock_activity) so the ILP cost function reflects total power, not just leakage. - ReadLibs: select best tray per size by minimum total estimated power instead of minimum leakage, so cells with lower total power (e.g. SVT over LVT) are preferred even when their leakage is higher. - SetVars: select the single-bit baseline cell by lowest total estimated power, with both leakage and internal energy paired from the same cell. Clock period is obtained from SDC before ReadLibs runs so tray selection can account for internal power. Falls back to leakage-only when no clock is defined or no internal_power tables exist. Signed-off-by: Matt Liberty <mliberty@precisioninno.com>
c194db0 to
eee27b3
Compare
|
clang-tidy review says "All clean, LGTM! 👍" |
|
@codex review |
There was a problem hiding this comment.
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: eee27b3ea3
ℹ️ About Codex in GitHub
Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you
- Open a pull request for review
- Mark a draft as ready
- Comment "@codex review".
If Codex has suggestions, it will comment; otherwise it will react with 👍.
Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".
| SetVars(FFs[i]); | ||
| clock_period_ = getClockPeriod(ff_inst); | ||
| SetRatios(array_mask); |
There was a problem hiding this comment.
Set clock period before deriving single-bit power baseline
SetVars() now depends on clockActivity() to compute single_bit_power_, but in this loop clock_period_ is updated only after SetVars() runs. In designs with multiple clock periods, each chunk therefore uses the previous chunk’s period (or the bootstrap value) for the denominator baseline, while SetRatios() uses the current chunk’s period, producing inconsistent power normalization and skewed clustering decisions.
Useful? React with 👍 / 👎.
| // Determine clock period before ReadLibs so tray selection can use | ||
| // total estimated power (leakage + internal_energy * clock_activity). | ||
| if (!flops_.empty()) { | ||
| clock_period_ = getClockPeriod(insts_[0]); | ||
| } |
There was a problem hiding this comment.
Avoid seeding tray power ranking from one arbitrary flop clock
ReadLibs() ranks best_master_ using activity = clockActivity(), but Run() initializes clock_period_ from only insts_[0] before that pass. Because tray ranking is cached globally while clustering later runs per clock-net group, multi-clock designs can permanently select a suboptimal tray master for domains whose period differs from that first flop, regressing total power optimization.
Useful? React with 👍 / 👎.
Summary
The MBFF algorithm previously used only leakage power to decide whether
to replace single-bit flip-flops with multi-bit cells. For flip-flops,
internal power dominates total power, and MBFF cells share scan (SE/SI)
and clock structures across bits, giving 40-90% savings on those pins.
Ignoring internal power caused clustering to increase total power on
some PDKs.
Add getInternalEnergy() which sums average internal energy across all
pins (CK, D, Q, SE, SI) from Liberty internal_power tables. Use this
alongside leakage in three places:
internal_energy * clock_activity) so the ILP cost function reflects
total power, not just leakage.
instead of minimum leakage, so cells with lower total power (e.g. SVT
over LVT) are preferred even when their leakage is higher.
power, with both leakage and internal energy paired from the same cell.
Clock period is obtained from SDC before ReadLibs runs so tray selection
can account for internal power. Falls back to leakage-only when no
clock is defined or no internal_power tables exist.
Type of Change
Verification
./etc/Build.sh).