fix(scripts): trend annotations fall back to nearest non-null prior release#1120
fix(scripts): trend annotations fall back to nearest non-null prior release#1120carlos-alm wants to merge 3 commits into
Conversation
…nnotations scripts/update-incremental-report.ts compared each release's metrics against the immediately prior release only. When that prior release had null build/rebuild metrics (e.g. 3.9.5 had null after both engine workers were SIGKILL'd on timeout), the trend cell was silently emitted blank, hiding real regressions: 3.9.6 wasm full-build was 14.0s vs 3.9.4's 7.6s (~86% slower) and no-op was 131ms vs 19ms (~589% slower), but the table showed no arrow at all. Replace the per-release prev lookup with findPrevMetric — a per-metric walk back through history that skips releases without a value for that specific metric. Apply consistently in engineRow (the table) and the regression-warning block. Regenerated the report so existing trend annotations now reflect the fallback (3.9.6, 3.8.0 native, 3.6.0 wasm). Closes #1043
|
Claude finished @carlos-alm's task —— View job PR Review CompleteFinal AssessmentRoot Cause AnalysisProblem Being Solved: The incremental benchmark script was showing blank trend arrows when comparing against releases with null metrics (e.g., when workers timed out and stored null values). For example, version 3.9.6 had 86% slower WASM builds and 589% slower no-ops compared to 3.9.4, but showed no trend arrows because it only looked at the immediate prior release (3.9.5) which had null metrics. Why This Approach: The PR introduces Risk Assessment: Low risk. This is a reporting/analytics fix with comprehensive test coverage. No runtime or CLI behavior changes. Backlog Compliance
Code Quality ReviewImplementation Quality ⭐⭐⭐⭐☆The
Test Coverage ⭐⭐⭐⭐⭐Excellent test implementation (
Code Changes Analysis
|
Greptile SummaryThis PR fixes a silent regression-masking bug where trend annotations on the benchmark report were left blank whenever the immediately preceding release had null build/rebuild metrics (e.g. due to crashed workers). The fix replaces the single-step
Confidence Score: 5/5Safe to merge — the change is self-contained to the benchmark reporting script with no impact on production build or runtime code paths. The logic change is narrow and well-bounded: findPrevMetric replaces a single-step lookup with a null-skipping walk, trend() gains a zero-baseline guard that was already present in checkRegression, and the refactored engineRow/regression block is functionally equivalent to the old code for all non-null history entries. Integration tests cover both the core fallback scenario and the empty-history boundary. No regressions or logic gaps found. No files require special attention. Important Files Changed
Flowchart%%{init: {'theme': 'neutral'}}%%
flowchart TD
A[Loop over history entries] --> B[Pre-compute prevResolveNative and prevResolveJs once per release]
B --> C[Call engineRow for native and wasm]
C --> D{engine entry exists?}
D -- No --> E[return null, skip row]
D -- Yes --> F[Call findPrevMetric for each build metric]
F --> G[Walk backward from i+1]
G --> H{entry is dev?}
H -- Yes, skip --> G
H -- No --> I{getter returns non-null?}
I -- No --> G
I -- Yes --> J[Return value as baseline]
G -- exhausted --> K[Return null]
J --> L[Call trend with current and baseline]
K --> L
L --> M{baseline is null or zero?}
M -- Yes --> N[Emit empty string]
M -- No --> O{abs pct change less than 2 pct?}
O -- Yes --> P[Emit tilde]
O -- No --> Q[Emit arrow and percentage]
Reviews (4): Last reviewed commit: "Merge branch 'main' into fix/incremental..." | Re-trigger Greptile |
| for (let i = 0; i < history.length; i++) { | ||
| const h = history[i]; | ||
| const prev = findPrevRelease(history, i); | ||
|
|
||
| const nativeRow = engineRow(h, prev, 'native'); | ||
| const wasmRow = engineRow(h, prev, 'wasm'); | ||
| const nativeRow = engineRow(history, i, 'native'); | ||
| const wasmRow = engineRow(history, i, 'wasm'); | ||
| if (nativeRow) md += nativeRow + '\n'; | ||
| if (wasmRow) md += wasmRow + '\n'; | ||
| } |
There was a problem hiding this comment.
The
natT/jsT resolve trends are computed inside engineRow, which is called once for each engine type per release. Since h.resolve is release-level (not engine-specific), findPrevMetric for nativeBatchMs and jsFallbackMs is invoked twice per release — returning identical results both times. This is harmless but wasteful for long history arrays. Consider hoisting the resolve trend calls to the call site, alongside the nativeRow/wasmRow pair, so they are computed once and passed in.
| for (let i = 0; i < history.length; i++) { | |
| const h = history[i]; | |
| const prev = findPrevRelease(history, i); | |
| const nativeRow = engineRow(h, prev, 'native'); | |
| const wasmRow = engineRow(h, prev, 'wasm'); | |
| const nativeRow = engineRow(history, i, 'native'); | |
| const wasmRow = engineRow(history, i, 'wasm'); | |
| if (nativeRow) md += nativeRow + '\n'; | |
| if (wasmRow) md += wasmRow + '\n'; | |
| } | |
| for (let i = 0; i < history.length; i++) { | |
| // Resolve metrics are release-level; pre-compute their trends once per | |
| // release so engineRow doesn't redundantly walk history twice. | |
| const prevNative = findPrevMetric(history, i, (x) => x.resolve?.nativeBatchMs); | |
| const prevJs = findPrevMetric(history, i, (x) => x.resolve?.jsFallbackMs); | |
| const nativeRow = engineRow(history, i, 'native', prevNative, prevJs); | |
| const wasmRow = engineRow(history, i, 'wasm', prevNative, prevJs); | |
| if (nativeRow) md += nativeRow + '\n'; | |
| if (wasmRow) md += wasmRow + '\n'; | |
| } |
There was a problem hiding this comment.
Fixed in 628d37d — hoisted findPrevMetric for the release-level resolve metrics (nativeBatchMs, jsFallbackMs) to the call site and pass them into engineRow, so they are computed once per release instead of twice.
Codegraph Impact Analysis3 functions changed → 2 callers affected across 1 files
|
…lookup (#1120) - trend(): treat previous === 0 the same as null to avoid emitting '↑Infinity%' if a historical entry stores 0 for a metric. Matches the existing previous === 0 guard in checkRegression. - engineRow(): accept pre-computed prevResolveNative/prevResolveJs from the caller instead of recomputing them inside the function. Resolve metrics are release-level (not engine-specific), so the previous code walked history twice per release (once for native, once for wasm) returning identical results both times.
|
Addressed Greptile review feedback (commit 628d37d):
|
Summary
scripts/update-incremental-report.tscompared each release against the immediately prior release only. When that prior release had null build/rebuild metrics (e.g. 3.9.5, where both engine workers were SIGKILL'd on timeout), the trend cell was silently emitted blank — hiding real regressions.findPrevMetric— a per-metric walk back through history that skips releases without a value for that specific metric.engineRow(the version-history table) and the regression-warning block.generated/benchmarks/INCREMENTAL-BENCHMARKS.mdso the existing rows reflect the fallback. Notable previously-blank rows now annotated: 3.9.6 (vs 3.9.4), 3.8.0 native (vs 3.6.0 native), 3.6.0 wasm (vs 3.4.1 wasm).tests/unit/incremental-report-trend-fallback.test.tsgated on theCODEGRAPH_INCREMENTAL_REPORT_PATHenv override so the script can target a temp file from tests.Closes #1043
Test plan
npx vitest run tests/unit/incremental-report-trend-fallback.test.ts— both cases passINCREMENTAL-BENCHMARKS.mdconfirms previously-blank trend cells now show the expected fallback annotationsnpm run lint— clean