ci(release): restore resolution + tracer gates in pre-publish-benchmark#1051
Conversation
The consolidation in #1040 dropped two gates that previously ran in the benchmark workflow: - tests/benchmarks/resolution/resolution-benchmark.test.ts asserts per-language precision/recall thresholds - tests/benchmarks/resolution/tracer/tracer-validation.test.ts asserts same-file edge recall against dynamic tracers In the old post-publish workflow these only turned the run red; with pre-publish-benchmark they can finally block a release on a real resolution or tracer regression. Inserted between the resolution benchmark run and the merge-into-build step, matching the original ordering. Python/Go setup at the top of the job already covers both gates — no extra setup needed.
|
Claude finished @carlos-alm's task —— View job PR Review AnalysisTodo List
Root Cause AnalysisProblem Being Solved: PR #1040 accidentally dropped two critical CI gates when consolidating benchmark jobs. The "Gate on resolution thresholds" and "Run tracer validation" steps that previously ran in benchmark.yml were not included in the new Why This Approach: This is a legitimate restoration of missing functionality, not a workaround. The gates were accidentally omitted during the workflow consolidation in #1040. Moving them to pre-publish-benchmark is actually better than their original location because they now block npm publish instead of just marking post-publish runs as red. Risk Assessment: LOW RISK - This is purely additive, restoring testing that was accidentally removed. No relaxation of standards. Backlog Compliance
Technical AnalysisConfiguration Changes: The only configuration changes are:
Test Integrity: ✅ EXCELLENT
Code Quality: ✅ GOOD
Placement Logic: The placement between "Run resolution benchmark" and "Merge resolution into build result" makes perfect sense - the gates validate the resolution results before they're incorporated into the final report. Critical AssessmentThis is exemplary bug fixing:
The commit message and PR description clearly explain the problem, the solution, and why this approach is better than the original. Final Recommendation
Excellent work on the fix and clear documentation of the problem! 🎯 |
Greptile SummaryThis PR restores two gate steps — Confidence Score: 5/5Safe to merge — minimal, targeted change that restores previously existing gate steps with no new logic introduced. The diff is two added steps and two comment-line updates. The steps exactly mirror what the PR description states and are correctly ordered. No new environment requirements, no structural changes to the job, and the known duplication concern is already tracked as a follow-up issue. No files require special attention. Important Files Changed
Sequence DiagramsequenceDiagram
participant CI as pre-publish-benchmark job
participant RB as Run resolution benchmark
participant GT as Gate on resolution thresholds
participant TV as Run tracer validation
participant MR as Merge resolution into build result
participant RQ as Run query benchmark
participant RG as Regression guard
participant PUB as publish job
CI->>RB: scripts/resolution-benchmark.ts → resolution-result.json
RB->>GT: (restored) npx vitest resolution-benchmark.test.ts
GT-->>CI: fail → blocks publish
GT->>TV: (restored) npx vitest tracer-validation.test.ts
TV-->>CI: fail → blocks publish
TV->>MR: merge resolution-result.json into benchmark-result.json
MR->>RQ: run query + incremental benchmarks
RQ->>RG: regression guard
RG->>PUB: gate passes → npm publish proceeds
Reviews (2): Last reviewed commit: "ci(release): restore resolution + tracer..." | Re-trigger Greptile |
| - name: Gate on resolution thresholds | ||
| timeout-minutes: 30 | ||
| run: npx vitest run tests/benchmarks/resolution/resolution-benchmark.test.ts --reporter=verbose |
There was a problem hiding this comment.
Potential duplicate benchmark run
If resolution-benchmark.test.ts re-runs the full resolution benchmark internally (rather than just reading the resolution-result.json already produced by the previous Run resolution benchmark step), the expensive measurement will execute twice, roughly doubling that portion of the job's runtime. The 30-minute timeout suggests this is at least anticipated to be slow. Worth confirming the test only reads/asserts on the pre-generated JSON rather than regenerating it — if it does regenerate, consider passing the artifact path in or refactoring so the gate test consumes the already-computed output.
There was a problem hiding this comment.
Confirmed — resolution-benchmark.test.ts does regenerate the full benchmark (copies every fixture and calls buildGraph per language, lines 317-329) rather than reading resolution-result.json. So this gate effectively re-runs the work that scripts/resolution-benchmark.ts just produced.
This duplication is pre-existing — same pattern was in benchmark.yml from #1014 (commit 2e86f56) and earlier. PR #1051 is narrowly scoped to restoring the gate steps that were accidentally dropped in #1040, so refactoring the test to consume the JSON artifact is genuinely out of scope here.
Tracked as a follow-up: #1052
Summary
Gate on resolution thresholdsstep (runstests/benchmarks/resolution/resolution-benchmark.test.ts) topre-publish-benchmarkin publish.ymlRun tracer validationstep (runstests/benchmarks/resolution/tracer/tracer-validation.test.ts) to the same jobInserted between
Run resolution benchmarkandMerge resolution into build result, matching the original ordering. Python and Go are already set up at the top of the job, so no extra setup needed.Test plan