ci(release): restore resolution + tracer gates in pre-publish-benchmark #1051

Merged
carlos-alm merged 2 commits into main from ci/restore-resolution-tracer-gates
May 4, 2026

@carlos-alm
Contributor

Summary

  • Re-add the Gate on resolution thresholds step (runs tests/benchmarks/resolution/resolution-benchmark.test.ts) to pre-publish-benchmark in publish.yml
  • Re-add the Run tracer validation step (runs tests/benchmarks/resolution/tracer/tracer-validation.test.ts) to the same job
  • Both gates were dropped in ci(release): gate npm publish on benchmark regressions #1040 when the build/query/incremental jobs were consolidated into the pre-publish gate. Restoring them where the gating mechanism actually lives means a precision/recall or same-file-recall regression now blocks publish — previously the old post-publish workflow could only paint the run red after npm had already accepted the bad version.

Inserted between Run resolution benchmark and Merge resolution into build result, matching the original ordering. Python and Go are already set up at the top of the job, so no extra setup needed.

Test plan

  • Trigger a manual `workflow_dispatch` of the Publish workflow against a known-good commit and verify both new steps run and pass
  • Confirm regression-guard step still runs after these gates
  • On the next real release, confirm the gates either pass cleanly or block with a clear failure message

The consolidation in #1040 dropped two gates that previously ran in the
benchmark workflow:

  - tests/benchmarks/resolution/resolution-benchmark.test.ts
    asserts per-language precision/recall thresholds
  - tests/benchmarks/resolution/tracer/tracer-validation.test.ts
    asserts same-file edge recall against dynamic tracers
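
For readers unfamiliar with the two metrics these gates assert on, here is a minimal sketch of how precision and recall over resolved edges could be computed and compared against a minimum. The `Edge` string format, the `scoreEdges` and `meetsThreshold` helpers, and the sample numbers are all hypothetical; the real test files are not shown in this PR and may be structured differently.

```typescript
// Hypothetical edge representation; the real benchmark's types are not shown here.
type Edge = string; // e.g. "caller -> callee"

interface Metrics {
  precision: number; // share of resolved edges that are correct
  recall: number;    // share of expected edges that were found
}

// Score a resolver's output against a hand-labelled set of expected edges.
function scoreEdges(expected: Edge[], resolved: Edge[]): Metrics {
  const expectedSet = new Set(expected);
  const uniqueResolved = Array.from(new Set(resolved));
  const truePositives = uniqueResolved.filter((edge) => expectedSet.has(edge)).length;
  return {
    // An empty side is treated as a perfect score (an assumption of this sketch).
    precision: uniqueResolved.length === 0 ? 1 : truePositives / uniqueResolved.length,
    recall: expectedSet.size === 0 ? 1 : truePositives / expectedSet.size,
  };
}

// A gate passes only if both metrics reach their minimum.
function meetsThreshold(actual: Metrics, minimum: Metrics): boolean {
  return actual.precision >= minimum.precision && actual.recall >= minimum.recall;
}

// Made-up sample data: 2 of 3 resolved edges are correct, 2 of 4 expected edges are found.
const expectedEdges = ["a -> b", "a -> c", "b -> c", "c -> d"];
const resolvedEdges = ["a -> b", "a -> c", "x -> y"];
const metrics = scoreEdges(expectedEdges, resolvedEdges); // precision 2/3, recall 0.5

const passesStrict = meetsThreshold(metrics, { precision: 0.6, recall: 0.6 }); // false: recall too low
const passesLoose = meetsThreshold(metrics, { precision: 0.6, recall: 0.5 });  // true
console.log(metrics, passesStrict, passesLoose);
```

In a vitest-based gate, a failed check of this kind would fail the test, which fails the step and, in the pre-publish job, blocks the release.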

In the old post-publish workflow these only turned the run red; with
pre-publish-benchmark they can finally block a release on a real
resolution or tracer regression.

Inserted between the resolution benchmark run and the merge-into-build
step, matching the original ordering. Python/Go setup at the top of the
job already covers both gates — no extra setup needed.
@claude

claude Bot commented May 3, 2026

Claude finished @carlos-alm's task


PR Review Analysis

Todo List

  • Read and analyze the workflow file changes
  • Perform root cause analysis
  • Check for configuration change risks
  • Evaluate test integrity
  • Assess code quality
  • Rate backlog compliance
  • Provide final recommendation

Root Cause Analysis

Problem Being Solved: PR #1040 accidentally dropped two critical CI gates when consolidating benchmark jobs. The "Gate on resolution thresholds" and "Run tracer validation" steps that previously ran in benchmark.yml were not included in the new pre-publish-benchmark job.

Why This Approach: This is a legitimate restoration of missing functionality, not a workaround. The gates were accidentally omitted during the workflow consolidation in #1040. Moving them to pre-publish-benchmark is actually better than their original location because they now block npm publish instead of just marking post-publish runs as red.

Risk Assessment: LOW RISK - This is purely additive, restoring testing that was accidentally removed. No relaxation of standards.

Backlog Compliance

  • Zero-dep: ✓ — No new dependencies, uses existing vitest and test infrastructure
  • Foundation-aligned: ✓ — Maintains quality gates, no violation of principles
  • Problem-fit: 4 — Directly prevents accuracy regressions from shipping to npm, critical for AI agent reliability
  • Breaking: No — Purely internal CI changes, no API or behavior changes
  • Tier: 1 (zero-dep ✓ + foundation-aligned ✓)

Technical Analysis

Configuration Changes: The only configuration changes are:

  1. Updated setup step names to reflect they serve both benchmarks and tracer validation
  2. Added back the two gates that were accidentally dropped

Test Integrity: ✅ EXCELLENT

  • Both test files exist and are legitimate: resolution-benchmark.test.ts and tracer-validation.test.ts
  • These assert precision/recall thresholds and same-file edge recall against dynamic tracers
  • The gates now run in pre-publish where they can actually prevent bad releases instead of just marking them red after publication

Code Quality: ✅ GOOD

  • Minimal, surgical change with clear intent
  • Proper timeouts (30min for resolution, 10min for tracer)
  • Inserted in logical location between resolution benchmark and merge step
  • No unnecessary changes or scope creep

Placement Logic: The placement between "Run resolution benchmark" and "Merge resolution into build result" makes perfect sense - the gates validate the resolution results before they're incorporated into the final report.

Critical Assessment

This is exemplary bug fixing:

  1. Identified genuine regression - Gates were accidentally dropped during consolidation
  2. Improved upon original design - Gates now block publish instead of post-hoc notification
  3. Minimal scope - Only adds back what was lost, no feature creep
  4. No workarounds - Direct fix to the structural issue

The commit message and PR description clearly explain the problem, the solution, and why this approach is better than the original.

Final Recommendation

  • Rating: ⭐⭐⭐⭐⭐ (5/5)
  • Action: APPROVE
  • Reasoning: This is a textbook example of how to properly restore accidentally-dropped functionality. The change improves release quality by moving gates to where they can actually prevent bad releases from shipping. No concerns identified.

Excellent work on the fix and clear documentation of the problem! 🎯


@greptile-apps
Contributor

greptile-apps Bot commented May 3, 2026

Greptile Summary

This PR restores two gate steps — Gate on resolution thresholds and Run tracer validation — that were accidentally dropped from pre-publish-benchmark in #1040. Both steps are correctly inserted between Run resolution benchmark and Merge resolution into build result, preserving the original ordering and ensuring precision/recall or same-file-recall regressions block publish before the npm release is accepted.

Confidence Score: 5/5

Safe to merge — minimal, targeted change that restores previously existing gate steps with no new logic introduced.

The diff is two added steps and two comment-line updates. The steps exactly mirror what the PR description states and are correctly ordered. No new environment requirements, no structural changes to the job, and the known duplication concern is already tracked as a follow-up issue.

No files require special attention.

Important Files Changed

| Filename | Overview |
| --- | --- |
| .github/workflows/publish.yml | Restores two dropped gate steps (resolution threshold + tracer validation) between the benchmark run and the merge step; step names updated to reflect tracer scope. No logic regressions introduced. |

Sequence Diagram

sequenceDiagram
    participant CI as pre-publish-benchmark job
    participant RB as Run resolution benchmark
    participant GT as Gate on resolution thresholds
    participant TV as Run tracer validation
    participant MR as Merge resolution into build result
    participant RQ as Run query benchmark
    participant RG as Regression guard
    participant PUB as publish job

    CI->>RB: scripts/resolution-benchmark.ts → resolution-result.json
    RB->>GT: (restored) npx vitest resolution-benchmark.test.ts
    GT-->>CI: fail → blocks publish
    GT->>TV: (restored) npx vitest tracer-validation.test.ts
    TV-->>CI: fail → blocks publish
    TV->>MR: merge resolution-result.json into benchmark-result.json
    MR->>RQ: run query + incremental benchmarks
    RQ->>RG: regression guard
    RG->>PUB: gate passes → npm publish proceeds

Reviews (2): Last reviewed commit: "ci(release): restore resolution + tracer..."

Comment on lines +292 to +294
    - name: Gate on resolution thresholds
      timeout-minutes: 30
      run: npx vitest run tests/benchmarks/resolution/resolution-benchmark.test.ts --reporter=verbose
Contributor

P2 Potential duplicate benchmark run

If resolution-benchmark.test.ts re-runs the full resolution benchmark internally (rather than just reading the resolution-result.json already produced by the previous Run resolution benchmark step), the expensive measurement will execute twice, roughly doubling that portion of the job's runtime. The 30-minute timeout suggests this is at least anticipated to be slow. Worth confirming the test only reads/asserts on the pre-generated JSON rather than regenerating it — if it does regenerate, consider passing the artifact path in or refactoring so the gate test consumes the already-computed output.
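
A rough sketch of what "consuming the already-computed output" could look like: a pure check that takes parsed JSON and per-language minimums and returns the violations, so the gate never has to rebuild the graph. The `ResolutionResult` shape, the `findViolations` helper, and every number below are assumptions for illustration; the actual schema of resolution-result.json is not shown in this thread.

```typescript
// Hypothetical schema: one precision/recall pair per language.
interface LanguageResult {
  precision: number;
  recall: number;
}
type ResolutionResult = Record<string, LanguageResult>;

// Return a human-readable message for every threshold that is not met.
function findViolations(result: ResolutionResult, thresholds: ResolutionResult): string[] {
  const violations: string[] = [];
  for (const language of Object.keys(thresholds)) {
    const actual = result[language];
    const minimum = thresholds[language];
    if (actual === undefined) {
      violations.push(`${language}: missing from result`);
      continue;
    }
    if (actual.precision < minimum.precision) {
      violations.push(`${language}: precision ${actual.precision} < ${minimum.precision}`);
    }
    if (actual.recall < minimum.recall) {
      violations.push(`${language}: recall ${actual.recall} < ${minimum.recall}`);
    }
  }
  return violations;
}

// In a real gate the text would be read from the artifact written by the previous step
// (for example with fs.readFileSync); an inline string keeps this sketch self-contained.
const artifactText =
  '{"python":{"precision":0.95,"recall":0.8},"go":{"precision":0.9,"recall":0.92}}';
const parsedResult = JSON.parse(artifactText) as ResolutionResult;

// Made-up thresholds for illustration only.
const sampleThresholds: ResolutionResult = {
  python: { precision: 0.9, recall: 0.85 },
  go: { precision: 0.85, recall: 0.9 },
};

const violations = findViolations(parsedResult, sampleThresholds);
console.log(violations); // ["python: recall 0.8 < 0.85"]
```

Keeping the check free of file and graph-building concerns means the expensive measurement runs once, and the gate step only needs the path to the artifact.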

Contributor Author

Confirmed — resolution-benchmark.test.ts does regenerate the full benchmark (copies every fixture and calls buildGraph per language, lines 317-329) rather than reading resolution-result.json. So this gate effectively re-runs the work that scripts/resolution-benchmark.ts just produced.

This duplication is pre-existing — same pattern was in benchmark.yml from #1014 (commit 2e86f56) and earlier. PR #1051 is narrowly scoped to restoring the gate steps that were accidentally dropped in #1040, so refactoring the test to consume the JSON artifact is genuinely out of scope here.

Tracked as a follow-up: #1052

@carlos-alm
Contributor Author

@greptileai

@carlos-alm carlos-alm merged commit 320d188 into main May 4, 2026
14 checks passed
@carlos-alm carlos-alm deleted the ci/restore-resolution-tracer-gates branch May 4, 2026 05:31
@github-actions github-actions Bot locked and limited conversation to collaborators May 4, 2026