refactor(ci): let tracer-validation reuse benchmark tracer output #1173
carlos-alm wants to merge 6 commits
The pre-publish-benchmark job's `Run resolution benchmark` step already
spawns `run-tracer.mjs` once per language fixture for telemetry (writing
`dynamicEdges` / `dynamicConfirmed` counts into resolution-result.json).
The `Run tracer validation` step that follows then ran the same per-
language tracer subprocess again to compute same-file recall — doubling
the tracer cost in the pre-publish job.
Extend the script's per-language LangResult with a `tracer` artifact
containing status ('ok' | 'skipped') and the raw captured edges. The
status distinction mirrors the gate test's `runTracer` semantics
(null-return on toolchain-missing → 'skipped'). Refactor the gate test
to consume that artifact when RESOLUTION_RESULT_JSON is set, falling
back to running run-tracer.mjs directly when unset so devs can still
execute `npx vitest run tests/benchmarks/resolution/tracer/...`
standalone. Wire the env var through the workflow's tracer-validation
step.
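The producer side of this change can be sketched as follows. This is a minimal illustration, not the code in scripts/resolution-benchmark.ts: only the `tracer` field and its `status` / `edges` members come from the description above. The `TracedEdge` fields, the `RawTracerOutput` shape, the helper name `toTracerArtifact`, and the choice to treat an `error` field as a missing toolchain are all assumptions.

```typescript
// Hypothetical edge shape; the real edge fields are not shown in this PR text.
interface TracedEdge {
  source: string;
  target: string;
}

// Artifact stored under each language's `tracer` field.
interface TracerArtifact {
  status: 'ok' | 'skipped';
  edges: TracedEdge[];
}

// Assumed shape of what the tracer subprocess reports back.
interface RawTracerOutput {
  edges?: TracedEdge[];
  error?: string;
}

// Map the raw tracer outcome onto the artifact. A null result (or, by
// assumption, a reported error) means the tracer could not run, so the
// artifact is marked 'skipped'. A successful run with zero edges stays 'ok'.
function toTracerArtifact(raw: RawTracerOutput | null): TracerArtifact {
  if (raw === null || raw.error !== undefined) {
    return { status: 'skipped', edges: [] };
  }
  return { status: 'ok', edges: raw.edges ?? [] };
}
```

Keeping `status` separate from `edges` lets a consumer tell an empty but valid trace apart from a run that never happened.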
Verified locally: gate test in artifact mode passes 35/35 in ~160ms
against an artifact produced by scripts/resolution-benchmark.ts (vs
~320ms for the standalone path that respawns the JS tracer subprocess).
The resolution-benchmark gate test still passes 170/170 with the new
`tracer` field present in the artifact.
Closes #1166
Claude finished @carlos-alm's task

PR Review: Tracer Validation Deduplication
Final recommendation completed.

Root Cause Analysis
Problem Being Solved: The pre-publish benchmark job runs each language's tracer subprocess once for telemetry, but the tracer-validation step that follows runs the same subprocess again per fixture, doubling tracer cost in the publish pipeline.
Why This Approach: The solution mirrors the resolution-gate dedup approach from PR #1052. It extends the resolution benchmark script to include a
Risk Assessment: Low risk. This is a CI optimization that preserves existing behavior while reducing duplicate work. Fallback to direct tracer execution ensures standalone tests still work.

Configuration Change Detection
Assessment: ✅ NO HIGH-RISK CHANGES DETECTED

Test Integrity Assessment
Assessment: ✅ NO RED FLAGS DETECTED

Code Quality Assessment
Assessment: ✅ GOOD IMPLEMENTATION QUALITY
Strengths:
Minor observations:
No significant issues identified.

Backlog Compliance

Critical Concerns
None identified. This is a well-implemented CI optimization.

Final Recommendation
Greptile Summary
This PR eliminates duplicate tracer subprocess invocations in the publish CI pipeline by having

Confidence Score: 5/5
Safe to merge: the refactor correctly deduplicates tracer subprocess calls in CI without altering any observable test semantics or benchmark metrics.
All three changed files make narrowly scoped, well-tested changes. The
No files require special attention.

Important Files Changed
Sequence Diagram

sequenceDiagram
participant WF as GitHub Actions
participant BM as resolution-benchmark.ts
participant RJ as resolution-result.json
participant TV as tracer-validation.test.ts
participant RT as run-tracer.mjs
WF->>BM: node scripts/resolution-benchmark.ts
loop per language fixture
BM->>RT: execFileSync (run-tracer.mjs)
RT-->>BM: "{ edges, error? }"
BM->>BM: "Build TracerArtifact {status, edges}"
BM->>RJ: write LangResult incl. tracer field
end
BM-->>WF: resolution-result.json written
WF->>TV: vitest run (RESOLUTION_RESULT_JSON set)
TV->>RJ: readFileSync + JSON.parse (module load)
loop per language
TV->>TV: runTracer(lang)
alt "artifact has status=ok"
TV->>TV: return entry.edges (no subprocess)
else "artifact missing or status=skipped"
TV->>TV: "return null -> skip test"
end
end
TV-->>WF: recall assertions complete
Reviews (7): Last reviewed commit: "fix: surface helpful error when RESOLUTI..."
      return JSON.parse(fs.readFileSync(RESOLUTION_RESULT_JSON, 'utf-8'));
    })();

If `RESOLUTION_RESULT_JSON` points to a file that exists but contains invalid or truncated JSON (e.g., the benchmark step was killed mid-write and the redirect left a partial file), `JSON.parse` throws a raw `SyntaxError` during module load. Every test in the suite then fails with an opaque parse error instead of a useful message explaining which file to regenerate.

Suggested change:

    -   return JSON.parse(fs.readFileSync(RESOLUTION_RESULT_JSON, 'utf-8'));
    - })();
    +   try {
    +     return JSON.parse(fs.readFileSync(RESOLUTION_RESULT_JSON, 'utf-8'));
    +   } catch (err) {
    +     throw new Error(
    +       `RESOLUTION_RESULT_JSON=${RESOLUTION_RESULT_JSON} is not valid JSON — regenerate it with scripts/resolution-benchmark.ts. (${err})`,
    +     );
    +   }
    + })();
Fixed in c2c5889 — wrapped the JSON.parse in a try/catch that throws a helpful error pointing the developer at scripts/resolution-benchmark.ts to regenerate the artifact, instead of letting the raw SyntaxError surface during module load. Verified manually with a malformed JSON file.
Codegraph Impact Analysis
4 functions changed → 1 callers affected across 1 files
Summary
Mirrors the resolution-gate dedup approach in #1052 (PR d697929).
Why
The pre-publish benchmark job runs each language's tracer subprocess once for telemetry (writing `dynamicEdges` / `dynamicConfirmed` into `resolution-result.json`). The tracer-validation step that follows ran the same subprocess again per fixture, doubling tracer cost in the publish pipeline.
The status field distinguishes "tracer ran, found nothing" (`ok` with empty edges) from "toolchain missing" (`skipped`), mirroring `runTracer`'s null-return semantics so the test still skips gracefully on environments without rustc/ghc/ruby/etc.
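On the consumer side, the skip semantics described above might look roughly like this. It is a sketch under stated assumptions: the real `runTracer` in the gate test is not shown here, so the function name `resolveTracerEdges`, the language-keyed `ResolutionResult` layout, the `Edge` fields, and the injected `runSubprocess` fallback are hypothetical. Only the `ok` / `skipped` statuses, the null-return convention, and the fallback when `RESOLUTION_RESULT_JSON` is unset come from the PR text.

```typescript
// Hypothetical shapes, for illustration only.
type Edge = { source: string; target: string };
type TracerArtifact = { status: 'ok' | 'skipped'; edges: Edge[] };
// Assumes the parsed artifact is keyed by language name.
type ResolutionResult = Record<string, { tracer?: TracerArtifact }>;

// Returns the traced edges for a language, or null when the test should skip.
// `artifact` is null when RESOLUTION_RESULT_JSON is unset (standalone mode),
// in which case the tracer subprocess is run via the injected fallback.
function resolveTracerEdges(
  lang: string,
  artifact: ResolutionResult | null,
  runSubprocess: (lang: string) => Edge[] | null,
): Edge[] | null {
  if (artifact === null) {
    return runSubprocess(lang);
  }
  const entry = artifact[lang]?.tracer;
  if (!entry || entry.status === 'skipped') {
    return null; // toolchain missing or no artifact entry: skip gracefully
  }
  return entry.edges; // may be empty: the tracer ran and found nothing
}
```

In artifact mode the fallback is never called, which is where the saved subprocess cost comes from. An `ok` entry with an empty `edges` array still returns an array, so recall assertions run instead of being skipped.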
Test plan
Closes #1166