fix(native): break engine_version mismatch loop in CI hot-swap flows#1074
Conversation
The Rust orchestrator writes `build_meta.engine_version = CARGO_PKG_VERSION` (the binary's value) and `check_version_mismatch` compares against it on the next build. The JS post-processing was overwriting that with the platform package.json version (`getNativePackageVersion()`), which drifts from the binary in CI flows that hot-swap a freshly-built `.node` over the published binary without also updating the platform package's `package.json`. The mismatch promoted every incremental rebuild to a full rebuild, producing the ~2s 1-file-rebuild floor reported in #1066. - Plumb the binary's `engineVersion()` through `getActiveEngine` as a new `binaryVersion` field, kept distinct from the package.json-preferred display `version`. - `PipelineContext.nativeBinaryVersion` carries it; `setBuildMeta` after the orchestrator and `persistBuildMetadata` in finalize now write it instead of `engineVersion`. `checkEngineSchemaMismatch` compares against it for symmetry with what Rust reads. - `scripts/ci-install-native.mjs` now also rewrites the platform package's `package.json` `version` to match the just-built binary, sourcing the value from `NATIVE_BUILD_VERSION` (preferred) or the in-tree `Cargo.toml` (correct for flows that build without a version bump). - `publish.yml`'s pre-publish-benchmark job — the only site where Cargo.toml is bumped before the build artifact ships — sets `NATIVE_BUILD_VERSION` from `compute-version.outputs.version`. Other call sites (CI test/parity, publish preflight) build from current source without a bump, so the Cargo.toml fallback resolves correctly. Refs #1066.
|
Claude finished @carlos-alm's task —— View job PR Review: Engine Version Mismatch FixReviewing PR #1074 that fixes the ~2s incremental build regression caused by engine version mismatches. Todo List:
Phase 1: Root Cause Analysis ✅Problem Being Solved: The PR fixes a performance regression where incremental builds were taking ~2s instead of ~20ms. The issue was that the JS pipeline overwrote the binary's Why This Approach: The fix addresses both the runtime overwrite AND the CI hot-swap that creates version drift:
Risk Assessment: This is a legitimate bug fix addressing a real performance regression with a well-reasoned two-layer approach. Phase 2: Configuration Change Detection ✅No high-risk configuration changes detected:
The changes are purely functional code fixes with no rule relaxation. Phase 3: Test Integrity ✅No test integrity issues found:
The changes focus on a single well-scoped performance fix. Phase 4: Code Quality ✅Implementation QualityStrong implementation with proper separation of concerns: CI Script Changes (
|
Greptile SummaryThis PR fixes the root cause of a ~2s incremental-rebuild regression (#1066) where the JS pipeline was overwriting
Confidence Score: 5/5Safe to merge — all three build_meta write sites and the version-comparison read path are now mutually consistent, and the CI script correctly syncs the platform package.json to avoid re-introducing the drift. The change is narrowly scoped: it introduces a new No files require special attention. Important Files Changed
Reviews (2): Last reviewed commit: "fix(native): align tryNativeOrchestrator..." | Re-trigger Greptile |
| const nativeVersionForMeta = ctx.nativeBinaryVersion || ctx.engineVersion || ''; | ||
| setBuildMeta(ctx.db, { | ||
| engine: ctx.engineName, | ||
| engine_version: ctx.engineVersion || '', | ||
| codegraph_version: ctx.engineVersion || CODEGRAPH_VERSION, | ||
| engine_version: nativeVersionForMeta, | ||
| codegraph_version: nativeVersionForMeta || CODEGRAPH_VERSION, |
There was a problem hiding this comment.
Write/read fallback asymmetry when
nativeBinaryVersion is null
When ctx.nativeBinaryVersion is null (native addon does not expose engineVersion()), tryNativeOrchestrator writes codegraph_version = ctx.engineVersion (the platform package.json version, e.g. 3.9.6). On the very next incremental, checkEngineSchemaMismatch compares that stored value against CODEGRAPH_VERSION (the JS npm package version) — not ctx.engineVersion — because nativeBinaryVersion is still null and the condition falls through to the CODEGRAPH_VERSION branch. If those two strings differ, every incremental is promoted to a full rebuild, which is exactly the bug this PR is fixing. persistBuildMetadata (finalize.ts) already uses CODEGRAPH_VERSION as its fallback in this case, so the correct fix is to align the write in tryNativeOrchestrator to match both the read path and finalize.ts.
There was a problem hiding this comment.
Fixed in 637c233 — tryNativeOrchestrator now falls back to CODEGRAPH_VERSION when ctx.nativeBinaryVersion is null, matching both the read path (checkEngineSchemaMismatch) and the JS-pipeline write path (persistBuildMetadata in finalize.ts). Removed the ctx.engineVersion fallback that created the asymmetry, and updated the comment block to call out the alignment.
Codegraph Impact Analysis7 functions changed → 15 callers affected across 10 files
|
#1074) When ctx.nativeBinaryVersion is null (native addon lacks engineVersion()), tryNativeOrchestrator was writing codegraph_version = ctx.engineVersion (platform package.json), while checkEngineSchemaMismatch and persistBuildMetadata both fall back to CODEGRAPH_VERSION (JS package). The asymmetry could re-introduce a perpetual full-rebuild loop on older addons whenever those two strings diverged — exactly the bug #1066 fixed.
Summary
Fixes the actual root cause of the ~2s 1-file-rebuild regression in #1066. PR #1070 corrected
detect_removed_files(a real but downstream bug) — the underlying issue is that JS overwrites the binary'sengine_versioninbuild_metawith a stale platform-package version, socheck_version_mismatchpromotes every incremental to a full rebuild on the publish gate.What was happening
The Rust orchestrator writes
build_meta.engine_version = CARGO_PKG_VERSION(build_pipeline.rs:749) and on the next incremental compares against it (check_version_mismatch,build_pipeline.rs:769). The JS pipeline's post-orchestratorsetBuildMeta(andpersistBuildMetadatainfinalize.ts) then overwrote that value withctx.engineVersion, which isgetNativePackageVersion()— the platform-packagepackage.jsonversion.In the publish gate,
build-nativebumpsCargo.tomltocompute-version.outputs.version(e.g.3.10.0) and builds, butscripts/ci-install-native.mjsonly copied the.nodebinary over the published v3.9.6 platform package — leavingpackage.jsonat3.9.6. JS wrote3.9.6tobuild_meta; binary'sCARGO_PKG_VERSIONwas3.10.0;check_version_mismatchreturnedtrue; every incremental ran the full pipeline (~2s).In failed run 25419286720, the smoking-gun signal is
[codegraph] Using native engine (v3.9.6)on every iteration despite the binary being built from PR-1070 source — that'sgetNativePackageVersion()reading the stale platform-packagepackage.json.Fix (option C from #1066 discussion)
Both the runtime overwrite and the CI hot-swap that creates the version drift in the first place:
Runtime (TS)
getActiveEnginenow returns a newbinaryVersionfield alongside the package-json-preferred displayversion. The display path is unchanged.PipelineContext.nativeBinaryVersioncarries it through.setBuildMeta(post-orchestrator,pipeline.ts) andpersistBuildMetadata(finalize.ts) now writenativeBinaryVersion, matching what Rust reads.checkEngineSchemaMismatchcompares againstnativeBinaryVersionfor symmetry — so JS doesn't promote on its own when binary/package.json drift.CI (workflow)
scripts/ci-install-native.mjsnow also rewrites the platform package'spackage.jsonversionto match the just-built binary, sourcing the value fromNATIVE_BUILD_VERSION(preferred) or the in-treeCargo.toml(correct fallback for flows that build from current source without a bump).publish.yml's pre-publish-benchmark step — the only site whereCargo.tomlis bumped before the artifact ships — passesNATIVE_BUILD_VERSION: \${{ needs.compute-version.outputs.version }}.ci.ymltest/parity,publish.ymlpreflight) build from current source without a bump, so theCargo.tomlfallback resolves to the same value the binary embedded.Why two layers
The runtime fix alone closes the door for users hot-swapping binaries between releases. The CI fix alone unblocks the gate but leaves the JS pipeline writing a value that doesn't round-trip through Rust's comparator. Both together make the invariant explicit at the layer that owns each side.
Test plan
Using native engine (v...)log line shows the correct version on the bench job (3.10.0, not the stale 3.9.6)build_meta reflects actual engine and version after buildintegration test (tests/integration/build.test.ts:481) still passes — it asserts a valid semver string, which the binary version satisfiesgetActiveEngineparser tests (tests/parsers/unified.test.ts) still pass — the newbinaryVersionfield is additiveNotes
vitest.config.tsinjecting--strip-typesintoNODE_OPTIONSfor child processes, which Node 24.10 (this maintainer's machine) rejects withERR_WORKER_INVALID_EXEC_ARGV. CI runs Node 22 where that path is fine. The TypeScript build (tsc, run via the postinstall script) and the Biome lint (npm run lint, exit 0) both pass locally; smoke-testedgetActiveEngine()to confirm the new return shape.ctx.engineVersionfield is preserved as the display version (used by theUsing native engine (v…)log and the legacy< 3.9.1orchestrator-buggy guard atpipeline.ts:305) — those paths benefit from the package.json-preferred value.Closes #1066.