fix(native): break engine_version mismatch loop in CI hot-swap flows #1074

Merged
carlos-alm merged 2 commits into main from fix/1066-engine-version-mismatch
May 6, 2026

@carlos-alm
Contributor

Summary

Fixes the actual root cause of the ~2s 1-file-rebuild regression in #1066. PR #1070 corrected detect_removed_files (a real but downstream bug) — the underlying issue is that JS overwrites the binary's engine_version in build_meta with a stale platform-package version, so check_version_mismatch promotes every incremental to a full rebuild on the publish gate.

What was happening

The Rust orchestrator writes build_meta.engine_version = CARGO_PKG_VERSION (build_pipeline.rs:749) and on the next incremental compares against it (check_version_mismatch, build_pipeline.rs:769). The JS pipeline's post-orchestrator setBuildMeta (and persistBuildMetadata in finalize.ts) then overwrote that value with ctx.engineVersion, which is getNativePackageVersion() — the platform-package package.json version.

In the publish gate, build-native bumps Cargo.toml to compute-version.outputs.version (e.g. 3.10.0) and builds, but scripts/ci-install-native.mjs only copied the .node binary over the published v3.9.6 platform package — leaving package.json at 3.9.6. JS wrote 3.9.6 to build_meta; binary's CARGO_PKG_VERSION was 3.10.0; check_version_mismatch returned true; every incremental ran the full pipeline (~2s).

In failed run 25419286720, the smoking-gun signal is [codegraph] Using native engine (v3.9.6) on every iteration despite the binary being built from PR-1070 source — that's getNativePackageVersion() reading the stale platform-package package.json.
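
The loop described above can be modeled in a few lines of TypeScript. This is a sketch, not the actual Rust or JS code: `BuildMeta`, `checkVersionMismatch`, `runBuild`, and `simulate` are hypothetical stand-ins for the real comparator and the JS overwrite, and the version strings are the ones quoted from the failed run.

```typescript
// Hypothetical model of the mismatch loop; names and shapes are illustrative only.
interface BuildMeta {
  engineVersion: string | null;
}

type BuildKind = "full" | "incremental";

const BINARY_VERSION = "3.10.0"; // what the binary embeds (CARGO_PKG_VERSION)
const PACKAGE_JSON_VERSION = "3.9.6"; // stale platform-package version

// Stand-in for the Rust comparator: a differing stored version forces a full rebuild.
function checkVersionMismatch(meta: BuildMeta, binaryVersion: string): boolean {
  return meta.engineVersion !== null && meta.engineVersion !== binaryVersion;
}

// One build: compare, let the orchestrator write its own version, then let the
// JS post-processing step overwrite it with `jsWrites`.
function runBuild(meta: BuildMeta, jsWrites: string): BuildKind {
  const firstBuild = meta.engineVersion === null;
  const kind: BuildKind =
    firstBuild || checkVersionMismatch(meta, BINARY_VERSION) ? "full" : "incremental";
  meta.engineVersion = BINARY_VERSION; // orchestrator write
  meta.engineVersion = jsWrites; // JS overwrite
  return kind;
}

function simulate(jsWrites: string, builds: number): BuildKind[] {
  const meta: BuildMeta = { engineVersion: null };
  const kinds: BuildKind[] = [];
  for (let i = 0; i < builds; i++) kinds.push(runBuild(meta, jsWrites));
  return kinds;
}

// Before the fix JS writes the package.json version; after it, the binary version.
const before = simulate(PACKAGE_JSON_VERSION, 3);
const after = simulate(BINARY_VERSION, 3);
console.log(before, after);
```

In this model, writing the stale package version keeps every build on the full path, while writing the binary's own version lets the second and later builds stay incremental.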

Fix (option C from #1066 discussion)

The fix addresses both the runtime overwrite and the CI hot-swap that creates the version drift in the first place:

Runtime (TS)

  • getActiveEngine now returns a new binaryVersion field alongside the package-json-preferred display version. The display path is unchanged.
  • PipelineContext.nativeBinaryVersion carries it through.
  • setBuildMeta (post-orchestrator, pipeline.ts) and persistBuildMetadata (finalize.ts) now write nativeBinaryVersion, matching what Rust reads.
  • checkEngineSchemaMismatch compares against nativeBinaryVersion for symmetry — so JS doesn't promote on its own when binary/package.json drift.
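
A rough sketch of the split between the display version and the persisted version, under stated assumptions: `NativeAddon`, `ActiveEngine`, `getActiveEngineSketch`, and `engineVersionForMeta` are hypothetical names, the real `getActiveEngine` return shape may carry more fields, and the `CODEGRAPH_VERSION` value here is a placeholder.

```typescript
// Hypothetical shapes; the real getActiveEngine and PipelineContext differ in detail.
interface NativeAddon {
  engineVersion?: () => string; // older addons may not expose this
}

interface ActiveEngine {
  name: "native";
  version: string | null; // display version, package.json preferred
  binaryVersion: string | null; // version embedded in the binary
}

const CODEGRAPH_VERSION = "3.10.0"; // placeholder for the JS package version

function getActiveEngineSketch(
  addon: NativeAddon,
  getNativePackageVersion: () => string | null,
): ActiveEngine {
  const binaryVersion = addon.engineVersion ? addon.engineVersion() : null;
  // Display path: prefer package.json, fall back to the binary's own value.
  const version = getNativePackageVersion() ?? binaryVersion;
  return { name: "native", version, binaryVersion };
}

// Value persisted to build_meta.engine_version: the binary's version when known.
function engineVersionForMeta(engine: ActiveEngine): string {
  return engine.binaryVersion ?? CODEGRAPH_VERSION;
}

// Drifted install: package.json says 3.9.6, the binary embeds 3.10.0.
const engine = getActiveEngineSketch({ engineVersion: () => "3.10.0" }, () => "3.9.6");
const displayed = engine.version;
const persisted = engineVersionForMeta(engine);
console.log(displayed, persisted);
```

Keeping the two values separate means the log line can keep showing the package version while the persisted value always matches what the binary compares against.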

CI (workflow)

  • scripts/ci-install-native.mjs now also rewrites the platform package's package.json version to match the just-built binary, sourcing the value from NATIVE_BUILD_VERSION (preferred) or the in-tree Cargo.toml (correct fallback for flows that build from current source without a bump).
  • publish.yml's pre-publish-benchmark step, the only site where Cargo.toml is bumped before the artifact ships, passes NATIVE_BUILD_VERSION: ${{ needs.compute-version.outputs.version }}.
  • Other call sites (ci.yml test/parity, publish.yml preflight) build from current source without a bump, so the Cargo.toml fallback resolves to the same value the binary embedded.
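
The version-resolution order and the package.json rewrite can be sketched as follows. This is an illustration only: `resolveVersion` and `rewritePackageVersion` are hypothetical helpers that work on in-memory strings, whereas the real script reads the environment and files directly, and the package name is made up.

```typescript
// Hypothetical helpers; the real script reads files and the environment directly.
type Env = Record<string, string | undefined>;

// Prefer NATIVE_BUILD_VERSION; otherwise use the version read from Cargo.toml.
function resolveVersion(env: Env, cargoTomlVersion: string | null): string | null {
  const fromEnv = env["NATIVE_BUILD_VERSION"]?.trim();
  return fromEnv ? fromEnv : cargoTomlVersion;
}

// Return package.json text with `version` replaced, leaving other fields intact.
function rewritePackageVersion(packageJsonText: string, version: string): string {
  const pkg = JSON.parse(packageJsonText) as Record<string, unknown>;
  pkg["version"] = version;
  return JSON.stringify(pkg, null, 2) + "\n";
}

// Publish-gate case: Cargo.toml was bumped to 3.10.0, package.json still says 3.9.6.
const stalePackageJson = JSON.stringify({ name: "example-platform-package", version: "3.9.6" });
const resolved = resolveVersion({ NATIVE_BUILD_VERSION: "3.10.0" }, "3.9.6");
const rewritten = resolved ? rewritePackageVersion(stalePackageJson, resolved) : stalePackageJson;
console.log(rewritten);
```

When the environment variable is unset or empty, the sketch falls back to the Cargo.toml value, which mirrors the no-bump call sites described above.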

Why two layers

The runtime fix alone closes the door for users hot-swapping binaries between releases. The CI fix alone unblocks the gate but leaves the JS pipeline writing a value that doesn't round-trip through Rust's comparator. Both together make the invariant explicit at the layer that owns each side.

Test plan

  • CI regression-guard gate: native 1-file rebuild back under 100ms (was 2139ms in run 25419286720)
  • Native no-op rebuild stays fast (was already 19ms after fix(native): persist file_hashes for dropped/symbol-less files #1069 — should remain)
  • Using native engine (v...) log line shows the correct version on the bench job (3.10.0, not the stale 3.9.6)
  • Existing build_meta reflects actual engine and version after build integration test (tests/integration/build.test.ts:481) still passes — it asserts a valid semver string, which the binary version satisfies
  • getActiveEngine parser tests (tests/parsers/unified.test.ts) still pass — the new binaryVersion field is additive
  • No build-parity regression — the change only affects which version string is persisted, not graph contents

Notes

  • Local vitest verification was blocked by the project's vitest.config.ts injecting --strip-types into NODE_OPTIONS for child processes, which Node 24.10 (this maintainer's machine) rejects with ERR_WORKER_INVALID_EXEC_ARGV. CI runs Node 22 where that path is fine. The TypeScript build (tsc, run via the postinstall script) and the Biome lint (npm run lint, exit 0) both pass locally; smoke-tested getActiveEngine() to confirm the new return shape.
  • The ctx.engineVersion field is preserved as the display version (used by the Using native engine (v…) log and the legacy < 3.9.1 orchestrator-buggy guard at pipeline.ts:305) — those paths benefit from the package.json-preferred value.

Closes #1066.

The Rust orchestrator writes `build_meta.engine_version = CARGO_PKG_VERSION`
(the binary's value) and `check_version_mismatch` compares against it on the
next build. The JS post-processing was overwriting that with the platform
package.json version (`getNativePackageVersion()`), which drifts from the
binary in CI flows that hot-swap a freshly-built `.node` over the published
binary without also updating the platform package's `package.json`. The
mismatch promoted every incremental rebuild to a full rebuild, producing
the ~2s 1-file-rebuild floor reported in #1066.

- Plumb the binary's `engineVersion()` through `getActiveEngine` as a new
  `binaryVersion` field, kept distinct from the package.json-preferred
  display `version`.
- `PipelineContext.nativeBinaryVersion` carries it; `setBuildMeta` after
  the orchestrator and `persistBuildMetadata` in finalize now write it
  instead of `engineVersion`. `checkEngineSchemaMismatch` compares against
  it for symmetry with what Rust reads.
- `scripts/ci-install-native.mjs` now also rewrites the platform package's
  `package.json` `version` to match the just-built binary, sourcing the
  value from `NATIVE_BUILD_VERSION` (preferred) or the in-tree `Cargo.toml`
  (correct for flows that build without a version bump).
- `publish.yml`'s pre-publish-benchmark job — the only site where Cargo.toml
  is bumped before the build artifact ships — sets `NATIVE_BUILD_VERSION`
  from `compute-version.outputs.version`. Other call sites (CI test/parity,
  publish preflight) build from current source without a bump, so the
  Cargo.toml fallback resolves correctly.

Refs #1066.
@claude

claude Bot commented May 6, 2026

Claude finished @carlos-alm's task


PR Review: Engine Version Mismatch Fix

Reviewing PR #1074 that fixes the ~2s incremental build regression caused by engine version mismatches.

Todo List:

  • Phase 1: Root cause analysis
  • Phase 2: Configuration change detection
  • Phase 3: Test integrity check
  • Phase 4: Code quality review
  • Phase 5: Backlog compliance assessment
  • Final recommendation

Phase 1: Root Cause Analysis ✅

Problem Being Solved: The PR fixes a performance regression where incremental builds were taking ~2s instead of ~20ms. The issue was that the JS pipeline overwrote the binary's engine_version in build_meta with a stale platform-package version, causing the Rust orchestrator's check_version_mismatch to promote every incremental to a full rebuild.

Why This Approach: The fix addresses both the runtime overwrite AND the CI hot-swap that creates version drift:

  • Runtime (TS): getActiveEngine now returns both binaryVersion (from CARGO_PKG_VERSION) and display version (package.json-preferred). Build metadata writes use binaryVersion to match what Rust reads.
  • CI (workflow): ci-install-native.mjs rewrites the platform package's package.json version to match the binary version.

Risk Assessment: This is a legitimate bug fix addressing a real performance regression with a well-reasoned two-layer approach.

Phase 2: Configuration Change Detection ✅

No high-risk configuration changes detected:

  • ✅ No linting rules disabled or relaxed
  • ✅ No test timeouts increased or coverage thresholds lowered
  • ✅ No build configuration or module resolution modified
  • ✅ No TypeScript strict mode flags disabled

The changes are purely functional code fixes with no rule relaxation.

Phase 3: Test Integrity ✅

No test integrity issues found:

  • ✅ No removed or weakened assertions
  • ✅ No tests made less strict
  • ✅ No test files modified alongside the code they test
  • ✅ No eslint-disable or workaround comments without justification
  • ✅ Manageable PR size (6 files changed)

The changes focus on a single well-scoped performance fix.

Phase 4: Code Quality ✅

Implementation Quality

Strong implementation with proper separation of concerns:

CI Script Changes (ci-install-native.mjs)

  • Robust version resolution: Prefers NATIVE_BUILD_VERSION env var, falls back to Cargo.toml reading
  • Safe file operations: Proper error handling with try/catch and informative logging
  • Clear documentation: Excellent docstring explaining the why and when of version rewriting
  • Defensive programming: Handles missing env var, unreadable Cargo.toml, missing package.json gracefully

Runtime Changes (getActiveEngine in parser.ts)

  • Clean API: Returns both version (display) and binaryVersion (metadata) without breaking compatibility
  • Consistent semantics: binaryVersion used for build_meta writes, version for user-facing logs
  • Proper error handling: Graceful fallback when getNativePackageVersion() fails

Pipeline Integration (pipeline.ts, context.ts, finalize.ts)

  • Threading correctness: nativeBinaryVersion properly carried through PipelineContext
  • Symmetry: Both setBuildMeta calls use nativeBinaryVersion when available
  • Consistency: checkEngineSchemaMismatch also uses nativeBinaryVersion for symmetry

Workflow Changes (.github/workflows/publish.yml)

  • Minimal, targeted: Only adds NATIVE_BUILD_VERSION env var to the one job that needs it
  • Correct placement: Pre-publish benchmark is the only place where Cargo.toml is bumped before artifact ships

Security & Performance

  • No security issues: No command injection, secrets handling, or privilege escalation
  • Performance positive: Restores incremental builds from ~2s to ~20ms
  • No SQL injection: Uses parameterized queries in all DB operations

Potential Issues

🟡 Minor: The Cargo.toml parsing in ci-install-native.mjs:94 assumes the first version = line after a [ header is the package version. This works for standard Cargo.toml layout but could theoretically match a dependency version. However, this is a reasonable simplification for the CI environment and includes appropriate error handling.
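
To illustrate the concern, here is a sketch contrasting a naive first-match lookup with a section-aware one. Neither function is the parser used in ci-install-native.mjs; `naiveFirstVersion`, `packageVersionFromCargoToml`, and the sample manifest are hypothetical, and the section-aware version handles only simple `key = "value"` lines.

```typescript
// Illustrative only: not the parser used in ci-install-native.mjs.

// Naive approach: take the first `version = "..."` line anywhere in the file.
function naiveFirstVersion(toml: string): string | null {
  const match = /^version\s*=\s*"([^"]+)"/m.exec(toml);
  return match ? (match[1] ?? null) : null;
}

// Section-aware approach: only accept `version` while inside [package].
function packageVersionFromCargoToml(toml: string): string | null {
  let section = "";
  for (const rawLine of toml.split(/\r?\n/)) {
    const line = rawLine.trim();
    const header = /^\[+([^\]]+)\]+$/.exec(line);
    if (header) {
      section = header[1]?.trim() ?? "";
      continue;
    }
    if (section !== "package") continue;
    const match = /^version\s*=\s*"([^"]+)"/.exec(line);
    if (match) return match[1] ?? null;
  }
  return null;
}

// A manifest where a dependency table precedes [package].
const sampleManifest = [
  "[dependencies.serde]",
  'version = "1.0"',
  "",
  "[package]",
  'name = "example"',
  'version = "3.10.0"',
].join("\n");

const naiveResult = naiveFirstVersion(sampleManifest);
const sectionResult = packageVersionFromCargoToml(sampleManifest);
console.log(naiveResult, sectionResult);
```

For this sample the naive lookup picks up the dependency's version, while the section-aware lookup returns the package version.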

Phase 5: Backlog Compliance ✅

Foundation Compliance

  • Zero-dep: ✅ No new runtime dependencies added
  • Foundation-aligned: ✅ Fully compliant with core principles
    • Maintains P1 (always-current) by ensuring correct version tracking
    • Preserves P4 (zero-cost core) by fixing performance regression
    • Supports P5 (embeddable-first) with clean API changes
    • No violations of P8 (we are not an agent)

Problem-Fit Assessment

  • Problem-fit: 4/5 — Directly improves agent accuracy
    • Fixes a major performance regression affecting CI/CD pipelines
    • Ensures consistent behavior between development and production environments
    • Reduces round-trip time for incremental builds from 2s to ~20ms
    • While not directly about AI token waste, fast rebuilds are critical for AI coding assistants

Breaking Changes

  • Breaking: No — Purely additive changes
    • getActiveEngine() adds new binaryVersion field alongside existing version
    • All existing API contracts preserved
    • No CLI output format changes
    • No database schema changes

Tier Classification: Tier 1 (zero-dep ✓ + foundation-aligned ✓)


Final Assessment

## Root Cause Analysis
**Problem Being Solved**: ~2s incremental build regression due to engine version mismatch causing full rebuilds
**Why This Approach**: Two-layer fix addressing both runtime overwrite and CI hot-swap version drift  
**Risk Assessment**: Low risk, legitimate performance bug fix with comprehensive solution

## Backlog Compliance
- Zero-dep: ✓ — no new runtime dependencies
- Foundation-aligned: ✓ — fully compliant, no principle violations
- Problem-fit: 4/5 — directly improves build performance for AI coding workflows
- Breaking: No — purely additive API changes
- Tier: 1

## Critical Concerns
None identified. This is a well-engineered fix for a legitimate performance regression.

## Final Recommendation
- Rating: ⭐⭐⭐⭐☆ (4/5)
- Action: APPROVE
- Reasoning: Excellent technical implementation of a necessary performance fix. Addresses root cause comprehensively with both runtime and CI-level changes. Clean API design maintains backward compatibility. Only minor concern is Cargo.toml parsing simplification, which is acceptable for CI environment.

@greptile-apps
Contributor

greptile-apps Bot commented May 6, 2026

Greptile Summary

This PR fixes the root cause of a ~2s incremental-rebuild regression (#1066) where the JS pipeline was overwriting build_meta.engine_version with the platform package.json version instead of the binary's embedded CARGO_PKG_VERSION, causing check_version_mismatch in the Rust orchestrator to promote every incremental build to a full rebuild in CI hot-swap flows.

  • Runtime fix: getActiveEngine now returns a separate binaryVersion field (the actual CARGO_PKG_VERSION from the native addon). The two build_meta write sites (setBuildMeta in tryNativeOrchestrator and persistBuildMetadata in finalize.ts) and the read path in checkEngineSchemaMismatch are aligned to use nativeBinaryVersion, while the display string (ctx.engineVersion) continues to prefer the package.json version.
  • CI fix: ci-install-native.mjs now also rewrites the platform package's package.json version field to match the built binary, sourcing the value from NATIVE_BUILD_VERSION (env) or the in-tree Cargo.toml; publish.yml passes NATIVE_BUILD_VERSION for the pre-publish benchmark step where Cargo.toml is bumped before the artifact ships.

Confidence Score: 5/5

Safe to merge: both build_meta write sites and the version-comparison read path are now mutually consistent, and the CI script correctly syncs the platform package.json to avoid re-introducing the drift.

The change is narrowly scoped: it introduces a new binaryVersion field that flows through the pipeline context to two write sites (tryNativeOrchestrator, persistBuildMetadata) and one read site (checkEngineSchemaMismatch), all of which were previously misaligned in the same direction and are now aligned to use the same value the Rust comparator reads. The CI script's Cargo.toml section parser correctly isolates the [package] section, handles missing files gracefully with warnings, and the env-var override path is wired correctly in publish.yml. No behavioral paths outside the version-metadata bookkeeping are touched.

No files require special attention.

Important Files Changed

Filename | Overview
--- | ---
src/domain/parser.ts | Adds binaryVersion (CARGO_PKG_VERSION from native addon) to the getActiveEngine return type; correctly seeds it before the display-version override so the two fields diverge only when getNativePackageVersion() succeeds.
src/domain/graph/builder/context.ts | Adds `nativeBinaryVersion: string
src/domain/graph/builder/pipeline.ts | Propagates nativeBinaryVersion into context; aligns checkEngineSchemaMismatch read path and tryNativeOrchestrator write path to use it, with CODEGRAPH_VERSION as the fallback, resolving the write/read asymmetry flagged in the previous review thread.
src/domain/graph/builder/stages/finalize.ts | Switches persistBuildMetadata to write ctx.nativeBinaryVersion ?? CODEGRAPH_VERSION, consistent with the tryNativeOrchestrator write path and checkEngineSchemaMismatch read path.
scripts/ci-install-native.mjs | Adds resolveBinaryVersion() to update the platform package.json version after .node copy; Cargo.toml section parsing correctly isolates the [package] section, and all fallback paths log appropriate warnings.
.github/workflows/publish.yml | Adds NATIVE_BUILD_VERSION env var to the pre-publish benchmark step's ci-install-native.mjs invocation so the version-rewrite uses the bumped Cargo.toml value, not the fallback on-disk read.

Reviews (2): Last reviewed commit: "fix(native): align tryNativeOrchestrator..."

Comment thread src/domain/graph/builder/pipeline.ts Outdated
Comment on lines +682 to +686
+ const nativeVersionForMeta = ctx.nativeBinaryVersion || ctx.engineVersion || '';
  setBuildMeta(ctx.db, {
  engine: ctx.engineName,
- engine_version: ctx.engineVersion || '',
- codegraph_version: ctx.engineVersion || CODEGRAPH_VERSION,
+ engine_version: nativeVersionForMeta,
+ codegraph_version: nativeVersionForMeta || CODEGRAPH_VERSION,

P1 Write/read fallback asymmetry when nativeBinaryVersion is null

When ctx.nativeBinaryVersion is null (native addon does not expose engineVersion()), tryNativeOrchestrator writes codegraph_version = ctx.engineVersion (the platform package.json version, e.g. 3.9.6). On the very next incremental, checkEngineSchemaMismatch compares that stored value against CODEGRAPH_VERSION (the JS npm package version) — not ctx.engineVersion — because nativeBinaryVersion is still null and the condition falls through to the CODEGRAPH_VERSION branch. If those two strings differ, every incremental is promoted to a full rebuild, which is exactly the bug this PR is fixing. persistBuildMetadata (finalize.ts) already uses CODEGRAPH_VERSION as its fallback in this case, so the correct fix is to align the write in tryNativeOrchestrator to match both the read path and finalize.ts.

Contributor Author

Fixed in 637c233: tryNativeOrchestrator now falls back to CODEGRAPH_VERSION when ctx.nativeBinaryVersion is null, matching both the read path (checkEngineSchemaMismatch) and the JS-pipeline write path (persistBuildMetadata in finalize.ts). Removed the ctx.engineVersion fallback that created the asymmetry, and updated the comment block to call out the alignment.
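
The asymmetry raised in this thread, and its resolution, can be sketched as below. This is a model under assumptions rather than the pipeline code: `Ctx`, `expectedVersion`, `writtenBefore`, and `writtenAfter` are hypothetical names, and the version strings are placeholders.

```typescript
// Hypothetical model of the write/read fallbacks discussed in this thread.
interface Ctx {
  nativeBinaryVersion: string | null; // null when the addon lacks engineVersion()
  engineVersion: string | null; // display version from the platform package.json
}

const CODEGRAPH_VERSION = "3.10.0"; // placeholder for the JS package version

// Read path: the value the stored version is compared against.
function expectedVersion(ctx: Ctx): string {
  return ctx.nativeBinaryVersion ?? CODEGRAPH_VERSION;
}

// Write path before the follow-up commit: falls back to the display version.
function writtenBefore(ctx: Ctx): string {
  return ctx.nativeBinaryVersion || ctx.engineVersion || CODEGRAPH_VERSION;
}

// Write path after the follow-up commit: same fallback as the read path.
function writtenAfter(ctx: Ctx): string {
  return ctx.nativeBinaryVersion ?? CODEGRAPH_VERSION;
}

// Older addon without engineVersion(), installed with a drifted package.json.
const olderAddon: Ctx = { nativeBinaryVersion: null, engineVersion: "3.9.6" };
const mismatchBefore = writtenBefore(olderAddon) !== expectedVersion(olderAddon);
const mismatchAfter = writtenAfter(olderAddon) !== expectedVersion(olderAddon);
console.log(mismatchBefore, mismatchAfter);
```

With a null binary version, the earlier write path stores a value the read path never expects, so the comparison fails on every build; sharing one fallback removes that case.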

@github-actions
Contributor

github-actions Bot commented May 6, 2026

Codegraph Impact Analysis

7 functions changed, 15 callers affected across 10 files

  • resolveBinaryVersion in scripts/ci-install-native.mjs:86 (1 transitive callers)
  • PipelineContext in src/domain/graph/builder/context.ts:21 (6 transitive callers)
  • initializeEngine in src/domain/graph/builder/pipeline.ts:68 (5 transitive callers)
  • checkEngineSchemaMismatch in src/domain/graph/builder/pipeline.ts:90 (5 transitive callers)
  • tryNativeOrchestrator in src/domain/graph/builder/pipeline.ts:604 (5 transitive callers)
  • persistBuildMetadata in src/domain/graph/builder/stages/finalize.ts:77 (3 transitive callers)
  • getActiveEngine in src/domain/parser.ts:1197 (7 transitive callers)

#1074)

When ctx.nativeBinaryVersion is null (native addon lacks engineVersion()),
tryNativeOrchestrator was writing codegraph_version = ctx.engineVersion
(platform package.json), while checkEngineSchemaMismatch and
persistBuildMetadata both fall back to CODEGRAPH_VERSION (JS package).
The asymmetry could re-introduce a perpetual full-rebuild loop on older
addons whenever those two strings diverged, which is exactly the bug reported in #1066.
@carlos-alm
Contributor Author

@greptileai

@carlos-alm carlos-alm merged commit 4d26f08 into main May 6, 2026
25 checks passed
@carlos-alm carlos-alm deleted the fix/1066-engine-version-mismatch branch May 6, 2026 20:28
@github-actions github-actions Bot locked and limited conversation to collaborators May 6, 2026

Development

Successfully merging this pull request may close these issues.

publish gate: native incremental rebuilds regress to ~2s, JS fast-skip not firing in CI

1 participant