fix(native): backfill new dropped-language files on quiet incrementals#1123
Conversation
The post-orchestrator gate (`isFullBuild || removedCount>0 || changedCount>0`) left a silent gap: when a brand-new file in an extension outside the Rust binary's `file_collector` (e.g. on an older binary) is added on an otherwise-quiet incremental, the orchestrator reports no activity and backfill is skipped — the file never enters `nodes`/`file_hashes` until a forced full rebuild. Extract the missing-file detection from `backfillNativeDroppedFiles` into `detectDroppedLanguageGap`, run it once before the gate, and pass the result to `backfillNativeDroppedFiles`. The scan extends both code paths: - The `result.earlyExit` branch now also repairs the gap before returning, so no-op rebuilds with stale `nodes` rows for dropped-extension files are healed instead of left silent. - The non-earlyExit gate adds `gap.missingAbs.length > 0` so quiet incrementals with a real gap trigger backfill. No redundant fs walks: incrementals where the orchestrator signals fire would have done the same walk inside the old `backfillNativeDroppedFiles` anyway. The new cost on a truly-quiet incremental with no gap is one `collectFiles` walk + two cheap DB queries (early-return before any parsing/WAL handoff). Closes #1083, closes #1091.
|
Claude finished @carlos-alm's task —— View job PR Review: Backfill New Dropped-Language Files on Quiet IncrementalsRoot Cause AnalysisProblem Being Solved: The PR fixes a silent gap where brand-new files with dropped-language extensions (extensions not supported by the Rust native engine but supported by WASM) are never added to the graph when introduced during quiet incremental builds. The specific issue occurs when:
Why This Approach: The solution extracts missing-file detection into a separate Risk Assessment: LOW RISK. The change is surgical and well-isolated. It preserves existing behavior while closing a specific gap. The filesystem walk cost (~20-30ms) is acceptable given it prevents silent data loss. Configuration Change Detection✅ CLEAN - No configuration changes, lint rule relaxations, or test expectation modifications detected. Test Integrity✅ EXCELLENT - The new test file is exemplary:
The test strategy aligns perfectly with CLAUDE.md guidance - it tests the fix without documenting wrong behavior as expected. Code QualityStrengths:
Architecture alignment:
Technical soundness:
Minor observations:
Backlog ComplianceZero-dep: ✅ - No new runtime dependencies added
Problem-fit: 4/5 - Directly improves agent accuracy Breaking: No - Purely additive behavior that improves existing functionality Tier: 1 (zero-dep ✓ + foundation-aligned ✓) Critical ConcernsNone. The implementation is solid, well-tested, and addresses a real gap without introducing risks. Final Recommendation
Key strengths:
This PR demonstrates the engineering discipline expected in a tool that analyzes itself - it fixes the gap without introducing new ones. |
Greptile SummaryThis PR closes a silent gap where a brand-new file in a dropped-language extension (one outside the Rust binary's
Confidence Score: 5/5Safe to merge. The change is a targeted refactor of an existing backfill path, touching no new external surfaces. The extraction of No files require special attention. Important Files Changed
Flowchart%%{init: {'theme': 'neutral'}}%%
flowchart TD
A[tryNativeOrchestrator] --> B[nativeDb.buildGraph]
B --> C{result.earlyExit?}
C -- yes --> D[detectDroppedLanguageGap]
D --> E{gap.missingAbs.length > 0?}
E -- yes --> F[backfillNativeDroppedFiles
closesNativeDb, opens better-sqlite3
inserts nodes + file_hashes]
E -- no --> G[closeDbPair]
F --> G
G --> H[return 'early-exit']
C -- no --> I[setBuildMeta + analysis phases]
I --> J[detectDroppedLanguageGap]
J --> K{isFullBuild or removedCount > 0
or changedCount > 0
or gap.missingAbs.length > 0?}
K -- yes --> L[backfillNativeDroppedFiles
pre-computed gap]
K -- no --> M[closeDbPair]
L --> M
M --> N[return BuildResult]
Reviews (5): Last reviewed commit: "Merge branch 'main' into fix/1083-droppe..." | Re-trigger Greptile |
| describeOrSkip('Dropped-language gap repair on quiet incremental (#1083)', () => { | ||
| let projectDir: string; | ||
| let dbPath: string; | ||
| let tmpBase: string; | ||
| const targetFile = 'math.js'; | ||
|
|
||
| beforeAll(async () => { | ||
| tmpBase = fs.mkdtempSync(path.join(os.tmpdir(), 'codegraph-1083-')); | ||
| projectDir = path.join(tmpBase, 'proj'); | ||
| copyDirSync(FIXTURE_DIR, projectDir); | ||
| dbPath = path.join(projectDir, '.codegraph', 'graph.db'); | ||
|
|
||
| // Full build → every file has nodes + file_hashes rows. | ||
| await buildGraph(projectDir, { | ||
| engine: 'native', | ||
| incremental: false, | ||
| skipRegistry: true, | ||
| }); | ||
|
|
||
| // Simulate the issue's DB state: a file is on disk and tracked in | ||
| // `file_hashes` (so the orchestrator's hash tier sees it as unchanged | ||
| // and reports `changedCount=0`), but its `kind='file'` node row is | ||
| // absent — the same shape produced when an old binary's collector | ||
| // never inserts dropped-extension files in the first place. | ||
| // | ||
| // Foreign keys are disabled for this surgical delete; edges referencing | ||
| // the file node are left intact so we only test the gap-detection path. | ||
| const db = new Database(dbPath); | ||
| db.pragma('foreign_keys = OFF'); | ||
| db.prepare("DELETE FROM nodes WHERE kind='file' AND file = ?").run(targetFile); | ||
| db.close(); | ||
|
|
||
| // Bump the file's mtime without changing content. This forces the | ||
| // JS-side fast-skip pre-flight (#1054) to fall through to the | ||
| // orchestrator (mtime mismatch with file_hashes), while the | ||
| // orchestrator's content-hash tier still classifies the file as | ||
| // unchanged (metadata-only update) — so `changedCount=0` and | ||
| // `removedCount=0`. That's the exact orchestrator-state the issue | ||
| // describes for a brand-new dropped-language file. | ||
| const targetAbs = path.join(projectDir, targetFile); | ||
| const future = new Date(Date.now() + 5000); | ||
| fs.utimesSync(targetAbs, future, future); | ||
|
|
||
| // Incremental rebuild — must detect the gap and re-insert the row. | ||
| await buildGraph(projectDir, { | ||
| engine: 'native', | ||
| incremental: true, | ||
| skipRegistry: true, | ||
| }); | ||
| }, 60_000); |
There was a problem hiding this comment.
earlyExit=true path has no test coverage
The new test exercises only the changedCount=0 / removedCount=0 / earlyExit=false quiet-incremental code path (the mtime bump forces the orchestrator to run and report changedCount=0, but result.earlyExit is false). The separate result.earlyExit branch added in pipeline.ts (lines 665–668) — where the orchestrator itself returns before any count fields are populated — is not exercised by any test in this PR. If that branch has a bug (e.g., incorrect DB state when backfillNativeDroppedFiles mutates ctx.db before the closeDbPair call), it would go undetected. A second describeOrSkip block that keeps file_hashes intact but never bumps mtime (so the orchestrator's hash tier triggers an early exit) would cover this path directly.
There was a problem hiding this comment.
Addressed in 6f21bfe.
You're right that the existing test only made one of the two backfill paths obvious. I split the test into two describeOrSkip blocks so each branch is explicit:
- earlyExit=true — mtime bump with unchanged content. The orchestrator's content-hash tier classifies as metadata-only,
parse_changesandremovedcome out empty after the!metadata_onlyfilter inbuild_pipeline.rs:159, so it returnsearly_exit=trueand backfill fires frompipeline.ts:665-667. Verified via debug log (No changes detected— the message emitted atpipeline.ts:660, distinct from the JS-side fast-skip'sNo changes detected. Graph is up to date.). - earlyExit=false — a real content change to a sibling file drives
changedCount=1, so the orchestrator falls through to the non-early-exit path. The gate (now augmented withgap.missingAbs.length > 0) drives backfill frompipeline.ts:776-778. Verified viaIncremental: 1 changed, 0 removed.
Note: your suggestion of "keeps file_hashes intact but never bumps mtime" would actually short-circuit at the JS-side fast-skip pre-flight (pipeline.ts:1162-1188) before the orchestrator is ever invoked — so the orchestrator's earlyExit branch wouldn't be exercised by that path. The mtime bump is what forces the JS fast-skip to fall through to the orchestrator, where the content-hash tier then drives early_exit=true.
Both scenarios end with the deleted nodes row repaired.
Codegraph Impact Analysis5 functions changed → 6 callers affected across 5 files
|
Greptile flagged that the test only exercised one of the two backfill call sites added by this PR. Split the test into two describeOrSkip blocks that explicitly target each path: - earlyExit=true: mtime bump with unchanged content → orchestrator's content-hash tier classifies as metadata-only → after filtering, parse_changes/removed are empty → returns early_exit=true → backfill fires from pipeline.ts:665-667. - earlyExit=false: real content change to a sibling file (utils.js) drives changedCount=1 → orchestrator returns early_exit=false → the non-early-exit gate (now augmented with gap.missingAbs.length > 0) drives backfill from pipeline.ts:776-778. Verified via debug logs: scenario 1 emits 'No changes detected' (orchestrator earlyExit branch), scenario 2 emits 'Incremental: 1 changed, 0 removed' (main path). Both end with the gap repaired.
Summary
Closes #1083, closes #1091.
The post-orchestrator gate (
isFullBuild || removedCount>0 || changedCount>0) left a silent gap: when a brand-new file with an extension outside the Rust binary'sfile_collector(e.g. on a published binary that predates a newly-added Rust extractor) is added on an otherwise-quiet incremental, the orchestrator reports no activity and the JS backfill is skipped. The file never entersnodes/file_hashesuntil a forced full rebuild.Fix
Extract the missing-file detection from
backfillNativeDroppedFilesinto a newdetectDroppedLanguageGaphelper, run it once before the gate, and pass the precomputed gap intobackfillNativeDroppedFiles. The scan extends both post-native paths:result.earlyExitbranch now also repairs the gap before returning, so no-op rebuilds with stalenodesrows for dropped-extension files are healed instead of left silent.gap.missingAbs.length > 0, so quiet incrementals with a real gap trigger backfill.Perf
No redundant fs walks: incrementals where the orchestrator signals fire would have done the same walk inside the old
backfillNativeDroppedFilesanyway. The new cost on a truly-quiet incremental with no gap is onecollectFileswalk + two cheap DB queries (early-return before any parsing or WAL handoff).Test plan
tests/integration/dropped-language-gap.test.tssimulates the issue's DB state (file on disk +file_hashesrow, nonodesrow + bumped mtime to force fast-skip fall-through) and asserts the row is restored after an incremental. Fails onmain, passes after the fix.npx vitest run tests/integration/— 573/573 pass (includesincremental-parity,build-parity,incr-edge-gap).npm run lint— clean (1 unrelated pre-existing warning).npx tsc --noEmit— clean.