follow-up: clean up DB rows when WASM-only files are deleted from disk #1073

@carlos-alm

Deferred from PR #1070 review.

Original reviewer comment: #1070 (comment)

Context:
PR #1070 makes Rust's detect_removed_files skip files whose extensions fall outside its supported set (for example .clj, .gleam, .jl, .fs), so the orchestrator stops purging and reinserting them on every incremental rebuild (the #1066 ~2s floor).

A side effect of that guard, raised by Greptile: when a WASM-only file is genuinely deleted from disk, no path now removes its nodes / file_hashes rows.

  • Rust: skips the file because the extension is not in is_supported_extension.
  • JS: backfillNativeDroppedFiles (src/domain/graph/builder/pipeline.ts:759) only inserts rows for files in expected but missing from the DB — it does not delete rows in DB but missing from expected. The "early return when row exists" Greptile referenced fires only because there is nothing to backfill, not because anything cleans up.

Result: a deleted .clj/.gleam/.jl/.fs file leaves stale nodes/file_hashes/related-table rows behind until the next full rebuild.

Proposed fix sketch:
In backfillNativeDroppedFiles, after building expected and reading existingNodes/existingHashes, compute staleRel = (existingNodes ∪ existingHashes) − expected, filter to extensions outside NATIVE_SUPPORTED_EXTENSIONS (Rust still owns cleanup for its own extensions via purge_changed_files), and call purgeFilesData(db, staleRel) from src/db/repository/build-stmts.ts. Add a regression test that builds a fixture containing a .clj/.fs, deletes the file, runs an incremental rebuild on the native path, and asserts the DB rows are gone.
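The set arithmetic in the sketch above could look roughly like the following. This is a minimal, self-contained illustration, not the project's code: `computeStaleWasmOnly` and `extensionOf` are hypothetical helper names, the extension list is a placeholder for the real NATIVE_SUPPORTED_EXTENSIONS, and the inputs are assumed to be repo-relative paths with forward slashes.

```typescript
// Placeholder for the real NATIVE_SUPPORTED_EXTENSIONS; the values here are
// illustrative assumptions, not the project's actual list.
const NATIVE_SUPPORTED_EXTENSIONS: ReadonlySet<string> = new Set([
  ".ts",
  ".js",
  ".rs",
  ".py",
]);

// Hypothetical helper: lower-cased extension including the dot, or "" if none.
function extensionOf(relPath: string): string {
  const base = relPath.slice(relPath.lastIndexOf("/") + 1);
  const dot = base.lastIndexOf(".");
  return dot <= 0 ? "" : base.slice(dot).toLowerCase();
}

// Hypothetical helper: staleRel = (existingNodes ∪ existingHashes) − expected,
// restricted to extensions the native engine does not own.
function computeStaleWasmOnly(
  expected: ReadonlySet<string>,
  existingNodes: readonly string[],
  existingHashes: readonly string[],
  nativeExts: ReadonlySet<string> = NATIVE_SUPPORTED_EXTENSIONS,
): string[] {
  // Union of every file path that currently has rows in either table.
  const inDb = new Set<string>();
  existingNodes.forEach((rel) => inDb.add(rel));
  existingHashes.forEach((rel) => inDb.add(rel));

  const stale: string[] = [];
  inDb.forEach((rel) => {
    if (expected.has(rel)) return; // still on disk, nothing to clean up
    if (nativeExts.has(extensionOf(rel))) return; // Rust owns cleanup here
    stale.push(rel); // note: extensionless paths also land here
  });
  return stale.sort(); // stable order for logging and tests
}

// Example: gone.clj and gone.fs were deleted from disk; old.ts is left to Rust.
const staleRel = computeStaleWasmOnly(
  new Set(["src/a.ts", "src/keep.clj"]),
  ["src/a.ts", "src/keep.clj", "src/gone.clj", "src/old.ts"],
  ["src/gone.clj", "src/gone.fs"],
);
// In the real pipeline the result would be handed to purgeFilesData(db, staleRel).
```

Keeping the computation as a pure function over paths would let the regression test cover the filtering logic without a database, while the end-to-end test described above still exercises the native incremental path.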

Why deferred from #1070:
PR #1070's stated concern is the rebuild-floor regression. Adding the deletion path introduces new behavior in a different module (pipeline.ts) and warrants its own test coverage and review pass. CI for #1070 is green; the deletion regression is pre-existing in the sense that any user who deletes a WASM-only file between rebuilds will hit it, but it is bounded (stale rows only for each deleted file, cleared by the next full rebuild).

Labels: follow-up (deferred work from PR reviews that needs tracking)
