fix(ingestion,cli): make a broken parser fail loud, not silently produce a symbol-free graph#204
fix(ingestion,cli): make a broken parser fail loud, not silently produce a symbol-free graph#204theagenticguy wants to merge 5 commits into
Conversation
…uce a symbol-free graph The WASM-resolver bug (PR #201) shipped invisibly for ~5 days because a globally-broken parser degraded silently: every file's parse returned empty captures, `analyze` built a File/Directory-only skeleton graph, printed a node count, and exited 0. Nothing in the pipeline distinguished "parser is dead" from "repo legitimately has no symbols." This hardens that whole failure class plus the adjacent gaps the investigation surfaced. #1 Zero-symbol guard (run-level backstop). parse.ts now exposes treeSitterFileCount + treeSitterSymbolCount on ParseOutput; the orchestrator trips `zeroSymbolGuardTripped` (exported pure predicate shouldTripZeroSymbolGuard: >=5 tree-sitter files AND 0 symbols), pushes a loud warning, and analyze maps it to a distinct advisory exit code 3. Configs-only / cobol-only / unsupported-language repos report 0 tree-sitter files and never trip; CodeElement stubs are excluded so an import-only repo can't mask a break. #2 Distinguish global parser death from per-file failure. New WasmRuntimeUnavailableError (name set so it survives Piscina's structured clone). ensureWasmRuntime now THROWS it when the vendored grammar dir is missing / web-tree-sitter.wasm won't init (a deployment breakage), instead of the old ambiguous soft `undefined`. openWasmParser + parseOne rethrow it; the parse phase aborts the run with an actionable message naming the missing dir. The soft "web-tree-sitter package absent" path stays a no-throw hand-off to #1. Per-file errors (syntax, timeout, one missing grammar) still warn+skip. #3 Resolver drift-guard test. Extended the asset-resolver drift guard to cover ALL 5 shipped asset trees (added vendor/wasms + java to plugin-assets/ ci-templates/config), so a future fixed-offset regression in any resolver fails a test instead of shipping silently. #4 tsup rm-before-copy. copyTree now rm's each leaf dest before cp, so an incremental/watch rebuild can't leave a renamed/deleted asset dir behind in dist/ (the stale opencodehub-* skill dirs I hit mid-rename). Safe against the nested-dest hazard — every onSuccess dest is a distinct leaf. #5 Verifier asserts a real symbol. verify-global-install.sh now requires `codehub query 'Greet'` to return a Function/Class/Method row from greeter.go (a uniquely-cased Go func), instead of "any hit on 'export default'" — which passed on the 0-symbol skeleton (the stderr header alone satisfied the old gate). Verified end-to-end against the built bundle: happy path → exit 0, 16 nodes / 24 edges; vendor/wasms hidden → run ABORTS (exit 1) with the named-dir message, no skeleton graph persisted (was: silent exit 0). Tests: ingestion 594/594, cli 316/0/11-platform-skip; typecheck + lint + banned-strings clean.
…sion-proof
Follow-up hardening on this PR's own contract, from a review pass. Three gaps,
none a behavior regression today — but each let the contract silently rot:
1. `buildParserForLanguage` (complexity phase, main thread) is the one
runtime-seam caller whose contract flipped from "return undefined" to "may
throw WasmRuntimeUnavailableError" without being updated. Today the throw
correctly propagates (no try/catch swallows it) and aborts the run — but
only INCIDENTALLY. Document the invariant load-bearingly at the complexity
seam ("do NOT wrap in a try/catch that folds the throw into the undefined
skip path") and correct the stale `ensureWasmRuntime` docstring that still
described the old all-undefined contract. No logic change.
2. `parse.ts` dispatch catch matched `err.name === "WasmRuntimeUnavailableError"`
— but Piscina's structured-clone transport resets a custom Error subclass's
`.name` to "Error" across the worker boundary (verified empirically against
piscina 5.1.4), so that branch was DEAD. The run still aborted (runner wraps
the thrown error as `Phase 'parse' failed: …`, and `.message` survives), but
the actionable reinstall/re-vendor prefix never fired. Match on the stable
message token both throw sites emit instead, and document why.
3. No test pinned the complexity propagation, so a future "defensive" refactor
could re-swallow the throw with the suite staying green. Add two tests via a
minimal injectable-builder seam (mirrors parse-worker's parseFn idiom; inert
in production — defaults to buildParserForLanguage, fresh cache only when
injected): one asserts a global sentinel REJECTS runComplexity (verified
load-bearing — fails if the throw is swallowed into `skipped`), one asserts
the soft undefined case still degrades to a per-language skip.
ingestion 581/581, cli 316 pass / 0 fail / 11 platform-skip; typecheck + biome
clean. Net production logic change: zero (one corrected docstring, one intent
comment, one dead-branch→live-branch matcher fix, and a test-only seam).
Review follow-up — pushed a hardening commit (02c4a88)Reviewed this PR, then ran a multi-agent audit of its own fail-loud contract to confirm it's complete. The contract is functionally correct today — every reachable global-WASM-failure path aborts with a non-zero exit, no path produces a silent symbol-free exit-0 graph. But the audit (and an empirical Piscina probe) surfaced three gaps where the contract could silently rot. Fixed all three; zero production logic change. 1. 2. Dead error-match at 3. No regression test pinned the complexity propagation. A future "defensive" refactor could re-swallow the throw with the suite green. Added two tests via a minimal injectable-builder seam (mirrors Deliberately NOT done (out of scope, flagged for separate tickets): no try/catch-rethrow inside Verification: ingestion 581/581, cli 316 pass / 0 fail / 11 platform-skip, typecheck + biome clean. |
Why
The WASM-resolver bug fixed in #201 shipped invisibly for ~5 days. Root cause of the invisibility (separate from the resolver itself): a globally-broken parser degrades silently — every file returns empty captures,
analyzebuilds a File/Directory-only skeleton graph, prints a node count, and exits 0. Nothing distinguished "the parser is dead" from "this repo has no symbols." This PR closes that failure class and the adjacent gaps the investigation surfaced.The five fixes
#1 — Zero-symbol guard (run-level backstop).
ParseOutputnow carriestreeSitterFileCount+treeSitterSymbolCount; the orchestrator trips azeroSymbolGuardTrippedflag via the exported pure predicateshouldTripZeroSymbolGuard(≥5 tree-sitter files and 0 symbols), pushes a loud warning, andanalyzemaps it to a distinct advisory exit code 3. Configs-only / cobol-only / unsupported-language repos report 0 tree-sitter files and never trip; externalCodeElementimport stubs are excluded so an import-only repo can't mask a break.#2 — Distinguish global parser death from per-file failure. New
WasmRuntimeUnavailableError(withnameset so it survives Piscina's structured-clone across the worker boundary).ensureWasmRuntimenow throws it when the vendored grammar dir is missing /web-tree-sitter.wasmwon't init — a deployment breakage — instead of the old ambiguous softundefinedthat collapsed into N identical per-file warnings.openWasmParser+parseOnerethrow; the parse phase aborts the run with an actionable message. Per-file errors (syntax, timeout, one missing grammar) still warn-and-skip. The "web-tree-sitter package genuinely absent" path stays a no-throw hand-off to #1.#3 — Resolver drift-guard test. Extended the
asset-resolverdrift guard to cover all 5 shipped asset trees (addedvendor/wasms+javato plugin-assets / ci-templates / config). A future fixed-offset regression in any resolver now fails a test instead of shipping silently — codifies the audit that this session ran by hand.#4 — tsup rm-before-copy.
copyTreenowrms each leaf dest beforecp, so an incremental/watch rebuild can't accumulate renamed/deleted asset dirs indist/(the staleopencodehub-*skill dirs from the rename PR). Safe against the nested-dest hazard — eachonSuccessdest is a distinct leaf.#5 — Verifier asserts a real symbol.
verify-global-install.shnow requirescodehub query 'Greet'to return aFunction/Class/Methodrow fromgreeter.go(a uniquely-cased Go func), replacing the weak "any hit onexport default" gate — which passed on the 0-symbol skeleton because the stderr header alone satisfied "non-empty".Verification
Against the built bundle:
vendor/wasmshidden → run ABORTS (exit 1) withPhase 'parse' failed: web-tree-sitter runtime failed to initialize; vendored grammar directory not found at <dir> (reinstall … or re-vendor …). No skeleton graph persisted. (Before this PR: silent exit 0 with a 5-node skeleton — the exact bug that hid fix(cli): resolve runtime assets via walk-up probe across the flat tsup bundle #201.)The two parser fixes compose: #2 hard-aborts the runtime-death case before anything is written; #1's exit-3 guard is the backstop for the residual soft case (package genuinely absent → no throw but 0 symbols).
Tests: ingestion 594/594, cli 316 pass / 0 fail / 11 platform-skip; typecheck + repo lint + banned-strings clean. New tests:
wasm-runtime.test.ts(global-probe),parse-worker.test.ts(sentinel rethrow vs per-file warning),orchestrator.test.ts(predicate table + healthy-run no-trip), extendedasset-resolver.test.tsdrift guard.Notes
analyze(graph built, but zero code symbols). Documented in anindex.tscomment; the happy path stays exit 0.graphHash(hash is over graph nodes/edges only) — the byte-identical-hash determinism test stays green.