fix(julia): port parameterized-type / qualified-def / qualified-import fixes to WASM #1128
carlos-alm wants to merge 3 commits
…t fixes to WASM

The native Julia extractor was fixed in #1098 for three issues that were already latent in the WASM extractor but not surfaced by the existing fixtures. Per the dual-engine policy, port the fixes so both engines produce identical results.

1. Parameterized struct names (`struct Vec{T} <: AbstractArray{T,1}`) no longer silently emit the raw type-head text as the definition name: `findBaseName` recurses through `binary_expression`, `parametrized_type_expression`, and related wrappers to locate the base identifier.
2. Qualified function defs / short-form methods inside a module no longer get double-prefixed: `function Base.show ... end` inside `module Foo` now records `Base.show` (not `Foo.Base.show`); same for short-form `Foo.bar(x, y) = x + y` inside `module Outer`.
3. `selected_import` with a qualified module (`import LinearAlgebra.BLAS: gemm`) now correctly records `LinearAlgebra.BLAS` as the import source and `gemm` as the imported name.

Also fixes a related latent bug: `findChild(node, 'call_expression')` on a `function_definition` was matching the body's first call (e.g. `println(...)`) instead of the signature, because the signature is wrapped in a `signature` node. Added a `signatureCall` helper mirroring the native code.

Closes #1111
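The double-prefix fix in item 2 can be pictured with a minimal sketch. The helper name `qualifyName` and its signature are hypothetical and do not come from the PR; the only part taken from the description is the idea that a name already containing a `.` is left as written instead of being prefixed with the enclosing module.

```typescript
// Hypothetical helper for illustration only; the PR applies this guard
// inside its own handlers rather than through a function like this.
function qualifyName(base: string, currentModule: string | null): string {
  // A name that is already qualified (e.g. `Base.show`) is kept as written.
  if (!currentModule || base.includes('.')) return base;
  // An unqualified name gets the enclosing module as a prefix.
  return `${currentModule}.${base}`;
}

const insideFoo = qualifyName('Base.show', 'Foo'); // stays qualified as written
const plain = qualifyName('bar', 'Foo');           // picks up the module prefix
const topLevel = qualifyName('bar', null);         // no enclosing module

console.log(insideFoo, plain, topLevel);
```

A substring check on `.` is a deliberately simple test; it relies on the extracted base name being the dotted text of the definition and nothing else.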
Claude finished @carlos-alm's task

PR Review: Julia WASM Extractor Bug Fixes

Todo List:

Root Cause Analysis

Problem Being Solved: This PR addresses real dual-engine parity bugs in the Julia WASM extractor. Three specific issues from #1098 (native engine fixes) needed to be ported to WASM:

Why This Approach: The fix correctly implements dual-engine parity by mirroring the native Rust implementation. The approach adds targeted helper functions (

Risk Assessment: Low risk. This corrects extraction accuracy to match the native engine, with no behavioral changes to the query layer or API contracts.

Backlog Compliance

Critical Concerns

None identified. This is a well-executed bug fix that addresses real dual-engine parity issues.

Strengths:

Code quality highlights:

Final Recommendation
Codegraph Impact Analysis

8 functions changed → 9 callers affected across 2 files
Greptile Summary

This PR ports the three Julia symbol-extraction fixes from #1098 (native Rust engine) to the WASM TypeScript extractor, closing the dual-engine parity gap tracked by #1111. All new helpers —
Confidence Score: 5/5

Safe to merge: all new helpers are faithful ports of the already-reviewed Rust implementations, and all four new test cases plus the existing suite pass.

Each of the three WASM helpers (

No files require special attention.

Important Files Changed
Reviews (5): Last reviewed commit: "test(julia): assert selected_import shap..."
    const binary = findChild(typeHead, 'binary_expression');
    if (binary) {
      // Walk into each side of the binary expression to find the base-name
      // identifier; this handles parameterized forms like `Vec{T} <: AbstractArray{T,1}`.
      const sides: TreeSitterNode[] = [];
      for (let i = 0; i < binary.childCount; i++) {
        const c = binary.child(i);
        if (c && c.type !== 'operator') sides.push(c);
      }
      nameNode = sides[0] ? findBaseName(sides[0]) : null;
      supertypeNode = sides[1] ? findBaseName(sides[1]) : null;
    } else {
      nameNode = findBaseName(typeHead);
    }
Missing test for non-parameterized struct inheritance
The old code explicitly looked for a `subtype_expression` node (`findChild(typeHead, 'subtype_expression')`) to detect `Point <: AbstractPoint`. That path has been entirely removed and replaced with a `binary_expression` lookup. If the tree-sitter-julia grammar represents simple non-parameterized inheritance with a `subtype_expression` node (rather than `binary_expression`), `findChild(typeHead, 'binary_expression')` returns null, `findBaseName(typeHead)` recurses without entering `subtype_expression` (not in `TYPE_HEAD_WRAPPERS`), and returns null, so the entire struct is silently dropped from `ctx.definitions`. The new parameterized test (`Vec{T} <: AbstractArray{T,1}`) confirms the grammar uses `binary_expression` for that form, but there is no test for the simple case `struct Point <: AbstractPoint` to verify that the same grammar node is used and the extends relationship is still recorded.
Fixed in 8c2e148: added a test for non-parameterized struct inheritance (`struct Point <: AbstractPoint`) in `tests/parsers/julia.test.ts`. Confirmed via AST inspection that the Julia grammar wraps both the simple and parameterized cases in a `binary_expression` node, so the new code path handles both correctly. The native engine already had this test (`crates/codegraph-core/src/extractors/julia.rs:592`), so this brings WASM to parity.
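The point settled in this thread can be sketched on mock trees. Everything below is an assumption made for illustration: `MockNode` is not the tree-sitter API, `baseName` and `splitTypeHead` are simplified stand-ins for the PR's code, and the tree shapes (a `binary_expression` whose sides are either a bare `identifier` or a `parametrized_type_expression` holding an `identifier` plus a `curly_expression`) are guesses based on the reply above, not dumps from the real parser.

```typescript
// Mock node shape: an assumption for illustration, not the tree-sitter API.
interface MockNode {
  type: string;
  text: string;
  children: MockNode[];
}

const leaf = (type: string, text: string): MockNode => ({ type, text, children: [] });
const branch = (type: string, ...children: MockNode[]): MockNode => ({ type, text: '', children });

// Wrapper kinds that the search is allowed to descend into.
const WRAPPERS: ReadonlySet<string> = new Set([
  'binary_expression',
  'parametrized_type_expression',
  'parameterized_identifier',
]);

// Simplified stand-in for findBaseName: prefer a direct identifier child,
// otherwise recurse only through known wrapper kinds.
function baseName(n: MockNode): MockNode | null {
  if (n.type === 'identifier') return n;
  const direct = n.children.find((c) => c.type === 'identifier');
  if (direct) return direct;
  for (const c of n.children) {
    if (!WRAPPERS.has(c.type)) continue;
    const found = baseName(c);
    if (found) return found;
  }
  return null;
}

// Split `Name <: Supertype` by dropping the operator child.
function splitTypeHead(binary: MockNode): { name: string | null; supertype: string | null } {
  const sides = binary.children.filter((c) => c.type !== 'operator');
  const textOf = (side: MockNode | undefined): string | null =>
    side ? (baseName(side)?.text ?? null) : null;
  return { name: textOf(sides[0]), supertype: textOf(sides[1]) };
}

const op = leaf('operator', '<:');
// struct Point <: AbstractPoint
const simple = branch('binary_expression', leaf('identifier', 'Point'), op, leaf('identifier', 'AbstractPoint'));
// struct Vec{T} <: AbstractArray{T,1}
const parameterized = branch(
  'binary_expression',
  branch('parametrized_type_expression', leaf('identifier', 'Vec'), branch('curly_expression', leaf('identifier', 'T'))),
  op,
  branch('parametrized_type_expression', leaf('identifier', 'AbstractArray'), branch('curly_expression', leaf('identifier', 'T'))),
);

const simpleHead = splitTypeHead(simple);
const parameterizedHead = splitTypeHead(parameterized);
console.log(simpleHead, parameterizedHead);
```

Under these assumed shapes both forms go through the same `binary_expression` branch, which is why a single code path can cover the simple and the parameterized struct.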
    const TYPE_HEAD_WRAPPERS: ReadonlySet<string> = new Set([
      'binary_expression',
      'parametrized_type_expression',
      'parameterized_identifier',
      'type_parameter_list',
      'type_argument_list',
    ]);

    function findBaseName(node: TreeSitterNode): TreeSitterNode | null {
      if (node.type === 'identifier') return node;
      const direct = findChild(node, 'identifier');
      if (direct) return direct;
      for (let i = 0; i < node.childCount; i++) {
        const child = node.child(i);
        if (!child) continue;
        if (TYPE_HEAD_WRAPPERS.has(child.type)) {
          const found = findBaseName(child);
          if (found) return found;
        }
      }
      return null;
    }
`type_parameter_list` / `type_argument_list` in `TYPE_HEAD_WRAPPERS` can yield the wrong identifier
`findBaseName` checks `findChild(node, 'identifier')` before recursing, so in practice the struct name is found before the loop reaches a `type_parameter_list` or `type_argument_list`. However, if `findBaseName` is ever called with a node that lacks a direct `identifier` child and does have one of those wrapper types as a child (for example, from a future call site or an unusual parameterized form), the function will recurse into `type_parameter_list` and return the first type-parameter identifier (e.g. `T`) instead of the struct name. Removing those two entries from `TYPE_HEAD_WRAPPERS` would eliminate the risk without affecting correctness.
Fixed in 8c2e148: removed `type_parameter_list` and `type_argument_list` from `TYPE_HEAD_WRAPPERS` in both the WASM and native engines (preserving dual-engine parity per CLAUDE.md). AST inspection confirmed Julia's grammar uses `curly_expression` for `{T}` constructs, not those node kinds, so the entries were dead code. Removing them eliminates the risk of recursing into a type-parameter list and returning a type variable as the struct name, as you noted.
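The risk discussed in this thread can be reproduced on a hand-built node. This is a sketch under stated assumptions: `MockNode` and `baseNameWith` are illustrative stand-ins, and the input tree is the hypothetical shape from the review comment (a wrapper with no direct `identifier` child but with a `type_parameter_list` child), which the reply says the real grammar does not produce.

```typescript
// Mock node shape: an assumption for illustration, not the tree-sitter API.
interface MockNode {
  type: string;
  text: string;
  children: MockNode[];
}

const leaf = (type: string, text: string): MockNode => ({ type, text, children: [] });
const branch = (type: string, ...children: MockNode[]): MockNode => ({ type, text: '', children });

// Same search as findBaseName, with the wrapper set passed in so the
// two configurations can be compared side by side.
function baseNameWith(wrappers: ReadonlySet<string>, n: MockNode): MockNode | null {
  if (n.type === 'identifier') return n;
  const direct = n.children.find((c) => c.type === 'identifier');
  if (direct) return direct;
  for (const c of n.children) {
    if (!wrappers.has(c.type)) continue;
    const found = baseNameWith(wrappers, c);
    if (found) return found;
  }
  return null;
}

// Wrapper set after the fix.
const narrow: ReadonlySet<string> = new Set([
  'binary_expression',
  'parametrized_type_expression',
  'parameterized_identifier',
]);
// Wrapper set before the fix, including the two list kinds.
const wide: ReadonlySet<string> = new Set([
  'binary_expression',
  'parametrized_type_expression',
  'parameterized_identifier',
  'type_parameter_list',
  'type_argument_list',
]);

// Hypothetical node: no direct identifier, only a type-parameter list.
const odd = branch('parametrized_type_expression', branch('type_parameter_list', leaf('identifier', 'T')));

const withWide = baseNameWith(wide, odd);     // descends into the list and finds `T`
const withNarrow = baseNameWith(narrow, odd); // refuses to descend
console.log(withWide?.text, withNarrow);
```

Returning nothing for such a node is arguably the safer failure mode, since a missing definition is easier to notice than a struct recorded under the name of its type variable.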
…ype test (#1128)

- Remove `type_parameter_list` / `type_argument_list` from `TYPE_HEAD_WRAPPERS` in both WASM and native engines. Julia grammar uses `curly_expression` for `{T}` constructs, so these were dead code. Removing them prevents `findBaseName` from ever recursing into a type-parameter list and returning a type variable (e.g. `T`) instead of the struct name.
- Add WASM test for non-parameterized struct inheritance (`struct Point <: AbstractPoint`). The native engine already covers this case; the WASM side now has parity.
…lback (#1128)

- Strengthen the `import Base: show` test to assert the corrected source/names shape (was only checking that imports were emitted at all, so a regression to the broken pre-fix shape would have slipped through).
- Document the grammar assumption behind `signatureCall` / `signature_call` in both engines: the `call_expression` fallback exists only for defensive grammar-drift protection, not as a routine path. If it ever fires on a real definition, the function name will silently match the first body `call_expression` instead.
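The `signatureCall` behaviour described in this commit message can be sketched as follows. The code is a reconstruction from the prose, not the PR's implementation: `MockNode` is a stand-in for the real node type, `findChild` is assumed to inspect direct children only, and the mock `function_definition` places the body call as a direct child purely to reproduce the mismatch.

```typescript
// Mock node shape: an assumption for illustration, not the tree-sitter API.
interface MockNode {
  type: string;
  text: string;
  children: MockNode[];
}

// Assumed semantics: first direct child of the given type, or null.
function findChild(n: MockNode, type: string): MockNode | null {
  return n.children.find((c) => c.type === type) ?? null;
}

// Prefer the call inside `signature`; fall back to a direct
// `call_expression` child only as protection against grammar drift.
function signatureCall(fn: MockNode): MockNode | null {
  const sig = findChild(fn, 'signature');
  if (sig) {
    const call = findChild(sig, 'call_expression');
    if (call) return call;
  }
  return findChild(fn, 'call_expression');
}

const call = (text: string): MockNode => ({ type: 'call_expression', text, children: [] });

// Rough mock of: function Base.show(io) println(io) end
const fnDef: MockNode = {
  type: 'function_definition',
  text: '',
  children: [
    { type: 'signature', text: '', children: [call('Base.show(io)')] },
    call('println(io)'),
  ],
};

const naive = findChild(fnDef, 'call_expression'); // picks up the body call
const fixed = signatureCall(fnDef);                // picks up the signature call
console.log(naive?.text, fixed?.text);
```

The fallback on the last line of `signatureCall` is the path the commit message warns about: if a definition ever lacks a `signature` wrapper, it would quietly return the first body call.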
Addressed Greptile's round-2 feedback in 92ce81b:
Both Rust and TypeScript are still in parity; all julia tests pass (11 WASM, 16 native).
Summary

- `src/extractors/julia.ts` per the dual-engine policy.
- `findBaseName` helper that recurses through `binary_expression` / `parametrized_type_expression` / parameterized-identifier / type-parameter / type-argument wrappers, mirroring the native `find_base_name`.
- `!base.includes('.')` in `handleFunctionDef` and `handleAssignment` so qualified names like `Base.show` inside `module Foo` no longer become `Foo.Base.show`.
- `handleImport` to walk into `selected_import` children, treating `scoped_identifier` modules as the source and the trailing segment as the imported display name.
- `signatureCall` helper to unwrap `function_definition → signature → call_expression`. Without this, `findChild(node, 'call_expression')` was matching the function body's first call (e.g. `println(...)`) and recording it as the function name. This was a latent bug not surfaced by the existing fixtures, but fixing it is required for the qualified-def test to pass.

Test plan

- `npx vitest run tests/parsers/julia.test.ts`: 10/10 pass, including the four new parity tests (`extracts parameterized struct base name`, `qualified short-form method does not double-prefix`, `qualified function def does not double-prefix`, `selected_import handles qualified module`).
- `cargo test --lib julia`: native side still green (16/16).

Out of scope

While auditing, I found three additional WASM-only divergences not in scope of #1111: broken `handleAbstractDef`, broken `handleMacroDef`, and missing signature-call skip in `handleCall`. Filed as #1126.

Closes #1111
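For the `selected_import` item in the summary, a rough sketch of the intended source/names shape is given here. All names in it are hypothetical: `MockNode`, `ImportRecord`, and `readSelectedImport` are not from the PR, and the assumed layout (the first `identifier` or `scoped_identifier` child is the module, the remaining ones are imported names) is inferred from the description rather than from the grammar.

```typescript
// Mock node shape: an assumption for illustration, not the tree-sitter API.
interface MockNode {
  type: string;
  text: string;
  children: MockNode[];
}

// Hypothetical record shape for one import statement.
interface ImportRecord {
  source: string;
  names: string[];
}

function readSelectedImport(sel: MockNode): ImportRecord | null {
  // Keep only name-like children; punctuation such as `:` or `,` is skipped.
  const parts = sel.children.filter(
    (c) => c.type === 'identifier' || c.type === 'scoped_identifier',
  );
  const moduleNode = parts[0];
  if (!moduleNode) return null;
  // The module keeps its full dotted text; each imported name is reduced
  // to its trailing segment for display.
  const names = parts.slice(1).map((n) => n.text.split('.').pop() ?? n.text);
  return { source: moduleNode.text, names };
}

const leaf = (type: string, text: string): MockNode => ({ type, text, children: [] });

// import LinearAlgebra.BLAS: gemm
const qualified: MockNode = {
  type: 'selected_import',
  text: '',
  children: [leaf('scoped_identifier', 'LinearAlgebra.BLAS'), leaf(':', ':'), leaf('identifier', 'gemm')],
};
// import Base: show
const unqualified: MockNode = {
  type: 'selected_import',
  text: '',
  children: [leaf('identifier', 'Base'), leaf(':', ':'), leaf('identifier', 'show')],
};

const qualifiedImport = readSelectedImport(qualified);
const unqualifiedImport = readSelectedImport(unqualified);
console.log(qualifiedImport, unqualifiedImport);
```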