Skip to content

fix(r): setMethod emits call edge, not duplicate definition#1125

Merged
carlos-alm merged 2 commits into
mainfrom
fix/r-setmethod-call-not-duplicate
May 15, 2026
Merged

fix(r): setMethod emits call edge, not duplicate definition#1125
carlos-alm merged 2 commits into
mainfrom
fix/r-setmethod-call-not-duplicate

Conversation

@carlos-alm
Copy link
Copy Markdown
Contributor

Summary

  • Idiomatic S4 code (setGeneric("greet", ...) + setMethod("greet", "Person", ...)) produced two function definition nodes with the same name in both engines, breaking resolution. Split the handler so setGeneric emits a function definition (the generic) and setMethod emits a call edge to the generic, modelling the implementation registration without duplicating the symbol.
  • While auditing the WASM extractor for parity, found three sibling handlers (setClass, setGeneric, source()) that silently produced no output because they did not unwrap the argument node that tree-sitter-r places around each positional string literal. The native (Rust) extractor handled this correctly via first_argument_value. Introduced a parity helper firstStringArgument mirroring the Rust logic and routed all three handlers through it.
  • Closes follow-up: setMethod/setGeneric should emit method call edges instead of duplicate definitions in both engines (R) #1109.

Test plan

  • cargo test r_lang:: — 19/19 pass (3 new: set_method_does_not_duplicate_generic_definition, set_method_emits_call_to_generic, set_method_body_calls_are_still_captured)
  • npx vitest run tests/parsers/r.test.ts — 11/11 pass (4 new parity guards: setClass, setGeneric, setMethod body captures, source() import)
  • npx vitest run tests/parsers tests/integration tests/graph tests/unit — 2233/2235 pass (the 2 failures are pre-existing flaky tests in tests/unit/snapshot.test.ts unrelated to this change; verified by running them on origin/main where they also fail)
  • Lint clean (npm run lint)

In idiomatic S4 code, `setGeneric("greet", ...)` followed by
`setMethod("greet", "Person", ...)` produced two `function` definition
nodes for the same name. Split the handler: `setGeneric` still emits a
function definition (the generic), while `setMethod` emits a call edge
to the generic — modelling the implementation registration without
duplicating the symbol. The method body's calls are still picked up by
the recursive walk of the anonymous function argument.

While auditing the WASM extractor, found three sibling handlers that
silently produced no output because they didn't unwrap the `argument`
node that tree-sitter-r places around each positional string literal:
setClass, setGeneric, and source(). The native (Rust) extractor handled
this correctly via `first_argument_value`. Introduced a parity helper
`firstStringArgument` mirroring the Rust logic and routed all three
handlers through it. Closes #1109.
@claude
Copy link
Copy Markdown

claude Bot commented May 14, 2026

Claude finished @carlos-alm's task —— View job


Code Review Complete

Reviewing PR #1125: "fix(r): setMethod emits call edge, not duplicate definition"

Review Checklist:

  • Phase 1: Root cause analysis - verify this fixes real issues vs hiding symptoms
  • Phase 2: Configuration change detection - check for rule relaxation
  • Phase 3: Test integrity verification
  • Phase 4: Code quality review
  • Phase 5: Backlog compliance assessment
  • Final Assessment: Rating and recommendation

Phase 1: Root Cause Analysis ✅

Problem Being Solved: The PR addresses a legitimate technical issue where R S4 idiomatic code (setGeneric("greet", ...) + setMethod("greet", "Person", ...)) was producing duplicate function definition nodes with the same name in both engines, causing resolution conflicts.

Why This Approach: The fix correctly models the semantic difference:

  • setGeneric creates a function definition (the generic)
  • setMethod registers an implementation via a call edge to the generic (not a new definition)

This is architecturally sound and fixes the root cause rather than masking symptoms.

Phase 2: Configuration Change Detection ✅

No concerning configuration changes detected: No biome rules disabled, no test thresholds lowered, no build configuration modified, no TypeScript strict flags disabled. This is pure code logic improvement.

Phase 3: Test Integrity ✅

Tests demonstrate correctness, not weakness:

  • New tests verify the fix works correctly (no duplicate definitions, method calls still captured)
  • Tests are strengthened, not weakened - they now verify parity between engines
  • Test assertions are specific and meaningful, not conditional or skipped
  • Test coverage expanded to catch the previously silent WASM extractor failures

The test plan shows comprehensive validation:

  • 19/19 cargo tests pass (3 new tests)
  • 11/11 vitest tests pass (4 new parity guards)
  • Full test suite passes (2233/2235, with 2 pre-existing unrelated failures)

Phase 4: Code Quality Review ✅

Excellent implementation quality:

Rust extractor (r_lang.rs):

  • Well-documented helper first_argument_value with comprehensive comment explaining tree-sitter-r's argument wrapping
  • Proper field-based access with fallbacks (child_by_field_name("value"))
  • Correct handling of named arguments vs positional
  • Good error handling and null safety
  • Tests are comprehensive and well-named

TypeScript extractor (r.ts):

  • Good parity helper firstStringArgument that mirrors Rust logic
  • Proper quote stripping using replace with anchors (safer than trim_matches)
  • Clear comments explaining the semantic change in handleSetMethod
  • Consistent error handling

No significant issues detected:

  • No potential memory leaks or resource issues
  • No SQL injection risks (this is AST parsing, not dynamic SQL)
  • No command injection (no exec calls)
  • Performance impact is minimal (just different edge creation)
  • Good separation of concerns

Minor observation: The stripQuotes regex in TypeScript is slightly different from the Rust implementation (regex vs index-based), but both are correct for their contexts.

Phase 5: Backlog Compliance Assessment ✅

Root Cause Analysis

Problem Being Solved: Fixes duplicate function definition nodes for R S4 method dispatch patterns, which broke resolution in both engines
Why This Approach: Correct semantic modeling - setGeneric defines the function, setMethod calls it (not defines it)
Risk Assessment: Low risk - improves accuracy, comprehensive test coverage, no breaking changes

Backlog Compliance

  • Zero-dep: ✓ — Uses existing tree-sitter AST parsing, no new runtime dependencies
  • Foundation-aligned: ✓ — Improves graph accuracy (P1: always current), no violations detected
  • Problem-fit: 5 — Directly prevents hallucinated dependencies and improves resolution accuracy for R codebases
  • Breaking: No — Purely additive improvement to edge extraction, no API changes
  • Tier: 1

Critical Concerns

None identified. This is a solid bug fix with excellent test coverage.

Final Recommendation

  • Rating: ⭐⭐⭐⭐☆ (4/5)
  • Action: APPROVE
  • Reasoning: High-quality fix that improves core graph accuracy with comprehensive testing. Addresses a legitimate resolution bug without introducing risks. The dual-engine parity work also fixes three silent failures in the WASM extractor. Well-implemented, well-tested, and directly improves the tool's core value proposition.

@greptile-apps
Copy link
Copy Markdown
Contributor

greptile-apps Bot commented May 14, 2026

Greptile Summary

This PR fixes a graph correctness bug where idiomatic S4 R code (setGeneric + setMethod) produced two function definition nodes for the same name, breaking symbol resolution. It also fixes a parity gap in the WASM TypeScript extractor where setClass, setGeneric, and source() silently emitted nothing because they didn't unwrap the argument wrapper node that tree-sitter-r places around positional string literals.

  • r_lang.rs / r.ts: setMethod is split from setGeneric; it now emits a call edge to the generic instead of a duplicate definition, while recursive walking still captures method-body calls.
  • r.ts: Three handlers (handleSetClass, handleSetGeneric, handleSourceCall) are refactored onto a shared firstStringArgument helper that unwraps the argument node, mirroring first_argument_value in the Rust extractor.
  • tests/parsers/r.test.ts / r_lang.rs test module: Seven new tests (3 Rust, 4 TypeScript) guard the fixed behaviors and the Rust/WASM parity.

Confidence Score: 5/5

Safe to merge — the change is a targeted correctness fix with no destructive side effects on existing symbol extraction.

Both extractors (Rust and TypeScript) are updated in lock-step and converge on identical semantics. The firstStringArgument helper correctly handles both bare string nodes and the argument-wrapped form that tree-sitter-r uses for positional args. The refactored handleSetMethod emits a call edge rather than a duplicate definition, which is the right model. Seven new tests directly cover the fixed behaviors and Rust/WASM parity, and the existing 2000+ test suite shows no regressions.

No files require special attention.

Important Files Changed

Filename Overview
crates/codegraph-core/src/extractors/r_lang.rs Splits setGeneric/setMethod dispatch; new handle_set_method emits a call edge. Three new unit tests cover the core fix.
src/extractors/r.ts Adds firstStringArgument/stripQuotes helpers for argument-node unwrapping; refactors handleSetClass, handleSetGeneric, handleSourceCall, and adds handleSetMethod.
tests/parsers/r.test.ts Adds 6 new parity-guard tests for setClass, setGeneric, setMethod (no duplicate, call edge, body captures), and source() import extraction.

Flowchart

%%{init: {'theme': 'neutral'}}%%
flowchart TD
    A["handleCall(node)"] --> B{funcName?}
    B -->|setGeneric| C["handleSetGeneric(node)"]
    B -->|setMethod| D["handleSetMethod(node)"]
    B -->|setClass / setRefClass| E["handleSetClass(node)"]
    B -->|source| F["handleSourceCall(node)"]
    B -->|library / require| G["handleLibraryCall(node)"]
    B -->|other| H["emit call edge\n{ name: funcName }"]

    C --> I["firstStringArgument(node)"]
    D --> I
    E --> I
    F --> I

    I --> J{arg.type?}
    J -->|'string'| K["stripQuotes → return name"]
    J -->|'argument'| L["childForFieldName('value')"]
    L -->|string node| K
    L -->|null| M["fallback: iterate children\nfind first 'string'"]
    M --> K
    J -->|other| N["return null"]

    C --> O["ctx.definitions.push\n{ kind: 'function' }"]
    D --> P["ctx.calls.push\n{ name: genericName }"]
    E --> Q["ctx.definitions.push\n{ kind: 'class' }"]
    F --> R["ctx.imports.push\n{ names: 'source' }"]

    style D fill:#d4edda,stroke:#28a745
    style P fill:#d4edda,stroke:#28a745
    style I fill:#fff3cd,stroke:#ffc107
Loading

Reviews (2): Last reviewed commit: "Merge branch 'main' into fix/r-setmethod..." | Re-trigger Greptile

@github-actions
Copy link
Copy Markdown
Contributor

github-actions Bot commented May 14, 2026

Codegraph Impact Analysis

12 functions changed10 callers affected across 2 files

  • handle_call in crates/codegraph-core/src/extractors/r_lang.rs:140 (1 transitive callers)
  • handle_set_method in crates/codegraph-core/src/extractors/r_lang.rs:345 (2 transitive callers)
  • set_method_does_not_duplicate_generic_definition in crates/codegraph-core/src/extractors/r_lang.rs:464 (0 transitive callers)
  • set_method_emits_call_to_generic in crates/codegraph-core/src/extractors/r_lang.rs:485 (0 transitive callers)
  • set_method_body_calls_are_still_captured in crates/codegraph-core/src/extractors/r_lang.rs:497 (0 transitive callers)
  • handleCall in src/extractors/r.ts:103 (2 transitive callers)
  • handleSourceCall in src/extractors/r.ts:219 (3 transitive callers)
  • handleSetClass in src/extractors/r.ts:230 (3 transitive callers)
  • handleSetGeneric in src/extractors/r.ts:241 (3 transitive callers)
  • handleSetMethod in src/extractors/r.ts:258 (3 transitive callers)
  • firstStringArgument in src/extractors/r.ts:268 (6 transitive callers)
  • stripQuotes in src/extractors/r.ts:291 (6 transitive callers)

@carlos-alm carlos-alm merged commit aee042b into main May 15, 2026
27 checks passed
@carlos-alm carlos-alm deleted the fix/r-setmethod-call-not-duplicate branch May 15, 2026 06:07
@github-actions github-actions Bot locked and limited conversation to collaborators May 15, 2026
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

follow-up: setMethod/setGeneric should emit method call edges instead of duplicate definitions in both engines (R)

1 participant