Skip to content

WIP: refactor(cpp)#195

Open
wzekin wants to merge 8 commits into
cloudwego:mainfrom
wzekin:feat/cpp-typehierarchy-only
Open

WIP: refactor(cpp)#195
wzekin wants to merge 8 commits into
cloudwego:mainfrom
wzekin:feat/cpp-typehierarchy-only

Conversation

@wzekin
Copy link
Copy Markdown
Collaborator

@wzekin wzekin commented May 21, 2026

What type of PR is this?

Check the PR title.

  • This PR title match the format: <type>(optional scope): <description>
  • The description of this PR title is user-oriented and clear enough for others to understand.
  • Attach the PR updating the user documentation if the current PR requires user awareness at the usage level. User docs repo

(Optional) Translate the PR title into Chinese.

(Optional) More detailed description for this PR(en: English/zh: Chinese).

en:
zh(optional):

(Optional) Which issue(s) this PR fixes:

(optional) The PR that updates user documentation:

wzekin and others added 2 commits May 21, 2026 15:17
Replace fragile C++ text heuristics with standard LSP requests where
possible.

Changes:
- lsp: add DocumentSymbol.Detail field, propagate Detail + SelectionRange
  through flattenDocumentSymbols (previously stripped).
- cpp/lsp: new CppProvider implementing PrepareTypeHierarchy /
  TypeHierarchySupertypes / TypeHierarchySubtypes / Hover /
  Implementation / WorkspaceSearchSymbols. Registered for uniast.Cpp.
- collect: collectImpl now resolves C++ bases via typeHierarchy as the
  primary path; falls back to text-level BaseClassRefs only when
  typeHierarchy returns nothing (typical for unresolved external
  templates). Saves ~50 lines of fragile Definition+alias-follow logic
  per class.
- collect: extractCppCallSig prefers clangd-canonical signature from
  sym.Detail ("void(int)") over source-scanned text ("void(int x)"),
  which is more stable across multi-line parameter lists.
- collect: dedupCppFunction uses signature-aware key (cppBaseName on
  full mid.Name) — overloads no longer collapse in Type.Methods, and
  multi-line .cpp parameter formatting no longer breaks .h/.cpp merge.

Bug fixes carried in this commit:
1. (T,Text)-keyed positive dep cache (collectDepsForEntity) collapsed
   distinct same-name calls onto one receiver. Cache reduced to negative
   only.
2. BaseClassRefs preserves the full qualified base name instead of just
   the last segment, so unresolved third_party::Provider and
   other_pkg::Provider remain distinguishable.
3. IsUsingAlias kind-aware: using-declarations importing
   functions/members (using Base::foo;) are no longer mis-routed
   through the alias-redirect path.
4. dedupCppFunction uses bracket-depth-aware cppFnHasBody helper instead
   of strings.Contains(content, "{"), so header decls with
   lambda/braced-init default arguments no longer get falsely classified
   as definitions.
5. baseSym/sym self-resolution detected by Location.Include containment
   rather than short-name equality, so app::Provider extending
   common::Provider isn't dropped as "resolved to self".
6. NVI synthesis dedup keyed by local signature (handle(int)) instead of
   short name (handle), so overloads survive inheritance.
7. Sysroot (cstdlib) + build-time codegen (build_generated) modules emit
   only edges, no nodes — scanner skips build64_release; addSymbol drops
   sysroot/codegen syms before they enter c.syms; export skips body
   emission for the same buckets.

Testdata + tests:
- testdata/cpp/{2_inheritance, 3_aliases, 4_inline_methods,
  5_external_calls, 6_nvi_dispatch, 7_provider_chain,
  8_overloaded_methods, 9_crossns_basename, 10_unresolved_basens,
  11_using_decl, 13_methods_key_consistency} for regression coverage.
- 15-case unit test for cppFnHasBody covering declarations vs
  definitions, lambda defaults, braced-init defaults, strings, comments.
- Integration tests in lang/parse_cpp_test.go for each fix.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…t LSP calls

Make the C++ collector strictly LSP-based: drop every text-heuristic
fallback (BaseClassRefs / parseBaseSpecifiers / declarationText /
IsTypedefSymbol / IsUsingAlias / AliasTargetTokenIndex / aliasRedirect),
delete the unresolved-base post-passes (resolveCppExternalImplements /
resolveCppUnresolvedBasesActive), and run the typeHierarchy/AST
extension as the only source of truth.

Tradeoff: dependent template bases (`class Foo : public Tmpl<X,Y>`) no
longer get Implements edges — clangd's typeHierarchy doesn't expose
supertypes for unbound template bases, and we no longer try to recover
them via source-text fallback. ModPath="external" placeholders are
eliminated; consumers see clean cross-module Implements only when
typeHierarchy could fully resolve them.

Notable changes:

- lsp.ASTNode: HasFunctionBody() method + ASTRoleDeclaration /
  ASTKind{Compound,Typedef,TypeAlias,TypeAliasTemplate,Using,
  UsingShadow,UsingPack} constants.
- LSPClient.Call: retry-go-based transient retry (3 attempts, 50ms
  fixed delay) with a fast path that only enters the retry framework
  when the first call fails. MethodNotFound (-32601) and context
  cancel/deadline are terminal.
- IsJSONRPCMethodNotFound: exported from lang/lsp so collect doesn't
  duplicate the helper.
- collect: cppFileAST caches the whole-file AST per URI; cppASTForSymbol
  walks the cached tree to find the smallest containing declaration
  (returns nil on miss, NOT the TranslationUnit root) and whole-file
  range now covers the last line even when it has no trailing newline.
  AST request failure caches nil to stop further hammering, but
  transient failures get retried inside Call before that happens.
- export: dedupCppFunction uses AST-derived hasBody (Compound child of
  the function declaration only — lambda defaults no longer trigger
  false positives); typedef detection keys off ASTKindTypedef /
  ASTKindTypeAlias; alias suppression keys off ASTKindTypeAlias{,Template}
  for type aliases and only UsingDecl-family kinds for type-importing
  using-declarations.

Tests:

- TestCpp_UnresolvedBaseNamespace and testdata/cpp/10_unresolved_basens
  removed (they covered the deleted H4 fallback).
- TestCpp_Inheritance / TestCpp_ProviderChain / TestCpp_Aliases relax
  expectations for the now-unreachable cases (dependent template bases,
  using-alias redirect).
- cpp_ast_regression_test.go adds two regression tests:
  cppASTForSymbol misses must NOT fall back to the TU root, and
  whole-file AST range must include the last line without trailing
  newline.

freq_service v46 verified: 33 modules / 119 pkgs / 462 fns / 341 types
(unchanged vs the prior typeHierarchy-only run), 0 ModPath="external"
placeholders, abcoder-side token-not-found dropped from 5 (v45) to 1
(remaining issue is unrelated URL-decoding in a system header path).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
@wzekin wzekin force-pushed the feat/cpp-typehierarchy-only branch from 92cca99 to 2be2e33 Compare May 21, 2026 07:25
@wzekin wzekin changed the title refactor(cpp) WIP: refactor(cpp) May 21, 2026
wzekin and others added 2 commits May 21, 2026 18:59
Modern LSP servers (gopls, recent clangd, pylsp Hoblovski/abc fork)
suppress `semanticTokensProvider` from the initialize response unless
the client declares semanticTokens support — at which point our
strict capability check rejects the server and parsing aborts:

  failed to initialize LSP server: server did not provide SemanticTokensProvider

Advertise the capability unconditionally so every server we currently
target (and any new ones) returns the provider. The empty tokenTypes /
tokenModifiers slices and `formats: ["relative"]` are the minimum LSP
3.17 client capability shape that turns the feature on.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
NewURI canonicalizes via filepath.EvalSymlinks so symbol URIs always
carry the realpath form. ScannerByJavaIPC's normalizePath only
filepath.Cleaned absolute file paths, so c.files was keyed by the
symlinked path and downstream `c.files[symbol.Location.URI.File()]`
lookups missed (e.g. /data00/<user>/... -> /home/<user>/...). The miss
returned a nil *uniast.File, and JavaSpec.NameSpace panicked on the
file.Package deref.

Apply canonicalizePath inside normalizePath so the Scanner stores the
same realpath shape as NewURI.

Fixes TestJavaIPC_InterfaceKindAndImplements on systems where the repo
is reached via a symlink (typical /home -> /data00 layout).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
@wzekin wzekin force-pushed the feat/cpp-typehierarchy-only branch from 6fef3cf to f32fb42 Compare May 21, 2026 11:42
wzekin and others added 4 commits May 21, 2026 19:49
Recent gopls (and likely other modern servers) interpret empty
tokenTypes/tokenModifiers arrays as "client supports zero token types"
and omit semanticTokensProvider from the initialize response — which
trips our strict capability check at parse start:

  Failed to initialize go LSP client: server did not provide
  SemanticTokensProvider

Advertise the standard LSP 3.17 token type + modifier sets so every
server treats us as a fully-capable client. Also flip on
dynamicRegistration and the `full.delta` request shape to match what
real editors send.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
initLSPClient rejected any server whose initialize response omitted
`semanticTokensProvider` — including modern LSP servers (e.g. recent
gopls in CI) that legitimately gate the feature on client-declared
capabilities. The hard-fail kept abcoder's own Go path (native parser,
never calls SemanticTokens) and Java path (tree-sitter) from starting
against such servers; CI's `gopls@latest` panic surfaced as

  Failed to initialize go LSP client:
  server did not provide SemanticTokensProvider

Treat absence as "this server doesn't support semantic tokens": store
empty tokenTypes/tokenModifiers, return success, and let any caller
that actually needs the feature surface a method-not-found error from
the LSP call itself. The other capability checks (Definition,
TypeDefinition, DocumentSymbol, References) stay required.

Reverts and supersedes the earlier attempts at advertising
semanticTokens client capability — both work for some gopls versions
but break against others. Capability-relaxation is the robust fix.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Development

Successfully merging this pull request may close these issues.

1 participant