fix(retrieval): repair RRF fusion, ranking, FTS recall, and freshness #11
Ten correctness/ranking fixes across the hybrid retrieval pipeline, plus a schema bump for symbol-name FTS indexing.

- fusion: scale RRF by k so fused scores are O(1) (rerank bonuses were ~10x the entire fused signal, making rerank the de-facto ranker); merge on a coarse (path, line-bucket) key so cross-source hits actually reinforce, and count agreeing_sources at file granularity.
- pipeline: scale-invariant relative-gap confidence; per-file diversification (<=3 hits/file on a page, overflow pushed to the tail, nothing dropped).
- searchers: drop stopwords before building the FTS MATCH (NL queries like "how does auth work" no longer AND-in filler and zero out recall); remove the dead legacy lexical path (fts_response/fts_search/second Candidate/etc.).
- schema/storage: denormalized chunks.symbol_names mirrored by the FTS triggers (external-content-safe), so symbol names are searchable. Bumps SCHEMA_VERSION 1 -> 2; older indexes stay readable and index/update rebuild on mismatch.
- rerank: word-boundary test demotion (no more contest/latest false positives); damped name-reference fallback for symbols with no resolved in_degree.
- graph: language-aware import resolution (prefer the importer's own extension).
- freshness: content-aware (mtime -> size -> sha) so a bare touch isn't stale.

Tests added/updated for every change; goldens regenerated. ruff + mypy clean.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
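The fusion change in the commit message can be illustrated with a minimal sketch. It assumes the common RRF form, in which each source contributes `w / (k + rank)`, and a hypothetical `k = 60`; the function name `rrf_scores` and the example rankings are illustrative and are not taken from this PR.

```python
from collections import defaultdict

K = 60  # assumed RRF constant; the PR does not state the value it uses


def rrf_scores(rankings, weights, k=K, scale=True):
    """Fuse best-first ranked lists with reciprocal rank fusion.

    rankings: {source: [doc_id, ...]}
    weights:  {source: float}
    With scale=True each contribution is multiplied by k, so a rank-1 hit
    from a weight-1 source scores k / (k + 1), which is close to 1.
    """
    factor = k if scale else 1.0
    scores = defaultdict(float)
    for source, docs in rankings.items():
        w = weights.get(source, 1.0)
        for rank, doc in enumerate(docs, start=1):
            scores[doc] += factor * w / (k + rank)
    return dict(scores)


def order(scores):
    """Doc ids sorted by descending score (ties broken by id)."""
    return sorted(scores, key=lambda d: (-scores[d], d))


rankings = {
    "fts": ["a.py", "b.py", "c.py"],
    "vector": ["b.py", "d.py", "a.py"],
}
weights = {"fts": 1.0, "vector": 1.0}

raw = rrf_scores(rankings, weights, scale=False)    # scores on the order of 1/k
scaled = rrf_scores(rankings, weights, scale=True)  # scores on the order of 1
```

Multiplying every score by the same positive constant leaves the ordering untouched, so `order(raw)` and `order(scaled)` agree; only the magnitude changes, which is what matters once additive rerank bonuses are applied on top.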
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: be8417c74d
    # schema.sql is applied with IF NOT EXISTS (old tables/triggers persist).

    def peek_schema_version(path: Path | str) -> int:
Keep enable_vectors on Database
Adding this top-level helper here closes the Database class before the existing enable_vectors method, so enable_vectors is now nested inside peek_schema_version and Database no longer has that attribute. Any embeddings-enabled build or vector search path that calls db.enable_vectors() (for example indexing with embeddings.enabled = true or query-time vector search) will now raise AttributeError before creating/loading the vector tables.
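The failure mode described in this comment can be reproduced in a few lines. This is a minimal sketch with hypothetical method bodies; only the names `Database`, `_guard_version`, `peek_schema_version`, and `enable_vectors` come from the review, and `FixedDatabase` is an illustrative name for the corrected layout.

```python
class Database:
    def _guard_version(self) -> None:
        """Stand-in for the real method; the body is hypothetical."""


def peek_schema_version(path: str) -> int:
    """Module-level helper inserted before the class's last method."""
    return 2

    # Still indented, so this def now belongs to peek_schema_version's body
    # (and sits after the return, so it is never even executed).
    def enable_vectors(self) -> str:
        return "vectors enabled"


# The class was closed by the unindented def above, so the method is gone.
has_method = hasattr(Database, "enable_vectors")


class FixedDatabase:
    """Same layout with the helper moved after the class."""

    def _guard_version(self) -> None:
        """Stand-in for the real method."""

    def enable_vectors(self) -> str:
        return "vectors enabled"


def peek_schema_version_fixed(path: str) -> int:
    return 2
```

Python accepts the broken layout without a syntax error, so only a type checker or a test that actually calls `enable_vectors` would flag it.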
    if (source, key) in seen:
        continue
Preserve distinct symbols in fusion buckets
Because the duplicate check keys only on (source, path, 40-line bucket), symbol-only results for multiple matching definitions in the same small file/window are discarded after the first one, not merely down-ranked. For example, two matching functions in the same 40-line bucket from the symbol retriever collapse to one result, so --mode symbol can lose valid definitions; include a symbol-specific discriminator or limit this dedupe to repeated chunk-style hits.
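One way to act on this suggestion is to add the symbol name to the dedupe key. The sketch below is an assumption about how that could look, not the PR's code: the `Hit` dataclass, the `dedupe` helper, and the `BUCKET_LINES` constant are hypothetical, and the 40-line bucket width is taken from the review comment.

```python
from dataclasses import dataclass
from typing import Optional

BUCKET_LINES = 40  # bucket width mentioned in the review; treated as an assumption


@dataclass(frozen=True)
class Hit:
    source: str
    path: str
    start_line: int
    symbol: Optional[str] = None  # set only by a symbol-style retriever


def dedupe_key(hit: Hit) -> tuple:
    bucket = hit.start_line // BUCKET_LINES
    # Hypothetical discriminator: symbol hits carry their name in the key,
    # so two definitions in the same bucket no longer collide.
    return (hit.source, hit.path, bucket, hit.symbol)


def dedupe(hits):
    """Keep the first hit per key, preserving input order."""
    seen = set()
    kept = []
    for hit in hits:
        key = dedupe_key(hit)
        if key in seen:
            continue
        seen.add(key)
        kept.append(hit)
    return kept


hits = [
    Hit("symbol", "auth.py", 3, "login"),
    Hit("symbol", "auth.py", 12, "logout"),  # same bucket, different symbol: kept
    Hit("fts", "auth.py", 5),
    Hit("fts", "auth.py", 20),               # same bucket, chunk-style repeat: dropped
]
kept = dedupe(hits)
```

Chunk-style hits leave `symbol` as `None`, so they still collapse per bucket as before, while symbol hits stay distinct.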
The previous commit inserted module-level `peek_schema_version` between `_guard_version` and `enable_vectors`, which silently nested `enable_vectors` inside it, so `Database.enable_vectors` no longer existed. Local mypy passed only on a stale cache and the vector tests were skipped (no sqlite_vec locally), so CI was first to catch it (`service.py:86: "Database" has no attribute "enable_vectors"`). Move `peek_schema_version` after the class.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
Summary
Ten correctness/ranking fixes across the hybrid retrieval pipeline, found in a deep read of `retrieval/`, `graph/`, `storage/`, and `indexer/freshness`. Plus a schema bump (v1→v2) so symbol names are FTS-indexed.

Ranking & fusion
- RRF scaled by `k`: fused scores were ~`w/k` (≈0.017), an order of magnitude below the reranker's bounded bonuses, so rerank was silently the primary ranker and RRF a tiebreak. Scaling by `k` is a pure monotonic transform (order unchanged) that puts fused scores and bonuses on the same O(1) scale.
- Coarse `(path, line-bucket)` fusion key: different retrievers report different line ranges for the same place, so the old exact `(path, start, end)` key almost never coincided and cross-source agreement never fired. `agreeing_sources` is now counted at file granularity.
- […] `search_token.json` golden).

Recall & indexing
- Stopwords (`how`/`does`/`the`/…) are dropped before the `MATCH`, so NL queries no longer AND-in filler that code chunks never contain.
- Denormalized `chunks.symbol_names`, mirrored verbatim by the sync triggers (external-content-safe; a live join could corrupt the index after a symbol cascade). Bumps `SCHEMA_VERSION` 1→2: older indexes stay readable; `index`/`update` detect the mismatch and rebuild.
- Symbols with no resolved `in_degree` now get a half-capped bonus from a name-reference count. Precise `in_degree` still wins.

Correctness
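- Word-boundary test demotion: `contest/`, `latest.py`, `testimonials.tsx` are no longer treated as tests.
- Language-aware import resolution: `import './base'` from a `.ts` file resolves to `base.ts`, not a same-named `base.py`.
- Content-aware freshness: a bare `touch` (mtime only, identical bytes) is a no-op for `update`, so it no longer reports the index stale.

Cleanup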
- Removed the dead legacy lexical path from `searchers.py` (`fts_response`/`fts_search`/second `Candidate`/`_confidence`/`_fallbacks`/`_trim`).

Migration
Schema v1→v2 (added column). Rebuild is automatic on the next `index`/`update`; no user action needed. Recorded in `CHANGELOG.md` (Unreleased).

Testing
- […] `search_token.json`, `mcp_search_code.json` only).
- `ruff` + `mypy` clean on all changed modules.
- `test_real_local_embed_shape` fails locally due to a torch/torchcodec native-DLL issue unrelated to this change (no edited module is involved); excluded from the tally above.

🤖 Generated with Claude Code
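The content-aware freshness rule listed under Correctness (mtime, then size, then sha) can be sketched as follows. This is an illustration under stated assumptions rather than the PR's implementation: `FileRecord`, `record`, and `is_stale` are hypothetical names, and SHA-256 is assumed as the hash.

```python
import hashlib
import os
from dataclasses import dataclass
from pathlib import Path


@dataclass(frozen=True)
class FileRecord:
    """What the index is assumed to remember about a file."""
    mtime_ns: int
    size: int
    sha256: str


def _sha256(path: Path) -> str:
    return hashlib.sha256(path.read_bytes()).hexdigest()


def record(path: Path) -> FileRecord:
    st = path.stat()
    return FileRecord(st.st_mtime_ns, st.st_size, _sha256(path))


def is_stale(path: Path, rec: FileRecord) -> bool:
    """Cheapest check first: mtime, then size, then content hash."""
    st = path.stat()
    if st.st_mtime_ns == rec.mtime_ns:
        return False  # untouched since indexing
    if st.st_size != rec.size:
        return True   # different length, so the bytes must differ
    # Same size but newer mtime: only the hash can tell a bare touch
    # apart from a real edit.
    return _sha256(path) != rec.sha256
```

A bare `touch` falls through to the hash comparison and comes back fresh, while a same-size edit is still detected.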