From 05b557a4dffbeb7134dad55801ac35d7b4a7ee74 Mon Sep 17 00:00:00 2001
From: denfry <aseraw115@gmail.com>
Date: Sun, 14 Jun 2026 12:37:25 +0300
Subject: [PATCH 1/3] feat(retrieval,discovery): dampen in_degree tiebreak +
 label config/IaC

Roadmap chunk C (large features, landed behind the benchmark gate).

- rerank: replace the linear in_degree bonus, min(0.10, in_degree * 0.01), which
  saturated at in_degree=10 and gave 100-caller "god classes" the full bonus,
  with min(0.08, log1p(in_degree) * 0.03). The logarithmic curve and lower cap
  keep centrality a tiebreak instead of floating god classes above relevant
  low-degree matches. Validated as no-regression on the public benchmark
  (Recall@k/MRR/nDCG unchanged) plus a targeted regression test.
- discovery/classify: label Dockerfile/Containerfile, Terraform (.tf/.tfvars),
  HCL, INI (.ini/.cfg/.conf/.properties) and Makefiles (Tier-C, FTS-only). These
  were already FTS-indexed as unknown text; labeling surfaces them in stats and
  lets agents scope searches to config.
- docs: typed-framework-edges design spec (M13 documented-first deliverable);
  LANGUAGES.md Tier-C row updated.
- regenerate tests/golden/search_token.json (one score shifted, 0.1467 -> 0.1596;
  order unchanged).

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
---
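Reviewer note (not part of the commit): a minimal sketch comparing the old and
new in_degree bonus, using the constants from the rerank.py hunk below. The
helper names `old_bonus` and `new_bonus` are hypothetical and exist only for
this illustration; the real logic lives inline in `rerank()`.

```python
import math

# Constants copied from the retrieval/rerank.py hunk in this patch.
_DEGREE_SCALE = 0.03
_DEGREE_CAP = 0.08


def old_bonus(in_degree: int) -> float:
    """Pre-patch rule: linear, reaches its 0.10 cap at in_degree=10."""
    return min(0.10, in_degree * 0.01)


def new_bonus(in_degree: int) -> float:
    """Post-patch rule: logarithmic growth with a lower 0.08 cap."""
    return min(_DEGREE_CAP, math.log1p(in_degree) * _DEGREE_SCALE)


# Print both curves side by side for a few representative degrees.
for deg in (1, 2, 4, 10, 14, 100, 200):
    print(f"in_degree={deg:>3}  old={old_bonus(deg):.4f}  new={new_bonus(deg):.4f}")
```

For the 2-caller entry in tests/golden/search_token.json, new_bonus(2) minus
old_bonus(2) is about 0.013, which accounts for the 0.1467 -> 0.1596 shift.
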
 docs/LANGUAGES.md                             |   8 +-
 ...2026-06-14-typed-framework-edges-design.md | 136 ++++++++++++++++++
 src/codebase_index/discovery/classify.py      |  34 ++++-
 src/codebase_index/retrieval/rerank.py        |  14 +-
 tests/golden/search_token.json                |   2 +-
 tests/test_classify.py                        |  26 ++++
 tests/test_rerank.py                          |  29 +++-
 7 files changed, 242 insertions(+), 7 deletions(-)
 create mode 100644 docs/superpowers/specs/2026-06-14-typed-framework-edges-design.md
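
Reviewer note (not part of the commit): the design spec added here describes
pairing string-keyed edges by `dst_key` rather than by symbol name. Since
implementation has not started, the following is only a sketch of how that
pairing might work. `TypedEdge`, its fields, `pair_by_dst_key`, the "react"
resolver name, and the choice to take the lower of the two confidences are all
assumptions made for illustration, not part of the spec.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class TypedEdge:
    # Hypothetical shape: mirrors the proposed edges columns, nothing more.
    src: str
    edge_type: str
    dst_key: str
    confidence: float
    resolver: str


def pair_by_dst_key(edges, producer_type="route", consumer_type="component_dep"):
    """Join producer and consumer edges that share the same dst_key."""
    producers = defaultdict(list)
    consumers = defaultdict(list)
    for edge in edges:
        if edge.edge_type == producer_type:
            producers[edge.dst_key].append(edge)
        elif edge.edge_type == consumer_type:
            consumers[edge.dst_key].append(edge)

    pairs = []
    # Only keys seen on both sides form a link; unmatched keys are dropped.
    for key in sorted(producers.keys() & consumers.keys()):
        for prod in producers[key]:
            for cons in consumers[key]:
                # Assumption: a pair is only as trustworthy as its weaker edge.
                confidence = min(prod.confidence, cons.confidence)
                pairs.append((cons.src, prod.src, key, confidence))
    return pairs


edges = [
    TypedEdge("api/users.py::get_user", "route", "GET /users/{id}", 1.0, "fastapi"),
    TypedEdge("web/UserCard.tsx", "component_dep", "GET /users/{id}", 0.7, "react"),
    TypedEdge("api/orders.py::list_orders", "route", "GET /orders", 1.0, "fastapi"),
]
pairs = pair_by_dst_key(edges)
```

Here the unmatched `GET /orders` route produces no pair, which is the behavior
the spec wants in order to avoid unresolvable edges polluting `impact`.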

diff --git a/docs/LANGUAGES.md b/docs/LANGUAGES.md
index e8cd5e4..cb1a7c2 100644
--- a/docs/LANGUAGES.md
+++ b/docs/LANGUAGES.md
@@ -6,7 +6,7 @@
 |---|---|---|
 | Tier A | Language-specific Tree-sitter `LangSpec` with definition, call, and import/inheritance patterns | Python, JavaScript, TypeScript, Java, Go, Rust, C, C++, C#, Ruby, PHP, Kotlin |
 | Tier B | Generic Tree-sitter path when a loadable grammar exists, without language-specific graph semantics | Lua |
-| Tier C | Line chunks + FTS5 lexical search only | Markdown, JSON, YAML, TOML, SQL and other text/config files |
+| Tier C | Line chunks + FTS5 lexical search only | Markdown, JSON, YAML, TOML, SQL; config/IaC: Dockerfile, Terraform (`.tf`/`.tfvars`), HCL, INI (`.ini`/`.cfg`/`.conf`/`.properties`), Makefiles; and other text/config files |
 
 Tier A is the only tier that should be advertised as symbol-aware. Tier B can
 surface useful definitions, but it is intentionally weaker and should be called
@@ -45,7 +45,11 @@ High-priority code languages:
 - Objective-C
 - Vue and Svelte component structure
 
-High-priority non-code and framework-aware extraction:
+High-priority non-code and framework-aware extraction (config/IaC files are now
+**Tier-C labeled** — indexed, language-tagged, and FTS-searchable; the items below
+are the deeper *structured* extraction still on the roadmap, and the framework
+graph part is designed in
+`docs/superpowers/specs/2026-06-14-typed-framework-edges-design.md`):
 
 - SQL schema-aware parsing: tables, columns, migrations, model/query consumers
 - Terraform/HCL: resources, modules, variables, outputs
diff --git a/docs/superpowers/specs/2026-06-14-typed-framework-edges-design.md b/docs/superpowers/specs/2026-06-14-typed-framework-edges-design.md
new file mode 100644
index 0000000..9f0e086
--- /dev/null
+++ b/docs/superpowers/specs/2026-06-14-typed-framework-edges-design.md
@@ -0,0 +1,136 @@
+# Typed Framework Edges — Design
+
+> **Status:** Design draft (2026-06-14). Implementation NOT started.
+> **Author:** denfry
+> **Milestone:** Product roadmap **M13 — Code intelligence graph** (extends ARCHITECTURE.md §9).
+> **Why a design doc first:** `PRODUCT_UPGRADE_PLAN.md` §10 marks typed edges *High risk* and the
+> repo rule is "anything that risks destabilizing retrieval quality or the security model is
+> documented here first and lands behind a benchmark." This is that document.
+
+## 1. Problem
+
+The graph today has four edge types — `import`, `call`, `reference`, `extends`/`implements`
+(see `storage/schema.sql` `edges.edge_type`). They power `refs`, bounded `impact`, and the rerank
+centrality bonus. They do **not** capture how modern frameworks actually wire code together, so
+`impact "routes/users.ts"` misses the handler→service→repository→model chain a human would trace.
+
+ARCHITECTURE.md §9 lists the target typed edges:
+
+- HTTP route → handler → service → repository → model
+- test → fixture → implementation
+- interface/trait → implementation (partially covered today by `implements`)
+- config key → consumer
+- migration → model → query
+- event producer → event consumer
+- DI container / framework wiring
+- frontend component → hook/store/api client
+- error string / log message → throw site → handler
+
+## 2. Why the current edge mechanism is not enough
+
+The existing capture-prefix mechanism (`treesitter._EDGE_PREFIXES`, e.g. `@import.module`) emits an
+edge whose target is an **identifier captured from the AST**, resolved later by symbol name or
+module path (`graph/builder.resolve_edges`). That works for imports and inheritance because the
+target *is* a named symbol/module in the same repo.
+
+Framework edges break this assumption in two ways:
+
+1. **The link is a string literal, not a symbol.** `@app.get("/users/{id}")` ties a URL pattern to a
+   handler function. There is no `"/users/{id}"` symbol to resolve to. The edge is really
+   *"this function is the handler for this route"* — an attribute of the handler plus a
+   route-string key that only matches another route-string elsewhere (e.g. a client `fetch`).
+2. **Resolution is heuristic and framework-specific.** "service" is a naming/DI convention, not a
+   language construct. Precision varies by framework, so edges need **confidence** and
+   **provenance** so agents can treat a Spring `@Autowired` edge differently from a guessed
+   `*Service` name match.
+
+A naive "add a `route` prefix" would emit unresolvable edges that pollute `impact`. We need a new,
+explicitly-typed, confidence-bearing edge path.
+
+## 3. Schema changes
+
+Extend `edges` (additive — bumps `storage/db.SCHEMA_VERSION`, with a migration that backfills
+defaults so old indexes rebuild rather than guess):
+
+```sql
+ALTER TABLE edges ADD COLUMN confidence REAL NOT NULL DEFAULT 1.0;  -- 0..1; 1.0 = exact AST/import
+ALTER TABLE edges ADD COLUMN resolver   TEXT;                       -- provenance, e.g. 'fastapi.decorator'
+ALTER TABLE edges ADD COLUMN dst_key    TEXT;                       -- non-symbol join key (route string, event name, config key)
+CREATE INDEX idx_edges_dstkey ON edges(dst_key);
+```
+
+New `edge_type` values (open-ended TEXT, no enum migration needed): `route`, `handler`,
+`test_target`, `config_consumer`, `migration_model`, `event`, `di_wire`, `component_dep`,
+`log_site`. Existing four edge types keep `confidence = 1.0` and `resolver = NULL`, so all current
+behavior is byte-for-byte unchanged.
+
+`dst_key` is the join column for string-keyed edges: a `route` edge from a handler and a
+`component_dep` edge from a client both carry `dst_key = "GET /users/{id}"`; the builder pairs
+producers and consumers by `dst_key` instead of by symbol name.
+
+## 4. Resolver architecture
+
+A new `parsers/frameworks/` package, each module a `FrameworkResolver`:
+
+```python
+class FrameworkResolver(Protocol):
+    name: str                       # provenance string, e.g. "fastapi"
+    def detects(self, file: FileMeta, imports: list[str]) -> bool: ...
+    def edges(self, tree, symbols, source) -> list[TypedEdge]: ...   # carries confidence + resolver
+```
+
+Detection is import-gated (only run the FastAPI resolver on files importing `fastapi`/`starlette`),
+so cost is proportional to relevant files and an unrecognized stack adds nothing. First resolvers,
+chosen for coverage-per-effort:
+
+| Resolver | Edge(s) | Confidence basis |
+|---|---|---|
+| `fastapi` / `flask` | `route` (decorator → handler) | 1.0 (explicit decorator) |
+| `express` | `route` (`app.get(path, handler)`) | 0.9 (handler ref may be inline) |
+| `pytest` | `test_target` (test → impl by import + name) | 0.7 (name heuristic) |
+| `spring` | `di_wire` (`@Autowired`/constructor) | 0.95 |
+
+Each resolver is independently testable against a fixture file and contributes a labeled row to the
+graph benchmark (§6). New frameworks are added without touching the core — same spirit as the
+Tier-A `LangSpec` registry.
+
+## 5. Surfacing & honesty
+
+- `impact` / `refs` responses gain a per-edge `confidence` + `resolver` and group results as
+  **precise** (≥0.9) vs **heuristic** (<0.9), mirroring the existing `GraphCoverage.partial`
+  honesty signal — agents trust precise edges and treat heuristic ones as leads, not proof.
+- `stats` reports which framework resolvers fired and how many typed edges each produced.
+- `doctor` notes when typed-edge support is present but no resolver matched the repo's stack
+  (so a missing resolver is visible, not silent).
+- Rerank: typed edges are **excluded** from the `in_degree` centrality bonus initially (they would
+  re-introduce the god-class skew this release just dampened); revisit only behind the benchmark.
+
+## 6. Benchmark gate (required before merge)
+
+`tests/benchmark_public.py` already has a `graph_tasks` section (`route→handler→service` is an
+explicit TODO in `PRODUCT_UPGRADE_PLAN.md` §8). Before any resolver lands:
+
+1. Add hand-labeled framework-graph cases (route→handler→service→model paths) to the public fixture
+   and to a real multi-framework repo case.
+2. Report `graph_tasks.pass_rate` **before/after**, plus retrieval `recall@k`/`MRR`/`nDCG` to prove
+   no retrieval regression (the gate this release's rerank change passed).
+3. Publish raw logs next to the headline number (the §8 "no-overclaim procedure").
+
+A resolver merges only if it raises graph pass-rate **without** lowering retrieval metrics.
+
+## 7. Phasing
+
+- **Phase 1** — schema columns + migration + the `route`/`handler` pair for one Python framework
+  (FastAPI), `dst_key` pairing, confidence/provenance plumbing, benchmark cases. Smallest
+  end-to-end vertical slice.
+- **Phase 2** — Express + Flask routes; `test_target`; surface precise/heuristic split in
+  `impact`/`refs`.
+- **Phase 3** — `config_consumer`, `migration_model`, `event`, `di_wire`; per-resolver `stats`.
+- **Phase 4** — frontend `component_dep`, `log_site`; rerank integration (behind the benchmark).
+
+## 8. Non-goals
+
+- Not a type checker or a full call-graph resolver across dynamic dispatch.
+- Not cross-repo / monorepo graph (single-repo remains the product boundary).
+- No network or LLM-assisted resolution — resolvers stay static, local, and deterministic so the
+  privacy model (SECURITY.md) is untouched.
diff --git a/src/codebase_index/discovery/classify.py b/src/codebase_index/discovery/classify.py
index a075281..f41be35 100644
--- a/src/codebase_index/discovery/classify.py
+++ b/src/codebase_index/discovery/classify.py
@@ -36,6 +36,26 @@
     ".yaml": "yaml",
     ".toml": "toml",
     ".sql": "sql",
+    # Config / IaC (Tier C: line-chunk + FTS, no tree-sitter spec). These were already
+    # indexed as unknown-language text; labeling them surfaces infra files in `stats`
+    # and lets agents scope searches to config without a tree-sitter grammar.
+    ".tf": "terraform",
+    ".tfvars": "terraform",
+    ".hcl": "hcl",
+    ".ini": "ini",
+    ".cfg": "ini",
+    ".conf": "ini",
+    ".properties": "ini",
+}
+
+# Extension-less or specially-named config/IaC files, matched on the lowercased
+# filename (and a `name.suffix` form, e.g. `web.Dockerfile`). Kept separate from
+# the suffix table because these carry their identity in the name, not the suffix.
+_LANG_BY_NAME = {
+    "dockerfile": "dockerfile",
+    "containerfile": "dockerfile",
+    "makefile": "make",
+    "gnumakefile": "make",
 }
 
 # Authoritative set of *code* languages routed to tree-sitter (Guardrail 1). Every entry MUST
@@ -74,7 +94,19 @@
 
 
 def detect_language(path: str) -> Optional[str]:
-    return _LANG_BY_SUFFIX.get(PurePosixPath(path).suffix.lower())
+    pure = PurePosixPath(path)
+    suffix = pure.suffix.lower()
+    if suffix:
+        lang = _LANG_BY_SUFFIX.get(suffix)
+        if lang is not None:
+            return lang
+    name = pure.name.lower()
+    if name in _LANG_BY_NAME:
+        return _LANG_BY_NAME[name]
+    # `web.Dockerfile`, `base.dockerfile`, etc.: identity is the suffix-as-name.
+    if suffix and suffix[1:] in _LANG_BY_NAME:
+        return _LANG_BY_NAME[suffix[1:]]
+    return None
 
 
 def parser_for(lang: Optional[str]) -> str:
diff --git a/src/codebase_index/retrieval/rerank.py b/src/codebase_index/retrieval/rerank.py
index 7a63746..551c485 100644
--- a/src/codebase_index/retrieval/rerank.py
+++ b/src/codebase_index/retrieval/rerank.py
@@ -7,12 +7,22 @@
 
 from __future__ import annotations
 
+import math
 import re
 
 from .types import Candidate, Intent
 
 _TERM_RE = re.compile(r"[A-Za-z0-9_]+")
 
+# Graph-centrality bonus. Logarithmic (not linear) so a "god class" with hundreds
+# of callers cannot dominate a genuinely relevant low-degree match on a stray-term
+# tie. log1p compresses the tail: in_degree 4 → 10 → 100 yields about 0.048 → 0.072
+# → 0.08 (the cap, first reached at in_degree 14, not 10), and the lower cap keeps
+# centrality a tiebreak rather than an override. This dampens the god-class
+# over-ranking documented in tests/benchmark_honest_RESULTS.md.
+_DEGREE_SCALE = 0.03
+_DEGREE_CAP = 0.08
+
 
 def rerank(candidates: list[Candidate], *, query: str, intent: Intent) -> list[Candidate]:
     terms = {t.lower() for t in _TERM_RE.findall(query)}
@@ -33,10 +43,10 @@ def rerank(candidates: list[Candidate], *, query: str, intent: Intent) -> list[C
             reasons.append(f"in {c.path.rsplit('/', 1)[0] or '.'}/")
 
         if c.in_degree:
-            bonus += min(0.10, c.in_degree * 0.01)
+            bonus += min(_DEGREE_CAP, math.log1p(c.in_degree) * _DEGREE_SCALE)
             reasons.append(f"{c.in_degree} callers")
         if intent is Intent.ARCHITECTURE and (c.in_degree + c.out_degree):
-            bonus += min(0.10, (c.in_degree + c.out_degree) * 0.005)
+            bonus += min(_DEGREE_CAP, math.log1p(c.in_degree + c.out_degree) * (_DEGREE_SCALE / 2))
 
         wants_tests = "test" in terms or "tests" in terms
         if c.is_generated or (("test" in c.path.lower()) and not wants_tests):
diff --git a/tests/golden/search_token.json b/tests/golden/search_token.json
index 3822398..732f6c4 100644
--- a/tests/golden/search_token.json
+++ b/tests/golden/search_token.json
@@ -55,7 +55,7 @@
       "path": "src/auth/token.py",
       "rank": 1,
       "reason": "in src/auth/ · 2 callers",
-      "score": 0.1467,
+      "score": 0.1596,
       "snippet": "def refresh_access_token(refresh_token: str) -> str:",
       "symbols": [
         "refresh_access_token"
diff --git a/tests/test_classify.py b/tests/test_classify.py
index 3ce76f0..87aa43c 100644
--- a/tests/test_classify.py
+++ b/tests/test_classify.py
@@ -25,6 +25,32 @@ def test_parser_for_tree_sitter_languages():
     assert parser_for(None) == "line"
 
 
+def test_detect_config_and_iac_languages():
+    assert detect_language("infra/main.tf") == "terraform"
+    assert detect_language("infra/prod.tfvars") == "terraform"
+    assert detect_language("infra/policy.hcl") == "hcl"
+    assert detect_language("setup.cfg") == "ini"
+    assert detect_language("app/settings.ini") == "ini"
+    assert detect_language("app.conf") == "ini"
+    assert detect_language("gradle.properties") == "ini"
+
+
+def test_detect_language_by_filename():
+    # Dockerfile/Makefile carry identity in the name, not the suffix.
+    assert detect_language("Dockerfile") == "dockerfile"
+    assert detect_language("docker/Dockerfile") == "dockerfile"
+    assert detect_language("services/web.Dockerfile") == "dockerfile"
+    assert detect_language("Containerfile") == "dockerfile"
+    assert detect_language("Makefile") == "make"
+    assert detect_language("GNUmakefile") == "make"
+
+
+def test_config_and_iac_languages_stay_on_line_parser():
+    # Tier C: labeled, but FTS-only — never routed to a (missing) tree-sitter spec.
+    for lang in ("terraform", "hcl", "ini", "dockerfile", "make"):
+        assert parser_for(lang) == "line"
+
+
 def test_secret_filename_detection():
     for path in [".env", ".env.local", "secrets.pem", "id_rsa", "config/credentials.json"]:
         assert is_secret_filename(path)
diff --git a/tests/test_rerank.py b/tests/test_rerank.py
index ff4c57a..08fe1c5 100644
--- a/tests/test_rerank.py
+++ b/tests/test_rerank.py
@@ -1,4 +1,4 @@
-from codebase_index.retrieval.rerank import rerank
+from codebase_index.retrieval.rerank import _DEGREE_CAP, rerank
 from codebase_index.retrieval.types import Candidate, Intent
 
 
@@ -24,3 +24,30 @@ def test_reason_string_present():
     sym = _c("b.py", "symbol", 0.5, symbol="X", kind="function", exact_symbol=True, in_degree=4)
     out = rerank([sym], query="find X", intent=Intent.LOCATE_IMPL)
     assert out[0].reason and "exact symbol" in out[0].reason
+
+
+def test_in_degree_bonus_is_sublinear_and_capped():
+    """The graph-centrality bonus grows logarithmically and never exceeds the cap,
+    so 10x the callers is far from 10x the bonus (the old linear rule saturated by
+    in_degree=10 and gave god classes the full bonus)."""
+    scores = []
+    for deg in (1, 10, 100, 1000):
+        c = _c("x.py", "fts", 0.0, in_degree=deg)
+        rerank([c], query="zzz", intent=Intent.KEYWORD)
+        scores.append(c.score)
+    assert scores == sorted(scores)               # monotonic non-decreasing
+    assert scores[-1] <= _DEGREE_CAP + 1e-9       # capped
+    assert scores[2] < 2 * scores[1]              # 100 callers nowhere near 10x of 10
+
+
+def test_god_class_does_not_outrank_relevant_match_on_stray_term():
+    """A high-in_degree 'god class' that matches only a stray term must not float
+    above a genuinely relevant (name/path) match with a slightly lower base score.
+
+    Tuned to fail under the old linear `min(0.10, in_degree*0.01)` rule (god wins
+    0.62 > 0.60) and pass under the dampened rule (relevant wins ~0.613 > ~0.60).
+    """
+    relevant = _c("auth/religion.py", "fts", 0.48, symbol="Religion", in_degree=2)
+    god = _c("core/newtowny.py", "fts", 0.52, symbol="NewTowny", in_degree=200)
+    out = rerank([god, relevant], query="religion", intent=Intent.KEYWORD)
+    assert out[0].path == "auth/religion.py"

From 16a450dcff18dd133b7f5ad0f81b6db47d92210c Mon Sep 17 00:00:00 2001
From: denfry <aseraw115@gmail.com>
Date: Sun, 14 Jun 2026 12:37:42 +0300
Subject: [PATCH 2/3] feat(mcp): schema_version+tool envelope, golden
 snapshots, fix import on mcp>=1.27

Roadmap chunk B (MCP hardening).

- wrap every tool payload (success and error) in {schema_version: 1, tool: <name>};
  closes the docs/MCP.md follow-ups and makes the ARCHITECTURE.md schema_version
  claim true.
- golden snapshots for all 7 tools (tests/golden/mcp_*.json + test_mcp_golden.py);
  schema_version/tool asserted explicitly so a golden can't freeze a wrong version.
- fix: MCP server failed to import on mcp>=1.27 + pydantic>=2.10 (FastMCP built a
  structured-output schema from the `-> str` return annotation and raised). Register
  tools as unstructured: pass structured_output=False where mcp.tool accepts it, and
  fall back to plain mcp.tool() on older mcp releases that lack the kwarg.
- golden_utils: mask package_version so the healthcheck golden survives version bumps.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
---
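Reviewer note (not part of the commit): a sketch of how a client might branch
on the new envelope. `read_tool_response` and `SUPPORTED_SCHEMA_VERSION` are
hypothetical names for this illustration; only the `schema_version`, `tool`,
and `error` keys come from the patch.

```python
import json

# Assumption: the client pins the contract version it was written against.
SUPPORTED_SCHEMA_VERSION = 1


def read_tool_response(raw: str, expected_tool: str) -> dict:
    """Validate the envelope, then return the payload without envelope keys."""
    data = json.loads(raw)
    version = data.get("schema_version")
    if version != SUPPORTED_SCHEMA_VERSION:
        raise ValueError(f"unsupported schema_version: {version!r}")
    if data.get("tool") != expected_tool:
        raise ValueError(f"expected tool {expected_tool!r}, got {data.get('tool')!r}")
    # The no-index / error path carries the same envelope plus an "error" field.
    if "error" in data:
        raise RuntimeError(data["error"])
    return {k: v for k, v in data.items() if k not in ("schema_version", "tool")}


ok_raw = json.dumps({"schema_version": 1, "tool": "search_code", "results": []})
payload = read_tool_response(ok_raw, "search_code")
```

Checking the version before anything else means a future breaking change fails
loudly instead of being misread as an empty result.
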
 docs/ARCHITECTURE.md               |   3 +-
 docs/MCP.md                        |  35 +++++--
 src/codebase_index/mcp/server.py   |  74 ++++++++++-----
 tests/golden/mcp_explain_code.json |  23 +++++
 tests/golden/mcp_find_refs.json    |  34 +++++++
 tests/golden/mcp_find_symbol.json  |  23 +++++
 tests/golden/mcp_healthcheck.json  |  14 +++
 tests/golden/mcp_impact_of.json    |  40 ++++++++
 tests/golden/mcp_index_stats.json  |  23 +++++
 tests/golden/mcp_search_code.json  | 145 +++++++++++++++++++++++++++++
 tests/golden_utils.py              |   5 +
 tests/test_mcp_golden.py           |  94 +++++++++++++++++++
 tests/test_mcp_server.py           |  21 +++++
 13 files changed, 503 insertions(+), 31 deletions(-)
 create mode 100644 tests/golden/mcp_explain_code.json
 create mode 100644 tests/golden/mcp_find_refs.json
 create mode 100644 tests/golden/mcp_find_symbol.json
 create mode 100644 tests/golden/mcp_healthcheck.json
 create mode 100644 tests/golden/mcp_impact_of.json
 create mode 100644 tests/golden/mcp_index_stats.json
 create mode 100644 tests/golden/mcp_search_code.json
 create mode 100644 tests/test_mcp_golden.py

diff --git a/docs/ARCHITECTURE.md b/docs/ARCHITECTURE.md
index 2215440..28d098a 100644
--- a/docs/ARCHITECTURE.md
+++ b/docs/ARCHITECTURE.md
@@ -199,7 +199,8 @@ Current implementation:
 - `src/codebase_index/mcp/server.py` is a thin adapter over `retrieval/`, `storage/`, and
   `indexer/freshness.py`.
 - `codebase-index mcp --root <repo>` runs the stdio server.
-- JSON payloads include `schema_version`.
+- Every JSON payload (including the error path) carries a `schema_version` + `tool` envelope,
+  locked by golden snapshots (`tests/golden/mcp_*.json`).
 - [MCP.md](MCP.md) provides config templates for Claude Desktop, Claude Code, Cursor, VS Code,
   Zed, and Windsurf.
 - `healthcheck` lets MCP clients distinguish "server running", "index missing",
diff --git a/docs/MCP.md b/docs/MCP.md
index 4a64a0b..cf708e7 100644
--- a/docs/MCP.md
+++ b/docs/MCP.md
@@ -41,11 +41,14 @@ The MCP server exposes the same retrieval contract as the CLI.
 
 ## Output contract
 
-Tool responses are JSON strings returned through MCP content blocks. The
-intended stable shape for retrieval responses is:
+Tool responses are JSON strings returned through MCP content blocks. **Every**
+payload — success or error — is wrapped in a stable envelope so clients can
+branch on the contract without sniffing the shape:
 
 ```json
 {
+  "schema_version": 1,
+  "tool": "search_code",
   "index": {
     "exists": true,
     "stale": false,
@@ -53,18 +56,30 @@ intended stable shape for retrieval responses is:
     "files_changed_since_build": 0
   },
   "results": [],
-  "recommended_reads": [],
-  "warnings": []
+  "recommended_reads": []
 }
 ```
 
+- `schema_version` (int) — the payload contract version. Bumped only on a
+  breaking change (field removal or type change); additive fields keep the same
+  version. The current version is **1**.
+- `tool` (string) — the emitting tool name (`search_code`, `find_symbol`,
+  `find_refs`, `impact_of`, `explain_code`, `index_stats`, `healthcheck`).
+- The no-index / error path carries the same envelope plus an `"error"` field.
+
 Rules:
 
-- Additive fields are allowed within a tool output version.
-- Field removal or type changes should be treated as a protocol change.
+- Additive fields are allowed within a `schema_version`.
+- Field removal or type changes bump `schema_version`.
 - Tool descriptions should include examples and expected failure modes.
 - Errors should fail closed: no partial unsafe result when config or index state is unsafe.
 
+Every tool's enveloped output is locked by golden snapshots in
+`tests/golden/mcp_*.json` (regenerate intentionally with
+`UPDATE_GOLDEN=1 pytest tests/test_mcp_golden.py`), and the `schema_version` /
+`tool` values are asserted explicitly so a golden can never silently freeze a
+wrong contract version.
+
 ## Client config templates
 
 ### Claude Desktop
@@ -143,8 +158,12 @@ same trust boundaries:
 - Done: `healthcheck`, `search_code`, `find_symbol`, `find_refs`, `impact_of`, `explain_code`,
   and `index_stats` tools.
 - Done: focused tests for tool registration, missing-index behavior, config resolution, and run entrypoint.
-- Follow-up: explicit schema/version field in every structured tool payload.
-- Follow-up: golden snapshots for every tool output.
+- Done: explicit `schema_version` + `tool` envelope on every structured tool payload (including the
+  error path), asserted by `tests/test_mcp_server.py` and `tests/test_mcp_golden.py`.
+- Done: golden snapshots for every tool output (`tests/golden/mcp_*.json`).
+- Done: unstructured-output registration (`structured_output=False` where supported) so the server
+  loads on `mcp>=1.27` + `pydantic>=2.10`, where auto-detecting a structured schema from the `-> str`
+  return annotation otherwise raises at import time.
 - Follow-up: verified client-specific docs for Claude Desktop, Claude Code, Cursor, VS Code, Zed,
   and Windsurf.
 - Follow-up: paging or progressive result support.
diff --git a/src/codebase_index/mcp/server.py b/src/codebase_index/mcp/server.py
index 710917a..ae16c07 100644
--- a/src/codebase_index/mcp/server.py
+++ b/src/codebase_index/mcp/server.py
@@ -17,6 +17,7 @@
 
 from __future__ import annotations
 
+import inspect
 import json
 import os
 import sys
@@ -44,6 +45,35 @@
     ),
 )
 
+# Contract version for every structured tool payload. Bump on a breaking change
+# (field removal / type change); additive fields keep the same version. Every tool
+# return — including errors — is wrapped by `_emit`, so clients can branch on
+# `schema_version` and `tool` without sniffing the shape. See docs/MCP.md.
+MCP_SCHEMA_VERSION = 1
+
+
+def _emit(tool: str, payload: dict) -> str:
+    """Serialize a tool payload inside the stable MCP envelope.
+
+    `schema_version` and `tool` lead so the raw JSON is self-describing; the
+    payload follows. Payloads must not carry these two keys (none do today): a
+    duplicate would override the envelope, since later dict-literal keys win.
+    """
+    return json.dumps({"schema_version": MCP_SCHEMA_VERSION, "tool": tool, **payload})
+
+
+# Tools return JSON *strings* (unstructured text). Newer FastMCP otherwise
+# auto-builds a structured-output schema from the `-> str` return annotation,
+# which crashes on some mcp/pydantic combinations (mcp>=1.27 + pydantic>=2.10).
+# Force unstructured output where the kwarg exists; older mcp (>=1.0) lacks it.
+_SUPPORTS_STRUCTURED_OUTPUT = "structured_output" in inspect.signature(mcp.tool).parameters
+
+
+def _tool():
+    if _SUPPORTS_STRUCTURED_OUTPUT:
+        return mcp.tool(structured_output=False)
+    return mcp.tool()
+
 
 def _resolve_db() -> tuple[Path, "Config"]:
     """Return (db_path, config). Respects CBX_DB_PATH and CBX_ROOT env vars."""
@@ -60,11 +90,11 @@ def _search_backend(cfg: "Config"):
     return search_backend(cfg, warn=lambda m: print(m, file=sys.stderr))
 
 
-def _no_index_error() -> str:
-    return json.dumps({"error": "No index found. Run `codebase-index index` in your project first."})
+def _no_index_payload() -> dict:
+    return {"error": "No index found. Run `codebase-index index` in your project first."}
 
 
-@mcp.tool()
+@_tool()
 def healthcheck() -> str:
     """Report package, root, and index health for MCP clients."""
     db_path, cfg = _resolve_db()
@@ -83,10 +113,10 @@ def healthcheck() -> str:
                 "path": str(db_path),
                 **compute_freshness(db.conn, Path(cfg.root), cfg).model_dump(),
             }
-    return json.dumps(payload)
+    return _emit("healthcheck", payload)
 
 
-@mcp.tool()
+@_tool()
 def search_code(
     query: str,
     mode: str = "hybrid",
@@ -112,7 +142,7 @@ def search_code(
     """
     db_path, cfg = _resolve_db()
     if not db_path.exists():
-        return _no_index_error()
+        return _emit("search_code", _no_index_payload())
 
     from ..service import search_payload
 
@@ -120,10 +150,10 @@ def search_code(
         db_path, cfg, query, mode=mode, limit=limit, offset=offset,
         token_budget=token_budget, no_fallback=False, backend=_search_backend(cfg),
     )
-    return json.dumps(payload)
+    return _emit("search_code", payload)
 
 
-@mcp.tool()
+@_tool()
 def find_symbol(
     name: str,
     kind: Optional[str] = None,
@@ -140,17 +170,17 @@ def find_symbol(
     """
     db_path, _ = _resolve_db()
     if not db_path.exists():
-        return _no_index_error()
+        return _emit("find_symbol", _no_index_payload())
 
     from ..retrieval.searchers import symbol_lookup
     from ..storage.db import Database
 
     with Database(db_path) as db:
         resp = symbol_lookup(db.conn, name, kind=kind, exact=exact)
-    return json.dumps(resp.model_dump())
+    return _emit("find_symbol", resp.model_dump())
 
 
-@mcp.tool()
+@_tool()
 def find_refs(
     symbol: str,
     kind: str = "all",
@@ -165,17 +195,17 @@ def find_refs(
     """
     db_path, _ = _resolve_db()
     if not db_path.exists():
-        return _no_index_error()
+        return _emit("find_refs", _no_index_payload())
 
     from ..retrieval.searchers import refs_lookup
     from ..storage.db import Database
 
     with Database(db_path) as db:
         resp = refs_lookup(db.conn, symbol, kind=kind)
-    return json.dumps(resp.model_dump())
+    return _emit("find_refs", resp.model_dump())
 
 
-@mcp.tool()
+@_tool()
 def impact_of(
     target: str,
     depth: int = 2,
@@ -192,17 +222,17 @@ def impact_of(
     """
     db_path, _ = _resolve_db()
     if not db_path.exists():
-        return _no_index_error()
+        return _emit("impact_of", _no_index_payload())
 
     from ..graph.expand import impact_lookup
     from ..storage.db import Database
 
     with Database(db_path) as db:
         resp = impact_lookup(db.conn, target, depth=depth, direction=direction)
-    return json.dumps(resp.model_dump())
+    return _emit("impact_of", resp.model_dump())
 
 
-@mcp.tool()
+@_tool()
 def explain_code(
     query: str,
     token_budget: int = 2200,
@@ -221,7 +251,7 @@ def explain_code(
     """
     db_path, cfg = _resolve_db()
     if not db_path.exists():
-        return _no_index_error()
+        return _emit("explain_code", _no_index_payload())
 
     from ..service import normalize_explain_query, search_payload
 
@@ -230,22 +260,22 @@ def explain_code(
         offset=offset, token_budget=token_budget, no_fallback=False,
         backend=_search_backend(cfg),
     )
-    return json.dumps(payload)
+    return _emit("explain_code", payload)
 
 
-@mcp.tool()
+@_tool()
 def index_stats() -> str:
     """Return index freshness, file count, symbol count, and per-language coverage."""
     db_path, _ = _resolve_db()
     if not db_path.exists():
-        return json.dumps({"exists": False, "error": "No index found."})
+        return _emit("index_stats", {"exists": False, "error": "No index found."})
 
     from ..service import stats_payload
     from ..storage.db import Database
 
     with Database(db_path) as db:
         payload = stats_payload(db.conn)
-    return json.dumps(payload)
+    return _emit("index_stats", payload)
 
 
 def run() -> None:
diff --git a/tests/golden/mcp_explain_code.json b/tests/golden/mcp_explain_code.json
new file mode 100644
index 0000000..74eebe9
--- /dev/null
+++ b/tests/golden/mcp_explain_code.json
@@ -0,0 +1,23 @@
+{
+  "confidence": "low",
+  "fallback_suggestions": {
+    "ripgrep": [
+      "rg -n \"authentication\"",
+      "rg -n \"how.*does.*authentication\""
+    ]
+  },
+  "index": {
+    "built_at": "<TS>",
+    "exists": true,
+    "files_changed_since_build": 0,
+    "head_commit": "<SHA>",
+    "stale": false
+  },
+  "intent": "how_it_works",
+  "mode": "hybrid",
+  "query": "how does authentication work",
+  "recommended_reads": [],
+  "results": [],
+  "schema_version": 1,
+  "tool": "explain_code"
+}
diff --git a/tests/golden/mcp_find_refs.json b/tests/golden/mcp_find_refs.json
new file mode 100644
index 0000000..abdd727
--- /dev/null
+++ b/tests/golden/mcp_find_refs.json
@@ -0,0 +1,34 @@
+{
+  "coverage": {
+    "languages": [],
+    "partial": false,
+    "reason": null
+  },
+  "index": {
+    "built_at": "<TS>",
+    "exists": true,
+    "files_changed_since_build": 0,
+    "head_commit": "<SHA>",
+    "stale": false
+  },
+  "query": "refresh_access_token",
+  "schema_version": 1,
+  "sites": [
+    {
+      "kind": "call",
+      "line": 11,
+      "path": "src/api/service.py"
+    },
+    {
+      "kind": "definition",
+      "line": 4,
+      "path": "src/auth/token.py"
+    },
+    {
+      "kind": "call",
+      "line": 11,
+      "path": "src/auth/token.py"
+    }
+  ],
+  "tool": "find_refs"
+}
diff --git a/tests/golden/mcp_find_symbol.json b/tests/golden/mcp_find_symbol.json
new file mode 100644
index 0000000..697368d
--- /dev/null
+++ b/tests/golden/mcp_find_symbol.json
@@ -0,0 +1,23 @@
+{
+  "index": {
+    "built_at": "<TS>",
+    "exists": true,
+    "files_changed_since_build": 0,
+    "head_commit": "<SHA>",
+    "stale": false
+  },
+  "query": "User",
+  "schema_version": 1,
+  "symbols": [
+    {
+      "kind": "class",
+      "line_end": 6,
+      "line_start": 4,
+      "name": "User",
+      "path": "src/models/user.py",
+      "qualified": "User",
+      "signature": "class User:"
+    }
+  ],
+  "tool": "find_symbol"
+}
diff --git a/tests/golden/mcp_healthcheck.json b/tests/golden/mcp_healthcheck.json
new file mode 100644
index 0000000..565a111
--- /dev/null
+++ b/tests/golden/mcp_healthcheck.json
@@ -0,0 +1,14 @@
+{
+  "index": {
+    "built_at": "<TS>",
+    "exists": true,
+    "files_changed_since_build": 0,
+    "head_commit": "<SHA>",
+    "path": ".claude/cache/codebase-index/index.sqlite",
+    "stale": false
+  },
+  "package_version": "<VERSION>",
+  "root": "",
+  "schema_version": 1,
+  "tool": "healthcheck"
+}
diff --git a/tests/golden/mcp_impact_of.json b/tests/golden/mcp_impact_of.json
new file mode 100644
index 0000000..5fc14dc
--- /dev/null
+++ b/tests/golden/mcp_impact_of.json
@@ -0,0 +1,40 @@
+{
+  "coverage": {
+    "languages": [],
+    "partial": false,
+    "reason": null
+  },
+  "depth": 2,
+  "direction": "up",
+  "files": [
+    "src/api/service.py"
+  ],
+  "index": {
+    "built_at": "<TS>",
+    "exists": true,
+    "files_changed_since_build": 0,
+    "head_commit": "<SHA>",
+    "stale": false
+  },
+  "nodes": [
+    {
+      "distance": 1,
+      "kind": "file",
+      "line_start": null,
+      "name": null,
+      "path": "src/api/service.py",
+      "via_edge": "import"
+    },
+    {
+      "distance": 1,
+      "kind": "symbol",
+      "line_start": 7,
+      "name": "AdminUser",
+      "path": "src/api/service.py",
+      "via_edge": "extends"
+    }
+  ],
+  "schema_version": 1,
+  "target": "src/models/user.py",
+  "tool": "impact_of"
+}
diff --git a/tests/golden/mcp_index_stats.json b/tests/golden/mcp_index_stats.json
new file mode 100644
index 0000000..d8dcf27
--- /dev/null
+++ b/tests/golden/mcp_index_stats.json
@@ -0,0 +1,23 @@
+{
+  "built_at": "<TS>",
+  "exists": true,
+  "files": 6,
+  "head_commit": "<SHA>",
+  "schema_version": 1,
+  "symbols": 7,
+  "tool": "index_stats",
+  "treesitter_coverage": [
+    {
+      "files": 3,
+      "graph": "full",
+      "lang": "python",
+      "symbols": 6
+    },
+    {
+      "files": 2,
+      "graph": "full",
+      "lang": "typescript",
+      "symbols": 1
+    }
+  ]
+}
diff --git a/tests/golden/mcp_search_code.json b/tests/golden/mcp_search_code.json
new file mode 100644
index 0000000..7f81862
--- /dev/null
+++ b/tests/golden/mcp_search_code.json
@@ -0,0 +1,145 @@
+{
+  "confidence": "high",
+  "fallback_suggestions": {},
+  "index": {
+    "built_at": "<TS>",
+    "exists": true,
+    "files_changed_since_build": 0,
+    "head_commit": "<SHA>",
+    "stale": false
+  },
+  "intent": "keyword",
+  "mode": "hybrid",
+  "query": "token",
+  "recommended_reads": [
+    {
+      "line_end": 6,
+      "line_start": 4,
+      "path": "src/auth/token.py"
+    },
+    {
+      "line_end": 5,
+      "line_start": 4,
+      "path": "src/auth/token.py"
+    },
+    {
+      "line_end": 11,
+      "line_start": 9,
+      "path": "src/auth/token.py"
+    },
+    {
+      "line_end": 2,
+      "line_start": 1,
+      "path": "src/auth/token.py"
+    },
+    {
+      "line_end": 10,
+      "line_start": 9,
+      "path": "src/auth/token.py"
+    },
+    {
+      "line_end": 1,
+      "line_start": 1,
+      "path": "src/auth/token.py"
+    },
+    {
+      "line_end": 5,
+      "line_start": 1,
+      "path": "src/api/service.py"
+    }
+  ],
+  "results": [
+    {
+      "line_end": 6,
+      "line_start": 4,
+      "path": "src/auth/token.py",
+      "rank": 1,
+      "reason": "in src/auth/ · 2 callers",
+      "score": 0.1596,
+      "snippet": "def refresh_access_token(refresh_token: str) -> str:",
+      "symbols": [
+        "refresh_access_token"
+      ],
+      "token_est": 13
+    },
+    {
+      "line_end": 5,
+      "line_start": 4,
+      "path": "src/auth/token.py",
+      "rank": 2,
+      "reason": "in src/auth/",
+      "score": 0.0664,
+      "snippet": "Exchange a refresh token for a new access token.",
+      "symbols": [],
+      "token_est": 12
+    },
+    {
+      "line_end": 11,
+      "line_start": 9,
+      "path": "src/auth/token.py",
+      "rank": 3,
+      "reason": "in src/auth/",
+      "score": 0.0661,
+      "snippet": "def login(refresh_token: str) -> str:\n    \"\"\"Calls refresh_access_token so refs/impact tests have an edge.\"\"\"\n    return refresh_access_token(refresh_token)",
+      "symbols": [],
+      "token_est": 39
+    },
+    {
+      "line_end": 2,
+      "line_start": 1,
+      "path": "src/auth/token.py",
+      "rank": 4,
+      "reason": "in src/auth/",
+      "score": 0.0659,
+      "snippet": "\"\"\"Token helpers (fixture).\"\"\"\n",
+      "symbols": [],
+      "token_est": 8
+    },
+    {
+      "line_end": 10,
+      "line_start": 9,
+      "path": "src/auth/token.py",
+      "rank": 5,
+      "reason": "in src/auth/",
+      "score": 0.0652,
+      "snippet": "Calls refresh_access_token so refs/impact tests have an edge.",
+      "symbols": [],
+      "token_est": 15
+    },
+    {
+      "line_end": 1,
+      "line_start": 1,
+      "path": "src/auth/token.py",
+      "rank": 6,
+      "reason": "in src/auth/",
+      "score": 0.0583,
+      "snippet": null,
+      "symbols": [],
+      "token_est": 0
+    },
+    {
+      "line_end": 11,
+      "line_start": 7,
+      "path": "src/api/service.py",
+      "rank": 7,
+      "reason": "fts",
+      "score": 0.0156,
+      "snippet": "class AdminUser(User):\n    \"\"\"Subclass of User; imported-from edge target for impact tests.\"\"\"\n\n    def renew(self, refresh_token: str) -> str:\n        return refresh_access_token(refresh_token)",
+      "symbols": [],
+      "token_est": 48
+    },
+    {
+      "line_end": 5,
+      "line_start": 1,
+      "path": "src/api/service.py",
+      "rank": 8,
+      "reason": "fts",
+      "score": 0.0154,
+      "snippet": "\"\"\"Service layer (fixture) - exercises cross-file edges for impact tests.\"\"\"\n\nfrom auth.token import refresh_access_token\nfrom models.user import User\n",
+      "symbols": [],
+      "token_est": 38
+    }
+  ],
+  "schema_version": 1,
+  "tool": "search_code"
+}
diff --git a/tests/golden_utils.py b/tests/golden_utils.py
index e8b02a2..ced4803 100644
--- a/tests/golden_utils.py
+++ b/tests/golden_utils.py
@@ -20,6 +20,9 @@
 # Keys whose values are inherently volatile and must be masked, not compared.
 _TS_KEYS = {"built_at", "indexed_at", "generated_at"}
 _SHA_KEYS = {"head_commit"}
+# Released package version churns on every bump; mask it so goldens don't. Note
+# `schema_version` is deliberately NOT masked — it IS the contract under test.
+_VERSION_KEYS = {"package_version"}
 
 
 def _scrub(value: Any, root: str) -> Any:
@@ -30,6 +33,8 @@ def _scrub(value: Any, root: str) -> Any:
                 out[k] = "<TS>"
             elif k in _SHA_KEYS and v is not None:
                 out[k] = "<SHA>"
+            elif k in _VERSION_KEYS and v is not None:
+                out[k] = "<VERSION>"
             else:
                 out[k] = _scrub(v, root)
         return out
diff --git a/tests/test_mcp_golden.py b/tests/test_mcp_golden.py
new file mode 100644
index 0000000..20b4da8
--- /dev/null
+++ b/tests/test_mcp_golden.py
@@ -0,0 +1,94 @@
+"""Golden-snapshot tests for every MCP tool output (requires the mcp extra).
+
+Mirrors tests/test_cli_golden.py but drives the MCP server tool functions, which
+wrap each payload in the stable envelope (schema_version + tool). The MCP tools
+resolve the index from CBX_ROOT / CBX_DB_PATH, so we build a real index from the
+shared sample_repo fixture and point the env at it.
+
+Regenerate intentionally with:  UPDATE_GOLDEN=1 pytest tests/test_mcp_golden.py
+"""
+from __future__ import annotations
+
+import json as _json
+import os
+import subprocess
+from unittest.mock import patch
+
+import pytest
+from typer.testing import CliRunner
+
+try:
+    from codebase_index.mcp import server as mcp_server
+    MCP_AVAILABLE = True
+except ImportError:
+    MCP_AVAILABLE = False
+
+pytestmark = pytest.mark.skipif(not MCP_AVAILABLE, reason="mcp extra not installed")
+
+from codebase_index.cli import app  # noqa: E402  (after the skip guard)
+from tests.golden_utils import assert_matches_golden  # noqa: E402
+
+runner = CliRunner()
+
+
+@pytest.fixture(scope="module")
+def indexed_repo(tmp_path_factory):
+    """A copy of sample_repo with a freshly built index, isolated from the source tree."""
+    import shutil
+
+    from tests.conftest import FIXTURE_ROOT
+
+    dest = tmp_path_factory.mktemp("mcp_indexed") / "repo"
+    shutil.copytree(FIXTURE_ROOT, dest)
+
+    identity = ["-c", "user.name=golden", "-c", "user.email=golden@test"]
+    subprocess.run(["git", "init"], cwd=dest, capture_output=True)
+    subprocess.run(["git", "add", "."], cwd=dest, capture_output=True)
+    commit = subprocess.run(
+        ["git", *identity, "commit", "-m", "initial"], cwd=dest, capture_output=True, text=True
+    )
+    assert commit.returncode == 0, commit.stderr
+    assert runner.invoke(app, ["--root", str(dest), "index"]).exit_code == 0
+    return dest
+
+
+def _call(indexed_repo, tool_fn, **kwargs):
+    """Invoke an MCP tool against the indexed fixture and parse the JSON envelope."""
+    db_path = indexed_repo / ".claude" / "cache" / "codebase-index" / "index.sqlite"
+    env = {"CBX_ROOT": str(indexed_repo), "CBX_DB_PATH": str(db_path)}
+    with patch.dict(os.environ, env, clear=False):
+        return _json.loads(tool_fn(**kwargs))
+
+
+# name -> (lazy tool-function getter, kwargs). Queries mirror the CLI goldens for parity.
+CASES = {
+    "mcp_healthcheck": (lambda: mcp_server.healthcheck, {}),
+    "mcp_search_code": (lambda: mcp_server.search_code, {"query": "token"}),
+    "mcp_find_symbol": (lambda: mcp_server.find_symbol, {"name": "User"}),
+    "mcp_find_refs": (lambda: mcp_server.find_refs, {"symbol": "refresh_access_token"}),
+    "mcp_impact_of": (lambda: mcp_server.impact_of, {"target": "src/models/user.py", "direction": "up"}),
+    "mcp_explain_code": (lambda: mcp_server.explain_code, {"query": "how does authentication work"}),
+    "mcp_index_stats": (lambda: mcp_server.index_stats, {}),
+}
+
+# golden name -> the tool field every envelope must carry.
+_EXPECTED_TOOL = {
+    "mcp_healthcheck": "healthcheck",
+    "mcp_search_code": "search_code",
+    "mcp_find_symbol": "find_symbol",
+    "mcp_find_refs": "find_refs",
+    "mcp_impact_of": "impact_of",
+    "mcp_explain_code": "explain_code",
+    "mcp_index_stats": "index_stats",
+}
+
+
+@pytest.mark.parametrize("name", list(CASES), ids=list(CASES))
+def test_mcp_tool_matches_golden(indexed_repo, name):
+    fn_factory, kwargs = CASES[name]
+    payload = _call(indexed_repo, fn_factory(), **kwargs)
+    # The contract values are asserted explicitly — a golden alone would happily
+    # freeze a wrong schema_version. The golden then guards the rest of the shape.
+    assert payload["schema_version"] == mcp_server.MCP_SCHEMA_VERSION
+    assert payload["tool"] == _EXPECTED_TOOL[name]
+    assert_matches_golden(name, payload, root=str(indexed_repo))
diff --git a/tests/test_mcp_server.py b/tests/test_mcp_server.py
index c76f91b..063e4c1 100644
--- a/tests/test_mcp_server.py
+++ b/tests/test_mcp_server.py
@@ -60,6 +60,27 @@ def test_search_code_no_index():
     assert "index" in result["error"].lower()
 
 
+# ── schema envelope (schema_version + tool on every payload, incl. errors) ─────
+
+_ENVELOPE_CALLS = {
+    "healthcheck": lambda: _call(mcp_server.healthcheck),
+    "search_code": lambda: _call(mcp_server.search_code, query="foo"),
+    "find_symbol": lambda: _call(mcp_server.find_symbol, name="Foo"),
+    "find_refs": lambda: _call(mcp_server.find_refs, symbol="foo"),
+    "impact_of": lambda: _call(mcp_server.impact_of, target="foo.py"),
+    "explain_code": lambda: _call(mcp_server.explain_code, query="how does foo work"),
+    "index_stats": lambda: _call(mcp_server.index_stats),
+}
+
+
+@pytest.mark.parametrize("tool,call", list(_ENVELOPE_CALLS.items()), ids=list(_ENVELOPE_CALLS))
+def test_every_tool_payload_carries_schema_envelope(tool, call):
+    """Even the no-index error path is wrapped in the stable envelope."""
+    result = _with_missing_db(call)
+    assert result["schema_version"] == mcp_server.MCP_SCHEMA_VERSION
+    assert result["tool"] == tool
+
+
 def test_healthcheck_no_index():
     result = _with_missing_db(lambda: _call(mcp_server.healthcheck))
     assert result["package_version"]

From 44e03d4a987ec7eb1f64304291cebc91b4859892 Mon Sep 17 00:00:00 2001
From: denfry <aseraw115@gmail.com>
Date: Sun, 14 Jun 2026 12:37:54 +0300
Subject: [PATCH 3/3] docs: sync roadmap with shipped MCP, add trust-model
 callout, changelog

Roadmap chunk A (roadmap sync + docs).

- docs/ROADMAP.md: M10 MCP bridge marked shipped (was "planned"); reconcile the
  technical vs product milestone numbering instead of claiming one is canonical.
- "Trust model in 60 seconds" callout in README.md and docs/SECURITY.md (same six
  points; the SECURITY.md copy adds section references).
- PRODUCT_UPGRADE_PLAN.md: mark shipped items (schema_version, golden snapshots,
  in_degree dampening, config/IaC labeling, trust-model doc).
- CHANGELOG.md: Unreleased entries for chunks A/B/C and the MCP import fix.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
---
 CHANGELOG.md                 | 35 +++++++++++++++++++++++++++++++++++
 README.md                    |  8 ++++++++
 docs/PRODUCT_UPGRADE_PLAN.md | 32 ++++++++++++++++++++------------
 docs/ROADMAP.md              | 23 ++++++++++++++++-------
 docs/SECURITY.md             | 10 ++++++++++
 5 files changed, 89 insertions(+), 19 deletions(-)
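
Reviewer note (patch notes area, not part of the commit): the envelope that the
CHANGELOG entry in this patch describes can be sketched roughly as follows. This is
a minimal illustration under stated assumptions, not the server's code: `emit` is a
hypothetical stand-in for the `_emit` helper called in patch 2/3, the constant's
value is taken from the goldens (`"schema_version": 1`), and the choice to let the
envelope keys win over same-named payload keys is an assumption.

```python
import json
from typing import Any

# Assumed value: every golden in tests/golden/mcp_*.json shows "schema_version": 1.
MCP_SCHEMA_VERSION = 1


def emit(tool: str, payload: dict[str, Any]) -> str:
    """Hypothetical stand-in for the server's `_emit` helper."""
    # Envelope keys are merged last so a payload cannot override them (assumption).
    enveloped = {**payload, "schema_version": MCP_SCHEMA_VERSION, "tool": tool}
    return json.dumps(enveloped)


# Success and no-index payloads both end up carrying the same two contract fields.
ok = json.loads(emit("index_stats", {"exists": True, "files": 6}))
err = json.loads(emit("index_stats", {"exists": False, "error": "No index found."}))
```

Keeping the wrapper in one helper is what lets a single parametrized test assert
`schema_version` and `tool` for every tool, including the error path.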

diff --git a/CHANGELOG.md b/CHANGELOG.md
index 58a6aa9..54c9e6e 100644
--- a/CHANGELOG.md
+++ b/CHANGELOG.md
@@ -18,8 +18,37 @@ All notable changes to this project are documented here. The format is based on
 - **`docs/RELEASE_CHECKLIST.md`**: a repeatable release checklist (version sync,
   tests, benchmarks, doctor, install/plugin/MCP smoke, changelog) with signed
   checksums + SBOM tracked as future hardening.
+- **MCP contract hardening (M11.5)**: every MCP tool payload — success *and* the
+  no-index/error path — is now wrapped in a stable envelope (`schema_version`: 1,
+  `tool`: `<name>`). Golden snapshots lock every tool's output
+  (`tests/golden/mcp_*.json` via `tests/test_mcp_golden.py`), and the contract
+  values are asserted explicitly so a golden can't freeze a wrong version. Closes
+  the long-standing `docs/MCP.md` follow-ups and makes the `schema_version` claim
+  in `docs/ARCHITECTURE.md` §8 true.
+- **Config / IaC language labeling**: Dockerfile, Containerfile, `*.tf`/`*.tfvars`
+  (terraform), `*.hcl`, `*.ini`/`*.cfg`/`*.conf`/`*.properties` (ini), and
+  Makefiles now get a real language label. These files were already FTS-indexed as
+  unknown text; labeling surfaces infra files in `stats` and lets agents scope
+  searches to config. They stay on the line/FTS floor (no tree-sitter spec).
+- **Typed framework edges — design doc**
+  (`docs/superpowers/specs/2026-06-14-typed-framework-edges-design.md`): the
+  documented-first deliverable for the M13 code-intelligence graph
+  (route→handler→service→model, test→impl, config→consumer, …) with a schema,
+  confidence/provenance model, resolver architecture, and a benchmark gate.
+- **"Trust model in 60 seconds"** callout in `README.md` and `docs/SECURITY.md`
+  (same six points; the `SECURITY.md` copy adds section cross-references).
 
 ### Changed
+- **Reranker: dampened the god-class `in_degree` tiebreak** (`retrieval/rerank.py`).
+  The graph-centrality bonus is now logarithmic with a lower cap instead of linear
+  (which saturated by in_degree 10, giving 100-caller "god classes" the full bonus
+  and floating them above genuinely relevant low-degree matches on stray-term ties).
+  Validated as no-regression on the public benchmark (Recall@k / MRR / nDCG
+  unchanged) with a targeted regression test; the real-repo gain on the honest Java
+  misses is tracked under M12.5. CLI/MCP `search` goldens regenerated accordingly.
+- **`docs/ROADMAP.md`**: M10 MCP bridge marked shipped (was "planned"); reconciled
+  the technical-vs-product milestone numbering instead of claiming one is canonical.
+
 - **README**: added "Who Is It For?" and a "How Is This Different?" section that
   answers why-not-grep / Cursor / Aider repo-map / Sourcegraph / Codebase-Memory
   MCP on the first screen, plus a proven-today-vs-roadmap table.
@@ -30,6 +59,12 @@ All notable changes to this project are documented here. The format is based on
   TODO-friendly benchmark task checklist with a no-overclaim procedure.
 
 ### Fixed
+- **MCP server failed to import on `mcp>=1.27` + `pydantic>=2.10`**: newer FastMCP
+  auto-built a structured-output schema from each tool's `-> str` return annotation
+  and raised `PydanticUserError` at import time, breaking the server and its test
+  suite. Tools now register as unstructured (`structured_output=False` where the
+  kwarg exists; older `mcp` is detected and unaffected), preserving the existing
+  text-content wire contract.
 - `docs/FAQ.md`: removed a dangling/duplicated sentence in "Is it
   production-ready?" and documented the real `clean` / `clean --all` behavior.
 
diff --git a/README.md b/README.md
index 6b01fbd..3f8a346 100644
--- a/README.md
+++ b/README.md
@@ -429,6 +429,14 @@ Answer with precise file:line citations
 
 ## Safety and Privacy
 
+> **Trust model in 60 seconds**
+> 1. **Offline by default** — the base install has zero network dependencies; nothing leaves your machine.
+> 2. **One opt-in exit, triple-gated** — external embeddings require `allow_external` **and** an env API key **and** a printed endpoint warning, or they are refused.
+> 3. **Secrets never get in** — `.env`, keys, certs, and credential files are excluded before parsing (multi-gate ignore pipeline).
+> 4. **Secrets never get out** — every snippet is redacted (AWS keys, private keys, JWTs, bearer tokens, connection strings) before it reaches the agent.
+> 5. **No telemetry, ever** — no analytics, no phone-home, no usage data.
+> 6. **Verify it yourself** — `codebase-index doctor --strict` audits all of the above and exits non-zero in CI on any high-severity finding.
+
 `codebase-index` is designed with privacy as a first principle:
 
 - **No telemetry** — No usage data, analytics, or crash reports are collected or transmitted.
diff --git a/docs/PRODUCT_UPGRADE_PLAN.md b/docs/PRODUCT_UPGRADE_PLAN.md
index 16be998..cab7433 100644
--- a/docs/PRODUCT_UPGRADE_PLAN.md
+++ b/docs/PRODUCT_UPGRADE_PLAN.md
@@ -89,7 +89,7 @@ transparent Python implementation, a strict privacy model, and honest benchmarks
 | Weakness | Impact | Plan |
 |---|---|---|
 | No large-scale real-repo benchmark | Can't claim 100k/1M LOC quality | Benchmark tasks §8; recruit public repos |
-| Graph is import/call/ref only | `impact` misses framework wiring | ARCHITECTURE §9 typed-edge roadmap |
+| Graph is import/call/ref only | `impact` misses framework wiring | ARCHITECTURE §9 + design doc `specs/2026-06-14-typed-framework-edges-design.md`; implementation behind §8 benchmark |
 | GitHub-only distribution | No `pip install codebase-index` / `uvx` | Distribution tasks §9 |
 | MCP client docs unverified | Templates may be wrong per client version | Verify against each client, add per-client docs |
 | Single-repo only | No monorepo/fleet context | Out of scope near-term; documented as non-goal |
@@ -101,12 +101,15 @@ transparent Python implementation, a strict privacy model, and honest benchmarks
    logs. Highest credibility lever.
 2. **Typed framework edges** (route→handler→service→model, test→impl, config→consumer)
    with source spans + confidence. Biggest product-quality lever for `impact`.
+   *Design approved this pass* (`specs/2026-06-14-typed-framework-edges-design.md`);
+   implementation gated on the §8 graph benchmark.
 3. **Distribution hardening**: PyPI publish, `uvx`/`pipx` story, signed checksums,
    SBOM. Lowers adoption friction and raises supply-chain trust.
-4. **MCP contract hardening**: `schema_version` on every payload, golden
-   snapshots per tool, verified client docs, paging/progressive results.
-5. **Retrieval tuning**: dampen the god-class `in_degree` tiebreak (the 3 honest
-   misses in the Java run), per-intent weights review.
+4. **MCP contract hardening**: ✅ `schema_version` on every payload + golden
+   snapshots per tool (this pass). Remaining: verified client docs, paging/progressive results.
+5. **Retrieval tuning**: ✅ dampened the god-class `in_degree` tiebreak this pass
+   (log curve + lower cap, validated no-regression on the public suite). Remaining:
+   confirm the real-repo gain on the 3 honest Java misses (needs M12.5), per-intent weights review.
 6. **Language reach**: config/IaC awareness (Dockerfile, Terraform, migrations,
    CI), plus Swift/Dart/Scala/Vue/Svelte gaps called out in FAQ.
 
@@ -119,7 +122,7 @@ transparent Python implementation, a strict privacy model, and honest benchmarks
 - [x] `docs/BENCHMARKS.md` "claims not to make yet" + TODO benchmark checklist.
 - [x] `docs/RELEASE_CHECKLIST.md`.
 - [ ] Verified per-client MCP setup docs (after testing each client version).
-- [ ] A short "trust model in 60 seconds" callout reused across README/SECURITY.
+- [x] A short "trust model in 60 seconds" callout reused across README/SECURITY.
 
 ## 8. Benchmark tasks
 
@@ -150,14 +153,19 @@ Track in [BENCHMARKS.md](BENCHMARKS.md); none may be reported until run with log
 
 | # | Improvement | Impact | Risk | Status |
 |---|---|---|---|---|
-| 1 | Implement `clean` (documented but was a stub) | Fixes doc/reality gap | Low | **Shipped this pass** |
-| 2 | Dampen god-class `in_degree` tiebreak in rerank | +recall on real repos | Medium (retune) | Planned |
-| 3 | `schema_version` on every MCP payload | Stable contract | Low | Partly (architecture claims it) — verify+test |
-| 4 | Golden snapshots for each MCP tool output | Regression safety | Low | Planned |
-| 5 | Typed framework edges in the graph | Better `impact` | High | Roadmap (ARCHITECTURE §9) |
-| 6 | Config/IaC parsers (Dockerfile, Terraform, migrations) | Coverage | Medium | Roadmap |
+| 1 | Implement `clean` (documented but was a stub) | Fixes doc/reality gap | Low | **Shipped (1.3.0 line)** |
+| 2 | Dampen god-class `in_degree` tiebreak in rerank | +recall on real repos | Medium (retune) | **Shipped this pass** — log dampening + lower cap; no-regression on the public suite + a targeted regression test. Real-repo gain still needs M12.5. |
+| 3 | `schema_version` on every MCP payload | Stable contract | Low | **Shipped this pass** — `schema_version` + `tool` envelope on every payload (incl. errors), asserted + golden-locked. |
+| 4 | Golden snapshots for each MCP tool output | Regression safety | Low | **Shipped this pass** — `tests/golden/mcp_*.json` via `tests/test_mcp_golden.py`. |
+| 5 | Typed framework edges in the graph | Better `impact` | High | Design doc shipped this pass (`docs/superpowers/specs/2026-06-14-typed-framework-edges-design.md`); implementation behind the §8 benchmark. |
+| 6 | Config/IaC parsers (Dockerfile, Terraform, migrations) | Coverage | Medium | **Partly shipped this pass** — Tier-C labeling for Dockerfile/Terraform/HCL/INI/Make (already FTS-indexed, now language-labeled); tree-sitter parsing of these still roadmap. |
 | 7 | Paging/progressive MCP results | Big-repo UX | Medium | Roadmap (MCP.md) |
 
+Also fixed this pass (not previously tracked): the MCP server failed to import on
+`mcp>=1.27` + `pydantic>=2.10` (FastMCP auto-built a structured-output schema from
+the `-> str` return annotation and raised `PydanticUserError` at import time). Tools
+now register as unstructured (`structured_output=False` where supported), so the server loads on current `mcp`.
+
 Rule for this repo: small, safe, tested changes land directly; anything that
 risks destabilizing retrieval quality or the security model is documented here
 first and lands behind a benchmark.
diff --git a/docs/ROADMAP.md b/docs/ROADMAP.md
index 6015393..2001288 100644
--- a/docs/ROADMAP.md
+++ b/docs/ROADMAP.md
@@ -1,8 +1,13 @@
 # Roadmap & First Implementation Tasks
 
 Milestones are vertical-ish slices: each ends with something runnable and testable.
-This numbering is canonical — the product-level [ROADMAP.md](../ROADMAP.md) and the
-`(Mx)` tags in [CHANGELOG.md](../CHANGELOG.md) follow it.
+This is the **technical-milestone** view (M0–M10). The product-level
+[ROADMAP.md](../ROADMAP.md) tells the same story at a finer grain and carries it
+further (it splits the MCP server into M11 and adds M11.5/M12/M12.5/M13 for MCP
+hardening, benchmarks, and the typed-edge graph). Where the two disagree on a
+number, the product roadmap is the current product view; this file tracks the
+original implementation slices. The `(Mx)` tags in
+[CHANGELOG.md](../CHANGELOG.md) follow this technical numbering.
 
 ## M0 — Architecture & scaffold ✅ (this repo)
 - Repo tree, docs (ARCHITECTURE/RETRIEVAL/SCHEMA/SECURITY/INSTALLATION), SKILL.md draft.
@@ -77,11 +82,15 @@ release with the built artifacts (GitHub-only distribution — no PyPI publish).
 "git+https://github.com/denfry/codebase-index.git@v1.2.0"` -> `init` -> `index` -> ask a question is
 verified end-to-end by `scripts/release_smoke.py`.*
 
-## M10 — Optional MCP bridge (planned)
-- Model Context Protocol server exposing `search`, `symbol`, `refs`, `impact` as tools for
-  MCP-compatible clients (Claude Desktop, Cursor, etc.). An optional addition, not a replacement
-  for the Skill/CLI interface.
-- **Exit:** `codebase-index` can be used as an MCP tool by any MCP-compatible client.
+## M10 — MCP bridge ✅ (product roadmap M11)
+- Shipped: a stdio Model Context Protocol server (`codebase-index mcp --root <repo>`, or the
+  `codebase-index-mcp` entry point) exposing `healthcheck`, `search_code`, `find_symbol`,
+  `find_refs`, `impact_of`, `explain_code`, and `index_stats` over the same `service.py` layer the
+  CLI uses — an optional addition, not a replacement for the Skill/CLI interface. Every payload
+  carries a `schema_version` + `tool` envelope, locked by golden snapshots (`tests/golden/mcp_*.json`).
+- **Exit:** `codebase-index` can be used as an MCP tool by any MCP-compatible client. See
+  [MCP.md](MCP.md).
+- Follow-up (product roadmap M11.5): verified per-client setup docs and paging/progressive results.
 
 ---
 
diff --git a/docs/SECURITY.md b/docs/SECURITY.md
index dbb5762..be4c959 100644
--- a/docs/SECURITY.md
+++ b/docs/SECURITY.md
@@ -3,6 +3,16 @@
 `codebase-index` is **local-first and offline by default**. Its threat model assumes the indexed
 repository may contain secrets and that a skill must not exfiltrate code or run dangerous commands.
 
+> **Trust model in 60 seconds**
+> 1. **Offline by default** — the base install has zero network dependencies; nothing leaves your machine (§1, §4).
+> 2. **One opt-in exit, triple-gated** — external embeddings require `allow_external` **and** an env API key **and** a printed endpoint warning, or they are refused (§4).
+> 3. **Secrets never get in** — `.env`, keys, certs, and credential files are excluded before parsing (§2).
+> 4. **Secrets never get out** — every snippet is redacted before it reaches the agent (§3).
+> 5. **No telemetry, ever** — no analytics, no phone-home, no usage data.
+> 6. **Verify it yourself** — `codebase-index doctor --strict` audits all of the above and gates CI (§6).
+>
+> The same callout appears in the README so the trust story is identical wherever a reader lands.
+
 ## 1. Principles
 
 1. **Local-first** — index, query, and storage all happen on the user's machine.