From 05b557a4dffbeb7134dad55801ac35d7b4a7ee74 Mon Sep 17 00:00:00 2001
From: denfry
Date: Sun, 14 Jun 2026 12:37:25 +0300
Subject: [PATCH 1/3] feat(retrieval,discovery): dampen in_degree tiebreak + label config/IaC

Roadmap chunk C (large features, landed behind the benchmark gate).

- rerank: replace the linear in_degree bonus (saturated by in_degree=10, gave
  100-caller "god classes" the full bonus) with a logarithmic curve + lower cap,
  so centrality stays a tiebreak instead of floating god classes above relevant
  low-degree matches. Validated as no-regression on the public benchmark
  (Recall@k/MRR/nDCG unchanged) plus a targeted regression test.
- discovery/classify: label Dockerfile/Containerfile, Terraform (.tf/.tfvars),
  HCL, INI (.ini/.cfg/.conf/.properties) and Makefiles (Tier-C, FTS-only).
  These were already FTS-indexed as unknown text; labeling surfaces them in
  stats and lets agents scope searches to config.
- docs: typed-framework-edges design spec (M13 documented-first deliverable);
  LANGUAGES.md Tier-C row updated.
- regenerate tests/golden/search_token.json (one score shifted; order unchanged).
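For reviewers, here is a standalone sketch of just the in_degree term, comparing the old linear rule with the dampened one using the constants from the `rerank.py` hunk in this patch (`0.03` scale, `0.08` cap). The helper names `old_bonus`, `new_bonus` and `saturation_degree` are illustrative and not part of the module; the rest of `rerank()` (path and symbol bonuses, base scores) is deliberately left out:

```python
import math

# Constants copied from the rerank.py hunk; only the in_degree term is modelled.
OLD_RATE, OLD_CAP = 0.01, 0.10
DEGREE_SCALE, DEGREE_CAP = 0.03, 0.08


def old_bonus(in_degree: int) -> float:
    """Previous linear rule: min(0.10, in_degree * 0.01)."""
    return min(OLD_CAP, in_degree * OLD_RATE)


def new_bonus(in_degree: int) -> float:
    """Dampened rule: min(0.08, log1p(in_degree) * 0.03)."""
    return min(DEGREE_CAP, math.log1p(in_degree) * DEGREE_SCALE)


def saturation_degree() -> int:
    """Smallest in_degree at which the dampened bonus reaches its cap."""
    degree = 0
    while new_bonus(degree) < DEGREE_CAP:
        degree += 1
    return degree


if __name__ == "__main__":
    for deg in (1, 2, 4, 10, 100, 1000):
        print(f"in_degree={deg:>4}  old={old_bonus(deg):.4f}  new={new_bonus(deg):.4f}")
    print("dampened rule saturates at in_degree =", saturation_degree())
```

Under these assumptions two callers earn about 0.033 instead of 0.020, in line with the 0.1467 → 0.1596 shift in the regenerated `search_token.json` golden, and the cap is reached at in_degree 14 rather than 10, with a lower ceiling, so a 100-caller class no longer gains a full 0.10.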
Co-Authored-By: Claude Opus 4.8 (1M context)
---
 docs/LANGUAGES.md                             |   8 +-
 ...2026-06-14-typed-framework-edges-design.md | 136 ++++++++++++++++++
 src/codebase_index/discovery/classify.py      |  34 ++++-
 src/codebase_index/retrieval/rerank.py        |  14 +-
 tests/golden/search_token.json                |   2 +-
 tests/test_classify.py                        |  26 ++++
 tests/test_rerank.py                          |  29 +++-
 7 files changed, 242 insertions(+), 7 deletions(-)
 create mode 100644 docs/superpowers/specs/2026-06-14-typed-framework-edges-design.md

diff --git a/docs/LANGUAGES.md b/docs/LANGUAGES.md
index e8cd5e4..cb1a7c2 100644
--- a/docs/LANGUAGES.md
+++ b/docs/LANGUAGES.md
@@ -6,7 +6,7 @@
 |---|---|---|
 | Tier A | Language-specific Tree-sitter `LangSpec` with definition, call, and import/inheritance patterns | Python, JavaScript, TypeScript, Java, Go, Rust, C, C++, C#, Ruby, PHP, Kotlin |
 | Tier B | Generic Tree-sitter path when a loadable grammar exists, without language-specific graph semantics | Lua |
-| Tier C | Line chunks + FTS5 lexical search only | Markdown, JSON, YAML, TOML, SQL and other text/config files |
+| Tier C | Line chunks + FTS5 lexical search only | Markdown, JSON, YAML, TOML, SQL; config/IaC: Dockerfile, Terraform (`.tf`/`.tfvars`), HCL, INI (`.ini`/`.cfg`/`.conf`/`.properties`), Makefiles; and other text/config files |
 
 Tier A is the only tier that should be advertised as symbol-aware.
 Tier B can surface useful definitions, but it is intentionally weaker and should be called
@@ -45,7 +45,11 @@ High-priority code languages:
 - Objective-C
 - Vue and Svelte component structure
 
-High-priority non-code and framework-aware extraction:
+High-priority non-code and framework-aware extraction (config/IaC files are now
+**Tier-C labeled** — indexed, language-tagged, and FTS-searchable; the items below
+are the deeper *structured* extraction still on the roadmap, and the framework
+graph part is designed in
+`docs/superpowers/specs/2026-06-14-typed-framework-edges-design.md`):
 
 - SQL schema-aware parsing: tables, columns, migrations, model/query consumers
 - Terraform/HCL: resources, modules, variables, outputs
diff --git a/docs/superpowers/specs/2026-06-14-typed-framework-edges-design.md b/docs/superpowers/specs/2026-06-14-typed-framework-edges-design.md
new file mode 100644
index 0000000..9f0e086
--- /dev/null
+++ b/docs/superpowers/specs/2026-06-14-typed-framework-edges-design.md
@@ -0,0 +1,136 @@
+# Typed Framework Edges — Design
+
+> **Status:** Design draft (2026-06-14). Implementation NOT started.
+> **Author:** denfry
+> **Milestone:** Product roadmap **M13 — Code intelligence graph** (extends ARCHITECTURE.md §9).
+> **Why a design doc first:** `PRODUCT_UPGRADE_PLAN.md` §10 marks typed edges *High risk* and the
+> repo rule is "anything that risks destabilizing retrieval quality or the security model is
+> documented here first and lands behind a benchmark." This is that document.
+
+## 1. Problem
+
+The graph today has four edge types — `import`, `call`, `reference`, `extends`/`implements`
+(see `storage/schema.sql` `edges.edge_type`). They power `refs`, bounded `impact`, and the rerank
+centrality bonus. They do **not** capture how modern frameworks actually wire code together, so
+`impact "routes/users.ts"` misses the handler→service→repository→model chain a human would trace.
+
+ARCHITECTURE.md §9 lists the target typed edges:
+
+- HTTP route → handler → service → repository → model
+- test → fixture → implementation
+- interface/trait → implementation (partially covered today by `implements`)
+- config key → consumer
+- migration → model → query
+- event producer → event consumer
+- DI container / framework wiring
+- frontend component → hook/store/api client
+- error string / log message → throw site → handler
+
+## 2. Why the current edge mechanism is not enough
+
+The existing capture-prefix mechanism (`treesitter._EDGE_PREFIXES`, e.g. `@import.module`) emits an
+edge whose target is an **identifier captured from the AST**, resolved later by symbol name or
+module path (`graph/builder.resolve_edges`). That works for imports and inheritance because the
+target *is* a named symbol/module in the same repo.
+
+Framework edges break this assumption in two ways:
+
+1. **The link is a string literal, not a symbol.** `@app.get("/users/{id}")` ties a URL pattern to a
+   handler function. There is no `"/users/{id}"` symbol to resolve to. The edge is really
+   *"this function is the handler for this route"* — an attribute of the handler plus a
+   route-string key that only matches another route-string elsewhere (e.g. a client `fetch`).
+2. **Resolution is heuristic and framework-specific.** "service" is a naming/DI convention, not a
+   language construct. Precision varies by framework, so edges need **confidence** and
+   **provenance** so agents can treat a Spring `@Autowired` edge differently from a guessed
+   `*Service` name match.
+
+A naive "add a `route` prefix" would emit unresolvable edges that pollute `impact`. We need a new,
+explicitly-typed, confidence-bearing edge path.
+
+## 3. Schema changes
+
+Extend `edges` (additive — bumps `storage/db.SCHEMA_VERSION`, with a migration that backfills
+defaults so old indexes rebuild rather than guess):
+
+```sql
+ALTER TABLE edges ADD COLUMN confidence REAL NOT NULL DEFAULT 1.0; -- 0..1; 1.0 = exact AST/import
+ALTER TABLE edges ADD COLUMN resolver TEXT; -- provenance, e.g. 'fastapi.decorator'
+ALTER TABLE edges ADD COLUMN dst_key TEXT; -- non-symbol join key (route string, event name, config key)
+CREATE INDEX idx_edges_dstkey ON edges(dst_key);
+```
+
+New `edge_type` values (open-ended TEXT, no enum migration needed): `route`, `handler`,
+`test_target`, `config_consumer`, `migration_model`, `event`, `di_wire`, `component_dep`,
+`log_site`. Existing four edge types keep `confidence = 1.0` and `resolver = NULL`, so all current
+behavior is byte-for-byte unchanged.
+
+`dst_key` is the join column for string-keyed edges: a `route` edge from a handler and a
+`component_dep` edge from a client both carry `dst_key = "GET /users/{id}"`; the builder pairs
+producers and consumers by `dst_key` instead of by symbol name.
+
+## 4. Resolver architecture
+
+A new `parsers/frameworks/` package, each module a `FrameworkResolver`:
+
+```python
+class FrameworkResolver(Protocol):
+    name: str  # provenance string, e.g. "fastapi"
+    def detects(self, file: FileMeta, imports: list[str]) -> bool: ...
+    def edges(self, tree, symbols, source) -> list[TypedEdge]: ...  # carries confidence + resolver
+```
+
+Detection is import-gated (only run the FastAPI resolver on files importing `fastapi`/`starlette`),
+so cost is proportional to relevant files and an unrecognized stack adds nothing. First resolvers,
+chosen for coverage-per-effort:
+
+| Resolver | Edge(s) | Confidence basis |
+|---|---|---|
+| `fastapi` / `flask` | `route` (decorator → handler) | 1.0 (explicit decorator) |
+| `express` | `route` (`app.get(path, handler)`) | 0.9 (handler ref may be inline) |
+| `pytest` | `test_target` (test → impl by import + name) | 0.7 (name heuristic) |
+| `spring` | `di_wire` (`@Autowired`/constructor) | 0.95 |
+
+Each resolver is independently testable against a fixture file and contributes a labeled row to the
+graph benchmark (§6). New frameworks are added without touching the core — same spirit as the
+Tier-A `LangSpec` registry.
+
+## 5. Surfacing & honesty
+
+- `impact` / `refs` responses gain a per-edge `confidence` + `resolver` and group results as
+  **precise** (≥0.9) vs **heuristic** (<0.9), mirroring the existing `GraphCoverage.partial`
+  honesty signal — agents trust precise edges and treat heuristic ones as leads, not proof.
+- `stats` reports which framework resolvers fired and how many typed edges each produced.
+- `doctor` notes when typed edges exist but no resolver matched the repo's stack (so a missing
+  resolver is visible, not silent).
+- Rerank: typed edges are **excluded** from the `in_degree` centrality bonus initially (they would
+  re-introduce the god-class skew this release just dampened); revisit only behind the benchmark.
+
+## 6. Benchmark gate (required before merge)
+
+`tests/benchmark_public.py` already has a `graph_tasks` section (`route→handler→service` is an
+explicit TODO in `PRODUCT_UPGRADE_PLAN.md` §8). Before any resolver lands:
+
+1. Add hand-labeled framework-graph cases (route→handler→service→model paths) to the public fixture
+   and to a real multi-framework repo case.
+2. Report `graph_tasks.pass_rate` **before/after**, plus retrieval `recall@k`/`MRR`/`nDCG` to prove
+   no retrieval regression (the gate this release's rerank change passed).
+3. Publish raw logs next to the headline number (the §8 "no-overclaim procedure").
+
+A resolver merges only if it raises graph pass-rate **without** lowering retrieval metrics.
+
+## 7. Phasing
+
+- **Phase 1** — schema columns + migration + the `route`/`handler` pair for one Python framework
+  (FastAPI), `dst_key` pairing, confidence/provenance plumbing, benchmark cases. Smallest
+  end-to-end vertical slice.
+- **Phase 2** — Express + Flask routes; `test_target`; surface precise/heuristic split in
+  `impact`/`refs`.
+- **Phase 3** — `config_consumer`, `migration_model`, `event`, `di_wire`; per-resolver `stats`.
+- **Phase 4** — frontend `component_dep`, `log_site`; rerank integration (behind the benchmark).
+
+## 8. Non-goals
+
+- Not a type checker or a full call-graph resolver across dynamic dispatch.
+- Not cross-repo / monorepo graph (single-repo remains the product boundary).
+- No network or LLM-assisted resolution — resolvers stay static, local, and deterministic so the
+  privacy model (SECURITY.md) is untouched.
diff --git a/src/codebase_index/discovery/classify.py b/src/codebase_index/discovery/classify.py
index a075281..f41be35 100644
--- a/src/codebase_index/discovery/classify.py
+++ b/src/codebase_index/discovery/classify.py
@@ -36,6 +36,26 @@
     ".yaml": "yaml",
     ".toml": "toml",
     ".sql": "sql",
+    # Config / IaC (Tier C: line-chunk + FTS, no tree-sitter spec). These were already
+    # indexed as unknown-language text; labeling them surfaces infra files in `stats`
+    # and lets agents scope searches to config without a tree-sitter grammar.
+    ".tf": "terraform",
+    ".tfvars": "terraform",
+    ".hcl": "hcl",
+    ".ini": "ini",
+    ".cfg": "ini",
+    ".conf": "ini",
+    ".properties": "ini",
+}
+
+# Extension-less or specially-named config/IaC files, matched on the lowercased
+# filename (and a `name.suffix` form, e.g. `web.Dockerfile`). Kept separate from
+# the suffix table because these carry their identity in the name, not the suffix.
+_LANG_BY_NAME = {
+    "dockerfile": "dockerfile",
+    "containerfile": "dockerfile",
+    "makefile": "make",
+    "gnumakefile": "make",
 }
 
 # Authoritative set of *code* languages routed to tree-sitter (Guardrail 1). Every entry MUST
@@ -74,7 +94,19 @@
 
 
 def detect_language(path: str) -> Optional[str]:
-    return _LANG_BY_SUFFIX.get(PurePosixPath(path).suffix.lower())
+    pure = PurePosixPath(path)
+    suffix = pure.suffix.lower()
+    if suffix:
+        lang = _LANG_BY_SUFFIX.get(suffix)
+        if lang is not None:
+            return lang
+    name = pure.name.lower()
+    if name in _LANG_BY_NAME:
+        return _LANG_BY_NAME[name]
+    # `web.Dockerfile`, `base.dockerfile`, etc.: identity is the suffix-as-name.
+    if suffix and suffix[1:] in _LANG_BY_NAME:
+        return _LANG_BY_NAME[suffix[1:]]
+    return None
 
 
 def parser_for(lang: Optional[str]) -> str:
diff --git a/src/codebase_index/retrieval/rerank.py b/src/codebase_index/retrieval/rerank.py
index 7a63746..551c485 100644
--- a/src/codebase_index/retrieval/rerank.py
+++ b/src/codebase_index/retrieval/rerank.py
@@ -7,12 +7,22 @@
 
 from __future__ import annotations
 
+import math
 import re
 
 from .types import Candidate, Intent
 
 _TERM_RE = re.compile(r"[A-Za-z0-9_]+")
 
+# Graph-centrality bonus. Logarithmic (not linear) so a "god class" with hundreds
+# of callers cannot dominate a genuinely relevant low-degree match on a stray-term
+# tie. log1p compresses the tail — in_degree 4 → 10 → 100 yields a gently rising,
+# capped bonus instead of saturating the cap by in_degree 10 — and the lower cap
+# keeps centrality a tiebreak rather than an override. This dampens the god-class
+# over-ranking documented in tests/benchmark_honest_RESULTS.md.
+_DEGREE_SCALE = 0.03
+_DEGREE_CAP = 0.08
+
 
 def rerank(candidates: list[Candidate], *, query: str, intent: Intent) -> list[Candidate]:
     terms = {t.lower() for t in _TERM_RE.findall(query)}
@@ -33,10 +43,10 @@ def rerank(candidates: list[Candidate], *, query: str, intent: Intent) -> list[C
             reasons.append(f"in {c.path.rsplit('/', 1)[0] or '.'}/")
 
         if c.in_degree:
-            bonus += min(0.10, c.in_degree * 0.01)
+            bonus += min(_DEGREE_CAP, math.log1p(c.in_degree) * _DEGREE_SCALE)
             reasons.append(f"{c.in_degree} callers")
         if intent is Intent.ARCHITECTURE and (c.in_degree + c.out_degree):
-            bonus += min(0.10, (c.in_degree + c.out_degree) * 0.005)
+            bonus += min(_DEGREE_CAP, math.log1p(c.in_degree + c.out_degree) * (_DEGREE_SCALE / 2))
 
         wants_tests = "test" in terms or "tests" in terms
         if c.is_generated or (("test" in c.path.lower()) and not wants_tests):
diff --git a/tests/golden/search_token.json b/tests/golden/search_token.json
index 3822398..732f6c4 100644
--- a/tests/golden/search_token.json
+++ b/tests/golden/search_token.json
@@ -55,7 +55,7 @@
       "path": "src/auth/token.py",
       "rank": 1,
       "reason": "in src/auth/ · 2 callers",
-      "score": 0.1467,
+      "score": 0.1596,
       "snippet": "def refresh_access_token(refresh_token: str) -> str:",
       "symbols": [
         "refresh_access_token"
diff --git a/tests/test_classify.py b/tests/test_classify.py
index 3ce76f0..87aa43c 100644
--- a/tests/test_classify.py
+++ b/tests/test_classify.py
@@ -25,6 +25,32 @@ def test_parser_for_tree_sitter_languages():
     assert parser_for(None) == "line"
 
 
+def test_detect_config_and_iac_languages():
+    assert detect_language("infra/main.tf") == "terraform"
+    assert detect_language("infra/prod.tfvars") == "terraform"
+    assert detect_language("infra/policy.hcl") == "hcl"
+    assert detect_language("setup.cfg") == "ini"
+    assert detect_language("app/settings.ini") == "ini"
+    assert detect_language("app.conf") == "ini"
+    assert detect_language("gradle.properties") == "ini"
+
+
+def test_detect_language_by_filename():
+    # Dockerfile/Makefile carry identity in the name, not the suffix.
+    assert detect_language("Dockerfile") == "dockerfile"
+    assert detect_language("docker/Dockerfile") == "dockerfile"
+    assert detect_language("services/web.Dockerfile") == "dockerfile"
+    assert detect_language("Containerfile") == "dockerfile"
+    assert detect_language("Makefile") == "make"
+    assert detect_language("GNUmakefile") == "make"
+
+
+def test_config_and_iac_languages_stay_on_line_parser():
+    # Tier C: labeled, but FTS-only — never routed to a (missing) tree-sitter spec.
+    for lang in ("terraform", "hcl", "ini", "dockerfile", "make"):
+        assert parser_for(lang) == "line"
+
+
 def test_secret_filename_detection():
     for path in [".env", ".env.local", "secrets.pem", "id_rsa", "config/credentials.json"]:
         assert is_secret_filename(path)
diff --git a/tests/test_rerank.py b/tests/test_rerank.py
index ff4c57a..08fe1c5 100644
--- a/tests/test_rerank.py
+++ b/tests/test_rerank.py
@@ -1,4 +1,4 @@
-from codebase_index.retrieval.rerank import rerank
+from codebase_index.retrieval.rerank import _DEGREE_CAP, rerank
 from codebase_index.retrieval.types import Candidate, Intent
 
 
@@ -24,3 +24,30 @@ def test_reason_string_present():
     sym = _c("b.py", "symbol", 0.5, symbol="X", kind="function", exact_symbol=True, in_degree=4)
     out = rerank([sym], query="find X", intent=Intent.LOCATE_IMPL)
     assert out[0].reason and "exact symbol" in out[0].reason
+
+
+def test_in_degree_bonus_is_sublinear_and_capped():
+    """The graph-centrality bonus grows logarithmically and never exceeds the cap,
+    so 10x the callers is far from 10x the bonus (the old linear rule saturated by
+    in_degree=10 and gave god classes the full bonus)."""
+    scores = []
+    for deg in (1, 10, 100, 1000):
+        c = _c("x.py", "fts", 0.0, in_degree=deg)
+        rerank([c], query="zzz", intent=Intent.KEYWORD)
+        scores.append(c.score)
+    assert scores == sorted(scores)  # monotonic non-decreasing
+    assert scores[-1] <= _DEGREE_CAP + 1e-9  # capped
+    assert scores[2] < 2 * scores[1]  # 100 callers nowhere near 10x of 10
+
+
+def test_god_class_does_not_outrank_relevant_match_on_stray_term():
+    """A high-in_degree 'god class' that matches only a stray term must not float
+    above a genuinely relevant (name/path) match with a slightly lower base score.
+
+    Tuned to fail under the old linear `min(0.10, in_degree*0.01)` rule (god wins
+    0.62 > 0.60) and pass under the dampened rule (relevant wins ~0.613 > ~0.60).
+    """
+    relevant = _c("auth/religion.py", "fts", 0.48, symbol="Religion", in_degree=2)
+    god = _c("core/newtowny.py", "fts", 0.52, symbol="NewTowny", in_degree=200)
+    out = rerank([god, relevant], query="religion", intent=Intent.KEYWORD)
+    assert out[0].path == "auth/religion.py"
From 16a450dcff18dd133b7f5ad0f81b6db47d92210c Mon Sep 17 00:00:00 2001
From: denfry
Date: Sun, 14 Jun 2026 12:37:42 +0300
Subject: [PATCH 2/3] feat(mcp): schema_version+tool envelope, golden snapshots, fix import on mcp>=1.27

Roadmap chunk B (MCP hardening).

- wrap every tool payload (success and error) in {schema_version: 1, tool: };
  closes the docs/MCP.md follow-ups and makes the ARCHITECTURE.md
  schema_version claim true.
- golden snapshots for all 7 tools (tests/golden/mcp_*.json + test_mcp_golden.py);
  schema_version/tool asserted explicitly so a golden can't freeze a wrong version.
- fix: MCP server failed to import on mcp>=1.27 + pydantic>=2.10 (FastMCP built
  a structured-output schema from the `-> str` return annotation and raised).
  Register tools as unstructured (structured_output=False where supported;
  older mcp detected).
- golden_utils: mask package_version so the healthcheck golden survives version bumps.
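A minimal client-side sketch of what the envelope buys: the consumer checks `schema_version` and `tool` before touching the payload, and treats an `error` key as the no-index path. `emit` mirrors the `_emit` helper in the `server.py` hunk below; the key-collision guard, `parse_tool_response` and its exception choices are hypothetical and not part of the package:

```python
import json
from typing import Any

# Mirrors MCP_SCHEMA_VERSION in this patch; a client pins the version it understands.
SUPPORTED_SCHEMA_VERSION = 1


def emit(tool: str, payload: dict[str, Any]) -> str:
    """Sketch of the server-side envelope: envelope keys first, payload after."""
    # Illustrative guard, not in the patch: `**payload` is unpacked last, so a
    # payload carrying these keys would silently override the envelope.
    if "schema_version" in payload or "tool" in payload:
        raise ValueError("payload must not contain envelope keys")
    return json.dumps({"schema_version": SUPPORTED_SCHEMA_VERSION, "tool": tool, **payload})


def parse_tool_response(raw: str, expected_tool: str) -> dict[str, Any]:
    """Hypothetical client helper: branch on the envelope, not on the payload shape."""
    data = json.loads(raw)
    version = data.get("schema_version")
    if version != SUPPORTED_SCHEMA_VERSION:
        # A breaking change bumps the version, so refuse rather than guess.
        raise RuntimeError(f"unsupported schema_version {version!r}")
    if data.get("tool") != expected_tool:
        raise RuntimeError(f"expected tool {expected_tool!r}, got {data.get('tool')!r}")
    if "error" in data:
        # The no-index path carries the same envelope plus an "error" field.
        raise LookupError(data["error"])
    return data


if __name__ == "__main__":
    ok = parse_tool_response(emit("find_symbol", {"query": "User", "symbols": []}), "find_symbol")
    print(ok["tool"], ok["symbols"])
```

A client written this way fails loudly on a future `schema_version: 2` instead of misreading a reshaped payload, which is the behaviour the docs/MCP.md rules (additive fields keep the version, removals bump it) are meant to enable.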
Co-Authored-By: Claude Opus 4.8 (1M context)
---
 docs/ARCHITECTURE.md               |   3 +-
 docs/MCP.md                        |  35 +++++--
 src/codebase_index/mcp/server.py   |  74 ++++++++++-----
 tests/golden/mcp_explain_code.json |  23 +++++
 tests/golden/mcp_find_refs.json    |  34 +++++++
 tests/golden/mcp_find_symbol.json  |  23 +++++
 tests/golden/mcp_healthcheck.json  |  14 +++
 tests/golden/mcp_impact_of.json    |  40 ++++++++
 tests/golden/mcp_index_stats.json  |  23 +++++
 tests/golden/mcp_search_code.json  | 145 +++++++++++++++++++++++++++++
 tests/golden_utils.py              |   5 +
 tests/test_mcp_golden.py           |  94 +++++++++++++++++++
 tests/test_mcp_server.py           |  21 +++++
 13 files changed, 503 insertions(+), 31 deletions(-)
 create mode 100644 tests/golden/mcp_explain_code.json
 create mode 100644 tests/golden/mcp_find_refs.json
 create mode 100644 tests/golden/mcp_find_symbol.json
 create mode 100644 tests/golden/mcp_healthcheck.json
 create mode 100644 tests/golden/mcp_impact_of.json
 create mode 100644 tests/golden/mcp_index_stats.json
 create mode 100644 tests/golden/mcp_search_code.json
 create mode 100644 tests/test_mcp_golden.py

diff --git a/docs/ARCHITECTURE.md b/docs/ARCHITECTURE.md
index 2215440..28d098a 100644
--- a/docs/ARCHITECTURE.md
+++ b/docs/ARCHITECTURE.md
@@ -199,7 +199,8 @@ Current implementation:
 - `src/codebase_index/mcp/server.py` is a thin adapter over `retrieval/`, `storage/`, and `indexer/freshness.py`.
 - `codebase-index mcp --root ` runs the stdio server.
-- JSON payloads include `schema_version`.
+- Every JSON payload (including the error path) carries a `schema_version` + `tool` envelope,
+  locked by golden snapshots (`tests/golden/mcp_*.json`).
 - [MCP.md](MCP.md) provides config templates for Claude Desktop, Claude Code, Cursor, VS Code, Zed, and Windsurf.
 - `healthcheck` lets MCP clients distinguish "server running", "index missing",
diff --git a/docs/MCP.md b/docs/MCP.md
index 4a64a0b..cf708e7 100644
--- a/docs/MCP.md
+++ b/docs/MCP.md
@@ -41,11 +41,14 @@ The MCP server exposes the same retrieval contract as the CLI.
 
 ## Output contract
 
-Tool responses are JSON strings returned through MCP content blocks. The
-intended stable shape for retrieval responses is:
+Tool responses are JSON strings returned through MCP content blocks. **Every**
+payload — success or error — is wrapped in a stable envelope so clients can
+branch on the contract without sniffing the shape:
 
 ```json
 {
+  "schema_version": 1,
+  "tool": "search_code",
   "index": {
     "exists": true,
     "stale": false,
@@ -53,18 +56,30 @@ intended stable shape for retrieval responses is:
     "files_changed_since_build": 0
   },
   "results": [],
-  "recommended_reads": [],
-  "warnings": []
+  "recommended_reads": []
 }
 ```
 
+- `schema_version` (int) — the payload contract version. Bumped only on a
+  breaking change (field removal or type change); additive fields keep the same
+  version. The current version is **1**.
+- `tool` (string) — the emitting tool name (`search_code`, `find_symbol`,
+  `find_refs`, `impact_of`, `explain_code`, `index_stats`, `healthcheck`).
+- The no-index / error path carries the same envelope plus an `"error"` field.
+
 Rules:
 
-- Additive fields are allowed within a tool output version.
-- Field removal or type changes should be treated as a protocol change.
+- Additive fields are allowed within a `schema_version`.
+- Field removal or type changes bump `schema_version`.
 - Tool descriptions should include examples and expected failure modes.
 - Errors should fail closed: no partial unsafe result when config or index state is unsafe.
+Every tool's enveloped output is locked by golden snapshots in
+`tests/golden/mcp_*.json` (regenerate intentionally with
+`UPDATE_GOLDEN=1 pytest tests/test_mcp_golden.py`), and the `schema_version` /
+`tool` values are asserted explicitly so a golden can never silently freeze a
+wrong contract version.
+
 ## Client config templates
 
 ### Claude Desktop
@@ -143,8 +158,12 @@ same trust boundaries:
 - Done: `healthcheck`, `search_code`, `find_symbol`, `find_refs`, `impact_of`, `explain_code`, and `index_stats` tools.
 - Done: focused tests for tool registration, missing-index behavior, config resolution, and run entrypoint.
-- Follow-up: explicit schema/version field in every structured tool payload.
-- Follow-up: golden snapshots for every tool output.
+- Done: explicit `schema_version` + `tool` envelope on every structured tool payload (including the
+  error path), asserted by `tests/test_mcp_server.py` and `tests/test_mcp_golden.py`.
+- Done: golden snapshots for every tool output (`tests/golden/mcp_*.json`).
+- Done: unstructured-output registration (`structured_output=False` where supported) so the server
+  loads on `mcp>=1.27` + `pydantic>=2.10`, where auto-detecting a structured schema from the `-> str`
+  return annotation otherwise raises at import time.
 - Follow-up: verified client-specific docs for Claude Desktop, Claude Code, Cursor, VS Code, Zed, and Windsurf.
 - Follow-up: paging or progressive result support.
diff --git a/src/codebase_index/mcp/server.py b/src/codebase_index/mcp/server.py
index 710917a..ae16c07 100644
--- a/src/codebase_index/mcp/server.py
+++ b/src/codebase_index/mcp/server.py
@@ -17,6 +17,7 @@
 
 from __future__ import annotations
 
+import inspect
 import json
 import os
 import sys
@@ -44,6 +45,35 @@
     ),
 )
 
+# Contract version for every structured tool payload. Bump on a breaking change
+# (field removal / type change); additive fields keep the same version. Every tool
+# return — including errors — is wrapped by `_emit`, so clients can branch on
+# `schema_version` and `tool` without sniffing the shape. See docs/MCP.md.
+MCP_SCHEMA_VERSION = 1
+
+
+def _emit(tool: str, payload: dict) -> str:
+    """Serialize a tool payload inside the stable MCP envelope.
+
+    `schema_version` and `tool` lead; the payload follows. A payload key never
+    shadows the envelope (payloads do not carry these keys), but the explicit
+    order makes the contract self-describing in the raw JSON.
+    """
+    return json.dumps({"schema_version": MCP_SCHEMA_VERSION, "tool": tool, **payload})
+
+
+# Tools return JSON *strings* (unstructured text). Newer FastMCP otherwise
+# auto-builds a structured-output schema from the `-> str` return annotation,
+# which crashes on some mcp/pydantic combinations (mcp>=1.27 + pydantic 2.10).
+# Force unstructured output where the kwarg exists; older mcp (>=1.0) lacks it.
+_SUPPORTS_STRUCTURED_OUTPUT = "structured_output" in inspect.signature(mcp.tool).parameters
+
+
+def _tool():
+    if _SUPPORTS_STRUCTURED_OUTPUT:
+        return mcp.tool(structured_output=False)
+    return mcp.tool()
+
+
 def _resolve_db() -> tuple[Path, "Config"]:
     """Return (db_path, config). Respects CBX_DB_PATH and CBX_ROOT env vars."""
@@ -60,11 +90,11 @@ def _search_backend(cfg: "Config"):
     return search_backend(cfg, warn=lambda m: print(m, file=sys.stderr))
 
 
-def _no_index_error() -> str:
-    return json.dumps({"error": "No index found. Run `codebase-index index` in your project first."})
+def _no_index_payload() -> dict:
+    return {"error": "No index found. Run `codebase-index index` in your project first."}
 
 
-@mcp.tool()
+@_tool()
 def healthcheck() -> str:
     """Report package, root, and index health for MCP clients."""
     db_path, cfg = _resolve_db()
@@ -83,10 +113,10 @@ def healthcheck() -> str:
             "path": str(db_path),
             **compute_freshness(db.conn, Path(cfg.root), cfg).model_dump(),
         }
-    return json.dumps(payload)
+    return _emit("healthcheck", payload)
 
 
-@mcp.tool()
+@_tool()
 def search_code(
     query: str,
     mode: str = "hybrid",
@@ -112,7 +142,7 @@ def search_code(
     """
     db_path, cfg = _resolve_db()
     if not db_path.exists():
-        return _no_index_error()
+        return _emit("search_code", _no_index_payload())
 
     from ..service import search_payload
 
@@ -120,10 +150,10 @@ def search_code(
         db_path, cfg, query, mode=mode, limit=limit, offset=offset,
         token_budget=token_budget, no_fallback=False, backend=_search_backend(cfg),
     )
-    return json.dumps(payload)
+    return _emit("search_code", payload)
 
 
-@mcp.tool()
+@_tool()
 def find_symbol(
     name: str,
     kind: Optional[str] = None,
@@ -140,17 +170,17 @@ def find_symbol(
     """
     db_path, _ = _resolve_db()
     if not db_path.exists():
-        return _no_index_error()
+        return _emit("find_symbol", _no_index_payload())
 
     from ..retrieval.searchers import symbol_lookup
     from ..storage.db import Database
 
     with Database(db_path) as db:
         resp = symbol_lookup(db.conn, name, kind=kind, exact=exact)
-    return json.dumps(resp.model_dump())
+    return _emit("find_symbol", resp.model_dump())
 
 
-@mcp.tool()
+@_tool()
 def find_refs(
     symbol: str,
     kind: str = "all",
@@ -165,17 +195,17 @@ def find_refs(
     """
     db_path, _ = _resolve_db()
     if not db_path.exists():
-        return _no_index_error()
+        return _emit("find_refs", _no_index_payload())
 
     from ..retrieval.searchers import refs_lookup
     from ..storage.db import Database
 
     with Database(db_path) as db:
         resp = refs_lookup(db.conn, symbol, kind=kind)
-    return json.dumps(resp.model_dump())
+    return _emit("find_refs", resp.model_dump())
 
 
-@mcp.tool()
+@_tool()
 def impact_of(
     target: str,
     depth: int = 2,
@@ -192,17 +222,17 @@ def impact_of(
     """
     db_path, _ = _resolve_db()
     if not db_path.exists():
-        return _no_index_error()
+        return _emit("impact_of", _no_index_payload())
 
     from ..graph.expand import impact_lookup
     from ..storage.db import Database
 
     with Database(db_path) as db:
         resp = impact_lookup(db.conn, target, depth=depth, direction=direction)
-    return json.dumps(resp.model_dump())
+    return _emit("impact_of", resp.model_dump())
 
 
-@mcp.tool()
+@_tool()
 def explain_code(
     query: str,
     token_budget: int = 2200,
@@ -221,7 +251,7 @@ def explain_code(
     """
     db_path, cfg = _resolve_db()
     if not db_path.exists():
-        return _no_index_error()
+        return _emit("explain_code", _no_index_payload())
 
     from ..service import normalize_explain_query, search_payload
 
@@ -230,22 +260,22 @@ def explain_code(
         offset=offset, token_budget=token_budget, no_fallback=False,
         backend=_search_backend(cfg),
     )
-    return json.dumps(payload)
+    return _emit("explain_code", payload)
 
 
-@mcp.tool()
+@_tool()
 def index_stats() -> str:
     """Return index freshness, file count, symbol count, and per-language coverage."""
     db_path, _ = _resolve_db()
     if not db_path.exists():
-        return json.dumps({"exists": False, "error": "No index found."})
+        return _emit("index_stats", {"exists": False, "error": "No index found."})
 
     from ..service import stats_payload
     from ..storage.db import Database
 
     with Database(db_path) as db:
         payload = stats_payload(db.conn)
-    return json.dumps(payload)
+    return _emit("index_stats", payload)
 
 
 def run() -> None:
diff --git a/tests/golden/mcp_explain_code.json b/tests/golden/mcp_explain_code.json
new file mode 100644
index 0000000..74eebe9
--- /dev/null
+++ b/tests/golden/mcp_explain_code.json
@@ -0,0 +1,23 @@
+{
+  "confidence": "low",
+  "fallback_suggestions": {
+    "ripgrep": [
+      "rg -n \"authentication\"",
+      "rg -n \"how.*does.*authentication\""
+    ]
+  },
+  "index": {
+    "built_at": "",
+    "exists": true,
+    "files_changed_since_build": 0,
+    "head_commit": "",
+    "stale": false
+  },
+  "intent": "how_it_works",
+  "mode": "hybrid",
+  "query": "how does authentication work",
+  "recommended_reads": [],
+  "results": [],
+  "schema_version": 1,
+  "tool": "explain_code"
+}
diff --git a/tests/golden/mcp_find_refs.json b/tests/golden/mcp_find_refs.json
new file mode 100644
index 0000000..abdd727
--- /dev/null
+++ b/tests/golden/mcp_find_refs.json
@@ -0,0 +1,34 @@
+{
+  "coverage": {
+    "languages": [],
+    "partial": false,
+    "reason": null
+  },
+  "index": {
+    "built_at": "",
+    "exists": true,
+    "files_changed_since_build": 0,
+    "head_commit": "",
+    "stale": false
+  },
+  "query": "refresh_access_token",
+  "schema_version": 1,
+  "sites": [
+    {
+      "kind": "call",
+      "line": 11,
+      "path": "src/api/service.py"
+    },
+    {
+      "kind": "definition",
+      "line": 4,
+      "path": "src/auth/token.py"
+    },
+    {
+      "kind": "call",
+      "line": 11,
+      "path": "src/auth/token.py"
+    }
+  ],
+  "tool": "find_refs"
+}
diff --git a/tests/golden/mcp_find_symbol.json b/tests/golden/mcp_find_symbol.json
new file mode 100644
index 0000000..697368d
--- /dev/null
+++ b/tests/golden/mcp_find_symbol.json
@@ -0,0 +1,23 @@
+{
+  "index": {
+    "built_at": "",
+    "exists": true,
+    "files_changed_since_build": 0,
+    "head_commit": "",
+    "stale": false
+  },
+  "query": "User",
+  "schema_version": 1,
+  "symbols": [
+    {
+      "kind": "class",
+      "line_end": 6,
+      "line_start": 4,
+      "name": "User",
+      "path": "src/models/user.py",
+      "qualified": "User",
+      "signature": "class User:"
+    }
+  ],
+  "tool": "find_symbol"
+}
diff --git a/tests/golden/mcp_healthcheck.json b/tests/golden/mcp_healthcheck.json
new file mode 100644
index 0000000..565a111
--- /dev/null
+++ b/tests/golden/mcp_healthcheck.json
@@ -0,0 +1,14 @@
+{
+  "index": {
+    "built_at": "",
+    "exists": true,
+    "files_changed_since_build": 0,
+    "head_commit": "",
+    "path": ".claude/cache/codebase-index/index.sqlite",
+    "stale": false
+  },
+  "package_version": "",
+  "root": "",
+  "schema_version": 1,
+  "tool": "healthcheck"
+}
diff --git a/tests/golden/mcp_impact_of.json b/tests/golden/mcp_impact_of.json
new file mode 100644
index 0000000..5fc14dc
--- /dev/null
+++ b/tests/golden/mcp_impact_of.json
@@ -0,0 +1,40 @@
+{
+  "coverage": {
+    "languages": [],
+    "partial": false,
+    "reason": null
+  },
+  "depth": 2,
+  "direction": "up",
+  "files": [
+    "src/api/service.py"
+  ],
+  "index": {
+    "built_at": "",
+    "exists": true,
+    "files_changed_since_build": 0,
+    "head_commit": "",
+    "stale": false
+  },
+  "nodes": [
+    {
+      "distance": 1,
+      "kind": "file",
+      "line_start": null,
+      "name": null,
+      "path": "src/api/service.py",
+      "via_edge": "import"
+    },
+    {
+      "distance": 1,
+      "kind": "symbol",
+      "line_start": 7,
+      "name": "AdminUser",
+      "path": "src/api/service.py",
+      "via_edge": "extends"
+    }
+  ],
+  "schema_version": 1,
+  "target": "src/models/user.py",
+  "tool": "impact_of"
+}
diff --git a/tests/golden/mcp_index_stats.json b/tests/golden/mcp_index_stats.json
new file mode 100644
index 0000000..d8dcf27
--- /dev/null
+++ b/tests/golden/mcp_index_stats.json
@@ -0,0 +1,23 @@
+{
+  "built_at": "",
+  "exists": true,
+  "files": 6,
+  "head_commit": "",
+  "schema_version": 1,
+  "symbols": 7,
+  "tool": "index_stats",
+  "treesitter_coverage": [
+    {
+      "files": 3,
+      "graph": "full",
+      "lang": "python",
+      "symbols": 6
+    },
+    {
+      "files": 2,
+      "graph": "full",
+      "lang": "typescript",
+      "symbols": 1
+    }
+  ]
+}
diff --git a/tests/golden/mcp_search_code.json b/tests/golden/mcp_search_code.json
new file mode 100644
index 0000000..7f81862
--- /dev/null
+++ b/tests/golden/mcp_search_code.json
@@ -0,0 +1,145 @@
+{
+  "confidence": "high",
+  "fallback_suggestions": {},
+  "index": {
+    "built_at": "",
+    "exists": true,
+    "files_changed_since_build": 0,
+    "head_commit": "",
+    "stale": false
+  },
+  "intent": "keyword",
+  "mode": "hybrid",
+  "query": "token",
+  "recommended_reads": [
+    {
+      "line_end": 6,
+      "line_start": 4,
+      "path": "src/auth/token.py"
+    },
+    {
+      "line_end": 5,
+      "line_start": 4,
+      "path": "src/auth/token.py"
+    },
+    {
+      "line_end": 11,
+      "line_start": 9,
+      "path": "src/auth/token.py"
+    },
+    {
+      "line_end": 2,
+      "line_start": 1,
+      "path": "src/auth/token.py"
+    },
+    {
+      "line_end": 10,
+      "line_start": 9,
+      "path": "src/auth/token.py"
+    },
+    {
+      "line_end": 1,
+      "line_start": 1,
+      "path": "src/auth/token.py"
+    },
+    {
+      "line_end": 5,
+      "line_start": 1,
+      "path": "src/api/service.py"
+    }
+  ],
+  "results": [
+    {
+      "line_end": 6,
+      "line_start": 4,
+      "path": "src/auth/token.py",
+      "rank": 1,
+      "reason": "in src/auth/ · 2 callers",
+      "score": 0.1596,
+      "snippet": "def refresh_access_token(refresh_token: str) -> str:",
+      "symbols": [
+        "refresh_access_token"
+      ],
+      "token_est": 13
+    },
+    {
+      "line_end": 5,
+      "line_start": 4,
+      "path": "src/auth/token.py",
+      "rank": 2,
+      "reason": "in src/auth/",
+      "score": 0.0664,
+      "snippet": "Exchange a refresh token for a new access token.",
+      "symbols": [],
+      "token_est": 12
+    },
+    {
+      "line_end": 11,
+      "line_start": 9,
+      "path": "src/auth/token.py",
+      "rank": 3,
+      "reason": "in src/auth/",
+      "score": 0.0661,
+      "snippet": "def login(refresh_token: str) -> str:\n    \"\"\"Calls refresh_access_token so refs/impact tests have an edge.\"\"\"\n    return refresh_access_token(refresh_token)",
+      "symbols": [],
+      "token_est": 39
+    },
+    {
+      "line_end": 2,
+      "line_start": 1,
+      "path": "src/auth/token.py",
+      "rank": 4,
+      "reason": "in src/auth/",
+      "score": 0.0659,
+      "snippet": "\"\"\"Token helpers (fixture).\"\"\"\n",
+      "symbols": [],
+      "token_est": 8
+    },
+    {
+      "line_end": 10,
+      "line_start": 9,
+      "path": "src/auth/token.py",
+      "rank": 5,
+      "reason": "in src/auth/",
+      "score": 0.0652,
+      "snippet": "Calls refresh_access_token so refs/impact tests have an edge.",
+      "symbols": [],
+      "token_est": 15
+    },
+    {
+      "line_end": 1,
+      "line_start": 1,
+      "path": "src/auth/token.py",
+      "rank": 6,
+      "reason": "in src/auth/",
+      "score": 0.0583,
+      "snippet": null,
+      "symbols": [],
+      "token_est": 0
+    },
+    {
+      "line_end": 11,
+      "line_start": 7,
+      "path": "src/api/service.py",
+      "rank": 7,
+      "reason": "fts",
+      "score": 0.0156,
+      "snippet": "class AdminUser(User):\n    \"\"\"Subclass of User; imported-from edge target for impact tests.\"\"\"\n\n    def renew(self, refresh_token: str) -> str:\n        return refresh_access_token(refresh_token)",
+      "symbols": [],
+      "token_est": 48
+    },
+    {
+      "line_end": 5,
+      "line_start": 1,
+      "path": "src/api/service.py",
+      "rank": 8,
+      "reason": "fts",
+      "score": 0.0154,
+      "snippet": "\"\"\"Service layer (fixture) - exercises cross-file edges for impact tests.\"\"\"\n\nfrom auth.token import refresh_access_token\nfrom models.user import User\n",
+      "symbols": [],
+      "token_est": 38
+    }
+  ],
+  "schema_version": 1,
+  "tool": "search_code"
+}
diff --git a/tests/golden_utils.py b/tests/golden_utils.py
index e8b02a2..ced4803 100644
--- a/tests/golden_utils.py
+++ b/tests/golden_utils.py
@@ -20,6 +20,9 @@
 # Keys whose values are inherently volatile and must be masked, not compared.
 _TS_KEYS = {"built_at", "indexed_at", "generated_at"}
 _SHA_KEYS = {"head_commit"}
+# Released package version churns on every bump; mask it so goldens don't. Note
+# `schema_version` is deliberately NOT masked — it IS the contract under test.
+_VERSION_KEYS = {"package_version"}
 
 
 def _scrub(value: Any, root: str) -> Any:
@@ -30,6 +33,8 @@ def _scrub(value: Any, root: str) -> Any:
                 out[k] = ""
             elif k in _SHA_KEYS and v is not None:
                 out[k] = ""
+            elif k in _VERSION_KEYS and v is not None:
+                out[k] = ""
             else:
                 out[k] = _scrub(v, root)
         return out
diff --git a/tests/test_mcp_golden.py b/tests/test_mcp_golden.py
new file mode 100644
index 0000000..20b4da8
--- /dev/null
+++ b/tests/test_mcp_golden.py
@@ -0,0 +1,94 @@
+"""Golden-snapshot tests for every MCP tool output (requires the mcp extra).
+
+Mirrors tests/test_cli_golden.py but drives the MCP server tool functions, which
+wrap each payload in the stable envelope (schema_version + tool). The MCP tools
+resolve the index from CBX_ROOT / CBX_DB_PATH, so we build a real index from the
+shared sample_repo fixture and point the env at it.
+ +Regenerate intentionally with: UPDATE_GOLDEN=1 pytest tests/test_mcp_golden.py +""" +from __future__ import annotations + +import json as _json +import os +import subprocess +from unittest.mock import patch + +import pytest +from typer.testing import CliRunner + +try: + from codebase_index.mcp import server as mcp_server + MCP_AVAILABLE = True +except ImportError: + MCP_AVAILABLE = False + +pytestmark = pytest.mark.skipif(not MCP_AVAILABLE, reason="mcp extra not installed") + +from codebase_index.cli import app # noqa: E402 (after the skip guard) +from tests.golden_utils import assert_matches_golden # noqa: E402 + +runner = CliRunner() + + +@pytest.fixture(scope="module") +def indexed_repo(tmp_path_factory): + """A copy of sample_repo with a freshly built index, isolated from the source tree.""" + import shutil + + from tests.conftest import FIXTURE_ROOT + + dest = tmp_path_factory.mktemp("mcp_indexed") / "repo" + shutil.copytree(FIXTURE_ROOT, dest) + + identity = ["-c", "user.name=golden", "-c", "user.email=golden@test"] + subprocess.run(["git", "init"], cwd=dest, capture_output=True) + subprocess.run(["git", "add", "."], cwd=dest, capture_output=True) + commit = subprocess.run( + ["git", *identity, "commit", "-m", "initial"], cwd=dest, capture_output=True, text=True + ) + assert commit.returncode == 0, commit.stderr + assert runner.invoke(app, ["--root", str(dest), "index"]).exit_code == 0 + return dest + + +def _call(indexed_repo, tool_fn, **kwargs): + """Invoke an MCP tool against the indexed fixture and parse the JSON envelope.""" + db_path = indexed_repo / ".claude" / "cache" / "codebase-index" / "index.sqlite" + env = {"CBX_ROOT": str(indexed_repo), "CBX_DB_PATH": str(db_path)} + with patch.dict(os.environ, env, clear=False): + return _json.loads(tool_fn(**kwargs)) + + +# name -> (tool function, kwargs). Queries mirror the CLI goldens for parity. 
+CASES = { + "mcp_healthcheck": (lambda: mcp_server.healthcheck, {}), + "mcp_search_code": (lambda: mcp_server.search_code, {"query": "token"}), + "mcp_find_symbol": (lambda: mcp_server.find_symbol, {"name": "User"}), + "mcp_find_refs": (lambda: mcp_server.find_refs, {"symbol": "refresh_access_token"}), + "mcp_impact_of": (lambda: mcp_server.impact_of, {"target": "src/models/user.py", "direction": "up"}), + "mcp_explain_code": (lambda: mcp_server.explain_code, {"query": "how does authentication work"}), + "mcp_index_stats": (lambda: mcp_server.index_stats, {}), +} + +# golden name -> the tool field every envelope must carry. +_EXPECTED_TOOL = { + "mcp_healthcheck": "healthcheck", + "mcp_search_code": "search_code", + "mcp_find_symbol": "find_symbol", + "mcp_find_refs": "find_refs", + "mcp_impact_of": "impact_of", + "mcp_explain_code": "explain_code", + "mcp_index_stats": "index_stats", +} + + +@pytest.mark.parametrize("name", list(CASES), ids=list(CASES)) +def test_mcp_tool_matches_golden(indexed_repo, name): + fn_factory, kwargs = CASES[name] + payload = _call(indexed_repo, fn_factory(), **kwargs) + # The contract values are asserted explicitly — a golden alone would happily + # freeze a wrong schema_version. The golden then guards the rest of the shape. + assert payload["schema_version"] == mcp_server.MCP_SCHEMA_VERSION + assert payload["tool"] == _EXPECTED_TOOL[name] + assert_matches_golden(name, payload, root=str(indexed_repo)) diff --git a/tests/test_mcp_server.py b/tests/test_mcp_server.py index c76f91b..063e4c1 100644 --- a/tests/test_mcp_server.py +++ b/tests/test_mcp_server.py @@ -60,6 +60,27 @@ def test_search_code_no_index(): assert "index" in result["error"].lower() +# ── schema envelope (schema_version + tool on every payload, incl. 
errors) ───── + +_ENVELOPE_CALLS = { + "healthcheck": lambda: _call(mcp_server.healthcheck), + "search_code": lambda: _call(mcp_server.search_code, query="foo"), + "find_symbol": lambda: _call(mcp_server.find_symbol, name="Foo"), + "find_refs": lambda: _call(mcp_server.find_refs, symbol="foo"), + "impact_of": lambda: _call(mcp_server.impact_of, target="foo.py"), + "explain_code": lambda: _call(mcp_server.explain_code, query="how does foo work"), + "index_stats": lambda: _call(mcp_server.index_stats), +} + + +@pytest.mark.parametrize("tool,call", list(_ENVELOPE_CALLS.items()), ids=list(_ENVELOPE_CALLS)) +def test_every_tool_payload_carries_schema_envelope(tool, call): + """Even the no-index error path is wrapped in the stable envelope.""" + result = _with_missing_db(call) + assert result["schema_version"] == mcp_server.MCP_SCHEMA_VERSION + assert result["tool"] == tool + + def test_healthcheck_no_index(): result = _with_missing_db(lambda: _call(mcp_server.healthcheck)) assert result["package_version"] From 44e03d4a987ec7eb1f64304291cebc91b4859892 Mon Sep 17 00:00:00 2001 From: denfry Date: Sun, 14 Jun 2026 12:37:54 +0300 Subject: [PATCH 3/3] docs: sync roadmap with shipped MCP, add trust-model callout, changelog Roadmap chunk A (roadmap sync + docs). - docs/ROADMAP.md: M10 MCP bridge marked shipped (was "planned"); reconcile the technical vs product milestone numbering instead of claiming one is canonical. - "Trust model in 60 seconds" callout, identical in README.md and docs/SECURITY.md. - PRODUCT_UPGRADE_PLAN.md: mark shipped items (schema_version, golden snapshots, in_degree dampening, config/IaC labeling, trust-model doc). - CHANGELOG.md: Unreleased entries for chunks A/B/C and the MCP import fix. 
Co-Authored-By: Claude Opus 4.8 (1M context) --- CHANGELOG.md | 35 +++++++++++++++++++++++++++++++++++ README.md | 8 ++++++++ docs/PRODUCT_UPGRADE_PLAN.md | 32 ++++++++++++++++++++------------ docs/ROADMAP.md | 23 ++++++++++++++++------- docs/SECURITY.md | 10 ++++++++++ 5 files changed, 89 insertions(+), 19 deletions(-) diff --git a/CHANGELOG.md b/CHANGELOG.md index 58a6aa9..54c9e6e 100644 --- a/CHANGELOG.md +++ b/CHANGELOG.md @@ -18,8 +18,37 @@ All notable changes to this project are documented here. The format is based on - **`docs/RELEASE_CHECKLIST.md`**: a repeatable release checklist (version sync, tests, benchmarks, doctor, install/plugin/MCP smoke, changelog) with signed checksums + SBOM tracked as future hardening. +- **MCP contract hardening (M11.5)**: every MCP tool payload — success *and* the + no-index/error path — is now wrapped in a stable envelope (`schema_version`: 1, + `tool`: ). Golden snapshots lock every tool's output + (`tests/golden/mcp_*.json` via `tests/test_mcp_golden.py`), and the contract + values are asserted explicitly so a golden can't freeze a wrong version. Closes + the long-standing `docs/MCP.md` follow-ups and makes the `schema_version` claim + in `docs/ARCHITECTURE.md` §8 true. +- **Config / IaC language labeling**: Dockerfile, Containerfile, `*.tf`/`*.tfvars` + (terraform), `*.hcl`, `*.ini`/`*.cfg`/`*.conf`/`*.properties` (ini), and + Makefiles now get a real language label. These files were already FTS-indexed as + unknown text; labeling surfaces infra files in `stats` and lets agents scope + searches to config. They stay on the line/FTS floor (no tree-sitter spec). +- **Typed framework edges — design doc** + (`docs/superpowers/specs/2026-06-14-typed-framework-edges-design.md`): the + documented-first deliverable for the M13 code-intelligence graph + (route→handler→service→model, test→impl, config→consumer, …) with a schema, + confidence/provenance model, resolver architecture, and a benchmark gate. 
+- **"Trust model in 60 seconds"** callout, identical in `README.md` and + `docs/SECURITY.md`. ### Changed +- **Reranker: dampened the god-class `in_degree` tiebreak** (`retrieval/rerank.py`). + The graph-centrality bonus is now logarithmic with a lower cap instead of linear + (which saturated by in_degree 10, giving 100-caller "god classes" the full bonus + and floating them above genuinely relevant low-degree matches on stray-term ties). + Validated as no-regression on the public benchmark (Recall@k / MRR / nDCG + unchanged) with a targeted regression test; the real-repo gain on the honest Java + misses is tracked under M12.5. CLI/MCP `search` goldens regenerated accordingly. +- **`docs/ROADMAP.md`**: M10 MCP bridge marked shipped (was "planned"); reconciled + the technical-vs-product milestone numbering instead of claiming one is canonical. + - **README**: added "Who Is It For?" and a "How Is This Different?" section that answers why-not-grep / Cursor / Aider repo-map / Sourcegraph / Codebase-Memory MCP on the first screen, plus a proven-today-vs-roadmap table. @@ -30,6 +59,12 @@ All notable changes to this project are documented here. The format is based on TODO-friendly benchmark task checklist with a no-overclaim procedure. ### Fixed +- **MCP server failed to import on `mcp>=1.27` + `pydantic>=2.10`**: newer FastMCP + auto-built a structured-output schema from each tool's `-> str` return annotation + and raised `PydanticUserError` at import time, breaking the server and its test + suite. Tools now register as unstructured (`structured_output=False` where the + kwarg exists; older `mcp` is detected and unaffected), preserving the existing + text-content wire contract. - `docs/FAQ.md`: removed a dangling/duplicated sentence in "Is it production-ready?" and documented the real `clean` / `clean --all` behavior. 
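The `in_degree` change in the reranker entry above can be sketched in a few lines of Python. This is not the code in `retrieval/rerank.py`. The bonus constants, `LOG_SCALE`, and the `final_score` helper are hypothetical values chosen only to show the shape of the two curves: a linear bonus that saturates at in_degree 10, and a logarithmic bonus with a lower cap.

```python
import math
from typing import Callable

# Hypothetical constants: rerank.py's real values are not part of this patch.
LINEAR_MAX_BONUS = 0.10   # old bonus ceiling (assumed value)
LINEAR_SATURATION = 10    # per the changelog, the linear bonus saturated by in_degree 10
LOG_SCALE = 0.01          # assumed weight per natural-log unit of in_degree
LOG_MAX_BONUS = 0.05      # assumed "lower cap" for the dampened curve


def linear_bonus(in_degree: int) -> float:
    """Old shape: linear, then flat, so 10 callers and 100 callers get the same full bonus."""
    return LINEAR_MAX_BONUS * min(in_degree, LINEAR_SATURATION) / LINEAR_SATURATION


def log_bonus(in_degree: int) -> float:
    """New shape: log1p growth (0 callers -> 0 bonus), clipped at a lower ceiling."""
    return min(LOG_MAX_BONUS, LOG_SCALE * math.log1p(max(in_degree, 0)))


def final_score(relevance: float, in_degree: int, bonus: Callable[[int], float]) -> float:
    """Hypothetical combination: relevance dominates, centrality is a small additive tiebreak."""
    return relevance + bonus(in_degree)


if __name__ == "__main__":
    relevant = (0.30, 1)      # strong match with a single caller
    god_class = (0.25, 100)   # weaker match on a stray term, 100 callers
    for name, fn in (("linear", linear_bonus), ("log", log_bonus)):
        top = "god_class" if final_score(*god_class, fn) > final_score(*relevant, fn) else "relevant"
        print(f"{name}: top result is {top}")
```

With these assumed constants, the linear curve lets the 100-caller chunk overtake the stronger match, while the dampened curve does not. At equal relevance the higher-degree chunk still wins under either curve, which is the tiebreak role the entry describes.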
diff --git a/README.md b/README.md index 6b01fbd..3f8a346 100644 --- a/README.md +++ b/README.md @@ -429,6 +429,14 @@ Answer with precise file:line citations ## Safety and Privacy +> **Trust model in 60 seconds** +> 1. **Offline by default** — the base install has zero network dependencies; nothing leaves your machine. +> 2. **One opt-in exit, triple-gated** — external embeddings require `allow_external` **and** an env API key **and** a printed endpoint warning, or they are refused. +> 3. **Secrets never get in** — `.env`, keys, certs, and credential files are excluded before parsing (multi-gate ignore pipeline). +> 4. **Secrets never get out** — every snippet is redacted (AWS keys, private keys, JWTs, bearer tokens, connection strings) before it reaches the agent. +> 5. **No telemetry, ever** — no analytics, no phone-home, no usage data. +> 6. **Verify it yourself** — `codebase-index doctor --strict` audits all of the above and exits non-zero in CI on any high-severity finding. + `codebase-index` is designed with privacy as a first principle: - **No telemetry** — No usage data, analytics, or crash reports are collected or transmitted. 
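Item 2 of the callout above describes three conditions that must all hold before external embeddings are used. The Python sketch below shows one way to express that check. The `EmbeddingConfig` dataclass, the `EXAMPLE_EMBEDDINGS_API_KEY` variable name, the exception class, and the function name are all hypothetical; the patch documents the policy, not this code.

```python
import os
import sys
from dataclasses import dataclass
from typing import Mapping, Optional, TextIO


class ExternalEmbeddingsRefused(RuntimeError):
    """Raised when any gate fails; a caller would fall back to offline mode."""


@dataclass(frozen=True)
class EmbeddingConfig:
    allow_external: bool = False                     # gate 1: explicit opt-in, off by default
    api_key_env: str = "EXAMPLE_EMBEDDINGS_API_KEY"  # hypothetical env var name
    endpoint: str = ""                               # where snippets would be sent


def open_external_embeddings(
    cfg: EmbeddingConfig,
    environ: Optional[Mapping[str, str]] = None,
    warn_stream: TextIO = sys.stderr,
) -> str:
    """Return the API key only if all three gates pass; otherwise refuse."""
    env = os.environ if environ is None else environ

    # Gate 1: the config must opt in explicitly.
    if not cfg.allow_external:
        raise ExternalEmbeddingsRefused("allow_external is not enabled")

    # Gate 2: the key must come from the environment, never from a config file.
    key = env.get(cfg.api_key_env, "")
    if not key:
        raise ExternalEmbeddingsRefused(f"{cfg.api_key_env} is not set")

    # Gate 3: name the endpoint out loud before any request can be made.
    if not cfg.endpoint:
        raise ExternalEmbeddingsRefused("no endpoint configured to warn about")
    print(f"WARNING: code snippets will be sent to {cfg.endpoint}", file=warn_stream)
    return key
```

Checking the opt-in first means no key is read and nothing is printed unless the user has enabled external embeddings, and the warning is the last step before any network call would happen.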
diff --git a/docs/PRODUCT_UPGRADE_PLAN.md b/docs/PRODUCT_UPGRADE_PLAN.md index 16be998..cab7433 100644 --- a/docs/PRODUCT_UPGRADE_PLAN.md +++ b/docs/PRODUCT_UPGRADE_PLAN.md @@ -89,7 +89,7 @@ transparent Python implementation, a strict privacy model, and honest benchmarks | Weakness | Impact | Plan | |---|---|---| | No large-scale real-repo benchmark | Can't claim 100k/1M LOC quality | Benchmark tasks §8; recruit public repos | -| Graph is import/call/ref only | `impact` misses framework wiring | ARCHITECTURE §9 typed-edge roadmap | +| Graph is import/call/ref only | `impact` misses framework wiring | ARCHITECTURE §9 + design doc `specs/2026-06-14-typed-framework-edges-design.md`; implementation behind §8 benchmark | | GitHub-only distribution | No `pip install codebase-index` / `uvx` | Distribution tasks §9 | | MCP client docs unverified | Templates may be wrong per client version | Verify against each client, add per-client docs | | Single-repo only | No monorepo/fleet context | Out of scope near-term; documented as non-goal | @@ -101,12 +101,15 @@ transparent Python implementation, a strict privacy model, and honest benchmarks logs. Highest credibility lever. 2. **Typed framework edges** (route→handler→service→model, test→impl, config→consumer) with source spans + confidence. Biggest product-quality lever for `impact`. + *Design approved this pass* (`specs/2026-06-14-typed-framework-edges-design.md`); + implementation gated on the §8 graph benchmark. 3. **Distribution hardening**: PyPI publish, `uvx`/`pipx` story, signed checksums, SBOM. Lowers adoption friction and raises supply-chain trust. -4. **MCP contract hardening**: `schema_version` on every payload, golden - snapshots per tool, verified client docs, paging/progressive results. -5. **Retrieval tuning**: dampen the god-class `in_degree` tiebreak (the 3 honest - misses in the Java run), per-intent weights review. +4. 
**MCP contract hardening**: ✅ `schema_version` on every payload + golden + snapshots per tool (this pass). Remaining: verified client docs, paging/progressive results. +5. **Retrieval tuning**: ✅ dampened the god-class `in_degree` tiebreak this pass + (log curve + lower cap, validated no-regression on the public suite). Remaining: + confirm the real-repo gain on the 3 honest Java misses (needs M12.5), per-intent weights review. 6. **Language reach**: config/IaC awareness (Dockerfile, Terraform, migrations, CI), plus Swift/Dart/Scala/Vue/Svelte gaps called out in FAQ. @@ -119,7 +122,7 @@ transparent Python implementation, a strict privacy model, and honest benchmarks - [x] `docs/BENCHMARKS.md` "claims not to make yet" + TODO benchmark checklist. - [x] `docs/RELEASE_CHECKLIST.md`. - [ ] Verified per-client MCP setup docs (after testing each client version). -- [ ] A short "trust model in 60 seconds" callout reused across README/SECURITY. +- [x] A short "trust model in 60 seconds" callout reused across README/SECURITY. ## 8. 
Benchmark tasks @@ -150,14 +153,19 @@ Track in [BENCHMARKS.md](BENCHMARKS.md); none may be reported until run with log | # | Improvement | Impact | Risk | Status | |---|---|---|---|---| -| 1 | Implement `clean` (documented but was a stub) | Fixes doc/reality gap | Low | **Shipped this pass** | -| 2 | Dampen god-class `in_degree` tiebreak in rerank | +recall on real repos | Medium (retune) | Planned | -| 3 | `schema_version` on every MCP payload | Stable contract | Low | Partly (architecture claims it) — verify+test | -| 4 | Golden snapshots for each MCP tool output | Regression safety | Low | Planned | -| 5 | Typed framework edges in the graph | Better `impact` | High | Roadmap (ARCHITECTURE §9) | -| 6 | Config/IaC parsers (Dockerfile, Terraform, migrations) | Coverage | Medium | Roadmap | +| 1 | Implement `clean` (documented but was a stub) | Fixes doc/reality gap | Low | **Shipped (1.3.0 line)** | +| 2 | Dampen god-class `in_degree` tiebreak in rerank | +recall on real repos | Medium (retune) | **Shipped this pass** — log dampening + lower cap; no-regression on the public suite + a targeted regression test. Real-repo gain still needs M12.5. | +| 3 | `schema_version` on every MCP payload | Stable contract | Low | **Shipped this pass** — `schema_version` + `tool` envelope on every payload (incl. errors), asserted + golden-locked. | +| 4 | Golden snapshots for each MCP tool output | Regression safety | Low | **Shipped this pass** — `tests/golden/mcp_*.json` via `tests/test_mcp_golden.py`. | +| 5 | Typed framework edges in the graph | Better `impact` | High | Design doc shipped this pass (`docs/superpowers/specs/2026-06-14-typed-framework-edges-design.md`); implementation behind the §8 benchmark. | +| 6 | Config/IaC parsers (Dockerfile, Terraform, migrations) | Coverage | Medium | **Partly shipped this pass** — Tier-C labeling for Dockerfile/Terraform/HCL/INI/Make (already FTS-indexed, now language-labeled); tree-sitter parsing of these still roadmap. 
| | 7 | Paging/progressive MCP results | Big-repo UX | Medium | Roadmap (MCP.md) | +Also fixed this pass (not previously tracked): the MCP server failed to import on +`mcp>=1.27` + `pydantic>=2.10` (FastMCP auto-built a structured-output schema from +the `-> str` return annotation and raised). Tools now register as unstructured +(`structured_output=False` where supported), so the server loads on current `mcp`. + Rule for this repo: small, safe, tested changes land directly; anything that risks destabilizing retrieval quality or the security model is documented here first and lands behind a benchmark. diff --git a/docs/ROADMAP.md b/docs/ROADMAP.md index 6015393..2001288 100644 --- a/docs/ROADMAP.md +++ b/docs/ROADMAP.md @@ -1,8 +1,13 @@ # Roadmap & First Implementation Tasks Milestones are vertical-ish slices: each ends with something runnable and testable. -This numbering is canonical — the product-level [ROADMAP.md](../ROADMAP.md) and the -`(Mx)` tags in [CHANGELOG.md](../CHANGELOG.md) follow it. +This is the **technical-milestone** view (M0–M10). The product-level +[ROADMAP.md](../ROADMAP.md) tells the same story at a finer grain and carries it +further (it splits the MCP server into M11 and adds M11.5/M12/M12.5/M13 for MCP +hardening, benchmarks, and the typed-edge graph). Where the two disagree on a +number, the product roadmap is the current product view; this file tracks the +original implementation slices. The `(Mx)` tags in +[CHANGELOG.md](../CHANGELOG.md) follow this technical numbering. ## M0 — Architecture & scaffold ✅ (this repo) - Repo tree, docs (ARCHITECTURE/RETRIEVAL/SCHEMA/SECURITY/INSTALLATION), SKILL.md draft. @@ -77,11 +82,15 @@ release with the built artifacts (GitHub-only distribution — no PyPI publish). 
"git+https://github.com/denfry/codebase-index.git@v1.2.0"` -> `init` -> `index` -> ask a question is verified end-to-end by `scripts/release_smoke.py`.* -## M10 — Optional MCP bridge (planned) -- Model Context Protocol server exposing `search`, `symbol`, `refs`, `impact` as tools for - MCP-compatible clients (Claude Desktop, Cursor, etc.). An optional addition, not a replacement - for the Skill/CLI interface. -- **Exit:** `codebase-index` can be used as an MCP tool by any MCP-compatible client. +## M10 — MCP bridge ✅ (product roadmap M11) +- Shipped: a stdio Model Context Protocol server (`codebase-index mcp --root `, or the + `codebase-index-mcp` entry point) exposing `healthcheck`, `search_code`, `find_symbol`, + `find_refs`, `impact_of`, `explain_code`, and `index_stats` over the same `service.py` layer the + CLI uses — an optional addition, not a replacement for the Skill/CLI interface. Every payload + carries a `schema_version` + `tool` envelope, locked by golden snapshots (`tests/golden/mcp_*.json`). +- **Exit:** `codebase-index` can be used as an MCP tool by any MCP-compatible client. See + [MCP.md](MCP.md). +- Follow-up (product roadmap M11.5): verified per-client setup docs and paging/progressive results. --- diff --git a/docs/SECURITY.md b/docs/SECURITY.md index dbb5762..be4c959 100644 --- a/docs/SECURITY.md +++ b/docs/SECURITY.md @@ -3,6 +3,16 @@ `codebase-index` is **local-first and offline by default**. Its threat model assumes the indexed repository may contain secrets and that a skill must not exfiltrate code or run dangerous commands. +> **Trust model in 60 seconds** +> 1. **Offline by default** — the base install has zero network dependencies; nothing leaves your machine (§1, §4). +> 2. **One opt-in exit, triple-gated** — external embeddings require `allow_external` **and** an env API key **and** a printed endpoint warning, or they are refused (§4). +> 3. 
**Secrets never get in** — `.env`, keys, certs, and credential files are excluded before parsing (§2). +> 4. **Secrets never get out** — every snippet is redacted before it reaches the agent (§3). +> 5. **No telemetry, ever** — no analytics, no phone-home, no usage data. +> 6. **Verify it yourself** — `codebase-index doctor --strict` audits all of the above and gates CI (§6). +> +> The same callout appears in the README so the trust story is identical wherever a reader lands. + ## 1. Principles 1. **Local-first** — index, query, and storage all happen on the user's machine.
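Item 4 of the callout says every snippet is redacted before it reaches the agent. The regex-based sketch below illustrates that idea for the categories the README version of the callout lists (AWS keys, private keys, JWTs, bearer tokens, connection strings). The patterns are common, deliberately simple heuristics assumed for this example; they are not the project's actual rule set and will miss many real secret formats.

```python
import re

# Illustrative patterns only (assumed, not the project's real rules).
# Order matters: PEM blocks go first so their bodies are removed as one unit.
_RULES = [
    # PEM private key blocks: "-----BEGIN ... PRIVATE KEY-----" through the END line
    (re.compile(r"-----BEGIN [A-Z ]*PRIVATE KEY-----[\s\S]*?-----END [A-Z ]*PRIVATE KEY-----"), "[REDACTED]"),
    # AWS access key IDs (long-term AKIA..., temporary ASIA...)
    (re.compile(r"\b(?:AKIA|ASIA)[0-9A-Z]{16}\b"), "[REDACTED]"),
    # JWTs: three base64url segments; "eyJ" is how a JSON object starting with {" encodes
    (re.compile(r"\beyJ[A-Za-z0-9_-]+\.[A-Za-z0-9_-]+\.[A-Za-z0-9_-]+"), "[REDACTED]"),
    # Bearer tokens: keep the scheme word, drop the credential
    (re.compile(r"(\bBearer\s+)[A-Za-z0-9._~+/=-]+", re.IGNORECASE), r"\1[REDACTED]"),
    # scheme://user:password@host: keep scheme and user, drop the password
    (re.compile(r"\b([a-z][a-z0-9+.-]*://)([^:/\s@]+):([^@\s]+)@"), r"\1\2:[REDACTED]@"),
]


def redact(snippet: str) -> str:
    """Apply every rule in order and return the scrubbed snippet."""
    for pattern, replacement in _RULES:
        snippet = pattern.sub(replacement, snippet)
    return snippet
```

Running the rules in a fixed order keeps a PEM block from being partially matched by the narrower patterns. The callout points to `codebase-index doctor --strict` for auditing the shipped behavior; a sketch like this one is only a mental model of it.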