Commit 86a284b

feat: call resolution precision/recall benchmark suite (4.4) (#507)
* feat: add call resolution precision/recall benchmark suite (4.4)

Add tests/benchmarks/resolution/ with hand-annotated fixture projects for JavaScript and TypeScript. Each fixture declares expected call edges in an expected-edges.json manifest. The benchmark runner builds the graph, compares resolved edges against expected, and reports precision/recall per language and per resolution mode (static, receiver-typed).

Runs as part of npm test: CI fails if metrics drop below baseline.

Current baselines:
- JS: 100% precision, 60% recall (9/15 edges)
- TS: 100% precision, 69% recall (11/16 edges)

Impact: 43 functions changed, 9 affected

* fix: tighten TS thresholds and harden benchmark runner

- Ratchet TypeScript recall thresholds to measured-10pp (recall 0.58, receiverRecall 0.45, staticRecall 0.9) so the CI gate catches regressions
- Remove duplicate formatReport console.log from precision test (afterAll already prints the summary)
- Use withFileTypes in copyFixture to skip subdirectories safely
- Guard discoverFixtures with existsSync to prevent opaque ENOENT at import
1 parent c089631 commit 86a284b
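
The two hardening items in the commit message (`withFileTypes` in `copyFixture`, `existsSync` in `discoverFixtures`) can be illustrated with a minimal sketch. The runner's source is not part of this excerpt, so the function bodies, signatures, and the temp-directory prefix below are assumptions; only the Node.js `fs`, `os`, and `path` calls are real APIs.

```javascript
const fs = require("node:fs");
const os = require("node:os");
const path = require("node:path");

// Hypothetical helper: list fixture directories that contain a manifest.
function discoverFixtures(fixturesDir) {
  // Guard first so a missing directory yields an empty list instead of ENOENT.
  if (!fs.existsSync(fixturesDir)) return [];
  return fs
    .readdirSync(fixturesDir, { withFileTypes: true })
    .filter((entry) => entry.isDirectory())
    .filter((entry) =>
      fs.existsSync(path.join(fixturesDir, entry.name, "expected-edges.json")),
    )
    .map((entry) => entry.name)
    .sort();
}

// Hypothetical helper: copy only the regular files of one fixture to a temp dir.
function copyFixture(fixtureDir) {
  const dest = fs.mkdtempSync(path.join(os.tmpdir(), "resolution-fixture-"));
  for (const entry of fs.readdirSync(fixtureDir, { withFileTypes: true })) {
    if (!entry.isFile()) continue; // skip subdirectories and other non-files
    fs.copyFileSync(
      path.join(fixtureDir, entry.name),
      path.join(dest, entry.name),
    );
  }
  return dest;
}
```

Reading directory entries as `Dirent` objects avoids a separate `stat` call per entry and lets the copy loop skip anything that is not a regular file.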

15 files changed

Lines changed: 832 additions & 11 deletions

CONTRIBUTING.md

Lines changed: 38 additions & 10 deletions
@@ -153,11 +153,12 @@ enabled. The test structure:
153153

154154
```
155155
tests/
156-
integration/ # buildGraph + full query commands
157-
graph/ # Cycle detection, DOT/Mermaid export
158-
parsers/ # Language parser extraction (one file per language)
159-
search/ # Semantic search + embeddings
160-
fixtures/ # Sample projects used by tests
156+
integration/ # buildGraph + full query commands
157+
graph/ # Cycle detection, DOT/Mermaid export
158+
parsers/ # Language parser extraction (one file per language)
159+
search/ # Semantic search + embeddings
160+
benchmarks/resolution/ # Call resolution precision/recall (per-language fixtures)
161+
fixtures/ # Sample projects used by tests
161162
```
162163

163164
- Integration tests create temporary copies of fixture projects for isolation
@@ -166,18 +167,45 @@ tests/
166167

167168
## Regression Benchmarks
168169

169-
Two regression benchmark scripts live in `scripts/`. These are **not** unit
170-
tests — they measure performance metrics that reviewers use to judge whether a
171-
change is acceptable. If your PR touches code covered by a benchmark, you
172-
**must** run it before and after your changes and include the results in the PR
173-
description.
170+
Several regression benchmarks track codegraph's accuracy and performance across
171+
versions. Some live in `scripts/` (run manually), while the resolution benchmark
172+
runs automatically as part of `npm test`. If your PR touches code covered by a
173+
benchmark, you **must** run it before and after your changes and include the
174+
results in the PR description.
174175

175176
| Benchmark | What it measures | When to run |
176177
|-----------|-----------------|-------------|
177178
| `node scripts/benchmark.js` | Build speed (native vs WASM), query latency | Changes to `builder.js`, `parser.js`, `queries.js`, `resolve.js`, `db.js`, or the native engine |
178179
| `node scripts/embedding-benchmark.js` | Search recall (Hit@1/3/5/10) across models | Changes to `embedder.js` or embedding strategies |
179180
| `node scripts/query-benchmark.js` | Query depth scaling, diff-impact latency | Changes to `queries.js`, `resolve.js`, or `db.js` |
180181
| `node scripts/incremental-benchmark.js` | Incremental build, import resolution throughput | Changes to `builder.js`, `resolve.js`, `parser.js`, or `journal.js` |
182+
| `npx vitest run tests/benchmarks/resolution/` | Call resolution precision/recall per language | Changes to `build-edges.js`, `resolve.js`, `parser.js`, or any extractor |
183+
184+
### Resolution precision/recall benchmark
185+
186+
The resolution benchmark (`tests/benchmarks/resolution/`) measures how
187+
accurately codegraph resolves call edges. It uses hand-annotated fixture projects
188+
with an `expected-edges.json` manifest per language that declares every call edge
189+
that should be detected.
190+
191+
The benchmark runner builds the graph for each fixture, compares resolved edges
192+
against the manifest, and reports:
193+
194+
- **Precision** — what fraction of resolved edges are correct (no false positives)
195+
- **Recall** — what fraction of expected edges were found (no false negatives)
196+
- **Per-mode breakdown** — separate recall for `static`, `receiver-typed`, and
197+
`interface-dispatched` resolution modes
198+
199+
**CI gate:** The benchmark runs as part of `npm test`. If precision or recall
200+
drops below the configured thresholds for any language, the test fails.
201+
202+
**Adding a new language fixture:**
203+
204+
1. Create `tests/benchmarks/resolution/fixtures/<language>/` with source files
205+
2. Add an `expected-edges.json` manifest (see the JSON schema at
206+
`tests/benchmarks/resolution/expected-edges.schema.json`)
207+
3. Add thresholds to `THRESHOLDS` in `resolution-benchmark.test.js`
208+
4. The benchmark runner auto-discovers fixtures with an `expected-edges.json`
181209

182210
### How to report results
183211
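
The precision, recall, and CI-gate definitions in the CONTRIBUTING.md hunk above can be sketched as follows. This is an illustration under assumptions, not the runner's code: `edgeKey`, `computeMetrics`, and `failedGates` are hypothetical names, the edge objects follow the `source`/`target` shape of the manifest, and returning 0 for an empty set is a convention chosen here.

```javascript
// Hypothetical key: identifies an edge by source and target name + file.
function edgeKey(edge) {
  return `${edge.source.name}@${edge.source.file} -> ${edge.target.name}@${edge.target.file}`;
}

// Precision = correct resolved edges / all resolved edges.
// Recall    = correct resolved edges / all expected edges.
function computeMetrics(resolvedEdges, expectedEdges) {
  const resolved = new Set(resolvedEdges.map(edgeKey));
  const expected = new Set(expectedEdges.map(edgeKey));
  let truePositives = 0;
  for (const key of resolved) {
    if (expected.has(key)) truePositives += 1;
  }
  return {
    // Assumed convention: an empty set scores 0 rather than NaN.
    precision: resolved.size === 0 ? 0 : truePositives / resolved.size,
    recall: expected.size === 0 ? 0 : truePositives / expected.size,
  };
}

// Returns the names of the metrics that fall below their threshold.
function failedGates(metrics, thresholds) {
  return Object.keys(thresholds).filter(
    (name) => metrics[name] < thresholds[name],
  );
}
```

With the JavaScript baseline from the commit message (9 of 15 expected edges found, no spurious edges), this computation gives precision 1 and recall 0.6; a test would fail whenever `failedGates` returns a non-empty list.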

README.md

Lines changed: 1 addition & 1 deletion
@@ -546,7 +546,7 @@ Codegraph also extracts symbols from common callback patterns: Commander `.comma
546546

547547
## 📊 Performance
548548

549-
Self-measured on every release via CI ([build benchmarks](generated/benchmarks/BUILD-BENCHMARKS.md) | [embedding benchmarks](generated/benchmarks/EMBEDDING-BENCHMARKS.md)):
549+
Self-measured on every release via CI ([build benchmarks](generated/benchmarks/BUILD-BENCHMARKS.md) | [embedding benchmarks](generated/benchmarks/EMBEDDING-BENCHMARKS.md) | [query benchmarks](generated/benchmarks/QUERY-BENCHMARKS.md) | [incremental benchmarks](generated/benchmarks/INCREMENTAL-BENCHMARKS.md) | [resolution precision/recall](tests/benchmarks/resolution/)):
550550

551551
| Metric | Latest |
552552
|---|---|
Lines changed: 55 additions & 0 deletions
@@ -0,0 +1,55 @@
1+
{
2+
"$schema": "https://json-schema.org/draft/2020-12/schema",
3+
"title": "Expected Call Edges Manifest",
4+
"description": "Hand-annotated call edges for resolution precision/recall benchmarks",
5+
"type": "object",
6+
"required": ["language", "edges"],
7+
"properties": {
8+
"language": {
9+
"type": "string",
10+
"description": "Language identifier matching the fixture directory name"
11+
},
12+
"description": {
13+
"type": "string"
14+
},
15+
"edges": {
16+
"type": "array",
17+
"items": {
18+
"type": "object",
19+
"required": ["source", "target", "kind", "mode"],
20+
"properties": {
21+
"source": {
22+
"type": "object",
23+
"required": ["name", "file"],
24+
"properties": {
25+
"name": { "type": "string", "description": "Function/method name" },
26+
"file": { "type": "string", "description": "Filename (basename only)" }
27+
}
28+
},
29+
"target": {
30+
"type": "object",
31+
"required": ["name", "file"],
32+
"properties": {
33+
"name": { "type": "string", "description": "Function/method name" },
34+
"file": { "type": "string", "description": "Filename (basename only)" }
35+
}
36+
},
37+
"kind": {
38+
"type": "string",
39+
"enum": ["calls"],
40+
"description": "Edge kind — currently only 'calls'"
41+
},
42+
"mode": {
43+
"type": "string",
44+
"enum": ["static", "receiver-typed", "interface-dispatched"],
45+
"description": "Resolution mode that should produce this edge"
46+
},
47+
"notes": {
48+
"type": "string",
49+
"description": "Human-readable explanation of why this edge is expected"
50+
}
51+
}
52+
}
53+
}
54+
}
55+
}
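
To make the schema's required fields and enums concrete, here is a small hand-written check of the same constraints. It is a sketch only: `validateManifest` is a hypothetical helper, it covers just the `required`, `type`, and `enum` rules visible above, and it is not a general JSON Schema validator.

```javascript
const MODES = ["static", "receiver-typed", "interface-dispatched"];

// Hypothetical helper: returns a list of human-readable problems (empty if valid).
function validateManifest(manifest) {
  if (manifest === null || typeof manifest !== "object") {
    return ["manifest must be an object"];
  }
  const errors = [];
  if (typeof manifest.language !== "string") {
    errors.push("language must be a string");
  }
  if (!Array.isArray(manifest.edges)) {
    errors.push("edges must be an array");
    return errors;
  }
  manifest.edges.forEach((edge, i) => {
    // Both endpoints require a string name and a string file.
    for (const end of ["source", "target"]) {
      const node = edge[end];
      if (!node || typeof node.name !== "string" || typeof node.file !== "string") {
        errors.push(`edges[${i}].${end} needs string "name" and "file"`);
      }
    }
    if (edge.kind !== "calls") {
      errors.push(`edges[${i}].kind must be "calls"`);
    }
    if (!MODES.includes(edge.mode)) {
      errors.push(`edges[${i}].mode must be one of ${MODES.join(", ")}`);
    }
  });
  return errors;
}
```

The optional `description` and `notes` fields are deliberately left unchecked here, since the schema does not require them.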
Lines changed: 112 additions & 0 deletions
@@ -0,0 +1,112 @@
1+
{
2+
"$schema": "../../expected-edges.schema.json",
3+
"language": "javascript",
4+
"description": "Hand-annotated call edges for JavaScript resolution benchmark",
5+
"edges": [
6+
{
7+
"source": { "name": "UserService.createUser", "file": "service.js" },
8+
"target": { "name": "normalize", "file": "validators.js" },
9+
"kind": "calls",
10+
"mode": "static",
11+
"notes": "Direct imported function call from method"
12+
},
13+
{
14+
"source": { "name": "UserService.createUser", "file": "service.js" },
15+
"target": { "name": "validate", "file": "validators.js" },
16+
"kind": "calls",
17+
"mode": "static",
18+
"notes": "Direct imported function call from method"
19+
},
20+
{
21+
"source": { "name": "UserService.createUser", "file": "service.js" },
22+
"target": { "name": "Logger.error", "file": "logger.js" },
23+
"kind": "calls",
24+
"mode": "receiver-typed",
25+
"notes": "this.logger.error() — receiver-typed via constructor assignment"
26+
},
27+
{
28+
"source": { "name": "UserService.createUser", "file": "service.js" },
29+
"target": { "name": "Logger.info", "file": "logger.js" },
30+
"kind": "calls",
31+
"mode": "receiver-typed",
32+
"notes": "this.logger.info() — receiver-typed via constructor assignment"
33+
},
34+
{
35+
"source": { "name": "UserService.deleteUser", "file": "service.js" },
36+
"target": { "name": "Logger.warn", "file": "logger.js" },
37+
"kind": "calls",
38+
"mode": "receiver-typed",
39+
"notes": "this.logger.warn() — receiver-typed via constructor assignment"
40+
},
41+
{
42+
"source": { "name": "Logger.info", "file": "logger.js" },
43+
"target": { "name": "Logger._write", "file": "logger.js" },
44+
"kind": "calls",
45+
"mode": "static",
46+
"notes": "this._write() — same-class method call"
47+
},
48+
{
49+
"source": { "name": "Logger.warn", "file": "logger.js" },
50+
"target": { "name": "Logger._write", "file": "logger.js" },
51+
"kind": "calls",
52+
"mode": "static",
53+
"notes": "this._write() — same-class method call"
54+
},
55+
{
56+
"source": { "name": "Logger.error", "file": "logger.js" },
57+
"target": { "name": "Logger._write", "file": "logger.js" },
58+
"kind": "calls",
59+
"mode": "static",
60+
"notes": "this._write() — same-class method call"
61+
},
62+
{
63+
"source": { "name": "validate", "file": "validators.js" },
64+
"target": { "name": "checkLength", "file": "validators.js" },
65+
"kind": "calls",
66+
"mode": "static",
67+
"notes": "Same-file function call"
68+
},
69+
{
70+
"source": { "name": "normalize", "file": "validators.js" },
71+
"target": { "name": "trimWhitespace", "file": "validators.js" },
72+
"kind": "calls",
73+
"mode": "static",
74+
"notes": "Same-file function call"
75+
},
76+
{
77+
"source": { "name": "main", "file": "index.js" },
78+
"target": { "name": "buildService", "file": "service.js" },
79+
"kind": "calls",
80+
"mode": "static",
81+
"notes": "Direct imported function call"
82+
},
83+
{
84+
"source": { "name": "main", "file": "index.js" },
85+
"target": { "name": "UserService.createUser", "file": "service.js" },
86+
"kind": "calls",
87+
"mode": "receiver-typed",
88+
"notes": "svc.createUser() — receiver typed via buildService() return"
89+
},
90+
{
91+
"source": { "name": "main", "file": "index.js" },
92+
"target": { "name": "validate", "file": "validators.js" },
93+
"kind": "calls",
94+
"mode": "static",
95+
"notes": "Direct imported function call"
96+
},
97+
{
98+
"source": { "name": "main", "file": "index.js" },
99+
"target": { "name": "UserService.deleteUser", "file": "service.js" },
100+
"kind": "calls",
101+
"mode": "receiver-typed",
102+
"notes": "svc.deleteUser() — receiver typed via buildService() return"
103+
},
104+
{
105+
"source": { "name": "directInstantiation", "file": "index.js" },
106+
"target": { "name": "UserService.createUser", "file": "service.js" },
107+
"kind": "calls",
108+
"mode": "receiver-typed",
109+
"notes": "svc.createUser() — receiver typed via new UserService()"
110+
}
111+
]
112+
}
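
The manifest above tags 9 edges as `static` and 6 as `receiver-typed`, which is what a per-mode recall breakdown would group on. The following sketch shows one way to compute that breakdown; `edgeKey` and `recallByMode` are hypothetical names, and the actual runner may match and report edges differently.

```javascript
// Hypothetical key: identifies an edge by source and target name + file.
function edgeKey(edge) {
  return `${edge.source.name}@${edge.source.file} -> ${edge.target.name}@${edge.target.file}`;
}

// Groups expected edges by their "mode" field and reports, for each mode,
// how many expected edges appear among the resolved edges.
function recallByMode(expectedEdges, resolvedEdges) {
  const resolved = new Set(resolvedEdges.map(edgeKey));
  const byMode = {};
  for (const edge of expectedEdges) {
    if (!byMode[edge.mode]) {
      byMode[edge.mode] = { found: 0, expected: 0, recall: 0 };
    }
    byMode[edge.mode].expected += 1;
    if (resolved.has(edgeKey(edge))) byMode[edge.mode].found += 1;
  }
  for (const bucket of Object.values(byMode)) {
    bucket.recall = bucket.found / bucket.expected;
  }
  return byMode;
}
```

Modes with no expected edges in a manifest simply do not appear in the result, so a fixture without `interface-dispatched` edges would report only the modes it declares.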
Lines changed: 15 additions & 0 deletions
@@ -0,0 +1,15 @@
1+
import { buildService, UserService } from './service.js';
2+
import { validate } from './validators.js';
3+
4+
export function main() {
5+
const svc = buildService();
6+
const result = svc.createUser({ name: 'Alice' });
7+
if (result && validate(result)) {
8+
svc.deleteUser(1);
9+
}
10+
}
11+
12+
export function directInstantiation() {
13+
const svc = new UserService();
14+
return svc.createUser({ name: 'Bob' });
15+
}
Lines changed: 21 additions & 0 deletions
@@ -0,0 +1,21 @@
1+
export class Logger {
2+
constructor(prefix) {
3+
this.prefix = prefix;
4+
}
5+
6+
info(msg) {
7+
this._write('INFO', msg);
8+
}
9+
10+
warn(msg) {
11+
this._write('WARN', msg);
12+
}
13+
14+
error(msg) {
15+
this._write('ERROR', msg);
16+
}
17+
18+
_write(level, msg) {
19+
console.log(`[${this.prefix}] ${level}: ${msg}`);
20+
}
21+
}
Lines changed: 27 additions & 0 deletions
@@ -0,0 +1,27 @@
1+
import { Logger } from './logger.js';
2+
import { normalize, validate } from './validators.js';
3+
4+
export class UserService {
5+
constructor() {
6+
this.logger = new Logger('UserService');
7+
}
8+
9+
createUser(data) {
10+
const clean = normalize(data);
11+
if (!validate(clean)) {
12+
this.logger.error('Validation failed');
13+
return null;
14+
}
15+
this.logger.info('User created');
16+
return clean;
17+
}
18+
19+
deleteUser(id) {
20+
this.logger.warn(`Deleting user ${id}`);
21+
return true;
22+
}
23+
}
24+
25+
export function buildService() {
26+
return new UserService();
27+
}
Lines changed: 15 additions & 0 deletions
@@ -0,0 +1,15 @@
1+
export function validate(data) {
2+
return data != null && typeof data.name === 'string' && checkLength(data.name);
3+
}
4+
5+
export function normalize(data) {
6+
return { ...data, name: trimWhitespace(data.name) };
7+
}
8+
9+
function checkLength(str) {
10+
return str.length > 0 && str.length < 256;
11+
}
12+
13+
function trimWhitespace(str) {
14+
return str.trim();
15+
}
