feat: data-driven report/metrics config, list-overrides & C/C++/Go/C#/Markdown plugins #14
Merged
…c thresholds, transparent formulas
Make the metric/report pipeline fully data-driven from project config and render
every metric's derivation in the viewer.
Config (code-ranker.toml):
- list-override DSL for inherited lists (add/remove/replace/after/before/prepend/clear), generic in deep_merge and used for the report views.
- [report] columns/card/stats overrides at both language and project level, layered catalog -> language -> project; surfaces custom metrics in the UI.
- [rules.thresholds.file] now accepts custom [metrics.<key>] keys, validated at load; `check` fires on them with the metric's concern group.
- [presets.<ID>] project Prompt-Generator presets, merged over the plugin catalog (same id overrides, new id appends); drives --preset / scorecard.
Metrics:
- MetricDef gains warning/info two-tier severity thresholds and a `calc` (defaulted from the CEL formula for node-scope) for the live derivation line.
- top<N> / top<N>_<reducer> aggregate reducers.
- emit the Halstead/AST base counts (eta1/eta2/n1/n2/spaces/branches/span_sloc) on every node so every derived formula renders a "formula = numbers" line; formula_js added/rewritten for effort/length/vocabulary/cyclomatic/mi/mi_sei.
- per-language [specs.eta1..n2] descriptions enumerating the exact operator/operand tokens each grammar counts (shared ecmascript_metric_specs for JS/TS).
Viewer:
- tooltip box is content-sized (max-width min(560px,92vw)); long formulas / token lists wrap instead of clipping; position() clamps to the viewport.
Docs: new docs/customization/ (README + custom-field-example.toml worked example); config.md, node_schema.md, metric-tiers.md, PRD, viewer DESIGN synced.
Goldens regenerated for all four languages.
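The list-override operations named in this commit (add/remove/replace/after/before/prepend/clear) can be pictured with a minimal sketch. The `ListOp` enum, `apply_ops`, and the fallback when an anchor is missing are hypothetical; the commit text does not show the real TOML syntax or edge-case semantics.

```rust
/// Hypothetical model of the list-override operations named in the commit
/// message; the real DSL syntax and semantics in code-ranker may differ.
#[derive(Debug, Clone)]
pub enum ListOp {
    Add(String),          // append if not already present
    Remove(String),       // drop every occurrence
    Replace(Vec<String>), // discard the inherited list entirely
    After { anchor: String, item: String },
    Before { anchor: String, item: String },
    Prepend(String),
    Clear,
}

/// Apply `ops` in order to an inherited list and return the effective list.
pub fn apply_ops(inherited: &[String], ops: &[ListOp]) -> Vec<String> {
    let mut out: Vec<String> = inherited.to_vec();
    for op in ops {
        match op {
            ListOp::Add(x) => {
                if !out.contains(x) {
                    out.push(x.clone());
                }
            }
            ListOp::Remove(x) => out.retain(|e| e != x),
            ListOp::Replace(items) => out = items.clone(),
            ListOp::After { anchor, item } => insert_relative(&mut out, anchor, item, 1),
            ListOp::Before { anchor, item } => insert_relative(&mut out, anchor, item, 0),
            ListOp::Prepend(x) => out.insert(0, x.clone()),
            ListOp::Clear => out.clear(),
        }
    }
    out
}

/// Insert `item` next to `anchor`. Assumption: when the anchor is missing,
/// the item is appended instead of being rejected.
fn insert_relative(list: &mut Vec<String>, anchor: &str, item: &str, offset: usize) {
    match list.iter().position(|e| e == anchor) {
        Some(i) => list.insert(i + offset, item.to_string()),
        None => list.push(item.to_string()),
    }
}
```

Under these assumptions, removing `hk` from an inherited `[sloc, hk, cyclomatic]` and then placing a custom `tsr` after `sloc` yields `[sloc, tsr, cyclomatic]`, which is the kind of column edit the `[report]` overrides describe.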
Implements the two deferred map asks from §3 of the customization guide: size the SVG nodes by any metric, and filter the map to a metric's population — both fully data-driven, with the built-in loc/hk size modes and the cycle filter no longer hardcoded in the HTML.
Config:
- [report] gains `size` and `filter` list-overrides (same DSL as columns/card/stats); built-in defaults live in builtin.toml `[mapview]` (size = sloc/hk, filter = cycle). build_ui emits `ui.size_metrics` / `ui.filter_metrics`, pruned to keys present on an internal node.
Viewer:
- size/filter buttons are built per render (renderMapControls) from those ui lists — index.html no longer hardcodes SLOC/HK/cycle; clicks handled by delegation on .size-controls.
- metricNodeVal/metricNodeDiam are generic over the active attribute key; a custom metric scales against the median of the rendered population (metricSizeBase, cached in window._sizeBaseCache); loc/hk keep calibrated bases.
- the cycle filter generalises to window.nodeFilter: `cycle` uses the cycle set, any other key keeps nodes whose value is non-zero.
Docs: customization §3 rewritten as implemented (+ §1.6/§2.2/quick-ref), viewer PRD/DESIGN updated for the data-driven controls.
Goldens regenerated (ui block now carries size_metrics/filter_metrics). custom-field-example.toml renamed from code-ranker.toml and wires size=tsr / filter=tsr_big.
…family cognitive
Five new language plugins, registered in the CLI and exercised by e2e goldens:
- go — import graph from the go.mod module path
- c, cpp — #include dependency graph via a shared `cfamily` text-scan module;
metrics via the generic engine (params/name nest under the
function_declarator, handled by `args`/`unit_name` Dialect hooks)
- csharp — using/namespace dependency graph
- markdown — grammar-free: link graph + measured doc metrics (headings,
max_depth, code_lines, links, broken_links) with a broken_links gate
Engine: add `Dialect::args` and `Dialect::unit_name` hooks (default to the prior
behaviour) so languages whose parameters/name nest under a declarator can override
them without touching the shared walks.
Fix C/C++ cognitive complexity dropping `&&`/`||`: tree-sitter-c/-cpp declare the
name `binary_expression` on two symbols (ordinary + preprocessor `#if`), so the
single-id `[roles.one]` lookup resolved to a symbol absent from normal-code trees
and the boolean-run branch was dead. Resolve it as a `[roles.group]` SET instead.
Regression tests + c/cpp goldens updated (cognitive 3->4, stats 2->2.5).
Split `config.rs` (the per-language config reader, every plugin's fan_in hub, HK
~1.15M) into a `config/` facade over parse/views/specs/lookup submodules; the
resolver follows the `pub use` re-exports to the defining file, so fan_in now lands
per-concern (hottest config file ~25K). Lower the dogfood HK ceiling 2M -> 1M.
Tests: +unit tests across the new plugins and config (skip-node guards, non-UTF8
skips, scanner edge forms, resolution fallbacks); deliberate gaps marked inline
with `// COVERAGE:` notes. Workspace coverage 94.79%.
Docs: sync PRD/DESIGN/e2e and the CLI DESIGN to nine plugins + eight wired
grammars; add `contrib/unit-tests.md` raise-coverage playbook and link it from
the `/ud` command.
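The `binary_expression` fix in this commit comes down to one name mapping to more than one grammar symbol. The sketch below is a toy model of that situation, not the plugin's actual code: `KindTable`, `one`, and `group` are hypothetical names that only mirror the `[roles.one]` / `[roles.group]` distinction, and the numeric ids are made up.

```rust
use std::collections::{HashMap, HashSet};

/// Toy grammar table: one node-kind name may map to several numeric symbol
/// ids, as the commit describes for `binary_expression` in tree-sitter-c/-cpp.
pub struct KindTable {
    ids_by_name: HashMap<String, Vec<u16>>,
}

impl KindTable {
    pub fn new(entries: &[(&str, u16)]) -> Self {
        let mut ids_by_name: HashMap<String, Vec<u16>> = HashMap::new();
        for (name, id) in entries {
            ids_by_name.entry((*name).to_string()).or_default().push(*id);
        }
        Self { ids_by_name }
    }

    /// Single-id lookup: keeps only the first id registered for the name.
    pub fn one(&self, name: &str) -> Option<u16> {
        self.ids_by_name.get(name).and_then(|ids| ids.first().copied())
    }

    /// Group lookup: keeps every id registered for the name.
    pub fn group(&self, name: &str) -> HashSet<u16> {
        self.ids_by_name
            .get(name)
            .map(|ids| ids.iter().copied().collect())
            .unwrap_or_default()
    }
}
```

If the first registered id happens to be the preprocessor variant, a walk that compares ordinary nodes against `one("binary_expression")` never matches, while a membership test against `group("binary_expression")` does. That is the failure mode and the remedy the commit describes.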
code-ranker
Verdict: No new violations vs the baseline. 🎉
📦 Full HTML report: see the code-ranker-report artifact on this run.
Codecov Report
❌ Patch coverage is …
Additional details and impacted files
@@ Coverage Diff @@
## main #14 +/- ##
==========================================
+ Coverage 96.30% 97.50% +1.19%
==========================================
Files 63 119 +56
Lines 9070 13932 +4862
==========================================
+ Hits 8735 13584 +4849
- Misses 335 348 +13
…t mod.rs
Audit every language against languages/README.md and resolve the findings.
All behaviour-preserving — goldens unchanged; make all green (coverage 94.73%).
go - go.mod `module ` directive + `go.mod` filename -> [structure]
(module_directive/module_file); drop dead `import_spec` key.
markdown - code-fence + link markers -> [markers] (fences/link_open/url_scheme/
mailto_prefix); drop unused detect_markers.
ecmascript - `@/` source-root import alias -> `alias_prefix` config; fix stale
`ecmascript.toml` comments (file is config.toml) in js/ts/dialect.
cfamily - `#include` keyword -> [structure].include_directive (c & cpp);
drop unused detect_markers in c/cpp/csharp.
rust:
- mod.rs is now wiring-only (445 -> 213 lines): extract toolchain probing
(toolchain.rs), the cargo-metadata driver (analyze.rs), the test-attr
predicate (test_attr.rs, shared with module_graph/walk.rs -- was duplicated)
and the test-stripper + file metrics (strip.rs).
- crate-root tie-break filename `lib.rs` -> config (`crate_root_file`).
- resolve.rs dedup keys on the EdgeKind enum (now Hash) instead of its Debug
string.
c/cpp/csharp: delegate metrics()/function_units() to file_metrics/function_nodes
free fns (match the go/python pattern; was inline in the trait methods).
Verified: cargo test -p code-ranker-plugins (197) + --test e2e (37) + make all.
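The `resolve.rs` change above (dedup keyed on the `EdgeKind` enum rather than its `Debug` string) can be illustrated as follows. This is a sketch under assumptions: the variants are borrowed from edge names mentioned elsewhere in this PR (`uses`, `contains`, `reexports`), and `Edge` / `dedup_edges` are hypothetical stand-ins for whatever `resolve.rs` actually defines.

```rust
use std::collections::HashSet;

/// Illustrative edge kinds; deriving `Hash` + `Eq` lets the enum itself be
/// part of a set key, with no `format!("{:?}", kind)` detour.
#[derive(Debug, Clone, Copy, PartialEq, Eq, Hash)]
pub enum EdgeKind {
    Uses,
    Contains,
    Reexports,
}

#[derive(Debug, Clone, PartialEq, Eq)]
pub struct Edge {
    pub from: String,
    pub to: String,
    pub kind: EdgeKind,
}

/// Keep the first occurrence of each (from, to, kind) triple, preserving order.
pub fn dedup_edges(edges: Vec<Edge>) -> Vec<Edge> {
    let mut seen: HashSet<(String, String, EdgeKind)> = HashSet::new();
    edges
        .into_iter()
        // `insert` returns false when the key was already present.
        .filter(|e| seen.insert((e.from.clone(), e.to.clone(), e.kind)))
        .collect()
}
```

Keying on the enum avoids allocating a string per edge and keeps the dedup correct even if the `Debug` output is ever changed.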
…ete gate
File-walk ignore (the user-facing `[ignore]` config):
- Add `gitignore` / `ignore_files` / `hidden` to `[ignore]` (CLI IgnoreConfig + `--set` overrides), all default true, threaded via PluginInput.
- New shared `crate::walk` (the `ignore` crate) replaces six duplicated `walkdir` loops across go/python/markdown/csharp/c-cpp/ecmascript; honours .gitignore (+ global + .git/info/exclude) / .ignore / hidden, scoped to the analyzed root (`parents(false)`) so an enclosing repo's rules never leak in. Git-faithful (only inside a repo) → e2e samples (non-git) unaffected, goldens unchanged. The Rust plugin resolves files via cargo metadata, so it is exempt.
- +4 unit tests for the walker (gitignore / .ignore / hidden / skip_dirs).
JS/TS ext->grammar cleanup:
- JavaScript: every extension maps to one grammar, so the per-ext match is redundant — collapse to a constant; the `extensions` config is the single source of truth. Incidentally fixes `.cjs` being dropped in `analyze` (it was in `extensions` + metrics but missing from the analyze grammar match).
- TypeScript: dedupe the three identical match arms into one `grammar_for`; the only literal left is `tsx` (it alone selects a different grammar TYPE).
Dependency hygiene:
- Remove unused deps surfaced by cargo-machete: `walkdir` (workspace + cli + plugins), `which` (cli), `anyhow` (graph).
- Add a `machete` target to `make lint` (hence `make all`) and a `cargo machete` step to CI `Test & lint` — fails on any unused dependency. Detect-only; `make machete-fix` removes. False positives whitelist under `[package.metadata.cargo-machete]`.
Docs: config.md ([ignore] keys), PRD §5 + DESIGN (shared walker + .gitignore), contrib/ci.md (machete in the lint chain + CI job).
Verified: make all green (workspace 201 + e2e 37, clippy, machete, lychee, markdownlint, self-check, coverage 94.66%).
…lf-registering plugins
Config defaults now live entirely in embedded TOML and are deep-merged with the user's config, so a project overrides only what it spells out and inherits the rest. No default value is hardcoded in Rust.
- CLI: new embedded config/defaults.toml is the single source of project-config defaults; load() deep-merges discovered/--config over it. Config::default() is just defaults.toml parsed; dropped the hardcoded IgnoreConfig/CycleRules Default impls and the report.rs path consts.
- Plugins: explicit inheritance chain defaults.toml ⊕ [base] ⊕ <lang> via load_chain — ecmascript is now the real base for js/ts, new cfamily/config.toml the base for c/cpp (each carries only its diffs).
- Graph/plugins: field-omission defaults (value_type/omit_at, roles one_named) sourced from [defaults] in builtin.toml / defaults.toml instead of Rust literals.
- New `report --export-full-config PATH`: dumps the full effective config ([project] = defaults ⊕ --config, [plugin] = merged language config).
Plugins self-register via inventory::submit!; the CLI works only through code_ranker_plugin_api::registry() + the LanguagePlugin trait and never names a language. Moved the generic TOML utilities (deep_merge → toml_merge, the list-override DSL) into code-ranker-plugin-api so nothing reaches into the plugins crate; its only mention in the CLI is the `extern crate code_ranker_plugins as _;` link-anchor that lets the linker collect the submissions.
Docs (PRD/DESIGN, cli CLI/config/DESIGN, customization) synced to all of the above.
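The "defaults ⊕ user config" layering described in this commit can be sketched without any TOML dependency. Everything here is an assumption made for illustration: `Value` is a tiny stand-in for a TOML value, and `deep_merge` / `merge_layers` show the general table-by-table merge idea, not the project's `toml_merge` implementation (which also handles the list-override DSL).

```rust
use std::collections::BTreeMap;

/// Minimal stand-in for a TOML value (an assumption to stay dependency-free).
#[derive(Debug, Clone, PartialEq)]
pub enum Value {
    Bool(bool),
    Int(i64),
    Table(BTreeMap<String, Value>),
}

/// Convenience constructor for a table value.
pub fn table(entries: Vec<(&str, Value)>) -> Value {
    Value::Table(entries.into_iter().map(|(k, v)| (k.to_string(), v)).collect())
}

/// Merge `over` into `base`: tables merge key by key, anything else is replaced.
pub fn deep_merge(base: &mut Value, over: Value) {
    match (base, over) {
        (Value::Table(b), Value::Table(o)) => {
            for (k, v) in o {
                if let Some(slot) = b.get_mut(&k) {
                    deep_merge(slot, v); // key exists on both sides: recurse
                } else {
                    b.insert(k, v); // key only in the override: add it
                }
            }
        }
        (slot, v) => *slot = v, // scalars (or mismatched shapes): override wins
    }
}

/// Fold layers over the defaults in order, so the last layer wins.
pub fn merge_layers(defaults: Value, layers: Vec<Value>) -> Value {
    layers.into_iter().fold(defaults, |mut acc, layer| {
        deep_merge(&mut acc, layer);
        acc
    })
}
```

With defaults of `ignore = { gitignore = true, hidden = true }`, a layer that sets only `ignore.hidden = false` leaves `gitignore` inherited, which is the "override only what it spells out" behaviour the commit describes.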
…nalyze level arg, PluginInput.options The trait's is_test_path was never called in production: test-file dropping happens inside each plugin's analyze (via free *_is_test_path predicates used by the walk; Rust drops tests/benches via cargo-metadata target kinds). Remove the trait method + 8 impls; keep the free functions (still tested directly). Rust's orphaned test_dirs config + its test go too. analyze's level param was always "files" and ignored by every impl (functions are served by function_units), so drop it. PluginInput.options + the Options alias were never read (only constructed empty), so drop them. Tidy: ecmascript_is_test_path no longer re-exported (now pub(super)); stale doc refs updated in CLI config + root PRD/DESIGN.
Overview reveal-depth now reaches a files level one step past the deepest folder grouping: the deepest single folder level is kept as clusters (subgraph cluster_files_N) but each is drawn with its real file nodes inside, edges remapped file-to-file. Driven by maxUnderCrateDepth / filesDig / isFilesDig; the dig ceiling auto-extends to it. DIG_MAX 6->8. Fan-in/out connector arrows now land on a circular node's ellipse border via nodeEdgePoint (measuring the <ellipse>, not the group bbox) instead of the circumscribed square's corner; box nodes keep the bbox-edge behaviour. Docs: viewer DESIGN/PRD updated; the P2 'individual files inline in the overview' item is now implemented.
plugin.rs mixed audiences: the parser contract (trait + PluginInput + its
self-registration wiring) and two foreign concerns — prompt-generator data
(Preset) and a project-detection helper (detect_with_marker). That made it the
HK hub: 19 modules reached in, 6 only for Preset, never to parse.
Split off the foreign concerns along the audience seam:
plugin.rs -> trait LanguagePlugin + PluginInput + PluginRegistration/registry
preset.rs -> Preset (prompt-generator domain; read by reporting)
detection.rs -> detect_with_marker (generic project detection)
PluginInput and the registry stay with the trait: PluginInput is the contract's
own argument type, and PluginRegistration/registry() are typed on dyn
LanguagePlugin — the trait's own discovery wiring, same actors. All re-exported
at the crate root, so call sites read code_ranker_plugin_api::{Preset,
detect_with_marker, PluginRegistration, registry}.
HK(plugin.rs): 613.7K -> 311.0K (sloc 68->60, fan_in 19->12). preset.rs has
fan_in 18 but fan_out 0, so its HK is 0 — the coupling dissolves, not relocates.
Docs: DESIGN.md (module layout, Preset location, trait contract); ai-skill.md
(absolute link to the HK principle); principles/rust/henry-kafura-coupling.md
(diagnose-and-split workflow: measure one file, list fan_in/fan_out, find mixed
scenarios, split, verify with a before/after diff report). No behaviour change;
goldens unchanged.
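The HK figures quoted in this commit can be checked against the formula this PR later gives for `[fields.hk]`, `sloc * pow(fan_in * fan_out, 2.0)`. In the sketch below, `sloc` and `fan_in` come from the commit message; the `fan_out` values (5 before, 6 after) are not stated there and were back-solved from the reported totals, so treat them as an inference.

```rust
/// Henry–Kafura coupling, following the `[fields.hk]` formula quoted in this
/// PR: `sloc * pow(fan_in * fan_out, 2.0)`. Integer arithmetic is used here
/// for exactness; the real engine evaluates a CEL expression.
pub fn hk(sloc: u64, fan_in: u64, fan_out: u64) -> u64 {
    sloc * (fan_in * fan_out).pow(2)
}
```

With the inferred fan-outs, `hk(68, 19, 5)` gives 613,700 (the reported 613.7K) and `hk(60, 12, 6)` gives 311,040 (the reported 311.0K). Any file with `fan_out = 0`, such as `preset.rs`, scores 0 regardless of its fan-in, which is why the commit says the coupling dissolves.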
… TOML
Hardcoded human text in the CLI violated 'all vocab/prompt text lives in
config'. Move it to the data-driven catalogs, behind a new spec field, and have
the CLI read it from the snapshot — so the CLI names no metric and stores no
prose, and a language can reword diagnostics with no Rust change.
- New `remediation` spec field (the check `fix` line) on AttributeSpec /
CycleKindSpec / FieldDef / MetricDef.
- builtin.toml gains the metric `why`/`fix` (remediation on cyclomatic/cognitive),
the coupling specs (fan_in/fan_out/hk/cycle, migrated out of coupling_specs()
in lib.rs), a [cycles.*] vocab catalog, and a [prompt] scaffolding block.
- The orchestrator overlays cycle vocab onto each level's cycle_kinds; structural
metrics (loc/items/unsafe/headings/links/…) carry description+remediation in
their language configs.
- CLI: delete the RULES prose catalog; rule_doc() resolves why/fix/title from the
snapshot's node_attributes + cycle_kinds (works on snapshot input too); threaded
into human + SARIF diagnostics. Concern codes (CPX/CPL/SIZ) + rule_tuning stay.
- Prompt scaffolding: a PromptTemplate carried in the snapshot, sourced from
builtin.toml [prompt], read by BOTH the CLI compose_prompt and the viewer JS
composePrompt — single source, {id} substituted at render.
- Regenerated all 9-language e2e goldens; synced DESIGN/PRD/node_schema/ERRORS/
viewer/customization docs.
make all + make e2e green (coverage 94.99%); behaviour unchanged.
Add [rules.checks.<id>] — a CEL boolean `when` predicate evaluated per file as a
second pass over the fully-built graph, producing `check.<id>` violations like a
threshold/cycle rule. The predicate sees node attributes, derived path fields
(path/name/stem/ext/dir), graph edges (deps/rdeps + depends_on/depended_on_by),
file collections (files/siblings + file_exists), string fns (ends_with/…/matches)
and CEL list macros. [rules.defs] adds reusable named helpers expanded into `when`
(cycle-checked). No engine/CEL change for callers — rides the existing check path.
Unify the report/ui vocabulary end to end: LevelUi/JSON fields drop the `_metrics`
suffix (card/size/filter/sort/summary) so config [report] keys = JSON ui keys = JS;
the metric catalog merges [tableview]/[cardview]/[mapview]/[report.json.aggregate]
into one [report] (+ [report.stats]) mirroring the project override, and `featured`
becomes `card`. The SVG status bar now reads ui.card (was hardcoded hk/sloc); dead
tooltip renderers removed. Snapshot schema_version 2 → 3 (field rename is breaking).
Fix: external-crate dependency labels resolved to their cargo-registry path, so
depends_on("ext:<crate>") never matched — external nodes now keep their ext:<name>
id for matching.
Docs: customization/README §1.8, cli config.md/ERRORS.md/PRD.md, root PRD/DESIGN,
node_schema, e2e synced; runnable examples linters-example.toml + architecture-linters.toml.
Claude-Session: https://claude.ai/code/session_013R7NfedZh9uEkUQrSRGWR7
…/types/traits)
The Rust plugin now emits, per file node, six comma-joined string attributes from
its production AST (test items excluded): `derives` / `macros` / `attrs` /
`imports` ("uses X" sets) and `types` / `traits` (names defined in the file). The
module-graph syn walk already visited derives/uses/attributes; a FactsCollector
gathers them (imports reuses the bare-path collector) and collapse emits them.
This makes the symbol-level dylint rules — which edges alone could not reach —
expressible as config [rules.checks] via contains()/matches(): no serde/ToSchema
derive in a layer (DE0101/0102), no api_dto attr (DE0104), no schema_for! macro
(DE0110), no print macros (DE1301), DTO/Client naming (DE0201/0503). Added to
docs/customization/architecture-linters.toml.
String-valued so they stay off the numeric table/scorecard surfaces; specced in
rust/config.toml [node_attributes]. Rust-only (like unsafe/items) — rust golden
regenerated; other languages unaffected.
Docs: node_schema.md, cli config.md, customization/README.
Claude-Session: https://claude.ai/code/session_013R7NfedZh9uEkUQrSRGWR7
…r deps
Make the two CEL contexts (metric formulas, check predicates) speak the same
language, and document it.
Engine:
- Drop the custom string stdlib (ends_with/starts_with/contains/matches) — cel-rust
ships these natively (contains/startsWith/endsWith/matches regex, as method or
function). Removes the direct `regex` dependency (kept transitively via `cel`).
- Math host functions (pow/log2/sqrt/…) now registered in checks too, not just
metrics — shared `register_math`.
- Path fields (path/name/stem/ext/dir) now available in metric formulas, not just
checks — so a formula can branch on location (`path.contains("/generated/") ? 0 : hk`).
Shared path logic extracted to a `nodepath` module.
- `agg(metric, reducer, population)` now usable inside check predicates for
relative thresholds (node vs project distribution, e.g.
`cyclomatic.double() > agg('cyclomatic','p90','not_empty')`), memoized per run
so each distinct aggregate is reduced once, not once per file.
Docs:
- New docs/customization/cel-reference.md — full agent-oriented CEL reference
(language, cel-rust stdlib, what's in scope in metrics vs checks).
- Remove the linter example configs (architecture-linters/linters-example); the
config-only-linter direction is not being pursued. README/config.md updated to
native CEL names and the new reference.
Claude-Session: https://claude.ai/code/session_013R7NfedZh9uEkUQrSRGWR7
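The per-run memoization of `agg(metric, reducer, population)` described above can be sketched as a small cache. The names `AggCache` and `percentile_nearest_rank` are hypothetical, only a `p90` reducer is modelled, the nearest-rank percentile method is an assumption, and `population` is treated as an opaque part of the cache key because the commit does not define how populations filter nodes.

```rust
use std::collections::HashMap;

/// Per-run cache for aggregates keyed by (metric, reducer, population).
pub struct AggCache<'a> {
    values: &'a HashMap<String, Vec<f64>>, // metric name -> per-node values
    memo: HashMap<(String, String, String), f64>,
    pub reductions: usize, // how many times a reducer actually ran
}

impl<'a> AggCache<'a> {
    pub fn new(values: &'a HashMap<String, Vec<f64>>) -> Self {
        Self { values, memo: HashMap::new(), reductions: 0 }
    }

    /// Return the aggregate, reducing at most once per distinct key.
    pub fn agg(&mut self, metric: &str, reducer: &str, population: &str) -> Option<f64> {
        let key = (metric.to_string(), reducer.to_string(), population.to_string());
        if let Some(hit) = self.memo.get(&key) {
            return Some(*hit);
        }
        let mut xs = self.values.get(metric)?.clone();
        let result = match reducer {
            "p90" => percentile_nearest_rank(&mut xs, 90)?,
            _ => return None, // other reducers are out of scope for this sketch
        };
        self.reductions += 1;
        self.memo.insert(key, result);
        Some(result)
    }
}

/// Nearest-rank percentile: the value at rank ceil(p/100 * n), 1-based.
fn percentile_nearest_rank(xs: &mut [f64], p: usize) -> Option<f64> {
    if xs.is_empty() {
        return None;
    }
    xs.sort_by(|a, b| a.total_cmp(b));
    let rank = (p * xs.len() + 99) / 100; // integer ceiling of p*n/100
    Some(xs[rank.clamp(1, xs.len()) - 1])
}
```

A check predicate evaluated once per file can then call `agg` freely: the first call pays for the sort, later calls for the same key are a hash lookup.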
`--config <file>` is repeatable: previously only the first file was read and the rest silently ignored (`files.first()`). Now every file is deep-merged over the built-in defaults in command-line order, last wins; inline `KEY=VALUE` applies after all files; passing any file still disables auto-discovery of `code-ranker.toml`. `discover_user_table` → `discover_user_tables` returns the ordered layers; the log line shows the merge order (`a.toml ⊕ b.toml`).
Docs:
- config.md — "Multiple --config files" section; a "full layer stack (with sources)" table linking each built-in layer on GitHub (plugin base ⊕ family ⊕ <lang>, metric catalog, project defaults); the big example renamed to `code-ranker-rust-example.toml` with a note that auto-discovery needs the literal name `code-ranker.toml` and that a real config is partial.
- CLI.md — `--config` flag + precedence list updated for repeat/order.
Coverage: the new explicit-multi-file branch is unit-tested (`load_layers_multiple_config_files_in_order_last_wins`). The retyped discovery lines (cwd/workspace `code-ranker.toml`, `Cargo.toml` metadata) stay uncovered by design — they read the process cwd / real filesystem and can't be isolated in a unit test (the repo's own ./code-ranker.toml wins cwd discovery first).
Claude-Session: https://claude.ai/code/session_013R7NfedZh9uEkUQrSRGWR7
…-lens presets
- check: add --focus <PATH> — gate a subset of files (whole project still analyzed)
- check: add --output-format prompt — emit an AI fix-prompt from the gate's own violations (one command gates and, on failure, prints the prompt)
- report: replace --preset with --metric (narrow the scorecard by ranking axis); --output.prompt is auto-targeted and requires --top 1
- presets: remove the 4 metric-lens presets (HK/SLOC/FANIN/FANOUT) and the `slug` field — a preset's doc_url now resolves from its `id`
- principles: per-metric fix-prompt docs (HK/Fan-in/Fan-out/Cognitive/Cyclomatic/Cycles) with industry-case filenames; merge mutual/chain → Cycles; remediation now points to the metric's principle doc
- docs: move what-is-cycle → docs/cycles.md; delete redundant metric-thresholds & module-size; add USE-CASES.md; sync CLI/PRD/DESIGN/config/viewer/customization
Claude-Session: https://claude.ai/code/session_013R7NfedZh9uEkUQrSRGWR7
…splits
Tighten the dogfood gate in code-ranker.toml (hk, cyclomatic, cognitive) and
bring the codebase under it without behaviour change:
- Extract large inline `#[cfg(test)]` blocks into sibling `*_test.rs` files
(wired via `#[cfg(test)] #[path = …] mod tests;`), satisfying the new
`inline_tests_too_large` custom check.
- Split files over the per-file cyclomatic budget by relocating cohesive
function groups into sibling submodules:
registry.rs -> registry/model.rs (leaf data types) + registry/eval.rs
check.rs -> check/values.rs
pipeline.rs -> pipeline/helpers.rs
ecmascript/structure.rs -> ecmascript/imports.rs
Shared types/statics moved DOWN into leaf modules (model.rs; MODULE/
file_to_mod_path into imports.rs) so the parent depends on the leaf one-way —
avoiding the parent<->child ADP cycle a naive extraction would create.
Docs:
- principles/rust/HK.md: add "When a hub is legitimate (accept, don't game)".
- principles/rust/Cyclomatic.md & Cognitive.md: were stray copies of KISS.md;
rewrite as the real metric principles, incl. the split/cycle-trap workflow
that the cyclomatic/cognitive `remediation` links now point to usefully.
make all green: build, test (cli/graph/plugins/e2e), clippy, lint-md,
self-check (0 violations), coverage >=90%.
Claude-Session: https://claude.ai/code/session_013R7NfedZh9uEkUQrSRGWR7
…g; tighten gate
Viewer — restore the per-level switcher (was a `updateFilesTab(){}` stub):
- `[levels] functions = true` already produces a `functions` graph level in the
JSON; the HTML viewer now exposes it. `updateFilesTab` builds a `.report-switch`
tab row + clones the `files` `.view` per extra level (rendering is already
level-agnostic). Hidden for single-level reports, so they look unchanged.
check — `--suggest-config` is now fully data-driven: the threshold metrics come
from `config::metrics` × the snapshot's attribute specs (numeric, not
higher-better), in report-column order — no hardcoded metric array.
Bring the codebase under the tightened dogfood gate (hk=500K, cyclomatic=100,
cognitive=70, sloc=400) via behaviour-preserving, cycle-safe module splits:
cognitive: scorecard / collapse / python structure (lift nested blocks)
cyclomatic: config/load→load/overrides, walk→visitors, check→check/sarif
sloc: pipeline→pipeline/assemble, walk→visitors, checks→checks/text,
python/structure→python/imports (shared MODULE moved down to a leaf)
Docs: viewer PRD/DESIGN document the level switcher; config.md notes the viewer
surfaces the `functions` level.
make all + make e2e green; self-check 0 violations; coverage ≥90%.
Claude-Session: https://claude.ai/code/session_013R7NfedZh9uEkUQrSRGWR7
…--focus-path
Advisory tiers now derive from the `check` gate, not language calibration:
- Remove the language-calibrated `[thresholds.*]` system (plugin `thresholds()`,
`ThresholdCfg`, rust `[thresholds]`). `AttributeSpec.thresholds` is now built
from `[rules.thresholds.file]`: the limit is the `warning` tier, a `[metrics.<k>]`
`info` is kept only when below it. Scorecard/viewer/prompt show exactly what the
gate enforces; an unconfigured metric has no breaches (drop the `{0,0}`/HK-fallback
that read "everything breaches").
- Tune the project's own `hk` gate 500K→350K (ratchet just above plugin.rs + ~1
language of growth); document the rationale.
Recommendation focus, mirroring `check`'s flag family:
- Replace report `--metric`/`--focus` with `--focus-rule` (a metric `hk`, a full
rule id `threshold.file.hk`, or a principle id `LSP`; metrics matched by value)
and add `--focus-path` (restrict ranked modules to a subtree; cycles stay global).
Applies to both scorecard and prompt.
- A metric focus frames output by the metric itself (no SOLID wrapper) via a
synthesized preset; fix the duplicate metric description in the metric-lens prompt.
Docs & principles:
- Delete `principles/rust/Cycles.md` (a verbatim copy of `ADP.md`); repoint the
cycle metric remediation → `ADP.md`; drop stale cross-refs.
- Add `docs/installation.md` (extracted from README) with a "which channel" guide;
slim the README install section; sync CLI/USE-CASES/PRD/DESIGN/config/ai-skill.
Move pipeline tests to `pipeline_test.rs` (inline block exceeded the TST gate).
Regenerate goldens; add unit tests for resolve_focus / synth_metric_preset /
in_focus / metric-lens prompt / principle-focus scorecard.
Claude-Session: https://claude.ai/code/session_013R7NfedZh9uEkUQrSRGWR7
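The tier rule from this commit (the gate limit becomes the `warning` tier, and a metric's `info` value is kept only when it lies below that limit) fits in a few lines. `Tiers` and `build_tiers` are hypothetical names for this sketch; the real `AttributeSpec.thresholds` shape is not shown in the PR text.

```rust
/// Advisory tiers derived from the `check` gate for one metric.
#[derive(Debug, Clone, PartialEq)]
pub struct Tiers {
    pub warning: f64,      // the `[rules.thresholds.file]` limit
    pub info: Option<f64>, // the `[metrics.<k>]` info value, if it is below the limit
}

/// Build the tiers for a metric. An unconfigured metric (no gate limit) has
/// no tiers at all, so nothing can breach it.
pub fn build_tiers(gate_limit: Option<f64>, info: Option<f64>) -> Option<Tiers> {
    let warning = gate_limit?;
    Some(Tiers {
        warning,
        // Drop an `info` tier that would sit at or above the warning tier.
        info: info.filter(|i| *i < warning),
    })
}
```

For example, a gate of 350000 with an `info` of 100000 keeps both tiers, while an `info` of 400000 would be discarded because it is not below the gate.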
… doc inheritance
Rename the principle/metric doc corpus principles/ → languages/ and add a
shared base/ corpus that languages without their own docs inherit, mirroring
the defaults.toml ⊕ <lang>.toml config inheritance.
- specs.rs: doc_url resolves to {doc_base}/{doc_lang}/{id}.md for the ids a
language overrides (doc_overrides = "*" or [ids]), else {doc_base}/base/{id}.md
- defaults.toml: doc_base → .../main/languages; rust/python/typescript/javascript
declare doc_overrides="*"; go/c/cpp/csharp/markdown inherit base/ (fixes their
previously dead /principles/<lang>/ doc links)
- builtin.toml remediation URLs → languages/base/ (neutral metric docs)
- base/ seeded from rust/, then fully de-Rust-ified into language-neutral prose
- goldens, dialect/config tests, docs (PRD/DESIGN/CLI/viewer/node_schema/
metric-correctness/config-resolution), and Makefile/markdownlint/CI globs updated
Claude-Session: https://claude.ai/code/session_013R7NfedZh9uEkUQrSRGWR7
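The doc-URL resolution rule in this commit (`{doc_base}/{doc_lang}/{id}.md` for overridden ids, otherwise `{doc_base}/base/{id}.md`) can be sketched as below. The `DocOverrides` enum and the function signature are assumptions for illustration; only the two URL templates and the `"*"` / `[ids]` forms come from the commit message.

```rust
/// How a language declares which doc ids it overrides.
pub enum DocOverrides {
    All,              // doc_overrides = "*"
    Ids(Vec<String>), // doc_overrides = ["HK", ...]
    None,             // nothing declared: inherit everything from base/
}

/// Resolve the doc URL for `id`, falling back to the shared `base/` corpus.
pub fn doc_url(doc_base: &str, doc_lang: &str, overrides: &DocOverrides, id: &str) -> String {
    let overridden = match overrides {
        DocOverrides::All => true,
        DocOverrides::Ids(ids) => ids.iter().any(|i| i == id),
        DocOverrides::None => false,
    };
    let dir = if overridden { doc_lang } else { "base" };
    format!("{doc_base}/{dir}/{id}.md")
}
```

Under this reading, a language such as `go` with no overrides resolves every id into `base/`, which matches the commit's note that go/c/cpp/csharp/markdown inherit the base docs.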
…_cel/formula_pretty/formula_js
Both config surfaces named the same three concepts differently: built-in cel/formula_human/formula_js vs user-config formula/formula_pretty/calc, and the bare `formula` key meant the executable CEL while the wire `formula` meant the display string. Collapse to one family on both surfaces:
formula_cel     executable CEL formula
formula_pretty  human-readable display formula
formula_js      JS the viewer re-runs for the live derivation line
The wire encoding (AttributeSpec.formula/calc) is intentionally unchanged, so JSON reports, the viewer, and golden fixtures are untouched. Pure rename — behaviour and emitted values are identical.
…c inheritance
Two related strands on the tier-2/doc tooling.
hk → CEL (TIER1 → graph → TIER2):
- Move Henry–Kafura from a hardcoded Rust pass to a data-driven `[fields.hk]` `formula_cel = "sloc * pow(fan_in * fan_out, 2.0)"`. `hk.rs::annotate_hk` → `annotate_coupling` (fan_in/fan_out/fan_out_external only); a new post-graph `write_derived` stage evaluates graph-derived `[fields.*]` over node attrs once coupling is present. Engines split pre-graph (`DERIVED`, raw tier-1) vs post-graph (`GRAPH_DERIVED`), partitioned by whether a formula references a graph key.
- Values are bit-identical (e2e goldens unchanged except markdown, which has no `sloc` → no `hk`, regenerated surgically).
- Cycle-safe split: the metric-writing machinery moves to `builtin/write.rs` so `builtin.rs` drops back under the `hk` regression-ratchet gate.
Manifest-based doc inheritance:
- `compose.rs` reworked: a `languages/<lang>/<ID>.md` manifest lists sections in order — `<!-- doc:base "X" [from "P1"] [to "P2"] -->` pulls (a slice of) a base section, inline `## ` sections are written verbatim, and the manifest may carry its own H1/TL;DR head (else inherits base, suffixed `(in <Lang>)`). Replaces the `_overlay/` patch model. A doc is a manifest iff it contains a `doc:base` include.
- All `rust/` principle+metric docs migrated to manifests; `python/DRY`, `typescript/OCP` too. Theory/boilerplate sections reuse base; language-specific sections (examples, crate-level, extra refs) stay inline.
- Prompt scaffolding prose now lives in `metrics/prompt.md` (internal, not a published corpus doc).
Docs synced: templates.md, DESIGN.md, CLI.md, metric-tiers/correctness.
make all + e2e green; self-check passes.
…↔doc alignment
Prompt scaffolding & rendering (CLI `recommend/prompt.rs` + viewer `export-popup.js`,
kept in parity):
- doc-note now points at the offline `code-ranker report --doc {id}` command instead
of a network URL — works without a connection. `{id}` substituted in `doc_note`
too (`metrics/prompt.md`).
- Connection lists filter to FLOW edges only (`uses`) — `contains`/`reexports` are
noise in the crossroads and don't drive HK/coupling (matches the viewer's
`edgeIsFlow`).
- Single-focus (`--top 1`): drop the repeated focus path — `in` shows `dependant:line`,
`out` shows `target (uses, line N)`; heading becomes `## Target module`. Several
foci keep the full `source:line → target`.
- Metric values print exact integers (e.g. `295488`), never abbreviated — `abbreviate`
is now a viewer-only display flag. The per-module `**Formula:**` line is dropped
(it lives in `--doc`).
prompt ↔ doc consistency:
- HK doc H1 aligned to the preset title `HK — Henry–Kafura` (base + rust manifest).
- HK `description` reworded to echo the doc TL;DR, so the prompt Summary and the doc
correspond.
- HK doc workflow uses the CLI (`--output.scorecard --focus-rule hk`, `--prompt HK`)
instead of `jq`-parsing JSON.
- Drop the duplicated own-head from manifests whose TL;DR equalled base (HK, Cognitive,
DRY, Fan-in, Fan-out) — they inherit the base head + `(in <Lang>)`.
Docs synced (templates.md, ai-skill.md, DESIGN.md). Goldens patched (doc_note +
description). make all + e2e green.
…flows
- code-ranker-cli/{CLI,DESIGN,ERRORS}.md: document the `--prompt <ID>` / `--doc <ID>`
named prompt/doc output and the maintenance-only `docs` subcommand; note the
offline `--doc` doc-read path; tighten the `hk` description.
- languages/base/{Cyclomatic,Cognitive}.md: replace the stale "dump a JSON snapshot
and inspect it" step with the native `report --output.scorecard --focus-rule
<metric>` triage (no jq), plus the function-level + HTML-viewer path for the
per-function breakdown — consistent with HK.md's "no snapshot to parse" workflow.
Claude-Session: https://claude.ai/code/session_013R7NfedZh9uEkUQrSRGWR7
`collapse.rs` showed 0% line coverage because it is only exercised by the subprocess-based e2e tests (llvm-cov can't see a spawned binary). Add direct unit tests for the pure `collapse_to_files` transform — module/inline folding, external-crate nodes, edge re-pointing (drop crate→crate and self edges), the file-backed source-of-truth rule, and reexport visibility — lifting it to ~92%. Claude-Session: https://claude.ai/code/session_013R7NfedZh9uEkUQrSRGWR7
- Cargo.toml/Cargo.lock: workspace version 3.0.0-alpha.1 → 3.0.0 (via `make bump`; all crates inherit `version.workspace`).
- docs: drop the stale `--version 1.1.0` pin from the cargo install commands (README + installation.md) and point Docker pulls at `:latest`, so the install docs don't go stale on every bump; refresh the example-snapshot version (`1.0.0-alpha.4` → `3.0.0`) in PRD/DESIGN; drop the now-inconsistent "Pin a specific version" line from the README status.
- Makefile: `make bump` now lists doc files that mention a version (cargo `--version`, alpha/beta/rc, and `code-ranker@`/`:` npm+docker tags) so version refs the bump can't auto-edit are surfaced for manual review.
Claude-Session: https://claude.ai/code/session_013R7NfedZh9uEkUQrSRGWR7
`node-popup.js` interpolated `esc(n.id)` / `esc(node.id)` into `data-diag-node` / `data-node-id` attribute values, but `esc` (escHtml) does not escape `"` — a node whose id contains a double quote (e.g. an oddly-named analyzed file) could break out of the attribute. Use `escA` (escAttr), which also escapes `"`, for both attributes. Clears CodeQL js/incomplete-html-attribute-sanitization (#18, #19). (The high js/xss-through-dom on tooltip.js is a verified false positive — every input is escHtml-escaped before innerHTML — dismissed in code scanning.) Claude-Session: https://claude.ai/code/session_013R7NfedZh9uEkUQrSRGWR7
Add unit tests for previously-uncovered branches:
- templates.rs: resolve_doc (base/manifest/override/metric/error), build_corpus, lang_display
- config/load.rs: code-ranker.toml + Cargo.toml metadata auto-discovery
- overrides.rs: templates.languages/prompt inline keys + remaining ignore flags
- assemble.rs: default-Level fallback, grouping-with-function
- plugins/specs.rs: selective doc_overrides array form
- recommend/prompt.rs: single-focus in/out edge abbreviation
- check/values.rs + check.rs: early-return/false branches in print helpers
@@ -0,0 +1,338 @@
//! The embedded doc corpus and per-file overrides.
The dogfood gate flags >N inline test lines per file; move the new report.rs / templates.rs test modules into report_test.rs / templates_test.rs (matching check_test.rs et al.).
What this PR does
Makes the metric/report pipeline fully data-driven from project config, adds five new language plugins, and extends the viewer.
Config
- List-override DSL for inherited lists (add/remove/replace/after/before/prepend/clear), generic in `deep_merge`.
- `[report]` overrides for `columns`/`card`/`stats`/`size`/`filter` at language and project level (layered catalog → language → project); surfaces custom metrics in the UI.
- `[rules.thresholds.file]` accepts custom `[metrics.<key>]` keys, validated at load; `check` fires on them.
- `[presets.<ID>]` project Prompt-Generator presets merged over the plugin catalog.
Metrics
- `MetricDef` gains warning/info two-tier severity and a `calc` for the live-derivation line.
- `top<N>` / `top<N>_<reducer>` aggregate reducers.
New language plugins
- go: import graph from the `go.mod` module path
- c, cpp: `#include` dependency graph via a shared `cfamily` text-scan module; fixes cognitive complexity dropping `&&`/`||` (tree-sitter declares `binary_expression` on two symbols)
- csharp: `using`/`namespace` dependency graph
- markdown: link graph and doc metrics with a `broken_links` gate
Engine: adds `Dialect::args` and `Dialect::unit_name` hooks. `config.rs` is split into a `config/` facade (parse/views/specs/lookup).
Viewer
Tests & docs
- New `docs/customization/` and `contrib/unit-tests.md`.