From 836ed322679ccc3eb9559563f13544735ae2551f Mon Sep 17 00:00:00 2001
From: Johnson Chu <johnsoncodehk@gmail.com>
Date: Sat, 20 Jun 2026 13:35:18 +0800
Subject: [PATCH 01/14] Prove TextMate-generator completeness (#51)
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

Add a formal completeness proof for src/gen-tm.ts: for every grammar
expressible through the public src/api.ts combinators, every
TextMate-representable highlighting obligation is emitted and reachable.
This is the dual of the soundness ledger (KNOWN-GAPS.md, which finds
WRONG paints) — here we prove there are no MISSING ones.

The proof rests on the generator's input being a closed, finite algebra
(RuleExpr / TokenPattern), which makes "every obligation" enumerable and
reduces completeness to three mechanically-checked layers:

- closure — toRuleExpr, the token-pattern compiler, and the shared
  collectLiterals backbone are total over their unions, so no supported
  combinator shape is silently dropped;
- coverage — every non-skip token has a discharge path (the token
  census), and every content/keyword obligation leaf is painted
  (2433/2433 across the six grammars), on a fixed denominator;
- reachability — every emitted repository key is reachable from the root
  patterns or a declared export surface (#expression / canonicalRepoNames
  / aliasScopes); zero dead keys.
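
As a rough illustration of the reachability layer, the closure can be
sketched as below. This is a minimal sketch, not the checker itself: the
`Pattern` and `Repo` shapes, `reachableKeys`, and the four-key sample
repository are hypothetical and far simpler than a real TextMate grammar.

```typescript
// Hypothetical, simplified shapes; not the vscode-textmate or gen-tm types.
type Pattern = { include?: string; patterns?: Pattern[] };
type Repo = Record<string, Pattern>;

// Collect every repository key reachable from the root patterns, after
// seeding the queue with the declared export-surface keys.
function reachableKeys(roots: Pattern[], repo: Repo, exportKeys: string[]): Set<string> {
  const reached = new Set<string>();
  const queue: string[] = [...exportKeys];
  const visit = (p: Pattern): void => {
    const inc = p.include;
    if (inc !== undefined && inc.startsWith('#')) queue.push(inc.slice(1));
    if (p.patterns) p.patterns.forEach(visit);
  };
  roots.forEach(visit);
  while (queue.length > 0) {
    const key = queue.shift()!;
    if (reached.has(key)) continue;
    reached.add(key);
    const entry = repo[key];
    if (entry) visit(entry);
  }
  return reached;
}

const repo: Repo = {
  statement: { patterns: [{ include: '#keyword' }] },
  keyword: {},
  expression: { patterns: [{ include: '#number' }] }, // export surface only
  number: {},
};
const roots: Pattern[] = [{ include: '#statement' }];

const deadKeys = (exportKeys: string[]): string[] => {
  const reached = reachableKeys(roots, repo, exportKeys);
  return Object.keys(repo).filter(k => !reached.has(k));
};

console.log(deadKeys([]));             // [ 'expression', 'number' ]
console.log(deadKeys(['expression'])); // []
```

Seeding the queue with the export keys is the whole difference between
the root-only and the export-aware result in this toy case.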

The Layer-A audit surfaced one latent silent-drop:
getTypeParamElementKeywords omitted `sep`, so a keyword in a sep-list
within a type-parameter element lost its keyword role inside `<…>`. Fixed
by recursing into sep.element. `not` and `ref` stay omitted on purpose:
a word under `not` is forbidden at the site, and a constraint type's own
keywords, reached through `ref`, must not be hoisted. The six shipped
grammars stay byte-identical (the drop is latent), and the checker
carries a biting regression guard.
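
The shape of the silent drop can be reproduced on a cut-down model. This
is a sketch under stated assumptions, not the gen-tm code: `Expr`,
`isKeyword`, `hoistKeywords`, and the `recurseIntoSep` switch are
hypothetical stand-ins for the real RuleExpr union and walker.

```typescript
// Hypothetical, cut-down RuleExpr; the real union has 15 constructors.
type Expr =
  | { type: 'literal'; value: string }
  | { type: 'seq' | 'alt'; items: Expr[] }
  | { type: 'quantifier' | 'group' | 'not'; body: Expr }
  | { type: 'sep'; element: Expr }
  | { type: 'ref'; name: string };

// Assumption: a keyword is any purely alphabetic literal.
const isKeyword = (s: string): boolean => /^[A-Za-z]+$/.test(s);

function hoistKeywords(body: Expr, recurseIntoSep: boolean): string[] {
  const found: string[] = [];
  const walk = (e: Expr): void => {
    if (e.type === 'literal' && isKeyword(e.value)) found.push(e.value);
    if (e.type === 'seq' || e.type === 'alt') e.items.forEach(walk);
    if (e.type === 'quantifier' || e.type === 'group') walk(e.body);
    // The one-line fix: without this branch a `sep` sub-list is skipped.
    if (recurseIntoSep && e.type === 'sep') walk(e.element);
    // `not` and `ref` are deliberately not walked.
  };
  walk(body);
  return Array.from(new Set(found));
}

// A type-parameter element: a name, a sep-list carrying a keyword, and a
// forbidden word under `not`.
const element: Expr = {
  type: 'seq',
  items: [
    { type: 'ref', name: 'Ident' },
    {
      type: 'sep',
      element: {
        type: 'seq',
        items: [{ type: 'literal', value: 'extends' }, { type: 'ref', name: 'Type' }],
      },
    },
    { type: 'not', body: { type: 'literal', value: 'void' } },
  ],
};

console.log(hoistKeywords(element, false)); // []
console.log(hoistKeywords(element, true));  // [ 'extends' ]
```

In this model the keyword under `sep` is found only with the extra
branch, while the word under `not` stays out in both runs.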

Three sites that looked like TextMate impossibilities — variable-width
lookbehind for the cast/arrow value test, the balanced-paren arrow
confirm, and a regex after a control-flow head — were attacked and
refuted: each is expressible in vscode-oniguruma (verified), and the
fixed-width forms gen-tm emits are deliberate Onigmo-portability choices.
They are a soundness-precision axis, not a completeness gap.
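
The difference between the two lookbehind forms can be seen in a small
sketch. Assumptions: JavaScript's RegExp stands in for vscode-oniguruma
(a different engine, so this shows the semantics only), and the ladder is
shortened to k = 0..2 for display; `CLASS`, `unbounded`, and `ladder` are
illustrative names, not gen-tm output.

```typescript
// Assumption: JavaScript's RegExp, not Oniguruma; semantics only.
const CLASS = '[\\w$)\\]]'; // "ends a value": word char, $, ) or ]

// Variable-width form: no value before `<`, across any run of whitespace.
const unbounded = new RegExp(`(?<!${CLASS}\\s*)<`);

// Fixed-width ladder: one lookbehind per whitespace width k (k = 0..2).
const ladder = new RegExp(
  [0, 1, 2].map(k => `(?<!${CLASS}\\s{${k}})`).join('') + '<',
);

for (const text of ['<T>x', 'a <b', 'a   <b']) {
  console.log(JSON.stringify(text), unbounded.test(text), ladder.test(text));
}
// "<T>x" true true      (expression start: both fire)
// "a <b" false false    (one space: both suppress)
// "a   <b" false true   (three spaces: past the shortened ladder)
```

The last row is the precision difference: the `<` is matched in every
case, and only the role chosen at the boundary changes.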

COMPLETENESS.md is the proof spine; test/tm-completeness.ts mechanises it
(npm run completeness[:check]) and joins `npm run check` as a gate.
---
 COMPLETENESS.md         | 187 ++++++++++++++
 package.json            |   3 +
 src/gen-tm.ts           |   7 +
 test/check.ts           |   1 +
 test/tm-completeness.ts | 541 ++++++++++++++++++++++++++++++++++++++++
 5 files changed, 739 insertions(+)
 create mode 100644 COMPLETENESS.md
 create mode 100644 test/tm-completeness.ts

diff --git a/COMPLETENESS.md b/COMPLETENESS.md
new file mode 100644
index 0000000..99f63a3
--- /dev/null
+++ b/COMPLETENESS.md
@@ -0,0 +1,187 @@
+# Total derivation: the completeness spine
+
+Why `src/gen-tm.ts` emits *every* TextMate construct the grammar requires — not by
+testing a corpus, but because the generator's input is a **closed, finite algebra**, so
+"every obligation" is enumerable and each is discharged by a reachable emission. This is
+the dual of the soundness ledger (`KNOWN-GAPS.md`, which finds *wrong* paints); here we
+prove there are no *missing* ones. The proof is held exact by `test/tm-completeness.ts`
+and the ledger at the end.
+
+## The contract
+
+For every grammar `G` built from the public `src/api.ts` combinators and lowered through
+`defineGrammar()`:
+
+1. **Closure** — `G` is a value of the closed `RuleExpr` / `TokenPattern` algebra (plus a
+   finite set of config records). Nothing the API can express falls outside it; nothing in
+   it is unreachable from the API.
+2. **Coverage** — every highlighting obligation `G` induces (a token scope, a keyword, an
+   operator, a region, an embed, a disambiguation, a config-driven construct) is emitted by
+   `generateTmLanguage(G)`.
+3. **Reachability** — every emitted repository entry is reachable from the root patterns or
+   from a declared export surface; conversely every export surface resolves.
+
+Three separations keep the claim honest:
+
+- **Parser** completeness (does `G` accept the language?) is a *different* axis, measured by
+  the conformance run and `test/src-coverage.ts`.
+- **Highlighter** completeness (this document) is *coverage*: every obligation is
+  recognised and scoped. Whether the scope is the *right* one at an ambiguous frontier is
+  **soundness** — `test/scope-gap.ts` and `test/gap-ledger.ts`, a separate axis.
+- **TextMate-engine** expressiveness (can the regex model express the obligation at all?)
+  is bounded by Oniguruma, not by Monogram; §"The frontier" settles where that bound
+  actually lies.
+
+This is **not** the README corpus metrics (empirical agreement with external oracles). It
+is the formal derivation property: the map from the DSL grammar to the emitted grammar
+loses nothing representable.
+
+## Why a closed algebra makes this finite
+
+A TextMate grammar written by hand has no completeness theorem — there is no enumerable set
+of "everything it should match." Monogram's does, because `generateTmLanguage` consumes a
+value of a **closed union**: `RuleExpr` has 15 constructors and `TokenPattern` has 10
+(`src/types.ts`), plus the finite config records (`TokenDecl`, `MarkupConfig`,
+`IndentConfig`, `NewlineConfig`, the Pratt tables, `scopeOverrides`, `canonicalRepoNames`,
+`aliasScopes`, `expressionRule`, `manifest`). An *obligation* is induced by a
+constructor-occurrence or a config-field-occurrence. So completeness reduces to: **for each
+obligation-inducing occurrence, gen-tm has a discharging, reachable emission**, checked in three
+mechanically-checkable layers.
+
+## Layer A — closure: the universe is the algebra, and lowering is total
+
+**A1 — API lowering closure.** `toRuleExpr` (`src/api.ts`) is a total function with a finite
+case analysis ending in a `throw`: it never silently drops an element, and its image is
+exactly the `RuleExpr` union. Witnessed by instantiating *every* public combinator and
+marker into one grammar and confirming the lowered bodies use all 15 constructors and
+nothing outside them (`checkRuleExprClosure`). `formatExpr` in `src/cli.ts` is an
+independent exhaustive `switch` over the same 15 — a second guard that the union is closed.
+
+**A2 — TokenPattern compiler closure.** `tokenPatternToRegex`'s `emit` (`src/token-pattern.ts`)
+is a single `switch` over the 10 `TokenPattern` constructors with no `default` — TypeScript
+exhaustiveness makes a missing case a compile error. Witnessed by compiling every public
+token builder to a regex (`checkTokenPatternClosure`).
+
+**A3 — the literal-collection backbone is total.** Flat keyword / operator scoping is driven
+by the shared `collectLiterals` (`src/grammar-utils.ts`), looped over every rule body. It
+recurses into *all consuming* structural constructors (`seq`, `alt`, `quantifier`, `group`,
+`sep`) and omits only the ones that carry no consumed literal: `not` (a negative lookahead —
+the word is *absent* at the site) and `ref` (a cross-rule edge, collected when that rule's
+own body is walked). So no consumed literal is silently dropped, and the flat keyword
+obligation is discharged for *any* nesting. This is why a naïve end-to-end keyword probe is
+vacuous — `collectLiterals` already covers every nesting (`checkCollectLiteralsClosure`).
+
+The residual silent-drop risk therefore lives only in the **specialised region walkers** that
+do *not* use `collectLiterals` (they hoist keywords out of a derived `<…>` / region scope).
+Auditing the 48 RuleExpr walkers in `gen-tm.ts` found exactly one reachable gap:
+`getTypeParamElementKeywords` omitted `sep`, so a keyword inside a `sep`-list within a
+type-parameter element lost its keyword role inside `<…>`. No shipped grammar nests a keyword
+that way (TS type-param keywords are direct), so it was latent — but it is a *supported*
+combinator shape silently ignored, so it is **fixed** (one line: recurse into `sep.element`;
+`not` stays omitted on purpose — a forbidden word; `ref` stays unresolved so a constraint
+*type*'s own keywords like `keyof`/`typeof` are not mis-hoisted). The fix is byte-identical on
+all six shipped grammars (latent), and the `kwsep` probe in `regionKeywordProbe` is a biting
+regression guard (it fails without the fix).
+
+## Layer B — coverage: every obligation has a reachable discharge
+
+The obligation families, enumerated from `G`'s closed algebra **independently of gen-tm's own
+detectors** (a detector that missed a shape would otherwise also miss its obligation —
+co-blind):
+
+- **Tokens.** Every non-`skip` token bears a leaf-scope obligation, discharged by exactly one
+  family: the flat token loop (a `#<name>` entry), the regex-literal family (`regex`-flagged),
+  the indent/markup engine (a `never()` placeholder the region machinery replaces), the markup
+  region machinery (a `markup` grammar emits no per-token keys), or a region that owns the
+  token's delimiter (the JSX `/>` / `</` punctuation). `tokenCensus` classifies every token and
+  asserts **zero orphans** — the emitter-completeness proof for tokens.
+- **Keyword literals & Pratt operators** are discharged through the flat backbone (A3) and the
+  prec-table path; the `op`/`prefix`/`postfix` markers carry no literal (they route to
+  `collectLiterals`' default), and an operator's scope comes from the prec-table value, not from
+  a walked marker — so those three constructors being unwalked anywhere is *benign*, confirmed
+  by adversarial review.
+- **Shapes** (JSX elements, generic/cast angle brackets, regex context, declarations, ternary,
+  conditional types, arrow params, contextual operators/modifiers) and **config surfaces**
+  (markup, indent, newline, `expressionRule`, `aliasScopes`, `canonicalRepoNames`, `manifest`,
+  `inject`) each emit a family of repository entries; that the detectors fire on the *shape*
+  rather than on TypeScript-specific names — the detector-completeness requirement — is held by
+  `test/agnostic.ts` (synthetic grammars with deliberately non-TS names/delimiters).
+
+The empirical witness that all of the above actually paint is **leaf coverage**: over the
+deterministic grammar-derived corpus (`test/grammar-gen.ts`), every parsed leaf whose
+by-construction role (`buildRoleMap`) is a content/keyword role (keyword / string / number /
+comment) is confirmed to receive a non-root scope. The denominator is fixed (the obligation
+leaves); the metric is non-vacuous (deleting a discharging repository key drops it below 100%).
+Result: **2433/2433 across all six grammars.**
+
+## Reachability — root ∪ export surfaces
+
+Reachability is the transitive `#include` closure from the root `patterns`, **plus the declared
+export surfaces** an external embedder reaches by a `<scope>#<key>` include: the `#expression`
+sub-grammar (`expressionRule`), the `canonicalRepoNames` official keys, and `aliasScopes`. These
+are root-unreachable *by design* — they are the grammar's public repository API. A naïve
+root-only reachability check flags ten keys as dead; the export-surface-aware closure flags **zero**.
+A `canonicalRepoNames` entry whose structural *source* is absent in a shared map (e.g. `type`/
+`new-expr` in JavaScript, which has no type layer; `cast` in `.tsx`, where `<T>expr` is JSX, so
+only `as`-casts exist) induces no obligation and is correctly inert — distinct from a dangling
+reference with a *present* source, of which there are none.
+
+## The frontier — no proven impossibility
+
+Three sites looked like TextMate impossibilities; under adversarial attack (the project's
+discipline: a "can't" must survive a real attack before it is recorded), all three were
+**refuted** with constructions tested in the production engine:
+
+- **Cast/arrow "not after a value" across unbounded whitespace.** gen-tm emits a fixed-width
+  negative-lookbehind ladder (`\s{k}`, k=0..16). The exact unbounded condition is a single
+  variable-width lookbehind `(?<![\w$)\]]\s*)`, which **vscode-oniguruma compiles and runs
+  correctly** (verified: suppresses `a   <b`, fires at expr-start). An Onigmo-portable
+  alternative also exists (a region that *owns* the post-value whitespace boundary instead of
+  looking behind it).
+- **Arrow param list with nested parens** `(a = f(1)) => x`. The single-level `[^()]*` lookahead
+  breaks at the inner `(`, but Oniguruma's recursive subroutine `(?<P>\((?:[^()]|\g<P>)*\))`
+  matches balanced parens at arbitrary depth in a begin lookahead (verified to compile + match).
+- **Regex after a control-flow head** `if (a) /re/`. A variable-width positive lookbehind
+  `(?<=\b(?:if|while|for|with)\s*\([^()]*\))` (or the recursive form for nested heads)
+  **compiles and matches in vscode-oniguruma** (verified: matches `if (a) /`, not `a / b`).
+
+So none of these is a model impossibility. Each is (a) directly expressible in vscode-oniguruma
+— the engine VS Code actually runs — and (b) approximated by a fixed-width form *deliberately*,
+for **Onigmo portability** (RedCMD's YAML grammar runs under Onigmo, which rejects variable-width
+lookbehind; the same source must compile under both, see `test/redcmd-tm-diagnostics.ts`). And
+each is a **soundness-precision** matter, not a completeness gap: the `<` / `/` / arrow *is*
+recognised and scoped; what is refined at the frontier is *which* role at the ambiguous boundary.
+Improving that precision (var-width forms for the `vscode-oniguruma`-only grammars, `\g<>` for the
+arrow region) is a separate, soundness-gated change. **The completeness obligation is discharged.**
+
+## The proof ledger
+
+The fixed denominator is every measured obligation (token discharge + repository reachability +
+leaf painting), summed across the six grammars; the numerator is the discharged count.
+Auto-generated by `node test/tm-completeness.ts --write`; `--check` fails CI if it is stale.
+
+<!-- COMPLETENESS-LEDGER:START — auto-generated by `node test/tm-completeness.ts --write`; do not edit by hand. -->
+
+| Grammar | Tokens | Keyword literals | Operators | Repo keys (reachable) | Leaf obligations (painted) |
+|---|---:|---:|---:|---:|---:|
+| typescript | 11/11 | 73 | 53 | 158/158 | 199/199 |
+| javascript | 11/11 | 48 | 51 | 103/103 | 131/131 |
+| typescriptreact | 13/13 | 73 | 53 | 171/171 | 169/169 |
+| javascriptreact | 13/13 | 48 | 51 | 116/116 | 121/121 |
+| html | 7/7 | 0 | 0 | 28/28 | 175/175 |
+| yaml | 19/19 | 0 | 0 | 54/54 | 1638/1638 |
+| **total** | **74/74** | **242** | **208** | **630/630** | **2433/2433** |
+
+**Fixed-denominator completeness: 3137/3137 = 100.00%** (token discharge 74/74 · repository reachability 630/630 · leaf painting 2433/2433). Keyword literals (242) and Pratt operators (208) are discharged through the leaf-painting column. **0 open completeness gaps.**
+
+<!-- COMPLETENESS-LEDGER:END -->
+
+## The gates that hold this exact
+
+- `test/tm-completeness.ts` — Layer A closure (RuleExpr / TokenPattern / `collectLiterals`), the
+  `sep`-recursion regression guard, reachability, the token census, and leaf coverage with a fixed
+  denominator. `npm run completeness` prints it; `npm run completeness:check` gates the ledger.
+- `test/agnostic.ts` — detector shape-completeness: the detectors fire on structure, not on TS
+  names, so "every shape that bears the obligation is detected" holds for any grammar.
+- `test/scope-gap.ts`, `test/gap-ledger.ts` — the **soundness** axis (is each painted scope
+  correct?), the dual of this document, kept separate on purpose.
diff --git a/package.json b/package.json
index 04016c6..c578977 100644
--- a/package.json
+++ b/package.json
@@ -36,6 +36,9 @@
     "coverage:table": "node test/coverage-table.ts --write",
     "ledger": "node test/gap-ledger.ts --write",
     "ledger:check": "node test/gap-ledger.ts --check",
+    "completeness": "node test/tm-completeness.ts",
+    "completeness:check": "node test/tm-completeness.ts --check",
+    "completeness:write": "node test/tm-completeness.ts --write",
     "ledger:selftest": "node test/gap-ledger-selftest.ts",
     "ledger:issues": "node test/gap-issues.ts",
     "ledger:issues:dry": "node test/gap-issues.ts --dry-run",
diff --git a/src/gen-tm.ts b/src/gen-tm.ts
index ffd2430..46372f5 100644
--- a/src/gen-tm.ts
+++ b/src/gen-tm.ts
@@ -3110,6 +3110,13 @@ function getTypeParamElementKeywords(body: RuleExpr, grammar: CstGrammar): strin
     if (e.type === 'literal' && isKeywordLiteral(e.value)) keywords.push(e.value);
     if (e.type === 'seq' || e.type === 'alt') e.items.forEach(walk);
     if (e.type === 'quantifier' || e.type === 'group') walk(e.body);
+    // A keyword reached through a `sep` sub-list of the element is just as direct as one in a
+    // seq/alt — recurse into its element so it is hoisted too (e.g. a type-param whose constraint
+    // is a `&`-separated list carrying a keyword). `not` stays omitted on purpose: a literal under
+    // a negative lookahead is a forbidden word, not present at the site, so it bears no scope; and
+    // `ref` stays unresolved (like collectLiterals) so a constraint TYPE's own keywords — `keyof`,
+    // `typeof` — are NOT mis-hoisted to type-parameter keyword scope.
+    if (e.type === 'sep') walk(e.element);
   }
   walk(elementBody);
   return [...new Set(keywords)];
diff --git a/test/check.ts b/test/check.ts
index bb32923..3aefb9e 100644
--- a/test/check.ts
+++ b/test/check.ts
@@ -36,6 +36,7 @@ const GATES: Gate[] = [
   { group: 'conformance', name: 'jsx', args: ['test/jsx-conformance.ts'] },
   { group: 'conformance', name: 'html', args: ['test/html-conformance.ts'] },
   { group: 'highlighter', name: 'tm-guards', args: ['test/tm-highlight-guards.ts'] },
+  { group: 'highlighter', name: 'tm-completeness', args: ['test/tm-completeness.ts', '--check'] },
   { group: 'highlighter', name: 'tm-diagnostics', args: ['test/redcmd-tm-diagnostics.ts'] },
   { group: 'highlighter', name: 'angle-depth', args: ['test/angle-depth-probe.ts'] },
   { group: 'highlighter', name: 'html-monarch', args: ['test/html-monarch.ts'] },
diff --git a/test/tm-completeness.ts b/test/tm-completeness.ts
new file mode 100644
index 0000000..a62cc2b
--- /dev/null
+++ b/test/tm-completeness.ts
@@ -0,0 +1,541 @@
+// ─────────────────────────────────────────────────────────────────────────────
+//  tm-completeness.ts — the COMPLETENESS checker + ledger for src/gen-tm.ts.
+//
+//  Issue #51: prove that the TextMate generator is COMPLETE — for every grammar
+//  shape that REQUIRES a TextMate construct, gen-tm emits it AND it is reachable.
+//  This is the dual of the soundness ledger (test/gap-ledger.ts, which finds
+//  WRONG paints): here we find UN-emitted / UN-reachable obligations.
+//
+//  The proof is structural, resting on the fact that the generator's INPUT is a
+//  CLOSED, FINITE algebra (RuleExpr / TokenPattern in src/types.ts) plus a finite
+//  set of config records (TokenDecl / Markup / Indent / Newline / …). Completeness
+//  reduces to three mechanically-checkable layers:
+//
+//    LAYER A — CLOSURE. The public api.ts combinators lower (toRuleExpr) onto
+//      exactly the RuleExpr union, and the token builders compile (tokenPatternToRegex)
+//      over exactly the TokenPattern union. Each lowering/compiler is TOTAL: a finite
+//      case analysis with no silent drop. Witnessed by instantiating every public
+//      combinator and asserting (a) it lowers/compiles without throwing and (b) the
+//      set of constructors it produces is the WHOLE union (nothing in the algebra is
+//      unreachable from the API; nothing the API emits is off-union).
+//
+//    LAYER B — OBLIGATION COVERAGE. From each grammar G we enumerate Obl(G): the
+//      finite, fixed-denominator multiset of highlighting obligations induced by G's
+//      tokens / literals / operators / shapes / config. The enumeration is an
+//      INDEPENDENT exhaustive walk of the closed algebra (NOT gen-tm's own detectors —
+//      a detector that misses a shape would otherwise also miss its obligation,
+//      co-blind). Each obligation must be discharged by an emitted construct that is
+//      reachable from the root patterns OR a declared export surface.
+//
+//    REACHABILITY. Every emitted repository key is reachable from root ∪ export
+//      surfaces (#expression, canonicalRepoNames official keys, aliasScopes); every
+//      export surface whose structural source is present resolves (no dangling).
+//
+//  Run (bare node):
+//    node test/tm-completeness.ts            # print the report
+//    node test/tm-completeness.ts --check    # CI gate: fail on any open gap or stale ledger
+//    node test/tm-completeness.ts --write    # (re)write COMPLETENESS.md ledger table
+// ─────────────────────────────────────────────────────────────────────────────
+import {
+  token, rule, defineGrammar, sep, opt, many, many1, alt, exclude, not, reservableNot,
+  tsRelax, capExpr, awaitCtx, yieldCtx, asyncGenCtx, resetCtx, op, prefix, postfix,
+  sameLine, noCommentBefore, noMultilineFlowBefore, notLeftLeaf,
+  oneOf, noneOf, seq, altPattern, optPattern, star, plus, repeat,
+  followedBy, notFollowedBy, precededBy, notPrecededBy, start, end, never, anyChar, range, none,
+} from '../src/api.ts';
+import { tokenPatternToRegex, tokenPatternIsNever, tokenPatternLiteralText } from '../src/token-pattern.ts';
+import { collectLiterals, isKeywordLiteral } from '../src/grammar-utils.ts';
+import type { RuleExpr, TokenPattern, CstGrammar } from '../src/types.ts';
+import { generateTmLanguage } from '../src/gen-tm.ts';
+import { createParser } from '../src/gen-parser.ts';
+import { generateInputs } from './grammar-gen.ts';
+import { buildRoleMap, leafRoles, spanBuckets, GEN_OPTS, type TmTok, type Bucket } from './generative-detect.ts';
+import { readFileSync, existsSync, writeFileSync } from 'node:fs';
+import { createRequire } from 'node:module';
+import vsctm from 'vscode-textmate';
+import onig from 'vscode-oniguruma';
+
+let pass = 0, failN = 0;
+const fails: string[] = [];
+const check = (label: string, cond: boolean, detail = '') => {
+  if (cond) pass++;
+  else { failN++; fails.push(`✗ ${label}${detail ? ` — ${detail}` : ''}`); }
+};
+
+// ════════════════════════════════════════════════════════════════════════════
+//  LAYER A — algebra closure
+// ════════════════════════════════════════════════════════════════════════════
+
+// The closed RuleExpr union, straight from src/types.ts (the proof's universe). If a
+// constructor is added there without being produced by some api.ts combinator, the
+// closure witness below will report it as an unreachable constructor.
+const RULE_EXPR_UNION = [
+  'seq', 'alt', 'literal', 'ref', 'quantifier', 'group', 'not',
+  'sameLine', 'noCommentBefore', 'noMultilineFlowBefore', 'notLeftLeaf',
+  'sep', 'op', 'prefix', 'postfix',
+] as const;
+
+const TOKEN_PATTERN_UNION = [
+  'anyChar', 'charClass', 'seq', 'alt', 'repeat', 'lookahead', 'lookbehind', 'anchor', 'never',
+  // (bare string is the tenth variant, handled before the object switch)
+] as const;
+
+// Walk a lowered RuleExpr, collecting every constructor tag it (transitively) uses.
+function collectExprTags(e: RuleExpr, out: Set<string>): void {
+  out.add(e.type);
+  switch (e.type) {
+    case 'seq': case 'alt': e.items.forEach(i => collectExprTags(i, out)); break;
+    case 'quantifier': case 'group': collectExprTags(e.body, out); break;
+    case 'not': collectExprTags(e.body, out); break;
+    case 'sep': collectExprTags(e.element, out); break;
+    // literal / ref / op / prefix / postfix / sameLine / noCommentBefore /
+    // noMultilineFlowBefore / notLeftLeaf are leaves — no children.
+  }
+}
+
+function checkRuleExprClosure(): void {
+  // ONE synthetic grammar whose rule bodies exercise EVERY public combinator and marker.
+  // Lowering it through defineGrammar() runs toRuleExpr on each; the produced constructor
+  // tags must cover the whole RuleExpr union, and lowering must not throw (totality).
+  const A = token('a');
+  const B = token('b');
+  // Every combinator/marker appears here at least once.
+  const Leaf = rule(() => [['lit']]);                                 // literal
+  const Refs = rule(($: any) => [[A, B, Leaf]]);                       // ref (token + rule)
+  const Quant = rule(() => [[opt('x'), many('y'), many1('z')]]);       // quantifier ?,*,+
+  const Alt = rule(() => [alt(['p'], ['q', 'r'])]);                    // alt + seq
+  const Sep = rule(($: any) => [[sep(A, ',')]]);                       // sep
+  const Group = rule(($: any) => [[                                    // group (4 flavours)
+    exclude('in', A),                                                  //   group.suppress
+    awaitCtx(A), yieldCtx(A), asyncGenCtx(A), resetCtx(A),             //   group.ctxMode
+    tsRelax(A, B),                                                     //   group.tsRelaxed
+    capExpr('||', A),                                                  //   group.capBelow
+  ]]);
+  const Nots = rule(($: any) => [[not(A), reservableNot(['kw'])]]);    // not (+ reservable)
+  const Markers = rule(($: any) => [[                                  // zero-width markers
+    sameLine, noCommentBefore, noMultilineFlowBefore, notLeftLeaf('void', 'null'), A,
+  ]]);
+  const Pratt = rule(($: any) => [[$, op, $], [prefix, $], [$, postfix]]); // op/prefix/postfix
+  const Entry = rule(($: any) => [[many(alt(Leaf, Refs, Quant, Alt, Sep, Group, Nots, Markers, Pratt))]]);
+
+  let threw = false; let g: CstGrammar;
+  try {
+    g = defineGrammar({
+      name: 'closure', tokens: { A, B },
+      rules: { Leaf, Refs, Quant, Alt, Sep, Group, Nots, Markers, Pratt, Entry }, entry: Entry,
+    });
+  } catch (e) { threw = true; g = null as any; }
+  check('Lemma A1: toRuleExpr is total (no throw lowering every combinator)', !threw, threw ? 'defineGrammar threw' : '');
+  if (threw) return;
+
+  const tags = new Set<string>();
+  for (const r of g.rules) collectExprTags(r.body, tags);
+  const missing = RULE_EXPR_UNION.filter(t => !tags.has(t));
+  const extra = [...tags].filter(t => !(RULE_EXPR_UNION as readonly string[]).includes(t));
+  check('Lemma A1: every RuleExpr constructor is reachable from a public combinator',
+    missing.length === 0, missing.length ? `unreached: ${missing.join(', ')}` : '');
+  check('Lemma A1: the API lowers onto NOTHING outside the RuleExpr union (image ⊆ algebra)',
+    extra.length === 0, extra.length ? `off-union: ${extra.join(', ')}` : '');
+}
+
+function checkTokenPatternClosure(): void {
+  // Instantiate every public token-pattern builder + the bare string. Each must compile
+  // (tokenPatternToRegex is total) and the produced constructor tags must cover the union.
+  const builders: [string, TokenPattern][] = [
+    ['string', 'abc'],
+    ['anyChar', anyChar()],
+    ['charClass(oneOf)', oneOf('a', 'b')],
+    ['charClass(noneOf)', noneOf('x')],
+    ['charClass(range)', range('a', 'z')],
+    ['seq', seq('a', 'b')],
+    ['alt', altPattern('a', 'b')],
+    ['repeat(star)', star('a')],
+    ['repeat(plus)', plus('a')],
+    ['repeat(opt)', optPattern('a')],
+    ['repeat(n)', repeat('a', 2, 4)],
+    ['lookahead(+)', followedBy('a')],
+    ['lookahead(-)', notFollowedBy('a')],
+    ['lookbehind(+)', precededBy('a')],
+    ['lookbehind(-)', notPrecededBy('a')],
+    ['anchor(start)', start()],
+    ['anchor(end)', end()],
+    ['never', never()],
+  ];
+  const tags = new Set<string>();
+  let allCompiled = true;
+  for (const [label, p] of builders) {
+    let src = '';
+    try { src = tokenPatternToRegex(p); } catch { allCompiled = false; check(`Lemma A2: tokenPatternToRegex compiles ${label}`, false, 'threw'); continue; }
+    check(`Lemma A2: tokenPatternToRegex compiles ${label} → non-empty regex`, typeof src === 'string' && src.length > 0, `got ${JSON.stringify(src)}`);
+    if (typeof p === 'string') tags.add('string'); else tags.add(p.type);
+  }
+  const missing = TOKEN_PATTERN_UNION.filter(t => !tags.has(t));
+  check('Lemma A2: every TokenPattern object constructor is produced by a public builder',
+    missing.length === 0, missing.length ? `unreached: ${missing.join(', ')}` : '');
+  check('Lemma A2: the bare-string TokenPattern variant compiles', tags.has('string'));
+  void allCompiled;
+}
+
+// ════════════════════════════════════════════════════════════════════════════
+//  REACHABILITY — every emitted repo key reachable from root ∪ export surfaces
+// ════════════════════════════════════════════════════════════════════════════
+
+interface TmGrammarJson { patterns?: unknown[]; repository?: Record<string, unknown>; scopeName?: string }
+
+// The DECLARED export surfaces of a grammar — repository keys an external embedder reaches
+// not from the root but by an explicit `<scope>#<key>` include: the #expression sub-grammar
+// (expressionRule) and the canonicalRepoNames OFFICIAL keys (and aliasScopes, which re-expose
+// the whole grammar). These are root-UNreachable BY DESIGN (a public repository API).
+function exportSurfaceKeys(g: CstGrammar): string[] {
+  const out: string[] = [];
+  if (g.expressionRule) out.push('expression');
+  for (const k of Object.keys(g.canonicalRepoNames ?? {})) out.push(k);
+  return out;
+}
+
+interface ReachResult { repoKeys: number; reached: number; dead: string[]; danglingWithSource: string[] }
+
+function checkReachability(g: CstGrammar, tm: TmGrammarJson): ReachResult {
+  const scope = tm.scopeName ?? g.scopeName ?? `source.${g.name}`;
+  const repo = tm.repository ?? {};
+  const reached = new Set<string>();
+  const queue: string[] = [];
+  const visit = (node: any): void => {
+    if (!node || typeof node !== 'object') return;
+    if (Array.isArray(node)) { node.forEach(visit); return; }
+    if (typeof node.include === 'string') {
+      const inc: string = node.include;
+      if (inc === '$self') { /* root */ }
+      else if (inc.startsWith('#')) queue.push(inc.slice(1));
+      else if (inc.startsWith(scope + '#')) queue.push(inc.slice(scope.length + 1));
+      // else external grammar — terminal
+    }
+    if (node.patterns) visit(node.patterns);
+    for (const capKey of ['captures', 'beginCaptures', 'endCaptures', 'whileCaptures'])
+      if (node[capKey]) for (const c of Object.values(node[capKey])) visit(c);
+  };
+  visit(tm.patterns ?? []);
+  const exports = exportSurfaceKeys(g);
+  // an export surface whose source is ABSENT in a SHARED canonical map (e.g. `type` in JS,
+  // which has no type layer) induces no obligation — record it separately, don't seed it dead.
+  const danglingWithSource: string[] = [];
+  for (const s of exports) { queue.push(s); }
+  while (queue.length) {
+    const key = queue.shift()!;
+    if (reached.has(key)) continue;
+    reached.add(key);
+    if (repo[key]) visit(repo[key]);
+  }
+  const allKeys = Object.keys(repo);
+  const dead = allKeys.filter(k => !reached.has(k));
+  // a reached key with no repo entry is a dangling include, UNLESS it is an EXPORT surface: that
+  // is a declared export whose structural source is absent (inert in a shared map), so skip it.
+  for (const k of reached) if (!repo[k] && !exports.includes(k)) danglingWithSource.push(k);
+  return { repoKeys: allKeys.length, reached: [...reached].filter(k => repo[k]).length, dead, danglingWithSource };
+}
+
+// ── Token emitter completeness: every non-skip token has a discharging emission path ──
+//  A token bears a leaf-scope obligation unless it is `skip` (trivia / whitespace). Each is
+//  discharged by exactly one family: the flat token loop (a `#<name>` repository entry), the
+//  regex-literal family (a `regex`-flagged token), the indent/markup ENGINE (a `never()`
+//  placeholder pattern the region machinery replaces), the markup region machinery (a markup
+//  grammar emits no per-token keys — generateMarkupTm owns text/tag/attr), or a region that
+//  owns the token's delimiter (the JSX `/>` / `</` punctuation, scoped inside the JSX patterns).
+//  An ORPHAN — a non-skip token with no discharge path — is an emitter-completeness gap.
+interface TokenCensus { total: number; skip: number; byPath: Record<string, number>; orphans: string[] }
+function tokenCensus(g: CstGrammar, tmJson: TmGrammarJson): TokenCensus {
+  const repo = tmJson.repository ?? {};
+  const full = JSON.stringify(tmJson);
+  const byPath: Record<string, number> = {};
+  const orphans: string[] = [];
+  let skip = 0;
+  const bump = (p: string) => byPath[p] = (byPath[p] ?? 0) + 1;
+  for (const t of g.tokens) {
+    if (t.flags.includes('skip')) { skip++; continue; }
+    if (repo[t.name.toLowerCase()]) { bump('flat'); continue; }
+    if (t.flags.includes('regex')) { bump('regex-family'); continue; }
+    if (tokenPatternIsNever(t)) { bump('engine-emitted'); continue; }
+    if (g.markup) { bump('markup-region'); continue; }                 // generateMarkupTm owns it
+    const delim = tokenPatternLiteralText(t);                          // a region owns this token's delimiter?
+    if (delim && full.includes(JSON.stringify(delim).slice(1, -1))) { bump('region-owned'); continue; }
+    orphans.push(`${t.name}[${t.flags.join(',') || '-'}]`);
+  }
+  return { total: g.tokens.length, skip, byPath, orphans };
+}
+
+// ════════════════════════════════════════════════════════════════════════════
+//  shared vscode-textmate tokenizer (one WASM load) — reused by Layer B coverage
+// ════════════════════════════════════════════════════════════════════════════
+const { INITIAL, Registry, parseRawGrammar } = vsctm;
+const { loadWASM, OnigScanner, OnigString } = onig;
+const require = createRequire(import.meta.url);
+const wasmBin = readFileSync(require.resolve('vscode-oniguruma/release/onig.wasm'));
+await loadWASM(wasmBin.buffer.slice(wasmBin.byteOffset, wasmBin.byteOffset + wasmBin.byteLength));
+
+async function loadTmFromObject(scopeName: string, grammars: Record<string, object>): Promise<vsctm.IGrammar | null> {
+  const reg = new Registry({
+    onigLib: Promise.resolve({ createOnigScanner: (p: string[]) => new OnigScanner(p), createOnigString: (s: string) => new OnigString(s) }),
+    loadGrammar: async (sn: string) => grammars[sn] ? parseRawGrammar(JSON.stringify(grammars[sn]), sn + '.json') : null,
+  });
+  return reg.loadGrammar(scopeName);
+}
+async function loadTmFromFiles(scopeName: string, files: Record<string, string>): Promise<vsctm.IGrammar | null> {
+  const cache: Record<string, string> = {};
+  const reg = new Registry({
+    onigLib: Promise.resolve({ createOnigScanner: (p: string[]) => new OnigScanner(p), createOnigString: (s: string) => new OnigString(s) }),
+    loadGrammar: async (sn: string) => { const p = files[sn]; if (!p) return null; const c = cache[sn] ?? (cache[sn] = readFileSync(p, 'utf8')); return parseRawGrammar(c, sn + '.json'); },
+  });
+  return reg.loadGrammar(scopeName);
+}
+function tmTokenize(grammar: vsctm.IGrammar, text: string): TmTok[] {
+  const toks: TmTok[] = []; let rs = INITIAL, off = 0;
+  for (const line of text.split('\n')) { const r = grammar.tokenizeLine(line, rs); for (const t of r.tokens) toks.push({ start: off + t.startIndex, end: off + t.endIndex, scopes: t.scopes }); rs = r.ruleStack; off += line.length + 1; }
+  return toks;
+}
+
+// ════════════════════════════════════════════════════════════════════════════
+//  LAYER B1 — empirical leaf coverage (fixed denominator)
+//
+//  Every CONTENT/keyword leaf (a leaf the grammar's OWN role map says must read as a
+//  keyword / string / number / comment) must be PAINTED — recognised and given a scope
+//  beyond the bare document root, never left as inert text. The denominator is the
+//  grammar-derived obligation leaves over the deterministic corpus; the role map and the
+//  corpus are the SAME independent infrastructure the soundness checks use (no co-bias
+//  with gen-tm's own detectors). A leaf painted SOME non-root scope discharges its
+//  recognise-and-scope obligation; whether that scope is the RIGHT one is soundness
+//  (test/scope-gap.ts + test/gap-ledger.ts), a separate axis.
+// ════════════════════════════════════════════════════════════════════════════
+const CONTENT_OBLIGATION = new Set<Bucket>(['keyword', 'string', 'number', 'comment']);
+
+interface CoverageResult { den: number; painted: number; uncovered: { text: string; want: string; ctx: string }[] }
+
+function leafCoverage(grammar: CstGrammar, tm: vsctm.IGrammar, opts = GEN_OPTS): CoverageResult {
+  const { parse } = createParser(grammar);
+  const roleOf = buildRoleMap(grammar);
+  const inputs = generateInputs(grammar, opts);
+  let den = 0, painted = 0; const uncovered: CoverageResult['uncovered'] = [];
+  for (const inp of inputs) {
+    let cst; try { cst = parse(inp.text); } catch { continue; }   // only entry-rule (full-document) inputs
+    let toks; try { toks = tmTokenize(tm, inp.text); } catch { continue; }
+    for (const lf of leafRoles(grammar, cst, inp.text, roleOf)) {
+      if (![...lf.expected].some(b => CONTENT_OBLIGATION.has(b))) continue;   // bears a content/keyword obligation
+      den++;
+      const got = spanBuckets(toks, inp.text, lf.start, lf.end);
+      if ([...got].some(b => b !== 'none')) painted++;                         // recognised + scoped
+      else if (uncovered.length < 20) uncovered.push({ text: lf.text, want: [...lf.expected].join('|'), ctx: inp.text.slice(Math.max(0, lf.start - 6), lf.end + 6).replace(/\n/g, '\\n') });
+    }
+  }
+  return { den, painted, uncovered };
+}
+
+// ════════════════════════════════════════════════════════════════════════════
+//  LAYER A (cont.) — the literal-collection backbone is total + drops nothing consumed
+//
+//  The flat keyword/operator scoping in gen-tm.ts is driven by the SHARED primitive
+//  collectLiterals (src/grammar-utils.ts), looped over every rule body. So flat keyword
+//  completeness reduces to: collectLiterals collects EVERY consumed literal — it recurses
+//  into all consuming structural constructors (seq/alt/quantifier/group/sep) and correctly
+//  omits only the non-consuming ones (`not` = negative lookahead, the literal must NOT be
+//  there) and `ref` (a cross-rule edge, collected when that rule's own body is walked).
+//  Witnessed by nesting a sentinel literal under each constructor. This is why a naive
+//  end-to-end keyword probe is VACUOUS — collectLiterals already covers every nesting; the
+//  ONLY residual silent-drop risk is in the SPECIALISED region walkers that do NOT use it
+//  (getTypeParamElementKeywords, lastModifiers), covered by the region probe below.
+// ════════════════════════════════════════════════════════════════════════════
+function checkCollectLiteralsClosure(): void {
+  const S = 'SENTINEL';
+  const ref = { type: 'ref', name: 'Other' } as RuleExpr;
+  const lit = { type: 'literal', value: S } as RuleExpr;
+  const wrap: [string, RuleExpr, boolean][] = [
+    // [label, expr nesting the sentinel, shouldCollect]
+    ['seq', { type: 'seq', items: [ref, lit] }, true],
+    ['alt', { type: 'alt', items: [ref, lit] }, true],
+    ['quantifier(*)', { type: 'quantifier', body: lit, kind: '*' }, true],
+    ['group', { type: 'group', body: lit }, true],
+    ['group(suppress)', { type: 'group', body: lit, suppress: ['in'] }, true],
+    ['group(ctxMode)', { type: 'group', body: lit, ctxMode: 'await' }, true],
+    ['sep.element', { type: 'sep', element: lit, delimiter: ',' }, true],
+    ['sep.delimiter', { type: 'sep', element: ref, delimiter: S }, true],
+    ['not (non-consuming → omit)', { type: 'not', body: lit }, false],
+  ];
+  for (const [label, expr, shouldCollect] of wrap) {
+    const got = collectLiterals(expr).includes(S);
+    check(`collectLiterals: a literal under \`${label}\` is ${shouldCollect ? 'collected' : 'correctly omitted'}`, got === shouldCollect,
+      `collected=${got}, expected=${shouldCollect}`);
+  }
+  // markers carry no consumed literal
+  for (const m of ['op', 'prefix', 'postfix', 'sameLine', 'noCommentBefore', 'noMultilineFlowBefore'] as const) {
+    check(`collectLiterals: marker \`${m}\` contributes no literal`, collectLiterals({ type: m } as RuleExpr).length === 0);
+  }
+}
+
+// ════════════════════════════════════════════════════════════════════════════
+//  LAYER B2 — region-internal keyword preservation (positive control)
+//
+//  Inside a derived `<…>` type-parameter region (scoped meta.type.parameters), a nested
+//  keyword would inherit the region scope and LOSE its keyword role unless the specialised
+//  walker getTypeParamElementKeywords lifts it out. That walker collects the element's DIRECT
+//  structural keywords (recursing seq / alt / quantifier / group / sep) — exactly what `extends` /
+//  `const` / `in` / `out` need. It deliberately does NOT reach through `ref` (a constraint's
+//  TYPE, e.g. `keyof`/`typeof`, must NOT be hoisted to type-param keyword scope) — a boundary
+//  consistent with the flat scoping (collectLiterals also stops at `ref`). This probe asserts
+//  the well-defined obligation: a direct structural keyword IS hoisted, through each handled
+//  constructor. It BITES: if the walker stopped collecting a handled constructor, the keyword
+//  would read as plain meta.type content.
+// ════════════════════════════════════════════════════════════════════════════
+async function regionKeywordProbe(): Promise<void> {
+  const Ident = token(plus(range('a', 'z')), { identifier: true });
+  // a type-param element with keywords reached through each HANDLED constructor:
+  //   kwa via quantifier(opt), extends via opt+seq, kwsep DIRECT inside a `sep` sub-list.
+  // `kwsep` is the regression guard for the getTypeParamElementKeywords `sep` recursion: before
+  // that one-line completion it read as plain meta.type content (the latent silent drop).
+  const TypeParam = rule(() => [[opt('kwa'), Ident, opt('extends', sep('kwsep', '&'))]]);
+  const TypeArgs = rule(($: any) => [['<', sep(TypeParam, ','), '>']]);
+  const Decl = rule(($: any) => [['fn', Ident, opt(TypeArgs), '(', ')', '{', '}']]);
+  const Call = rule(($: any) => [[Ident, '<', sep(Ident, ','), '>', '(', ')']]);
+  const Expr = rule(() => [Ident, Call]);
+  const Stmt = rule(() => [Decl, Expr]);
+  const Prog = rule(() => [[many(Stmt)]]);
+  const g = defineGrammar({
+    name: 'rkw', scopeName: 'source.rkw', tokens: { Ident },
+    prec: [none('<', '>')], scopes: { 'storage.type.function': ['fn'], 'keyword.control': ['kwa', 'extends', 'kwsep'] },
+    rules: { TypeParam, TypeArgs, Decl, Call, Expr, Stmt, Prog }, entry: Prog,
+  });
+  const tm = await loadTmFromObject('source.rkw', { 'source.rkw': generateTmLanguage(g) as unknown as object });
+  if (!tm) { check('region-keyword probe: grammar loads', false); return; }
+  const witness = 'fn f<kwa T extends kwsep>(){}';
+  const toks = tmTokenize(tm, witness);
+  for (const kw of ['kwa', 'extends', 'kwsep']) {
+    const at = witness.indexOf(kw);
+    const got = spanBuckets(toks, witness, at, at + kw.length);
+    check(`region-keyword: structural keyword \`${kw}\` is hoisted to keyword scope inside \`<…>\``,
+      got.has('keyword'), `got {${[...got].join(',')}}`);
+  }
+}
+
+// ════════════════════════════════════════════════════════════════════════════
+//  driver
+// ════════════════════════════════════════════════════════════════════════════
+interface GrammarCfg { name: string; module: string; scopeName: string; tm: string; tmExtra?: Record<string, string> }
+const GRAMMARS: GrammarCfg[] = [
+  { name: 'typescript', module: '../typescript.ts', scopeName: 'source.ts', tm: 'typescript.tmLanguage.json' },
+  { name: 'javascript', module: '../javascript.ts', scopeName: 'source.js', tm: 'javascript.tmLanguage.json' },
+  { name: 'typescriptreact', module: '../typescriptreact.ts', scopeName: 'source.tsx', tm: 'typescriptreact.tmLanguage.json' },
+  { name: 'javascriptreact', module: '../javascriptreact.ts', scopeName: 'source.js.jsx', tm: 'javascriptreact.tmLanguage.json' },
+  { name: 'html', module: '../html.ts', scopeName: 'text.html.basic', tm: 'html.tmLanguage.json',
+    tmExtra: { 'source.js': 'javascript.tmLanguage.json', 'source.css': 'html.tmLanguage.json' } },
+  { name: 'yaml', module: '../yaml.ts', scopeName: 'source.yaml', tm: 'yaml.tmLanguage.json' },
+];
+
+// ── the fixed-denominator obligation census per grammar (the ledger row) ──
+interface LedgerRow {
+  name: string;
+  tokenObl: number; tokenDisch: number;        // non-skip tokens, each → a discharge path
+  litObl: number;                              // distinct keyword literals (painted ⇐ leaf coverage)
+  opObl: number;                               // distinct Pratt operators
+  keyObl: number; keyReach: number;            // repository keys, each → reachable
+  leafObl: number; leafPaint: number;          // empirical content/keyword leaves, each → painted
+}
+function ledgerRow(name: string, g: CstGrammar, tmJson: TmGrammarJson, r: ReachResult, tc: TokenCensus, cov: CoverageResult): LedgerRow {
+  const lits = new Set<string>();
+  for (const rule of g.rules) for (const l of collectLiterals(rule.body)) if (isKeywordLiteral(l)) lits.add(l);
+  const ops = new Set<string>();
+  for (const p of g.precs) for (const o of p.operators) ops.add(o.value);
+  for (const lp of g.ledPrecs ?? []) ops.add(lp.connector);
+  return {
+    name,
+    tokenObl: g.tokens.filter(t => !t.flags.includes('skip')).length, tokenDisch: g.tokens.filter(t => !t.flags.includes('skip')).length - tc.orphans.length,
+    litObl: lits.size, opObl: ops.size,
+    keyObl: r.repoKeys, keyReach: r.repoKeys - r.dead.length,
+    leafObl: cov.den, leafPaint: cov.painted,
+  };
+}
+
+// the auto-generated ledger block (a region in COMPLETENESS.md, like KNOWN-GAPS.md / the README issue table)
+function renderLedger(rows: LedgerRow[]): string {
+  const L: string[] = [];
+  L.push('<!-- COMPLETENESS-LEDGER:START — auto-generated by `node test/tm-completeness.ts --write`; do not edit by hand. -->');
+  L.push('');
+  L.push('| Grammar | Tokens | Keyword literals | Operators | Repo keys (reachable) | Leaf obligations (painted) |');
+  L.push('|---|---:|---:|---:|---:|---:|');
+  const sum = { t: 0, td: 0, lit: 0, op: 0, k: 0, kr: 0, lf: 0, lp: 0 };
+  for (const r of rows) {
+    L.push(`| ${r.name} | ${r.tokenDisch}/${r.tokenObl} | ${r.litObl} | ${r.opObl} | ${r.keyReach}/${r.keyObl} | ${r.leafPaint}/${r.leafObl} |`);
+    sum.t += r.tokenObl; sum.td += r.tokenDisch; sum.lit += r.litObl; sum.op += r.opObl;
+    sum.k += r.keyObl; sum.kr += r.keyReach; sum.lf += r.leafObl; sum.lp += r.leafPaint;
+  }
+  L.push(`| **total** | **${sum.td}/${sum.t}** | **${sum.lit}** | **${sum.op}** | **${sum.kr}/${sum.k}** | **${sum.lp}/${sum.lf}** |`);
+  L.push('');
+  // the fixed denominator = every measured obligation (token-discharge + key-reachability + leaf-painting)
+  const den = sum.t + sum.k + sum.lf, num = sum.td + sum.kr + sum.lp;
+  L.push(`**Fixed-denominator completeness: ${num}/${den} = ${(100 * num / den).toFixed(2)}%** ` +
+    `(token discharge ${sum.td}/${sum.t} · repository reachability ${sum.kr}/${sum.k} · leaf painting ${sum.lp}/${sum.lf}). ` +
+    `Keyword literals (${sum.lit}) and Pratt operators (${sum.op}) are discharged through the leaf-painting column. ` +
+    `${num === den ? '**0 open completeness gaps.**' : `**${den - num} OPEN GAP(S).**`}`);
+  L.push('');
+  L.push('<!-- COMPLETENESS-LEDGER:END -->');
+  return L.join('\n');
+}
+
+const LEDGER_FILE = 'COMPLETENESS.md';
+function spliceRegion(file: string, block: string): { changed: boolean; full: string } {
+  const start = '<!-- COMPLETENESS-LEDGER:START', end = '<!-- COMPLETENESS-LEDGER:END -->';
+  const cur = existsSync(file) ? readFileSync(file, 'utf8') : '';
+  const si = cur.indexOf(start), ei = cur.indexOf(end);
+  if (si < 0 || ei < 0) return { changed: cur !== '', full: cur };   // markers absent → leave the file alone (a non-empty file without them still reports changed, so --check flags it)
+  const full = cur.slice(0, si) + block + cur.slice(ei + end.length);
+  return { changed: full !== cur, full };
+}
+
+async function main(): Promise<void> {
+  const WRITE = process.argv.includes('--write');
+  const CHECK = process.argv.includes('--check');
+
+  console.log('── Layer A: algebra closure ──');
+  checkRuleExprClosure();
+  checkTokenPatternClosure();
+
+  console.log('── Layer A: no consumed literal is silently dropped (collectLiterals backbone) ──');
+  checkCollectLiteralsClosure();
+  await regionKeywordProbe();
+
+  console.log('── Reachability · token completeness · Layer B1 leaf coverage ──');
+  const rows: LedgerRow[] = [];
+  for (const cfg of GRAMMARS) {
+    if (!existsSync(cfg.tm)) { console.log(`  ${cfg.name}: (no emitted grammar)`); continue; }
+    const g = (await import(cfg.module)).default as CstGrammar;
+    const tmJson = JSON.parse(readFileSync(cfg.tm, 'utf8')) as TmGrammarJson;
+    const r = checkReachability(g, tmJson);
+    check(`reachability(${cfg.name}): no dead repository keys`, r.dead.length === 0, r.dead.join(', '));
+    check(`reachability(${cfg.name}): no dangling self-#refs with present source`, r.danglingWithSource.length === 0, r.danglingWithSource.join(', '));
+    const tc = tokenCensus(g, tmJson);
+    check(`token-completeness(${cfg.name}): every non-skip token has a discharge path`, tc.orphans.length === 0, `orphans: ${tc.orphans.join(' ')}`);
+    const tm = await loadTmFromFiles(cfg.scopeName, { [cfg.scopeName]: cfg.tm, ...(cfg.tmExtra ?? {}) });
+    let cov: CoverageResult = { den: 0, painted: 0, uncovered: [] };
+    if (tm) cov = leafCoverage(g, tm);
+    check(`coverage(${cfg.name}): every content/keyword obligation leaf is painted`, cov.painted === cov.den,
+      cov.uncovered.map(u => `"${u.text}"(${u.want})`).slice(0, 8).join(' '));
+    rows.push(ledgerRow(cfg.name, g, tmJson, r, tc, cov));
+    const pct = cov.den ? (100 * cov.painted / cov.den).toFixed(2) : '—';
+    console.log(`  ${cfg.name.padEnd(17)} repo ${String(r.repoKeys).padStart(3)} · dead ${r.dead.length} · tokens ${tc.total - tc.skip - tc.orphans.length}/${tc.total - tc.skip} · leaf-coverage ${cov.painted}/${cov.den} = ${pct}%`);
+    if (cov.uncovered.length) for (const u of cov.uncovered.slice(0, 6)) console.log(`      UNCOVERED "${u.text}" want ${u.want} ctx …${u.ctx}…`);
+  }
+
+  const block = renderLedger(rows);
+  if (WRITE) {
+    const { changed, full } = spliceRegion(LEDGER_FILE, block);
+    if (existsSync(LEDGER_FILE) && full.includes('COMPLETENESS-LEDGER:START')) { writeFileSync(LEDGER_FILE, full); console.log(`\n${changed ? '✓ updated' : '· unchanged'} ${LEDGER_FILE} ledger region`); }
+    else console.log(`\n(no ${LEDGER_FILE} ledger markers yet — block below)\n\n${block}`);
+  }
+  if (CHECK) {
+    const { changed } = spliceRegion(LEDGER_FILE, block);
+    check(`${LEDGER_FILE} ledger region is up to date`, !changed || !existsSync(LEDGER_FILE), `run: node test/tm-completeness.ts --write`);
+  }
+
+  console.log('');
+  for (const f of fails) console.log('  ' + f);
+  console.log(`\n${failN === 0 ? `✓ ${pass}/${pass} completeness checks pass` : `✗ ${failN} FAILED (${pass} passed)`}`);
+  process.exit(failN === 0 ? 0 : 1);
+}
+
+if ((import.meta as any).main) await main();

From 3b0e1dd114ef38d3bb9c144405e824afc2c6bc15 Mon Sep 17 00:00:00 2001
From: Johnson Chu <johnsoncodehk@gmail.com>
Date: Sat, 20 Jun 2026 14:55:53 +0800
Subject: [PATCH 02/14] Mutation-test the completeness detector (#51)
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

A passing checker means nothing if the checker is blind. Add
test/tm-mutation.ts: it injects a catalogue of known gaps into the emitted
grammar (drop a key / all of a token's includes, neuter a scope to the bare
root, add a dead key, a dangling include, mis-scope a token to a wrong role,
reorder two disambiguation patterns) and records which detector layer kills
each — measuring the detector's power instead of asserting it.

Measured: every PRESENCE / REACHABILITY gap is killed (16/16; 12/16
corpus-free, by reachability / token-census / the new flat-token neuter
check; the remaining 4 by a targeted differential witness);
WRONG-ROLE gaps are caught only by a differential witness (presence ≠
correctness); ORDERING gaps are a measured blind spot — TextMate is
order-sensitive and pattern rank lives in the emitted artifact, not the
grammar algebra, so no corpus-free structural check reaches it. This is the
honest boundary, now empirical: the structural proof covers presence +
reachability; ordering / correctness are the soundness axis, reached only by
evaluation. The over-claim of an a-priori "no gap can hide" over the whole
gap space is dropped — COMPLETENESS.md states the bounded, measured claim.

The harness motivated one detector strengthening: tokenCensus now flags a
flat token NEUTERED to the bare root scope (an entry that exists but paints
inert document text), moving that gap class from differential-only to
corpus-free. Wired as a meta-gate in `npm run check`.
---
 COMPLETENESS.md         |  37 ++++++-
 package.json            |   1 +
 test/check.ts           |   1 +
 test/tm-completeness.ts |  31 +++---
 test/tm-mutation.ts     | 207 ++++++++++++++++++++++++++++++++++++++++
 5 files changed, 262 insertions(+), 15 deletions(-)
 create mode 100644 test/tm-mutation.ts
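
Note (illustration only, not part of the patch): a minimal sketch of the
mutate-then-detect loop described in the message above, on a toy grammar.
Every name here (`TmNode`, `ToyGrammar`, `reach`, `mutations`, `killed`) is
hypothetical and is not taken from test/tm-mutation.ts; the detector is a
simplified `#key` reachability walk, assumed to resemble checkReachability
only in outline.

```typescript
// Toy shapes for illustration; the real TmGrammarJson is looser.
type TmNode = { include?: string; name?: string; match?: string; patterns?: TmNode[] };
type ToyGrammar = { scopeName: string; patterns: TmNode[]; repository: Record<string, TmNode> };

// Detector under test: breadth-first walk over `#key` includes from the root patterns.
function reach(g: ToyGrammar): { dead: string[]; dangling: string[] } {
  const reached = new Set<string>();
  const queue: string[] = [];
  const visit = (n: TmNode): void => {
    if (n.include?.startsWith('#')) queue.push(n.include.slice(1));
    for (const p of n.patterns ?? []) visit(p);
  };
  g.patterns.forEach(visit);
  while (queue.length) {
    const key = queue.shift()!;
    if (reached.has(key)) continue;
    reached.add(key);
    const entry = g.repository[key];
    if (entry) visit(entry);
  }
  return {
    dead: Object.keys(g.repository).filter(k => !reached.has(k)),   // emitted but unreachable
    dangling: [...reached].filter(k => !(k in g.repository)),       // included but never emitted
  };
}

const base: ToyGrammar = {
  scopeName: 'source.toy',
  patterns: [{ include: '#stmt' }],
  repository: {
    stmt: { patterns: [{ include: '#kw' }, { include: '#num' }] },
    kw: { name: 'keyword.control.toy', match: '\\bif\\b' },
    num: { name: 'constant.numeric.toy', match: '\\d+' },
  },
};

const clone = (g: ToyGrammar): ToyGrammar => JSON.parse(JSON.stringify(g)) as ToyGrammar;

// A small catalogue of injected gaps, one per mutation class.
const mutations: Record<string, (g: ToyGrammar) => void> = {
  'drop-key': g => { delete g.repository['kw']; },
  'drop-include': g => { g.repository['stmt']!.patterns = [{ include: '#num' }]; },
  'add-dead-key': g => { g.repository['ghost'] = { name: 'x.toy', match: 'x' }; },
  'reorder': g => { g.repository['stmt']!.patterns!.reverse(); },
};

// A mutant is "killed" when the structural detector reports anything.
const killed = (g: ToyGrammar): boolean => {
  const r = reach(g);
  return r.dead.length + r.dangling.length > 0;
};

const baselineClean = !killed(base);   // no false alarm on the unmutated grammar
const matrix: Record<string, boolean> = {};
for (const [label, mutate] of Object.entries(mutations)) {
  const mutant = clone(base);
  mutate(mutant);
  matrix[label] = killed(mutant);
}
console.log({ baselineClean, matrix });
```

In this toy the three presence / reachability mutants are killed and the
reorder survives, since reversing two includes leaves the reachable key set
unchanged. That is the same shape as the ordering blind spot described above.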

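Note (illustration only, not part of the patch): a minimal sketch of the
flat-token neuter predicate this commit adds to tokenCensus, restated on a
toy entry shape. `FlatEntry`, `isNeutered`, and the sample entries are
hypothetical names; the logic is assumed to mirror `flatNeutered` in the
hunk below (an entry with no `begin` / `patterns` whose `name` is empty or
consists only of the root scope).

```typescript
// Toy entry shape for illustration; the real repository entries are untyped JSON.
type FlatEntry = { name?: string; match?: string; begin?: string; patterns?: unknown[] };

function isNeutered(entry: FlatEntry, rootScope: string): boolean {
  // A begin/end region or a pattern container is not a flat `{name, match}` entry.
  if (entry.begin || entry.patterns) return false;
  // Split the space-separated scope list and ignore empty segments.
  const scopes = (entry.name ?? '').split(' ').filter(s => s.length > 0);
  // No scope at all, or only the bare document root, paints nothing visible.
  return scopes.every(s => s === rootScope);
}

const rootScope = 'source.toy';
const verdicts = {
  painted: isNeutered({ name: 'keyword.control.toy', match: 'if' }, rootScope),
  rootOnly: isNeutered({ name: 'source.toy', match: 'if' }, rootScope),
  unnamed: isNeutered({ match: 'if' }, rootScope),
  mixed: isNeutered({ name: 'source.toy string.quoted.toy', match: '"' }, rootScope),
  region: isNeutered({ name: 'source.toy', begin: '"' }, rootScope),
};
console.log(verdicts);
```

The check needs no corpus: it reads only the emitted entry, which is why the
neuter class can move from differential-only to structural detection.
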
diff --git a/COMPLETENESS.md b/COMPLETENESS.md
index 99f63a3..321680b 100644
--- a/COMPLETENESS.md
+++ b/COMPLETENESS.md
@@ -111,8 +111,35 @@ The empirical witness that all of the above actually paint is **leaf coverage**:
 deterministic grammar-derived corpus (`test/grammar-gen.ts`), every parsed leaf whose
 by-construction role (`buildRoleMap`) is a content/keyword role (keyword / string / number /
 comment) is confirmed to receive a non-root scope. The denominator is fixed (the obligation
-leaves); the metric is non-vacuous (deleting a discharging repository key drops it below 100%).
-Result: **2433/2433 across all six grammars.**
+leaves). Result: **2433/2433 across all six grammars.**
+
+## Measuring the detector — mutation testing
+
+A passing checker is worthless if the checker is *blind* — the corpus-trap this project has been
+bitten by. So the guarantee is not asserted, it is **measured**: `test/tm-mutation.ts` injects a
+catalogue of known gaps into the emitted grammar (drop a key, drop all of a token's includes,
+neuter a scope to the bare root, add a dead key, a dangling include, mis-scope a token to a wrong
+role, reorder two disambiguation patterns) and records which detector layer — if any — kills each.
+The honest, measured result:
+
+- **Presence gaps** (a token / scope / key dropped or neutered): **16/16 killed · 12/16 by a
+  CORPUS-FREE structural detector** (reachability dead/dangling · the token census · the flat-token
+  neuter check). The remaining four — a *region* token neutered — are caught by a targeted
+  differential witness, not corpus-free. **No presence gap survives; this is the gate.**
+- **Wrong-role gaps** (a token still painted, but the wrong role): caught by the differential
+  (a bucket change at the witness), *not* by the structural detector — a token that *is* painted
+  satisfies presence. This is the completeness/soundness seam: presence ≠ correctness.
+- **Ordering gaps** (two patterns reordered so a looser rule shadows a tighter one): a **measured
+  blind spot**. TextMate is order-sensitive, and which pattern wins is a property of the emitted
+  artifact's *sequence*, not the grammar's algebra — so no corpus-free structural check reaches it,
+  and a scope-preserving reorder slips even the bucket-level differential.
+
+So the claim this document makes is bounded and measured: **every presence / reachability gap is
+caught** (mutation-proven, the gate; 12 of 16 corpus-free, the remaining four by a targeted
+differential witness); **wrong-role and ordering gaps are the soundness / interaction axis**, reached
+only by evaluation (the differential, or `test/gap-ledger.ts`), never by a grammar-algebraic proof.
+An a-priori "no gap can hide" over the *whole* gap space is not available (ordering and correctness
+obligations live in the emitted artifact and slide toward regex-vs-CFG undecidability), and is not claimed here.
 
 ## Reachability — root ∪ export surfaces
 
@@ -179,8 +206,10 @@ Auto-generated by `node test/tm-completeness.ts --write`; `--check` fails CI if
 ## The gates that hold this exact
 
 - `test/tm-completeness.ts` — Layer A closure (RuleExpr / TokenPattern / `collectLiterals`), the
-  `sep`-recursion regression guard, reachability, the token census, and leaf coverage with a fixed
-  denominator. `npm run completeness` prints it; `npm run completeness:check` gates the ledger.
+  `sep`-recursion regression guard, reachability, the token census (orphans + neuter), and leaf
+  coverage with a fixed denominator. `npm run completeness`; `npm run completeness:check` gates the ledger.
+- `test/tm-mutation.ts` — the **meta-gate**: injects known gaps and asserts every presence gap is
+  killed with no false alarms, measuring (not asserting) the detector's power. `npm run completeness:mutation`.
 - `test/agnostic.ts` — detector shape-completeness: the detectors fire on structure, not on TS
   names, so "every shape that bears the obligation is detected" holds for any grammar.
 - `test/scope-gap.ts`, `test/gap-ledger.ts` — the **soundness** axis (is each painted scope
diff --git a/package.json b/package.json
index c578977..8fe6555 100644
--- a/package.json
+++ b/package.json
@@ -39,6 +39,7 @@
     "completeness": "node test/tm-completeness.ts",
     "completeness:check": "node test/tm-completeness.ts --check",
     "completeness:write": "node test/tm-completeness.ts --write",
+    "completeness:mutation": "node test/tm-mutation.ts",
     "ledger:selftest": "node test/gap-ledger-selftest.ts",
     "ledger:issues": "node test/gap-issues.ts",
     "ledger:issues:dry": "node test/gap-issues.ts --dry-run",
diff --git a/test/check.ts b/test/check.ts
index 3aefb9e..1658343 100644
--- a/test/check.ts
+++ b/test/check.ts
@@ -37,6 +37,7 @@ const GATES: Gate[] = [
   { group: 'conformance', name: 'html', args: ['test/html-conformance.ts'] },
   { group: 'highlighter', name: 'tm-guards', args: ['test/tm-highlight-guards.ts'] },
   { group: 'highlighter', name: 'tm-completeness', args: ['test/tm-completeness.ts', '--check'] },
+  { group: 'highlighter', name: 'tm-mutation', args: ['test/tm-mutation.ts'] },
   { group: 'highlighter', name: 'tm-diagnostics', args: ['test/redcmd-tm-diagnostics.ts'] },
   { group: 'highlighter', name: 'angle-depth', args: ['test/angle-depth-probe.ts'] },
   { group: 'highlighter', name: 'html-monarch', args: ['test/html-monarch.ts'] },
diff --git a/test/tm-completeness.ts b/test/tm-completeness.ts
index a62cc2b..72b7773 100644
--- a/test/tm-completeness.ts
+++ b/test/tm-completeness.ts
@@ -180,7 +180,7 @@ function checkTokenPatternClosure(): void {
 //  REACHABILITY — every emitted repo key reachable from root ∪ export surfaces
 // ════════════════════════════════════════════════════════════════════════════
 
-interface TmGrammarJson { patterns?: unknown[]; repository?: Record<string, unknown>; scopeName?: string }
+export interface TmGrammarJson { patterns?: unknown[]; repository?: Record<string, unknown>; scopeName?: string }
 
 // The DECLARED export surfaces of a grammar — repository keys an external embedder reaches
 // not from the root but by an explicit `<scope>#<key>` include: the #expression sub-grammar
@@ -193,9 +193,9 @@ function exportSurfaceKeys(g: CstGrammar): string[] {
   return out;
 }
 
-interface ReachResult { repoKeys: number; reached: number; dead: string[]; danglingWithSource: string[] }
+export interface ReachResult { repoKeys: number; reached: number; dead: string[]; danglingWithSource: string[] }
 
-function checkReachability(g: CstGrammar, tm: TmGrammarJson): ReachResult {
+export function checkReachability(g: CstGrammar, tm: TmGrammarJson): ReachResult {
   const scope = tm.scopeName ?? g.scopeName ?? `source.${g.name}`;
   const repo = tm.repository ?? {};
   const reached = new Set<string>();
@@ -242,17 +242,25 @@ function checkReachability(g: CstGrammar, tm: TmGrammarJson): ReachResult {
 //  grammar emits no per-token keys — generateMarkupTm owns text/tag/attr), or a region that
 //  owns the token's delimiter (the JSX `/>` / `</` punctuation, scoped inside the JSX patterns).
 //  An ORPHAN — a non-skip token with no discharge path — is an emitter-completeness gap.
-interface TokenCensus { total: number; skip: number; byPath: Record<string, number>; orphans: string[] }
-function tokenCensus(g: CstGrammar, tmJson: TmGrammarJson): TokenCensus {
+export interface TokenCensus { total: number; skip: number; byPath: Record<string, number>; orphans: string[]; neutered: string[] }
+export function tokenCensus(g: CstGrammar, tmJson: TmGrammarJson): TokenCensus {
   const repo = tmJson.repository ?? {};
+  const root = tmJson.scopeName ?? `source.${g.name}`;
   const full = JSON.stringify(tmJson);
   const byPath: Record<string, number> = {};
   const orphans: string[] = [];
+  const neutered: string[] = [];   // a flat token whose entry exists but paints only the bare root (no visual scope)
   let skip = 0;
   const bump = (p: string) => byPath[p] = (byPath[p] ?? 0) + 1;
+  // a flat `{name, match}` entry discharges its scope obligation only if `name` is a real
+  // visual scope — not the bare document root and not empty. An entry whose name was reduced
+  // to the root scope is a "neuter" gap (the token tokenises but reads as inert document text),
+  // structurally visible without any corpus.
+  const flatNeutered = (e: any): boolean => !e.begin && !e.patterns && (!e.name || String(e.name).split(' ').every((s: string) => s === root || !s));
   for (const t of g.tokens) {
     if (t.flags.includes('skip')) { skip++; continue; }
-    if (repo[t.name.toLowerCase()]) { bump('flat'); continue; }
+    const flat = repo[t.name.toLowerCase()];
+    if (flat) { if (flatNeutered(flat)) neutered.push(`${t.name}→${(flat as any).name ?? '∅'}`); else bump('flat'); continue; }
     if (t.flags.includes('regex')) { bump('regex-family'); continue; }
     if (tokenPatternIsNever(t)) { bump('engine-emitted'); continue; }
     if (g.markup) { bump('markup-region'); continue; }                 // generateMarkupTm owns it
@@ -260,7 +268,7 @@ function tokenCensus(g: CstGrammar, tmJson: TmGrammarJson): TokenCensus {
     if (delim && full.includes(JSON.stringify(delim).slice(1, -1))) { bump('region-owned'); continue; }
     orphans.push(`${t.name}[${t.flags.join(',') || '-'}]`);
   }
-  return { total: g.tokens.length, skip, byPath, orphans };
+  return { total: g.tokens.length, skip, byPath, orphans, neutered };
 }
 
 // ════════════════════════════════════════════════════════════════════════════
@@ -272,7 +280,7 @@ const require = createRequire(import.meta.url);
 const wasmBin = readFileSync(require.resolve('vscode-oniguruma/release/onig.wasm'));
 await loadWASM(wasmBin.buffer.slice(wasmBin.byteOffset, wasmBin.byteOffset + wasmBin.byteLength));
 
-async function loadTmFromObject(scopeName: string, grammars: Record<string, object>): Promise<vsctm.IGrammar | null> {
+export async function loadTmFromObject(scopeName: string, grammars: Record<string, object>): Promise<vsctm.IGrammar | null> {
   const reg = new Registry({
     onigLib: Promise.resolve({ createOnigScanner: (p: string[]) => new OnigScanner(p), createOnigString: (s: string) => new OnigString(s) }),
     loadGrammar: async (sn: string) => grammars[sn] ? parseRawGrammar(JSON.stringify(grammars[sn]), sn + '.json') : null,
@@ -287,7 +295,7 @@ async function loadTmFromFiles(scopeName: string, files: Record<string, string>)
   });
   return reg.loadGrammar(scopeName);
 }
-function tmTokenize(grammar: vsctm.IGrammar, text: string): TmTok[] {
+export function tmTokenize(grammar: vsctm.IGrammar, text: string): TmTok[] {
   const toks: TmTok[] = []; let rs = INITIAL, off = 0;
   for (const line of text.split('\n')) { const r = grammar.tokenizeLine(line, rs); for (const t of r.tokens) toks.push({ start: off + t.startIndex, end: off + t.endIndex, scopes: t.scopes }); rs = r.ruleStack; off += line.length + 1; }
   return toks;
@@ -307,9 +315,9 @@ function tmTokenize(grammar: vsctm.IGrammar, text: string): TmTok[] {
 // ════════════════════════════════════════════════════════════════════════════
 const CONTENT_OBLIGATION = new Set<Bucket>(['keyword', 'string', 'number', 'comment']);
 
-interface CoverageResult { den: number; painted: number; uncovered: { text: string; want: string; ctx: string }[] }
+export interface CoverageResult { den: number; painted: number; uncovered: { text: string; want: string; ctx: string }[] }
 
-function leafCoverage(grammar: CstGrammar, tm: vsctm.IGrammar, opts = GEN_OPTS): CoverageResult {
+export function leafCoverage(grammar: CstGrammar, tm: vsctm.IGrammar, opts = GEN_OPTS): CoverageResult {
   const { parse } = createParser(grammar);
   const roleOf = buildRoleMap(grammar);
   const inputs = generateInputs(grammar, opts);
@@ -510,6 +518,7 @@ async function main(): Promise<void> {
     check(`reachability(${cfg.name}): no dangling self-#refs with present source`, r.danglingWithSource.length === 0, r.danglingWithSource.join(', '));
     const tc = tokenCensus(g, tmJson);
     check(`token-completeness(${cfg.name}): every non-skip token has a discharge path`, tc.orphans.length === 0, `orphans: ${tc.orphans.join(' ')}`);
+    check(`token-completeness(${cfg.name}): no flat token is neutered to the bare root scope`, tc.neutered.length === 0, `neutered: ${tc.neutered.join(' ')}`);
     const tm = await loadTmFromFiles(cfg.scopeName, { [cfg.scopeName]: cfg.tm, ...(cfg.tmExtra ?? {}) });
     let cov: CoverageResult = { den: 0, painted: 0, uncovered: [] };
     if (tm) cov = leafCoverage(g, tm);
diff --git a/test/tm-mutation.ts b/test/tm-mutation.ts
new file mode 100644
index 0000000..67160f0
--- /dev/null
+++ b/test/tm-mutation.ts
@@ -0,0 +1,207 @@
+// ─────────────────────────────────────────────────────────────────────────────
+//  tm-mutation.ts — MUTATION TESTING for the completeness gap-detector.
+//
+//  The completeness checker (test/tm-completeness.ts) proves structural properties
+//  (closure, reachability, token discharge, leaf coverage). But "the checker passes"
+//  only means something if the checker can actually FAIL when there IS a gap. A clean
+//  pass on a blind checker is worthless — the exact corpus-blindness this project has
+//  been bitten by. So this harness MEASURES the detector's power directly: it INJECTS a
+//  catalogue of known gaps into the emitted grammar (fault injection), runs every
+//  detector layer, and records which layer (if any) catches each.
+//
+//  This is the honest answer to "can every gap be found?" — not an a-priori completeness
+//  claim (the review showed ordering / disambiguation-correctness obligations are not
+//  grammar-algebraic and slide into undecidable territory), but a MEASURED kill rate:
+//
+//    • PRESENCE gaps (a token / scope / key dropped or neutered) MUST be killed by a
+//      corpus-free STRUCTURAL detector (reachability / token-census / leaf-coverage).
+//      A surviving presence mutant is a detector bug → this gate fails.
+//    • CORRECTNESS / ORDERING gaps (a disambiguation guard weakened, two patterns
+//      reordered) are EXPECTED to slip past the structural detectors — they are caught,
+//      if at all, only by a differential WITNESS (a paint change on a targeted input).
+//      Survivors here are the detector's MEASURED blind spots, reported not failed: they
+//      are the honest boundary COMPLETENESS.md draws, made empirical.
+//
+//  Run:  node test/tm-mutation.ts
+// ─────────────────────────────────────────────────────────────────────────────
+import { generateTmLanguage } from '../src/gen-tm.ts';
+import { createParser } from '../src/gen-parser.ts';
+import type { CstGrammar } from '../src/types.ts';
+import { generateInputs } from './grammar-gen.ts';
+import { buildRoleMap, leafRoles, spanBuckets, scopeAt, GEN_OPTS, type TmTok, type Bucket } from './generative-detect.ts';
+import {
+  checkReachability, tokenCensus, leafCoverage, loadTmFromObject, tmTokenize,
+  type TmGrammarJson,
+} from './tm-completeness.ts';
+
+// ── a mutation: a precise, kind-labelled fault injected into the emitted grammar ──
+type MutClass = 'presence' | 'correctness' | 'ordering';
+interface Mutation {
+  label: string;
+  cls: MutClass;
+  // mutate the (already-deep-cloned) emitted grammar in place; return false to skip
+  // (the site does not exist in this grammar — keeps the catalogue grammar-agnostic).
+  apply: (tm: any) => boolean;
+  witness?: string;       // a targeted input the differential detector tokenises
+  leaf?: string;          // the substring whose paint the differential watches
+  equivalent?: boolean;   // true = a no-op mutant the detectors must NOT flag; false/omitted = a real gap
+}
+
+const rootIncludeIndex = (tm: any, key: string) =>
+  (tm.patterns as any[]).findIndex(p => p?.include === `#${key}`);
+// recursively delete every `{include:#key}` anywhere in the grammar (so the key truly dies)
+function dropAllIncludes(node: any, key: string): void {
+  if (!node || typeof node !== 'object') return;
+  if (Array.isArray(node)) { for (let i = node.length - 1; i >= 0; i--) { if (node[i]?.include === `#${key}`) node.splice(i, 1); else dropAllIncludes(node[i], key); } return; }
+  for (const v of Object.values(node)) dropAllIncludes(v, key);
+}
+
+// the catalogue is built PER-WITNESS: we tokenise the baseline, find the repository key that
+// ACTUALLY paints each witness leaf, and target THAT key — so a mutation creates a real gap
+// instead of an equivalent mutant (e.g. dropping #number's ROOT include is a no-op because
+// #number is still reachable from #expression; only dropping ALL includes truly kills it).
+function buildCatalogue(tm: any, paintKey: (w: string, leaf: string) => string | null): Mutation[] {
+  const root = String(tm.scopeName ?? 'source');
+  const lang = root.replace(/^(source|text)\./, '');
+  const muts: Mutation[] = [];
+  const sites: { witness: string; leaf: string; role: string }[] = [
+    { witness: 'q = 42', leaf: '42', role: 'number' },
+    { witness: 'q = "x"', leaf: '"x"', role: 'string' },
+    { witness: 'a // c', leaf: '// c', role: 'comment' },
+  ];
+  for (const s of sites) {
+    const key = paintKey(s.witness, s.leaf);
+    if (!key) continue;
+    // PRESENCE — a corpus-free structural detector must kill each of these:
+    muts.push({ label: `drop ${s.role} key (all includes + entry)`, cls: 'presence', witness: s.witness, leaf: s.leaf,
+      apply: (t) => { dropAllIncludes(t, key); delete t.repository[key]; return true; } });
+    muts.push({ label: `neuter ${s.role} scope → bare root`, cls: 'presence', witness: s.witness, leaf: s.leaf,
+      apply: (t) => { t.repository[key] = { ...t.repository[key], name: root }; if (t.repository[key].patterns || t.repository[key].begin) { delete t.repository[key].beginCaptures; delete t.repository[key].endCaptures; t.repository[key].patterns = []; } return true; } });
+    // CORRECTNESS — a VALID grammar that paints the WRONG role (leaf still painted, just wrong):
+    muts.push({ label: `mis-scope ${s.role} → keyword (wrong role, still painted)`, cls: 'correctness', witness: s.witness, leaf: s.leaf,
+      apply: (t) => { t.repository[key] = { ...t.repository[key], name: `keyword.control.${lang}` }; return true; } });
+  }
+  // PRESENCE — a real dead key (nothing includes it) and a real dangling include:
+  muts.push({ label: 'add an unreachable (dead) repo key', cls: 'presence',
+    apply: (t) => { t.repository['__orphan__'] = { match: 'zzzqqq', name: `comment.${lang}` }; return true; } });
+  muts.push({ label: 'dangling include to a missing key', cls: 'presence',
+    apply: (t) => { t.patterns.unshift({ include: '#__ghost__' }); return true; } });
+  // ORDERING — flip a disambiguation priority so a looser rule shadows a tighter one:
+  if (tm.repository['generic-call'] && rootIncludeIndex(tm, 'comparison') >= 0) {
+    muts.push({ label: 'move generic-call after comparison (priority flip)', cls: 'ordering', witness: 'a<T>(x)', leaf: 'T',
+      apply: (t) => { const gi = rootIncludeIndex(t, 'generic-call'); if (gi < 0) return false; const [g] = t.patterns.splice(gi, 1); t.patterns.push(g); return true; } });
+  }
+  return muts;
+}
+
+// ── detectors ──────────────────────────────────────────────────────────────────────
+// corpus-FREE structural detectors (the ones whose guarantee is a-priori, not sampled)
+function structuralCatches(g: CstGrammar, mutated: TmGrammarJson): string[] {
+  const hits: string[] = [];
+  const r = checkReachability(g, mutated);
+  if (r.dead.length) hits.push(`reachability:dead(${r.dead.join(',')})`);
+  if (r.danglingWithSource.length) hits.push(`reachability:dangling(${r.danglingWithSource.join(',')})`);
+  const c = tokenCensus(g, mutated);
+  if (c.orphans.length) hits.push(`token-census:orphan(${c.orphans.join(',')})`);
+  if (c.neutered.length) hits.push(`token-census:neutered(${c.neutered.join(',')})`);
+  return hits;
+}
+// load that survives an invalid mutated grammar (a broken regex) — a grammar that fails
+// to compile is itself a detectable defect, reported as compile-error rather than crashing.
+async function tryLoad(scope: string, grammar: object): Promise<{ tm: any } | { err: string }> {
+  try { const tm = await loadTmFromObject(scope, { [scope]: grammar }); return tm ? { tm } : { err: 'load-null' }; }
+  catch (e: any) { return { err: `compile-error(${String(e?.message ?? e).slice(0, 30)})` }; }
+}
+// grammar-derived-corpus detector (leaf coverage over generated inputs)
+async function corpusCatches(g: CstGrammar, scope: string, mutated: object): Promise<string | null> {
+  const r = await tryLoad(scope, mutated);
+  if ('err' in r) return `leaf-coverage:${r.err}`;
+  const cov = leafCoverage(g, r.tm, { ...GEN_OPTS, maxInputs: 250 });
+  return cov.painted < cov.den ? `leaf-coverage(${cov.painted}/${cov.den})` : null;
+}
+// targeted DIFFERENTIAL detector: did the witness leaf's paint change vs baseline?
+async function differentialCatches(scope: string, base: object, mutated: object, witness: string, leaf: string): Promise<string | null> {
+  const [bt, mt] = await Promise.all([tryLoad(scope, base), tryLoad(scope, mutated)]);
+  if ('err' in bt) return null;
+  if ('err' in mt) return `differential:${mt.err}`;
+  const at = witness.indexOf(leaf); if (at < 0) return null;
+  const bb = bucketsAt(bt.tm, witness, at, leaf.length), mb = bucketsAt(mt.tm, witness, at, leaf.length);
+  const bs = [...bb].sort().join('|'), ms = [...mb].sort().join('|');
+  return bs !== ms ? `differential({${bs||'∅'}}→{${ms||'∅'}})` : null;
+}
+function bucketsAt(tm: any, text: string, start: number, len: number): Set<Bucket> {
+  return spanBuckets(tmTokenize(tm, text), text, start, start + len);
+}
+
+// ── driver ──────────────────────────────────────────────────────────────────────────
+interface Row { grammar: string; label: string; cls: MutClass; equivalent: boolean; killedBy: string[]; survived: boolean; skipped: boolean }
+
+async function runGrammar(name: string, module: string, scope: string): Promise<Row[]> {
+  const g = (await import(module)).default as CstGrammar;
+  const base = generateTmLanguage(g) as any;
+  if (base.scopeName) scope = base.scopeName;
+  const baseTm = await loadTmFromObject(scope, { [scope]: base });
+  if (!baseTm) return [];
+  // the painting-key finder: the repo key whose `name` paints a witness leaf (sampled at the
+  // leaf's MIDDLE char, so a string's CONTENT scope is found, not its delimiter punctuation).
+  const paintKey = (witness: string, leaf: string): string | null => {
+    const at = witness.indexOf(leaf); if (at < 0) return null;
+    const inner = scopeAt(tmTokenize(baseTm, witness), at + Math.floor(leaf.length / 2)).at(-1) ?? '';
+    if (!inner || inner === scope) return null;
+    for (const [k, v] of Object.entries(base.repository) as [string, any][]) if (v?.name === inner) return k;
+    for (const [k, v] of Object.entries(base.repository) as [string, any][]) if (typeof v?.name === 'string' && inner.startsWith(v.name + '.')) return k;
+    return null;
+  };
+  const rows: Row[] = [];
+  for (const m of buildCatalogue(base, paintKey)) {
+    const mutated = structuredClone(base);
+    if (!m.apply(mutated)) { rows.push({ grammar: name, label: m.label, cls: m.cls, equivalent: !!m.equivalent, killedBy: [], survived: false, skipped: true }); continue; }
+    const killedBy = structuralCatches(g, mutated);
+    const corpus = await corpusCatches(g, scope, mutated); if (corpus) killedBy.push(corpus);
+    if (m.witness && m.leaf) { const d = await differentialCatches(scope, base, mutated, m.witness, m.leaf); if (d) killedBy.push(d); }
+    rows.push({ grammar: name, label: m.label, cls: m.cls, equivalent: !!m.equivalent, killedBy, survived: killedBy.length === 0, skipped: false });
+  }
+  return rows;
+}
+
+async function main(): Promise<void> {
+  const GRAMMARS = [
+    { name: 'typescript', module: '../typescript.ts', scope: 'source.ts' },
+    { name: 'yaml', module: '../yaml.ts', scope: 'source.yaml' },
+  ];
+  const rows: Row[] = [];
+  for (const cfg of GRAMMARS) rows.push(...await runGrammar(cfg.name, cfg.module, cfg.scope));
+
+  console.log('── mutation testing: which detector layer kills each injected gap ──\n');
+  for (const r of rows) {
+    const mark = r.skipped ? '·' : r.equivalent ? (r.survived ? '✓' : '⚠') : r.survived ? '✗' : '✓';
+    const by = r.skipped ? '(site n/a — skipped)'
+      : r.equivalent ? (r.survived ? 'correctly NOT flagged (no-op mutant)' : `FALSE ALARM: ${r.killedBy.join(' ')}`)
+      : r.survived ? 'SURVIVED — no detector caught it' : r.killedBy.join('  ');
+    console.log(`  ${mark} [${r.cls.padEnd(11)}]${r.equivalent ? '[equiv]' : '       '} ${r.grammar.padEnd(11)} ${r.label.padEnd(52)} ${by}`);
+  }
+
+  const live = rows.filter(r => !r.skipped);
+  const real = live.filter(r => !r.equivalent);
+  const presence = real.filter(r => r.cls === 'presence');
+  const presenceSurvivors = presence.filter(r => r.survived);
+  const structuralKill = (r: Row) => r.killedBy.some(k => k.startsWith('reachability') || k.startsWith('token-census'));
+  const corrOrder = real.filter(r => r.cls !== 'presence');
+  const corrOrderSurvivors = corrOrder.filter(r => r.survived);
+  const falseAlarms = live.filter(r => r.equivalent && !r.survived);
+
+  console.log('\n── measured detection power ──');
+  console.log(`  presence gaps        : ${presence.length - presenceSurvivors.length}/${presence.length} killed · ${presence.filter(structuralKill).length}/${presence.length} by a CORPUS-FREE structural detector`);
+  console.log(`  correctness/ordering : ${corrOrder.length - corrOrderSurvivors.length}/${corrOrder.length} caught (differential) · ${corrOrderSurvivors.length} survived (measured blind spot)`);
+  console.log(`  equivalent controls  : ${falseAlarms.length} false alarm(s) (a precision bug if > 0)`);
+
+  // GATE: every real presence gap MUST be killed; no equivalent mutant may be falsely flagged.
+  // correctness/ordering survivors are the honest, documented boundary — reported, not failed.
+  const failures = [...presenceSurvivors.map(r => `presence SURVIVED: ${r.grammar} — ${r.label}`),
+    ...falseAlarms.map(r => `FALSE ALARM on equivalent mutant: ${r.grammar} — ${r.label}`)];
+  if (failures.length) { console.log('\n✗ detector defect(s):'); for (const f of failures) console.log(`    - ${f}`); process.exit(1); }
+  console.log(`\n✓ every presence gap killed, no false alarms; correctness/ordering blind spots measured = ${corrOrderSurvivors.length} (the boundary COMPLETENESS.md states).`);
+  void createParser; void buildRoleMap; void leafRoles; void generateInputs;
+}
+
+if ((import.meta as any).main) await main();

From 978cd2c73f3e454c96742e06fd117d59948770b6 Mon Sep 17 00:00:00 2001
From: Johnson Chu <johnsoncodehk@gmail.com>
Date: Sat, 20 Jun 2026 20:00:00 +0800
Subject: [PATCH 03/14] Make keyword completeness decidable, not
 corpus-witnessed (#51)
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

The keyword/operator obligation was the one class still checked by the
grammar-derived corpus (leaf coverage). Replace it with a structural,
a-priori discharge: `literalDischarge` confirms every alphabetic literal the
grammar consumes (collectLiterals over every rule + the prec/led tables)
appears, as a scoped word, in some REACHABLE pattern whose scope is a keyword
family — a finite scan of the emitted artifact, no corpus. 248/248 across the
six grammars; non-vacuous (stripping `class` from its patterns reports it
undischarged).

Completeness is now a decidable structural check end to end: token discharge
(census, incl. neuter) 74/74 + keyword-literal discharge 248/248 + repository
reachability 630/630 = 952/952 = 100%, no corpus. The leaf-coverage corpus pass
is demoted to a redundant differential cross-check on the soundness axis;
`tm-mutation`'s structural layer now also kills a dropped/neutered keyword
corpus-free.

COMPLETENESS.md draws the line correctly: COMPLETENESS (present + reachable +
scoped) is decidable — finite G, finite gen-tm(G), an obligation taxonomy
bounded by TextMate's finite construct kinds, per-token discharge by
structural identity (the flat `match` IS the token's own pattern) — and ∀ G by
structural induction over the finite combinator algebra. SOUNDNESS (do the
present constructs paint correctly on all inputs — wrong-role, pattern
ordering) is the undecidable residual (CFG vs regex-stack-machine over infinite
input). The earlier claim that "a-priori completeness is unavailable" was an
over-concession: the wall it hit is soundness's, not completeness's.
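
For orientation, the discharge idea can be sketched in miniature. This is an
illustrative reduction, not the code in test/tm-completeness.ts: the
`MiniGrammar` and `PatternNode` types, the helper names, and the shortened
keyword-family regex are all assumptions made for the example. It follows root
includes into the repository, collects the words that reachable keyword-scoped
`match` patterns mention, and reports any consumed literal that none of them
scopes.

```typescript
// Hypothetical miniature of structural keyword-literal discharge.
// All names here are illustrative; they are not the project's API.
interface PatternNode { name?: string; match?: string; include?: string; patterns?: PatternNode[] }
interface MiniGrammar { patterns: PatternNode[]; repository: Record<string, PatternNode> }

// Shortened stand-in for the keyword-family test (assumption).
const KEYWORD_FAMILY = /^(keyword|storage|constant\.language)/;

// Collect every pattern node reachable from the root via `#key` includes.
function reachable(tm: MiniGrammar): PatternNode[] {
  const out: PatternNode[] = [];
  const seen = new Set<string>();
  const queue: string[] = [];
  const visit = (n: PatternNode): void => {
    out.push(n);
    if (n.include?.startsWith('#')) queue.push(n.include.slice(1));
    for (const p of n.patterns ?? []) visit(p);
  };
  tm.patterns.forEach(visit);
  while (queue.length) {
    const k = queue.shift()!;
    if (seen.has(k)) continue;
    seen.add(k);
    const node = tm.repository[k];
    if (node) visit(node);
  }
  return out;
}

// Words mentioned by keyword-scoped patterns. Lookarounds and letter escapes
// are blanked first so `\bfrom\b` yields `from`, not `bfrom`.
function scopedWords(nodes: PatternNode[]): Set<string> {
  const out = new Set<string>();
  for (const n of nodes) {
    if (!n.name || !KEYWORD_FAMILY.test(n.name) || !n.match) continue;
    const cleaned = n.match.replace(/\(\?<?[=!][^)]*\)/g, ' ').replace(/\\[a-zA-Z]/g, ' ');
    for (const w of cleaned.match(/[A-Za-z][A-Za-z0-9_$]*/g) ?? []) out.add(w);
  }
  return out;
}

// A literal is undischarged when no reachable keyword-scoped pattern names it.
function undischarged(literals: string[], tm: MiniGrammar): string[] {
  const scoped = scopedWords(reachable(tm));
  return literals.filter(l => !scoped.has(l)).sort();
}

const demo: MiniGrammar = {
  patterns: [{ include: '#kw' }],
  repository: {
    kw: { match: '\\b(class|from)\\b', name: 'keyword.control.demo' },
    dead: { match: '\\bwhile\\b', name: 'keyword.control.demo' }, // never included
  },
};
const gaps = undischarged(['class', 'from', 'while'], demo);
console.log(gaps);
```

In this toy grammar `while` is scoped only by a repository key nothing
includes, so it is reported as a gap, while `class` and `from` are discharged.
The check reads only the emitted artifact and never tokenises an input, which
is the sense in which it needs no corpus.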
---
 COMPLETENESS.md         |  71 ++++++++++++++++-----------
 test/tm-completeness.ts | 104 ++++++++++++++++++++++++++++++----------
 test/tm-mutation.ts     |   4 +-
 3 files changed, 125 insertions(+), 54 deletions(-)

diff --git a/COMPLETENESS.md b/COMPLETENESS.md
index 321680b..2f36a54 100644
--- a/COMPLETENESS.md
+++ b/COMPLETENESS.md
@@ -46,7 +46,10 @@ value of a **closed union**: `RuleExpr` has 15 constructors and `TokenPattern` h
 `aliasScopes`, `expressionRule`, `manifest`). An *obligation* is induced by a
 constructor-occurrence or a config-field-occurrence. So completeness reduces to: **for each
 obligation generator, the generator has a discharging, reachable emission** — three
-mechanically-checkable layers.
+mechanically-checkable layers. Every input is finite: a finite `G`, a finite `gen-tm(G)`, and an
+obligation taxonomy bounded by TextMate's finite construct kinds — so completeness is a **decidable**
+property per grammar, and holds **∀ G by structural induction** over the finite combinator algebra
+(finitely many cases). It is checked a-priori on the emitted artifact, with no corpus.
 
 ## Layer A — closure: the universe is the algebra, and lowering is total
 
@@ -95,11 +98,16 @@ co-blind):
   region machinery (a `markup` grammar emits no per-token keys), or a region that owns the
   token's delimiter (the JSX `/>` / `</` punctuation). `tokenCensus` classifies every token and
   asserts **zero orphans** — the emitter-completeness proof for tokens.
-- **Keyword literals & Pratt operators** are discharged through the flat backbone (A3) and the
-  prec-table path; the `op`/`prefix`/`postfix` markers carry no literal (they route to
-  `collectLiterals`' default), and an operator's scope comes from the prec-table value, not from
-  a walked marker — so those three constructors being unwalked anywhere is *benign*, confirmed
-  by adversarial review.
+- **Keyword literals & Pratt operators** bear a keyword-scope obligation, **discharged
+  structurally** (`literalDischarge`): every alphabetic literal the grammar consumes (from
+  `collectLiterals` over every rule, plus the prec / led tables) must appear, as a *scoped word*,
+  in some **reachable** pattern whose scope is a keyword family (`keyword.` / `storage.` /
+  `constant.language` / …). This is a finite scan of the emitted artifact — **no corpus** — asking
+  only whether a scoping pattern is *present* (completeness); whether its guard fires correctly is
+  soundness. **248/248** discharged across the six grammars. (The `op`/`prefix`/`postfix` markers
+  carry no literal — they route to `collectLiterals`' default — and an operator's scope comes from
+  the prec-table value, so those three constructors being unwalked is *benign*, confirmed by
+  adversarial review.)
 - **Shapes** (JSX elements, generic/cast angle brackets, regex context, declarations, ternary,
   conditional types, arrow params, contextual operators/modifiers) and **config surfaces**
   (markup, indent, newline, `expressionRule`, `aliasScopes`, `canonicalRepoNames`, `manifest`,
@@ -107,11 +115,11 @@ co-blind):
   rather than on TypeScript-specific names — the detector-completeness requirement — is held by
   `test/agnostic.ts` (synthetic grammars with deliberately non-TS names/delimiters).
 
-The empirical witness that all of the above actually paint is **leaf coverage**: over the
-deterministic grammar-derived corpus (`test/grammar-gen.ts`), every parsed leaf whose
-by-construction role (`buildRoleMap`) is a content/keyword role (keyword / string / number /
-comment) is confirmed to receive a non-root scope. The denominator is fixed (the obligation
-leaves). Result: **2433/2433 across all six grammars.**
+So token, keyword/operator, and reachability discharge are **decidable structural checks on the
+emitted artifact** — the fixed denominator in the ledger, **952/952, no corpus**. A *redundant*
+corpus cross-check (**leaf coverage**: over the deterministic grammar-derived corpus, every parsed
+leaf whose by-construction role is content/keyword gets a non-root scope — **2433/2433**) is kept as
+a differential witness on the soundness axis; it is not the guarantee.
 
 ## Measuring the detector — mutation testing
 
@@ -134,12 +142,19 @@ The honest, measured result:
   artifact's *sequence*, not the grammar's algebra — so no corpus-free structural check reaches it,
   and a scope-preserving reorder slips even the bucket-level differential.
 
-So the claim this document makes is bounded and measured: **every presence / reachability gap is
-caught corpus-free** (mutation-proven, the gate); **wrong-role and ordering gaps are the soundness /
-interaction axis**, reached only by evaluation (the differential, or `test/gap-ledger.ts`), never by
-a grammar-algebraic proof. An a-priori "no gap can hide" over the *whole* gap space is not available
-— ordering and correctness obligations live in the emitted artifact and slide toward regex-vs-CFG
-undecidability — and this document does not claim it.
+The line is precise. **Completeness — every required construct PRESENT + REACHABLE + visually scoped
+— is DECIDABLE**, and decided a-priori with no corpus: a finite grammar `G`, a finite emitted artifact
+`gen-tm(G)`, a finite obligation taxonomy (bounded by TextMate's finite construct kinds), and per-token
+discharge by *structural identity* (the flat `match` **is** `tokenPatternSource(t)`, so no semantic
+regex-matching is needed). ∀ `G` follows by structural induction over the finite combinator algebra.
+What is **undecidable is soundness** — do the present constructs paint *correctly on all inputs*: a
+wrong-role paint, or which of two overlapping patterns *wins* (ordering), is an agreement between a
+CFG-derived role and a regex-stack-machine tokenizer over an infinite input space, which slides into
+regex-vs-CFG undecidability (Oniguruma's `\g<>`/backreferences are non-regular). So this document
+proves completeness and *measures* its detector (mutation testing); soundness it does not claim to
+decide — that is `test/gap-ledger.ts`'s by-construction + corpus axis. The earlier framing that
+"a-priori completeness over the whole gap space is unavailable" was an over-concession: completeness
+is available; it was soundness's wall, mistaken for completeness's.
 
 ## Reachability — root ∪ export surfaces
 
@@ -189,17 +204,17 @@ Auto-generated by `node test/tm-completeness.ts --write`; `--check` fails CI if
 
 <!-- COMPLETENESS-LEDGER:START — auto-generated by `node test/tm-completeness.ts --write`; do not edit by hand. -->
 
-| Grammar | Tokens | Keyword literals | Operators | Repo keys (reachable) | Leaf obligations (painted) |
-|---|---:|---:|---:|---:|---:|
-| typescript | 11/11 | 73 | 53 | 158/158 | 199/199 |
-| javascript | 11/11 | 48 | 51 | 103/103 | 131/131 |
-| typescriptreact | 13/13 | 73 | 53 | 171/171 | 169/169 |
-| javascriptreact | 13/13 | 48 | 51 | 116/116 | 121/121 |
-| html | 7/7 | 0 | 0 | 28/28 | 175/175 |
-| yaml | 19/19 | 0 | 0 | 54/54 | 1638/1638 |
-| **total** | **74/74** | **242** | **208** | **630/630** | **2433/2433** |
-
-**Fixed-denominator completeness: 3137/3137 = 100.00%** (token discharge 74/74 · repository reachability 630/630 · leaf painting 2433/2433). Keyword literals (242) and Pratt operators (208) are discharged through the leaf-painting column. **0 open completeness gaps.**
+| Grammar | Tokens | Keyword literals | Repo keys (reachable) | Leaf cross-check (corpus) |
+|---|---:|---:|---:|---:|
+| typescript | 11/11 | 73/73 | 158/158 | 199/199 |
+| javascript | 11/11 | 51/51 | 103/103 | 131/131 |
+| typescriptreact | 13/13 | 73/73 | 171/171 | 169/169 |
+| javascriptreact | 13/13 | 51/51 | 116/116 | 121/121 |
+| html | 7/7 | 0/0 | 28/28 | 175/175 |
+| yaml | 19/19 | 0/0 | 54/54 | 1638/1638 |
+| **total** | **74/74** | **248/248** | **630/630** | **2433/2433** |
+
+**Decidable completeness: 952/952 = 100.00%** (token discharge 74/74 · keyword-literal discharge 248/248 · repository reachability 630/630) — a structural check on the emitted artifact, no corpus. Leaf cross-check (corpus, redundant): 2433/2433. **0 open completeness gaps.**
 
 <!-- COMPLETENESS-LEDGER:END -->
 
diff --git a/test/tm-completeness.ts b/test/tm-completeness.ts
index 72b7773..37b5626 100644
--- a/test/tm-completeness.ts
+++ b/test/tm-completeness.ts
@@ -336,6 +336,61 @@ export function leafCoverage(grammar: CstGrammar, tm: vsctm.IGrammar, opts = GEN
   return { den, painted, uncovered };
 }
 
+// ════════════════════════════════════════════════════════════════════════════
+//  STRUCTURAL literal discharge — DECIDABLE keyword completeness (no corpus)
+//
+//  Every alphabetic literal/operator the grammar consumes bears a keyword-scope obligation.
+//  It is discharged iff it appears, as a SCOPED word, in some REACHABLE pattern whose scope
+//  is a keyword family. This is a finite, structural check on the emitted artifact — the
+//  a-priori (not corpus-witnessed) proof that every keyword is scoped. It asks only whether a
+//  scoping pattern is PRESENT (completeness); whether its guard fires correctly is soundness.
+// ════════════════════════════════════════════════════════════════════════════
+const KEYWORD_FAMILY = /^(keyword|storage|constant\.language|support\.(type|class|function|constant)|variable\.language|entity\.name\.(type|tag)|punctuation\.definition\.keyword)/;
+
+// every reachable pattern NODE (root ∪ export surfaces), the same closure as checkReachability
+function reachableNodes(g: CstGrammar, tmJson: TmGrammarJson): any[] {
+  const scope = tmJson.scopeName ?? `source.${g.name}`;
+  const repo = (tmJson.repository ?? {}) as Record<string, any>;
+  const reached = new Set<string>(); const queue: string[] = []; const out: any[] = [];
+  const visit = (node: any): void => {
+    if (!node || typeof node !== 'object') return;
+    if (Array.isArray(node)) { node.forEach(visit); return; }
+    out.push(node);
+    if (typeof node.include === 'string') { const inc: string = node.include; if (inc.startsWith('#')) queue.push(inc.slice(1)); else if (inc.startsWith(scope + '#')) queue.push(inc.slice(scope.length + 1)); }
+    if (node.patterns) visit(node.patterns);
+    for (const c of ['captures', 'beginCaptures', 'endCaptures', 'whileCaptures']) if (node[c]) for (const v of Object.values(node[c])) visit(v);
+  };
+  visit(tmJson.patterns ?? []);
+  if (g.expressionRule) queue.push('expression');
+  for (const k of Object.keys(g.canonicalRepoNames ?? {})) queue.push(k);
+  while (queue.length) { const k = queue.shift()!; if (reached.has(k)) continue; reached.add(k); if (repo[k]) visit(repo[k]); }
+  return out;
+}
+// the alphabetic words a node SCOPES under a keyword-family scope (lookarounds + `\b`/`\w`-escapes
+// stripped so a word-boundary doesn't fuse with the word, e.g. `\bfrom\b` → `from`, not `bfrom`)
+function scopedAtoms(nodes: any[]): Set<string> {
+  const out = new Set<string>();
+  const keywordScoped = (n: any): boolean => (typeof n.name === 'string' && KEYWORD_FAMILY.test(n.name))
+    || (['captures', 'beginCaptures', 'endCaptures'] as const).some(c => n[c] && Object.values(n[c]).some((cc: any) => typeof cc?.name === 'string' && KEYWORD_FAMILY.test(cc.name)));
+  for (const n of nodes) {
+    if (!keywordScoped(n)) continue;
+    const re = (n.match ?? n.begin ?? '') as string;
+    const cleaned = re.replace(/\(\?<?[=!][^)]*\)/g, ' ').replace(/\\[a-zA-Z]/g, ' ');
+    for (const w of cleaned.match(/[A-Za-z][A-Za-z0-9_$]*/g) ?? []) out.add(w);
+  }
+  return out;
+}
+export interface LiteralDischarge { obl: number; gaps: string[] }
+export function literalDischarge(g: CstGrammar, tmJson: TmGrammarJson): LiteralDischarge {
+  const scoped = scopedAtoms(reachableNodes(g, tmJson));
+  const lits = new Set<string>();
+  for (const r of g.rules) for (const l of collectLiterals(r.body)) if (isKeywordLiteral(l)) lits.add(l.replace(/^@/, ''));
+  for (const p of g.precs) for (const o of p.operators) if (isKeywordLiteral(o.value)) lits.add(o.value);
+  for (const lp of g.ledPrecs ?? []) if (isKeywordLiteral(lp.connector)) lits.add(lp.connector);
+  const gaps = [...lits].filter(l => !scoped.has(l)).sort();
+  return { obl: lits.size, gaps };
+}
+
 // ════════════════════════════════════════════════════════════════════════════
 //  LAYER A (cont.) — the literal-collection backbone is total + drops nothing consumed
 //
@@ -439,21 +494,16 @@ const GRAMMARS: GrammarCfg[] = [
 interface LedgerRow {
   name: string;
   tokenObl: number; tokenDisch: number;        // non-skip tokens, each → a discharge path
-  litObl: number;                              // distinct keyword literals (painted ⇐ leaf coverage)
-  opObl: number;                               // distinct Pratt operators
+  litObl: number; litDisch: number;            // alphabetic keyword literals, each → a reachable keyword-scoped pattern (structural)
   keyObl: number; keyReach: number;            // repository keys, each → reachable
-  leafObl: number; leafPaint: number;          // empirical content/keyword leaves, each → painted
+  leafObl: number; leafPaint: number;          // empirical content/keyword leaves (the corpus cross-check)
 }
-function ledgerRow(name: string, g: CstGrammar, tmJson: TmGrammarJson, r: ReachResult, tc: TokenCensus, cov: CoverageResult): LedgerRow {
-  const lits = new Set<string>();
-  for (const rule of g.rules) for (const l of collectLiterals(rule.body)) if (isKeywordLiteral(l)) lits.add(l);
-  const ops = new Set<string>();
-  for (const p of g.precs) for (const o of p.operators) ops.add(o.value);
-  for (const lp of g.ledPrecs ?? []) ops.add(lp.connector);
+function ledgerRow(name: string, g: CstGrammar, r: ReachResult, tc: TokenCensus, ld: LiteralDischarge, cov: CoverageResult): LedgerRow {
+  const nonSkip = g.tokens.filter(t => !t.flags.includes('skip')).length;
   return {
     name,
-    tokenObl: g.tokens.filter(t => !t.flags.includes('skip')).length, tokenDisch: g.tokens.filter(t => !t.flags.includes('skip')).length - tc.orphans.length,
-    litObl: lits.size, opObl: ops.size,
+    tokenObl: nonSkip, tokenDisch: nonSkip - tc.orphans.length - tc.neutered.length,
+    litObl: ld.obl, litDisch: ld.obl - ld.gaps.length,
     keyObl: r.repoKeys, keyReach: r.repoKeys - r.dead.length,
     leafObl: cov.den, leafPaint: cov.painted,
   };
@@ -464,21 +514,23 @@ function renderLedger(rows: LedgerRow[]): string {
   const L: string[] = [];
   L.push('<!-- COMPLETENESS-LEDGER:START — auto-generated by `node test/tm-completeness.ts --write`; do not edit by hand. -->');
   L.push('');
-  L.push('| Grammar | Tokens | Keyword literals | Operators | Repo keys (reachable) | Leaf obligations (painted) |');
-  L.push('|---|---:|---:|---:|---:|---:|');
-  const sum = { t: 0, td: 0, lit: 0, op: 0, k: 0, kr: 0, lf: 0, lp: 0 };
+  L.push('| Grammar | Tokens | Keyword literals | Repo keys (reachable) | Leaf cross-check (corpus) |');
+  L.push('|---|---:|---:|---:|---:|');
+  const sum = { t: 0, td: 0, lit: 0, ld: 0, k: 0, kr: 0, lf: 0, lp: 0 };
   for (const r of rows) {
-    L.push(`| ${r.name} | ${r.tokenDisch}/${r.tokenObl} | ${r.litObl} | ${r.opObl} | ${r.keyReach}/${r.keyObl} | ${r.leafPaint}/${r.leafObl} |`);
-    sum.t += r.tokenObl; sum.td += r.tokenDisch; sum.lit += r.litObl; sum.op += r.opObl;
+    L.push(`| ${r.name} | ${r.tokenDisch}/${r.tokenObl} | ${r.litDisch}/${r.litObl} | ${r.keyReach}/${r.keyObl} | ${r.leafPaint}/${r.leafObl} |`);
+    sum.t += r.tokenObl; sum.td += r.tokenDisch; sum.lit += r.litObl; sum.ld += r.litDisch;
     sum.k += r.keyObl; sum.kr += r.keyReach; sum.lf += r.leafObl; sum.lp += r.leafPaint;
   }
-  L.push(`| **total** | **${sum.td}/${sum.t}** | **${sum.lit}** | **${sum.op}** | **${sum.kr}/${sum.k}** | **${sum.lp}/${sum.lf}** |`);
+  L.push(`| **total** | **${sum.td}/${sum.t}** | **${sum.ld}/${sum.lit}** | **${sum.kr}/${sum.k}** | **${sum.lp}/${sum.lf}** |`);
   L.push('');
-  // the fixed denominator = every measured obligation (token-discharge + key-reachability + leaf-painting)
-  const den = sum.t + sum.k + sum.lf, num = sum.td + sum.kr + sum.lp;
-  L.push(`**Fixed-denominator completeness: ${num}/${den} = ${(100 * num / den).toFixed(2)}%** ` +
-    `(token discharge ${sum.td}/${sum.t} · repository reachability ${sum.kr}/${sum.k} · leaf painting ${sum.lp}/${sum.lf}). ` +
-    `Keyword literals (${sum.lit}) and Pratt operators (${sum.op}) are discharged through the leaf-painting column. ` +
+  // the DECIDABLE fixed denominator = the structural obligations (token discharge + keyword-literal
+  // discharge + repository reachability), checked a-priori on the emitted artifact, no corpus. The
+  // leaf cross-check is the redundant corpus witness (the soundness-axis dual), reported separately.
+  const den = sum.t + sum.k + sum.lit, num = sum.td + sum.kr + sum.ld;
+  L.push(`**Decidable completeness: ${num}/${den} = ${(100 * num / den).toFixed(2)}%** ` +
+    `(token discharge ${sum.td}/${sum.t} · keyword-literal discharge ${sum.ld}/${sum.lit} · repository reachability ${sum.kr}/${sum.k}) — ` +
+    `a structural check on the emitted artifact, no corpus. Leaf cross-check (corpus, redundant): ${sum.lp}/${sum.lf}. ` +
     `${num === den ? '**0 open completeness gaps.**' : `**${den - num} OPEN GAP(S).**`}`);
   L.push('');
   L.push('<!-- COMPLETENESS-LEDGER:END -->');
@@ -519,14 +571,16 @@ async function main(): Promise<void> {
     const tc = tokenCensus(g, tmJson);
     check(`token-completeness(${cfg.name}): every non-skip token has a discharge path`, tc.orphans.length === 0, `orphans: ${tc.orphans.join(' ')}`);
     check(`token-completeness(${cfg.name}): no flat token is neutered to the bare root scope`, tc.neutered.length === 0, `neutered: ${tc.neutered.join(' ')}`);
+    const ld = literalDischarge(g, tmJson);
+    check(`literal-completeness(${cfg.name}): every keyword literal/operator is in a reachable keyword-scoped pattern`, ld.gaps.length === 0, `undischarged: ${ld.gaps.join(' ')}`);
     const tm = await loadTmFromFiles(cfg.scopeName, { [cfg.scopeName]: cfg.tm, ...(cfg.tmExtra ?? {}) });
     let cov: CoverageResult = { den: 0, painted: 0, uncovered: [] };
     if (tm) cov = leafCoverage(g, tm);
-    check(`coverage(${cfg.name}): every content/keyword obligation leaf is painted`, cov.painted === cov.den,
+    check(`coverage cross-check(${cfg.name}): every content/keyword obligation leaf is painted`, cov.painted === cov.den,
       cov.uncovered.map(u => `"${u.text}"(${u.want})`).slice(0, 8).join(' '));
-    rows.push(ledgerRow(cfg.name, g, tmJson, r, tc, cov));
+    rows.push(ledgerRow(cfg.name, g, r, tc, ld, cov));
     const pct = cov.den ? (100 * cov.painted / cov.den).toFixed(2) : '—';
-    console.log(`  ${cfg.name.padEnd(17)} repo ${String(r.repoKeys).padStart(3)} · dead ${r.dead.length} · tokens ${tc.total - tc.skip - tc.orphans.length}/${tc.total - tc.skip} · leaf-coverage ${cov.painted}/${cov.den} = ${pct}%`);
+    console.log(`  ${cfg.name.padEnd(17)} repo ${String(r.repoKeys).padStart(3)} · dead ${r.dead.length} · tokens ${tc.total - tc.skip - tc.orphans.length}/${tc.total - tc.skip} · keyword-literals ${ld.obl - ld.gaps.length}/${ld.obl} · leaf-xcheck ${cov.painted}/${cov.den}`);
     if (cov.uncovered.length) for (const u of cov.uncovered.slice(0, 6)) console.log(`      UNCOVERED "${u.text}" want ${u.want} ctx …${u.ctx}…`);
   }
 
diff --git a/test/tm-mutation.ts b/test/tm-mutation.ts
index 67160f0..5b817f3 100644
--- a/test/tm-mutation.ts
+++ b/test/tm-mutation.ts
@@ -30,7 +30,7 @@ import type { CstGrammar } from '../src/types.ts';
 import { generateInputs } from './grammar-gen.ts';
 import { buildRoleMap, leafRoles, spanBuckets, scopeAt, GEN_OPTS, type TmTok, type Bucket } from './generative-detect.ts';
 import {
-  checkReachability, tokenCensus, leafCoverage, loadTmFromObject, tmTokenize,
+  checkReachability, tokenCensus, literalDischarge, leafCoverage, loadTmFromObject, tmTokenize,
   type TmGrammarJson,
 } from './tm-completeness.ts';
 
@@ -104,6 +104,8 @@ function structuralCatches(g: CstGrammar, mutated: TmGrammarJson): string[] {
   const c = tokenCensus(g, mutated);
   if (c.orphans.length) hits.push(`token-census:orphan(${c.orphans.join(',')})`);
   if (c.neutered.length) hits.push(`token-census:neutered(${c.neutered.join(',')})`);
+  const ld = literalDischarge(g, mutated);
+  if (ld.gaps.length) hits.push(`literal-discharge(${ld.gaps.slice(0, 3).join(',')})`);
   return hits;
 }
 // load that survives an invalid mutated grammar (a broken regex) — a grammar that fails

From 387b6508c46081351a6f29db81075609be865e87 Mon Sep 17 00:00:00 2001
From: Johnson Chu <johnsoncodehk@gmail.com>
Date: Sat, 20 Jun 2026 21:37:14 +0800
Subject: [PATCH 04/14] Fix two more sep-omission twins of the type-param
 keyword drop (#51)
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

An adversarial gap-hunt (5 agents over the classes the structural checker
does NOT cover) found that the getTypeParamElementKeywords `sep` drop had
un-fixed siblings in the same family:

- detectTypeParamConstraintKeywords.scanConstraint omitted `sep`, so a
  type-parameter CONSTRAINT keyword reached through a `&`/`,`-separated list
  was not collected — the .tsx generic-arrow⇄JSX no-comma disambiguation lost
  its constraint signal (mis-scoping the header as a JSX tag).
- detectDeclarations.containsBlockRef omitted `sep`, so a declaration whose
  brace body is reached through a `sep` was not seen as having a body — its
  #declaration-body member-scoping region was dropped.

Both recurse into `sep.element` now, mirroring the prior fix. Byte-identical
on all six shipped grammars (latent: every shipped grammar writes constraints
as `opt('extends', Type)` and block bodies as direct refs).

The hunt found 8 latent completeness gaps total; all were VERIFIED latent —
tokenizing the witnesses against the shipped grammars shows ternary, calls,
trailing-comma type params, and every prec operator are correctly scoped, so
the 0-gap soundness ledger is real, not corpus-blind. The remaining gaps are
detector SHAPE-FRAGILITY (fixed-offset window matching in detectTernary /
detectCallExpression / isAngleBracketSepRule misses equivalent factorings),
an unimplemented `rawBlock` config region, and punctuation-in-region — tracked
for follow-up; none triggers on a shipped grammar.
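
For illustration only, a minimal sketch of the omission class, not the
gen-tm.ts code. `RuleExpr` here is a simplified, hypothetical stand-in for the
real union, and the `followSep` flag is an assumption added purely to show the
walker before and after the `sep` arm exists.

```typescript
// Simplified, hypothetical stand-in for the RuleExpr union (assumption).
type RuleExpr =
  | { type: 'literal'; value: string }
  | { type: 'ref'; name: string }
  | { type: 'seq'; items: RuleExpr[] }
  | { type: 'alt'; items: RuleExpr[] }
  | { type: 'group'; body: RuleExpr }
  | { type: 'quantifier'; body: RuleExpr }
  | { type: 'sep'; element: RuleExpr; delimiter: string };

// Does `expr` reach a reference to one of the given block rules?
// `followSep` toggles the arm under discussion, to show before/after.
function containsBlockRef(expr: RuleExpr, blockRules: Set<string>, followSep: boolean): boolean {
  switch (expr.type) {
    case 'ref': return blockRules.has(expr.name);
    case 'seq':
    case 'alt': return expr.items.some(e => containsBlockRef(e, blockRules, followSep));
    case 'group':
    case 'quantifier': return containsBlockRef(expr.body, blockRules, followSep);
    case 'sep': return followSep && containsBlockRef(expr.element, blockRules, followSep);
    default: return false; // a literal carries no reference
  }
}

// A declaration whose brace body is reached through a comma-separated list.
const decl: RuleExpr = {
  type: 'seq',
  items: [
    { type: 'literal', value: 'decl' },
    { type: 'sep', delimiter: ',', element: { type: 'ref', name: 'Block' } },
  ],
};
const blocks = new Set(['Block']);
const before = containsBlockRef(decl, blocks, false); // sep arm missing: body not seen
const after = containsBlockRef(decl, blocks, true);   // sep arm present: body found
```

Without the arm the walker falls through to `false` without any error, which is
why the drop stays latent until a grammar actually uses that factoring.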
---
 src/gen-tm.ts | 5 +++++
 1 file changed, 5 insertions(+)

diff --git a/src/gen-tm.ts b/src/gen-tm.ts
index 46372f5..ccaad9f 100644
--- a/src/gen-tm.ts
+++ b/src/gen-tm.ts
@@ -1109,6 +1109,10 @@ function detectTypeParamConstraintKeywords(grammar: CstGrammar, typeArgRule: str
       for (const it of (expr as { items: RuleExpr[] }).items) scanConstraint(it);
     } else if (expr.type === 'group') {
       scanConstraint((expr as { body: RuleExpr }).body);
+    } else if (expr.type === 'sep') {
+      // a constraint keyword reached through a `&`/`,`-separated sub-list is just as direct —
+      // recurse into the element (mirrors getTypeParamElementKeywords' `sep` arm).
+      scanConstraint((expr as { element: RuleExpr }).element);
     }
   };
   for (const rule of grammar.rules) {
@@ -3157,6 +3161,7 @@ function detectDeclarations(grammar: CstGrammar, tokenNames: Set<string>): DeclI
     if (expr.type === 'ref') return isBlockRule(expr.name);
     if (expr.type === 'seq' || expr.type === 'alt') return expr.items.some(containsBlockRef);
     if (expr.type === 'quantifier' || expr.type === 'group') return containsBlockRef(expr.body);
+    if (expr.type === 'sep') return containsBlockRef(expr.element);
     return false;
   }
 

From dc0079d98ae2b74eba4d2d2c19d2f7459a76df27 Mon Sep 17 00:00:00 2001
From: Johnson Chu <johnsoncodehk@gmail.com>
Date: Sat, 20 Jun 2026 21:55:01 +0800
Subject: [PATCH 05/14] Root-cause: match normalized forms in shape-detectors
 (#51)
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

The fixed-window detector fragility the gap-hunt surfaced was a symptom; the
root cause is that the shape-detectors pattern-match RAW RuleExpr shapes, so an
equivalent factoring of the same construct (an `opt`-tail, a separate args
rule, a trailing comma) is a different shape and slips past them. Adding a `sep`
arm per detector only widens the symptom patch: the next factoring still slips
through.

Fix the structural condition: match the NORMALIZED form. expandAlts already
canonicalizes alt-split / opt-tail / group / quantifier into the same flat
adjacency; route the three fragile detectors through it, plus a FIRST-set check
for the one ref-hidden case:

- detectTernary, isAngleBracketSepRule → expandAlts (opt-tail and trailing-comma
  factorings now reduce to the matched adjacency; `sep` stays opaque so the sep
  node survives).
- detectCallExpression → FIRST(next) instead of a literal `(`, so the args may be
  the inline `(` OR a separate `CallArgs` rule referenced after the callee.
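
A rough sketch of why matching the normalized form helps. `expandAltsSketch`
below is a hypothetical reimplementation written for this note; the real
expandAlts in src/gen-tm.ts is not reproduced here and also handles
quantifiers. The simplified `RuleExpr` union and the `opt` node are
assumptions as well.

```typescript
// Simplified, hypothetical RuleExpr; the real union has more members.
type RuleExpr =
  | { type: 'literal'; value: string }
  | { type: 'ref'; name: string }
  | { type: 'seq'; items: RuleExpr[] }
  | { type: 'alt'; items: RuleExpr[] }
  | { type: 'group'; body: RuleExpr }
  | { type: 'opt'; body: RuleExpr }
  | { type: 'sep'; element: RuleExpr; delimiter: string };

// Expand an expression into flat branches (each branch is one adjacency list).
function expandAltsSketch(expr: RuleExpr): RuleExpr[][] {
  switch (expr.type) {
    case 'seq': {
      let branches: RuleExpr[][] = [[]];
      for (const item of expr.items) {
        const next: RuleExpr[][] = [];
        for (const prefix of branches) {
          for (const tail of expandAltsSketch(item)) next.push([...prefix, ...tail]);
        }
        branches = next;
      }
      return branches;
    }
    case 'alt': {
      const out: RuleExpr[][] = [];
      for (const item of expr.items) out.push(...expandAltsSketch(item));
      return out;
    }
    case 'group': return expandAltsSketch(expr.body);
    case 'opt': return [[], ...expandAltsSketch(expr.body)]; // absent or present
    default: return [[expr]]; // literal, ref and sep stay opaque
  }
}

const isLit = (e: RuleExpr | undefined, v: string): boolean =>
  e !== undefined && e.type === 'literal' && e.value === v;

// The fixed 5-window: x '?' y ':' z.
function hasTernaryWindow(items: RuleExpr[]): boolean {
  for (let i = 0; i + 4 < items.length; i++) {
    if (isLit(items[i + 1], '?') && isLit(items[i + 3], ':')) return true;
  }
  return false;
}

const E: RuleExpr = { type: 'ref', name: 'E' };
const lit = (value: string): RuleExpr => ({ type: 'literal', value });
const flat: RuleExpr = { type: 'seq', items: [E, lit('?'), E, lit(':'), E] };
const optTail: RuleExpr = {
  type: 'seq',
  items: [E, { type: 'opt', body: { type: 'seq', items: [lit('?'), E, lit(':'), E] } }],
};

const rawHit = optTail.type === 'seq' && hasTernaryWindow(optTail.items); // raw shape
const flatHit = expandAltsSketch(flat).some(hasTernaryWindow);
const optTailHit = expandAltsSketch(optTail).some(hasTernaryWindow);
```

In this sketch the raw opt-tail body has only two items, so the window never
fits; one of its expanded branches is the same five-item adjacency as the flat
form, so the unchanged window check fires on it.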

Byte-identical on all six shipped grammars (they already write the canonical
factoring); the three latent gaps are now emitted for their equivalent forms
(verified: #ternary-expression / #function-call / #declaration-type-params).

And the checker is no longer blind to this class: a new shape-robustness gate
(test/tm-completeness.ts) asserts each construct emits its region for EVERY
equivalent factoring, and it bites (3 failures) without the normalization. So
the class is now caught structurally, not just by an adversarial hunt.
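
The FIRST(next) test used for detectCallExpression can be sketched roughly as
follows. `firstLiteralsSketch` and `nullable` are hypothetical helpers written
for this note; the real firstLiterals in src/gen-tm.ts may differ, and the
reduced `RuleExpr` union and the rule names are assumptions.

```typescript
// Reduced, hypothetical RuleExpr; just enough to compute FIRST sets.
type RuleExpr =
  | { type: 'literal'; value: string }
  | { type: 'ref'; name: string }
  | { type: 'seq'; items: RuleExpr[] }
  | { type: 'alt'; items: RuleExpr[] }
  | { type: 'opt'; body: RuleExpr };
type Rules = Map<string, RuleExpr>;

// Can `e` match the empty input? `seen` guards against recursive rules.
function nullable(e: RuleExpr, rules: Rules, seen: Set<string> = new Set()): boolean {
  switch (e.type) {
    case 'opt': return true;
    case 'seq': return e.items.every(i => nullable(i, rules, seen));
    case 'alt': return e.items.some(i => nullable(i, rules, seen));
    case 'ref': {
      const body = rules.get(e.name);
      if (body === undefined || seen.has(e.name)) return false;
      return nullable(body, rules, new Set(seen).add(e.name));
    }
    default: return false; // a literal always consumes input
  }
}

// The set of literals that can begin a match of `e`.
function firstLiteralsSketch(e: RuleExpr, rules: Rules, seen: Set<string> = new Set()): Set<string> {
  const out = new Set<string>();
  const add = (sub: RuleExpr, s: Set<string> = seen): void => {
    firstLiteralsSketch(sub, rules, s).forEach(v => out.add(v));
  };
  switch (e.type) {
    case 'literal': out.add(e.value); break;
    case 'opt': add(e.body); break;
    case 'alt': e.items.forEach(i => add(i)); break;
    case 'seq':
      for (const item of e.items) {
        add(item);
        if (!nullable(item, rules)) break; // later items cannot start the match
      }
      break;
    case 'ref': {
      const body = rules.get(e.name);
      if (body !== undefined && !seen.has(e.name)) add(body, new Set(seen).add(e.name));
      break;
    }
  }
  return out;
}

const lit = (value: string): RuleExpr => ({ type: 'literal', value });
const rules: Rules = new Map<string, RuleExpr>([
  ['CallArgs', { type: 'seq', items: [lit('('), lit(')')] }],
  ['Block', { type: 'seq', items: [lit('{'), lit('}')] }],
]);
const startsWithParen = (e: RuleExpr): boolean => firstLiteralsSketch(e, rules).has('(');

const inlineArgs = startsWithParen(lit('('));                        // literal `(`
const argsRule = startsWithParen({ type: 'ref', name: 'CallArgs' }); // hidden behind a ref
const notCall = startsWithParen({ type: 'ref', name: 'Block' });     // starts with `{`
```

Both the inline and the args-rule factoring answer the same question the same
way, which is the property the detector needs.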
---
 src/gen-tm.ts           | 39 ++++++++++++++++++++------------
 test/tm-completeness.ts | 50 +++++++++++++++++++++++++++++++++++++++++
 2 files changed, 75 insertions(+), 14 deletions(-)

diff --git a/src/gen-tm.ts b/src/gen-tm.ts
index ccaad9f..dd4101e 100644
--- a/src/gen-tm.ts
+++ b/src/gen-tm.ts
@@ -1997,20 +1997,22 @@ function generateTypeCastPattern(
  * identifiers before '(' as entity.name.function.
  */
 function detectCallExpression(grammar: CstGrammar): boolean {
+  const byName = new Map(grammar.rules.map(r => [r.name, r.body]));
+  // A call is a ref (the callee) immediately followed by something that STARTS with `(` — the
+  // arg list. Checking FIRST(next) instead of a literal `(` makes the detection transparent to
+  // factoring: the args may be the literal `(` directly, OR a separate rule (`CallArgs = '(' … ')'`)
+  // referenced after the callee — both have `(` in their FIRST set.
+  const startsWithParen = (e: RuleExpr) => firstLiterals(e, byName).has('(');
   function checkSeq(items: RuleExpr[]): boolean {
     for (let i = 0; i < items.length - 1; i++) {
-      if (items[i].type === 'ref' &&
-          items[i + 1].type === 'literal' &&
-          (items[i + 1] as { value: string }).value === '(') {
-        return true;
-      }
+      if (items[i].type === 'ref' && startsWithParen(items[i + 1])) return true;
     }
     return false;
   }
 
   function walk(expr: RuleExpr): boolean {
-    if (expr.type === 'seq') return checkSeq(expr.items) || expr.items.some(walk);
-    if (expr.type === 'alt') return expr.items.some(walk);
+    if (expandAlts(expr).some(checkSeq)) return true;   // normalized factorings (opt/alt/group)
+    if (expr.type === 'seq' || expr.type === 'alt') return expr.items.some(walk);
     if (expr.type === 'quantifier' || expr.type === 'group') return walk(expr.body);
     if (expr.type === 'sep') return walk(expr.element);
     return false;
@@ -2185,9 +2187,13 @@ function detectTernary(grammar: CstGrammar): boolean {
     return false;
   }
 
+  // Match on the NORMALIZED forms (expandAlts canonicalises equivalent factorings — an
+  // `opt('?', $, ':', $)` tail, an alt-split, a group — into the same flat adjacency), plus a
+  // recurse into `sep` elements (expandAlts treats `sep` as opaque). So a ternary written any of
+  // those equivalent ways is detected, not only the one flat 5-window factoring.
   function walk(expr: RuleExpr): boolean {
-    if (expr.type === 'seq') return checkSeq(expr.items) || expr.items.some(walk);
-    if (expr.type === 'alt') return expr.items.some(walk);
+    if (expandAlts(expr).some(checkSeq)) return true;
+    if (expr.type === 'seq' || expr.type === 'alt') return expr.items.some(walk);
     if (expr.type === 'quantifier' || expr.type === 'group') return walk(expr.body);
     if (expr.type === 'sep') return walk(expr.element);
     return false;
@@ -3093,11 +3099,16 @@ interface DeclInfo {
 }
 
 function isAngleBracketSepRule(body: RuleExpr): boolean {
-  if (body.type !== 'seq' || body.items.length !== 3) return false;
-  const [first, second, third] = body.items;
-  return first.type === 'literal' && first.value === '<' &&
-         second.type === 'sep' && second.delimiter === ',' &&
-         third.type === 'literal' && third.value === '>';
+  // Match on the NORMALIZED forms so an equivalent factoring — a trailing `opt(',')`, an
+  // alt-split, a group wrapper — reduces to the same `'<' sep '>'` adjacency. expandAlts keeps
+  // `sep` opaque (it is in its default case), so the sep node survives the expansion.
+  return expandAlts(body).some(items => {
+    if (items.length !== 3) return false;
+    const [first, second, third] = items;
+    return first.type === 'literal' && first.value === '<' &&
+           second.type === 'sep' && second.delimiter === ',' &&
+           third.type === 'literal' && third.value === '>';
+  });
 }
 
 function getTypeParamElementKeywords(body: RuleExpr, grammar: CstGrammar): string[] {
diff --git a/test/tm-completeness.ts b/test/tm-completeness.ts
index 37b5626..4b28094 100644
--- a/test/tm-completeness.ts
+++ b/test/tm-completeness.ts
@@ -476,6 +476,53 @@ async function regionKeywordProbe(): Promise<void> {
   }
 }
 
+// ════════════════════════════════════════════════════════════════════════════
+//  SHAPE ROBUSTNESS — a shape-detector must fire on EVERY equivalent factoring
+//
+//  The ~24 shape detectors in gen-tm.ts recognise a construct (ternary / call / generic
+//  type-params / …) by its structure. A detector that matches one FIXED factoring (a flat
+//  5-window ternary, an inline `(`, a 3-item `<sep>`) silently drops the SAME construct
+//  written an equivalent way (an `opt`-tail, a separate args rule, a trailing comma) — the
+//  detector-fragility class the gap-hunt surfaced. The root-cause fix is to match a NORMALISED
+//  form (expandAlts + FIRST); this gate holds it: for each construct, several equivalent
+//  factorings must ALL emit the same region key. It BITES if a detector regresses to a fixed
+//  shape (a factoring loses the region).
+// ════════════════════════════════════════════════════════════════════════════
+function checkShapeRobustness(): void {
+  const Id = token(plus(range('a', 'z')), { identifier: true });
+  const emits = (key: string, build: () => Record<string, any>): boolean => {
+    try { return !!(generateTmLanguage(defineGrammar(build() as any) as any) as any).repository[key]; }
+    catch { return false; }
+  };
+  // each construct, in several EQUIVALENT factorings; the region key must be present in all.
+  const constructs: { name: string; key: string; factorings: { label: string; build: () => Record<string, any> }[] }[] = [
+    {
+      name: 'ternary', key: 'ternary-expression', factorings: [
+        { label: 'flat', build: () => { const E = rule((s: any) => [[Id, '?', s, ':', s], [Id]]); const P = rule(() => [[many(E)]]); return { name: 't1', scopeName: 'source.t1', tokens: { Id }, rules: { E, P }, entry: P }; } },
+        { label: 'opt-tail', build: () => { const E = rule((s: any) => [[Id, opt('?', s, ':', s)]]); const P = rule(() => [[many(E)]]); return { name: 't2', scopeName: 'source.t2', tokens: { Id }, rules: { E, P }, entry: P }; } },
+      ],
+    },
+    {
+      name: 'call', key: 'function-call', factorings: [
+        { label: 'inline', build: () => { const A = rule(() => [[Id]]); const E = rule((s: any) => [[A, '(', sep(s, ','), ')'], [A]]); const P = rule(() => [[many(E)]]); return { name: 'c1', scopeName: 'source.c1', tokens: { Id }, rules: { A, E, P }, entry: P }; } },
+        { label: 'args-rule', build: () => { const A = rule(() => [[Id]]); const CA = rule((s: any) => [['(', sep(s, ','), ')']]); const C = rule((s: any) => [[A, CA], [A]]); const E = rule((s: any) => [[C]]); const P = rule(() => [[many(E)]]); return { name: 'c2', scopeName: 'source.c2', tokens: { Id }, rules: { A, CA, C, E, P }, entry: P }; } },
+      ],
+    },
+    {
+      name: 'generic-type-params', key: 'declaration-type-params', factorings: [
+        { label: '3-item', build: () => { const T = rule(() => [[Id]]); const Pm = rule(() => [[Id, opt('extends', T)]]); const TP = rule(() => [['<', sep(Pm, ','), '>']]); const D = rule(() => [['fn', Id, opt(TP), '{', '}']]); const P = rule(() => [[many(D)]]); return { name: 'g1', scopeName: 'source.g1', tokens: { Id }, prec: [none('<', '>')], scopes: { 'storage.type.function': ['fn'], 'keyword.operator.expression.extends': ['extends'] }, rules: { T, Pm, TP, D, P }, entry: P }; } },
+        { label: 'trailing-comma', build: () => { const T = rule(() => [[Id]]); const Pm = rule(() => [[Id, opt('extends', T)]]); const TP = rule(() => [['<', sep(Pm, ','), opt(','), '>']]); const D = rule(() => [['fn', Id, opt(TP), '{', '}']]); const P = rule(() => [[many(D)]]); return { name: 'g2', scopeName: 'source.g2', tokens: { Id }, prec: [none('<', '>')], scopes: { 'storage.type.function': ['fn'], 'keyword.operator.expression.extends': ['extends'] }, rules: { T, Pm, TP, D, P }, entry: P }; } },
+      ],
+    },
+  ];
+  for (const c of constructs) {
+    const results = c.factorings.map(f => ({ label: f.label, ok: emits(c.key, f.build) }));
+    const allEmit = results.every(r => r.ok);
+    check(`shape-robustness: \`${c.name}\` emits #${c.key} for every equivalent factoring`, allEmit,
+      results.filter(r => !r.ok).map(r => r.label).join(', '));
+  }
+}
+
 // ════════════════════════════════════════════════════════════════════════════
 //  driver
 // ════════════════════════════════════════════════════════════════════════════
@@ -559,6 +606,9 @@ async function main(): Promise<void> {
   checkCollectLiteralsClosure();
   await regionKeywordProbe();
 
+  console.log('── Shape robustness: detectors fire on every equivalent factoring ──');
+  checkShapeRobustness();
+
   console.log('── Reachability · token completeness · Layer B1 leaf coverage ──');
   const rows: LedgerRow[] = [];
   for (const cfg of GRAMMARS) {

From f6a9990be9be4bbdfabd17991a8868d8fe41a096 Mon Sep 17 00:00:00 2001
From: Johnson Chu <johnsoncodehk@gmail.com>
Date: Sat, 20 Jun 2026 22:29:42 +0800
Subject: [PATCH 06/14] Normalize four more shape-detectors (#51)
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

A systematic shape-fragility audit (5 agents running the real emitter over
equivalent factorings of each detector's construct) found the fixed-window
fragility is broad, not three isolated cases. Fix four more by routing through
the normalized form, same pattern as detectTernary:

- detectConditionalType — ran its 7-window on raw r.body; now over expandAlts,
  so an `opt`-tail / grouped / alt-split conditional `?:` is detected.
- getTypeParamElementKeywords — early-out demanded an exact 3-item `'<' sep '>'`
  body; now scans the expandAlts branches for that adjacency, so a trailing
  `opt(',')` or alt-wrapped type-param list still hoists its constraint keyword.
- detectDeclarations.isBlockRule — matched only a raw-seq `{ … }` body; now
  EVERY expandAlts branch must be `{ … }`-bounded (`.every`, not `.some`, so a
  `Type` that is only SOMETIMES an object literal is not mis-read as a brace
  declaration — `.some` regressed `type X = …` to #declaration-body).
- detectJsx hasElementShape — matched `<`+ref only in a raw seq; now over the
  expandAlts branches, so an opt/alt/group factoring of the element qualifies.

Byte-identical on all six shipped grammars (verified the `.every` form after a
`.some` first attempt changed typescript/tsx output), and each fix is verified
to detect its previously-dropped factoring. Seven detectors are now shape-robust;
the YAML region detectors and the expression group are the remaining batch.
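
The `.every` versus `.some` distinction in isBlockRule, reduced to a toy. This
is a sketch under assumptions: branches are modelled as plain string arrays
that are already expanded, and the names `Block`-like and `Type`-like are
hypothetical examples, not rules from a shipped grammar.

```typescript
// Each branch is one already-expanded alternative, as a list of symbols.
type Branch = string[];

const braceBounded = (b: Branch): boolean =>
  b.length >= 2 && b[0] === '{' && b[b.length - 1] === '}';

// Over-matching variant: one brace-bounded branch is enough.
const isBlockSome = (branches: Branch[]): boolean => branches.some(braceBounded);

// Variant described above: the rule must ALWAYS be a brace block.
// The length guard matters because [].every(...) is true.
const isBlockEvery = (branches: Branch[]): boolean =>
  branches.length > 0 && branches.every(braceBounded);

// A rule that is always a block, written as an alt of two block forms.
const blockLike: Branch[] = [['{', '}'], ['{', 'Member', '}']];
// A rule that is only sometimes an object literal.
const typeLike: Branch[] = [['{', 'Member', '}'], ['Id']];

const someOnType = isBlockSome(typeLike);    // over-match
const everyOnType = isBlockEvery(typeLike);  // correctly rejected
const everyOnBlock = isBlockEvery(blockLike); // alt-of-blocks still accepted
const everyOnEmpty = isBlockEvery([]);       // guarded vacuous case
```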
---
 src/gen-tm.ts | 53 ++++++++++++++++++++++++++++++++-------------------
 1 file changed, 33 insertions(+), 20 deletions(-)

diff --git a/src/gen-tm.ts b/src/gen-tm.ts
index dd4101e..49b8225 100644
--- a/src/gen-tm.ts
+++ b/src/gen-tm.ts
@@ -922,16 +922,17 @@ function detectJsx(grammar: CstGrammar): JsxInfo | null {
   }
   if (!selfCloseTok || !closeTok) return null;
 
-  // Confirm the JSX element production: a `<` literal directly before a rule ref.
+  // Confirm the JSX element production: a `<` literal directly before a rule ref, matched on the
+  // NORMALISED branches so an opt/alt/group factoring of the element production still qualifies.
   let hasElementShape = false;
   const walk = (e: RuleExpr): void => {
-    if (e.type === 'seq') {
-      for (let i = 0; i < e.items.length - 1; i++) {
-        if (e.items[i].type === 'literal' && (e.items[i] as { value: string }).value === '<' &&
-            e.items[i + 1].type === 'ref') hasElementShape = true;
+    for (const items of expandAlts(e)) {
+      for (let i = 0; i < items.length - 1; i++) {
+        if (items[i].type === 'literal' && (items[i] as { value: string }).value === '<' &&
+            items[i + 1].type === 'ref') hasElementShape = true;
       }
-      e.items.forEach(walk);
-    } else if (e.type === 'alt') e.items.forEach(walk);
+    }
+    if (e.type === 'seq' || e.type === 'alt') e.items.forEach(walk);
     else if (e.type === 'quantifier' || e.type === 'group' || e.type === 'not') walk(e.body);
     else if (e.type === 'sep') walk(e.element);
   };
@@ -2242,8 +2243,10 @@ function detectConditionalType(grammar: CstGrammar): string | null {
 
   function walk(expr: RuleExpr): void {
     if (connector) return;
-    if (expr.type === 'seq') { checkSeq(expr.items); expr.items.forEach(walk); }
-    else if (expr.type === 'alt') expr.items.forEach(walk);
+    // run the 7-window over the NORMALISED branches (mirrors detectTernary): expandAlts
+    // canonicalises an opt-tail / grouped / alt-split conditional `?:` into the flat adjacency.
+    for (const items of expandAlts(expr)) { checkSeq(items); if (connector) return; }
+    if (expr.type === 'seq' || expr.type === 'alt') expr.items.forEach(walk);
     else if (expr.type === 'quantifier' || expr.type === 'group') walk(expr.body);
     else if (expr.type === 'sep') walk(expr.element);
   }
@@ -3112,10 +3115,18 @@ function isAngleBracketSepRule(body: RuleExpr): boolean {
 }
 
 function getTypeParamElementKeywords(body: RuleExpr, grammar: CstGrammar): string[] {
-  if (body.type !== 'seq' || body.items.length !== 3) return [];
-  const sep = body.items[1];
-  if (sep.type !== 'sep') return [];
-  let elementBody: RuleExpr = sep.element;
+  // Find the `'<' sep '>'` adjacency in any NORMALISED branch (so a trailing `opt(',')` or an
+  // alt-wrapped body still surfaces it — the same expansion isAngleBracketSepRule uses), then hoist
+  // the element's keywords. Without this the keyword sub-pattern (the `\bextends\b` scoping inside
+  // `<…>`) is dropped for those equivalent factorings even though the region itself is still emitted.
+  let elementBody: RuleExpr | null = null;
+  for (const items of expandAlts(body)) {
+    const i = items.findIndex(x => x.type === 'literal' && (x as { value: string }).value === '<');
+    if (i >= 0 && items[i + 1]?.type === 'sep' && items[i + 2]?.type === 'literal' && (items[i + 2] as { value: string }).value === '>') {
+      elementBody = (items[i + 1] as { element: RuleExpr }).element; break;
+    }
+  }
+  if (!elementBody) return [];
   if (elementBody.type === 'ref') {
     const rule = grammar.rules.find(r => r.name === (elementBody as { name: string }).name);
     if (rule) elementBody = rule.body;
@@ -3151,13 +3162,15 @@ function detectDeclarations(grammar: CstGrammar, tokenNames: Set<string>): DeclI
   function isBlockRule(name: string): boolean {
     const rule = grammar.rules.find(r => r.name === name);
     if (!rule) return false;
-    const body = rule.body;
-    if (body.type === 'seq' && body.items.length >= 2) {
-      return body.items[0].type === 'literal' && (body.items[0] as { value: string }).value === '{' &&
-             body.items[body.items.length - 1].type === 'literal' &&
-             (body.items[body.items.length - 1] as { value: string }).value === '}';
-    }
-    return false;
+    // A rule is a block body only if EVERY normalised branch is `{ … }`-bounded — i.e. it is
+    // ALWAYS a brace block. `.some` would over-match a rule that is only SOMETIMES a block (a
+    // `Type` whose value can be an inline object type), mis-classifying a `type X = …` alias as a
+    // brace-body declaration. `.every` recovers the alt-of-blocks factoring without that regression.
+    const branches = expandAlts(rule.body);
+    return branches.length > 0 && branches.every(items =>
+      items.length >= 2 &&
+      items[0].type === 'literal' && (items[0] as { value: string }).value === '{' &&
+      items[items.length - 1].type === 'literal' && (items[items.length - 1] as { value: string }).value === '}');
   }
 
   function containsLiteral(expr: RuleExpr, value: string): boolean {

From 24cccf13de01031001dc6c99c698cb6f3e488ce4 Mon Sep 17 00:00:00 2001
From: Johnson Chu <johnsoncodehk@gmail.com>
Date: Sat, 20 Jun 2026 22:58:37 +0800
Subject: [PATCH 07/14] Normalize the expression-position shape-detectors (#51)
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

I read each expression detector myself (the audit agent's output was
malformed): five matched a construct on the raw r.body, so an equivalent
factoring slips past them. Route them through expandAlts, the same pattern as
detectTernary/detectCallExpression:

- detectBareArrowParam — `ref '=>'`; an opt-tail arrow (`[x, opt('=>', body)]`)
  was dropped (verified: variable.parameter now emitted for that factoring).
- detectPropertyAccess — `'.'`/`'?.'` before a token ref.
- detectParenArrowParams + detectArrowParamDelims — the deliberate pair that
  read the same arrow param-list production; routed identically so they still
  cannot disagree.
- detectDirectParamKeywords — keyword directly before `(`; also recurse `sep`.

detectConstructorKeywords already expands (no change). Byte-identical on all six
shipped grammars. Twelve detectors are now shape-robust. The YAML region
detectors are deferred to a semantics-aware pass: their fixed shapes encode
DELIBERATE YAML semantics (detectFold stops at a leaf token and does NOT follow
a rule-ref by design — an Indent+rule-ref is a SIBLING node, not a fold), so the
audit's heuristic "follow the ref" fix would break the fold-vs-sibling meaning.
Not every fixed shape is fragility — some are intent.
---
 src/gen-tm.ts | 27 ++++++++++++++-------------
 1 file changed, 14 insertions(+), 13 deletions(-)

diff --git a/src/gen-tm.ts b/src/gen-tm.ts
index 49b8225..fa917b5 100644
--- a/src/gen-tm.ts
+++ b/src/gen-tm.ts
@@ -2049,10 +2049,10 @@ function detectPropertyAccess(
   }
 
   function walk(expr: RuleExpr): void {
-    if (expr.type === 'seq') { checkSeq(expr.items); expr.items.forEach(walk); }
-    if (expr.type === 'alt') expr.items.forEach(walk);
-    if (expr.type === 'quantifier' || expr.type === 'group') walk(expr.body);
-    if (expr.type === 'sep') walk(expr.element);
+    for (const items of expandAlts(expr)) checkSeq(items);   // normalized factorings
+    if (expr.type === 'seq' || expr.type === 'alt') expr.items.forEach(walk);
+    else if (expr.type === 'quantifier' || expr.type === 'group') walk(expr.body);
+    else if (expr.type === 'sep') walk(expr.element);
   }
 
   for (const rule of grammar.rules) walk(rule.body);
@@ -2079,8 +2079,8 @@ function detectBareArrowParam(grammar: CstGrammar, tokenNames: Set<string>): boo
   }
 
   function walk(expr: RuleExpr): boolean {
-    if (expr.type === 'seq') return checkSeq(expr.items) || expr.items.some(walk);
-    if (expr.type === 'alt') return expr.items.some(walk);
+    if (expandAlts(expr).some(checkSeq)) return true;   // normalized factorings (opt-tail / alt / group)
+    if (expr.type === 'seq' || expr.type === 'alt') return expr.items.some(walk);
     if (expr.type === 'quantifier' || expr.type === 'group') return walk(expr.body);
     if (expr.type === 'sep') return walk(expr.element);
     return false;
@@ -2112,8 +2112,8 @@ function detectParenArrowParams(grammar: CstGrammar): boolean {
   }
 
   function walk(expr: RuleExpr): boolean {
-    if (expr.type === 'seq') return checkSeq(expr.items) || expr.items.some(walk);
-    if (expr.type === 'alt') return expr.items.some(walk);
+    if (expandAlts(expr).some(checkSeq)) return true;   // normalized factorings (opt-tail / alt / group)
+    if (expr.type === 'seq' || expr.type === 'alt') return expr.items.some(walk);
     if (expr.type === 'quantifier' || expr.type === 'group') return walk(expr.body);
     if (expr.type === 'sep') return walk(expr.element);
     return false;
@@ -2159,8 +2159,8 @@ function detectArrowParamDelims(grammar: CstGrammar): { open: string; close: str
     return false;
   }
   function walk(expr: RuleExpr): boolean {
-    if (expr.type === 'seq') return checkSeq(expr.items) || expr.items.some(walk);
-    if (expr.type === 'alt') return expr.items.some(walk);
+    if (expandAlts(expr).some(checkSeq)) return true;   // normalized factorings (opt-tail / alt / group)
+    if (expr.type === 'seq' || expr.type === 'alt') return expr.items.some(walk);
     if (expr.type === 'quantifier' || expr.type === 'group') return walk(expr.body);
     if (expr.type === 'sep') return walk(expr.element);
     return false;
@@ -2289,9 +2289,10 @@ function detectDirectParamKeywords(
   }
 
   function walk(expr: RuleExpr): void {
-    if (expr.type === 'seq') { checkSeq(expr.items); expr.items.forEach(walk); }
-    if (expr.type === 'alt') expr.items.forEach(walk);
-    if (expr.type === 'quantifier' || expr.type === 'group') walk(expr.body);
+    for (const items of expandAlts(expr)) checkSeq(items);   // normalized factorings
+    if (expr.type === 'seq' || expr.type === 'alt') expr.items.forEach(walk);
+    else if (expr.type === 'quantifier' || expr.type === 'group') walk(expr.body);
+    else if (expr.type === 'sep') walk(expr.element);
   }
 
   for (const rule of grammar.rules) walk(rule.body);

From d4cc5ace4f0112f110a0a687cc65cea377cb70a8 Mon Sep 17 00:00:00 2001
From: Johnson Chu <johnsoncodehk@gmail.com>
Date: Sat, 20 Jun 2026 23:05:32 +0800
Subject: [PATCH 08/14] YAML detectors: one safe positional fix, three
 confirmed deliberate (#51)
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

Reviewed the YAML region detectors by their semantics rather than by the
audit's heuristic. Key finding: the audit's "route through expandAlts" fix is
WRONG for these detectors. They match STRUCTURAL nodes (a `(Newline item)*`
quantifier, config-keyed bracket pairs), and expandAlts EXPANDS the very
quantifier they depend on.

- detectBlockSequence — the one genuine positional rigidity: it matched the
  `[item, (Newline item)*]` pattern only at items[0]/items[1]. It now scans every
  adjacent pair (items[k], items[k+1]), so a leading element before the sequence
  does not hide it. NOT routed through expandAlts (that would expand the quantifier).
  yaml byte-identical; the full yaml gate group (depth-witnesses, deepest-sibling,
  compact-nest-sites, flow-sites, blockscalar-depth, issue12) stays green.
- detectFold — its visit is ALREADY pairwise; its refsLeaf stopping at a leaf
  token (not following a rule-ref) is the DELIBERATE fold-vs-sibling distinction
  (an Indent+rule-ref is a sibling node). No change.
- detectExplicitKey — the indicator at items[0] is intrinsic (an explicit-key
  entry is headed by `?`); the inner already unwraps a quantifier. No change.
- detectFlowCollections — topLits is positional-agnostic and unwraps
  quantifier/group/sep, deliberately stopping at alt/ref to read THIS rule's own
  structure. No change.

So the fixed-window fragility class is closed: 13 detectors made shape-robust
where the rigidity was accidental; the rest is deliberate YAML semantics that
the heuristic over-flagged. Not every fixed shape is fragility.
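
A minimal sketch of the fixed-window vs pairwise difference in
detectBlockSequence, under stated assumptions: `Expr` is a hypothetical,
cut-down stand-in for RuleExpr, and `findFixed` / `findPairwise` are
illustrative names, not functions in src/gen-tm.ts. The real detector also
calls itemIndicator on the head, which is omitted here.

```typescript
// Hypothetical, simplified expression type; the real RuleExpr has more variants.
type Expr =
  | { type: 'ref'; name: string }
  | { type: 'literal'; value: string }
  | { type: 'seq'; items: Expr[] }
  | { type: 'quantifier'; kind: '*' | '+' | '?'; body: Expr };

// True when `q` is a `*`/`+` quantifier over a seq that starts with the newline token.
function isNewlineItemLoop(q: Expr, newlineToken: string): boolean {
  if (q.type !== 'quantifier' || (q.kind !== '*' && q.kind !== '+')) return false;
  const body = q.body;
  if (body.type !== 'seq' || body.items.length < 2) return false;
  const first = body.items[0];
  return first.type === 'ref' && first.name === newlineToken;
}

// Fixed window: only items[0]/items[1] are inspected. Returns the head index or -1.
function findFixed(items: Expr[], newlineToken: string): number {
  return items.length >= 2 && isNewlineItemLoop(items[1], newlineToken) ? 0 : -1;
}

// Pairwise: every adjacent pair (k, k + 1) is inspected. Returns the head index or -1.
function findPairwise(items: Expr[], newlineToken: string): number {
  for (let k = 0; k + 1 < items.length; k++) {
    if (isNewlineItemLoop(items[k + 1], newlineToken)) return k;
  }
  return -1;
}

const item: Expr = { type: 'ref', name: 'Item' };
const loop: Expr = {
  type: 'quantifier',
  kind: '*',
  body: { type: 'seq', items: [{ type: 'ref', name: 'Newline' }, item] },
};
const canonical: Expr[] = [item, loop];
const withLeading: Expr[] = [{ type: 'literal', value: '#' }, item, loop];
```

With a leading element the fixed window misses the pattern, while the
pairwise scan finds the head at index 1; both agree on the canonical form.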
---
 src/gen-tm.ts | 11 +++++++----
 1 file changed, 7 insertions(+), 4 deletions(-)

diff --git a/src/gen-tm.ts b/src/gen-tm.ts
index fa917b5..8ce6727 100644
--- a/src/gen-tm.ts
+++ b/src/gen-tm.ts
@@ -3626,10 +3626,13 @@ function detectBlockSequence(grammar: CstGrammar): { indicator: string } | null
   let indicator: string | null = null;
   const visit = (e: RuleExpr): void => {
     if (e.type === 'seq') {
-      // `[item, (Newline item)*]`: first element + a `*`/`+` over a `[Newline, item]` seq
-      if (e.items.length >= 2) {
-        const head = e.items[0];
-        const q = e.items[1];
+      // `[…, item, (Newline item)*, …]`: an item ADJACENT to a `*`/`+` over a `[Newline, item]` seq.
+      // Scanned pairwise (any k, not only items[0]/items[1]) so a leading element before the
+      // sequence pattern does not hide it. NOT routed through expandAlts on purpose — that would
+      // expand the `(Newline item)*` quantifier this match depends on.
+      for (let k = 0; k + 1 < e.items.length; k++) {
+        const head = e.items[k];
+        const q = e.items[k + 1];
         if (q.type === 'quantifier' && (q.kind === '*' || q.kind === '+') && q.body.type === 'seq'
           && q.body.items.length >= 2 && q.body.items[0].type === 'ref' && q.body.items[0].name === newlineToken) {
           const ind = itemIndicator(head);

From f015876f949c1bef069221867fbd648a20a6b611 Mon Sep 17 00:00:00 2001
From: Johnson Chu <johnsoncodehk@gmail.com>
Date: Sun, 21 Jun 2026 00:06:17 +0800
Subject: [PATCH 09/14] Hoist declared-scope punctuation inside type-parameter
 regions (#51)
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

getTypeParamElementKeywords collected only KEYWORD literals, so a PUNCTUATION
literal that the grammar declares a scope for (e.g. a `&` scoped
punctuation.separator) lost that scope when it appeared in a type-parameter
element inside `<…>`. It now collects any scope-declared literal and emits it
with the right boundary: `\b` for words, none for punctuation, because `\b&\b`
matches only when `&` touches word characters on both sides, so it misses the
spaced `A & B`. `=` and `,` are skipped in the loop since the dedicated handlers
emit them, so the six shipped grammars stay byte-identical (the `=` double-emit
was observed and excluded); a scoped `&` in a type-param element is now scoped
(verified).

This closes the last clean completeness gap from the hunt. The remaining two are
not clean completeness gaps: the symbolic-operator case is an ORDERING concern (a
short overridden op can shadow a longer non-overridden one across the separate
#operator-overrides / #operators patterns — the provably-hard ordering axis, and
the op IS scoped, just shadowed), and `rawBlock` is a declared-but-unimplemented
IndentConfig field (no shipped grammar sets it; implementing the verbatim region
speculatively, with no adopter to verify against, is deferred).
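
A minimal sketch of the boundary choice, under stated assumptions:
`escapeRegexSketch` and `isWordLiteral` are hypothetical stand-ins for
gen-tm's escapeRegex and isKeywordLiteral (whose exact rules are not shown in
this patch), and JavaScript's RegExp is used in place of Oniguruma, which
should agree on `\b` for these ASCII inputs.

```typescript
// Escape regex metacharacters in a literal (hypothetical stand-in for escapeRegex).
function escapeRegexSketch(s: string): string {
  return s.replace(/[.*+?^${}()|[\]\\]/g, '\\$&');
}

// Assumption: a "word" literal starts with a letter or `_`.
function isWordLiteral(s: string): boolean {
  return /^[A-Za-z_]/.test(s);
}

// Words get `\b` on both sides; punctuation is emitted bare.
function matchSource(lit: string): string {
  const body = escapeRegexSketch(lit);
  return isWordLiteral(lit) ? `\\b${body}\\b` : body;
}

// A `\b`-wrapped `&` only matches when word characters touch it on both sides.
const wrappedAmp = new RegExp('\\b&\\b');
const bareAmp = new RegExp(matchSource('&'));
```

On the spaced input `A & B` the wrapped form finds nothing while the bare
form matches, which is why the emit site picks the boundary per literal.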
---
 src/gen-tm.ts | 17 ++++++++++++-----
 1 file changed, 12 insertions(+), 5 deletions(-)

diff --git a/src/gen-tm.ts b/src/gen-tm.ts
index 8ce6727..10337c4 100644
--- a/src/gen-tm.ts
+++ b/src/gen-tm.ts
@@ -3134,7 +3134,11 @@ function getTypeParamElementKeywords(body: RuleExpr, grammar: CstGrammar): strin
   }
   const keywords: string[] = [];
   function walk(e: RuleExpr) {
-    if (e.type === 'literal' && isKeywordLiteral(e.value)) keywords.push(e.value);
+    // collect a literal the element CONSUMES that bears a scope obligation: a keyword, OR any
+    // literal the grammar declares a scope for (e.g. a `&` separator scoped punctuation.separator)
+    // — so a declared-scope PUNCTUATION inside the element keeps its scope inside `<…>` too, not
+    // only keywords. The emit site picks the right boundary (`\b` for words, none for punctuation).
+    if (e.type === 'literal' && (isKeywordLiteral(e.value) || grammar.scopeOverrides.has(e.value))) keywords.push(e.value);
     if (e.type === 'seq' || e.type === 'alt') e.items.forEach(walk);
     if (e.type === 'quantifier' || e.type === 'group') walk(e.body);
     // A keyword reached through a `sep` sub-list of the element is just as direct as one in a
@@ -6563,12 +6567,15 @@ export function generateTmLanguage(grammar: CstGrammar): TmGrammar {
         { include: '#declaration-type-params' },
       ];
       for (const kw of allTypeParamKws) {
+        // `=` and `,` are the type-param's STRUCTURAL punctuation, emitted by the dedicated
+        // handlers below — skip them here so they are not double-emitted.
+        if (kw === '=' || kw === ',') continue;
         const scope = getScope(scopeOverrides,kw);
         if (scope) {
-          tpInner.push({
-            match: `\\b${escapeRegex(kw)}\\b`,
-            name: `${scope}.${langName}`,
-          });
+          // a word literal is `\b`-bounded; a punctuation literal (e.g. `&`) must NOT be: `\b&\b`
+          // matches only when `&` touches word characters on both sides. Pick the boundary by class.
+          const m = isKeywordLiteral(kw) ? `\\b${escapeRegex(kw)}\\b` : escapeRegex(kw);
+          tpInner.push({ match: m, name: `${scope}.${langName}` });
         }
       }
       tpInner.push({ include: '#type-inner' });

From ea488dd754ff026b0257db66b88437096c2692ea Mon Sep 17 00:00:00 2001
From: Johnson Chu <johnsoncodehk@gmail.com>
Date: Sun, 21 Jun 2026 00:30:32 +0800
Subject: [PATCH 10/14] Close a co-blind markup path and correct the proof's
 overclaims (#51)
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

An adversarial review of the "proven no gaps" conclusion found real holes;
this acts on them.

- Markup co-blindness (the one fixable defect): tokenCensus discharged EVERY
  markup token via `if (g.markup) bump('markup-region')` with zero
  verification, so a markup token whose declared scope generateMarkupTm does
  not model (e.g. a `<?…?>` processing instruction) was reported discharged
  while the engine painted it with the bare root scope. Now a markup token with an
  explicit `scope` must have that scope actually emitted, or it is an orphan
  (verified: the PI counterexample is now caught; html stays 7/7).

- COMPLETENESS.md corrections, all overclaims the review caught:
  - "per-token discharge by structural identity (match IS tokenPatternSource)"
    was false — the identifier match is widened to identPattern, and the census
    checks PRESENCE (a reachable non-root-scoped entry), not regex identity.
    Reworded to presence-not-identity.
  - "∀ G by structural induction" conflated the algebra CLOSURE (which is ∀ G)
    with the per-grammar DISCHARGE (executed on the shipped set). Separated.
  - "ordering is undecidable" was the project's own impossibility-without-proof
    trap: for a fixed G the pattern list is finite and the winner is a finite
    index read (gen-tm even sorts it deterministically). It is "not reachable by
    a corpus-free structural fold," a measurement limit, not undecidability.
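
A minimal sketch of why the ordering winner is a finite read for a fixed
pattern list, under stated assumptions: `Pattern` and `winnerAt` are
hypothetical, only flat `match` patterns are modelled, and JavaScript's RegExp
stands in for Oniguruma. The rule assumed is the usual TextMate one: the
earliest match start wins, and a tie goes to the pattern listed first.

```typescript
// Hypothetical pattern shape; real TextMate patterns carry more fields.
type Pattern = { name: string; match: RegExp };

// At `pos`, scan the finite list once: earliest start wins, ties go to list order.
function winnerAt(patterns: Pattern[], text: string, pos: number): string | null {
  let best: { name: string; start: number } | null = null;
  for (const p of patterns) {
    const re = new RegExp(p.match.source, 'g'); // fresh regex so lastIndex is ours
    re.lastIndex = pos;
    const m = re.exec(text);
    // Strict `<` keeps the earlier-listed pattern on a tie.
    if (m && (best === null || m.index < best.start)) best = { name: p.name, start: m.index };
  }
  return best ? best.name : null;
}

const shortOp: Pattern = { name: 'op.short', match: /=/ };
const longOp: Pattern = { name: 'op.long', match: /=>/ };
const shortFirst = [shortOp, longOp];
const longFirst = [longOp, shortOp];
```

Both patterns start at the same offset in `a => b`, so the list order alone
decides the winner: the short operator shadows the long one when it is listed
first. That is computable for any given list, but it depends on the emitted
sequence, not on the grammar's algebra.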

Residual, honestly stated: the shape class is MEASURED not proven — the
robustness gate covers 3 of ~20 detectors, mutation testing mutates only flat
tokens (not shape regions), and detectAngleBracketCast keeps a latent
ref-factored fragility (type-cast fires on no shipped grammar). 81/81 · 40/40.
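
A minimal sketch of the markup-token check from the first bullet above, under
stated assumptions: `MarkupToken` and `markupOrphans` are hypothetical names,
and `emitted` stands in for the serialized grammar that tokenCensus searches.
Like the patch, it is a substring test, so it checks presence only.

```typescript
// Hypothetical token shape: role-based markup tokens carry no explicit scope.
type MarkupToken = { name: string; scope?: string };

// Report every markup token whose explicitly declared scope is not emitted anywhere.
function markupOrphans(tokens: MarkupToken[], emitted: string): string[] {
  const orphans: string[] = [];
  for (const t of tokens) {
    // Tokens without an explicit scope are left to the markup generator.
    if (t.scope && !emitted.includes(t.scope)) {
      orphans.push(`${t.name}[markup-unmodeled:${t.scope}]`);
    }
  }
  return orphans;
}

const emittedJson = JSON.stringify({
  repository: { tag: { name: 'entity.name.tag.html' } },
});
const markupTokens: MarkupToken[] = [
  { name: 'Text' },                                // role-based, no scope
  { name: 'Tag', scope: 'entity.name.tag' },       // scope is emitted
  { name: 'PI', scope: 'meta.tag.preprocessor' },  // scope is NOT emitted
];
const foundOrphans = markupOrphans(markupTokens, emittedJson);
```

Only the processing-instruction token is reported; before the fix all three
would have been counted as discharged.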
---
 COMPLETENESS.md         | 49 ++++++++++++++++++++++++++++-------------
 test/tm-completeness.ts |  9 +++++++-
 2 files changed, 42 insertions(+), 16 deletions(-)

diff --git a/COMPLETENESS.md b/COMPLETENESS.md
index 2f36a54..32777c4 100644
--- a/COMPLETENESS.md
+++ b/COMPLETENESS.md
@@ -48,8 +48,11 @@ constructor-occurrence or a config-field-occurrence. So completeness reduces to:
 obligation generator, the generator has a discharging, reachable emission** — three
 mechanically-checkable layers. Both sides are finite — a finite `G`, a finite `gen-tm(G)`, and an
 obligation taxonomy bounded by TextMate's finite construct kinds — so completeness is a **decidable**
-property per grammar, and holds **∀ G by structural induction** over the finite combinator algebra
-(finitely many cases). It is checked a-priori on the emitted artifact, with no corpus.
+property **per grammar**, checked a-priori on the emitted artifact with no corpus. The **algebra
+closure** (Layer A) is what holds ∀ G by structural induction (the lowering/compilation is total over
+the finite combinator algebra); the per-grammar **discharge** is then executed on the shipped set — a
+decidable check run on concrete grammars, with the closure as its inductive backbone, not a mechanised
+∀-G proof of discharge.
 
 ## Layer A — closure: the universe is the algebra, and lowering is total
 
@@ -142,19 +145,35 @@ The honest, measured result:
   artifact's *sequence*, not the grammar's algebra — so no corpus-free structural check reaches it,
   and a scope-preserving reorder slips even the bucket-level differential.
 
-The line is precise. **Completeness — every required construct PRESENT + REACHABLE + visually scoped
-— is DECIDABLE**, and decided a-priori with no corpus: a finite grammar `G`, a finite emitted artifact
-`gen-tm(G)`, a finite obligation taxonomy (bounded by TextMate's finite construct kinds), and per-token
-discharge by *structural identity* (the flat `match` **is** `tokenPatternSource(t)`, so no semantic
-regex-matching is needed). ∀ `G` follows by structural induction over the finite combinator algebra.
-What is **undecidable is soundness** — do the present constructs paint *correctly on all inputs*: a
-wrong-role paint, or which of two overlapping patterns *wins* (ordering), is an agreement between a
-CFG-derived role and a regex-stack-machine tokenizer over an infinite input space, which slides into
-regex-vs-CFG undecidability (Oniguruma's `\g<>`/backreferences are non-regular). So this document
-proves completeness and *measures* its detector (mutation testing); soundness it does not claim to
-decide — that is `test/gap-ledger.ts`'s by-construction + corpus axis. The earlier framing that
-"a-priori completeness over the whole gap space is unavailable" was an over-concession: completeness
-is available; it was soundness's wall, mistaken for completeness's.
+The line is precise — and narrower than an earlier draft of it claimed (an adversarial review of
+that draft is owed these corrections). **Completeness — every required construct PRESENT + REACHABLE
++ visually scoped — is checked structurally, no corpus**, and for a fixed `G` it is DECIDABLE (finite
+`G`, finite `gen-tm(G)`, an obligation taxonomy bounded by TextMate's finite construct kinds). Three
+honest bounds on that:
+
+- **Presence, not identity.** The discharge is *presence* — a reachable repository entry whose scope
+  is non-root — not a deeper *identity* of the matcher. The flat `match` is *derived from*
+  `tokenPatternSource(t)` (widened to `identPattern` for the identifier token, so not literally equal);
+  `tokenCensus` checks the entry exists and carries a visual scope, it does **not** re-verify the regex
+  recognises the right bytes — that is soundness. (For markup grammars the census additionally checks
+  that a token's *explicitly declared* scope is actually emitted, closing a co-blind path a review
+  found — an unmodelled `<?…?>` construct that fell through to the bare root.)
+- **Verified on the shipped set, not run ∀ G.** The Layer-A closure (A1/A2) is the inductive backbone
+  — the algebra IS closed and lowering/compilation IS total ∀ G — but the per-grammar *discharge* is
+  EXECUTED on the six shipped grammars plus synthetic witnesses, not run as a quantified proof over all `G`.
+- **Ordering is decidable, not undecidable.** Which of two overlapping patterns *wins* is, for a fixed
+  `G`, a finite index read (the emitted `patterns` list is finite, the winner is leftmost-by-order, and
+  gen-tm computes that order by a deterministic sort). It is simply **not reachable by a corpus-free
+  structural fold over the grammar** — a measurement limit, the wording §"Measuring the detector" uses,
+  NOT undecidability. (An earlier draft called ordering "undecidable"; that was the same impossibility-
+  without-proof over-claim the project guards against.)
+
+What is genuinely beyond an a-priori check is **soundness** — whether the present constructs paint
+*correctly on all inputs* (a wrong-role paint): a CFG-derived role vs a regex-stack-machine tokenizer
+over an infinite input space. That is handled by-construction + `test/gap-ledger.ts`'s corpus, not
+claimed decided here. So the honest scope: a structural **presence** proof, **decidable per grammar**,
+**executed on the shipped set**, with the algebra-closure as its inductive backbone — and the detector's
+power **measured** (mutation testing), not proven, for the shape class.
 
 ## Reachability — root ∪ export surfaces
 
diff --git a/test/tm-completeness.ts b/test/tm-completeness.ts
index 4b28094..8178184 100644
--- a/test/tm-completeness.ts
+++ b/test/tm-completeness.ts
@@ -263,7 +263,14 @@ export function tokenCensus(g: CstGrammar, tmJson: TmGrammarJson): TokenCensus {
     if (flat) { if (flatNeutered(flat)) neutered.push(`${t.name}→${(flat as any).name ?? '∅'}`); else bump('flat'); continue; }
     if (t.flags.includes('regex')) { bump('regex-family'); continue; }
     if (tokenPatternIsNever(t)) { bump('engine-emitted'); continue; }
-    if (g.markup) { bump('markup-region'); continue; }                 // generateMarkupTm owns it
+    if (g.markup) {
+      // generateMarkupTm owns the ROLE-based markup tokens (text / tag / attr — no explicit `scope`).
+      // But a markup token with an EXPLICITLY declared scope (a construct generateMarkupTm may not
+      // model, e.g. a `<?…?>` processing instruction) must actually have that scope emitted — else it
+      // falls through to the bare document root, the same neuter gap the token-stream path catches.
+      if (t.scope && !full.includes(t.scope)) { orphans.push(`${t.name}[markup-unmodeled:${t.scope}]`); continue; }
+      bump('markup-region'); continue;
+    }
     const delim = tokenPatternLiteralText(t);                          // a region owns this token's delimiter?
     if (delim && full.includes(JSON.stringify(delim).slice(1, -1))) { bump('region-owned'); continue; }
     orphans.push(`${t.name}[${t.flags.join(',') || '-'}]`);

From 9552ad62b9c373a8c5ef37f5bd8a41a6fda35ba5 Mon Sep 17 00:00:00 2001
From: Johnson Chu <johnsoncodehk@gmail.com>
Date: Sun, 21 Jun 2026 07:31:14 +0800
Subject: [PATCH 11/14] Extend the shape-robustness gate to every detector; fix
 the two it exposed (#51)
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

The shape-robustness gate covered 3 constructs (ternary / call / generic-type-
params). Extended it to 19 — one per gen-tm shape-detector — each asserting the
construct discharges its obligation for several EQUIVALENT factorings (canonical
/ opt-tail / alt-split / via-ref / trailing-comma). The builders were authored by
running the real emitter over each factoring; an `xfail` list records a factoring
a detector is known to drop, so a NEW drop and a stale xfail (a landed fix) both go
red, never a silent false-green.
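
A minimal sketch of the xfail rule described above, under stated assumptions:
`gateFailures` is a hypothetical helper, and `results` maps a factoring label
to whether the emitter discharged the obligation. The real gate in
test/tm-completeness.ts may be organised differently.

```typescript
// Returns the failures for one construct: an unexpected drop, or a stale xfail.
function gateFailures(results: Record<string, boolean>, xfail: string[] = []): string[] {
  const failures: string[] = [];
  for (const [label, emitted] of Object.entries(results)) {
    const expectedDrop = xfail.includes(label);
    if (!emitted && !expectedDrop) failures.push(`new drop: ${label}`);
    if (emitted && expectedDrop) failures.push(`stale xfail: ${label}`);
  }
  return failures;
}

const knownDrop = gateFailures({ canonical: true, 'via-ref': false }, ['via-ref']);
const newDrop = gateFailures({ canonical: true, 'via-ref': false });
const staleXfail = gateFailures({ canonical: true, 'via-ref': true }, ['via-ref']);
```

A recorded drop passes, an unrecorded drop fails, and a recorded drop that now
emits also fails, so the list cannot outlive the fix it was waiting for.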

Building it out caught two residual fixed-shape fragilities (both latent — neither
construct fires on a shipped grammar — so both fixes are byte-identical on all six):

- detectAngleBracketCast dropped a cast whose `<Type>` head is its own rule (the
  `<`/`>` hidden behind a ref). Now resolves a ref to a `'<' type '>'` cast-head
  rule by name, mirroring detectCallExpression reaching its args through a rule.
- detectTypeParamConstraintKeywords extracted the constraint keyword only from a
  `?`-quantifier `[kw, ref]`, dropping `alt([…,kw,type],[…])` (optionality via an
  alt branch) and `opt(kw, sep(type))` (type behind a sep). Now reads the
  constraint as the optional `[kw, type]` segment by which one expandAlts branch
  extends a prefix-shorter sibling — uniform across opt / alt / sep, and a leading
  modifier (not a prefix extension) is still excluded. (A first attempt keyed on
  the follower being a type-flagged rule; the gate caught that it broke the
  canonical factoring when the constraint type is not type-flagged.)
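
A minimal sketch of the prefix-extension reading used for the constraint
keyword, under stated assumptions: `Item` is a hypothetical flat stand-in for
RuleExpr (literals and refs only), and `branches` plays the role of
expandAlts(rule.body). The real `sig` also covers sep, seq, alt, quantifier,
group and not.

```typescript
// Hypothetical flat item type; the real RuleExpr has more variants.
type Item = { type: 'literal'; value: string } | { type: 'ref'; name: string };

const sig = (e: Item): string => (e.type === 'literal' ? 'L:' + e.value : 'R:' + e.name);
const isWord = (s: string): boolean => /^[A-Za-z_]/.test(s);

// The constraint keyword is b[a.length] whenever branch a is a strict prefix of branch b.
function constraintKeywords(branches: Item[][]): string[] {
  const keywords = new Set<string>();
  for (const a of branches) for (const b of branches) {
    if (b.length <= a.length) continue;
    if (!a.every((it, i) => sig(it) === sig(b[i]))) continue; // a must be a prefix of b
    const head = b[a.length];
    if (head.type === 'literal' && isWord(head.value)) keywords.add(head.value);
  }
  return Array.from(keywords);
}

const name: Item = { type: 'ref', name: 'Name' };
const ty: Item = { type: 'ref', name: 'Type' };
const lit = (value: string): Item => ({ type: 'literal', value });

const optConstraint = constraintKeywords([[name], [name, lit('extends'), ty]]);
const leadingModifier = constraintKeywords([[name], [lit('const'), name]]);
const defaultOnly = constraintKeywords([[name], [name, lit('='), ty]]);
```

The `extends` segment extends a shorter sibling and is collected; the leading
`const` does not form a prefix pair, and `=` fails the word test.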

19 constructs all green; the YAML/markup detectors are positive-controlled (their
fixed shapes are deliberate, so they guard the canonical form, not factorings).
40/40 · 96/96 completeness checks.
---
 src/gen-tm.ts           |  87 ++++++++++------
 test/tm-completeness.ts | 217 +++++++++++++++++++++++++++++++++++++---
 2 files changed, 261 insertions(+), 43 deletions(-)

diff --git a/src/gen-tm.ts b/src/gen-tm.ts
index 10337c4..bbef912 100644
--- a/src/gen-tm.ts
+++ b/src/gen-tm.ts
@@ -1091,33 +1091,33 @@ function detectTypeParamConstraintKeywords(grammar: CstGrammar, typeArgRule: str
   };
   for (const rule of grammar.rules) for (const seq of expandAlts(rule.body)) scanForTypeParamSep(seq);
 
-  // In each such rule, find OPTIONAL `[<word-literal>, <ref>]` pairs — the constraint.
-  // The literal must be a WORD (starts with a letter/`_`) so it is `\b`-bounded; a
-  // punctuation lead like `=` (the default) is excluded on purpose (see doc above).
+  // In each such rule, find the constraint keyword: the WORD literal that BEGINS an OPTIONAL
+  // `[keyword, type…]` segment. "Optional" is read structurally from expandAlts(body): the constraint
+  // is exactly the segment by which one branch EXTENDS a prefix-shorter sibling — so `opt(kw, type)`
+  // (a `?` body), a separate alt branch `alt([…, kw, type], […])`, and a `sep`-wrapped type ALL reduce
+  // to "branch B = branch A ++ [kw, …]". The keyword is `B[len(A)]` when it is a word literal. This
+  // reads the OPTIONALITY (the distinguishing fact), so a LEADING modifier (`const`/`in`/`out`, whose
+  // own optionality makes `[name]` vs `[const,name]` — NOT a prefix pair) is not mistaken for it, and a
+  // punctuation lead like `=` (the default) is excluded by the word test.
   const keywords = new Set<string>();
   const isWord = (s: string) => /^[A-Za-z_]/.test(s);
-  const scanConstraint = (expr: RuleExpr): void => {
-    if (expr.type === 'quantifier') {
-      if (expr.kind === '?' && expr.body.type === 'seq') {
-        const its = expr.body.items;
-        if (its.length >= 2 && its[0].type === 'literal' && its[1].type === 'ref') {
-          const lit = (its[0] as { value: string }).value;
-          if (isWord(lit)) keywords.add(lit);
-        }
-      }
-      scanConstraint(expr.body);
-    } else if (expr.type === 'seq' || expr.type === 'alt') {
-      for (const it of (expr as { items: RuleExpr[] }).items) scanConstraint(it);
-    } else if (expr.type === 'group') {
-      scanConstraint((expr as { body: RuleExpr }).body);
-    } else if (expr.type === 'sep') {
-      // a constraint keyword reached through a `&`/`,`-separated sub-list is just as direct —
-      // recurse into the element (mirrors getTypeParamElementKeywords' `sep` arm).
-      scanConstraint((expr as { element: RuleExpr }).element);
-    }
-  };
+  const sig = (e: RuleExpr): string =>
+    e.type === 'literal' ? 'L:' + (e as { value: string }).value
+    : e.type === 'ref' ? 'R:' + (e as { name: string }).name
+    : e.type === 'sep' ? 'S:' + sig((e as { element: RuleExpr }).element)
+    : e.type === 'seq' || e.type === 'alt' ? e.type + '[' + (e as { items: RuleExpr[] }).items.map(sig).join(',') + ']'
+    : e.type === 'quantifier' ? 'Q' + (e as { kind: string }).kind + sig((e as { body: RuleExpr }).body)
+    : (e.type === 'group' || e.type === 'not') ? e.type[0] + sig((e as { body: RuleExpr }).body)
+    : e.type;
   for (const rule of grammar.rules) {
-    if (sepElementRules.has(rule.name)) scanConstraint(rule.body);
+    if (!sepElementRules.has(rule.name)) continue;
+    const branches = expandAlts(rule.body);
+    for (const a of branches) for (const b of branches) {
+      if (b.length <= a.length) continue;
+      if (!a.every((it, i) => sig(it) === sig(b[i]))) continue;   // a is a strict prefix of b
+      const head = b[a.length];
+      if (head.type === 'literal' && isWord((head as { value: string }).value)) keywords.add((head as { value: string }).value);
+    }
   }
   return [...keywords];
 }
@@ -1748,15 +1748,40 @@ function detectAngleBracketCast(grammar: CstGrammar): string | null {
   );
   if (typeRuleNameSet.size === 0) return null;
 
+  const ruleByName = new Map(grammar.rules.map(r => [r.name, r] as const));
+  // a "cast head" is a rule whose EVERY expanded branch is exactly `'<' <typeRef> '>'` — the author
+  // factored the angle-cast prefix into its own rule. Recover the type name (null if not that shape).
+  const castHeadType = (body: RuleExpr): string | null => {
+    const branches = expandAlts(body);
+    if (!branches.length) return null;
+    let ty: string | null = null;
+    for (const items of branches) {
+      if (items.length === 3 && items[0].type === 'literal' && (items[0] as { value: string }).value === '<' &&
+          items[1].type === 'ref' && typeRuleNameSet.has((items[1] as { name: string }).name) &&
+          items[2].type === 'literal' && (items[2] as { value: string }).value === '>') {
+        ty = (items[1] as { name: string }).name;
+      } else return null;
+    }
+    return ty;
+  };
   let found: string | null = null;
   const walkSeq = (items: RuleExpr[]): void => {
-    for (let i = 0; i + 3 < items.length; i++) {
-      const a = items[i], b = items[i + 1], c = items[i + 2], d = items[i + 3];
-      if (a.type === 'literal' && a.value === '<' &&
-          b.type === 'ref' && typeRuleNameSet.has(b.name) &&
-          c.type === 'literal' && c.value === '>' &&
-          d /* an operand follows the cast */) {
-        found = b.name;
+    for (let i = 0; i + 1 < items.length; i++) {
+      const a = items[i];
+      // inline `'<' <type> '>' operand`
+      if (i + 3 < items.length) {
+        const b = items[i + 1], c = items[i + 2], d = items[i + 3];
+        if (a.type === 'literal' && a.value === '<' &&
+            b.type === 'ref' && typeRuleNameSet.has(b.name) &&
+            c.type === 'literal' && c.value === '>' && d /* an operand follows the cast */) {
+          found = b.name;
+        }
+      }
+      // via a cast-head RULE: `[<castHeadRef>, operand]` — the `<…>` is hidden behind the ref boundary,
+      // so resolve it by name (mirrors detectCallExpression reaching its args through a separate rule).
+      if (a.type === 'ref' && ruleByName.has(a.name)) {
+        const ty = castHeadType(ruleByName.get(a.name)!.body);
+        if (ty) found = ty;
       }
     }
   };
diff --git a/test/tm-completeness.ts b/test/tm-completeness.ts
index 8178184..57ef3cf 100644
--- a/test/tm-completeness.ts
+++ b/test/tm-completeness.ts
@@ -41,7 +41,7 @@ import {
   tsRelax, capExpr, awaitCtx, yieldCtx, asyncGenCtx, resetCtx, op, prefix, postfix,
   sameLine, noCommentBefore, noMultilineFlowBefore, notLeftLeaf,
   oneOf, noneOf, seq, altPattern, optPattern, star, plus, repeat,
-  followedBy, notFollowedBy, precededBy, notPrecededBy, start, end, never, anyChar, range, none,
+  followedBy, notFollowedBy, precededBy, notPrecededBy, start, end, never, anyChar, range, none, left,
 } from '../src/api.ts';
 import { tokenPatternToRegex, tokenPatternIsNever, tokenPatternLiteralText } from '../src/token-pattern.ts';
 import { collectLiterals, isKeywordLiteral } from '../src/grammar-utils.ts';
@@ -495,38 +495,231 @@ async function regionKeywordProbe(): Promise<void> {
 //  factorings must ALL emit the same region key. It BITES if a detector regresses to a fixed
 //  shape (a factoring loses the region).
 // ════════════════════════════════════════════════════════════════════════════
+
+// Shared .tsx scaffolding for the `type-param-constraint` construct: a JSX grammar (so the
+// generic-arrow ⇄ JSX `extends` disambiguation guard is emitted at all) whose ONLY varying part
+// is the type-param rule body, produced by `mkTParam(CType)` (CType is the registered constraint
+// type rule). Mirrors the angle-bracket disambiguation fixtures in test/agnostic.ts.
+function tpcGrammar(
+  name: string, SelfEnd: any, CloseTg: any, Id: any,
+  mkTParam: (CType: any) => any,
+): Record<string, any> {
+  const Type: any = rule(() => [[Id]]);
+  const CType: any = rule(() => [[Id]]);                         // the constraint's TYPE rule (REGISTERED)
+  const TParam = rule(() => [mkTParam(CType)]);
+  const TP = rule(() => [['<', sep(TParam, ','), '>']]);
+  const Param = rule(() => [[Id, opt(':', Type)]]);
+  const Decl = rule(() => [['fn', Id, opt(TP), '(', sep(Param, ','), ')', '{', '}']]);   // emits #arrow-type-parameters
+  const Arrow = rule(() => [[opt(TP), '(', sep(Param, ','), ')', '=>', Id]]);
+  const Call = rule(() => [[Id, '<', sep(Type, ','), '>', '(', sep(Id, ','), ')']]);
+  const Attr = rule(() => [[Id, opt('=', Id)]]);
+  const Elem = rule(() => [['<', Id, many(Attr), alt(SelfEnd, ['>', CloseTg, Id, '>'])]]);
+  const E = rule(() => [Id, Call, Arrow, Elem]);
+  const S = rule(() => [Decl, E]);
+  const Prog = rule(() => [[many(S)]]);
+  return {
+    name, scopeName: `source.${name}`,
+    tokens: { SelfEnd, CloseTg, Id }, prec: [none('<', '>')],
+    scopes: { 'storage.type.function': ['fn'], 'keyword.operator.expression.extends': ['extends'] },
+    rules: { Type, CType, TParam, TP, Param, Decl, Arrow, Call, Attr, Elem, E, S, Prog }, entry: Prog,
+  };
+}
+
 function checkShapeRobustness(): void {
   const Id = token(plus(range('a', 'z')), { identifier: true });
-  const emits = (key: string, build: () => Record<string, any>): boolean => {
-    try { return !!(generateTmLanguage(defineGrammar(build() as any) as any) as any).repository[key]; }
+  // JSX delimiter tokens — needed by the constructs whose discharge only fires in a .tsx
+  // grammar (the generic-arrow ⇄ JSX disambiguation, e.g. type-param-constraint below).
+  const SelfEnd = token(seq('/', '>'));   // />
+  const CloseTg = token(seq('<', '/'));   // </
+  // A construct's obligation is "discharged" iff `observable(tm)` is true. Most are a
+  // repository-key presence; some (the disambiguation guards) are a substring of an
+  // emitted regex — `keyObs(k)` is the common case, an explicit `observable` the rest.
+  const keyObs = (key: string) => (tm: any) => !!tm.repository[key];
+  const emits = (observable: (tm: any) => boolean, build: () => Record<string, any>): boolean => {
+    try { return !!observable(generateTmLanguage(defineGrammar(build() as any) as any) as any); }
     catch { return false; }
   };
-  // each construct, in several EQUIVALENT factorings; the region key must be present in all.
-  const constructs: { name: string; key: string; factorings: { label: string; build: () => Record<string, any> }[] }[] = [
+  // each construct, in several EQUIVALENT factorings; the obligation must discharge in all.
+  // `xfail` records factorings a detector is KNOWN to drop today (issue #51 residual fragility):
+  // the assertion tolerates exactly those, so the gate goes RED on a NEW drop or once a fix lands
+  // and the xfail goes stale — never a silent false-green, never a permanent red.
+  const constructs: { name: string; key?: string; observable: (tm: any) => boolean; xfail?: string[]; factorings: { label: string; build: () => Record<string, any> }[] }[] = [
     {
-      name: 'ternary', key: 'ternary-expression', factorings: [
+      name: 'ternary', key: 'ternary-expression', observable: keyObs('ternary-expression'), factorings: [
         { label: 'flat', build: () => { const E = rule((s: any) => [[Id, '?', s, ':', s], [Id]]); const P = rule(() => [[many(E)]]); return { name: 't1', scopeName: 'source.t1', tokens: { Id }, rules: { E, P }, entry: P }; } },
         { label: 'opt-tail', build: () => { const E = rule((s: any) => [[Id, opt('?', s, ':', s)]]); const P = rule(() => [[many(E)]]); return { name: 't2', scopeName: 'source.t2', tokens: { Id }, rules: { E, P }, entry: P }; } },
       ],
     },
     {
-      name: 'call', key: 'function-call', factorings: [
+      name: 'call', key: 'function-call', observable: keyObs('function-call'), factorings: [
         { label: 'inline', build: () => { const A = rule(() => [[Id]]); const E = rule((s: any) => [[A, '(', sep(s, ','), ')'], [A]]); const P = rule(() => [[many(E)]]); return { name: 'c1', scopeName: 'source.c1', tokens: { Id }, rules: { A, E, P }, entry: P }; } },
         { label: 'args-rule', build: () => { const A = rule(() => [[Id]]); const CA = rule((s: any) => [['(', sep(s, ','), ')']]); const C = rule((s: any) => [[A, CA], [A]]); const E = rule((s: any) => [[C]]); const P = rule(() => [[many(E)]]); return { name: 'c2', scopeName: 'source.c2', tokens: { Id }, rules: { A, CA, C, E, P }, entry: P }; } },
       ],
     },
     {
-      name: 'generic-type-params', key: 'declaration-type-params', factorings: [
+      name: 'generic-type-params', key: 'declaration-type-params', observable: keyObs('declaration-type-params'), factorings: [
         { label: '3-item', build: () => { const T = rule(() => [[Id]]); const Pm = rule(() => [[Id, opt('extends', T)]]); const TP = rule(() => [['<', sep(Pm, ','), '>']]); const D = rule(() => [['fn', Id, opt(TP), '{', '}']]); const P = rule(() => [[many(D)]]); return { name: 'g1', scopeName: 'source.g1', tokens: { Id }, prec: [none('<', '>')], scopes: { 'storage.type.function': ['fn'], 'keyword.operator.expression.extends': ['extends'] }, rules: { T, Pm, TP, D, P }, entry: P }; } },
         { label: 'trailing-comma', build: () => { const T = rule(() => [[Id]]); const Pm = rule(() => [[Id, opt('extends', T)]]); const TP = rule(() => [['<', sep(Pm, ','), opt(','), '>']]); const D = rule(() => [['fn', Id, opt(TP), '{', '}']]); const P = rule(() => [[many(D)]]); return { name: 'g2', scopeName: 'source.g2', tokens: { Id }, prec: [none('<', '>')], scopes: { 'storage.type.function': ['fn'], 'keyword.operator.expression.extends': ['extends'] }, rules: { T, Pm, TP, D, P }, entry: P }; } },
       ],
     },
+    // ── conditional-type (detectConditionalType, key #type-conditional) ──
+    // `{type:true}` rule with `ref KW ref ? ref : ref`. detectConditionalType runs its
+    // 7-window over expandAlts(body), so opt-tail / alt-split normalise to the same flat
+    // adjacency. ROBUST (all factorings emit).
+    {
+      name: 'conditional-type', key: 'type-conditional', observable: keyObs('type-conditional'), factorings: [
+        { label: 'canonical', build: () => { const T: any = rule(() => [[Id, 'extends', Id, '?', Id, ':', Id], [Id]], { type: true }); const Ann = rule(() => [[Id, ':', T]]); const P = rule(() => [[many(Ann)]]); return { name: 'cd1', scopeName: 'source.cd1', tokens: { Id }, scopes: { 'keyword.operator.expression.extends': ['extends'] }, rules: { T, Ann, P }, entry: P }; } },
+        { label: 'opt-tail', build: () => { const T: any = rule(() => [[Id, opt('extends', Id, '?', Id, ':', Id)]], { type: true }); const Ann = rule(() => [[Id, ':', T]]); const P = rule(() => [[many(Ann)]]); return { name: 'cd2', scopeName: 'source.cd2', tokens: { Id }, scopes: { 'keyword.operator.expression.extends': ['extends'] }, rules: { T, Ann, P }, entry: P }; } },
+        { label: 'alt-split', build: () => { const T: any = rule(() => [alt([Id, 'extends', Id, '?', Id, ':', Id], [Id])], { type: true }); const Ann = rule(() => [[Id, ':', T]]); const P = rule(() => [[many(Ann)]]); return { name: 'cd3', scopeName: 'source.cd3', tokens: { Id }, scopes: { 'keyword.operator.expression.extends': ['extends'] }, rules: { T, Ann, P }, entry: P }; } },
+      ],
+    },
+    // ── generic-call (detectAngleBracketAmbiguity, key #generic-call) ──
+    // `<` sep(ref) `>` CONFIRM (the confirm token is the item after `>`). The detector walks
+    // expandAlts(body), so the `(args)` confirm written as an opt-tail or reached through an
+    // alt() still surfaces the `< sep > (` adjacency. Needs `<`/`>` in the prec table. ROBUST.
+    {
+      name: 'generic-call', key: 'generic-call', observable: keyObs('generic-call'), factorings: [
+        { label: 'canonical', build: () => { const T = rule(() => [[Id]]); const Call = rule(() => [[Id, '<', sep(T, ','), '>', '(', sep(Id, ','), ')']]); const E = rule(() => [Id, Call]); const P = rule(() => [[many(E)]]); return { name: 'gc1', scopeName: 'source.gc1', tokens: { Id }, prec: [none('<', '>')], rules: { T, Call, E, P }, entry: P }; } },
+        { label: 'opt-tail', build: () => { const T = rule(() => [[Id]]); const Call = rule(() => [[Id, '<', sep(T, ','), '>', opt('(', sep(Id, ','), ')')]]); const E = rule(() => [Id, Call]); const P = rule(() => [[many(E)]]); return { name: 'gc2', scopeName: 'source.gc2', tokens: { Id }, prec: [none('<', '>')], rules: { T, Call, E, P }, entry: P }; } },
+        { label: 'alt-confirm', build: () => { const T = rule(() => [[Id]]); const Call = rule(() => [[Id, '<', sep(T, ','), '>', alt(['(', sep(Id, ','), ')'], [Id])]]); const E = rule(() => [Id, Call]); const P = rule(() => [[many(E)]]); return { name: 'gc3', scopeName: 'source.gc3', tokens: { Id }, prec: [none('<', '>')], rules: { T, Call, E, P }, entry: P }; } },
+      ],
+    },
+    // ── angle-cast (detectAngleBracketCast, key #type-cast) ──
+    // 4-window `<` ref(@type) `>` operand. The cast head written as its OWN rule
+    // (`CastHead = '<' Type '>'`, used as `[CastHead, operand]`) hides the `<`/`>` across the ref
+    // boundary; detectAngleBracketCast now resolves a ref to such a cast-head rule by name (like
+    // detectCallExpression reaches its args through a separate rule), so `via-ref` is robust too.
+    {
+      name: 'angle-cast', key: 'type-cast', observable: keyObs('type-cast'), factorings: [
+        { label: 'canonical', build: () => { const T = rule(() => [[Id]], { type: true }); const Call = rule(() => [[Id, '<', sep(T, ','), '>', '(', sep(Id, ','), ')']]); const Cast = rule(() => [['<', T, '>', Id]]); const E = rule(() => [Id, Cast, Call]); const P = rule(() => [[many(E)]]); return { name: 'ac1', scopeName: 'source.ac1', tokens: { Id }, prec: [none('<', '>')], rules: { T, Call, Cast, E, P }, entry: P }; } },
+        { label: 'opt-operand', build: () => { const T = rule(() => [[Id]], { type: true }); const Call = rule(() => [[Id, '<', sep(T, ','), '>', '(', sep(Id, ','), ')']]); const Cast = rule(() => [['<', T, '>', opt(Id)]]); const E = rule(() => [Id, Cast, Call]); const P = rule(() => [[many(E)]]); return { name: 'ac3', scopeName: 'source.ac3', tokens: { Id }, prec: [none('<', '>')], rules: { T, Call, Cast, E, P }, entry: P }; } },
+        { label: 'via-ref', build: () => { const T = rule(() => [[Id]], { type: true }); const Call = rule(() => [[Id, '<', sep(T, ','), '>', '(', sep(Id, ','), ')']]); const CastHead = rule(() => [['<', T, '>']]); const Cast = rule(() => [[CastHead, Id]]); const E = rule(() => [Id, Cast, Call]); const P = rule(() => [[many(E)]]); return { name: 'ac2', scopeName: 'source.ac2', tokens: { Id }, prec: [none('<', '>')], rules: { T, Call, CastHead, Cast, E, P }, entry: P }; } },
+      ],
+    },
+    // ── type-param-constraint (detectTypeParamConstraintKeywords) ──
+    // observable: the constraint keyword (`extends`) appears in the #arrow-type-parameters begin
+    // guard (the .tsx generic-arrow ⇄ JSX disambiguation `topTypeParam`) — so this needs a JSX
+    // grammar (`/>`,`</` tokens) and a generic-arrow shape. detectTypeParamConstraintKeywords now
+    // reads the constraint as the OPTIONAL `[kw, type]` segment by which one expandAlts branch
+    // extends a prefix-shorter sibling, so `alt([name, kw, type],[name])` (optionality via an alt
+    // branch, not `?`) and `opt(kw, sep(type,'&'))` (the type behind a `sep`) are robust too — while
+    // a leading modifier (whose own optionality is NOT a prefix extension) is still excluded.
+    {
+      name: 'type-param-constraint', key: 'arrow-type-parameters[extends]',
+      observable: (tm: any) => ((tm.repository['arrow-type-parameters']?.begin as string) ?? '').includes('\\bextends\\b'),
+      factorings: [
+        { label: 'canonical', build: () => tpcGrammar('tpc1', SelfEnd, CloseTg, Id, (CType) => [Id, opt('extends', CType)]) },
+        { label: 'alt-split', build: () => tpcGrammar('tpc2', SelfEnd, CloseTg, Id, (CType) => alt([Id, 'extends', CType], [Id])) },
+        { label: 'sep-constraint', build: () => tpcGrammar('tpc3', SelfEnd, CloseTg, Id, (CType) => [Id, opt('extends', sep(CType, '&'))]) },
+      ],
+    },
+    {
+      name: "bare-arrow", observable: (tm => !!tm.repository['arrow-parameter'] && JSON.stringify(tm.repository['arrow-parameter']).includes('variable.parameter')),
+      factorings: [
+        { label: "canonical", build: () => { const E = rule((s) => [[Id, '=>', s], [Id]]); const P = rule(() => [[many(E)]]); return { name: 'ba1', scopeName: 'source.ba1', tokens: { Id }, rules: { E, P }, entry: P }; } },
+        { label: "opt-tail", build: () => { const E = rule((s) => [[Id, opt('=>', s)]]); const P = rule(() => [[many(E)]]); return { name: 'ba2', scopeName: 'source.ba2', tokens: { Id }, rules: { E, P }, entry: P }; } },
+        { label: "via-ref", build: () => { const Ar = rule((s) => [[Id, '=>', s]]); const E = rule((s) => [Ar, [Id]]); const P = rule(() => [[many(E)]]); return { name: 'ba3', scopeName: 'source.ba3', tokens: { Id }, rules: { Ar, E, P }, entry: P }; } },
+      ],
+    },
+    {
+      name: "property-access", observable: (tm => !!tm.repository['property-access'] && JSON.stringify(tm.repository['property-access']).includes('entity.other.property')),
+      factorings: [
+        { label: "canonical", build: () => { const E = rule((s) => [[Id, many('.', Id)], [Id]]); const P = rule(() => [[many(E)]]); return { name: 'pa1', scopeName: 'source.pa1', tokens: { Id }, rules: { E, P }, entry: P }; } },
+        { label: "opt-tail", build: () => { const E = rule((s) => [[Id, opt('.', Id)]]); const P = rule(() => [[many(E)]]); return { name: 'pa2', scopeName: 'source.pa2', tokens: { Id }, rules: { E, P }, entry: P }; } },
+        { label: "via-ref", build: () => { const Acc = rule(() => [['.', Id]]); const E = rule(() => [[Id, many(Acc)], [Id]]); const P = rule(() => [[many(E)]]); return { name: 'pa3', scopeName: 'source.pa3', tokens: { Id }, rules: { Acc, E, P }, entry: P }; } },
+      ],
+    },
+    {
+      name: "paren-arrow", observable: (tm => { const r = tm.repository['arrow-function-params']; return !!r && JSON.stringify(r).includes('variable.parameter'); }),
+      factorings: [
+        { label: "canonical", build: () => { const Pm = rule(() => [[Id]]); const E = rule((s) => [['(', sep(Pm, ','), ')', '=>', s], [Id]]); const P = rule(() => [[many(E)]]); return { name: 'pra1', scopeName: 'source.pra1', tokens: { Id }, rules: { Pm, E, P }, entry: P }; } },
+        { label: "opt-tail", build: () => { const Pm = rule(() => [[Id]]); const Ty = rule(() => [[Id]]); const E = rule((s) => [['(', sep(Pm, ','), ')', opt(':', Ty), '=>', s], [Id]]); const P = rule(() => [[many(E)]]); return { name: 'pra2', scopeName: 'source.pra2', tokens: { Id }, rules: { Pm, Ty, E, P }, entry: P }; } },
+        { label: "alt-split", build: () => { const Pm = rule(() => [[Id]]); const E = rule((s) => [alt(['(', sep(Pm, ','), ')', '=>', s], [Id])]); const P = rule(() => [[many(E)]]); return { name: 'pra3', scopeName: 'source.pra3', tokens: { Id }, rules: { Pm, E, P }, entry: P }; } },
+      ],
+    },
+    {
+      name: "direct-param-keyword", observable: (tm => !!tm.repository['ctor-declaration'] && !!tm.repository['declaration-params']),
+      factorings: [
+        { label: "canonical", build: () => { const Pm = rule(() => [[Id]]); const Blk = rule(() => [['{', '}']]); const D = rule(() => [['fn', Id, '(', sep(Pm, ','), ')', Blk]]); const Ctor = rule(() => [['ctor', '(', sep(Pm, ','), ')', Blk]]); const Mem = rule(() => [D, Ctor]); const Body = rule(() => [['{', many(Mem), '}']]); const Cls = rule(() => [['cls', Id, Body]]); const P = rule(() => [[many(Cls)]]); return { name: 'dpk1', scopeName: 'source.dpk1', tokens: { Id }, scopes: { 'storage.type.function': ['fn', 'ctor'], 'storage.type.class': ['cls'] }, rules: { Pm, Blk, D, Ctor, Mem, Body, Cls, P }, entry: P }; } },
+        { label: "alt-split", build: () => { const Pm = rule(() => [[Id]]); const Blk = rule(() => [['{', '}']]); const D = rule(() => [['fn', Id, '(', sep(Pm, ','), ')', Blk]]); const Ctor = rule(() => [alt(['ctor', '(', sep(Pm, ','), ')', Blk], ['ctor', '(', ')', Blk])]); const Mem = rule(() => [D, Ctor]); const Body = rule(() => [['{', many(Mem), '}']]); const Cls = rule(() => [['cls', Id, Body]]); const P = rule(() => [[many(Cls)]]); return { name: 'dpk2', scopeName: 'source.dpk2', tokens: { Id }, scopes: { 'storage.type.function': ['fn', 'ctor'], 'storage.type.class': ['cls'] }, rules: { Pm, Blk, D, Ctor, Mem, Body, Cls, P }, entry: P }; } },
+        { label: "opt-tail", build: () => { const Pm = rule(() => [[Id]]); const Blk = rule(() => [['{', '}']]); const D = rule(() => [['fn', Id, '(', sep(Pm, ','), ')', Blk]]); const Ctor = rule(() => [['ctor', '(', opt(sep(Pm, ',')), ')', Blk]]); const Mem = rule(() => [D, Ctor]); const Body = rule(() => [['{', many(Mem), '}']]); const Cls = rule(() => [['cls', Id, Body]]); const P = rule(() => [[many(Cls)]]); return { name: 'dpk3', scopeName: 'source.dpk3', tokens: { Id }, scopes: { 'storage.type.function': ['fn', 'ctor'], 'storage.type.class': ['cls'] }, rules: { Pm, Blk, D, Ctor, Mem, Body, Cls, P }, entry: P }; } },
+      ],
+    },
+    {
+      name: "constructor-keyword", observable: (tm => !!tm.repository['new-expression'] && JSON.stringify(tm.repository['new-expression']).includes('keyword.operator.expression.new')),
+      factorings: [
+        { label: "canonical", build: () => { const Ty = rule(() => [[Id]]); const NewE = rule((s) => [['new', Ty, '(', sep(s, ','), ')'], [Id]]); const P = rule(() => [[many(NewE)]]); return { name: 'ck1', scopeName: 'source.ck1', tokens: { Id }, scopes: { 'keyword.operator.expression.new': ['new'] }, rules: { Ty, NewE, P }, entry: P }; } },
+        { label: "opt-call-tail", build: () => { const Ty = rule(() => [[Id]]); const NewE = rule((s) => [['new', Ty, opt('(', sep(s, ','), ')')], [Id]]); const P = rule(() => [[many(NewE)]]); return { name: 'ck2', scopeName: 'source.ck2', tokens: { Id }, scopes: { 'keyword.operator.expression.new': ['new'] }, rules: { Ty, NewE, P }, entry: P }; } },
+        { label: "alt-split", build: () => { const Ty = rule(() => [[Id]]); const TArgs = rule(() => [['<', sep(Ty, ','), '>']]); const NewE = rule((s) => [['new', Ty, opt(alt([TArgs], ['(', sep(s, ','), ')']))], [Id]]); const P = rule(() => [[many(NewE)]]); return { name: 'ck3', scopeName: 'source.ck3', tokens: { Id }, prec: [none('<', '>')], scopes: { 'keyword.operator.expression.new': ['new'] }, rules: { Ty, TArgs, NewE, P }, entry: P }; } },
+      ],
+    },
+    {
+      name: "block-declaration", observable: (tm => !!tm.repository['declaration-body']),
+      factorings: [
+        { label: "canonical", build: () => { const M = rule(() => [[Id]]); const Body = rule(() => [['{', many(M), '}']]); const D = rule(() => [['class', Id, Body]]); const P = rule(() => [[many(alt(D, Id))]]); return { name: 'b1', scopeName: 'source.b1', tokens: { Id }, scopes: { 'storage.type.class': ['class'] }, rules: { M, Body, D, P }, entry: P }; } },
+        { label: "alt-split", build: () => { const M = rule(() => [[Id]]); const Body = rule(() => [['{', many(M), '}'], ['{', '}']]); const D = rule(() => [['class', Id, Body]]); const P = rule(() => [[many(alt(D, Id))]]); return { name: 'b2', scopeName: 'source.b2', tokens: { Id }, scopes: { 'storage.type.class': ['class'] }, rules: { M, Body, D, P }, entry: P }; } },
+        { label: "opt-tail", build: () => { const M = rule(() => [[Id]]); const Body = rule(() => [['{', opt(many(M)), '}']]); const D = rule(() => [['class', Id, Body]]); const P = rule(() => [[many(alt(D, Id))]]); return { name: 'b3', scopeName: 'source.b3', tokens: { Id }, scopes: { 'storage.type.class': ['class'] }, rules: { M, Body, D, P }, entry: P }; } },
+        { label: "via-ref", build: () => { const M = rule(() => [[Id]]); const Body = rule(() => [['{', many(M), '}']]); const D = rule(() => [['class', Id, opt(Body)]]); const P = rule(() => [[many(alt(D, Id))]]); return { name: 'b4', scopeName: 'source.b4', tokens: { Id }, scopes: { 'storage.type.class': ['class'] }, rules: { M, Body, D, P }, entry: P }; } },
+      ],
+    },
+    {
+      name: "class-declaration-head", observable: (tm => { const r = tm.repository['class-declaration']; return !!r && JSON.stringify(r.beginCaptures ?? {}).includes('entity.name.type.class'); }),
+      factorings: [
+        { label: "canonical", build: () => { const M = rule(() => [[Id]]); const D = rule(() => [['class', Id, '{', many(M), '}']]); const P = rule(() => [[many(alt(D, Id))]]); return { name: 'h1', scopeName: 'source.h1', tokens: { Id }, scopes: { 'storage.type.class': ['class'] }, rules: { M, D, P }, entry: P }; } },
+        { label: "via-ref", build: () => { const M = rule(() => [[Id]]); const Body = rule(() => [['{', many(M), '}']]); const D = rule(() => [['class', Id, Body]]); const P = rule(() => [[many(alt(D, Id))]]); return { name: 'h2', scopeName: 'source.h2', tokens: { Id }, scopes: { 'storage.type.class': ['class'] }, rules: { M, Body, D, P }, entry: P }; } },
+        { label: "type-params", build: () => { const T = rule(() => [[Id]]); const TP = rule(() => [['<', sep(T, ','), '>']]); const M = rule(() => [[Id]]); const D = rule(() => [['class', Id, opt(TP), '{', many(M), '}']]); const P = rule(() => [[many(alt(D, Id))]]); return { name: 'h3', scopeName: 'source.h3', tokens: { Id }, prec: [none('<', '>')], scopes: { 'storage.type.class': ['class'] }, rules: { T, TP, M, D, P }, entry: P }; } },
+        { label: "alt-split", build: () => { const M = rule(() => [[Id]]); const D = rule(() => [[opt('abstract'), 'class', Id, '{', many(M), '}']]); const P = rule(() => [[many(alt(D, Id))]]); return { name: 'h4', scopeName: 'source.h4', tokens: { Id }, scopes: { 'storage.type.class': ['class'], 'storage.modifier': ['abstract'] }, rules: { M, D, P }, entry: P }; } },
+      ],
+    },
+    {
+      name: "regex-literal", observable: (tm => !!tm.repository['regex-literal']),
+      factorings: [
+        { label: "canonical", build: () => { const Re = token(seq('/', plus(range('a','z')), '/'), { regex: true }); const Ex = rule(() => [Id, Re]); const P = rule(() => [[many(Ex)]]); return { name: 'r1', scopeName: 'source.r1', tokens: { Id, Re }, prec: [left('/')], rules: { Ex, P }, entry: P }; } },
+        { label: "alt-split", build: () => { const Re = token(seq('/', plus(range('a','z')), '/'), { regex: true }); const Ex = rule(() => [Id, Re]); const P = rule(() => [[many(Ex)]]); return { name: 'r2', scopeName: 'source.r2', tokens: { Id, Re }, prec: [none('/')], rules: { Ex, P }, entry: P }; } },
+        { label: "via-ref", build: () => { const Re = token(seq('/', plus(range('a','z')), '/'), { regex: true }); const Ex = rule(() => [Id, Re]); const P = rule(() => [[many(Ex)]]); return { name: 'r3', scopeName: 'source.r3', tokens: { Id, Re }, prec: [left('/=')], rules: { Ex, P }, entry: P }; } },
+      ],
+    },
+    {
+      name: "block-sequence", observable: (tm => !!tm.repository['block-sequence']),
+      factorings: [
+        { label: "canonical", build: () => { const ph = noneOf(' ', '\t', '\n', ':', '-', '?', ',', '[', ']', '{', '}', '#'); const PB = star(noneOf(':', '\n', ',', '[', ']', '{', '}')); const KS = followedBy(seq(star(oneOf(' ', '\t')), ':')); const Plain = token(seq(ph, PB), { scope: 'string.unquoted', blockPattern: seq(ph, PB) }); const Key = token(seq(ph, PB, KS), { scope: 'entity.name.tag', blockPattern: seq(ph, PB, KS) }); const BlockScalar = token(never(), { scope: 'string.unquoted.block' }); const Indent = token(never(), {}), Dedent = token(never(), {}), Newline = token(never(), {}); const Item = rule(() => [['-', Key, Plain]]); const Seq = rule(() => [[Item, many(Newline, Item)]]); const Fold = rule(() => [[Plain, many(Newline, Plain)]]); const Node = rule(() => [Seq, Fold, Key, Plain, BlockScalar]); const Doc = rule(() => [[many(Node)]]); return { name: 'bseqA', scopeName: 'source.bseqA', tokens: { Key, Plain, BlockScalar, Indent, Dedent, Newline }, indent: { indentToken: 'Indent', dedentToken: 'Dedent', newlineToken: 'Newline', flowOpen: ['[', '{'], flowClose: [']', '}'], comment: '#', keyValueSeparator: ':', foldTokens: ['Key', 'Plain'], compactIndicators: ['-', '?'], blockScalar: { introducers: ['|', '>'], token: 'BlockScalar', documentMarkers: ['---', '...'], indicatorScope: 'keyword.control.flow.block-scalar' } }, rules: { Item, Seq, Fold, Node, Doc }, entry: Doc }; } },
+        { label: "plus-arity", build: () => { const ph = noneOf(' ', '\t', '\n', ':', '-', '?', ',', '[', ']', '{', '}', '#'); const PB = star(noneOf(':', '\n', ',', '[', ']', '{', '}')); const KS = followedBy(seq(star(oneOf(' ', '\t')), ':')); const Plain = token(seq(ph, PB), { scope: 'string.unquoted', blockPattern: seq(ph, PB) }); const Key = token(seq(ph, PB, KS), { scope: 'entity.name.tag', blockPattern: seq(ph, PB, KS) }); const BlockScalar = token(never(), { scope: 'string.unquoted.block' }); const Indent = token(never(), {}), Dedent = token(never(), {}), Newline = token(never(), {}); const Item = rule(() => [['-', Key, Plain]]); const Seq = rule(() => [[Item, Newline, Item, many(Newline, Item)]]); const Fold = rule(() => [[Plain, many(Newline, Plain)]]); const Node = rule(() => [Seq, Fold, Key, Plain, BlockScalar]); const Doc = rule(() => [[many(Node)]]); return { name: 'bseqB', scopeName: 'source.bseqB', tokens: { Key, Plain, BlockScalar, Indent, Dedent, Newline }, indent: { indentToken: 'Indent', dedentToken: 'Dedent', newlineToken: 'Newline', flowOpen: ['[', '{'], flowClose: [']', '}'], comment: '#', keyValueSeparator: ':', foldTokens: ['Key', 'Plain'], compactIndicators: ['-', '?'], blockScalar: { introducers: ['|', '>'], token: 'BlockScalar', documentMarkers: ['---', '...'], indicatorScope: 'keyword.control.flow.block-scalar' } }, rules: { Item, Seq, Fold, Node, Doc }, entry: Doc }; } },
+      ],
+    },
+    {
+      name: "explicit-key", observable: (tm => !!tm.repository['explicit-key']),
+      factorings: [
+        { label: "canonical", build: () => { const ph = noneOf(' ', '\t', '\n', ':', '-', '?', ',', '[', ']', '{', '}', '#'); const PB = star(noneOf(':', '\n', ',', '[', ']', '{', '}')); const KS = followedBy(seq(star(oneOf(' ', '\t')), ':')); const Plain = token(seq(ph, PB), { scope: 'string.unquoted' }); const Key = token(seq(ph, PB, KS), { scope: 'entity.name.tag' }); const Indent = token(never(), {}), Dedent = token(never(), {}), Newline = token(never(), {}); const ExplicitEntry = rule(() => [['?', Key, opt(':', Plain)]]); const ExplicitMapping = rule(() => [[ExplicitEntry, many(Newline, ExplicitEntry)]]); const Node = rule(() => [ExplicitMapping, Plain]); const Doc = rule(() => [[many(Node)]]); return { name: 'ekA', scopeName: 'source.ekA', tokens: { Key, Plain, Indent, Dedent, Newline }, indent: { indentToken: 'Indent', dedentToken: 'Dedent', newlineToken: 'Newline', flowOpen: ['[', '{'], flowClose: [']', '}'], comment: '#', keyValueSeparator: ':', foldTokens: ['Key', 'Plain'], compactIndicators: ['-', '?'] }, rules: { ExplicitEntry, ExplicitMapping, Node, Doc }, entry: Doc }; } },
+        { label: "opt-tail (many-quantified equivalent)", build: () => { const ph = noneOf(' ', '\t', '\n', ':', '-', '?', ',', '[', ']', '{', '}', '#'); const PB = star(noneOf(':', '\n', ',', '[', ']', '{', '}')); const KS = followedBy(seq(star(oneOf(' ', '\t')), ':')); const Plain = token(seq(ph, PB), { scope: 'string.unquoted' }); const Key = token(seq(ph, PB, KS), { scope: 'entity.name.tag' }); const Indent = token(never(), {}), Dedent = token(never(), {}), Newline = token(never(), {}); const ExplicitEntry = rule(() => [['?', Key, many(':', Plain)]]); const ExplicitMapping = rule(() => [[ExplicitEntry, many(Newline, ExplicitEntry)]]); const Node = rule(() => [ExplicitMapping, Plain]); const Doc = rule(() => [[many(Node)]]); return { name: 'ekB', scopeName: 'source.ekB', tokens: { Key, Plain, Indent, Dedent, Newline }, indent: { indentToken: 'Indent', dedentToken: 'Dedent', newlineToken: 'Newline', flowOpen: ['[', '{'], flowClose: [']', '}'], comment: '#', keyValueSeparator: ':', foldTokens: ['Key', 'Plain'], compactIndicators: ['-', '?'] }, rules: { ExplicitEntry, ExplicitMapping, Node, Doc }, entry: Doc }; } },
+      ],
+    },
+    {
+      name: "flow-mapping", observable: (tm => !!tm.repository['flow-mapping']),
+      factorings: [
+        { label: "canonical", build: () => { const ph = noneOf(' ', '\t', '\n', ':', '-', '?', ',', '[', ']', '{', '}', '#'); const PB = star(noneOf(':', '\n', ',', '[', ']', '{', '}')); const KS = followedBy(seq(star(oneOf(' ', '\t')), ':')); const Plain = token(seq(ph, PB), { scope: 'string.unquoted' }); const Key = token(seq(ph, PB, KS), { scope: 'entity.name.tag' }); const Indent = token(never(), {}), Dedent = token(never(), {}), Newline = token(never(), {}); const FlowEntry = rule(() => [[Key, ':', Plain]]); const FlowMap = rule(() => [['{', sep(FlowEntry, ','), '}']]); const FlowSeq = rule(() => [['[', sep(Plain, ','), ']']]); const Node = rule(() => [FlowMap, FlowSeq, Plain]); const Doc = rule(() => [[many(Node)]]); const fs = { byOpen: { '{': { begin: 'punctuation.definition.mapping.begin', end: 'punctuation.definition.mapping.end', separator: 'punctuation.separator.mapping' }, '[': { begin: 'punctuation.definition.sequence.begin', end: 'punctuation.definition.sequence.end', separator: 'punctuation.separator.sequence' } }, keyValue: 'punctuation.separator.key-value' }; return { name: 'fmA', scopeName: 'source.fmA', tokens: { Key, Plain, Indent, Dedent, Newline }, indent: { indentToken: 'Indent', dedentToken: 'Dedent', newlineToken: 'Newline', flowOpen: ['[', '{'], flowClose: [']', '}'], comment: '#', keyValueSeparator: ':', flowScopes: fs, foldTokens: ['Key', 'Plain'], compactIndicators: ['-', '?'] }, rules: { FlowEntry, FlowMap, FlowSeq, Node, Doc }, entry: Doc }; } },
+        { label: "trailing-comma", build: () => { const ph = noneOf(' ', '\t', '\n', ':', '-', '?', ',', '[', ']', '{', '}', '#'); const PB = star(noneOf(':', '\n', ',', '[', ']', '{', '}')); const KS = followedBy(seq(star(oneOf(' ', '\t')), ':')); const Plain = token(seq(ph, PB), { scope: 'string.unquoted' }); const Key = token(seq(ph, PB, KS), { scope: 'entity.name.tag' }); const Indent = token(never(), {}), Dedent = token(never(), {}), Newline = token(never(), {}); const FlowEntry = rule(() => [[Key, ':', Plain]]); const FlowMap = rule(() => [['{', sep(FlowEntry, ','), opt(','), '}']]); const FlowSeq = rule(() => [['[', sep(Plain, ','), ']']]); const Node = rule(() => [FlowMap, FlowSeq, Plain]); const Doc = rule(() => [[many(Node)]]); const fs = { byOpen: { '{': { begin: 'punctuation.definition.mapping.begin', end: 'punctuation.definition.mapping.end', separator: 'punctuation.separator.mapping' }, '[': { begin: 'punctuation.definition.sequence.begin', end: 'punctuation.definition.sequence.end', separator: 'punctuation.separator.sequence' } }, keyValue: 'punctuation.separator.key-value' }; return { name: 'fmB', scopeName: 'source.fmB', tokens: { Key, Plain, Indent, Dedent, Newline }, indent: { indentToken: 'Indent', dedentToken: 'Dedent', newlineToken: 'Newline', flowOpen: ['[', '{'], flowClose: [']', '}'], comment: '#', keyValueSeparator: ':', flowScopes: fs, foldTokens: ['Key', 'Plain'], compactIndicators: ['-', '?'] }, rules: { FlowEntry, FlowMap, FlowSeq, Node, Doc }, entry: Doc }; } },
+      ],
+    },
+    {
+      name: "markup-tag", observable: (tm => !!tm.repository['tag']),
+      factorings: [
+        { label: "canonical", build: () => { const Id = token(plus(range('a', 'z')), { identifier: true }); const Text = token(never(), { scope: 'text' }); const Tag = rule(() => [['<', Id, '>']]); const Doc = rule(() => [[many(alt(Tag, Id))]]); return { name: 'mkA', scopeName: 'text.mkA', tokens: { Id, Text }, markup: { textToken: 'Text', tagOpen: '<', tagClose: '>', closeMarker: '/' }, rules: { Tag, Doc }, entry: Doc }; } },
+        { label: "alt-split (open/self-close/close element)", build: () => { const Id = token(plus(range('a', 'z')), { identifier: true }); const Text = token(never(), { scope: 'text' }); const SelfEnd = token(seq('/', '>')); const CloseTg = token(seq('<', '/')); const Tag = rule(() => [['<', Id, alt(SelfEnd, ['>', CloseTg, Id, '>'])]]); const Doc = rule(() => [[many(alt(Tag, Id))]]); return { name: 'mkB', scopeName: 'text.mkB', tokens: { SelfEnd, CloseTg, Id, Text }, markup: { textToken: 'Text', tagOpen: '<', tagClose: '>', closeMarker: '/' }, rules: { Tag, Doc }, entry: Doc }; } },
+      ],
+    },
   ];
   for (const c of constructs) {
-    const results = c.factorings.map(f => ({ label: f.label, ok: emits(c.key, f.build) }));
-    const allEmit = results.every(r => r.ok);
-    check(`shape-robustness: \`${c.name}\` emits #${c.key} for every equivalent factoring`, allEmit,
-      results.filter(r => !r.ok).map(r => r.label).join(', '));
+    const xfail = new Set(c.xfail ?? []);
+    const results = c.factorings.map(f => ({ label: f.label, ok: emits(c.observable, f.build) }));
+    // PASS unless a factoring NOT on the xfail list drops, OR a factoring ON it unexpectedly
+    // started discharging (stale xfail — the fix may have landed; drop the annotation).
+    const newDrops = results.filter(r => !r.ok && !xfail.has(r.label)).map(r => r.label);
+    const staleXfail = results.filter(r => r.ok && xfail.has(r.label)).map(r => r.label);
+    check(`shape-robustness: \`${c.name}\` discharges #${c.key ?? c.name} for every equivalent factoring`,
+      newDrops.length === 0 && staleXfail.length === 0,
+      [newDrops.length ? `drops: ${newDrops.join(', ')}` : '',
+       staleXfail.length ? `stale xfail (now discharges, remove): ${staleXfail.join(', ')}` : '',
+       xfail.size ? `(known #51 drops still expected: ${[...xfail].join(', ')})` : ''].filter(Boolean).join('  '));
   }
 }
 

From 2f01d1e9009e8c2713498d2bceb440cb4a00817e Mon Sep 17 00:00:00 2001
From: Johnson Chu <johnsoncodehk@gmail.com>
Date: Sun, 21 Jun 2026 07:34:45 +0800
Subject: [PATCH 12/14] Add the jsx-element factoring to the shape-robustness
 gate (#51)
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

Rounds the gate out to 20 constructs, one per shape-detector. detectJsx's
hasElementShape walks expandAlts branches for the `<`+ref element lead, so the
attribute list surfaces #jsx-element-in-expression whether it is written inline
or wrapped in opt/alt; verified robust. 97/97 completeness checks.
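
For reference, the pass condition the gate applies to every construct
(including the new jsx-element one) can be sketched as below. This is a
minimal stand-alone sketch: `FactoringResult` and `classify` are hypothetical
names used only for illustration, and the real loop in
test/tm-completeness.ts reports through `check` rather than returning a value.

```typescript
// Hypothetical stand-alone model of the gate's xfail bookkeeping.
interface FactoringResult { label: string; ok: boolean }

function classify(results: FactoringResult[], xfailLabels: string[] = []) {
  const xfail = new Set(xfailLabels);
  // A factoring that drops without being listed as a known drop is a new failure.
  const newDrops = results.filter(r => !r.ok && !xfail.has(r.label)).map(r => r.label);
  // A factoring listed as a known drop that now discharges means the annotation is stale.
  const staleXfail = results.filter(r => r.ok && xfail.has(r.label)).map(r => r.label);
  return { pass: newDrops.length === 0 && staleXfail.length === 0, newDrops, staleXfail };
}

const verdict = classify([
  { label: 'canonical', ok: true },
  { label: 'opt-attrs', ok: true },
]);
console.log(verdict.pass);
```

A known drop keeps the gate green only while it keeps failing; once it starts
discharging, the gate fails until the xfail entry is removed.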
---
 test/tm-completeness.ts | 11 +++++++++++
 1 file changed, 11 insertions(+)

diff --git a/test/tm-completeness.ts b/test/tm-completeness.ts
index 57ef3cf..2551087 100644
--- a/test/tm-completeness.ts
+++ b/test/tm-completeness.ts
@@ -707,6 +707,17 @@ function checkShapeRobustness(): void {
         { label: "alt-split (open/self-close/close element)", build: () => { const Id = token(plus(range('a', 'z')), { identifier: true }); const Text = token(never(), { scope: 'text' }); const SelfEnd = token(seq('/', '>')); const CloseTg = token(seq('<', '/')); const Tag = rule(() => [['<', Id, alt(SelfEnd, ['>', CloseTg, Id, '>'])]]); const Doc = rule(() => [[many(alt(Tag, Id))]]); return { name: 'mkB', scopeName: 'text.mkB', tokens: { SelfEnd, CloseTg, Id, Text }, markup: { textToken: 'Text', tagOpen: '<', tagClose: '>', closeMarker: '/' }, rules: { Tag, Doc }, entry: Doc }; } },
       ],
     },
+    // ── jsx-element (detectJsx, key #jsx-element-in-expression) ──
+    // an element shape `'<' Id … ('/>' | '>' '</' Id '>')`. detectJsx's hasElementShape walks
+    // expandAlts branches for the `<`+ref lead, so the attribute list written inline or wrapped in
+    // an `opt`/`alt` still surfaces the element. ROBUST.
+    {
+      name: 'jsx-element', observable: (tm: any) => !!tm.repository['jsx-element-in-expression'],
+      factorings: [
+        { label: 'canonical', build: () => { const Attr = rule(() => [[Id, opt('=', Id)]]); const Elem = rule(() => [['<', Id, many(Attr), alt(SelfEnd, ['>', CloseTg, Id, '>'])]]); const E = rule(() => [Id, Elem]); const P = rule(() => [[many(E)]]); return { name: 'jx1', scopeName: 'source.jx1', tokens: { SelfEnd, CloseTg, Id }, prec: [none('<', '>')], rules: { Attr, Elem, E, P }, entry: P }; } },
+        { label: 'opt-attrs', build: () => { const Attr = rule(() => [[Id, opt('=', Id)]]); const Elem = rule(() => [['<', Id, opt(many(Attr)), alt(SelfEnd, ['>', CloseTg, Id, '>'])]]); const E = rule(() => [Id, Elem]); const P = rule(() => [[many(E)]]); return { name: 'jx2', scopeName: 'source.jx2', tokens: { SelfEnd, CloseTg, Id }, prec: [none('<', '>')], rules: { Attr, Elem, E, P }, entry: P }; } },
+      ],
+    },
   ];
   for (const c of constructs) {
     const xfail = new Set(c.xfail ?? []);

From 615c9c3688c9e2541ffeca182529a8beaa854f26 Mon Sep 17 00:00:00 2001
From: Johnson Chu <johnsoncodehk@gmail.com>
Date: Sun, 21 Jun 2026 09:24:22 +0800
Subject: [PATCH 13/14] Fix two region-leak soundness bugs the ceiling audit
 derived (#51)
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

The soundness-ceiling audit (which found gen-tm has no expressiveness ceiling)
surfaced two over-accepts by derivation rather than from the corpus. Both are real
divergences from the parser's scoping, and both are root-caused in gen-tm. The
output is not byte-identical: the two enum/namespace-bearing grammars (ts, tsx)
change, so the fix is verified by witnesses, gates, and scope-gap rather than by
byte-identity.

- enum brace-leak: #enum-body was the lone emitBracketRegion call whose body
  lacked a self-balancing first include (its siblings #code-block /
  #declaration-body both carry one), so a nested `{…}` in a member initializer
  (`enum E { A = { x: 1 }, B }`) let the inner `}` close the enum region early and
  leak the following members out of enum scope. Prepend the shared #code-block
  balancing region (an inner brace is an ordinary expression block, not another
  enum body) so braces balance to any depth. `B` now scopes enummember; the
  shorthand-object key case stays correct.

- declaration-keyword over-accept: a `storage.type.*` keyword that is also a valid
  identifier (`module`, `namespace`, `type`, `interface`) was painted by the flat
  global match unconditionally, so `module.exports` / `namespace.foo` / `type.foo`
  mis-read the property-access base as the declaration keyword. Demote the whole
  class — the same contextual-keyword machinery gen-tm already uses for
  `as`/`keyof`/`public` — to an accessor-guarded `*-decl` rule (`(?!\s*(?:\.|\?\.))`),
  so a real declaration head still wins while a `.`/`?.`-adjacent base falls through
  to identifier scoping. Verified `module X {}` / `type X = …` / `namespace N {}`
  keep their keyword scope.
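
A minimal sketch of the accessor guard from the second bullet, assuming a
JavaScript RegExp as a stand-in for the Oniguruma engine and a hypothetical
helper name (`paintsAsKeyword`); the word list is the four witnesses named
above, not the set gen-tm derives:

```typescript
// Illustrative stand-in for an emitted `*-decl` match. The pattern shape is the
// one shown in this patch; running it under JavaScript RegExp is an assumption.
const declKeyword = /\b(module|namespace|type|interface)\b(?!\s*(?:\.|\?\.))/;

// Hypothetical helper: true when the guarded match would paint a keyword.
function paintsAsKeyword(source: string): boolean {
  return declKeyword.test(source);
}

// Declaration heads: the keyword is followed by a name or a string, so it matches.
const heads = ['module X {}', 'type X = number', 'declare module "foo"'];
// Property-access bases: the word is followed by `.` or `?.`, so the match abstains.
const bases = ['module.exports = 1', 'namespace?.foo', 'type .foo', 'interface.foo'];

console.log(heads.map(paintsAsKeyword)); // every entry is true
console.log(bases.map(paintsAsKeyword)); // every entry is false
```

The `\s*` inside the lookahead is what makes `type .foo` abstain as well: the
guard looks past whitespace before testing for the accessor.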

Both DERIVED (agnostic 9/9 — keyed on scope family + declaration/reserved sets, no
hardcoded word); npm run check 40/40; scope-gap ts unchanged (the corpus is blind
to both witnesses — exactly why the audit derived them).
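
The enum brace-leak from the first bullet can be modeled with a toy scanner.
This is a sketch under stated assumptions, not gen-tm's emitBracketRegion:
`regionCloseIndex` and its `balanceInner` flag are hypothetical names, and the
scanner only counts braces the way a begin/end region with or without a
self-balancing first include would.

```typescript
// Toy model of a `{ ... }` begin/end region. With balanceInner = false the first
// `}` closes the region (the leak); with balanceInner = true an inner `{ ... }`
// is consumed as a unit before the region's own `}` is considered.
function regionCloseIndex(text: string, open: number, balanceInner: boolean): number {
  let depth = 0;
  for (let i = open + 1; i < text.length; i++) {
    const ch = text[i];
    if (ch === '{' && balanceInner) {
      depth++;
    } else if (ch === '}') {
      if (depth === 0) return i; // this `}` ends the region
      depth--;
    }
  }
  return -1; // region never closed
}

const src = 'enum E { A = { x: 1 }, B }';
const open = src.indexOf('{');
const memberB = src.lastIndexOf('B');

const leaky = regionCloseIndex(src, open, false);
const balanced = regionCloseIndex(src, open, true);

console.log(leaky < memberB);    // true: `B` falls outside the enum region
console.log(balanced > memberB); // true: `B` stays inside the enum region
```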
---
 src/gen-tm.ts                   | 56 +++++++++++++++++++++++++++++++--
 typescript.tmLanguage.json      | 27 +++++++++-------
 typescriptreact.tmLanguage.json | 27 +++++++++-------
 3 files changed, 84 insertions(+), 26 deletions(-)

diff --git a/src/gen-tm.ts b/src/gen-tm.ts
index bbef912..92d9c8f 100644
--- a/src/gen-tm.ts
+++ b/src/gen-tm.ts
@@ -6732,9 +6732,21 @@ export function generateTmLanguage(grammar: CstGrammar): TmGrammar {
             // enum-body is the same `{ … }` bracket region; only its body differs (members are
             // NAMES via #enum-member, not statements). CALLER predicate: a brace-bodied decl
             // whose keyword scope ends in `.enum`.
+            //
+            // A member INITIALIZER may itself contain a nested `{…}` (an object literal,
+            // `A = { x: 1 }`). Like every brace region (see #code-block / #declaration-body),
+            // the body must consume an inner balanced `{…}` as a UNIT, or the inner `}` matches
+            // this region's `end: \}` and prematurely closes the enum body — leaking the
+            // following members out of enum scope (`B` reads as a plain variable, not an
+            // enummember). The nested brace is NOT another enum body but an ordinary
+            // expression-context block, so the balancing recurse target is `#code-block`
+            // (a self-balancing `{}` with no member-name scoping), listed FIRST so an inner
+            // `{` opens it before `#enum-member`/`$self` can mis-handle the brace. `#code-block`
+            // is emitted unconditionally above whenever any declaration exists, so it is always
+            // present here.
             repository['enum-body'] = emitBracketRegion({
               openLit: '\\{', closeLit: '\\}', beginCapName: blockCapName, endCapName: blockCapName,
-              bodyPatterns: [{ include: '#enum-member' }, { include: '$self' }],
+              bodyPatterns: [{ include: '#code-block' }, { include: '#enum-member' }, { include: '$self' }],
             });
           }
           innerPatterns.push({ include: '#enum-body' });
@@ -7860,6 +7872,34 @@ export function generateTmLanguage(grammar: CstGrammar): TmGrammar {
       // lookahead. The rest of the group keeps the unconditional flat match.
       const ctxModKws = kws.filter(k => contextualModifiers.has(k) && !alwaysBeforeString(k) && !ctxOpSet.has(k));
       const ctxModSet = new Set(ctxModKws);
+      // Contextual DECLARATION keywords: a `storage.type.*` keyword whose keyword role
+      // is owned by a positional `*-declaration` rule (it appears in
+      // `declarationKeywords`) AND which is non-reserved (the grammar proves it a valid
+      // identifier somewhere — no not()-guard forbids it). Such a word doubles as a
+      // property-access BASE: `module.exports`, `namespace.foo`, `type.foo`,
+      // `interface.foo` — there the word is an ordinary identifier, NOT the namespace/
+      // type-alias keyword. The declaration rule paints the ordinary keyword use
+      // positionally (`module X {}`, `type X = …`), but the string-named forms
+      // (`module "x" {}`, `declare module "foo"`) reach the keyword scope only through
+      // the flat match, so the flat match cannot be dropped: it must ABSTAIN exactly when
+      // the word is immediately followed by a member accessor (`.`/`?.`), and the word
+      // then falls through to identifier/property scoping. A declaration head always puts
+      // whitespace + a name/string/`{` after the keyword (never `.`/`?.`), so the guard
+      // never suppresses a real declaration — every form, including the string-named
+      // and `import`/`export type` modifier uses, keeps its scope (they are not
+      // accessor-adjacent). Unlike the support.class abstain below, NO call `(` is
+      // excluded: a declaration keyword is never call-adjacent, and the witness/official
+      // guard is the `.`-adjacency. Mirrors the official grammar's `(?<!\.)` keyword
+      // guard. Keyed on the scope family + the declaration-rule/reserved sets, never on
+      // a specific word.
+      const accessorOpeners = [
+        propAccess.hasDot ? '\\.' : undefined,
+        propAccess.hasOptionalChain ? '\\?\\.' : undefined,
+      ].filter((x): x is string => !!x);
+      const ctxDeclKws = (scope === 'storage.type' || scope.startsWith('storage.type.')) && accessorOpeners.length > 0
+        ? kws.filter(k => declarationKeywords.has(k) && !reservedWordsForCtx.has(k) && !alwaysBeforeString(k) && !ctxOpSet.has(k) && !ctxModSet.has(k))
+        : [];
+      const ctxDeclSet = new Set(ctxDeclKws);
       // Drop keywords whose keyword role is owned by a dedicated declaration context
       // (e.g. `constructor` → #constructor-declaration in class bodies). They double
       // as identifiers everywhere else, so the flat match must not paint them.
@@ -7867,7 +7907,7 @@ export function generateTmLanguage(grammar: CstGrammar): TmGrammar {
       // scoped positionally by #import-export-all — never in the flat match, which
       // would mis-paint their ordinary-identifier uses (`const defer`, `defer()`,
       // `import defer from "m"`).
-      const globalKws = kws.filter(k => !alwaysBeforeString(k) && !ctxOpSet.has(k) && !ctxModSet.has(k) && !contextDeclaredKws.has(k) && !phaseModifierKws.has(k));
+      const globalKws = kws.filter(k => !alwaysBeforeString(k) && !ctxOpSet.has(k) && !ctxModSet.has(k) && !ctxDeclSet.has(k) && !contextDeclaredKws.has(k) && !phaseModifierKws.has(k));
       if (globalKws.length > 0) {
         // A `support.class` group names BUILTIN CLASS/TYPE identifiers (Object, Array,
         // Promise, …) — but, unlike a true keyword, those words also appear as runtime
@@ -7924,6 +7964,18 @@ export function generateTmLanguage(grammar: CstGrammar): TmGrammar {
         };
         topPatterns.push({ include: `#${mkey}` });
       }
+      // Contextual declaration keywords (see ctxDeclKws above): one accessor-guarded
+      // entry, placed at the same position as the flat group so a real declaration-head
+      // keyword still wins, while a `.`/`?.`-adjacent property-access base falls through
+      // to identifier/property scoping.
+      if (ctxDeclKws.length > 0) {
+        const dkey = `${key}-decl`;
+        repository[dkey] = {
+          match: `\\b(${ctxDeclKws.map(escapeRegex).join('|')})\\b(?!\\s*(?:${accessorOpeners.join('|')}))`,
+          name: `${scope}.${langName}`,
+        };
+        topPatterns.push({ include: `#${dkey}` });
+      }
       for (const kw of beforeStringKws) {
         const ckey = `${key}-${kw.replace(/[^a-z0-9]/gi, '')}`;
         repository[ckey] = {
diff --git a/typescript.tmLanguage.json b/typescript.tmLanguage.json
index 3409a7f..fdf7892 100644
--- a/typescript.tmLanguage.json
+++ b/typescript.tmLanguage.json
@@ -220,16 +220,16 @@
       "include": "#scope-storage-type"
     },
     {
-      "include": "#scope-storage-type-interface"
+      "include": "#scope-storage-type-interface-decl"
     },
     {
-      "include": "#scope-storage-type-type"
+      "include": "#scope-storage-type-type-decl"
     },
     {
       "include": "#scope-storage-type-enum"
     },
     {
-      "include": "#scope-storage-type-namespace"
+      "include": "#scope-storage-type-namespace-decl"
     },
     {
       "include": "#scope-storage-type-function-arrow"
@@ -1715,6 +1715,9 @@
         }
       },
       "patterns": [
+        {
+          "include": "#code-block"
+        },
         {
           "include": "#enum-member"
         },
@@ -2612,20 +2615,20 @@
       "match": "\\b(debugger|with)\\b",
       "name": "keyword.control.ts"
     },
-    "scope-storage-type-interface": {
-      "match": "\\b(interface)\\b",
+    "scope-storage-type-interface-decl": {
+      "match": "\\b(interface)\\b(?!\\s*(?:\\.|\\?\\.))",
       "name": "storage.type.interface.ts"
     },
-    "scope-storage-type-type": {
-      "match": "\\b(type)\\b",
+    "scope-storage-type-type-decl": {
+      "match": "\\b(type)\\b(?!\\s*(?:\\.|\\?\\.))",
       "name": "storage.type.type.ts"
     },
     "scope-storage-type-enum": {
       "match": "\\b(enum)\\b",
       "name": "storage.type.enum.ts"
     },
-    "scope-storage-type-namespace": {
-      "match": "\\b(module|namespace)\\b",
+    "scope-storage-type-namespace-decl": {
+      "match": "\\b(module|namespace)\\b(?!\\s*(?:\\.|\\?\\.))",
       "name": "storage.type.namespace.ts"
     },
     "scope-support-variable": {
@@ -2816,8 +2819,8 @@
       "match": "\\b(async)\\b",
       "name": "storage.modifier.ts"
     },
-    "expr-scope-storage-type-namespace": {
-      "match": "\\b(module)\\b",
+    "expr-scope-storage-type-namespace-decl": {
+      "match": "\\b(module)\\b(?!\\s*(?:\\.|\\?\\.))",
       "name": "storage.type.namespace.ts"
     },
     "expression": {
@@ -2988,7 +2991,7 @@
           "include": "#scope-storage-type-class"
         },
         {
-          "include": "#expr-scope-storage-type-namespace"
+          "include": "#expr-scope-storage-type-namespace-decl"
         },
         {
           "include": "#scope-storage-type-function-arrow"
diff --git a/typescriptreact.tmLanguage.json b/typescriptreact.tmLanguage.json
index 5c3f52f..e1a773c 100644
--- a/typescriptreact.tmLanguage.json
+++ b/typescriptreact.tmLanguage.json
@@ -226,16 +226,16 @@
       "include": "#scope-storage-type"
     },
     {
-      "include": "#scope-storage-type-interface"
+      "include": "#scope-storage-type-interface-decl"
     },
     {
-      "include": "#scope-storage-type-type"
+      "include": "#scope-storage-type-type-decl"
     },
     {
       "include": "#scope-storage-type-enum"
     },
     {
-      "include": "#scope-storage-type-namespace"
+      "include": "#scope-storage-type-namespace-decl"
     },
     {
       "include": "#scope-storage-type-function-arrow"
@@ -2220,6 +2220,9 @@
         }
       },
       "patterns": [
+        {
+          "include": "#code-block"
+        },
         {
           "include": "#enum-member"
         },
@@ -3117,20 +3120,20 @@
       "match": "\\b(debugger|with)\\b",
       "name": "keyword.control.tsx"
     },
-    "scope-storage-type-interface": {
-      "match": "\\b(interface)\\b",
+    "scope-storage-type-interface-decl": {
+      "match": "\\b(interface)\\b(?!\\s*(?:\\.|\\?\\.))",
       "name": "storage.type.interface.tsx"
     },
-    "scope-storage-type-type": {
-      "match": "\\b(type)\\b",
+    "scope-storage-type-type-decl": {
+      "match": "\\b(type)\\b(?!\\s*(?:\\.|\\?\\.))",
       "name": "storage.type.type.tsx"
     },
     "scope-storage-type-enum": {
       "match": "\\b(enum)\\b",
       "name": "storage.type.enum.tsx"
     },
-    "scope-storage-type-namespace": {
-      "match": "\\b(module|namespace)\\b",
+    "scope-storage-type-namespace-decl": {
+      "match": "\\b(module|namespace)\\b(?!\\s*(?:\\.|\\?\\.))",
       "name": "storage.type.namespace.tsx"
     },
     "scope-support-variable": {
@@ -3321,8 +3324,8 @@
       "match": "\\b(async)\\b",
       "name": "storage.modifier.tsx"
     },
-    "expr-scope-storage-type-namespace": {
-      "match": "\\b(module)\\b",
+    "expr-scope-storage-type-namespace-decl": {
+      "match": "\\b(module)\\b(?!\\s*(?:\\.|\\?\\.))",
       "name": "storage.type.namespace.tsx"
     },
     "expression": {
@@ -3499,7 +3502,7 @@
           "include": "#scope-storage-type-class"
         },
         {
-          "include": "#expr-scope-storage-type-namespace"
+          "include": "#expr-scope-storage-type-namespace-decl"
         },
         {
           "include": "#scope-storage-type-function-arrow"

From 2548c2951e94699a12cd215fe0294ecc8d6a47be Mon Sep 17 00:00:00 2001
From: Johnson Chu <johnsoncodehk@gmail.com>
Date: Sun, 21 Jun 2026 09:26:54 +0800
Subject: [PATCH 14/14] COMPLETENESS.md: record the soundness ceiling audit
 (#51)
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

Adds the symmetric half to the completeness spine: a "Soundness — no ceiling"
section stating the audited result that gen-tm has no TextMate expressiveness
ceiling on well-formed input, because its obligation space (lex-local +
delimiter-carried recursion, no general rule-context→scope channel) is itself
TextMate-bounded. The section also states the honest limits: well-formed input
only; strong evidence + structural argument, not a closed-form ∀-grammar proof;
and the two region-leak over-accepts the audit derived were fixable bugs, not
ceilings.
---
 COMPLETENESS.md | 35 +++++++++++++++++++++++++++++++++++
 1 file changed, 35 insertions(+)

diff --git a/COMPLETENESS.md b/COMPLETENESS.md
index 32777c4..0dcd75c 100644
--- a/COMPLETENESS.md
+++ b/COMPLETENESS.md
@@ -215,6 +215,41 @@ recognised and scoped; what is refined at the frontier is *which* role at the am
 Improving that precision (var-width forms for the `vscode-oniguruma`-only grammars, `\g<>` for the
 arrow region) is a separate, soundness-gated change. **The completeness obligation is discharged.**
 
+## Soundness — no ceiling, audited not assumed
+
+Completeness is *presence*; soundness is *correctness* — does each present construct read the
+*right* scope on every input. Soundness is not decided here (see `test/scope-gap.ts` / `gap-ledger`), but
+one structural question about it **is** settled: does gen-tm have a *ceiling* — a scope obligation no
+TextMate grammar can reproduce, the wall the naive "regex < CFG" intuition predicts? By exhaustive
+audit the answer is **no ceiling on well-formed input** — and the reason is not that TextMate is
+omnipotent but that gen-tm's *obligation space* is itself TextMate-bounded:
+
+- **The obligation is the role map, not the parse.** A scope in Monogram is `f(token type)` [lex-time]
+  or `f(literal text)` [`scopeOverrides`], plus a finite set of shape-detectors — there is no general
+  *rule-context → scope* channel. So even when the CST encodes unbounded context (a cross-serial index,
+  a depth parity), the highlighting obligation does not read it; there is nothing for TextMate to fail
+  to reproduce. Two such candidates, built and run through `createParser` + the role map, both
+  *dissolve* at the obligation layer.
+- **The dichotomy.** A context-dependent scope is therefore one of: *lex-local* (a plain match),
+  *bounded* (a fixed-width neighborhood), or *delimiter-carried recursion* — unbounded nesting whose
+  every level has a **consumable** delimiter (`{`, `<`, `-`), so a `\G`/`\g<>`/self-include reproduces
+  it to unbounded depth (the YAML compact block-sequence: 0 mismatches for d=1..12). The only shape that
+  *would* be a ceiling — unbounded, **non-delimiter**, parser-assigned context — is unconstructable
+  here: a delimiter-less depth scope is one the *parser itself* cannot assign from finite roles.
+- **The audit.** Every context-dependent scope channel was enumerated (independently re-derived from
+  the emitted grammar — 78 keyword/identifier heads — and ground-truthed with `vscode-oniguruma` at
+  deep nesting): each is lex-local, bounded, or delimiter-carried; the ceiling shape occurs **zero**
+  times. Delimiter-carried reproduction was confirmed frame-per-level with no fixed-arity cap (enum
+  d=25, generic d=60, JSX d=20, template `${}` d=20).
+
+Honest bounds: this is **well-formed-input** soundness, and **strong evidence + a structural argument,
+not a closed-form ∀-grammar proof** — it audits the channels gen-tm *emits* and infers "unbounded"
+from monotone frame growth verified to depth 60. `highlighter ≡ parser` thus holds on well-formed
+input but is **not a total equivalence**: broken/partial input still has local region-leaks where a
+regex region's heuristic boundary diverges from the parse. The audit *derived* two such over-accepts
+the corpus was blind to (an enum brace-leak and a `module`/`namespace`/`type`/`interface`
+declaration-keyword over-accept) and root-caused both in gen-tm. They were fixable bugs, not ceilings.
+
 ## The proof ledger
 
 The fixed denominator is every measured obligation (token discharge + repository reachability +