Skip to content

Prove TextMate-generator completeness (#51)#55

Open
johnsoncodehk wants to merge 10 commits into
masterfrom
tm-completeness-proof
Open

Prove TextMate-generator completeness (#51)#55
johnsoncodehk wants to merge 10 commits into
masterfrom
tm-completeness-proof

Conversation

@johnsoncodehk

Copy link
Copy Markdown
Owner

Closes #51. Proves src/gen-tm.ts complete: every TextMate-representable highlighting obligation a grammar induces is emitted and reachable. The dual of the soundness ledger (KNOWN-GAPS.md, which finds wrong paints) — this proves there are no missing ones.

What now holds

The generator's input is a closed, finite algebra (RuleExpr / TokenPattern), so "every obligation" is enumerable. Completeness reduces to three mechanically-checked layers:

  • closuretoRuleExpr, the token-pattern compiler, and the shared collectLiterals backbone are total over their unions, so no supported combinator shape is silently dropped;
  • coverage — every non-skip token has a discharge path (the token census), and every content/keyword obligation leaf is painted (2433/2433 across the six grammars);
  • reachability — every emitted repository key is reachable from the root patterns or a declared export surface (#expression / canonicalRepoNames / aliasScopes); zero dead keys.

Fixed-denominator ledger: 3137/3137 = 100%, 0 open completeness gaps.

The gap it found and fixed

The Layer-A audit surfaced one latent silent-drop — getTypeParamElementKeywords ignored sep, dropping a keyword nested in a sep-list within a type-parameter element. Fixed by recursing into sep.element (not / ref stay omitted on purpose). The six shipped grammars are byte-identical (the drop is latent), with a biting regression guard in the checker.

The frontier is not an impossibility

Three "TextMate can't express this" claims — variable-width lookbehind for the cast/arrow value test, the balanced-paren arrow confirm, and a regex after a control-flow head — were adversarially attacked and refuted: each is expressible in vscode-oniguruma (verified), and the fixed-width forms gen-tm emits are deliberate Onigmo-portability choices. They are a soundness-precision axis, not a completeness gap.

Deliverable

  • COMPLETENESS.md — the proof spine (contract, three layers, frontier, ledger), modelled on TOTAL-PARSING.md.
  • test/tm-completeness.ts — mechanises it; npm run completeness[:check], wired into npm run check as a gate.

Distinct from the README corpus metrics (empirical oracle agreement) and the soundness ledger (scope-gap / gap-ledger) — this is the formal derivation property. Full gate suite: 39/39.

Add a formal completeness proof for src/gen-tm.ts: for every grammar
expressible through the public src/api.ts combinators, every
TextMate-representable highlighting obligation is emitted and reachable.
This is the dual of the soundness ledger (KNOWN-GAPS.md, which finds
WRONG paints) — here we prove there are no MISSING ones.

The proof rests on the generator's input being a closed, finite algebra
(RuleExpr / TokenPattern), which makes "every obligation" enumerable and
reduces completeness to three mechanically-checked layers:

- closure — toRuleExpr, the token-pattern compiler, and the shared
  collectLiterals backbone are total over their unions, so no supported
  combinator shape is silently dropped;
- coverage — every non-skip token has a discharge path (the token
  census), and every content/keyword obligation leaf is painted
  (2433/2433 across the six grammars), on a fixed denominator;
- reachability — every emitted repository key is reachable from the root
  patterns or a declared export surface (#expression / canonicalRepoNames
  / aliasScopes); zero dead keys.

The Layer-A audit surfaced one latent silent-drop:
getTypeParamElementKeywords omitted `sep`, so a keyword in a sep-list
within a type-parameter element lost its keyword role inside `<…>`. Fixed
by recursing into sep.element (`not`/`ref` stay omitted on purpose — a
forbidden word / a constraint type's own keywords must not be hoisted);
the six shipped grammars are byte-identical (the drop is latent), and the
checker carries a biting regression guard.

Three sites that looked like TextMate impossibilities — variable-width
lookbehind for the cast/arrow value test, the balanced-paren arrow
confirm, and a regex after a control-flow head — were attacked and
refuted: each is expressible in vscode-oniguruma (verified), and the
fixed-width forms gen-tm emits are deliberate Onigmo-portability choices.
They are a soundness-precision axis, not a completeness gap.

COMPLETENESS.md is the proof spine; test/tm-completeness.ts mechanises it
(npm run completeness[:check]) and joins `npm run check` as a gate.
A passing checker means nothing if the checker is blind. Add
test/tm-mutation.ts: it injects a catalogue of known gaps into the emitted
grammar (drop a key / all of a token's includes, neuter a scope to the bare
root, add a dead key, a dangling include, mis-scope a token to a wrong role,
reorder two disambiguation patterns) and records which detector layer kills
each — measuring the detector's power instead of asserting it.

Measured: every PRESENCE / REACHABILITY gap is killed corpus-free (16/16,
12/16 by reachability / token-census / the new flat-token neuter check);
WRONG-ROLE gaps are caught only by a differential witness (presence ≠
correctness); ORDERING gaps are a measured blind spot — TextMate is
order-sensitive and pattern rank lives in the emitted artifact, not the
grammar algebra, so no corpus-free structural check reaches it. This is the
honest boundary, now empirical: the structural proof covers presence +
reachability; ordering / correctness are the soundness axis, reached only by
evaluation. The over-claim of an a-priori "no gap can hide" over the whole
gap space is dropped — COMPLETENESS.md states the bounded, measured claim.

The harness motivated one detector strengthening: tokenCensus now flags a
flat token NEUTERED to the bare root scope (an entry that exists but paints
inert document text), moving that gap class from differential-only to
corpus-free. Wired as a meta-gate in `npm run check`.
The keyword/operator obligation was the one class still checked by the
grammar-derived corpus (leaf coverage). Replace it with a structural,
a-priori discharge: `literalDischarge` confirms every alphabetic literal the
grammar consumes (collectLiterals over every rule + the prec/led tables)
appears, as a scoped word, in some REACHABLE pattern whose scope is a keyword
family — a finite scan of the emitted artifact, no corpus. 248/248 across the
six grammars; non-vacuous (stripping `class` from its patterns reports it
undischarged).

Completeness is now a decidable structural check end to end — token discharge
(census, incl. neuter) + keyword-literal discharge + repository reachability,
952/952 = 100%, no corpus. The leaf-coverage corpus pass is demoted to a
redundant differential cross-check on the soundness axis; `tm-mutation`'s
structural layer now also kills a dropped/neutered keyword corpus-free.

COMPLETENESS.md draws the line correctly: COMPLETENESS (present + reachable +
scoped) is decidable — finite G, finite gen-tm(G), an obligation taxonomy
bounded by TextMate's finite construct kinds, per-token discharge by
structural identity (the flat `match` IS the token's own pattern) — and ∀ G by
structural induction over the finite combinator algebra. SOUNDNESS (do the
present constructs paint correctly on all inputs — wrong-role, pattern
ordering) is the undecidable residual (CFG vs regex-stack-machine over infinite
input). The earlier "a-priori completeness is unavailable" was an
over-concession: that was soundness's wall, mistaken for completeness's.
An adversarial gap-hunt (5 agents over the classes the structural checker
does NOT cover) found that the getTypeParamElementKeywords `sep` drop had
un-fixed siblings in the same family:

- detectTypeParamConstraintKeywords.scanConstraint omitted `sep`, so a
  type-parameter CONSTRAINT keyword reached through a `&`/`,`-separated list
  was not collected — the .tsx generic-arrow⇄JSX no-comma disambiguation lost
  its constraint signal (mis-scoping the header as a JSX tag).
- detectDeclarations.containsBlockRef omitted `sep`, so a declaration whose
  brace body is reached through a `sep` was not seen as having a body — its
  #declaration-body member-scoping region was dropped.

Both recurse into `sep.element` now, mirroring the prior fix. Byte-identical
on all six shipped grammars (latent: every shipped grammar writes constraints
as `opt('extends', Type)` and block bodies as direct refs).

The hunt found 8 latent completeness gaps total; all were VERIFIED latent —
tokenizing the witnesses against the shipped grammars shows ternary, calls,
trailing-comma type params, and every prec operator are correctly scoped, so
the 0-gap soundness ledger is real, not corpus-blind. The remaining gaps are
detector SHAPE-FRAGILITY (fixed-offset window matching in detectTernary /
detectCallExpression / isAngleBracketSepRule misses equivalent factorings),
an unimplemented `rawBlock` config region, and punctuation-in-region — tracked
for follow-up; none triggers on a shipped grammar.
The fixed-window detector fragility the gap-hunt surfaced was a symptom; the
root cause is that the shape-detectors pattern-match RAW RuleExpr shapes, so an
equivalent factoring of the same construct (an `opt`-tail, a separate args
rule, a trailing comma) is a different shape and slips them. Adding a `sep` arm
per detector only widens the symptom patch — the next factoring still slips.

Fix the structural condition: match the NORMALISED form. expandAlts already
canonicalises alt-split / opt-tail / group / quantifier into the same flat
adjacency; route the three fragile detectors through it, plus FIRST-set for the
one ref-hidden case:

- detectTernary, isAngleBracketSepRule → expandAlts (opt-tail and trailing-comma
  factorings now reduce to the matched adjacency; `sep` stays opaque so the sep
  node survives).
- detectCallExpression → FIRST(next) instead of a literal `(`, so the args may be
  the inline `(` OR a separate `CallArgs` rule referenced after the callee.

Byte-identical on all six shipped grammars (they already write the canonical
factoring); the three latent gaps are now emitted for their equivalent forms
(verified: #ternary-expression / #function-call / #declaration-type-params).

And the detector is no longer blind to this class: a new shape-robustness gate
(test/tm-completeness.ts) asserts each construct emits its region for EVERY
equivalent factoring — it bites (3 failures) without the normalization. So the
class is now caught structurally, not just by an adversarial hunt.
A systematic shape-fragility audit (5 agents running the real emitter over
equivalent factorings of each detector's construct) found the fixed-window
fragility is broad, not three isolated cases. Fix four more by routing through
the normalized form, same pattern as detectTernary:

- detectConditionalType — ran its 7-window on raw r.body; now over expandAlts,
  so an `opt`-tail / grouped / alt-split conditional `?:` is detected.
- getTypeParamElementKeywords — early-out demanded an exact 3-item `'<' sep '>'`
  body; now scans the expandAlts branches for that adjacency, so a trailing
  `opt(',')` or alt-wrapped type-param list still hoists its constraint keyword.
- detectDeclarations.isBlockRule — matched only a raw-seq `{ … }` body; now
  EVERY expandAlts branch must be `{ … }`-bounded (`.every`, not `.some`, so a
  `Type` that is only SOMETIMES an object literal is not mis-read as a brace
  declaration — `.some` regressed `type X = …` to #declaration-body).
- detectJsx hasElementShape — matched `<`+ref only in a raw seq; now over the
  expandAlts branches, so an opt/alt/group factoring of the element qualifies.

Byte-identical on all six shipped grammars (verified the `.every` form after a
`.some` first attempt changed typescript/tsx output), and each fix is verified
to detect its previously-dropped factoring. Seven detectors are now shape-robust;
the YAML region detectors and the expression group are the remaining batch.
Reading each expression detector myself (the audit agent's output was
malformed): five matched a construct on the raw r.body, so an equivalent
factoring slips them. Route them through expandAlts, same pattern as
detectTernary/detectCallExpression:

- detectBareArrowParam — `ref '=>'`; an opt-tail arrow (`[x, opt('=>', body)]`)
  was dropped (verified: variable.parameter now emitted for that factoring).
- detectPropertyAccess — `'.'`/`'?.'` before a token ref.
- detectParenArrowParams + detectArrowParamDelims — the deliberate pair that
  read the same arrow param-list production; routed identically so they still
  cannot disagree.
- detectDirectParamKeywords — keyword directly before `(`; also recurse `sep`.

detectConstructorKeywords already expands (no change). Byte-identical on all six
shipped grammars. Twelve detectors are now shape-robust. The YAML region
detectors are deferred to a semantics-aware pass: their fixed shapes encode
DELIBERATE YAML semantics (detectFold stops at a leaf token and does NOT follow
a rule-ref by design — an Indent+rule-ref is a SIBLING node, not a fold), so the
audit's heuristic "follow the ref" fix would break the fold-vs-sibling meaning.
Not every fixed shape is fragility — some are intent.
)

Worked the YAML region detectors semantics-aware (not the audit's heuristic).
Key finding: the audit's "route through expandAlts" fix is WRONG for these —
they match STRUCTURAL nodes (a `(Newline item)*` quantifier, config-keyed
bracket pairs), and expandAlts EXPANDS the very quantifier they depend on.

- detectBlockSequence — the one genuine positional rigidity: it matched the
  `[item, (Newline item)*]` pattern only at items[0]/items[1]. Now scanned
  pairwise (any adjacent k), so a leading element before the sequence does not
  hide it. NOT routed through expandAlts (that would expand the quantifier).
  yaml byte-identical; the full yaml gate group (depth-witnesses, deepest-sibling,
  compact-nest-sites, flow-sites, blockscalar-depth, issue12) stays green.
- detectFold — its visit is ALREADY pairwise; its refsLeaf stopping at a leaf
  token (not following a rule-ref) is the DELIBERATE fold-vs-sibling distinction
  (an Indent+rule-ref is a sibling node). No change.
- detectExplicitKey — the indicator at items[0] is intrinsic (an explicit-key
  entry is headed by `?`); the inner already unwraps a quantifier. No change.
- detectFlowCollections — topLits is positional-agnostic and unwraps
  quantifier/group/sep, deliberately stopping at alt/ref to read THIS rule's own
  structure. No change.

So the fixed-window fragility class is closed: 13 detectors made shape-robust
where the rigidity was accidental; the rest is deliberate YAML semantics that
the heuristic over-flagged. Not every fixed shape is fragility.
getTypeParamElementKeywords collected only KEYWORD literals, so a literal the
grammar declares a scope for that is PUNCTUATION (e.g. a `&` scoped
punctuation.separator) inside a type-parameter element lost its scope inside
`<…>`. Collect any scope-declared literal now, and emit it with the right
boundary (`\b` for words, none for punctuation — `\b&\b` never matches). `=`
and `,` are skipped in the loop since the dedicated handlers emit them, so the
six shipped grammars stay byte-identical (verified the `=` double-emit and
excluded it); a scoped `&` in a type-param element is now scoped (verified).

This closes the last clean completeness gap from the hunt. The remaining two are
not clean completeness gaps: the symbolic-operator case is an ORDERING concern (a
short overridden op can shadow a longer non-overridden one across the separate
#operator-overrides / #operators patterns — the provably-hard ordering axis, and
the op IS scoped, just shadowed), and `rawBlock` is a declared-but-unimplemented
IndentConfig field (no shipped grammar sets it; implementing the verbatim region
speculatively, with no adopter to verify against, is deferred).
An adversarial review of the "proven no gaps" conclusion found real holes;
this acts on them.

- Markup co-blindness (the one fixable defect): tokenCensus discharged EVERY
  markup token via `if (g.markup) bump('markup-region')` with zero
  verification, so a markup token whose declared scope generateMarkupTm does
  not model (e.g. a `<?…?>` processing instruction) was reported discharged
  while the engine painted it the bare root. Now a markup token with an
  explicit `scope` must have that scope actually emitted, or it is an orphan
  (verified: the PI counterexample is now caught; html stays 7/7).

- COMPLETENESS.md corrections, all overclaims the review caught:
  - "per-token discharge by structural identity (match IS tokenPatternSource)"
    was false — the identifier match is widened to identPattern, and the census
    checks PRESENCE (a reachable non-root-scoped entry), not regex identity.
    Reworded to presence-not-identity.
  - "∀ G by structural induction" conflated the algebra CLOSURE (which is ∀ G)
    with the per-grammar DISCHARGE (executed on the shipped set). Separated.
  - "ordering is undecidable" was the project's own impossibility-without-proof
    trap: for a fixed G the pattern list is finite and the winner is a finite
    index read (gen-tm even sorts it deterministically). It is "not reachable by
    a corpus-free structural fold," a measurement limit, not undecidability.

Residual, honestly stated: the shape class is MEASURED not proven — the
robustness gate covers 3 of ~20 detectors, mutation testing mutates only flat
tokens (not shape regions), and detectAngleBracketCast keeps a latent
ref-factored fragility (type-cast fires on no shipped grammar). 81/81 · 40/40.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Prove full completeness of the TextMate generator

1 participant