
indent: un-overload YAML semantics off generic flags/literals (#44) #52

Merged

johnsoncodehk merged 1 commit into master from issue-44-indent-decouple on Jun 19, 2026

@johnsoncodehk (Owner)

Closes #44.

Decouples the generic indent core from YAML-specific behavior that previously rode on general-purpose token flags and hardcoded literals, so a non-YAML indentation grammar no longer inherits YAML semantics it never asked for. Each behavior is now an explicit, mode-neutral IndentConfig field that defaults to off; yaml.ts opts in field by field, and the generated YAML highlighter and parser stay byte-identical.
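As a rough illustration of the off-by-default model, here is a minimal TypeScript sketch. The field names flowSeparatorAfterTokens, foldTokens, keyValueSeparator and compactIndicators come from this PR; the IndentConfigSketch type, the resolveIndentConfig helper, the ':' default, the string[] type for compactIndicators and all token names are assumptions for illustration, not the repository's actual API.

```typescript
// Hypothetical sketch of the mode-neutral IndentConfig fields named in this PR.
// The real interface has more fields; types and defaults here are assumptions.
interface IndentConfigSketch {
  // Token types after which a flow-context key/value separator is emitted (empty = carve-out off).
  flowSeparatorAfterTokens: string[];
  // Token types that take part in plain-scalar continuation folding (empty = folding off).
  foldTokens: string[];
  // Separator glyph shared by the lexer's key-line sniffs and the highlighter.
  keyValueSeparator: string;
  // Compact indicators such as YAML's "-" and "?" (assumed to be plain strings).
  compactIndicators: string[];
}

// Build fresh defaults each time so callers never share (and mutate) the same arrays.
function neutralDefaults(): IndentConfigSketch {
  return {
    flowSeparatorAfterTokens: [],
    foldTokens: [],
    keyValueSeparator: ":", // assumed default
    compactIndicators: [],
  };
}

// Every YAML-derived behavior stays off unless a grammar names it explicitly.
function resolveIndentConfig(partial: Partial<IndentConfigSketch>): IndentConfigSketch {
  return { ...neutralDefaults(), ...partial };
}

// A YAML-like grammar opts in field by field (token names are made up).
const yamlLike = resolveIndentConfig({
  flowSeparatorAfterTokens: ["string.quoted.double", "string.quoted.single"],
  foldTokens: ["string.unquoted", "string.unquoted.continuation"],
  compactIndicators: ["-", "?"],
});

// A non-YAML grammar sets only what it needs and inherits no YAML behavior.
const iniLike = resolveIndentConfig({ keyValueSeparator: "=" });
```

The point of the design is that an empty list is the neutral value, so a grammar that never mentions a field gets none of the YAML behavior behind it.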

What changed

  • flowSeparatorAfterTokens: string[] (replaces flowColonSeparator: boolean) gives explicit, per-token-type membership in the flow-context ':' key/value separator carve-out, instead of deriving it from the string flag. A grammar can now flag a token string: true (for string-region scoping / auto-close delimiter derivation) without being dragged into separator emission. Flow-close delimiters are always part of the carve-out, but the carve-out as a whole stays off unless at least one token is named.
  • foldTokens: string[] gives explicit membership in plain-scalar continuation folding, instead of deriving it from blockPattern. A token can carry a blockPattern without inheriting YAML's plain-scalar fold. Folding is off unless tokens are declared.
  • keyValueSeparator is now the single source of truth for the separator glyph in both the lexer's key-line sniffs and the highlighter. The two previously disagreed (the lexer hardcoded ':' while gen-tm read the field), a latent parser↔highlighter split. Compact-indicator sniffs likewise route through compactIndicators.

Breaking change

flowColonSeparator (added in #41) is removed. The carve-out is now off by default, so a grammar that set flowColonSeparator: false simply drops the field, while a grammar that relied on the YAML carve-out must name its quoted-key tokens in flowSeparatorAfterTokens. A migration for the one known adopter (NMBL) accompanies this change.
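A migration along these lines could look as follows. This is a hypothetical helper, not the NMBL migration itself; it assumes that an unset flowColonSeparator previously meant the carve-out was on, and that token definitions expose a name and an optional string flag.

```typescript
// Hypothetical shapes; only the flowColonSeparator / flowSeparatorAfterTokens
// field names come from the PR.
interface TokenDefSketch {
  name: string;
  string?: boolean; // region-scoping flag that used to imply carve-out membership
}

interface LegacyIndentSketch {
  flowColonSeparator?: boolean; // field removed by this PR (added in #41)
}

function migrateFlowColonSeparator(
  legacy: LegacyIndentSketch,
  tokens: TokenDefSketch[],
): { flowSeparatorAfterTokens: string[] } {
  // `flowColonSeparator: false` asked for no carve-out, which is now the default.
  if (legacy.flowColonSeparator === false) {
    return { flowSeparatorAfterTokens: [] };
  }
  // Otherwise (assumed: unset meant "on") the old core enlisted every
  // `string: true` token implicitly. Listing them reproduces that behavior;
  // a grammar can then trim the list down to its quoted-key tokens.
  return {
    flowSeparatorAfterTokens: tokens.filter((t) => t.string === true).map((t) => t.name),
  };
}
```

Listing every former string token first keeps output unchanged; narrowing the list afterwards is a separate, deliberate step.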

Proof

  • npm run gen produces zero generated-file diff — yaml + ts/js/jsx/tsx/html highlighters/parsers are byte-identical.
  • All gates are green except tm-diagnostics, which fails only for an environmental reason: it requires the vendor/RedCMD submodule and throws before tokenizing. core/indent-extensions is extended with toy non-YAML grammars that assert on the actual token stream: a string: true token keeps its :name after a value (no auto-enlist), a blockPattern token does not fold, and a non-':' keyValueSeparator is honored by the lexer.

Deferred

The §6.1 tab-in-indentation errors and the value/item-position classification still hardcode a few YAML indicators (&/!/[/{/*, sequence -, explicit-key ?). A clean split needs the indicator set parameterized, which #44 notes as the larger sub-task.

A non-YAML indentation grammar inherited three YAML behaviors derived
from flags/literals that mean something else, with no opt-out short of
mis-declaring the grammar. Detach each onto its own explicit, mode-neutral
IndentConfig field that defaults OFF; yaml.ts opts into each.

(A) Flow `:` key/value separator carve-out was derived from the `string`
    flag (`stringTokenNames`), silently enlisting every string-region token.
    New `flowSeparatorAfterTokens: string[]` names the membership explicitly
    (carve-out OFF when empty); `string: true` keeps its region-scoping /
    auto-close-derivation jobs without dragging a token into separator
    emission. PR #41's wholesale `flowColonSeparator` boolean is removed:
    an empty list provides the same neutral-off without re-overloading.
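To make the membership rule concrete, the sketch below decides whether a separator inside a flow collection counts as structural. It is a simplified, hypothetical model: the function name, the PrevToken shape, treating ']' and '}' as the flow-close delimiters, and the "followed by whitespace" baseline rule are assumptions, and real YAML flow parsing has more cases. Note that it never consults a string flag; only flowSeparatorAfterTokens decides membership.

```typescript
// Hypothetical, simplified model of the flow-context separator carve-out.
interface FlowSeparatorConfigSketch {
  flowSeparatorAfterTokens: string[]; // empty = carve-out off
  keyValueSeparator: string;          // e.g. ":" for YAML
}

// Minimal record of the token emitted just before the candidate separator.
interface PrevToken {
  type: string;
  text: string;
}

// Assumed flow-close delimiters (YAML's "]" and "}").
const FLOW_CLOSERS = new Set(["]", "}"]);

function isFlowSeparatorAt(
  line: string,
  pos: number,
  prev: PrevToken | undefined,
  cfg: FlowSeparatorConfigSketch,
): boolean {
  const sep = cfg.keyValueSeparator;
  if (sep === "" || !line.startsWith(sep, pos)) return false;
  const after = line.charAt(pos + sep.length);
  // Baseline rule: a separator followed by whitespace or end of line.
  if (after === "" || after === " " || after === "\t") return true;
  // Carve-out (separator glued to the next char) is off unless a token is named.
  if (cfg.flowSeparatorAfterTokens.length === 0 || prev === undefined) return false;
  // Once on, flow-close delimiters always qualify.
  if (FLOW_CLOSERS.has(prev.text)) return true;
  // Membership is by token type only; a `string: true` flag plays no part.
  return cfg.flowSeparatorAfterTokens.includes(prev.type);
}
```

For example, in `{"a":b}` the ':' at index 4 counts as a separator only when the quoted-key token type is listed.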

(B) Plain-scalar continuation folding was derived from `blockPattern`,
    giving YAML folding to any block-pattern token. New `foldTokens:
    string[]` names the fold participants explicitly (folding OFF when
    empty); the last-named token is the catch-all continuation type. A
    grammar can now carry a `blockPattern` token without inheriting the fold.
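A similarly simplified sketch of explicit fold membership follows. The function name, the indentation test, and the token names are assumptions; the only behavior taken from this PR is that folding is off when foldTokens is empty, that only listed tokens fold, and that the last-named token is used as the catch-all continuation type. A token with a blockPattern is simply absent from the list and therefore never folds.

```typescript
// Hypothetical, simplified model of explicit plain-scalar fold membership.
interface FoldConfigSketch {
  foldTokens: string[]; // empty = folding off; last entry = catch-all continuation type
}

// Returns the token type for a continuation line, or undefined when no fold applies.
function continuationType(
  prevLastTokenType: string | undefined,
  ownerIndent: number,
  lineIndent: number,
  cfg: FoldConfigSketch,
): string | undefined {
  if (cfg.foldTokens.length === 0) return undefined; // folding off
  if (prevLastTokenType === undefined) return undefined;
  // Only explicitly named tokens fold; a blockPattern alone is not enough.
  if (!cfg.foldTokens.includes(prevLastTokenType)) return undefined;
  // Assumed rule: a continuation must be indented deeper than the owning line.
  if (lineIndent <= ownerIndent) return undefined;
  // The last-named token is the catch-all continuation type.
  return cfg.foldTokens[cfg.foldTokens.length - 1];
}
```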

(C) `keyValueSeparator` was honored by gen-tm but the lexer hardcoded `:`
    (and `-`/`?`) in its key-line sniffs, a latent parser/highlighter split.
    Route every lexer key-separator sniff through `indent.keyValueSeparator`
    (via a shared `keySepAt` helper) and every compact-indicator sniff
    through `compactIndicators`, so the lexer and gen-tm share one source of
    truth for the separator, whatever glyph a grammar chooses.
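The sketch below shows what routing key-line sniffs through a configurable separator might look like. keySepAt is named in this PR, but its real signature and body are not shown, so this version (taking the line, a position, and the separator) is a hypothetical reconstruction; stripCompactIndicators and isKeyLine are illustrative helpers that ignore quoting, comments and flow context.

```typescript
// Hypothetical reconstruction of a shared key-separator check.
// True when `sep` starts at `pos` and is followed by whitespace or end of line.
function keySepAt(line: string, pos: number, sep: string): boolean {
  if (sep.length === 0 || !line.startsWith(sep, pos)) return false;
  const next = line.charAt(pos + sep.length);
  return next === "" || next === " " || next === "\t";
}

// Strip leading compact indicators (e.g. "- ", "? ") that are followed by a space.
function stripCompactIndicators(line: string, indicators: string[]): string {
  let rest = line.trimStart();
  let changed = true;
  while (changed) {
    changed = false;
    for (const ind of indicators) {
      if (ind.length === 0) continue; // guard against an infinite loop
      if (rest.startsWith(ind) && (rest.length === ind.length || rest[ind.length] === " ")) {
        rest = rest.slice(ind.length).trimStart();
        changed = true;
      }
    }
  }
  return rest;
}

// A key line has a structural separator after a non-empty key.
function isKeyLine(line: string, sep: string, indicators: string[]): boolean {
  const body = stripCompactIndicators(line, indicators);
  for (let i = 1; i < body.length; i++) {
    if (keySepAt(body, i, sep)) return true;
  }
  return false;
}
```

Because both the separator and the indicator list are parameters, a grammar with `keyValueSeparator: '='` gets the same sniffing logic with no YAML literals involved.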

Deferred: (D) the §6.1 tab-in-indentation errors and the value/item-position
classification (seq-item `-` vs explicit-key `?`) still hardcode a few YAML
indicators; cleanly splitting them needs `startsBlockStructuralNode`'s
property/flow/alias indicator set parameterized — a larger sub-task, noted
in-code at each site.
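For the deferred item, one possible shape for a parameterized indicator set is sketched below. Everything here is hypothetical (the interface, the function name, and the grouping into properties, flow openers and aliases follow the wording of (D) but not any existing code); it only illustrates how the hardcoded YAML indicators could become data that a non-YAML grammar leaves empty.

```typescript
// Hypothetical parameterized indicator set; not existing code.
// Indicators are assumed to be single characters.
interface StructuralIndicatorsSketch {
  properties: string[];  // YAML: "&" (anchor), "!" (tag)
  flowOpeners: string[]; // YAML: "[", "{"
  aliases: string[];     // YAML: "*"
}

const yamlIndicators: StructuralIndicatorsSketch = {
  properties: ["&", "!"],
  flowOpeners: ["[", "{"],
  aliases: ["*"],
};

// A non-YAML grammar can opt out entirely.
const noIndicators: StructuralIndicatorsSketch = { properties: [], flowOpeners: [], aliases: [] };

// True when the first non-blank character is one of the configured indicators.
function startsWithStructuralIndicator(text: string, ind: StructuralIndicatorsSketch): boolean {
  const first = text.trimStart().charAt(0);
  if (first === "") return false;
  return (
    ind.properties.includes(first) ||
    ind.flowOpeners.includes(first) ||
    ind.aliases.includes(first)
  );
}
```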

yaml.ts opts in field-by-field (flowSeparatorAfterTokens + foldTokens) and
tokenizes byte-identically: `npm run gen` produces zero generated-file diff
across yaml + ts/js/jsx/tsx/html. test/indent-extensions.ts gains toy
non-YAML grammars proving each un-overload (a `string:true` token that keeps
its `:name` after values; a `blockPattern` token that does not fold; a
`keyValueSeparator:'='` grammar whose lexer treats `=` as structural).

johnsoncodehk force-pushed the issue-44-indent-decouple branch from 79711e8 to 489343e on June 19, 2026 12:34
johnsoncodehk merged commit c17c521 into master on Jun 19, 2026
2 checks passed

Linked issue: Decouple the generic indent core from the YAML profile