feat: add parse() (list-of-successes) to the grammar combinators (#467 step 2) by SJrX · Pull Request #469 · SJrX/systemdUnitFilePlugin

SJrX · 2026-06-21T16:03:41Z

What

Grows a second matching method, Combinator.parse(), directly on the existing combinators — alongside SyntacticMatch/SemanticMatch, not replacing them. The caller (GrammarOptionValue) is untouched and still uses the old methods, and none of the 225 grammar definitions change. This lets us validate the new engine against the real production grammars before deciding on any migration.

Stacked on #468 (issue-345). This PR's diff is the engine method + result types + tests.

(Supersedes the earlier standalone grammar2 PoC, which this PR removes: a parallel package had a jarringly different surface and implied rewriting every grammar. Same idea, grown in place instead.)

The one idea

fun parse(value: String, offset: Int): Sequence<Parse>   // every way it can match, lazily

instead of today's single greedy first-match. Seq threads each possibility of one part into the next; a value is valid if any path consumes the whole input. So Seq(ZeroOrMore("a"), "a") on "aa" matches — built from the same SequenceCombinator/ZeroOrMore/LiteralChoiceTerminal classes whose SemanticMatch still fails it (the test asserts both). Alt offers all options, so option ordering no longer affects correctness; ZeroOrMore/OneOrMore/Repeat offer every count.
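To make the "every way it can match" idea concrete, here is a minimal sketch. It is not the plugin's code: `Matcher`, `lit`, `seq`, `zeroOrMore`, `greedyZeroOrMore` and `accepts` are hypothetical stand-ins, and a parse is reduced to just its end offset.

```kotlin
// Hypothetical miniature, not the plugin's Combinator classes.
// A parse is reduced to its end offset; a matcher lazily yields every end offset it can reach.
fun interface Matcher {
    fun parse(value: String, offset: Int): Sequence<Int>
}

// Literal: at most one way to match.
fun lit(text: String) = Matcher { value, offset ->
    if (value.startsWith(text, offset)) sequenceOf(offset + text.length) else emptySequence<Int>()
}

// Seq: thread every end offset of one part into the next part.
fun seq(vararg parts: Matcher) = Matcher { value, offset ->
    parts.fold(sequenceOf(offset)) { ends, part -> ends.flatMap { part.parse(value, it) } }
}

// ZeroOrMore: offer every repetition count (0, 1, 2, ...), lazily.
fun zeroOrMore(inner: Matcher) = Matcher { value, offset ->
    fun from(start: Int): Sequence<Int> = sequence {
        yield(start)
        for (next in inner.parse(value, start)) {
            // Require progress so an empty inner match cannot loop forever.
            if (next > start) yieldAll(from(next))
        }
    }
    from(offset)
}

// Stand-in for a greedy first-match repetition: commits to the longest run only.
fun greedyZeroOrMore(inner: Matcher) = Matcher { value, offset ->
    var end = offset
    while (true) {
        end = inner.parse(value, end).firstOrNull { it > end } ?: break
    }
    sequenceOf(end)
}

// A value is accepted if any path consumes the whole input.
fun accepts(grammar: Matcher, value: String): Boolean =
    grammar.parse(value, 0).any { it == value.length }
```

With these definitions, `accepts(seq(zeroOrMore(lit("a")), lit("a")), "aa")` is true, because the repetition also offers the "one a" path and leaves the second "a" for the literal. The same grammar built with `greedyZeroOrMore` is rejected, since the repetition swallows both characters and never reconsiders.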

One pass instead of two

Parse.kt adds ParsedToken/Parse and a validate() free function. Each leaf carries a valid flag (the strict/"semantic" check), so one lenient parse answers both:

- syntactic ("could be this") = did any path consume the whole value?
- semantic ("actually valid") = did any such path use only valid tokens?

No second traversal, and no change to the authoring DSL.
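A rough sketch of how one lenient pass can answer both questions. The real `ParsedToken`/`Parse` fields and the return type of `validate()` are not shown in this description, so the shapes below (including the `Verdict` enum and the `word` leaf) are assumptions made for illustration only.

```kotlin
// Assumed shapes; the PR's actual ParsedToken/Parse/validate() may look different.
data class ParsedToken(val text: String, val valid: Boolean)
data class Parse(val end: Int, val tokens: List<ParsedToken>)
enum class Verdict { VALID, SEMANTIC_ERROR, SYNTAX_ERROR }   // hypothetical result type

fun interface Matcher {
    fun parse(value: String, offset: Int): Sequence<Parse>
}

// Lenient leaf: accepts any identifier-like run, but flags it valid only if it is a known name.
fun word(known: Set<String>) = Matcher { value, offset ->
    var end = offset
    while (end < value.length && (value[end].isLetterOrDigit() || value[end] == '_')) end++
    if (end == offset) emptySequence<Parse>()
    else {
        val text = value.substring(offset, end)
        sequenceOf(Parse(end, listOf(ParsedToken(text, text in known))))
    }
}

// Literal separator: always valid when it matches.
fun lit(text: String) = Matcher { value, offset ->
    if (value.startsWith(text, offset))
        sequenceOf(Parse(offset + text.length, listOf(ParsedToken(text, true))))
    else emptySequence<Parse>()
}

// Seq: thread each partial path into the next part, accumulating tokens.
fun seq(vararg parts: Matcher) = Matcher { value, offset ->
    parts.fold(sequenceOf(Parse(offset, emptyList()))) { paths, part ->
        paths.flatMap { p -> part.parse(value, p.end).map { Parse(it.end, p.tokens + it.tokens) } }
    }
}

// One traversal: a full parse with only valid tokens wins; a full parse with an
// invalid token is a semantic error; no full parse at all is a syntax error.
fun validate(grammar: Matcher, value: String): Verdict {
    var sawFullParse = false
    for (p in grammar.parse(value, 0)) {
        if (p.end != value.length) continue
        if (p.tokens.all { it.valid }) return Verdict.VALID
        sawFullParse = true
    }
    return if (sawFullParse) Verdict.SEMANTIC_ERROR else Verdict.SYNTAX_ERROR
}
```

For a two-word grammar `seq(word(known), lit(" "), word(known))` with `known = setOf("AF_INET", "AF_INET6")`, the value "AF_INET AF_INET6" comes back `VALID`, "AF_INET AF_BOGUS" comes back `SEMANTIC_ERROR` (it parses, but one token is flagged), and "AF_INET, AF_INET6" comes back `SYNTAX_ERROR` (no path reaches the end).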

Tests (`ParseTest`, plain JUnit)

- runs validate() against the actual ConfigParseAddressFamiliesOptionValue grammar (valid + invalid cases from the canary);
- runs it against the real IPV6_ADDR combinator (15+ hand-ordered alternatives + IPv4 suffix); the old engine needed that ordering to dodge greedy traps, whereas parse() explores all forms;
- an integer-range grammar (config_parse_ip_port shape);
- the greedy Seq(ZeroOrMore("a"), "a") case, asserting the old SemanticMatch returns -1 while parse() accepts it.
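The ordering point in the IPV6_ADDR bullet can be illustrated with a much smaller grammar. This is only a sketch: `alt`, `firstMatchAlt` and `eof` are hypothetical stand-ins (the latter two loosely modelled on a first-match alternative and on EOF()), not the plugin's classes.

```kotlin
// Hypothetical miniature; a parse is reduced to its end offset.
fun interface Matcher {
    fun parse(value: String, offset: Int): Sequence<Int>
}

fun lit(text: String) = Matcher { value, offset ->
    if (value.startsWith(text, offset)) sequenceOf(offset + text.length) else emptySequence<Int>()
}

fun seq(vararg parts: Matcher) = Matcher { value, offset ->
    parts.fold(sequenceOf(offset)) { ends, part -> ends.flatMap { part.parse(value, it) } }
}

// Succeeds only at the end of the input.
fun eof() = Matcher { value, offset ->
    if (offset == value.length) sequenceOf(offset) else emptySequence<Int>()
}

// List-of-successes Alt: offers the matches of every option.
fun alt(vararg options: Matcher) = Matcher { value, offset ->
    options.asSequence().flatMap { it.parse(value, offset) }
}

// Stand-in for a first-match Alt: commits to the first option that matches.
fun firstMatchAlt(vararg options: Matcher) = Matcher { value, offset ->
    options.asSequence().flatMap { it.parse(value, offset) }.take(1)
}

fun accepts(grammar: Matcher, value: String): Boolean =
    grammar.parse(value, 0).any { it == value.length }
```

With the options listed as `lit("AF_INET"), lit("AF_INET6")`, `seq(firstMatchAlt(...), eof())` rejects "AF_INET6" because it commits to the shorter prefix, and only accepts it once the options are reordered longest-first. `seq(alt(...), eof())` accepts it in either order.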

The existing GrammarTest (old engine) is untouched and still passes; the full suite is green.

Known limitations (next layers, deliberately out of scope)

- Error localization is best-effort. SyntaxError.furthest collapses to 0 when a trailing EOF() discards a partial path (e.g. for the input "AF_INET, AF_INET6" the useful value would be 7, the offset of the comma). Pinned in a test with a comment. Precise localization needs the frontier/expected-set layer, the same machinery as completion (Grammar Based Completion #343), which is the next step.
- No completion yet, no role-labeling/coloring yet (those build on this).
- No state merging: like any backtracking matcher, pathological grammars could blow up. That is fine for short option values, but a path cap + a neutral cancellation hook are planned before wiring parse() into IntelliJ (EDT safety).
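One possible shape for the planned cap and cancellation hook, sketched under several assumptions: `Budget`, `ParseAborted` and the extra `budget` parameter are invented for illustration, and the cap counts leaf attempts rather than whole paths, which is a simplification. The hook is a plain `() -> Boolean`, so nothing here depends on IntelliJ types.

```kotlin
// Hypothetical sketch; none of these names exist in the PR.
class ParseAborted(reason: String) : RuntimeException(reason)

// Shared counter threaded through every matcher; the hook is a plain lambda.
class Budget(private val maxSteps: Int, private val isCancelled: () -> Boolean = { false }) {
    private var steps = 0
    fun tick() {
        if (isCancelled()) throw ParseAborted("cancelled")
        if (++steps > maxSteps) throw ParseAborted("step cap of $maxSteps exceeded")
    }
}

fun interface Matcher {
    fun parse(value: String, offset: Int, budget: Budget): Sequence<Int>
}

// Every leaf attempt spends one unit of budget.
fun lit(text: String) = Matcher { value, offset, budget ->
    budget.tick()
    if (value.startsWith(text, offset)) sequenceOf(offset + text.length) else emptySequence<Int>()
}

fun seq(vararg parts: Matcher) = Matcher { value, offset, budget ->
    parts.fold(sequenceOf(offset)) { ends, part -> ends.flatMap { part.parse(value, it, budget) } }
}

fun zeroOrMore(inner: Matcher) = Matcher { value, offset, budget ->
    fun from(start: Int): Sequence<Int> = sequence {
        yield(start)
        for (next in inner.parse(value, start, budget)) {
            if (next > start) yieldAll(from(next))   // require progress
        }
    }
    from(offset)
}

fun accepts(grammar: Matcher, value: String, budget: Budget): Boolean =
    grammar.parse(value, 0, budget).any { it == value.length }
```

A nested repetition such as `seq(zeroOrMore(zeroOrMore(lit("a"))), lit("b"))` on a run of 25 "a" characters explores exponentially many paths without state merging; with `Budget(10_000)` the sketch throws `ParseAborted` instead of running to completion, while ordinary short values stay far below the cap.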

Refs #467 #345 #343 #342

🤖 Generated with Claude Code

Parallel, self-contained proof of concept for a new option-value grammar engine in a `grammar2` package. It does not touch the existing `grammar` package; it sits beside it so we can play with the approach and validate it against existing behaviour first.

Core idea: every matcher returns ALL the ways it can match, lazily (`parse(input, offset): Sequence<Parse>`, Wadler's "list of successes"), instead of one greedy first match. Seq threads each possibility into the next and a value is valid if any path consumes the whole input, so Seq(ZeroOrMore("a"), "a") now matches "aa" (the case the current engine's own docs warn it fails).

Two more ideas come along for free:
- a labeled parse tree (Branch with a Role), so a span like an IPv4 address is one labeled unit rather than per-terminal colors;
- per-leaf validity flags, collapsing the old syntactic/semantic two passes into one lenient parse (a token can match yet be flagged invalid).

Capabilities (validate, and the seed of coloring) are free functions over the parse result, with no per-combinator code. No IntelliJ types in the engine; it speaks plain Int offsets so the IntelliJ layer can adapt later.

Tests reproduce the RestrictAddressFamilies= canary cases, show well-formed-but-unknown vs malformed error kinds, expose leaf roles for coloring, and prove the greedy completeness win.

Refs #467 #345 #343 #342

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

github-actions · 2026-06-21T16:17:53Z

Test Results

1 117 tests  +5    1 117 ✅ +5    49s ⏱️ -1s
  295 suites +1        0 💤 ±0
  295 files  +1        0 ❌ ±0

Results for commit c3b8f6d. ± Comparison against base commit 8b48357.

♻️ This comment has been updated with latest results.

…ammar2

Replaces the standalone grammar2 PoC with the same list-of-successes idea grown directly on the existing combinators, per review feedback: the parallel package had a jarringly different surface and implied a rewrite of all 225 grammars.

Instead, add one new method, Combinator.parse(value, offset): Sequence<Parse>, ALONGSIDE the existing SyntacticMatch/SemanticMatch, implemented on each of the 12 combinators next to its existing match logic. The caller (GrammarOptionValue) is untouched and still uses the old methods; nothing in the 225 grammar definitions changes. parse() can therefore be validated against the REAL production grammars before any migration decision.

- Parse.kt: ParsedToken / Parse result types + validate() free function. One lenient pass folds the strict "semantic" check into a per-token `valid` flag, so it answers both syntactic (any full parse?) and semantic (a full parse with only valid tokens?) without two traversals.
- Each combinator returns every way it can match, lazily; Alt offers all options (ordering no longer matters), ZeroOrMore/OneOrMore/Repeat offer every count.
- ParseTest runs validate() against the actual ConfigParseAddressFamiliesOptionValue grammar and the real IPV6_ADDR combinator (15+ hand-ordered alternatives), an integer-range grammar, and the greedy Seq(ZeroOrMore("a"),"a") case, which it shows the old SemanticMatch still fails while parse() succeeds.

Known limitation pinned in a test: SyntaxError `furthest` is best-effort and collapses to 0 when a trailing EOF() discards partial progress; precise localization needs the frontier/expected-set layer (the same machinery as completion, #343), deliberately out of scope here.

Refs #467 #345 #343 #342

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

SJrX changed the title ~~feat: grammar engine PoC — list-of-successes matcher (#467 step 2)~~ feat: add parse() (list-of-successes) to the grammar combinators (#467 step 2) Jun 21, 2026

SJrX mentioned this pull request Jun 21, 2026

feat: frontier/expected-set for error localization (#467 step 3) #470 (Open)

SJrX wants to merge 2 commits into issue-345 from issue-345-2
