feat: add parse() (list-of-successes) to the grammar combinators (#467 step 2) #469
Open
SJrX wants to merge 2 commits into
Parallel, self-contained proof of concept for a new option-value grammar engine
in a `grammar2` package. It does not touch the existing `grammar` package; it
sits beside it so we can play with the approach and validate it against existing
behaviour first.
Core idea: every matcher returns ALL the ways it can match, lazily
(`parse(input, offset): Sequence<Parse>` — Wadler's "list of successes"), instead
of one greedy first match. Seq threads each possibility into the next and a value
is valid if any path consumes the whole input, so Seq(ZeroOrMore("a"), "a") now
matches "aa" (the case the current engine's own docs warn it fails).
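Here is a minimal Kotlin sketch of the core idea. It assumes a stripped-down matcher interface: `Matcher`, `Lit`, `Seq`, `ZeroOrMore` and `fullMatch` are illustrative stand-ins, not the package's real classes. Each matcher yields every end offset it can reach. `Seq` threads each of those ends into the next part, and a value passes if any path ends exactly at the input length.

```kotlin
// Illustrative stand-ins only; not the real grammar package classes.
interface Matcher {
    // Every end offset at which this matcher can stop, produced lazily.
    fun parse(input: String, offset: Int): Sequence<Int>
}

class Lit(private val text: String) : Matcher {
    override fun parse(input: String, offset: Int): Sequence<Int> =
        if (input.startsWith(text, offset)) sequenceOf(offset + text.length) else emptySequence()
}

class Seq(private vararg val parts: Matcher) : Matcher {
    override fun parse(input: String, offset: Int): Sequence<Int> =
        parts.fold(sequenceOf(offset)) { ends, part ->
            // Thread every possible end of the earlier parts into the next part.
            ends.flatMap { end -> part.parse(input, end) }
        }
}

class ZeroOrMore(private val inner: Matcher) : Matcher {
    override fun parse(input: String, offset: Int): Sequence<Int> = sequence {
        yield(offset) // zero repetitions: fewest first
        for (end in inner.parse(input, offset)) {
            // Only recurse on progress, so an empty inner match cannot loop forever.
            if (end > offset) yieldAll(parse(input, end))
        }
    }
}

// A value is valid if ANY path consumes the whole input.
fun fullMatch(m: Matcher, input: String): Boolean =
    m.parse(input, 0).any { it == input.length }
```

With these definitions, `fullMatch(Seq(ZeroOrMore(Lit("a")), Lit("a")), "aa")` is true. The path on which `ZeroOrMore` stops after one `a` leaves the second `a` for the literal. A greedy first-match would consume both and fail.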
Two more ideas come along for free:
- a labeled parse tree (Branch with a Role), so a span like an IPv4 address is one
labeled unit rather than per-terminal colors;
- per-leaf validity flags, collapsing the old syntactic/semantic two passes into
one lenient parse (a token can match yet be flagged invalid).
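Below is one possible shape for the labeled tree and the per-leaf flags, sketched under assumptions. `Role`, `Node`, `Leaf`, `Branch`, `lenientWord` and `leaves` are hypothetical names. The lenient terminal accepts any well-formed word and records in `valid` whether that word is known. As a result, an unknown family and a malformed token come out as different outcomes.

```kotlin
// Hypothetical types; the PR's real parse-tree classes may differ.
enum class Role { FAMILY, IPV4, OCTET, SEPARATOR }

sealed class Node {
    abstract val start: Int
    abstract val end: Int
}

// A terminal match. `valid` is false when the token is well-formed but not allowed.
data class Leaf(
    override val start: Int,
    override val end: Int,
    val role: Role,
    val valid: Boolean,
) : Node()

// A labeled span (e.g. a whole IPv4 address) grouping its children.
// Assumes at least one child.
data class Branch(val role: Role, val children: List<Node>) : Node() {
    override val start: Int get() = children.first().start
    override val end: Int get() = children.last().end
}

// Lenient terminal: any run of [A-Z0-9_] is accepted; membership in `known`
// only sets the validity flag instead of rejecting the match.
fun lenientWord(input: String, offset: Int, known: Set<String>): Leaf? {
    var end = offset
    while (end < input.length &&
        (input[end].isUpperCase() || input[end].isDigit() || input[end] == '_')
    ) end++
    if (end == offset) return null // malformed: nothing word-like at this offset
    return Leaf(offset, end, Role.FAMILY, input.substring(offset, end) in known)
}

// Flatten a tree into its leaves, e.g. for per-token coloring.
fun leaves(node: Node): List<Leaf> = when (node) {
    is Leaf -> listOf(node)
    is Branch -> node.children.flatMap(::leaves)
}
```

`lenientWord("AF_FOO", 0, setOf("AF_INET"))` returns a leaf with `valid = false` (well-formed but unknown), while `"%%"` returns `null` (malformed). A `Branch` with role `IPV4` spans its first to last child, so a colorer can treat the whole address as one unit or walk `leaves()` for per-token roles.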
Capabilities (validate, and the seed of coloring) are free functions over the
parse result — no per-combinator code. No IntelliJ types in the engine; it speaks
plain Int offsets so the IntelliJ layer can adapt later.
Tests reproduce the RestrictAddressFamilies= canary cases, show well-formed-but-
unknown vs malformed error kinds, expose leaf roles for coloring, and prove the
greedy completeness win.
Refs #467 #345 #343 #342
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
…ammar2
Replaces the standalone grammar2 PoC with the same list-of-successes idea grown
directly on the existing combinators, per review feedback: the parallel package
had a jarringly different surface and implied a rewrite of all 225 grammars.
Instead, add one new method — Combinator.parse(value, offset): Sequence<Parse> —
ALONGSIDE the existing SyntacticMatch/SemanticMatch, implemented on each of the 12
combinators next to its existing match logic. The caller (GrammarOptionValue) is
untouched and still uses the old methods; nothing in the 225 grammar definitions
changes. parse() can therefore be validated against the REAL production grammars
before any migration decision.
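The following sketch shows what "one new method alongside the existing ones" can look like, assuming a simplified interface. `match` is a hypothetical stand-in for the greedy `SemanticMatch`: it returns one end offset, or -1. `Literal`, `Alt` and `Concat` are illustrative, not the real `LiteralChoiceTerminal`/`SequenceCombinator` code. Both methods live on the same class, so a test can compare them on the same grammar.

```kotlin
// Simplified sketch; `match` stands in for the existing greedy SemanticMatch.
interface Combinator {
    fun match(value: String, offset: Int): Int             // old: one end offset or -1
    fun parse(value: String, offset: Int): Sequence<Int>   // new: every end offset
}

class Literal(private val text: String) : Combinator {
    override fun match(value: String, offset: Int): Int =
        if (value.startsWith(text, offset)) offset + text.length else -1

    override fun parse(value: String, offset: Int): Sequence<Int> =
        if (value.startsWith(text, offset)) sequenceOf(offset + text.length) else emptySequence()
}

class Alt(private vararg val options: Combinator) : Combinator {
    // Old behaviour: commit to the first option that matches.
    override fun match(value: String, offset: Int): Int =
        options.asSequence().map { it.match(value, offset) }.firstOrNull { it >= 0 } ?: -1

    // New behaviour: offer every option's ends, so ordering no longer matters.
    override fun parse(value: String, offset: Int): Sequence<Int> =
        options.asSequence().flatMap { it.parse(value, offset) }
}

class Concat(private vararg val parts: Combinator) : Combinator {
    override fun match(value: String, offset: Int): Int =
        parts.fold(offset) { pos, part -> if (pos < 0) -1 else part.match(value, pos) }

    override fun parse(value: String, offset: Int): Sequence<Int> =
        parts.fold(sequenceOf(offset)) { ends, part -> ends.flatMap { part.parse(value, it) } }
}
```

Consider `Concat(Alt(Literal("a"), Literal("ab")), Literal("c"))` on `"abc"`. `match` commits to `"a"` and returns -1, while `parse` also tries `"ab"` and yields end offset 3. This is the kind of ordering trap the hand-ordered alternatives in grammars such as `IPV6_ADDR` exist to avoid.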
- Parse.kt: ParsedToken / Parse result types + validate() free function. One
lenient pass folds the strict "semantic" check into a per-token `valid` flag, so
it answers both syntactic (any full parse?) and semantic (a full parse with only
valid tokens?) without two traversals.
- Each combinator returns every way it can match, lazily; Alt offers all options
(ordering no longer matters), ZeroOrMore/OneOrMore/Repeat offer every count.
- ParseTest runs validate() against the actual ConfigParseAddressFamiliesOptionValue
grammar and the real IPV6_ADDR combinator (15+ hand-ordered alternatives), an
integer-range grammar, and the greedy Seq(ZeroOrMore("a"),"a") case — which it
shows the old SemanticMatch still fails while parse() succeeds.
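Here is a sketch of a `validate()` free function over a lazy sequence of parses. The field names and the `Validation` result type are assumptions modelled on the description above, not the actual `Parse.kt`. It works in a single pass with three outcomes:

- a full parse with only valid tokens is a success;
- a full parse containing an invalid token is a semantic error;
- no full parse at all is a syntax error with a best-effort `furthest`.

```kotlin
// Hypothetical result types; the real Parse.kt may name and shape these differently.
data class ParsedToken(val start: Int, val end: Int, val valid: Boolean)
data class Parse(val end: Int, val tokens: List<ParsedToken>)

sealed class Validation {
    object Valid : Validation()
    // A full parse exists, but it contains a token that failed the strict check.
    data class SemanticError(val token: ParsedToken) : Validation()
    // No parse consumed the whole input; `furthest` is best-effort.
    data class SyntaxError(val furthest: Int) : Validation()
}

fun validate(value: String, parses: Sequence<Parse>): Validation {
    var firstInvalid: ParsedToken? = null
    var furthest = 0
    for (p in parses) {
        furthest = maxOf(furthest, p.end)
        if (p.end != value.length) continue // partial parse: only informs `furthest`
        // A full parse with every token valid answers both questions; stop early.
        val bad = p.tokens.firstOrNull { !it.valid } ?: return Validation.Valid
        if (firstInvalid == null) firstInvalid = bad
    }
    return firstInvalid?.let { Validation.SemanticError(it) } ?: Validation.SyntaxError(furthest)
}
```

Returning on the first fully valid parse means the lazy sequence is only forced as far as needed. In this sketch, `furthest` is simply the largest end among the parses the root produced, so it can only be as precise as the paths that survive.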
Known limitation pinned in a test: SyntaxError `furthest` is best-effort and
collapses to 0 when a trailing EOF() discards partial progress; precise
localization needs the frontier/expected-set layer (the same machinery as
completion, #343), deliberately out of scope here.
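A small illustration of why `furthest` can collapse to 0. It assumes `furthest` is computed as the largest end offset among the parses returned by the root combinator. The `Matcher`, `lit`, `eof`, `seq` and `furthest` names are hypothetical.

```kotlin
// Hypothetical minimal matchers, just enough to show the EOF effect.
fun interface Matcher {
    fun parse(s: String, at: Int): Sequence<Int>
}

fun lit(t: String) = Matcher { s, at ->
    if (s.startsWith(t, at)) sequenceOf(at + t.length) else emptySequence<Int>()
}

// Succeeds only at the end of input, so it discards every partial path.
val eof = Matcher { s, at -> if (at == s.length) sequenceOf(at) else emptySequence<Int>() }

fun seq(vararg ps: Matcher) = Matcher { s, at ->
    ps.fold(sequenceOf(at)) { ends, p -> ends.flatMap { p.parse(s, it) } }
}

// Best-effort error position: the largest end the root produced, or 0 if none survived.
fun furthest(root: Matcher, s: String): Int = root.parse(s, 0).maxOrNull() ?: 0
```

On `"AF_INET, AF_INET6"`, `furthest(lit("AF_INET"), input)` is 7, but `furthest(seq(lit("AF_INET"), eof), input)` is 0. The `eof` step drops the only path that reached offset 7, so the root has no evidence left. Recovering 7 would require recording the frontier reached during the search, which is the deferred frontier/expected-set layer.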
Refs #467 #345 #343 #342
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
What
Grows a second matching method, `Combinator.parse()`, directly on the existing combinators, alongside `SyntacticMatch`/`SemanticMatch` rather than replacing them. The caller (`GrammarOptionValue`) is untouched and still uses the old methods, and none of the 225 grammar definitions change. This lets us validate the new engine against the real production grammars before deciding on any migration.
The one idea
Every combinator returns every way it can match, lazily (`parse(value, offset): Sequence<Parse>`, Wadler's "list of successes"), instead of today's single greedy first-match.
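To make "every way it can match" concrete for a counted combinator, here is a sketch of a bounded repeat that offers every count from `min` to `max`, fewest first. `Matcher`, `digit`, `lit`, `repeatBetween`, `seq` and `matchesAll` are hypothetical names, not the real `Repeat` API.

```kotlin
// Hypothetical minimal matchers; not the real combinator classes.
fun interface Matcher {
    fun parse(s: String, at: Int): Sequence<Int>
}

val digit = Matcher { s, at ->
    if (at < s.length && s[at].isDigit()) sequenceOf(at + 1) else emptySequence<Int>()
}

fun lit(t: String) = Matcher { s, at ->
    if (s.startsWith(t, at)) sequenceOf(at + t.length) else emptySequence<Int>()
}

// Offers every repetition count in [min, max], fewest first, lazily.
fun repeatBetween(p: Matcher, min: Int, max: Int): Matcher = Matcher { s, at ->
    // count = repetitions matched so far; pos = where the next one would start.
    fun go(count: Int, pos: Int): Sequence<Int> = sequence {
        if (count >= min) yield(pos)
        if (count < max) {
            for (next in p.parse(s, pos)) yieldAll(go(count + 1, next))
        }
    }
    go(0, at)
}

fun seq(vararg ps: Matcher) = Matcher { s, at ->
    ps.fold(sequenceOf(at)) { ends, p -> ends.flatMap { p.parse(s, it) } }
}

fun matchesAll(p: Matcher, s: String): Boolean = p.parse(s, 0).any { it == s.length }
```

`seq(repeatBetween(digit, 1, 3), lit("5"))` accepts `"125"`. A greedy repeat would take all three digits and leave nothing for `"5"`, whereas here the two-digit path succeeds.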
`Seq` threads each possibility of one part into the next; a value is valid if any path consumes the whole input. So `Seq(ZeroOrMore("a"), "a")` matches `"aa"`, even though it is built from the same `SequenceCombinator`/`ZeroOrMore`/`LiteralChoiceTerminal` classes whose `SemanticMatch` still rejects it (the test asserts both). `Alt` offers all options, so option ordering no longer affects correctness; `ZeroOrMore`/`OneOrMore`/`Repeat` offer every count.
One pass instead of two
`Parse.kt` adds `ParsedToken`/`Parse` and a `validate()` free function. Each leaf carries a `valid` flag (the strict, "semantic" check), so one lenient parse answers both:
- syntactic: is there any full parse?
- semantic: is there a full parse whose tokens are all valid?
No second traversal, and no change to the authoring DSL.
Tests (`ParseTest`, plain JUnit)
- `validate()` against the actual `ConfigParseAddressFamiliesOptionValue` grammar (valid and invalid cases from the canary);
- the real `IPV6_ADDR` combinator (15+ hand-ordered alternatives plus an IPv4 suffix): the old engine needed that ordering to dodge greedy traps, while `parse()` explores all forms;
- an integer-range grammar (`config_parse_ip_port` shape);
- the greedy `Seq(ZeroOrMore("a"), "a")` case, asserting that the old `SemanticMatch` returns -1 while `parse()` accepts it.
The existing `GrammarTest` (old engine) is untouched and still passes; the full suite is green.
Known limitations (next layers, deliberately out of scope)
- `SyntaxError.furthest` collapses to 0 when a trailing `EOF()` discards a partial path (e.g. `AF_INET, AF_INET6` would want 7). This is pinned in a test with a comment. Precise localization needs the frontier/expected-set layer, the same machinery as completion (Grammar Based Completion #343), which is the next step.
- `parse()` into IntelliJ (EDT safety).
Refs #467 #345 #343 #342
🤖 Generated with Claude Code