Added grammar-level support for Python's a and L regular expression flags, closing the gap between Lark's supported regex suffix syntax and Python's re flags. Fixes #1527
Because Python is picky about (?L): it only works with bytes regexes. latin-1 is chosen because it maps code points 0-255 directly to byte values. With UTF-8, one character might become multiple bytes, and the regex would then mean something slightly different.
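A minimal sketch of both points, using only the standard `re` module. The helper name `accepts_locale_flag` and the sample pattern are illustrative assumptions, not code from this PR.

```python
import re

def accepts_locale_flag(pattern):
    """Return True if `re` compiles the pattern, False if it rejects it."""
    try:
        re.compile(pattern)
        return True
    except (re.error, ValueError):
        return False

# The inline locale flag is rejected on a str pattern but accepted on bytes.
str_ok = accepts_locale_flag("(?L)\\w+")
bytes_ok = accepts_locale_flag(b"(?L)\\w+")

# The same source pattern, encoded two ways.
source = "\u00e9+"                          # 'é' repeated one or more times
latin1_pattern = source.encode("latin-1")   # one byte for 'é', then '+'
utf8_pattern = source.encode("utf-8")       # two bytes for 'é', then '+'

# With latin-1 the '+' still applies to the whole character.
latin1_match = re.fullmatch(latin1_pattern, "\u00e9\u00e9".encode("latin-1"))

# With UTF-8 the '+' applies only to the second byte of 'é',
# so two consecutive 'é' characters no longer match.
utf8_match = re.fullmatch(utf8_pattern, "\u00e9\u00e9".encode("utf-8"))
```

Here `latin1_match` is a match object while `utf8_match` is `None`, which is the "slightly different meaning" in practice.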
And what is _strip_width_only_locale_flags for?
Flags like a, L, and u affect character-class semantics, such as \w, \b, and case handling, but they do not affect regex width. For width analysis, like checking whether a token has fixed/min/max length, those flags are irrelevant.
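As a rough illustration of why these flags are irrelevant to width, the sketch below computes the width of a pattern with and without a leading `(?a)` or `(?u)`. It relies on CPython's private regex parser (`re._parser`, or `sre_parse` on older versions), which is an implementation detail and an assumption here. The helper name `regexp_width` is hypothetical and is not Lark's `get_regexp_width`.

```python
try:
    import re._parser as sre_parse  # Python 3.11+
except ImportError:
    import sre_parse                # older Python versions

def regexp_width(pattern):
    """Return (min, max) match length as reported by CPython's regex parser."""
    return sre_parse.parse(pattern).getwidth()

plain = regexp_width(r"\w{2,5}")
ascii_flag = regexp_width(r"(?a)\w{2,5}")
unicode_flag = regexp_width(r"(?u)\w{2,5}")
```

All three calls report the same bounds, since the flags change which characters `\w` accepts, not how many characters it consumes.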
And what is test_token_flags_locale_bytes testing?
It verifies the bytes-specific locale case:
- the regex has the locale flag
- the input is bytes
- Python gets a bytes regex, not a string regex
- latin-1 doesn’t mess up the bytes
- the width-check helper doesn’t break actual matching
I think _strip_width_only_locale_flags() is the wrong approach.
_get_width() calls get_regexp_width(self.to_regexp()), so we can just generate the regexp without the ?L flag, instead of having to do a brittle search and replace over it.
i.e. something like get_regexp_width(self.to_regexp(locale_flags=False)), and get_regexp_width doesn't have to change.
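A hypothetical sketch of that suggestion follows. The class `SketchPatternRE`, the `WIDTH_NEUTRAL_FLAGS` set, and the assumption that flags are emitted as scoped groups are all illustrative; the real pattern class in Lark may be structured differently.

```python
from dataclasses import dataclass

# Hypothetical: flags that change character-class semantics but not width.
WIDTH_NEUTRAL_FLAGS = frozenset("aLu")

@dataclass(frozen=True)
class SketchPatternRE:
    value: str
    flags: tuple = ()

    def to_regexp(self, locale_flags=True):
        """Wrap the pattern in its flags, optionally skipping width-neutral ones."""
        regexp = self.value
        for flag in self.flags:
            if not locale_flags and flag in WIDTH_NEUTRAL_FLAGS:
                continue  # leave this flag out of the generated regexp
            regexp = "(?%s:%s)" % (flag, regexp)
        return regexp

pattern = SketchPatternRE(r"\w+", ("i", "L"))
full = pattern.to_regexp()                          # used for matching
for_width = pattern.to_regexp(locale_flags=False)   # used for width analysis
```

With this shape the width check receives a regexp that never contained the locale flag, so nothing has to be searched for and stripped afterwards.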
It would also be nice to see better error handling, and tests for that error handling, and possibly some edge cases.