Support Python regex a and L flags by bcmeireles · Pull Request #1589 · lark-parser/lark

bcmeireles · 2026-04-28T17:16:18Z

Added grammar-level support for Python's a and L regular expression flags, closing the gap between Lark's supported regex suffix syntax and Python's re flags. Fixes #1527

erezsh · 2026-05-01T14:09:06Z

Did you use an LLM to write this?

bcmeireles · 2026-05-01T14:51:56Z

@erezsh no

erezsh · 2026-05-01T15:00:52Z

So, can you explain a few parts?

Why use regexp.encode('latin-1') ?

And what is _strip_width_only_locale_flags for?

And what is test_token_flags_locale_bytes testing?

bcmeireles · 2026-05-07T15:37:03Z

Why use regexp.encode('latin-1') ?

Because Python only accepts (?L) in bytes patterns; compiling a str pattern with that flag raises an error.
latin-1 is chosen because it maps code points 0-255 directly to byte values. With UTF-8, one character might become multiple bytes, and the regex would then mean something slightly different.
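
To illustrate the point, here is a minimal sketch using only the standard `re` module (not Lark's code). The pattern and variable names are made up for the example. It shows that a str pattern with `(?L)` is rejected, that the latin-1 encoded bytes pattern compiles, and how UTF-8 can change what a character class means.

```python
import re

pattern = r"(?L)\w+"  # illustrative pattern, not taken from the PR

# A str pattern carrying the inline L flag is rejected at compile time.
try:
    re.compile(pattern)
    str_pattern_accepted = True
except (re.error, ValueError):
    str_pattern_accepted = False

# The same pattern encoded to bytes compiles.
compiled = re.compile(pattern.encode("latin-1"))

# latin-1 keeps one byte per code point in 0-255; UTF-8 may use several.
char = "\xe9"  # 'é', code point 233
latin1_bytes = char.encode("latin-1")  # b'\xe9'
utf8_bytes = char.encode("utf-8")      # b'\xc3\xa9'

# Inside a character class the UTF-8 form becomes a set of two separate
# bytes, so it matches a lone b'\xa9', which the latin-1 form does not.
utf8_class = re.compile(("[" + char + "]").encode("utf-8"))
latin1_class = re.compile(("[" + char + "]").encode("latin-1"))
utf8_matches_stray_byte = utf8_class.fullmatch(b"\xa9") is not None
latin1_matches_stray_byte = latin1_class.fullmatch(b"\xa9") is not None
```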

And what is _strip_width_only_locale_flags for?

Flags like a, L, and u affect character-class semantics, such as \w, \b, and case handling, but they do not affect regex width. For width analysis, like checking whether a token has fixed/min/max length, those flags are irrelevant.
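
A rough way to check this claim, assuming CPython's internal regex parser (`re._parser` on Python 3.11+, `sre_parse` before that) is an acceptable stand-in for the width analysis. The `width` helper is hypothetical and only exists for this example. The L flag is left out because the str parser rejects it.

```python
try:
    from re import _parser as sre_parser  # Python 3.11+
except ImportError:
    import sre_parse as sre_parser  # older Pythons

def width(regexp):
    """Return the (min, max) match width reported by CPython's parser."""
    lo, hi = sre_parser.parse(regexp).getwidth()
    return int(lo), int(hi)

# The same body with no flag, the a flag, and the u flag.
plain = width(r"\w{2,5}")
ascii_only = width(r"(?a)\w{2,5}")
unicode_aware = width(r"(?u)\w{2,5}")
```

All three calls report the same bounds, since the flags change which characters `\w` accepts but not how many characters are consumed.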

And what is test_token_flags_locale_bytes testing?

It verifies the bytes-specific locale case:

regex has locale flag
input is bytes
Python gets a bytes regex, not a string regex
latin-1 doesn’t mess up the bytes
the width-check helper doesn’t break actual matching

erezsh · 2026-05-09T21:41:57Z

Ok, I think I have a better understanding now.

I think _strip_width_only_locale_flags() is the wrong approach.

_get_width() calls get_regexp_width(self.to_regexp()), so we can just generate the regexp without the ?L flag, instead of having to do a brittle search and replace over it.

i.e. something like get_regexp_width(self.to_regexp(locale_flags=False)), and get_regexp_width doesn't have to change.
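
A minimal sketch of that shape, under stated assumptions: `PatternSketch`, `LOCALE_FLAGS`, and the way flags are wrapped around the value are all hypothetical and not Lark's actual implementation. Treating a, L, and u together as the flags to drop is also an assumption. The width lookup uses CPython's internal parser as a stand-in for get_regexp_width.

```python
try:
    from re import _parser as sre_parser  # Python 3.11+
except ImportError:
    import sre_parse as sre_parser  # older Pythons

# Assumption: these are the flags that never change a pattern's width.
LOCALE_FLAGS = frozenset("aLu")

class PatternSketch:
    """Hypothetical stand-in for a regexp pattern object."""

    def __init__(self, value, flags=()):
        self.value = value
        self.flags = frozenset(flags)

    def to_regexp(self, locale_flags=True):
        # Optionally omit the flags that are irrelevant to width analysis.
        flags = self.flags if locale_flags else self.flags - LOCALE_FLAGS
        regexp = self.value
        for f in sorted(flags):
            regexp = "(?%s:%s)" % (f, regexp)
        return regexp

    def get_width(self):
        # The width helper never sees the L flag, so it needs no changes.
        parsed = sre_parser.parse(self.to_regexp(locale_flags=False))
        lo, hi = parsed.getwidth()
        return int(lo), int(hi)

pattern = PatternSketch(r"\w{2,5}", flags="iL")
full = pattern.to_regexp()
for_width = pattern.to_regexp(locale_flags=False)
bounds = pattern.get_width()
```

With this shape, the regexp used for matching keeps every flag, while the width analysis receives a str regexp it can parse, with no search and replace over the generated text.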

It would also be nice to see better error handling, and tests for that error handling, and possibly some edge cases.

Support Python regex a and L flags

e1ee4df

bcmeireles wants to merge 1 commit into lark-parser:master from bcmeireles:support-a-and-l-regex-flags
