Skip to content

Generalize ScriptWhitespace#337

Open
informarte wants to merge 1 commit into
com-lihaoyi:masterfrom
informarte:generalize-script-whitespace
Open

Generalize ScriptWhitespace#337
informarte wants to merge 1 commit into
com-lihaoyi:masterfrom
informarte:generalize-script-whitespace

Conversation

@informarte
Copy link
Copy Markdown

solves #336

@informarte
Copy link
Copy Markdown
Author

@lihaoyi , any chance you could find 5 minutes to look at my PR? Thanks!

@@ -10,6 +10,7 @@ object WhitespaceTests extends TestSuite{
def checkCommon(p0: Whitespace) = {
val p = p0.apply(_)
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The function tests for line comments and block comments (lines 16-24), which are NOT supported by — it only supports a single configurable delimiter for line comments. These tests will fail when called with in the test (line 46). Either should be split into separate helpers for different whitespace types, or should not call .

@@ -74,10 +74,12 @@ object ScriptWhitespace{
case 0 =>
(currentChar: @switch) match{
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Performance issue: The inner (currentChar: @switch) match now has a problem. Originally it had 3 cases: whitespace chars, #, and case _. The @switch annotation compiles this to a tableswitch JVM instruction (O(1) dispatch).

After this change, case # is removed from the match and moved into case _ as a runtime if (currentChar == delim) check. This means:

  1. The @switch annotation on the inner match is still valid (2 cases + default), BUT
  2. Every non-whitespace character now pays an extra if check for delim before returning success. Previously case _ went directly to success. Now it does if (currentChar == delim) ... else success.

For the default ScriptWhitespace.whitespace (delim=#), this adds one extra character comparison per non-whitespace character encountered. In a hot path that parses whitespace frequently (between every token in a source file), this is a measurable regression.

Fix: Keep the hardcoded case # => in the switch AND still support custom delimiters. One approach: make custom return a different anonymous Whitespace subclass that has its own apply method with the delim hardcoded via a val captured at construction time, but the @switch match should still have the delim as a literal case. However, since delim is a runtime parameter, you cannot have it as a case literal.

Alternative: accept that custom delimiters incur a tiny overhead but document this, or use a macro/inline to generate the literal case at compile time.

@@ -86,6 +88,7 @@ object ScriptWhitespace{
rec(current = ctx.index, state = 0)
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Performance concern: implicit val whitespace = custom('#') allocates a new Whitespace instance on ScriptWhitespace initialization, same as implicit object. However, custom(delim) called by users with arbitrary chars allocates a new anonymous Whitespace instance per call.

This is minor since users typically call custom once and cache the result as an implicit val. But consider documenting that the result should be stored in a val, not called inline in a parse hot path.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants