Generalize ScriptWhitespace#337
Conversation
|
@lihaoyi , any chance you could find 5 minutes to look at my PR? Thanks! |
| @@ -10,6 +10,7 @@ object WhitespaceTests extends TestSuite{ | |||
| def checkCommon(p0: Whitespace) = { | |||
| val p = p0.apply(_) | |||
There was a problem hiding this comment.
The function tests for line comments and block comments (lines 16-24), which are NOT supported by — it only supports a single configurable delimiter for line comments. These tests will fail when called with in the test (line 46). Either should be split into separate helpers for different whitespace types, or should not call .
| @@ -74,10 +74,12 @@ object ScriptWhitespace{ | |||
| case 0 => | |||
| (currentChar: @switch) match{ | |||
There was a problem hiding this comment.
Performance issue: The inner (currentChar: @switch) match now has a problem. Originally it had 3 cases: whitespace chars, #, and case _. The @switch annotation compiles this to a tableswitch JVM instruction (O(1) dispatch).
After this change, case # is removed from the match and moved into case _ as a runtime if (currentChar == delim) check. This means:
- The
@switchannotation on the inner match is still valid (2 cases + default), BUT - Every non-whitespace character now pays an extra
ifcheck fordelimbefore returning success. Previouslycase _went directly to success. Now it doesif (currentChar == delim) ... else success.
For the default ScriptWhitespace.whitespace (delim=#), this adds one extra character comparison per non-whitespace character encountered. In a hot path that parses whitespace frequently (between every token in a source file), this is a measurable regression.
Fix: Keep the hardcoded case # => in the switch AND still support custom delimiters. One approach: make custom return a different anonymous Whitespace subclass that has its own apply method with the delim hardcoded via a val captured at construction time, but the @switch match should still have the delim as a literal case. However, since delim is a runtime parameter, you cannot have it as a case literal.
Alternative: accept that custom delimiters incur a tiny overhead but document this, or use a macro/inline to generate the literal case at compile time.
| @@ -86,6 +88,7 @@ object ScriptWhitespace{ | |||
| rec(current = ctx.index, state = 0) | |||
There was a problem hiding this comment.
Performance concern: implicit val whitespace = custom('#') allocates a new Whitespace instance on ScriptWhitespace initialization, same as implicit object. However, custom(delim) called by users with arbitrary chars allocates a new anonymous Whitespace instance per call.
This is minor since users typically call custom once and cache the result as an implicit val. But consider documenting that the result should be stored in a val, not called inline in a parse hot path.
solves #336