[CALCITE-7448] Add support for : as a field/item access operator #4844
tmater wants to merge 9 commits into apache:main
|
I might wait until the CI issue is resolved before reviewing this PR. |
|
I don't understand Calcite's JavaCC code very well, so I may need to study it first. If no one reviews this PR this week, I'll start on it next week. |
Thank you for taking a look, @caicancai. I agree this is still a fairly complex PR, and I wouldn’t claim to fully understand all of its implications yet. That’s also why I tried to include a wide range of test variations and keep the changes gated behind a conformance config. I’ve organized it into five commits (prior to the CI failures) to make the review easier. If it would help further, I’m happy to split it into 4–5 smaller PRs. |
|
Is the behavior with |
…acket refactor

The branch moved bracket ([...]) and dot (.) postfix handling out of Expression2()'s loop and into AddRegularPostfixes (called from AddExpression2b). Babel's InfixCast production was implicitly relying on Expression2()'s loop to consume postfixes after the DataType() call, so after the refactor, expressions like v::varchar[1] would fail to parse because no grammar rule picked up the trailing [1].

Fix: call AddRegularPostfixes(list) after DataType() in InfixCast, making the postfix consumption explicit rather than relying on the caller's loop.

Also consolidate InfixCast test coverage: remove the now-redundant testInfixCastBracketAccessNeedsParentheses and extend testColonFieldAccessWithInfixCast with comprehensive cases covering bracket access, dot access, parenthesized forms, and array types, both with and without isColonFieldAccessAllowed.
@mihaibudiu , good catch! The parenthesis requirement itself is not new, on However, you're right that something was off: the bracket refactor broke parsing |
| [ | ||
| LOOKAHEAD(2, <COLON> SimpleIdentifier(), | ||
| { this.conformance.isColonFieldAccessAllowed() }) | ||
| <COLON> |
Why is chained colon access not supported?
I kept : intentionally non-chainable here. After one : we are already back in the existing postfix world, so the follow-up access can be expressed with the normal operators:
- `a:b.c` instead of `a:b:c`
- `a:['x'].y` instead of `a:['x']:y`
- `(a:b)['x']` / `a:b['x']` instead of `a:b:['x']`
So a second : would mostly be extra surface syntax, not extra expressiveness. Keeping it to a single colon also keeps the grammar narrower and avoids widening the ambiguous space around other colon uses, especially :: in Babel and JSON constructor : handling.
If we decide later that repeated : is a real dialect requirement, we can extend the current [...] to a loop, but I wanted to start with the minimal syntax that covers the targeted cases.
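To make the "one colon, then ordinary postfixes" shape concrete, here is a minimal toy parser. It is a sketch under stated assumptions, not the grammar in this PR: the class `ColonAccessSketch`, its methods, and the error text are all hypothetical, and the `colonAllowed` flag only stands in for `isColonFieldAccessAllowed()`.

```java
import java.util.ArrayList;
import java.util.List;

/** Toy model (hypothetical, not Calcite's parser): at most one ':' access, then ordinary postfixes. */
class ColonAccessSketch {
  private final String src;
  private final boolean colonAllowed; // stands in for isColonFieldAccessAllowed()
  private int pos = 0;

  private ColonAccessSketch(String src, boolean colonAllowed) {
    this.src = src;
    this.colonAllowed = colonAllowed;
  }

  /** Returns the access steps, e.g. [a, :b, .c]; throws on leftover input. */
  static List<String> parse(String src, boolean colonAllowed) {
    ColonAccessSketch p = new ColonAccessSketch(src, colonAllowed);
    List<String> steps = new ArrayList<>();
    steps.add(p.identifier());
    p.regularPostfixes(steps);
    if (p.colonAllowed && p.peek() == ':') { // optional branch, deliberately not a loop
      p.pos++;
      steps.add(":" + (p.peek() == '[' ? p.bracket() : p.identifier()));
      p.regularPostfixes(steps); // back in the existing postfix world
    }
    if (p.pos < src.length()) {
      throw new IllegalArgumentException(
          "Encountered \"" + p.peek() + "\" at column " + (p.pos + 1));
    }
    return steps;
  }

  /** Consumes any run of ".name" and "[...]" postfixes. */
  private void regularPostfixes(List<String> steps) {
    while (true) {
      if (peek() == '.') {
        pos++;
        steps.add("." + identifier());
      } else if (peek() == '[') {
        steps.add(bracket());
      } else {
        return;
      }
    }
  }

  private char peek() {
    return pos < src.length() ? src.charAt(pos) : '\0';
  }

  private String identifier() {
    int start = pos;
    while (Character.isLetterOrDigit(peek()) || peek() == '_') {
      pos++;
    }
    if (start == pos) {
      throw new IllegalArgumentException("Expected identifier at column " + (pos + 1));
    }
    return src.substring(start, pos);
  }

  /** Naive bracket scan: takes everything up to the next ']'. */
  private String bracket() {
    int end = src.indexOf(']', pos);
    if (end < 0) {
      throw new IllegalArgumentException("Unclosed '[' at column " + (pos + 1));
    }
    String text = src.substring(pos, end + 1);
    pos = end + 1;
    return text;
  }
}
```

With the flag on, `ColonAccessSketch.parse("a:b.c", true)` returns `[a, :b, .c]`, while `parse("a:b:c", true)` throws at the second colon. Turning the optional branch into a loop is the extension the reply above mentions.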
| s.pos())); | ||
| list.add(dt); | ||
| } | ||
| AddRegularPostfixes(list) |
I'm not sure if it will affect existing semantics.
Before the refactor, InfixCast could stop after DataType() and rely on the outer Expression2() loop to notice any following . or [...]. After the refactor, that outer postfix branch was removed and the shared postfix handling moved earlier into AddExpression2b(). But Babel InfixCast is not part of that earlier path; it is an extraBinaryExpressions hook that runs later.
So if InfixCast does not invoke AddRegularPostfixes() itself, nobody else will. That is why it has to “own” it now: not because the semantics changed, but because the control flow changed. The shared postfix parser still exists, but this is now the only place on the Babel :: path where it can be called.
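The control-flow point can be modelled with a small toy. It is only an illustration: `InfixCastFlowSketch` and the `castOwnsPostfixes` flag are hypothetical, the input is already split into tokens, and the comments map each phase to the production it loosely stands in for.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.List;

/** Toy control-flow model (hypothetical names), not the Calcite grammar. */
class InfixCastFlowSketch {
  /** Parses pre-split tokens; returns the consumed pieces or throws if tokens are left over. */
  static List<String> parse(List<String> tokens, boolean castOwnsPostfixes) {
    Deque<String> in = new ArrayDeque<>(tokens);
    List<String> out = new ArrayList<>();
    out.add(in.poll());           // operand, e.g. "v"
    regularPostfixes(in, out);    // early phase, loosely AddExpression2b
    if ("::".equals(in.peek())) { // late hook, loosely Babel's InfixCast
      in.poll();
      out.add("::" + in.poll());  // the data type
      if (castOwnsPostfixes) {
        regularPostfixes(in, out); // the explicit call; nothing later will make it
      }
    }
    if (!in.isEmpty()) {
      throw new IllegalStateException("Encountered \"" + in.peek() + "\"");
    }
    return out;
  }

  /** Consumes a run of tokens that look like "[...]" or ".name". */
  private static void regularPostfixes(Deque<String> in, List<String> out) {
    while (in.peek() != null
        && (in.peek().startsWith("[") || in.peek().startsWith("."))) {
      out.add(in.poll());
    }
  }
}
```

For the tokens `v`, `::`, `varchar`, `[1]`, the call with `castOwnsPostfixes = true` returns `[v, ::varchar, [1]]`. With `false` it throws on the trailing `[1]`, which mirrors the `v::varchar[1]` failure described in the commit message.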
| // Double-colon followed by colon field access is not allowed | ||
| f.sql("select v::variant^:^field from t") | ||
| .fails("(?s).*Encountered \":.*\".*"); | ||
| } |
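The `(?s)` prefix in the expected-failure pattern above matters when the parser message spans several lines. The sketch below checks the same regex with plain `java.util.regex`. The message text is made up for illustration, and full-string matching is an assumption about how the test fixture compares it.

```java
import java.util.regex.Pattern;

/** Illustrates the regex from the test above; the error message is invented. */
class FailsPatternSketch {
  // Same regex as in the test: (?s) enables DOTALL, so '.' also matches newlines.
  static final String EXPECTED = "(?s).*Encountered \":.*\".*";

  /** Full-string match (an assumption about the fixture's comparison). */
  static boolean matches(String pattern, String message) {
    return Pattern.compile(pattern).matcher(message).matches();
  }

  public static void main(String[] args) {
    // Made-up multi-line message, shaped like a typical parser error.
    String message =
        "Encountered \":\" at line 1, column 18.\nWas expecting one of:\n    <EOF>";
    System.out.println(matches(EXPECTED, message));              // prints true
    System.out.println(matches(EXPECTED.substring(4), message)); // prints false: no DOTALL
  }
}
```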
|
@tmater Thank you for following up. I left a question on Jira. Could you answer it when you have time? |
|



Changes Proposed
This PR adds opt-in support for `:` as a field/item access operator behind a new `SqlConformance#isColonFieldAccessAllowed()` hook. No built-in conformance enables it yet, so the default parser behavior is unchanged.

It also refactors postfix parsing so colon field access composes with existing dot and bracket access. That covers cases such as `v:field`, `v:['field name']`, `arr[1]:field`, `obj['x']:nested['y']`, and colon access followed by bracket or dot postfixes.

To avoid ambiguity when colon field access is enabled, `JSON_OBJECT` and `JSON_OBJECTAGG` must use `VALUE` syntax instead of `:` or comma-separated key/value shorthand.

Reproduction

Testing
Added parser coverage in `SqlParserTest` and `BabelParserTest` for valid and invalid colon field access, JSON constructor disambiguation, and `::` cast interactions.

Jira Link
CALCITE-7448