[CALCITE-6636] Support CNF condition of Arrow ArrowAdapter #4848

Merged: caicancai merged 1 commit into apache:main from caicancai:6636 on Apr 2, 2026

@caicancai caicancai commented Mar 26, 2026

@caicancai caicancai marked this pull request as draft March 26, 2026 14:35
@caicancai caicancai marked this pull request as ready for review March 26, 2026 14:41
* <a href="https://issues.apache.org/jira/browse/CALCITE/issues/CALCITE-6293">
* [CALCITE-6293] Support OR condition in Arrow adapter</a> is fixed. */
- public static final boolean CALCITE_6293_FIXED = false;
+ public static final boolean CALCITE_6293_FIXED = true;
Contributor

Maybe these can be completely removed?

Member Author

sure, done

}
String plan = "PLAN=ArrowToEnumerableConverter\n"
+ " ArrowProject(intField=[$0], stringField=[$1])\n"
+ " ArrowFilter(condition=[SEARCH($0, Sarg[0, 1, 2])])\n"
Contributor

Arrow supports search?

Member Author

Arrow/Gandiva does not support the SEARCH operator; I have fully expanded the SEARCH operator.

List<List<String>> translateMatch(RexNode condition) {
// Expand SEARCH nodes and convert to CNF
final RexNode expanded = RexUtil.expandSearch(rexBuilder, null, condition);
final RexNode cnf = RexUtil.toCnf(rexBuilder, expanded);
Contributor

This could be much larger than the original condition.
You should add some tests with deeper conditions (multiple nested levels of parens).
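To make the size concern concrete, here is a minimal, self-contained sketch of how distributing OR over AND can grow a condition. It does not use Calcite's `RexUtil`; the class `CnfGrowthSketch` and its helpers are hypothetical, and a CNF formula is modelled simply as a list of OR clauses.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

/** Hypothetical illustration of CNF growth; not Calcite's implementation. */
final class CnfGrowthSketch {
  /** A single literal as a CNF formula: one clause containing one literal. */
  static List<List<String>> lit(String name) {
    List<List<String>> cnf = new ArrayList<>();
    cnf.add(Collections.singletonList(name));
    return cnf;
  }

  /** AND of two CNF formulas is the concatenation of their clauses. */
  static List<List<String>> and(List<List<String>> a, List<List<String>> b) {
    List<List<String>> out = new ArrayList<>(a);
    out.addAll(b);
    return out;
  }

  /** OR distributes over AND: each clause of a is merged with each clause of b. */
  static List<List<String>> or(List<List<String>> a, List<List<String>> b) {
    List<List<String>> out = new ArrayList<>();
    for (List<String> ca : a) {
      for (List<String> cb : b) {
        List<String> merged = new ArrayList<>(ca);
        merged.addAll(cb);
        out.add(merged);
      }
    }
    return out;
  }

  /** Builds (x1 AND y1) OR ... OR (xn AND yn) and returns its CNF. */
  static List<List<String>> disjunctionOfPairs(int n) {
    List<List<String>> cnf = and(lit("x1"), lit("y1"));
    for (int i = 2; i <= n; i++) {
      cnf = or(cnf, and(lit("x" + i), lit("y" + i)));
    }
    return cnf;
  }

  public static void main(String[] args) {
    for (int n = 1; n <= 8; n++) {
      System.out.println(n + " pairs -> " + disjunctionOfPairs(n).size() + " CNF clauses");
    }
  }
}
```

In this sketch a disjunction of n two-literal conjunctions becomes 2^n clauses, which is the kind of nested input the requested tests would need to cover.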

return builder.build();
}

private TreeNode parseSingleCondition(String condition) {
Contributor

I don't really know the grammar for these conditions, so I cannot tell whether the spaces are where you expect them. Is this documented someplace?

What happens if you have a comparison with a string containing a space, for example?

Member Author

I've added instructions, but it doesn't seem to solve the empty string problem yet. I need to think about other solutions. 🤔

Member Author (@caicancai, Mar 27, 2026)

I have now changed it to a structured token format instead of string parsing:

  • unary: [fieldName, operator]
  • binary: [fieldName, operator, value, type]
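A minimal sketch of the two token shapes listed above, assuming plain `List<String>` tokens. The class name `ConditionTokenSketch`, the factory methods, and the operator and type strings are illustrative assumptions, not the adapter's actual code.

```java
import java.util.Arrays;
import java.util.List;

/** Hypothetical sketch of the structured token format; names are assumptions. */
final class ConditionTokenSketch {
  /** Unary token: [fieldName, operator]. */
  static List<String> unary(String fieldName, String operator) {
    return Arrays.asList(fieldName, operator);
  }

  /** Binary token: [fieldName, operator, value, type]. */
  static List<String> binary(String fieldName, String operator, String value, String type) {
    return Arrays.asList(fieldName, operator, value, type);
  }

  public static void main(String[] args) {
    List<String> token = binary("stringField", "equal", "hello world", "string");
    // The literal stays in one slot, spaces included.
    System.out.println(token.size());              // prints 4
    // Flattening to a single string and splitting on spaces breaks the literal apart.
    String flat = String.join(" ", token);
    System.out.println(flat.split(" ").length);    // prints 5
  }
}
```

Because each part occupies its own slot, a literal containing a space (or an empty literal) cannot shift the positions of the operator and type.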

@caicancai caicancai force-pushed the 6636 branch 2 times, most recently from fcd59d0 to ed477d9 Compare March 27, 2026 14:10
@caicancai caicancai requested a review from mihaibudiu March 27, 2026 14:43
/** Adds new predicates.
*
- * @param predicates Predicates
+ * @param predicates Predicates in CNF form (outer list is AND, inner list is OR)
Contributor

I see 3 lists, can you explain all of them?

Member Author

Thank you, I have improved the comments.
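For readers following the "outer list is AND, inner list is OR" convention in the revised `@param`, a minimal sketch of how such a nested list could be evaluated. `CnfEvalSketch` and its use of `IntPredicate` are assumptions for illustration; the adapter itself builds Gandiva tree nodes rather than Java predicates.

```java
import java.util.Arrays;
import java.util.List;
import java.util.function.IntPredicate;

/** Hypothetical sketch of evaluating predicates held in CNF form. */
final class CnfEvalSketch {
  /** True if every inner list (an OR clause) has at least one predicate accepting the value. */
  static boolean matches(List<List<IntPredicate>> cnf, int value) {
    for (List<IntPredicate> clause : cnf) {
      boolean any = false;
      for (IntPredicate p : clause) {
        if (p.test(value)) {
          any = true;
          break;
        }
      }
      if (!any) {
        return false; // one failed OR clause fails the whole AND
      }
    }
    return true;
  }

  /** (x = 0 OR x = 1 OR x = 2) AND (x > 0) */
  static List<List<IntPredicate>> example() {
    return Arrays.asList(
        Arrays.<IntPredicate>asList(x -> x == 0, x -> x == 1, x -> x == 2),
        Arrays.<IntPredicate>asList(x -> x > 0));
  }

  public static void main(String[] args) {
    System.out.println(matches(example(), 1)); // prints true
    System.out.println(matches(example(), 0)); // prints false
    System.out.println(matches(example(), 3)); // prints false
  }
}
```

The first clause mirrors what an expanded `SEARCH($0, Sarg[0, 1, 2])` looks like as a single OR clause of equalities.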

* e.g. {@code ["intField", "isnull"]}</li>
* </ul>
*
* <p>Using structured tokens avoids string splitting and safely supports
Contributor

There is no need to justify this choice.
In general, you should use higher-level representations as much as possible.

} else {
throw new UnsupportedOperationException("Unsupported disjunctive condition " + condition);
/** The maximum number of nodes allowed during CNF conversion.
* If exceeded, the original expression is returned unchanged. */
Contributor

What happens to the translation in that case? Does it fail?

Member Author

Added testLiteralWithEmptyString.

return builder.build();
}

/** Parses a single condition into a Gandiva {@link TreeNode}.
Contributor

It looks to me like it would be better to define a new class for this data structure, perhaps UnaryOrBinaryCondition, instead of using a List.

Member Author

done

@caicancai caicancai requested a review from mihaibudiu March 29, 2026 13:33
@mihaibudiu
Contributor

I am currently on vacation; I will be only able to review this later in April

@caicancai
Member Author

> I am currently on vacation; I will be only able to review this later in April

Have a great holiday and have some fun!

}

/** Parses a single {@link ConditionToken} into a Gandiva {@link TreeNode}. */
private TreeNode parseSingleCondition(ConditionToken token) {
Contributor

"parse" is probably no longer appropriate; perhaps "convertConditionToGandiva"?

* the original expression unchanged, which may cause the subsequent
* translation to Gandiva predicates to fail with an
* {@link UnsupportedOperationException}. That exception is caught by
* {@link ArrowRules.ArrowFilterRule#onMatch}, which silently skips the
Contributor

You are making some assumptions here about who calls this code, and they may not hold; in general, when you write a library, you don't know how it will be used. You can be more precise by saying that this is one possible path: "when invoked by the module to convert to Arrow..."

* {@link UnsupportedOperationException}. That exception is caught by
* {@link ArrowRules.ArrowFilterRule#onMatch}, which silently skips the
* Arrow convention and falls back to an Enumerable plan. */
private static final int MAX_CNF_NODE_COUNT = 256;
Contributor

Ideally this would not be a constant but a parameter of translateMatch, similar to toCnf, but I don't know if that is possible.

Member Author

I currently have no requirements for configurability; however, if users express a need for it in the future, I can certainly provide support. Let's let it run for a while first.
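If configurability is wanted later, one possible shape is an overload that takes the limit and a convenience method that delegates to a default. The sketch below is self-contained and hypothetical: `BoundedCnfSketch`, `orBounded`, and the clause-based limit are assumptions (the adapter's constant counts nodes, not clauses), and an empty `Optional` stands in for "conversion abandoned, caller falls back".

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Optional;

/** Hypothetical sketch of a configurable CNF size limit with a default. */
final class BoundedCnfSketch {
  static final int DEFAULT_MAX_CLAUSES = 256; // assumed default, for illustration only

  /** ORs two CNF formulas, or returns empty if the result would exceed maxClauses. */
  static Optional<List<List<String>>> orBounded(
      List<List<String>> a, List<List<String>> b, int maxClauses) {
    // The cross product yields a.size() * b.size() clauses; check before building it.
    if ((long) a.size() * b.size() > maxClauses) {
      return Optional.empty();
    }
    List<List<String>> out = new ArrayList<>();
    for (List<String> ca : a) {
      for (List<String> cb : b) {
        List<String> merged = new ArrayList<>(ca);
        merged.addAll(cb);
        out.add(merged);
      }
    }
    return Optional.of(out);
  }

  /** Convenience overload that uses the default limit. */
  static Optional<List<List<String>>> orBounded(List<List<String>> a, List<List<String>> b) {
    return orBounded(a, b, DEFAULT_MAX_CLAUSES);
  }

  /** Builds a CNF with n single-literal clauses: p1 AND p2 AND ... AND pn. */
  static List<List<String>> conjunction(String prefix, int n) {
    List<List<String>> cnf = new ArrayList<>();
    for (int i = 1; i <= n; i++) {
      cnf.add(Arrays.asList(prefix + i));
    }
    return cnf;
  }

  public static void main(String[] args) {
    List<List<String>> a = conjunction("a", 20);
    List<List<String>> b = conjunction("b", 20);
    System.out.println(orBounded(a, b).isPresent());       // prints false (400 > 256)
    System.out.println(orBounded(a, b, 1000).isPresent()); // prints true (400 <= 1000)
  }
}
```

Keeping the default behind an overload means existing callers are unaffected while a caller with a different budget can pass its own limit.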

* and operator; binary conditions additionally have a literal value
* and its type.
*
* <p>This class replaces the raw {@code List<String>} representation
Contributor

In general, Javadoc does not need to discuss things that no longer exist. I think you can just remove this comment.

/** When a filter condition exceeds the CNF node limit, the Arrow adapter
* falls back to the Enumerable convention (EnumerableCalc) instead of
* using ArrowFilter. The query should still return correct results. */
@Test void testCnfExceedsLimitFallsBackToEnumerable() {
Contributor

I am assuming you have validated these Arrow plans somehow.

Member Author

yes

@mihaibudiu mihaibudiu added the LGTM-will-merge-soon label (Overall PR looks OK. Only minor things left.) Apr 1, 2026
sonarqubecloud bot commented Apr 2, 2026

@caicancai caicancai merged commit 1aacc43 into apache:main Apr 2, 2026
19 checks passed
2 participants