[CALCITE-6636] Support CNF condition of Arrow ArrowAdapter#4848
[CALCITE-6636] Support CNF condition of Arrow ArrowAdapter#4848caicancai merged 1 commit intoapache:mainfrom
Conversation
| * <a href="https://issues.apache.org/jira/browse/CALCITE/issues/CALCITE-6293"> | ||
| * [CALCITE-6293] Support OR condition in Arrow adapter</a> is fixed. */ | ||
| public static final boolean CALCITE_6293_FIXED = false; | ||
| public static final boolean CALCITE_6293_FIXED = true; |
There was a problem hiding this comment.
Maybe these can be completely removed?
| } | ||
| String plan = "PLAN=ArrowToEnumerableConverter\n" | ||
| + " ArrowProject(intField=[$0], stringField=[$1])\n" | ||
| + " ArrowFilter(condition=[SEARCH($0, Sarg[0, 1, 2])])\n" |
There was a problem hiding this comment.
Arrow/Gandiva does not support the SEARCH operator; I have fully expanded the SEARCH operator.
| List<List<String>> translateMatch(RexNode condition) { | ||
| // Expand SEARCH nodes and convert to CNF | ||
| final RexNode expanded = RexUtil.expandSearch(rexBuilder, null, condition); | ||
| final RexNode cnf = RexUtil.toCnf(rexBuilder, expanded); |
There was a problem hiding this comment.
This could be much larger than the original condition.
You should add some tests with deeper conditions (multiple nested levels of parens).
| return builder.build(); | ||
| } | ||
|
|
||
| private TreeNode parseSingleCondition(String condition) { |
There was a problem hiding this comment.
I don't really know the grammar for these conditions, so I cannot tell whether the spaces are where you expect them. Is this documented someplace?
What happens if you have a comparison with a string containing a space, for example?
There was a problem hiding this comment.
I've added instructions, but it doesn't seem to solve the empty string problem yet. I need to think about other solutions. 🤔
There was a problem hiding this comment.
I have now changed it to a structured token format instead of string parsing:
- unary: [fieldName, operator]
- binary: [fieldName, operator, value, type]
fcd59d0 to
ed477d9
Compare
| /** Adds new predicates. | ||
| * | ||
| * @param predicates Predicates | ||
| * @param predicates Predicates in CNF form (outer list is AND, inner list is OR) |
There was a problem hiding this comment.
I see 3 lists, can you explain all of them?
There was a problem hiding this comment.
Thank you, I have improved the comments.
| * e.g. {@code ["intField", "isnull"]}</li> | ||
| * </ul> | ||
| * | ||
| * <p>Using structured tokens avoids string splitting and safely supports |
There was a problem hiding this comment.
There is no need to justify this choice.
In general, you should use higher-level representations as much as possible.
| } else { | ||
| throw new UnsupportedOperationException("Unsupported disjunctive condition " + condition); | ||
| /** The maximum number of nodes allowed during CNF conversion. | ||
| * If exceeded, the original expression is returned unchanged. */ |
There was a problem hiding this comment.
what happens to the translation in that case? It fails?
There was a problem hiding this comment.
add testLiteralWithEmptyString
| return builder.build(); | ||
| } | ||
|
|
||
| /** Parses a single condition into a Gandiva {@link TreeNode}. |
There was a problem hiding this comment.
It looks to me that it would be better to define a new class for this data structure UnaryOrBinaryCondition perhaps, instead of using a List.
|
I am currently on vacation; I will be only able to review this later in April |
Have a great holiday and have some fun! |
| } | ||
|
|
||
| /** Parses a single {@link ConditionToken} into a Gandiva {@link TreeNode}. */ | ||
| private TreeNode parseSingleCondition(ConditionToken token) { |
There was a problem hiding this comment.
"parse" is probably no longer appropriate, perhaps "convertConditionToGandiva?"
| * the original expression unchanged, which may cause the subsequent | ||
| * translation to Gandiva predicates to fail with an | ||
| * {@link UnsupportedOperationException}. That exception is caught by | ||
| * {@link ArrowRules.ArrowFilterRule#onMatch}, which silently skips the |
There was a problem hiding this comment.
You are making here some assumptions about who is calling this code which may not hold; in general, when you write a library, you don't know how it will be used. You can be more precise by saying that this is a possible path: "when invoked by the module to convert to Arrow..."
| * {@link UnsupportedOperationException}. That exception is caught by | ||
| * {@link ArrowRules.ArrowFilterRule#onMatch}, which silently skips the | ||
| * Arrow convention and falls back to an Enumerable plan. */ | ||
| private static final int MAX_CNF_NODE_COUNT = 256; |
There was a problem hiding this comment.
Ideally this would not be a constant, but a parameter of translateMatch, similar to toCNF, but I don't know if that is possible.
There was a problem hiding this comment.
I currently have no requirements for configurability; however, if users express a need for it in the future, I can certainly provide support. Let's let it run for a while first.
| * and operator; binary conditions additionally have a literal value | ||
| * and its type. | ||
| * | ||
| * <p>This class replaces the raw {@code List<String>} representation |
There was a problem hiding this comment.
In general JavaDoc does not need to discuss things that do no longer exists. I think you can just remove this comment.
| /** When a filter condition exceeds the CNF node limit, the Arrow adapter | ||
| * falls back to the Enumerable convention (EnumerableCalc) instead of | ||
| * using ArrowFilter. The query should still return correct results. */ | ||
| @Test void testCnfExceedsLimitFallsBackToEnumerable() { |
There was a problem hiding this comment.
I am assuming you have validated these Arrow plans somehow.
|



https://issues.apache.org/jira/browse/CALCITE-6636