Fix ON CONFLICT subqueries referencing excluded rows #345
INSERT ... ON CONFLICT DO UPDATE rewrites excluded column references to read from the source row struct. That rewrite was also being applied while copying ResolvedSubqueryExpr parameter lists, even though those lists can only contain ResolvedColumnRef nodes. When a DO UPDATE expression used a subquery such as (SELECT excluded.string_col), the deep-copy visitor tried to consume the replacement GET_STRUCT_FIELD expression as a column ref and failed an internal stack invariant. Handle subquery copying explicitly so excluded parameters are remapped to the source struct column while references inside the subquery body continue to be rewritten to field reads. Preserve the correlated bit on replacement refs and add a regression test for the crashing query shape.
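The parameter-list remapping described above can be sketched in isolation. This is a simplified illustration, not the code in this PR: `ColumnRef`, `RemapSubqueryParameters`, and the field names are hypothetical stand-ins for the resolved AST types, and the sketch assumes that a single struct column carries the whole insert row and that the replacement ref takes the correlated bit of the first parameter it replaces.

```cpp
#include <string>
#include <unordered_set>
#include <vector>

// Hypothetical stand-in for a resolved column reference. This is not the
// real ResolvedColumnRef class, only enough state to show the idea.
struct ColumnRef {
  int column_id;
  std::string name;
  bool is_correlated;
};

// Copies a subquery parameter list. Parameters that refer to columns of the
// excluded (insert) row are replaced by one reference to the struct column
// that holds the whole insert row. All other parameters are copied as-is.
// The result contains only column refs, never field-access expressions.
std::vector<ColumnRef> RemapSubqueryParameters(
    const std::vector<ColumnRef>& parameters,
    const std::unordered_set<int>& insert_row_column_ids,
    const ColumnRef& insert_row_struct_column) {
  std::vector<ColumnRef> result;
  bool added_struct_column = false;
  for (const ColumnRef& parameter : parameters) {
    if (insert_row_column_ids.count(parameter.column_id) > 0) {
      // Add the struct column at most once, however many excluded columns
      // the subquery references.
      if (!added_struct_column) {
        ColumnRef replacement = insert_row_struct_column;
        // Assumption: keep the correlated bit of the ref being replaced.
        replacement.is_correlated = parameter.is_correlated;
        result.push_back(replacement);
        added_struct_column = true;
      }
      continue;
    }
    result.push_back(parameter);
  }
  return result;
}
```

In this sketch, references inside the subquery body are out of scope; per the description above, those continue to be rewritten to field reads on the struct column.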
TEST_P(QueryEngineTest, InsertOnConflictDoUpdateSubqueryCanReferenceExcluded) {
  if (GetParam() == POSTGRESQL) {
This should work for PG as well. Do we need this GTEST_SKIP?
Query{"INSERT INTO test_table (int64_col, string_col) "
      "VALUES(1, 'ten') "
      "ON CONFLICT(int64_col) DO UPDATE SET string_col = "
      "(SELECT excluded.string_col)"},
Maybe good to replace this with a subquery that does a table scan as well:
(SELECT CONCAT(t.string_col, '-', excluded.string_col) FROM test_table t WHERE t.int64_col = 1)
This will change the output string_col to one-ten since the row with int64_col:1 already exists in this test.
for (const auto& parameter : node->parameter_list()) {
  if (column_ids_referenced_from_insert_row_.contains(
          parameter->column().column_id())) {
    if (!added_struct_column_holder) {
Thanks for addressing this issue and adding a fix.
Can we pls add a comment here - "Replace the column references from insert row with the insert row value struct"
The regression is not GoogleSQL-specific. Let the focused ON CONFLICT DO UPDATE subquery test run for PostgreSQL as well, matching the dialect behavior expected by the review.
Strengthen the ON CONFLICT DO UPDATE regression test so the subquery reads from test_table while also referencing excluded.string_col. This covers the review case where the subquery parameter rewrite must coexist with a real scan inside the subquery.
Add the requested comment explaining that excluded column references in a subquery parameter list are represented by the insert row value struct.
Also, if it is in any way more straightforward for you all, it is fine if you just take ownership of this change. It is not important to me that my specific fix be the one submitted (and I understand that the repo policy is generally that PRs are not accepted).
INSERT ... ON CONFLICT DO UPDATE rewrites excluded column references to read from the source row struct. That rewrite was also being applied while copying ResolvedSubqueryExpr parameter lists, even though those lists can only contain ResolvedColumnRef nodes. For a DO UPDATE expression like (SELECT excluded.string_col), the deep-copy visitor tried to consume the replacement GET_STRUCT_FIELD expression as a column ref and failed an internal stack invariant.
This changes subquery copying so excluded parameters are remapped to the source struct column, while references inside the subquery body continue to be rewritten to field reads. It also preserves the correlated bit on replacement refs and adds a regression test for the crashing query shape.
Tested:
bazel test //backend/query:query_engine_test --test_filter=QueryEnginePerDialectTests/QueryEngineTest.InsertOnConflictDoUpdateSubqueryCanReferenceExcluded* --jobs=1 --local_resources=memory=4096 --local_resources=cpu=1
bazel test //backend/query:query_engine_test --test_filter=QueryEnginePerDialectTests/QueryEngineTest.InsertOnConflictDoUpdate* --jobs=1 --local_resources=memory=4096 --local_resources=cpu=1