Skip to content

Fix ON CONFLICT subqueries referencing excluded rows#345

Open
cliff-openai wants to merge 4 commits into
GoogleCloudPlatform:masterfrom
cliff-openai:cliff/fix-on-conflict
Open

Fix ON CONFLICT subqueries referencing excluded rows#345
cliff-openai wants to merge 4 commits into
GoogleCloudPlatform:masterfrom
cliff-openai:cliff/fix-on-conflict

Conversation

@cliff-openai
Copy link
Copy Markdown

INSERT ... ON CONFLICT DO UPDATE rewrites excluded column references to read from the source row struct. That rewrite was also being applied while copying ResolvedSubqueryExpr parameter lists, even though those lists can only contain ResolvedColumnRef nodes. For a DO UPDATE expression like (SELECT excluded.string_col), the deep-copy visitor tried to consume the replacement GET_STRUCT_FIELD expression as a column ref and failed an internal stack invariant.

This changes subquery copying so excluded parameters are remapped to the source struct column, while references inside the subquery body continue to be rewritten to field reads. It also preserves the correlated bit on replacement refs and adds a regression test for the crashing query shape.

Tested:

  • bazel test //backend/query:query_engine_test --test_filter=QueryEnginePerDialectTests/QueryEngineTest.InsertOnConflictDoUpdateSubqueryCanReferenceExcluded* --jobs=1 --local_resources=memory=4096 --local_resources=cpu=1
  • bazel test //backend/query:query_engine_test --test_filter=QueryEnginePerDialectTests/QueryEngineTest.InsertOnConflictDoUpdate* --jobs=1 --local_resources=memory=4096 --local_resources=cpu=1

INSERT ... ON CONFLICT DO UPDATE rewrites excluded column references to read from the source row struct. That rewrite was also being applied while copying ResolvedSubqueryExpr parameter lists, even though those lists can only contain ResolvedColumnRef nodes. When a DO UPDATE expression used a subquery such as (SELECT excluded.string_col), the deep-copy visitor tried to consume the replacement GET_STRUCT_FIELD expression as a column ref and failed an internal stack invariant.

Handle subquery copying explicitly so excluded parameters are remapped to the source struct column while references inside the subquery body continue to be rewritten to field reads. Preserve the correlated bit on replacement refs and add a regression test for the crashing query shape.
Comment thread backend/query/query_engine_test.cc Outdated
}

TEST_P(QueryEngineTest, InsertOnConflictDoUpdateSubqueryCanReferenceExcluded) {
if (GetParam() == POSTGRESQL) {
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This should work for PG as well. Do we need this GTEST_SKIP?

Copy link
Copy Markdown
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done

Comment thread backend/query/query_engine_test.cc Outdated
Query{"INSERT INTO test_table (int64_col, string_col) "
"VALUES(1, 'ten') "
"ON CONFLICT(int64_col) DO UPDATE SET string_col = "
"(SELECT excluded.string_col)"},
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe good to replace this with a subquery that does a table scan as well:
(SELECT CONCAT(t.string_col, '-', excluded.string_col) from test_table t WHERE t.int64_col = 1)

This will change the output string_col to one-ten since the row with int64_col:1 already exists in this test.

Copy link
Copy Markdown
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done

for (const auto& parameter : node->parameter_list()) {
if (column_ids_referenced_from_insert_row_.contains(
parameter->column().column_id())) {
if (!added_struct_column_holder) {
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for addressing this issue and adding a fix.
Can we pls add a comment here - "Replace the column references from insert row with the insert row value struct"

Copy link
Copy Markdown
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done

The regression is not GoogleSQL-specific. Let the focused ON CONFLICT DO UPDATE subquery test run for PostgreSQL as well, matching the dialect behavior expected by the review.
Strengthen the ON CONFLICT DO UPDATE regression test so the subquery reads from test_table while also referencing excluded.string_col. This covers the review case where the subquery parameter rewrite must coexist with a real scan inside the subquery.
Add the requested comment explaining that excluded column references in a subquery parameter list are represented by the insert row value struct.
@cliff-openai cliff-openai requested a review from shreyaprg May 12, 2026 23:17
@cliff-openai
Copy link
Copy Markdown
Author

Also, if it is in any way more straightforward for you all, it is fine if you just take ownership of this change. It is not important to me that my specific fix be the one submitted (and I understand that the repo policy is generally that PRs are not accepted)

Copy link
Copy Markdown

@shreyaprg shreyaprg left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you for the fix.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants