Fix projection functional dependency remapping #23028
Thank you for opening this pull request! Reviewer note: cargo-semver-checks reported the current version number is not SemVer-compatible with the changes in this pull request (compared against the base branch).
neilconway
left a comment
Thanks for reporting this!
I believe optimize_projections should already do this (which is why the newly added SLTs pass without the changes in this PR). From digging around a little bit, it looks like the TPC-DS query is running into a different bug: if there's an intermediate Projection, it seems like the FDs aren't being propagated correctly due to a bug in calc_func_dependencies_for_project when a projection list contains a mix of computed and passthrough columns. Claude found this test case:
#[test]
fn projection_with_leading_computed_column_preserves_pk() {
    // input: [id (PK), name, amount]
    let input_fds = FunctionalDependencies::new_from_constraints(
        Some(&Constraints::new_unverified(vec![Constraint::PrimaryKey(vec![0])])),
        3,
    );
    // output of a CSE-style projection: [__common_expr_1 (computed), id, name, amount]
    // -> proj_indices must be [MAX, 0, 1, 2], NOT [0, 1, 2]
    let proj_indices = vec![usize::MAX, 0, 1, 2];
    let projected = input_fds.project_functional_dependencies(&proj_indices, 4);
    // determinant must remap to output position 1 (`id`), not 0 (`__common_expr_1`)
    assert_eq!(projected[0].source_indices, vec![1]);
}

Would you like to take a look at fixing that bug? I think if we do that, then the existing projection optimization rewrite should apply here to the TPC-DS query.
# Functional-dependency targets should only be added to aggregate
# GROUP BY outputs when the SQL query actually needs them after
# aggregation.
These tests pass without the code changes in this PR.
I rewrote the code and removed these tests.
@neilconway I reworked the PR to fix the FD propagation issue in

I also dropped the previous aggregate planning changes and added a focused regression test for the computed-column-before-PK case:

I also ran a debug SF10 TPC-DS all-query run for this path; it completed with 0 failures and Q39 recovered, but I am treating that only as diagnostic evidence rather than a formal benchmark claim.

Thanks again for pointing me at the right root cause.
Which issue does this PR close?
Rationale for this change
Projection functional dependency propagation builds a mapping from projected output expressions back to input field indices.
Before this change, projection expressions that are not direct input fields, such as planner-generated computed expressions, were omitted from that mapping. This shifted the projected positions of later passthrough columns. If a primary key column appears after such a computed expression, the projected schema can record the primary key functional dependency against the wrong output index.
This showed up in the TPC-DS q39 planning path after primary key constraints were added to the schemas: downstream aggregate planning was reasoning from incorrect functional dependency metadata.
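The index shift described above can be reproduced in a small standalone sketch. It assumes a simplified model rather than DataFusion's actual types: `ProjExpr`, `NO_INPUT`, `mapping_dropping_computed`, `mapping_with_sentinel`, and `remap_determinant` are hypothetical names introduced only for illustration. The `usize::MAX` sentinel mirrors the reviewer's test case; the real implementation may differ in detail.

```rust
/// Hypothetical stand-in for a projection output expression.
enum ProjExpr {
    /// Direct reference to an input field, by name.
    Column(&'static str),
    /// Any computed expression, e.g. a planner-generated `__common_expr_1`.
    Computed,
}

/// Sentinel meaning "this output position has no corresponding input field".
const NO_INPUT: usize = usize::MAX;

/// Behaviour before the fix, as described above: computed expressions are
/// omitted, so the mapping is shorter than the projection list and later
/// passthrough columns land at shifted positions.
fn mapping_dropping_computed(input: &[&str], exprs: &[ProjExpr]) -> Vec<usize> {
    exprs
        .iter()
        .filter_map(|e| match e {
            ProjExpr::Column(name) => input.iter().position(|f| f == name),
            ProjExpr::Computed => None,
        })
        .collect()
}

/// Behaviour after the fix (sketch): exactly one entry per output position,
/// with a sentinel for expressions that are not direct input fields.
fn mapping_with_sentinel(input: &[&str], exprs: &[ProjExpr]) -> Vec<usize> {
    exprs
        .iter()
        .map(|e| match e {
            ProjExpr::Column(name) => {
                input.iter().position(|f| f == name).unwrap_or(NO_INPUT)
            }
            ProjExpr::Computed => NO_INPUT,
        })
        .collect()
}

/// Remaps a determinant (input field indices) to output positions by looking
/// up each input index in the mapping. Returns `None` if any determinant
/// column is not projected.
fn remap_determinant(determinant: &[usize], mapping: &[usize]) -> Option<Vec<usize>> {
    determinant
        .iter()
        .map(|src| mapping.iter().position(|m| m == src))
        .collect()
}

#[test]
fn computed_column_before_pk_shifts_the_key() {
    // input: [id (PK), name, amount]
    let input = ["id", "name", "amount"];
    // output: [__common_expr_1 (computed), id, name, amount]
    let exprs = [
        ProjExpr::Computed,
        ProjExpr::Column("id"),
        ProjExpr::Column("name"),
        ProjExpr::Column("amount"),
    ];
    let pk = [0usize];

    // Dropping the computed entry records the key at output 0 (wrong column).
    let shifted = mapping_dropping_computed(&input, &exprs);
    assert_eq!(remap_determinant(&pk, &shifted), Some(vec![0]));

    // Keeping a sentinel records the key at output 1 (`id`).
    let aligned = mapping_with_sentinel(&input, &exprs);
    assert_eq!(remap_determinant(&pk, &aligned), Some(vec![1]));
}
```

The point of the sentinel is that the mapping's length always equals the number of output columns, so an entry's position in the mapping is also its position in the projected schema.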
What changes are included in this PR?
Are these changes tested?
Yes.
I also ran a debug SF10 TPC-DS all-query comparison for the fix path separately; it completed with 0 failures. I am treating that as diagnostic evidence, not a formal benchmark claim.
Are there any user-facing changes?
No SQL syntax or public API changes.
Users may see corrected optimized plans and restored performance for queries affected by projection functional dependency remapping.