feat: functional dependencies in JOIN #23047

Open
zyuiop wants to merge 1 commit into apache:main from zyuiop:feat/join-functional-dependencies
@zyuiop zyuiop commented Jun 19, 2026

This enables aggregate queries that SELECT columns from two joined tables when only one table's columns appear in the GROUP BY, provided the join condition guarantees a functional dependency.

Rationale for this change

Currently, queries such as

SELECT
    Paper.paperId,
    PaperConflict.conflictType,
    group_concat(PaperReview.rflags, ' ', PaperReview.reviewNeedsSubmit, ' ', PaperReview.reviewRound)
FROM Paper
    JOIN PaperReview ON (PaperReview.paperId = Paper.paperId
        AND PaperReview.contactId = 1 AND (PaperReview.rflags & 7936) != 0)
    LEFT JOIN PaperConflict ON (PaperConflict.paperId = Paper.paperId
        AND PaperConflict.contactId = 1)
GROUP BY Paper.paperId

do not work, because DataFusion does not detect that PaperConflict.conflictType is functionally dependent on Paper.paperId and can therefore be included in the SELECT clause.
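
To make the failing check concrete, here is a minimal, self-contained sketch of how a planner could decide whether a selected column is determined by the GROUP BY keys. It is a toy model, not DataFusion's code: the `Dep` struct, the `closure` function, and the `Paper.title` column are all hypothetical names introduced for illustration, and it assumes Paper.paperId is the primary key of Paper.

```rust
use std::collections::BTreeSet;

/// Toy functional dependency: the `source` columns determine the `target` columns.
/// (Hypothetical type for illustration, not DataFusion's `FunctionalDependence`.)
struct Dep {
    source: Vec<&'static str>,
    target: Vec<&'static str>,
}

/// Compute every column determined by `keys` under `deps` (attribute closure).
fn closure(keys: &[&'static str], deps: &[Dep]) -> BTreeSet<&'static str> {
    let mut known: BTreeSet<&'static str> = keys.iter().copied().collect();
    loop {
        let before = known.len();
        for dep in deps {
            // A dependency fires once all of its source columns are known.
            if dep.source.iter().all(|c| known.contains(c)) {
                known.extend(dep.target.iter().copied());
            }
        }
        // Stop when a full pass adds nothing new.
        if known.len() == before {
            return known;
        }
    }
}

fn main() {
    // Assumption: Paper.paperId is a primary key, so it determines Paper.title
    // (a made-up column used only for this example).
    let deps = vec![Dep {
        source: vec!["Paper.paperId"],
        target: vec!["Paper.title"],
    }];
    let group_by = ["Paper.paperId"];
    let determined = closure(&group_by, &deps);

    // Columns of the grouped table are accepted.
    assert!(determined.contains("Paper.title"));
    // No dependency reaches across the join, so this column is rejected.
    assert!(!determined.contains("PaperConflict.conflictType"));
}
```

In this model the query is rejected because nothing links Paper.paperId to any PaperConflict column, even though the ON clause would justify such a link.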

This PR attempts to address this problem, although the solution is a bit clunky and I am not certain it works in all cases.

What changes are included in this PR?

Extend FunctionalDependencies::join so that it takes an additional parameter: a list of the columns that appear in equality comparisons in the ON clause of the join.
This makes it possible to extend functional dependencies across the JOIN when the source columns of a dependency all appear in that list.
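
The idea can be sketched roughly as follows. This is a simplified illustration, not the code in this PR: `Dep`, `OnEquality`, and `extend_across_join` are hypothetical names, columns are plain strings rather than schema indices, and the sketch ignores join type and the NULL handling that outer joins would require. It also assumes that (paperId, contactId) is a key of PaperConflict.

```rust
use std::collections::BTreeSet;

/// Toy functional dependency: `source` columns determine `target` columns.
#[derive(Debug, Clone, PartialEq)]
struct Dep {
    source: BTreeSet<String>,
    target: BTreeSet<String>,
}

/// Small constructor to keep the example readable.
fn dep(source: &[&str], target: &[&str]) -> Dep {
    Dep {
        source: source.iter().map(|s| s.to_string()).collect(),
        target: target.iter().map(|s| s.to_string()).collect(),
    }
}

/// One equality comparison taken from the ON clause.
enum OnEquality {
    /// `left_col = right_col`
    Columns { left: String, right: String },
    /// `right_col = <literal>`
    RightLiteral { right: String },
}

/// Hypothetical helper (not the DataFusion API): if the ON clause fixes every
/// source column of a right-side dependency, then the left columns used in
/// those equalities determine that dependency's targets.
fn extend_across_join(right_deps: &[Dep], on: &[OnEquality]) -> Vec<Dep> {
    let mut derived = Vec::new();
    for rd in right_deps {
        let mut left_sources = BTreeSet::new();
        let mut all_fixed = true;
        for src in &rd.source {
            // Look for an equality that fixes this right-side source column.
            let mut fixed = false;
            for eq in on {
                match eq {
                    OnEquality::Columns { left, right } if right == src => {
                        left_sources.insert(left.clone());
                        fixed = true;
                    }
                    OnEquality::RightLiteral { right } if right == src => fixed = true,
                    _ => {}
                }
            }
            all_fixed &= fixed;
        }
        // Skip the literal-only case to keep the sketch small.
        if all_fixed && !left_sources.is_empty() {
            derived.push(Dep { source: left_sources, target: rd.target.clone() });
        }
    }
    derived
}

fn main() {
    // Assumption: (paperId, contactId) is a key of PaperConflict.
    let right_deps = vec![dep(
        &["PaperConflict.paperId", "PaperConflict.contactId"],
        &["PaperConflict.conflictType"],
    )];
    let on = vec![
        OnEquality::Columns {
            left: "Paper.paperId".into(),
            right: "PaperConflict.paperId".into(),
        },
        OnEquality::RightLiteral { right: "PaperConflict.contactId".into() },
    ];
    let derived = extend_across_join(&right_deps, &on);
    assert_eq!(
        derived,
        vec![dep(&["Paper.paperId"], &["PaperConflict.conflictType"])]
    );
}
```

With the derived dependency in place, grouping by Paper.paperId is enough to justify selecting PaperConflict.conflictType in this toy model.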

Alternative approach: this feature could instead be implemented as an analyzer step. The problem is that the aggregate/projection validity check is done in SqlToRel, which runs before the analyzer.

Are these changes tested?

Yes

Are there any user-facing changes?

This enables aggregate queries that SELECT columns from two joined tables when only one table's columns appear in the GROUP BY, provided the join condition guarantees a functional dependency.
@github-actions github-actions Bot added the sql, logical-expr, optimizer, and common labels Jun 19, 2026
@github-actions

Thank you for opening this pull request!

Reviewer note: cargo-semver-checks reported the current version number is not SemVer-compatible with the changes in this pull request (compared against the base branch).

Details
     Cloning apache/main
    Building datafusion-common v54.0.0 (current)
       Built [  34.810s] (current)
     Parsing datafusion-common v54.0.0 (current)
      Parsed [   0.060s] (current)
    Building datafusion-common v54.0.0 (baseline)
       Built [  33.490s] (baseline)
     Parsing datafusion-common v54.0.0 (baseline)
      Parsed [   0.061s] (baseline)
    Checking datafusion-common v54.0.0 -> v54.0.0 (no change; assume patch)
     Checked [   0.652s] 223 checks: 222 pass, 1 fail, 0 warn, 30 skip

--- failure method_parameter_count_changed: pub method parameter count changed ---

Description:
A publicly-visible method now takes a different number of parameters, not counting the receiver (self) parameter.
        ref: https://doc.rust-lang.org/cargo/reference/semver.html#fn-change-arity
       impl: https://github.com/obi1kenobi/cargo-semver-checks/tree/v0.48.0/src/lints/method_parameter_count_changed.ron

Failed in:
  datafusion_common::FunctionalDependencies::join takes 3 parameters in /home/runner/work/datafusion/datafusion/target/semver-checks/git-apache_main/229699db0dd05312c530e37c144be4e87a6c9d34/datafusion/common/src/functional_dependencies.rs:331, but now takes 4 parameters in /home/runner/work/datafusion/datafusion/datafusion/common/src/functional_dependencies.rs:335

     Summary semver requires new major version: 1 major and 0 minor checks failed
    Finished [  70.278s] datafusion-common
    Building datafusion-expr v54.0.0 (current)
       Built [  28.425s] (current)
     Parsing datafusion-expr v54.0.0 (current)
      Parsed [   0.076s] (current)
    Building datafusion-expr v54.0.0 (baseline)
       Built [  27.486s] (baseline)
     Parsing datafusion-expr v54.0.0 (baseline)
      Parsed [   0.077s] (baseline)
    Checking datafusion-expr v54.0.0 -> v54.0.0 (no change; assume patch)
     Checked [   1.256s] 223 checks: 222 pass, 1 fail, 0 warn, 30 skip

--- failure function_parameter_count_changed: pub fn parameter count changed ---

Description:
A publicly-visible function now takes a different number of parameters.
        ref: https://doc.rust-lang.org/cargo/reference/semver.html#fn-change-arity
       impl: https://github.com/obi1kenobi/cargo-semver-checks/tree/v0.48.0/src/lints/function_parameter_count_changed.ron

Failed in:
  datafusion_expr::logical_plan::builder::build_join_schema now takes 4 parameters instead of 3, in /home/runner/work/datafusion/datafusion/datafusion/expr/src/logical_plan/builder.rs:1689
  datafusion_expr::builder::build_join_schema now takes 4 parameters instead of 3, in /home/runner/work/datafusion/datafusion/datafusion/expr/src/logical_plan/builder.rs:1689
  datafusion_expr::logical_plan::build_join_schema now takes 4 parameters instead of 3, in /home/runner/work/datafusion/datafusion/datafusion/expr/src/logical_plan/builder.rs:1689
  datafusion_expr::build_join_schema now takes 4 parameters instead of 3, in /home/runner/work/datafusion/datafusion/datafusion/expr/src/logical_plan/builder.rs:1689

     Summary semver requires new major version: 1 major and 0 minor checks failed
    Finished [  58.797s] datafusion-expr
    Building datafusion-optimizer v54.0.0 (current)
       Built [  26.805s] (current)
     Parsing datafusion-optimizer v54.0.0 (current)
      Parsed [   0.029s] (current)
    Building datafusion-optimizer v54.0.0 (baseline)
       Built [  26.178s] (baseline)
     Parsing datafusion-optimizer v54.0.0 (baseline)
      Parsed [   0.031s] (baseline)
    Checking datafusion-optimizer v54.0.0 -> v54.0.0 (no change; assume patch)
     Checked [   0.182s] 223 checks: 223 pass, 30 skip
     Summary no semver update required
    Finished [  54.386s] datafusion-optimizer
    Building datafusion-sql v54.0.0 (current)
       Built [  41.019s] (current)
     Parsing datafusion-sql v54.0.0 (current)
      Parsed [   0.031s] (current)
    Building datafusion-sql v54.0.0 (baseline)
       Built [  42.120s] (baseline)
     Parsing datafusion-sql v54.0.0 (baseline)
      Parsed [   0.033s] (baseline)
    Checking datafusion-sql v54.0.0 -> v54.0.0 (no change; assume patch)
     Checked [   0.214s] 223 checks: 223 pass, 30 skip
     Summary no semver update required
    Finished [  84.351s] datafusion-sql

@github-actions github-actions Bot added the auto detected api change label Jun 19, 2026