fix: avoid extraneous casts for equivalent nested types by feichai0017 · Pull Request #20945 · apache/datafusion

feichai0017 · 2026-03-14T14:38:29Z

Summary

This PR avoids inserting extraneous casts during function argument coercion when two nested types are structurally equivalent but differ only in nested field names or metadata.

Specifically, it:

- treats equivalent nested DataTypes as matching during UDF argument coercion
- avoids rewriting such arguments with unnecessary CASTs
- adds regression coverage in both datafusion-expr and datafusion-optimizer
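
To make the idea concrete, here is a minimal sketch of why strict equality rejects two list types that differ only in the name of the element field. The `Field` and `DataType` definitions are simplified stand-ins written for this illustration, not the arrow-rs types, and `list_of` and `same_layout` are hypothetical helper names.

```rust
// Simplified stand-ins for Arrow's `Field` and `DataType` (illustration only).
#[derive(Debug, Clone, PartialEq)]
pub struct Field {
    pub name: String,
    pub data_type: DataType,
    pub nullable: bool,
}

#[derive(Debug, Clone, PartialEq)]
pub enum DataType {
    Int64,
    List(Box<Field>),
}

/// Hypothetical helper: a nullable list whose element field is called `element_name`.
pub fn list_of(element_name: &str, element: DataType) -> DataType {
    DataType::List(Box::new(Field {
        name: element_name.to_string(),
        data_type: element,
        nullable: true,
    }))
}

/// Compares two types while ignoring the name of a list's element field.
pub fn same_layout(a: &DataType, b: &DataType) -> bool {
    match (a, b) {
        (DataType::List(x), DataType::List(y)) => {
            x.nullable == y.nullable && same_layout(&x.data_type, &y.data_type)
        }
        // Non-nested types fall back to strict equality.
        _ => a == b,
    }
}
```

With these definitions, `list_of("item", DataType::Int64)` and `list_of("element", DataType::Int64)` compare as unequal under `==`, while `same_layout` accepts the pair. That is the situation in which a CAST would add nothing.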

Closes #19943.

Tests

- cargo test -p datafusion-optimizer
- ./dev/rust_lint.sh
- cargo test -p datafusion-expr (currently fails on an existing snapshot mismatch in logical_plan::plan::tests::test_display_pg_json on the current main baseline; unrelated to this change)

Copilot

Pull request overview

This PR updates DataFusion’s function argument coercion to treat structurally equivalent nested DataTypes as matching (ignoring nested field names / metadata), preventing unnecessary CASTs from being inserted during analysis for UDF calls.

Changes:

- Use DataType::equals_datatype (via a helper) when checking whether current argument types already satisfy a function/UDF signature.
- Update coercion paths to avoid rewriting arguments when nested types are equivalent.
- Add regression tests in both datafusion-expr and datafusion-optimizer to ensure equivalent nested types don’t trigger casts.
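
As a rough sketch of the first change, the check below compares a candidate signature against the current argument types position by position and reports a match only when every position agrees. The function name and parameters mirror the `data_types_match` signature that appears in the diff, but the body, the toy `DataType` enum, and the per-type predicate `type_matches` are assumptions made for illustration, not the PR's code.

```rust
// Toy model of a data type; not the arrow-rs `DataType`.
#[derive(Debug, Clone, PartialEq)]
pub enum DataType {
    Int64,
    Utf8,
    /// A list described by (element field name, element type).
    List(String, Box<DataType>),
}

/// Hypothetical per-type predicate: list element names are ignored,
/// everything else must be strictly equal.
fn type_matches(valid: &DataType, current: &DataType) -> bool {
    match (valid, current) {
        (DataType::List(_, v), DataType::List(_, c)) => type_matches(v, c),
        _ => valid == current,
    }
}

/// Returns true when the current argument types already satisfy the
/// candidate signature, so no argument would need to be rewritten.
pub fn data_types_match(valid_types: &[DataType], current_types: &[DataType]) -> bool {
    valid_types.len() == current_types.len()
        && valid_types
            .iter()
            .zip(current_types)
            .all(|(v, c)| type_matches(v, c))
}
```

The length check matters because `zip` stops at the shorter slice. Without it, a shorter argument list could appear to match a longer signature.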

Reviewed changes

Copilot reviewed 2 out of 2 changed files in this pull request and generated no comments.

File	Description
`datafusion/expr/src/type_coercion/functions.rs`	Switches type matching from strict equality to `equals_datatype` for function/UDF argument matching and related coercion logic; adds a regression test.
`datafusion/optimizer/src/analyzer/type_coercion.rs`	Adds an analyzer regression test ensuring scalar UDF argument coercion doesn’t insert casts for equivalent nested list/struct types.

neilconway · 2026-03-15T16:59:32Z

datafusion/expr/src/type_coercion/functions.rs

    )
 }

+fn data_types_match(valid_types: &[DataType], current_types: &[DataType]) -> bool {


Seems like we aren't handling Map, Struct, or ListView -- is there a reason for that? In fact, the original bug report uses Map.

I wonder if we can simplify this to use equals_datatype from Arrow, as suggested by the bug reporter?

Thanks for pointing this out.

The original reproducer does go through a Map, but the actual mismatch at the coercion point is on the extracted map value (List<Struct<...>>), not on the Map type itself. That’s why I kept the fast-path relaxation narrow.

I did try a broader equals_datatype approach first, but it was too permissive in this path and regressed existing cases where runtime kernels still require exact type identity, especially around Struct. I agree ListView / LargeListView should be handled consistently with List / LargeList, and I’ve updated the matcher for that.

Ah, got it.

Just to help me understand, can you point at an SLT test (e.g., involving structs) that would regress if we used equals_datatype? Or if such an SLT test doesn't already exist, it would probably be a good idea to add one as a sanity check.

Yes, I locally verified that a broader equals_datatype-style matcher regresses existing SLTs.

In particular:

/datafusion/datafusion/sqllogictest/test_files/struct.slt
select [{a: 1, b: 2}, {b: 3, a: 4}];

/datafusion/datafusion/sqllogictest/test_files/spark/array/array.slt
SELECT array(arrow_cast(array(1,2), 'LargeList(Int64)'), array(3));

With the broader matching, both end up failing in array construction (MutableArrayData) because those paths still require exact runtime type identity. That was the main reason I kept this matcher narrower than equals_datatype, especially around Struct.
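
One way to see the hazard with structs is the toy comparison below. It models a check that ignores field names and compares struct fields by position only. The types and the `name_blind_eq` function are hypothetical and only illustrate the idea of a name-insensitive comparison; they are not Arrow's implementation.

```rust
// Simplified stand-ins for Arrow's `Field` and `DataType` (illustration only).
#[derive(Debug, Clone, PartialEq)]
pub struct Field {
    pub name: String,
    pub data_type: DataType,
    pub nullable: bool,
}

#[derive(Debug, Clone, PartialEq)]
pub enum DataType {
    Int64,
    Utf8,
    Struct(Vec<Field>),
}

/// Hypothetical helper: a nullable field with the given name and type.
pub fn field(name: &str, data_type: DataType) -> Field {
    Field { name: name.to_string(), data_type, nullable: true }
}

/// Name-blind comparison: struct fields are paired by position and
/// their names are never inspected.
pub fn name_blind_eq(a: &DataType, b: &DataType) -> bool {
    match (a, b) {
        (DataType::Struct(xs), DataType::Struct(ys)) => {
            xs.len() == ys.len()
                && xs.iter().zip(ys).all(|(x, y)| {
                    x.nullable == y.nullable && name_blind_eq(&x.data_type, &y.data_type)
                })
        }
        _ => a == b,
    }
}
```

For the struct types behind `{a: 1, b: 2}` and `{b: 3, a: 4}`, this check reports a match even though field `a` sits at a different position in each type. If the cast were skipped on that basis, values with two distinct struct types would reach code that expects a single exact type.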

I agree it would be useful to make that boundary explicit, so I can also add a focused sanity-check regression test in this PR.

neilconway

What about Struct(a: List(x: Int64)) vs Struct(a: List(y: Int64)) -- should we try to elide the cast in this case as well?

It might be helpful to add an SLT test to verify the end-user behavior that the extraneous casts are indeed omitted.

neilconway · 2026-03-16T16:11:46Z

datafusion/expr/src/type_coercion/functions.rs

    )
 }

+fn data_types_match(valid_types: &[DataType], current_types: &[DataType]) -> bool {


Thank you! That makes sense: the key point is that some Arrow kernels depend on struct field ordering, but the "field name" of a list has no influence on the representation of the data. Can we add a brief comment to data_types_match to explain the rationale for the kinda-structural-equality we are implementing?

It seems like Map has the same behavior as the List variants: the "field name" does not impact the representation of the data. Should we handle that as well, for completeness?

feichai0017 · 2026-03-17T02:55:16Z

What about Struct(a: List(x: Int64)) vs Struct(a: List(y: Int64)) -- should we try to elide the cast in this case as well?

It might be helpful to add an SLT test to verify the end-user behavior that the extraneous casts are indeed omitted.

Thanks, I updated the matcher to make that boundary explicit.

The fast-path match now:

- ignores wrapper field names for list-like types and maps, where the wrapper name does not affect the physical representation
- recurses through Struct, while still requiring struct field names/order/nullability to match exactly

That means cases like Struct(a: List(x: Int64)) vs Struct(a: List(y: Int64)) now avoid the extra cast, but broader Struct reordering still does not.
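
The boundary described above can be sketched roughly as follows. This is not the PR's implementation: `Field` and `DataType` are simplified stand-ins (the map variant omits details such as sortedness), `types_match`, `field`, and `list_of` are hypothetical names, and keeping the nullability check on wrapper fields is an assumption.

```rust
// Simplified stand-ins for Arrow's `Field` and `DataType` (illustration only).
#[derive(Debug, Clone, PartialEq)]
pub struct Field {
    pub name: String,
    pub data_type: DataType,
    pub nullable: bool,
}

#[derive(Debug, Clone, PartialEq)]
pub enum DataType {
    Int64,
    Utf8,
    List(Box<Field>),
    Map(Box<Field>),
    Struct(Vec<Field>),
}

/// Hypothetical helper: a nullable field with the given name and type.
pub fn field(name: &str, data_type: DataType) -> Field {
    Field { name: name.to_string(), data_type, nullable: true }
}

/// Hypothetical helper: a list whose element field is called `element_name`.
pub fn list_of(element_name: &str, element: DataType) -> DataType {
    DataType::List(Box::new(field(element_name, element)))
}

pub fn types_match(a: &DataType, b: &DataType) -> bool {
    match (a, b) {
        // Wrapper field names of lists and maps are ignored.
        (DataType::List(x), DataType::List(y)) | (DataType::Map(x), DataType::Map(y)) => {
            x.nullable == y.nullable && types_match(&x.data_type, &y.data_type)
        }
        // Struct fields must agree on name, position, and nullability;
        // only their nested types are compared recursively.
        (DataType::Struct(xs), DataType::Struct(ys)) => {
            xs.len() == ys.len()
                && xs.iter().zip(ys).all(|(x, y)| {
                    x.name == y.name
                        && x.nullable == y.nullable
                        && types_match(&x.data_type, &y.data_type)
                })
        }
        _ => a == b,
    }
}
```

Under this sketch, Struct(a: List(x: Int64)) and Struct(a: List(y: Int64)) match, so no cast would be needed, while two structs that list the same fields in a different order do not match and would still go through coercion.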

I also added:

- analyzer regressions for both Struct(...List(...)) and Map(...)
- an EXPLAIN VERBOSE sqllogictest in spark/array/array.slt to verify the user-visible behavior that no extra cast is introduced during type coercion

fix: avoid casts for equivalent nested types

b05d5e9

Copilot AI review requested due to automatic review settings March 14, 2026 14:38

github-actions bot added the logical-expr (Logical plan and expressions) and optimizer (Optimizer rules) labels Mar 14, 2026

Copilot started reviewing on behalf of feichai0017 March 14, 2026 14:38

Copilot AI reviewed Mar 14, 2026

feichai0017 added 2 commits March 15, 2026 02:15

fix: narrow nested type matching

4af3b3e

style: simplify nested type matching

9bec12c

neilconway reviewed Mar 15, 2026

feichai0017 added 2 commits March 16, 2026 12:06

Merge branch 'main' into fix/extraneous-casts-nested-types

dc8bfe6

fix: support list view nested type matching

1c5a339

neilconway reviewed Mar 16, 2026

fix: relax nested type matching

538e515

github-actions bot added the sqllogictest SQL Logic Tests (.slt) label Mar 17, 2026

Merge branch 'main' into fix/extraneous-casts-nested-types

8b2c60e

feichai0017 wants to merge 7 commits into apache:main from feichai0017:fix/extraneous-casts-nested-types
