feat: support string to numeric coercion for arithmetic operators#23050
Open
dangdat1111 wants to merge 2 commits into
Open
feat: support string to numeric coercion for arithmetic operators#23050dangdat1111 wants to merge 2 commits into
dangdat1111 wants to merge 2 commits into
Conversation
## Which issue does this PR close? - Closes apache#23041. ## Rationale for this change Comparison operators already coerce a string operand to the other operand's numeric type (via `string_numeric_coercion`), so `col < '5'` works numerically. Arithmetic operators did not, so `1 + '1'` failed with "Cannot coerce arithmetic expression Int64 + Utf8 to valid types". This aligns arithmetic with comparison and with the documented non-ANSI mode behavior that allows implicit string-to-numeric casts. ## What changes are included in this PR? - In `BinaryTypeCoercer::signature_inner`, add a `string_numeric_coercion` fallback to the `Plus | Minus | Multiply | Divide | Modulo` branch. The string operand is coerced to the numeric type of the other operand (e.g. `Int64 + Utf8` -> both `Int64`, `Utf8 + Float64` -> both `Float64`). - `string + string` remains unsupported as the target type is ambiguous, and temporal/string pairs (e.g. `Timestamp + Utf8`) are unaffected since those types are not numeric. ## Are these changes tested? Yes. - Unit tests in `binary/tests/arithmetic.rs`: new `test_type_coercion_arithmetic_string_numeric` and an updated `test_coercion_error` (now uses `Boolean + Boolean`, since `Float32 + Utf8` is valid). - sqllogictests in `string_numeric_coercion.slt`: arithmetic with string literals/columns, result types, EXPLAIN, and runtime/plan-time errors. ## Are there any user-facing changes? Yes. Arithmetic expressions mixing a numeric and a string operand now plan and execute (the string is cast to the numeric type) instead of failing during planning. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
Which issue does this PR close?
Rationale for this change
Comparison operators already coerce a string operand to the other operand's numeric type (via
string_numeric_coercion), socol < '5'works numerically. Arithmetic operators did not, so1 + '1'failed during planning with:This change makes arithmetic consistent with comparison, and aligns the engine with its own documented non-ANSI mode behavior, which states that implicit casts between types are allowed (e.g. string to integer when possible).
What changes are included in this PR?
BinaryTypeCoercer::signature_inner, add astring_numeric_coercionfallback to thePlus | Minus | Multiply | Divide | Modulobranch. The string operand is coerced to the numeric type of the other operand (e.g.Int64 + Utf8-> bothInt64,Utf8 + Float64-> bothFloat64). The type coercion analyzer then inserts the cast on the string operand.string + string(e.g.'1' + '2') still errors because the target type is ambiguous (matches PostgreSQL).Timestamp + Utf8,Interval + Utf8) are unaffected, since those types are not numeric and sostring_numeric_coerciondoes not apply.Are these changes tested?
Yes.
datafusion/expr-common/src/type_coercion/binary/tests/arithmetic.rs: newtest_type_coercion_arithmetic_string_numeric(all 5 operators acrossUtf8/LargeUtf8/Utf8View, both operand orders, result-type checks, and theUtf8 + Utf8error case).test_coercion_errorwas updated to useBoolean + Boolean, sinceFloat32 + Utf8is now valid.datafusion/sqllogictest/test_files/string_numeric_coercion.slt: a new arithmetic section covering1 + '1','1' + 1, all 5 operators over an integer and a float column,arrow_typeofof the results, anEXPLAINshowing the string literal is cast to the numeric type, a runtime cast error for a non-numeric string, and the plan-time error for'1' + '2'.Are there any user-facing changes?
Yes. Arithmetic expressions that mix a numeric and a string operand now plan and execute (the string is cast to the numeric type) instead of failing during planning. There are no breaking changes to public APIs.
🤖 Generated with Claude Code