[SPARK-57211][SQL] Cast strings to TIMESTAMP_NTZ(p)/TIMESTAMP_LTZ(p) #56288
Open
MaxGekk wants to merge 2 commits into
Wire Cast to support CAST(<string> AS TIMESTAMP_NTZ(p)) and
CAST(<string> AS TIMESTAMP_LTZ(p)) for fractional-seconds precision p in
[7, 9], on both the interpreted and codegen paths and across LEGACY, ANSI
and TRY eval modes. Reuses the SPARK-57032 string->nanos parse helpers on
SparkDateTimeUtils, which already return a normalized TimestampNanosVal and
apply per-precision truncation.
- Add StringType -> Timestamp{NTZ,LTZ}NanosType arms to canCast/canAnsiCast.
- Add (StringType, TimestampLTZNanosType) to Cast.needsTimeZone (NTZ string
is zone-independent, mirroring micro TIMESTAMP_NTZ).
- Add interpreted castToTimestamp{LTZ,NTZ}Nanos and matching codegen,
dispatched with the precision taken from the target type. NTZ adopts
allowTimeZone = true to match the micro TIMESTAMP_NTZ string cast.
Tests cover success cases over p in [7, 9], ANSI parse errors, LEGACY/TRY
null on malformed input, and a flag-off FEATURE_NOT_ENABLED guard.
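To make the per-precision truncation concrete, here is a minimal, self-contained sketch. It is not the SparkDateTimeUtils implementation: the object NanosTruncationSketch and the method truncateFraction are hypothetical names introduced only for illustration, and the sketch assumes that digits beyond precision p are dropped (truncated) rather than rounded, as the description above states.

```scala
object NanosTruncationSketch {
  // Hypothetical helper (not a Spark API): converts the fractional-second
  // digits of a timestamp string into nanoseconds-of-second, keeping only
  // the first p fractional digits.
  def truncateFraction(fractionDigits: String, p: Int): Int = {
    require(p >= 7 && p <= 9, s"precision must be in [7, 9], got $p")
    require(
      fractionDigits.nonEmpty && fractionDigits.length <= 9 &&
        fractionDigits.forall(_.isDigit),
      s"expected 1 to 9 decimal digits, got '$fractionDigits'")

    // Right-pad to 9 digits so the value is expressed in nanoseconds.
    val nanos = fractionDigits.padTo(9, '0').toInt

    // Size of one unit at precision p, in nanoseconds.
    val unit = p match {
      case 7 => 100
      case 8 => 10
      case _ => 1
    }

    // Integer division drops the digits beyond precision p (no rounding).
    nanos / unit * unit
  }
}
```

For example, under these assumptions the fraction 123456789 becomes 123456700 at p = 7, 123456780 at p = 8, and stays 123456789 at p = 9.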
…estamps
Add end-to-end golden-file coverage to cast.sql for casting strings to TIMESTAMP_NTZ(p)/TIMESTAMP_LTZ(p), mirroring the existing timestamp, timestamp_ntz and TIME cast checks:
- Positive cases assert the result type via typeof (the reverse direction, nanos -> string rendering, is not wired yet; tracked under SPARK-57162).
- Negative cases exercise the parse-error path: ANSI mode throws CAST_INVALID_INPUT, non-ANSI returns NULL.
Golden files regenerated with SPARK_GENERATE_GOLDEN_FILES=1.
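The difference between the eval modes on malformed input can be modelled with a small standalone sketch. It does not use Spark: EvalModeSketch, its mode objects, and the cast method are hypothetical names, java.time.LocalDateTime stands in for the real nanos parser, and None stands in for SQL NULL. The only behaviour taken from the description is that ANSI mode raises an error while the LEGACY and TRY modes return NULL.

```scala
import java.time.{DateTimeException, LocalDateTime}

object EvalModeSketch {
  // Hypothetical stand-ins for the three eval modes named in the PR.
  sealed trait Mode
  case object LegacyMode extends Mode
  case object AnsiMode extends Mode
  case object TryMode extends Mode

  // Stand-in parser (an assumption, not Spark's): accepts
  // "yyyy-MM-dd HH:mm:ss[.fraction]" with up to 9 fractional digits and
  // returns None on malformed input.
  private def parse(s: String): Option[LocalDateTime] =
    scala.util.Try(LocalDateTime.parse(s.trim.replace(' ', 'T'))).toOption

  // ANSI mode surfaces the parse failure as an exception; LEGACY and TRY
  // swallow it and yield None, which models SQL NULL.
  def cast(s: String, mode: Mode): Option[LocalDateTime] =
    parse(s) match {
      case parsed @ Some(_) => parsed
      case None =>
        mode match {
          case AnsiMode =>
            throw new DateTimeException(s"CAST_INVALID_INPUT: cannot cast '$s'")
          case LegacyMode | TryMode => None
        }
    }
}
```

A well-formed string yields the same value in every mode; only the handling of malformed input differs.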
What changes were proposed in this pull request?
This PR wires Cast to support casting StringType to the nanosecond-capable timestamp types TimestampNTZNanosType(p) and TimestampLTZNanosType(p) with fractional-seconds precision p in [7, 9], on both the interpreted and codegen paths and across all eval modes (LEGACY, ANSI, TRY):
- CAST(<string> AS TIMESTAMP_NTZ(p))
- CAST(<string> AS TIMESTAMP_LTZ(p))
Concretely, in Cast.scala:
- Add StringType -> TimestampNTZNanosType(p)/TimestampLTZNanosType(p) arms to canCast and canAnsiCast. Try-cast is covered automatically (canTryCast delegates to canAnsiCast, and canUseLegacyCastForTryCast already matches (StringType, DatetimeType), which the nanos types extend).
- Add (StringType, TimestampLTZNanosType) to Cast.needsTimeZone. The NTZ string cast is zone-independent, mirroring the micro TIMESTAMP_NTZ cast.
- Add interpreted castToTimestampLTZNanos/castToTimestampNTZNanos and matching codegen, dispatched from castInternal/nullSafeCastFunction with the precision taken from the target type. The result is a TimestampNanosVal (or null in legacy/try mode on malformed input).
- NTZ adopts allowTimeZone = true to match the existing micro TIMESTAMP_NTZ string cast, and resolves the TODO(SPARK-57032) left on stringToTimestampNTZNanosAnsi.
This reuses the parse entry points added in SPARK-57032 on SparkDateTimeUtils (inherited by DateTimeUtils), which already return a normalized TimestampNanosVal and apply per-precision truncation, so no separate normalization module is required for the string path.
Existing preview gating is unchanged: Cast.checkInputDataTypes calls TypeUtils.failUnsupportedDataType, which throws FEATURE_NOT_ENABLED when spark.sql.timestampNanosTypes.enabled is off.
Why are the changes needed?
This is a sub-task of SPARK-56822 (SPIP: Timestamps with nanosecond precision).
The logical types, the TIMESTAMP_NTZ(p)/TIMESTAMP_LTZ(p) SQL syntax, the physical row value TimestampNanosVal, and the string-to-nanos parse helpers all exist, but Cast had no arms for the nanos types. As a result, CAST(s AS TIMESTAMP_NTZ(9)) failed type-check with CAST_WITHOUT_SUGGESTION even when the preview flag spark.sql.timestampNanosTypes.enabled was on. String ingestion is the most common entry point for these types and unblocks typed literals, filters, and CTAS once coercion lands.
Does this PR introduce any user-facing change?
Yes, but only when the preview flag spark.sql.timestampNanosTypes.enabled is enabled (it defaults to off in production). With the flag on, CAST(<string> AS TIMESTAMP_NTZ(p)) and CAST(<string> AS TIMESTAMP_LTZ(p)) for p in [7, 9] now produce correct nanosecond values in LEGACY, ANSI, and TRY modes; previously they failed type-checking. With the flag off, the behavior is unchanged (FEATURE_NOT_ENABLED). Existing microsecond timestamp string casts are unchanged.
How was this patch tested?
- CastSuiteBase: success cases for both types over p in [7, 9] and a 7-9 digit fractional corpus; LTZ parameterized over time zones, NTZ zone-independent (including a discarded zone suffix). Plus a flag-off guard asserting FEATURE_NOT_ENABLED.
- CastWithAnsiOnSuite: malformed-input parse errors (DateTimeException/CAST_INVALID_INPUT).
- CastWithAnsiOffSuite/TryCastSuite: malformed input returns NULL.
- cast.sql (regenerated with SPARK_GENERATE_GOLDEN_FILES=1): positive cases assert the result type via typeof (the reverse direction, nanos -> string rendering, is not wired yet and is tracked under SPARK-57162); negative cases exercise the ANSI parse-error path (and NULL in non-ANSI mode).
Verified locally:
Was this patch authored or co-authored using generative AI tooling?
Generated-by: Cursor (Claude Opus 4.8)