
[SPARK-57211][SQL] Cast strings to TIMESTAMP_NTZ(p)/TIMESTAMP_LTZ(p) #56288

Open
MaxGekk wants to merge 2 commits into apache:master from MaxGekk:nanos-cast-string

MaxGekk commented Jun 2, 2026

What changes were proposed in this pull request?

This PR wires Cast to support casting StringType to the nanosecond-capable timestamp types TimestampNTZNanosType(p) and TimestampLTZNanosType(p) with fractional-seconds precision p in [7, 9], on both the interpreted and codegen paths and across all eval modes (LEGACY, ANSI, TRY):

  • CAST(<string> AS TIMESTAMP_NTZ(p))
  • CAST(<string> AS TIMESTAMP_LTZ(p))

Concretely, in Cast.scala:

  • Add StringType -> TimestampNTZNanosType(p) / TimestampLTZNanosType(p) arms to canCast and canAnsiCast. Try-cast is covered automatically (canTryCast delegates to canAnsiCast, and canUseLegacyCastForTryCast already matches (StringType, DatetimeType), which the nanos types extend).
  • Add (StringType, TimestampLTZNanosType) to Cast.needsTimeZone. The string-to-NTZ cast is zone-independent, mirroring the micro TIMESTAMP_NTZ cast, so it needs no entry.
  • Add interpreted castToTimestampLTZNanos / castToTimestampNTZNanos and matching codegen, dispatched from castInternal / nullSafeCastFunction with the precision taken from the target type. The result is a TimestampNanosVal (or null in legacy/try mode on malformed input).
  • The NTZ cast adopts allowTimeZone = true to match the existing micro TIMESTAMP_NTZ string cast, and resolves the TODO(SPARK-57032) left on stringToTimestampNTZNanosAnsi.
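
The eval-mode contract described in the bullets above (a value on success, an error in ANSI mode, NULL in LEGACY and TRY modes on malformed input) can be sketched outside Spark. This is a minimal illustration, not the code in Cast.scala: the function name `cast_string`, the exception class `CastInvalidInput`, and the simplified timestamp grammar are all assumptions made for the example.

```python
import re
from typing import Optional, Tuple

# Hypothetical, simplified grammar: "YYYY-MM-DD HH:MM:SS" with an optional
# fraction of 1 to 9 digits. The real Spark parser accepts far more forms.
_TS = re.compile(
    r"(\d{4}-\d{2}-\d{2}[ T]\d{2}:\d{2}:\d{2})(?:\.(\d{1,9}))?", re.ASCII
)


class CastInvalidInput(ValueError):
    """Stand-in for the CAST_INVALID_INPUT error condition (name is illustrative)."""


def cast_string(s: str, mode: str) -> Optional[Tuple[str, int]]:
    """Return (seconds part, nanos of second), or apply the eval-mode policy."""
    if mode not in ("LEGACY", "ANSI", "TRY"):
        raise ValueError(f"unknown eval mode: {mode}")
    m = _TS.fullmatch(s.strip())
    if m is not None:
        # Right-pad the fraction to 9 digits so "1234567" means 123456700 ns.
        nanos = int((m.group(2) or "").ljust(9, "0"))
        return m.group(1), nanos
    if mode == "ANSI":
        # ANSI mode surfaces malformed input as an error.
        raise CastInvalidInput(f"cannot cast {s!r} to a timestamp")
    # LEGACY and TRY modes map malformed input to NULL.
    return None
```

For example, `cast_string("garbage", "TRY")` returns `None`, while the same input in `"ANSI"` mode raises.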

This reuses the parse entry points added in SPARK-57032 on SparkDateTimeUtils (inherited by DateTimeUtils), which already return a normalized TimestampNanosVal and apply per-precision truncation, so no separate normalization module is required for the string path.
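
To make "per-precision truncation" concrete, here is a small sketch of the arithmetic, assuming the fraction is held as nanoseconds of the second. The helper names `parse_fraction` and `truncate_fraction` are hypothetical and do not correspond to methods on SparkDateTimeUtils; the sketch also says nothing about how TimestampNanosVal stores its fields.

```python
def parse_fraction(digits: str) -> int:
    """Convert the digits after the decimal point to nanoseconds of the second."""
    if not (digits.isascii() and digits.isdigit()) or len(digits) > 9:
        raise ValueError(f"bad fraction: {digits!r}")
    # Right-pad to 9 digits: "5" means 0.5 s, i.e. 500_000_000 ns.
    return int(digits.ljust(9, "0"))


def truncate_fraction(nanos_of_second: int, precision: int) -> int:
    """Drop fractional digits beyond `precision` (truncate, never round)."""
    if not 7 <= precision <= 9:
        raise ValueError(f"precision must be in [7, 9], got {precision}")
    if not 0 <= nanos_of_second < 1_000_000_000:
        raise ValueError(f"nanos_of_second out of range: {nanos_of_second}")
    step = 10 ** (9 - precision)  # 100 for p=7, 10 for p=8, 1 for p=9
    return nanos_of_second - nanos_of_second % step
```

With a 9-digit fraction `123456789`, precision 7 keeps `123456700`, precision 8 keeps `123456780`, and precision 9 keeps every digit.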

Existing preview gating is unchanged: Cast.checkInputDataTypes calls TypeUtils.failUnsupportedDataType, which throws FEATURE_NOT_ENABLED when spark.sql.timestampNanosTypes.enabled is off.
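
The gating behavior can be pictured as a check that runs before any cast logic. The sketch below is only an illustration under stated assumptions: `check_target_type`, `FeatureNotEnabled`, the dict-based configuration, and the type-name pattern are invented for the example, and only the flag name and the FEATURE_NOT_ENABLED condition come from this PR.

```python
import re

FLAG = "spark.sql.timestampNanosTypes.enabled"

# Assumption for this sketch: a nanos type is spelled TIMESTAMP_NTZ(p) or
# TIMESTAMP_LTZ(p) with p in [7, 9].
_NANOS_TYPE = re.compile(r"TIMESTAMP_(?:NTZ|LTZ)\(([7-9])\)")


class FeatureNotEnabled(Exception):
    """Stand-in for the FEATURE_NOT_ENABLED error condition (name is illustrative)."""


def check_target_type(type_name: str, conf: dict) -> None:
    """Raise if a nanosecond timestamp type is requested while the flag is off."""
    is_nanos = _NANOS_TYPE.fullmatch(type_name.strip().upper()) is not None
    enabled = str(conf.get(FLAG, "false")).lower() == "true"
    if is_nanos and not enabled:
        raise FeatureNotEnabled(f"{type_name} requires {FLAG}=true")
```

Calling `check_target_type("TIMESTAMP_NTZ(9)", {})` raises, while the same call with `{FLAG: "true"}` returns normally.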

Why are the changes needed?

This is a sub-task of SPARK-56822 (SPIP: Timestamps with nanosecond precision).

The logical types, the TIMESTAMP_NTZ(p) / TIMESTAMP_LTZ(p) SQL syntax, the physical row value TimestampNanosVal, and the string-to-nanos parse helpers all exist, but Cast had no arms for the nanos types. As a result, CAST(s AS TIMESTAMP_NTZ(9)) failed type-checking with CAST_WITHOUT_SUGGESTION even when the preview flag spark.sql.timestampNanosTypes.enabled was on. String ingestion is the most common entry point for these types, and supporting it unblocks typed literals, filters, and CTAS once coercion lands.

Does this PR introduce any user-facing change?

Yes, but only when the preview flag spark.sql.timestampNanosTypes.enabled is enabled (it defaults to off in production). With the flag on, CAST(<string> AS TIMESTAMP_NTZ(p)) and CAST(<string> AS TIMESTAMP_LTZ(p)) for p in [7, 9] now produce correct nanosecond values in LEGACY, ANSI, and TRY modes; previously they failed type-checking. With the flag off, the behavior is unchanged (FEATURE_NOT_ENABLED). Existing microsecond timestamp string casts are unchanged.

How was this patch tested?

  • CastSuiteBase: success cases for both types over p in [7, 9] and a corpus of strings with 7 to 9 fractional digits. LTZ cases are parameterized over time zones; NTZ cases are zone-independent (including a discarded zone suffix). A flag-off guard asserts FEATURE_NOT_ENABLED.
  • CastWithAnsiOnSuite: malformed-input parse errors (DateTimeException / CAST_INVALID_INPUT).
  • CastWithAnsiOffSuite / TryCastSuite: malformed input returns NULL.
  • Golden-file checks added to cast.sql (regenerated with SPARK_GENERATE_GOLDEN_FILES=1): positive cases assert the result type via typeof (the reverse direction, nanos -> string rendering, is not wired yet and is tracked under SPARK-57162); negative cases exercise the ANSI parse-error path (and NULL in non-ANSI mode).
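
The zone property that the CastSuiteBase cases check (LTZ results depend on the session time zone, NTZ results do not, and an NTZ zone suffix is discarded) can be illustrated with a small model. Everything here is an assumption made for illustration: the names `to_ntz` and `to_ltz`, the fixed-offset zone suffix grammar, the (epoch seconds, nanos) result shape, and the rule that an explicit suffix overrides the session zone for LTZ.

```python
import re
from datetime import datetime, timedelta
from typing import Optional, Tuple

_PATTERN = re.compile(
    r"(\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2})(?:\.(\d{1,9}))?([+-]\d{2}:\d{2})?",
    re.ASCII,
)
_EPOCH = datetime(1970, 1, 1)


def _split(s: str) -> Tuple[int, int, Optional[timedelta]]:
    """Return (local seconds since epoch, nanos of second, explicit offset or None)."""
    m = _PATTERN.fullmatch(s.strip())
    if m is None:
        raise ValueError(f"malformed timestamp: {s!r}")
    local = datetime.strptime(m.group(1), "%Y-%m-%d %H:%M:%S")
    local_secs = (local - _EPOCH) // timedelta(seconds=1)
    nanos = int((m.group(2) or "").ljust(9, "0"))
    offset = None
    if m.group(3) is not None:
        sign = -1 if m.group(3)[0] == "-" else 1
        hours, minutes = int(m.group(3)[1:3]), int(m.group(3)[4:6])
        offset = sign * timedelta(hours=hours, minutes=minutes)
    return local_secs, nanos, offset


def to_ntz(s: str) -> Tuple[int, int]:
    """NTZ keeps the wall-clock fields and discards any zone suffix."""
    local_secs, nanos, _ = _split(s)
    return local_secs, nanos


def to_ltz(s: str, session_offset: timedelta) -> Tuple[int, int]:
    """LTZ resolves to an instant, using the session offset when none is given."""
    local_secs, nanos, offset = _split(s)
    effective = session_offset if offset is None else offset
    return local_secs - effective // timedelta(seconds=1), nanos
```

Under this model, `to_ntz` returns the same value with or without a `+02:00` suffix, while `to_ltz` on a suffix-free string shifts by 7200 seconds between a UTC session and a UTC+2 session.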

Verified locally:

$ build/sbt 'catalyst/testOnly *CastSuite *CastWithAnsiOnSuite *CastWithAnsiOffSuite *TryCastSuite'
$ build/sbt 'sql/testOnly org.apache.spark.sql.SQLQueryTestSuite -- -z cast.sql'
$ ./dev/scalastyle

Was this patch authored or co-authored using generative AI tooling?

Generated-by: Cursor (Claude Opus 4.8)

MaxGekk added 2 commits June 2, 2026 21:59
Wire Cast to support CAST(<string> AS TIMESTAMP_NTZ(p)) and
CAST(<string> AS TIMESTAMP_LTZ(p)) for fractional-seconds precision p in
[7, 9], on both the interpreted and codegen paths and across LEGACY, ANSI
and TRY eval modes. Reuses the SPARK-57032 string->nanos parse helpers on
SparkDateTimeUtils, which already return a normalized TimestampNanosVal and
apply per-precision truncation.

- Add StringType -> Timestamp{NTZ,LTZ}NanosType arms to canCast/canAnsiCast.
- Add (StringType, TimestampLTZNanosType) to Cast.needsTimeZone (NTZ string
  is zone-independent, mirroring micro TIMESTAMP_NTZ).
- Add interpreted castToTimestamp{LTZ,NTZ}Nanos and matching codegen,
  dispatched with the precision taken from the target type. NTZ adopts
  allowTimeZone = true to match the micro TIMESTAMP_NTZ string cast.

Tests cover success cases over p in [7, 9], ANSI parse errors, LEGACY/TRY
null on malformed input, and a flag-off FEATURE_NOT_ENABLED guard.
…estamps

Add end-to-end golden-file coverage to cast.sql for casting strings to
TIMESTAMP_NTZ(p)/TIMESTAMP_LTZ(p), mirroring the existing timestamp,
timestamp_ntz and TIME cast checks:

- Positive cases assert the result type via typeof (the reverse direction,
  nanos -> string rendering, is not wired yet; tracked under SPARK-57162).
- Negative cases exercise the parse-error path: ANSI mode throws
  CAST_INVALID_INPUT, non-ANSI returns NULL.

Golden files regenerated with SPARK_GENERATE_GOLDEN_FILES=1.