
[SPARK-57285][SQL] Route nanosecond timestamp cast-to-string through the Types Framework #56355

Open
MaxGekk wants to merge 6 commits into apache:master from MaxGekk:nanos-typeframework-formatter

MaxGekk commented Jun 6, 2026

What changes were proposed in this pull request?

This PR makes the Types Framework (TypeApiOps) the single integration point for nanosecond timestamp CAST(... AS STRING), for both the interpreted and codegen paths.

Specifically:

  • TypeApiOps.apply gains a by-name zoneId parameter that defaults to the session-local time zone config (SqlApiConf.get.sessionLocalTimeZone) and is threaded into the TIMESTAMP_LTZ nanos ops. It is by-name so the zone is forced only when the LTZ ops is actually constructed: zone-independent (TimeType, TIMESTAMP_NTZ nanos) and unsupported types never evaluate it, which matters because a CAST's zone is unresolved (None.get) until a time zone is assigned.
  • TimestampLTZNanosTypeApiOps now carries its ZoneId as a required constructor parameter and holds the fraction formatter in a @transient private lazy val, so the formatter is built once per ops instance (once per cast, per task) rather than per row. TimestampNTZNanosTypeApiOps stays zone-independent (UTC).
  • ToStringBase no longer bypasses or special-cases the framework: both the interpreted and codegen paths dispatch uniformly through TypeApiOps(from, zoneId). CAST passes its resolved zone; zone-less callers (EXPLAIN, SQL-literal toSQLValue, Row.jsonValue) accept the session-zone default.
  • The now-unused UNSUPPORTED_FEATURE.TIMESTAMP_NANOS_TO_STRING error condition and its DataTypeErrors helper are removed.
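
The by-name zone parameter described in the first bullet can be illustrated with a minimal sketch. All names below (`DemoType`, `ByNameZoneLookup`, `sessionZone`, and so on) are hypothetical stand-ins, not the actual Spark classes; the only point is that a by-name argument, including a by-name default, is evaluated only where the parameter is referenced.

```scala
import java.time.ZoneId

// Hypothetical stand-ins for the data types; not Spark's real classes.
sealed trait DemoType
case object DemoTime extends DemoType               // zone-independent
case object DemoTimestampLtzNanos extends DemoType  // zone-dependent
case object DemoUnsupported extends DemoType        // no ops available

// Hypothetical ops that carries its zone as a required constructor parameter.
final class DemoLtzOps(val zoneId: ZoneId)

object ByNameZoneLookup {
  // Counts how often the default zone expression is actually evaluated.
  var defaultZoneEvaluations = 0

  // Stand-in (assumption) for reading the session-local time zone from config.
  def sessionZone: ZoneId = {
    defaultZoneEvaluations += 1
    ZoneId.of("UTC")
  }

  // `zoneId` is by-name: its expression runs only in the branch that uses it.
  def lookup(dt: DemoType, zoneId: => ZoneId = sessionZone): Option[AnyRef] = dt match {
    case DemoTimestampLtzNanos => Some(new DemoLtzOps(zoneId)) // forces the zone here
    case DemoTime              => Some("zone-independent ops") // never touches zoneId
    case DemoUnsupported       => None                         // never touches zoneId
  }
}

object ByNameZoneDemo {
  def main(args: Array[String]): Unit = {
    // Models a cast whose zone has not been assigned yet.
    val unresolved: Option[ZoneId] = None
    // Safe: `unresolved.get` is never evaluated for these types.
    println(ByNameZoneLookup.lookup(DemoTime, unresolved.get).isDefined)
    println(ByNameZoneLookup.lookup(DemoUnsupported, unresolved.get).isEmpty)
    // Only the zone-dependent type forces the (default) zone.
    println(ByNameZoneLookup.lookup(DemoTimestampLtzNanos).isDefined)
  }
}
```

With a by-value parameter, the first two calls in `main` would throw before `lookup` even ran, because `unresolved.get` would be evaluated at the call site.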

The microsecond timestamp types (TIMESTAMP / TIMESTAMP_NTZ) remain handled inline in ToStringBase and are out of scope.

Why are the changes needed?

SPARK-57256 implemented nanosecond cast-to-string inline in ToStringBase, deliberately bypassing the framework because the zone-less TypeApiOps.format(v) cannot render LTZ in the session time zone. That left nanos cast-to-string as a one-off integration outside the framework, inconsistent with the SPIP direction (SPARK-56822) of wiring the new types through the centralized TypeOps / TypeApiOps. This PR closes that gap.

Does this PR introduce any user-facing change?

Yes. Previously, rendering a TIMESTAMP_LTZ nanosecond value to string without an explicit time zone (EXPLAIN of a literal, SQL-literal toSQLValue, Row.jsonValue) raised UNSUPPORTED_FEATURE.TIMESTAMP_NANOS_TO_STRING. Now these callers render the value in the session-local time zone (spark.sql.session.timeZone), consistent with how CAST(... AS STRING) already rendered LTZ nanos. The UNSUPPORTED_FEATURE.TIMESTAMP_NANOS_TO_STRING error condition is removed. The CAST(... AS STRING) output itself is unchanged.

How was this patch tested?

  • Updated TimestampNanosTypeOpsSuite to cover the new behavior: NTZ renders zone-independently, LTZ renders in the zone carried by the ops instance, and zone-less LTZ now renders in the session-local time zone (instead of raising), exercising precision flooring.
  • Existing CastWithAnsiOnSuite, CastWithAnsiOffSuite, ToPrettyStringSuite, and TimestampNanosRowSuite stay green (275 tests pass), and SparkThrowableSuite (33 tests) confirms the removed error condition leaves the error framework consistent.

Was this patch authored or co-authored using generative AI tooling?

Generated-by: Cursor (Claude Opus 4.8)

…the Types Framework

### What changes were proposed in this pull request?

This PR makes the Types Framework (`TypeApiOps`) the single integration point for
nanosecond timestamp `CAST(... AS STRING)`, for both the interpreted and codegen paths,
with no change to the rendered string.

Specifically:

- `TypeApiOps` gains a zone-aware formatting hook: `format(v, zoneId)` and
  `formatUTF8(v, zoneId)`, both defaulting to the existing zone-less `format(v)` so
  zone-independent framework types (e.g. `TimeType`) are unaffected.
- `TimestampNTZNanosTypeApiOps` / `TimestampLTZNanosTypeApiOps` implement the hook:
  NTZ is zone-independent (UTC `formatWithoutTimeZoneNanos`); LTZ renders in the session
  zone via `formatNanos`. The zone-less callers (EXPLAIN, SQL-literal `toSQLValue`) now
  format NTZ directly, while LTZ without a session zone keeps raising
  `UNSUPPORTED_FEATURE.TIMESTAMP_NANOS_TO_STRING`.
- `ToStringBase` no longer bypasses the framework: the interpreted path threads the
  session `zoneId` into `formatUTF8(v, zoneId)` for the nanos types, and the codegen path
  emits a runtime call into the ops reference object instead of inlining
  `formatNanos` / `formatWithoutTimeZoneNanos`.
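
The zone-aware hook described in this commit message can be sketched roughly as below. The trait and object names are hypothetical, `java.time.format.DateTimeFormatter` stands in for Spark's formatters, and the output pattern is an assumption chosen for illustration; the sketch shows only the "overload defaults to the zone-less method" shape.

```scala
import java.time.{Instant, ZoneId}
import java.time.format.DateTimeFormatter

// Hypothetical trait; mirrors only the shape of the hook, not Spark's API.
trait DemoApiOps {
  // Existing zone-less entry point.
  def format(v: Any): String
  // Zone-aware hook: by default it ignores the zone and delegates.
  def format(v: Any, zoneId: ZoneId): String = format(v)
}

// Zone-independent type: implements only the zone-less method.
object DemoTimeOps extends DemoApiOps {
  def format(v: Any): String = s"time($v)"
}

// Zone-dependent type: overrides the hook and rejects zone-less calls.
object DemoLtzHookOps extends DemoApiOps {
  private val pattern = "yyyy-MM-dd HH:mm:ss" // illustrative pattern (assumption)

  def format(v: Any): String =
    throw new UnsupportedOperationException("LTZ rendering needs a time zone")

  override def format(v: Any, zoneId: ZoneId): String = v match {
    case i: Instant => DateTimeFormatter.ofPattern(pattern).withZone(zoneId).format(i)
    case other      => throw new IllegalArgumentException(s"Unexpected value: $other")
  }
}
```

A zone-independent caller such as `DemoTimeOps.format("12:00", ZoneId.of("UTC"))` falls back to the zone-less method, while `DemoLtzHookOps` renders the same instant differently per zone.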

### Why are the changes needed?

SPARK-57256 implemented nanosecond cast-to-string inline in `ToStringBase`, deliberately
bypassing the framework because the zone-less `TypeApiOps.format(v)` cannot render LTZ in
the session time zone. That left nanos cast-to-string as a one-off outside the framework,
inconsistent with the SPIP direction (SPARK-56822) of wiring the new types through the
centralized `TypeOps` / `TypeApiOps`. This PR closes that gap.

### Does this PR introduce _any_ user-facing change?

No. This is a refactor; the rendered string output is unchanged (zone-aware LTZ,
zone-independent NTZ, precision flooring, trailing-zero trimming). NTZ `EXPLAIN` /
SQL-literal rendering now succeeds instead of raising, which was previously unreachable
zone-less behavior for an internal type.

### How was this patch tested?

- Updated `TimestampNanosTypeOpsSuite` to cover the zone-aware hook (NTZ renders
  zone-independently, LTZ renders in the session zone and still raises when zone-less).
- Existing `CastWithAnsiOnSuite`, `CastWithAnsiOffSuite`, `ToPrettyStringSuite`, and
  `TimestampNanosRowSuite` stay green unchanged (275 tests pass).

### Was this patch authored or co-authored using generative AI tooling?

Generated-by: Cursor (Claude Opus 4.8)
MaxGekk changed the title from "[SPARK-57285][SQL] Route nanosecond timestamp cast-to-string through the Types Framework" to "[WIP][SPARK-57285][SQL] Route nanosecond timestamp cast-to-string through the Types Framework" on Jun 6, 2026
MaxGekk added 2 commits June 6, 2026 18:17
The fraction TimestampFormatter has its own internal cache, so the manual
per-zone formatter caching in TimestampLTZNanosTypeApiOps was redundant.
Build the formatter per call instead.

Drop the dedicated nanosecond-timestamp branch in ToStringBase.castToString:
all framework types now flow through the single zone-aware dispatch
(ops.formatUTF8(v, zoneId)). Zone-independent types (TimeType, TIMESTAMP_NTZ
nanos) ignore the zone; only TIMESTAMP_LTZ nanos renders in it.

The TimeType interpreted-vs-codegen consistency test previously left the cast
unresolved; since cast-to-string now evaluates the session zone, resolve a time
zone in that test like a real cast does, mirroring why the microsecond
TimestampType has no such consistency test.

MaxGekk commented Jun 7, 2026

@davidm-db @dejankrak-db @stevomitric Could you review this PR, please?

MaxGekk changed the title from "[WIP][SPARK-57285][SQL] Route nanosecond timestamp cast-to-string through the Types Framework" to "[SPARK-57285][SQL] Route nanosecond timestamp cast-to-string through the Types Framework" on Jun 7, 2026

// LTZ rendering depends on the session time zone. The fraction formatter has its own internal
// cache, so build it per call rather than caching it here.
override def format(v: Any, zoneId: ZoneId): String = {
  val formatter = TimestampFormatter.getFractionFormatter(zoneId)

Contributor

This builds a fresh formatter per row, while NTZ caches its own. The zoneId is constant across a cast - could we cache it here the same way NTZ does rather than new-ing one each call?

Member Author

Good catch, fixed. TimestampLTZNanosTypeApiOps now takes the ZoneId as a constructor parameter and holds the fraction formatter in a @transient private lazy val, exactly like NTZ, so it is built once per ops instance (once per cast, per task) rather than per row. The zone is threaded in centrally via TypeApiOps.apply(dt, zoneId) (by-name, defaulting to the session-local time zone), and CAST passes its resolved zone. Done in cb0a03d.
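
A minimal sketch of the "zone as constructor parameter, formatter in a `@transient private lazy val`" pattern described in this reply. The class name is hypothetical, `java.time.format.DateTimeFormatter` is a stand-in for Spark's fraction formatter, and the pattern string is an assumption that does not reproduce Spark's trailing-zero trimming.

```scala
import java.time.{Instant, ZoneId}
import java.time.format.DateTimeFormatter

object DemoCachedLtzOps {
  // Counts formatter constructions so the caching is observable.
  var formattersBuilt = 0
}

// Hypothetical ops class; the zone is a required constructor parameter.
class DemoCachedLtzOps(zoneId: ZoneId) extends Serializable {
  // Built on first use, then reused for every row formatted by this instance.
  // @transient keeps the formatter out of the serialized form, so a
  // deserialized copy rebuilds it lazily on its own first use.
  @transient private lazy val formatter: DateTimeFormatter = {
    DemoCachedLtzOps.formattersBuilt += 1
    DateTimeFormatter.ofPattern("yyyy-MM-dd HH:mm:ss.SSSSSSSSS").withZone(zoneId)
  }

  def format(instant: Instant): String = formatter.format(instant)
}
```

Formatting many values through one instance constructs the formatter once; a new instance (for example, one per cast) constructs its own.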

MaxGekk added 3 commits June 7, 2026 21:40
TimestampLTZNanosTypeApiOps now takes a ZoneId constructor param (defaulting
to the session-local time zone config) and builds its fraction formatter once
per instance via a lazy val, mirroring NTZ, instead of constructing a fresh
formatter per row. ToStringBase constructs the LTZ ops directly with the cast's
resolved zone (interpreted and codegen), so the per-row formatter allocation is
gone on both paths.

With LTZ rendering driven by the carried zone, the zone-aware
format(v, zoneId)/formatUTF8(v, zoneId) hook on TypeApiOps is no longer needed
and is removed, simplifying the trait and the codegen (no ZoneId reference
object). The zone-less framework lookup (EXPLAIN, SQL-literal toSQLValue,
Row JSON) now renders LTZ in the session zone rather than raising, so the
UNSUPPORTED_FEATURE.TIMESTAMP_NANOS_TO_STRING error condition and its helper are
removed.

Keep a single unified cast-to-string dispatch: ToStringBase calls
TypeApiOps(from, zoneId) for both the interpreted and codegen paths instead of
special-casing TIMESTAMP_LTZ nanos to construct the ops directly. TypeApiOps.apply
gains a by-name zoneId parameter (defaulting to the session-local time zone config)
that it threads into the LTZ ops. By-name is required so the zone is forced only
when the LTZ ops is constructed: zone-independent and unsupported types never
evaluate it, which matters because a CAST's zone is unresolved (None.get) until a
time zone is assigned.

TimestampLTZNanosTypeApiOps's zoneId is now a required constructor param (no
default); the server-side catalyst TimestampLTZNanosTypeOps, which never renders
(cast-to-string flows through TypeApiOps.apply), passes UTC rather than reading the
session config on every construction.

genjavadoc turns the scaladoc `[[TypeApiOps.apply]]` member reference into
`{@link TypeApiOps.apply}`, which javadoc rejects ("reference not found") because
member references need `#`, not `.`. Use monospaced plain text instead of a link.

stevomitric left a comment

LGTM. Thanks @MaxGekk for resolving comments.

MaxGekk commented Jun 8, 2026

@uros-b @cloud-fan Could you review this PR, please?

// Route nanosecond timestamp cast-to-string through the Types Framework: emit a runtime
// call into the ops reference object. The cast's session zone is threaded into the lookup
// so LTZ carries it; NTZ is zone-independent (SPARK-57285).
val ops = TypeApiOps(from, zoneId).get

Contributor

The new codegen path unwraps the lookup with .get, and when the lookup returns None the interpreted path falls through to castToStringDefault, whose nanos cases were deleted, so it now lands on the generic terminal case: case _ => o => UTF8String.fromString(o.toString). TypeApiOps(...) returns None whenever typesFrameworkEnabled == false. The intended invariant ("nanos types imply the framework is on") is only enforced at set-time, and only on one flag:

      .checkValue(
        enabled => !enabled || SQLConf.get.typesFrameworkEnabled,
        "REQUIREMENT",
        _ => Map("confRequirement" ->
          (s"'${TYPES_FRAMEWORK_ENABLED.key}' must be true to enable the nanosecond " +
            "timestamp types.")))

TYPES_FRAMEWORK_ENABLED has no symmetric guard, so a session can set timestampNanosTypes.enabled=true, materialize nanos values, then set types.framework.enabled=false. In that (admittedly unusual, internal-flag) state, casting a nanos value to string would:

  • interpreted: silently produce TimestampNanosVal.toString (wrong output) instead of a formatted timestamp;
  • codegen: throw NoSuchElementException from .get.
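
The two failure modes listed above can be reproduced in isolation with a small sketch. Everything here is a hypothetical stand-in (`DemoNanosVal`, `DemoFormatOps`, `DemoFrameworkLookup`), and the `guarded` variant is only one possible mitigation offered as an assumption, not something this PR implements.

```scala
// Hypothetical value type; relies on the default case-class toString.
final case class DemoNanosVal(epochMicros: Long, nanosWithinMicro: Short)

trait DemoFormatOps { def format(v: Any): String }

object DemoFrameworkLookup {
  // Mirrors the described behaviour: no ops when the framework flag is off.
  def apply(frameworkEnabled: Boolean): Option[DemoFormatOps] =
    if (frameworkEnabled) {
      Some(new DemoFormatOps { def format(v: Any): String = s"formatted($v)" })
    } else {
      None
    }
}

object MissingOpsDemo {
  // Interpreted-style fallback: silently uses toString when no ops is found.
  def interpreted(v: Any, enabled: Boolean): String =
    DemoFrameworkLookup(enabled).map(_.format(v)).getOrElse(v.toString)

  // Codegen-style unwrap: throws NoSuchElementException when no ops is found.
  def codegenStyle(v: Any, enabled: Boolean): String =
    DemoFrameworkLookup(enabled).get.format(v)

  // One possible guard (assumption): fail loudly with a descriptive error.
  def guarded(v: Any, enabled: Boolean): String =
    DemoFrameworkLookup(enabled)
      .getOrElse(throw new IllegalStateException("Types Framework is disabled"))
      .format(v)

  def main(args: Array[String]): Unit = {
    val v = DemoNanosVal(1L, 2)
    println(interpreted(v, enabled = false))                    // raw toString, no error
    println(scala.util.Try(codegenStyle(v, enabled = false)))   // Failure(NoSuchElementException)
    println(scala.util.Try(guarded(v, enabled = false)))        // Failure(IllegalStateException)
  }
}
```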

3 participants