[FLINK-36953][table] Early fire support for Flink SQL interval join #28353
Open
weiqingy wants to merge 6 commits into
…mbing
Introduce the EARLY_FIRE('delay'=..., 'time_mode'=...) SQL join hint for
interval joins and thread it through the planner as inert metadata.
EarlyFireJoinHintOptions defines the typed option keys: a required 'delay'
duration and an optional 'time_mode' (rowtime|proctime). The hint is
registered in FlinkHintStrategies with a key-value option checker that
requires a positive delay, validates time_mode, and rejects unknown option
keys. JoinStrategy, CapitalizeQueryHintsShuttle, and QueryHintsResolver are
extended so the hint propagates intact to StreamPhysicalIntervalJoinRule,
which resolves the default time_mode from the window bounds and rejects
time_mode=rowtime on a proctime join (and, for now, time_mode=proctime on an
event-time join).
The resolved delay and time_mode are carried on StreamPhysicalIntervalJoin
and serialized as two additive @JsonInclude(NON_NULL) fields on
StreamExecIntervalJoin, leaving the ExecNode metadata version unchanged so
existing compiled plans restore as before. No runtime behavior changes yet:
the operator ignores the new fields.
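The option rules above (required positive delay, optional time_mode restricted to rowtime or proctime, unknown keys rejected) can be sketched in plain Java. This is a minimal illustration and not the checker registered in FlinkHintStrategies: the class name, the returned messages, the case-insensitive time_mode comparison, and the tiny duration parser are all assumptions made for the example.

```java
import java.time.Duration;
import java.util.Locale;
import java.util.Map;
import java.util.Set;

/** Illustrative stand-in for a key-value hint option checker; not the PR's code. */
final class EarlyFireHintCheckSketch {
    static final String DELAY = "delay";
    static final String TIME_MODE = "time_mode";
    private static final Set<String> KNOWN_KEYS = Set.of(DELAY, TIME_MODE);
    private static final Set<String> TIME_MODES = Set.of("rowtime", "proctime");

    private EarlyFireHintCheckSketch() {}

    /** Returns null when the options are acceptable, otherwise a message for the first problem. */
    static String check(Map<String, String> options) {
        for (String key : options.keySet()) {
            if (!KNOWN_KEYS.contains(key)) {
                return "unknown option '" + key + "'";
            }
        }
        String delayText = options.get(DELAY);
        if (delayText == null) {
            return "'delay' is required";
        }
        Duration delay;
        try {
            delay = parseDelay(delayText);
        } catch (IllegalArgumentException e) {
            return "cannot parse 'delay': " + delayText;
        }
        if (delay.isZero() || delay.isNegative()) {
            return "'delay' must be positive";
        }
        String mode = options.get(TIME_MODE);
        // Assumption: the comparison is case-insensitive.
        if (mode != null && !TIME_MODES.contains(mode.toLowerCase(Locale.ROOT))) {
            return "'time_mode' must be rowtime or proctime";
        }
        return null;
    }

    /** Hypothetical parser for "<n>ms", "<n>s", "<n>min", "<n>h"; Flink's duration syntax is richer. */
    static Duration parseDelay(String text) {
        String t = text.trim().toLowerCase(Locale.ROOT);
        // "ms" is tried before "s" so that "10ms" is not misread as a seconds value.
        for (String unit : new String[] {"ms", "min", "s", "h"}) {
            if (t.endsWith(unit)) {
                long n = Long.parseLong(t.substring(0, t.length() - unit.length()).trim());
                switch (unit) {
                    case "ms":
                        return Duration.ofMillis(n);
                    case "s":
                        return Duration.ofSeconds(n);
                    case "min":
                        return Duration.ofMinutes(n);
                    default:
                        return Duration.ofHours(n);
                }
            }
        }
        throw new IllegalArgumentException("unsupported duration: " + text);
    }
}
```

Returning a message instead of throwing keeps the sketch easy to probe; how the real checker reports failures is not shown in this PR text.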
…fire interval join
With the EARLY_FIRE hint, an outer interval join speculatively emits a padded unmatched row after the delay and corrects it when a match later arrives, so it no longer produces insert-only changes. Teach FlinkChangelogModeInferenceProgram to reflect this.
Split StreamPhysicalIntervalJoin into its own ModifyKindSet arm: its children still consume insert-only, but the node provides INSERT and, when the hint makes it update-producing, UPDATE. A new produceEarlyFireUpdates accessor gates that on the hint being set, the join being outer, and a non-negative window span, so the hint stays inert for inner joins and negative-window joins (which only ever emit inserts). The interval join keeps its place in the UpdateKind and DeleteKind arms.
When such a join feeds an insert-only downstream, planning fails with a tailored error that names the hint, rather than the generic "doesn't support consuming update changes" message.
Runtime behavior is unchanged; the operator still ignores the hint.
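The gating described above reduces to a three-way conjunction. The following sketch mirrors it with local stand-in types; the enum names, the method signatures, and the reading of "window span" as upper bound minus lower bound are assumptions for illustration, not the planner's actual classes.

```java
import java.util.EnumSet;

/** Illustrative model of the changelog gating; the types here are local stand-ins. */
final class EarlyFireChangelogSketch {
    enum ModifyKind { INSERT, UPDATE, DELETE }

    enum JoinType { INNER, LEFT, RIGHT, FULL }

    private EarlyFireChangelogSketch() {}

    /** True only when the hint is set, the join is outer, and the window span is non-negative. */
    static boolean producesEarlyFireUpdates(
            boolean hintSet, JoinType joinType, long lowerBoundMs, long upperBoundMs) {
        boolean outer = joinType != JoinType.INNER;
        // Assumption: the window span is the upper bound minus the lower bound.
        boolean nonNegativeWindow = upperBoundMs - lowerBoundMs >= 0;
        return hintSet && outer && nonNegativeWindow;
    }

    /** The node always provides INSERT and adds UPDATE when early fire is active. */
    static EnumSet<ModifyKind> providedModifyKinds(
            boolean hintSet, JoinType joinType, long lowerBoundMs, long upperBoundMs) {
        EnumSet<ModifyKind> kinds = EnumSet.of(ModifyKind.INSERT);
        if (producesEarlyFireUpdates(hintSet, joinType, lowerBoundMs, upperBoundMs)) {
            kinds.add(ModifyKind.UPDATE);
        }
        return kinds;
    }
}
```

With these definitions an inner join or a negative-window join yields only INSERT even when the hint is present, which is the "inert" behavior the commit message describes.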
…val join operator
Wire the EARLY_FIRE delay into the interval join operator so an outer join speculatively emits its padded unmatched row after the delay and corrects it when a real match arrives. Covers the natural timer pairings: a row-time join fires on event time, a processing-time join fires on processing time. Processing-time triggering on a row-time join stays rejected at planning.
When an unmatched outer row is cached, the operator registers an early-fire timer at rowTime + delay. On that timer it emits the padded row as an INSERT and records that it fired. When the row later matches, it retracts the padded row as UPDATE_BEFORE and emits the matched row as UPDATE_AFTER, matching the update-producing changelog mode inferred for the node. The retraction is tied to the one-time matched-and-emitted flip, so a row that matches several times emits a single correction followed by ordinary inserts.
The already-fired marker is a new per-side MapState<Long, List<Boolean>> kept positionally aligned with the existing row cache, rather than widening the cache tuple, so the cache serializer is unchanged and old savepoints restore the new state empty. The marker is the single gate that keeps a row padded exactly once when the delay is at or beyond the window span.
All early-fire work is gated on the hint being set, an outer join, and a non-negative window, so a plain interval join is unchanged and allocates nothing new. EmitAwareCollector carries the changelog stamping so IntervalJoinFunction stays changelog-agnostic, and every padded or matched emit stamps its RowKind explicitly to avoid leaking a kind onto a reused row.
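The emit sequence described above can be traced with a toy model of a single cached outer row. This is a sketch under stated assumptions and not the operator: the class, its method names, and the string rendering of changelog rows are invented for illustration, and state, timers, and the collector are reduced to two booleans and a list.

```java
import java.util.ArrayList;
import java.util.List;

/** Toy model of one cached outer row under early fire; not the operator's code. */
final class EarlyFireRowSketch {
    private final String id;
    private boolean padFired;          // stands in for the positional already-fired bit
    private boolean matchedAndEmitted; // stands in for the one-time matched-and-emitted flip
    final List<String> out = new ArrayList<>();

    EarlyFireRowSketch(String id) {
        this.id = id;
    }

    /** Early-fire timer: emit the padded row once, and only if no match was emitted yet. */
    void onEarlyFireTimer() {
        emitPadOnce();
    }

    /** Window close: the fired flag keeps an already padded row from being padded again. */
    void onWindowClosed() {
        emitPadOnce();
    }

    /** A match retracts the pad on the first match only; later matches are plain inserts. */
    void onMatch(String shipTime) {
        if (!matchedAndEmitted && padFired) {
            out.add("-U[" + id + ", NULL]");
            out.add("+U[" + id + ", " + shipTime + "]");
        } else {
            out.add("+I[" + id + ", " + shipTime + "]");
        }
        matchedAndEmitted = true;
    }

    private void emitPadOnce() {
        if (!matchedAndEmitted && !padFired) {
            out.add("+I[" + id + ", NULL]");
            padFired = true;
        }
    }
}
```

Calling onEarlyFireTimer(), then onMatch("t1"), then onMatch("t2") on a row with id 1 records +I[1, NULL], -U[1, NULL], +U[1, t1], +I[1, t2], which is the single correction followed by an ordinary insert.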
… interval join
Add the cross-domain timer combination the previous commit left out: an
event-time interval join with EARLY_FIRE('time_mode'='proctime') now fires its
speculative pads on the wall clock while keeping its event-time cleanup. The
temporary "not yet supported" rejection in the planner rule is removed; the
row-time-on-processing-time rejection is retained.
onTimer distinguishes the two timer kinds by OnTimerContext.timeDomain(): in
the cross-domain case early-fire timers are processing-time and cleanup timers
are event-time, so a processing-time firing runs early fire and returns while
an event-time firing runs cleanup only. The discrimination is gated on a new
cross-domain flag, so the natural pairings keep the previous timestamp - delay
recovery where early fire and cleanup share a domain.
A processing-time firing timestamp cannot be mapped back to an event-time cache
bucket arithmetically, so a per-side MapState<Long, List<Long>> keyed by firing
processing-time records the event-time bucket keys due to fire then. It is
allocated only in the cross-domain case and reuses the existing per-bucket emit
and positional fired bit, so the retract-and-correct path is shared. Every
scheduled firing time fires and removes its own entry, and a bucket already
cleaned by event-time expiry makes the firing a no-op, so nothing accumulates.
The schedule is value-typed and order-preserving and processing-time timers are
checkpointed, so a timer pending at snapshot fires after restore against the
restored schedule and fired bits and emits at most the not-yet-emitted pad.
Harness tests cover the wall-clock trigger without watermark advance, a snapshot
before the timer fires, and a snapshot after the pad is emitted.
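The cross-domain bookkeeping above can be illustrated with in-memory maps in place of keyed state. Everything in this sketch is an assumption made for illustration: the class and method names, the local TimeDomain enum (Flink has its own), treating the event-time timer timestamp as the bucket key to expire, and padding every not-yet-fired row of a due bucket in one firing.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** In-memory illustration of the cross-domain schedule; not the operator's code. */
final class CrossDomainEarlyFireSketch {
    enum TimeDomain { EVENT_TIME, PROCESSING_TIME }

    // Event-time bucket -> cached unmatched rows, with positionally aligned fired bits.
    private final Map<Long, List<String>> cache = new HashMap<>();
    private final Map<Long, List<Boolean>> fired = new HashMap<>();
    // Firing processing time -> event-time bucket keys due to fire then.
    private final Map<Long, List<Long>> schedule = new HashMap<>();
    final List<String> padded = new ArrayList<>();

    void cacheUnmatchedRow(long rowTime, String row, long nowProcTime, long delayMs) {
        cache.computeIfAbsent(rowTime, k -> new ArrayList<>()).add(row);
        fired.computeIfAbsent(rowTime, k -> new ArrayList<>()).add(Boolean.FALSE);
        schedule.computeIfAbsent(nowProcTime + delayMs, k -> new ArrayList<>()).add(rowTime);
    }

    void onTimer(TimeDomain domain, long timestamp) {
        if (domain == TimeDomain.PROCESSING_TIME) {
            // Processing-time firing: run early fire for the due buckets, then return.
            List<Long> dueBuckets = schedule.remove(timestamp);
            if (dueBuckets != null) {
                for (long bucket : dueBuckets) {
                    earlyFireBucket(bucket);
                }
            }
            return;
        }
        // Event-time firing: cleanup only (simplified to dropping the bucket at this key).
        cache.remove(timestamp);
        fired.remove(timestamp);
    }

    private void earlyFireBucket(long bucket) {
        List<String> rows = cache.get(bucket);
        if (rows == null) {
            return; // bucket already cleaned by event-time expiry: the firing is a no-op
        }
        List<Boolean> bits = fired.get(bucket);
        for (int i = 0; i < rows.size(); i++) {
            if (!bits.get(i)) {
                padded.add(rows.get(i));
                bits.set(i, Boolean.TRUE);
            }
        }
    }

    int pendingFirings() {
        return schedule.size();
    }
}
```

Because each firing removes its own schedule entry whether or not the bucket still exists, the schedule empties out over time, matching the "nothing accumulates" claim.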
…-fire interval join
Round out the feature with end-to-end restore coverage and user docs; no
operator or planner changes.
Add an INTERVAL_JOIN_EARLY_FIRE table test program: an event-time LEFT OUTER
interval join carrying EARLY_FIRE('delay'='2s') against a changelog sink. An
unmatched left row is null-padded (+I) before the savepoint and corrected
(-U then +U) when its matching right row arrives after restore. That correction
is only producible if the early-fired bit MapState round-trips through the
savepoint, so the program exercises state survival rather than a single fire in
isolation. The generated plan carries the additive earlyFireDelay and
earlyFireTimeMode fields; the three existing fixtures, which omit them, restore
unchanged and are left untouched.
Document the EARLY_FIRE hint in the SQL joins reference (English and Chinese):
its purpose, the delay and time_mode options with their defaults and error
cases, the single-fire and outer-only semantics, and a note distinguishing it
from the unrelated table.exec.emit.early-fire.* window-aggregation config.
weiqingy (Author): Hi @wuchong @xuyangzhong @xccui, could you please help review this PR? Thanks!
…overage and config docs
The EARLY_FIRE join hint only accepts key-value options, so it must be treated like the LOOKUP hint in two places that assume list-style hints:
- JoinHintTestBase#testMultiJoinHints builds every join hint with list-option syntax; EARLY_FIRE has to be filtered out alongside LOOKUP.
- The config-docs completeness check scans *Options classes under org.apache.flink.table.api.config; EarlyFireJoinHintOptions is documented in the SQL join-hints page, not the generated config tables, so it joins LookupJoinHintOptions in the exclusion set.
What is the purpose of the change
This pull request implements FLIP-497: Early Fire Support for Flink SQL Interval Join.
Today an outer interval join only emits an unmatched row, null-padded, once that row's time window has fully closed. For long windows this delays the null-padded result by the full window span even when a match will never arrive. FLIP-497 adds an EARLY_FIRE SQL join hint that lets an outer interval join emit the null-padded row speculatively after a configurable delay, then retract and correct it if a real match arrives later within the window. This turns the append-only interval-join result into an updating one, so the speculative latency is paid back as a correction rather than a wrong final answer.
The hint is opt-in and scoped: it affects only outer joins (LEFT/RIGHT/FULL) with a non-negative window span. Inner joins and negative-window joins remain append-only and ignore the hint. Each unmatched outer row fires at most once: the hint is single-fire, not periodic.
Example:
When an order has no shipment yet, the join emits +I[id, NULL] once the delay elapses. If a matching shipment later arrives inside the window, the speculative row is corrected with -U[id, NULL] followed by +U[id, ship_time].
Brief change log
The PR is organized as five sequential commits:
- Add EarlyFireJoinHintOptions (required delay duration, optional time_mode of rowtime/proctime), register the EARLY_FIRE hint in FlinkHintStrategies with a key-value option checker, and thread it through JoinStrategy, CapitalizeQueryHintsShuttle, QueryHintsResolver, the physical rule/node, and StreamExecIntervalJoin. The resolved delay and time mode are serialized as two additive @JsonInclude(NON_NULL) fields; the ExecNode metadata version is unchanged, so existing compiled plans restore as before. The operator ignores the fields at this stage.
- Split StreamPhysicalIntervalJoin into its own ModifyKindSet arm so it advertises UPDATE when the hint makes it update-producing (gated on hint set + outer join + non-negative window). An early-fire join feeding an insert-only downstream now fails planning with a tailored error that names the hint.
- Register an early-fire timer at rowTime + delay for each cached unmatched outer row; on the timer, emit the padded row as +I and record the fire in a new per-side MapState<Long, List<Boolean>> kept positionally aligned with the existing row cache (so the cache serializer is unchanged and old savepoints restore the new state empty). On a later match, retract with -U and emit the match as +U. Covers the natural pairings: row-time join fires on event time, processing-time join fires on processing time.
- Support EARLY_FIRE('time_mode'='proctime') on an event-time interval join: speculative pads fire on the wall clock while event-time cleanup is retained. onTimer discriminates the two timer kinds via OnTimerContext.timeDomain(), and a per-side MapState<Long, List<Long>> maps each firing processing-time to the event-time bucket keys due then (allocated only in the cross-domain case). The planner's temporary "not yet supported" rejection is removed; time_mode=rowtime on a processing-time join stays rejected.
- Add the INTERVAL_JOIN_EARLY_FIRE restore test program (event-time LEFT OUTER join against a changelog sink whose correction is only producible if the fired-bit state survives the savepoint), plus EN/ZH docs for the hint in the SQL joins reference.
Verifying this change
This change added tests and can be verified as follows:
- RowTimeIntervalJoinTest and ProcTimeIntervalJoinTest cover speculative emit, the -U/+U correction on a later match, single-fire when the delay is at or beyond the window span, the cross-domain wall-clock trigger without watermark advance, and snapshot/restore both before and after the pad is emitted.
- IntervalJoinTest (Scala + .xml) covers hint parsing/validation, the resolved plan fields, the default time_mode resolution, and the time_mode error cases.
- IntervalJoinSpecJsonSerdeTest confirms the additive JSON fields round-trip and that plans without them restore unchanged.
- IntervalJoinRestoreTest runs the new INTERVAL_JOIN_EARLY_FIRE program end-to-end against a generated plan + savepoint fixture, exercising fired-bit state survival across restore.
Does this pull request potentially affect one of the following parts:
- @Public(Evolving): yes. EarlyFireJoinHintOptions is @PublicEvolving (the new user-facing hint surface from FLIP-497).
- The already-fired marker is kept in a new MapState rather than widening the existing row-cache tuple, so the cache serializer is unchanged and old savepoints restore the new state empty. The two new ExecNode JSON fields are additive and @JsonInclude(NON_NULL), leaving the metadata version unchanged.
Documentation
Was generative AI tooling used to co-author this PR?