What happened?
A fast source operator (e.g. Text Input with a few rows) stays orange (RUNNING) in the editor after the run has finished and results are shown. The operator never turns green (COMPLETED).
Root cause — physical timestamps used to order causally-ordered state. A worker is the single writer of its own state; its transitions (READY → RUNNING → COMPLETED) have a strict causal order. But the controller reconstructs that state from three unordered channels:
| State | How it reaches the controller |
| --- | --- |
| source RUNNING | startWorker response snapshot |
| non-source RUNNING | workerStateUpdated push |
| COMPLETED | queryStatistics response snapshot (no dedicated push) |
WorkerExecution.update resolves conflicts with a last-writer-wins rule keyed on the System.nanoTime() receipt timestamp. For a tiny source, the whole run finishes almost instantly, so the startWorker response (carrying the stale RUNNING it sampled at launch) can arrive at the controller after COMPLETED has already been recorded. Because its receipt timestamp is later, the stale RUNNING clobbers COMPLETED:
Before: start(RUNNING)──────────────(late)──────────▶ ts=30 ⇒ RUNNING wins ✗
portCompleted/execCompleted ▶ COMPLETED ts=20
After: RUNNING carries version 2 < COMPLETED version 3 ⇒ COMPLETED stays ✓
+ terminal state is absorbing
The result data uses a separate path, so results render correctly while the border is stuck.
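The race can be illustrated with a minimal Python sketch. It is not the actual WorkerExecution code: `TimestampedExecution`, its fields, and the integer timestamps are hypothetical stand-ins that only mimic the "latest receipt timestamp wins" rule described above, using the ts=20 / ts=30 values from the diagram.

```python
from dataclasses import dataclass
from enum import Enum


class WorkerState(Enum):
    READY = "Ready"
    RUNNING = "Running"
    COMPLETED = "Completed"


@dataclass
class TimestampedExecution:
    """Hypothetical stand-in for the controller-side record of one worker."""
    state: WorkerState = WorkerState.READY
    last_update_ts: int = 0

    def update(self, receipt_ts: int, new_state: WorkerState) -> None:
        # Only the receipt time on the controller is compared. Nothing records
        # when the worker actually sampled the state it is reporting.
        if receipt_ts > self.last_update_ts:
            self.state = new_state
            self.last_update_ts = receipt_ts


execution = TimestampedExecution()
# COMPLETED is recorded first (ts=20) ...
execution.update(20, WorkerState.COMPLETED)
# ... then the delayed startWorker response delivers the stale RUNNING (ts=30).
execution.update(30, WorkerState.RUNNING)
print(execution.state.name)  # RUNNING: the stale snapshot overwrote COMPLETED
```

The later receipt timestamp says nothing about which state the worker entered last, which is why the comparison picks the wrong winner.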
Introduced in #3557 (the timestamp-based update).
How to reproduce?
- New workflow with a single fast source operator (Text Input, a few lines).
- Run it. Results appear in the Result panel.
- Operator border stays orange/RUNNING instead of green/COMPLETED.
In the browser WS frames, the last OperatorStatisticsUpdateEvent for the operator carries operatorState: "Running" — i.e. the wrong state is sent by the backend; the frontend renders it faithfully. Intermittent (it is a race), but very likely for tiny sources.
Fix
Order worker state causally, not by wall clock:
- Per-worker logical version: WorkerStateManager increments a monotonic counter on every transitTo; carried on every state report (WorkerStateResponse, WorkerStateUpdatedRequest, WorkerMetrics). The controller applies a state only if its version is newer. Single source ⇒ no cross-process clock-sync concern.
- Terminal-state absorption: once COMPLETED/TERMINATED, a worker cannot be moved back by any later report.
Stats keep timestamp ordering (monotonic snapshots within one state).
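A minimal Python sketch of the two rules, under stated assumptions: `VersionedStateManager` and `VersionedExecution` are hypothetical names, not the real WorkerStateManager or WorkerExecution classes, and starting the counter at 1 for READY is an assumption chosen so that the numbers match the diagram (RUNNING = 2, COMPLETED = 3).

```python
from dataclasses import dataclass
from enum import Enum


class WorkerState(Enum):
    READY = "Ready"
    RUNNING = "Running"
    COMPLETED = "Completed"
    TERMINATED = "Terminated"


TERMINAL_STATES = {WorkerState.COMPLETED, WorkerState.TERMINATED}


class VersionedStateManager:
    """Worker side (hypothetical): bumps a monotonic version on every transition."""

    def __init__(self) -> None:
        self.state = WorkerState.READY
        self.version = 1  # assumption: READY counts as version 1

    def transit_to(self, new_state: WorkerState) -> tuple[WorkerState, int]:
        self.state = new_state
        self.version += 1
        # Every state report carries the version it was sampled at.
        return (self.state, self.version)


@dataclass
class VersionedExecution:
    """Controller side (hypothetical): applies a report only if it is newer."""
    state: WorkerState = WorkerState.READY
    version: int = 0

    def update(self, new_state: WorkerState, version: int) -> bool:
        if self.state in TERMINAL_STATES:
            return False  # terminal state is absorbing
        if version <= self.version:
            return False  # stale or duplicate report
        self.state = new_state
        self.version = version
        return True


worker = VersionedStateManager()
running_report = worker.transit_to(WorkerState.RUNNING)      # version 2
completed_report = worker.transit_to(WorkerState.COMPLETED)  # version 3

controller = VersionedExecution()
controller.update(*completed_report)  # arrives first, applied
controller.update(*running_report)    # arrives late, rejected
print(controller.state.name)  # COMPLETED
```

Arrival order no longer matters here, because the version is assigned by the worker at transition time rather than by the controller at receipt time.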
Version/Branch
1.3.0-incubating-SNAPSHOT (main)
Commit Hash (Optional)
4d05ab2
What browsers are you seeing the problem on?
No response
Relevant log output
No response