fix(amber): order worker state by version, not timestamp#6011
fix(amber): order worker state by version, not timestamp#6011Yicong-Huang wants to merge 4 commits into
Conversation
Worker state was reconciled at the controller by last-nanoTime-wins. A fast source's startWorker response carries a stale RUNNING snapshot that can arrive after COMPLETED was recorded, clobbering it and leaving the operator stuck orange. Order state causally instead: WorkerStateManager now stamps every transition with a monotonic per-worker version, carried on all state reports (WorkerStateResponse, WorkerStateUpdatedRequest, WorkerMetrics). The controller applies a state only if its version is newer, and treats COMPLETED/TERMINATED as absorbing. Stats keep timestamp ordering. Closes apache#6010
Automated Reviewer SuggestionsBased on the
|
Codecov Report❌ Patch coverage is
Additional details and impacted files@@ Coverage Diff @@
## main #6011 +/- ##
============================================
- Coverage 56.74% 56.58% -0.17%
+ Complexity 3024 3011 -13
============================================
Files 1124 1120 -4
Lines 43593 43179 -414
Branches 4712 4657 -55
============================================
- Hits 24739 24432 -307
+ Misses 17386 17307 -79
+ Partials 1468 1440 -28
*This pull request uses carry forward flags. Click here to find out more. ☔ View full report in Codecov by Harness. 🚀 New features to boost your workflow:
|
|
| config | throughput | MB/s | latency | max Δ latest / 7d | |
|---|---|---|---|---|---|
| 🔴 | bs=10 sw=10 sl=64 | 380 | 0.232 | 25,398/37,589/37,589 us | 🔴 +8.7% / 🔴 +149.4% |
| ⚪ | bs=100 sw=10 sl=64 | 802 | 0.49 | 123,400/139,584/139,584 us | ⚪ within ±5% / 🔴 +28.2% |
| ⚪ | bs=1000 sw=10 sl=64 | 923 | 0.563 | 1,084,067/1,116,763/1,116,763 us | ⚪ within ±5% / 🔴 +8.2% |
Baseline details
Latest main d10e1a2 from same runner
| config | metric | PR | latest main | 7d avg | Δ latest | Δ 7d |
|---|---|---|---|---|---|---|
| bs=10 sw=10 sl=64 | throughput | 380 tuples/sec | 388 tuples/sec | 770.82 tuples/sec | -2.1% | -50.7% |
| bs=10 sw=10 sl=64 | MB/s | 0.232 MB/s | 0.237 MB/s | 0.47 MB/s | -2.1% | -50.7% |
| bs=10 sw=10 sl=64 | p50 | 25,398 us | 24,455 us | 12,723 us | +3.9% | +99.6% |
| bs=10 sw=10 sl=64 | p95 | 37,589 us | 34,568 us | 15,070 us | +8.7% | +149.4% |
| bs=10 sw=10 sl=64 | p99 | 37,589 us | 34,568 us | 18,429 us | +8.7% | +104.0% |
| bs=100 sw=10 sl=64 | throughput | 802 tuples/sec | 808 tuples/sec | 973.75 tuples/sec | -0.7% | -17.6% |
| bs=100 sw=10 sl=64 | MB/s | 0.49 MB/s | 0.493 MB/s | 0.594 MB/s | -0.6% | -17.6% |
| bs=100 sw=10 sl=64 | p50 | 123,400 us | 122,834 us | 102,519 us | +0.5% | +20.4% |
| bs=100 sw=10 sl=64 | p95 | 139,584 us | 139,118 us | 108,855 us | +0.3% | +28.2% |
| bs=100 sw=10 sl=64 | p99 | 139,584 us | 139,118 us | 117,788 us | +0.3% | +18.5% |
| bs=1000 sw=10 sl=64 | throughput | 923 tuples/sec | 900 tuples/sec | 1,004 tuples/sec | +2.6% | -8.1% |
| bs=1000 sw=10 sl=64 | MB/s | 0.563 MB/s | 0.549 MB/s | 0.613 MB/s | +2.6% | -8.1% |
| bs=1000 sw=10 sl=64 | p50 | 1,084,067 us | 1,109,505 us | 1,001,930 us | -2.3% | +8.2% |
| bs=1000 sw=10 sl=64 | p95 | 1,116,763 us | 1,156,428 us | 1,042,923 us | -3.4% | +7.1% |
| bs=1000 sw=10 sl=64 | p99 | 1,116,763 us | 1,156,428 us | 1,074,893 us | -3.4% | +3.9% |
Raw CSV
config_idx,batch_size,schema_width,string_len,num_batches,total_ms,total_tuples,total_bytes,tuples_per_sec,mb_per_sec,lat_p50_us,lat_p95_us,lat_p99_us
0,10,10,64,20,526.87,200,128000,380,0.232,25398.31,37589.38,37589.38
1,100,10,64,20,2492.67,2000,1280000,802,0.490,123400.40,139584.10,139584.10
2,1000,10,64,20,21671.42,20000,12800000,923,0.563,1084067.10,1116762.87,1116762.87The pyamber worker reports its state to the controller via start/pause/ resume responses and query-statistics metrics. Without a version these defaulted to 0, so after the first report the controller's version gate dropped every later state change — breaking reconfiguration, which waits to observe RUNNING -> PAUSED -> RUNNING (ReconfigurationIntegrationSpec timed out). Mirror the Scala change in pyamber: StateManager bumps a monotonic state version on each transition, and the four state-reporting handlers include it.
The pause/resume/query-statistics assertions compared full messages against literals that defaulted state_version to 0; read the actual version from the report (as the stats sizes already are) so the equality holds while StateManager tests cover the version itself.
There was a problem hiding this comment.
Pull request overview
Fixes a race in Amber’s controller-side worker-state reconstruction where late-arriving RUNNING snapshots (e.g., from startWorker) could overwrite a newer terminal state (COMPLETED), causing operators to remain visually “RUNNING” after completion. The PR replaces receipt-time ordering with causal ordering via a per-worker logical stateVersion reported by the worker state machine, and makes terminal states absorbing.
Changes:
- Introduce a monotonic per-worker
stateVersion(Scala + Python) and propagate it through RPC/proto messages that report state. - Update controller-side reconciliation to apply worker state only when
stateVersionis strictly newer; keep worker statistics ordered by timestamp independently. - Add/adjust Scala and Python tests to cover version ordering, stale/equal-version rejection, terminal absorption, and regression for #6010.
Reviewed changes
Copilot reviewed 26 out of 26 changed files in this pull request and generated no comments.
Show a summary per file
| File | Description |
|---|---|
| amber/src/main/scala/org/apache/texera/amber/engine/common/statetransition/StateManager.scala | Adds a monotonic stateVersion logical clock bumped on successful transitions. |
| amber/src/main/scala/org/apache/texera/amber/engine/architecture/deploysemantics/layer/WorkerExecution.scala | Replaces timestamp-based state updates with updateState(stateVersion, state); adds terminal-state absorption and separate updateStats. |
| amber/src/main/scala/org/apache/texera/amber/engine/architecture/scheduling/RegionExecutionCoordinator.scala | Applies startWorker state using stateVersion; switches termination path to forceTerminate(). |
| amber/src/main/scala/org/apache/texera/amber/engine/architecture/controller/promisehandlers/WorkerStateUpdatedHandler.scala | Orders pushed state updates by stateVersion instead of controller receipt time. |
| amber/src/main/scala/org/apache/texera/amber/engine/architecture/controller/promisehandlers/ResumeHandler.scala | Applies resume responses using stateVersion. |
| amber/src/main/scala/org/apache/texera/amber/engine/architecture/controller/promisehandlers/PauseHandler.scala | Applies pause responses using stateVersion; updates stats via updateStats. |
| amber/src/main/scala/org/apache/texera/amber/engine/architecture/controller/promisehandlers/QueryWorkerStatisticsHandler.scala | Splits metrics application into updateState(stateVersion, ...) and updateStats(timestamp, ...). |
| amber/src/main/scala/org/apache/texera/amber/engine/architecture/worker/promisehandlers/StartHandler.scala | Includes stateVersion in WorkerStateResponse for start responses. |
| amber/src/main/scala/org/apache/texera/amber/engine/architecture/worker/promisehandlers/ResumeHandler.scala | Includes stateVersion in WorkerStateResponse for resume responses. |
| amber/src/main/scala/org/apache/texera/amber/engine/architecture/worker/promisehandlers/PauseHandler.scala | Includes stateVersion in WorkerStateResponse for pause responses. |
| amber/src/main/scala/org/apache/texera/amber/engine/architecture/worker/promisehandlers/QueryStatisticsHandler.scala | Includes stateVersion in WorkerMetrics returned by queryStatistics. |
| amber/src/main/scala/org/apache/texera/amber/engine/architecture/worker/DataProcessor.scala | Includes stateVersion in WorkerStateUpdatedRequest push events. |
| amber/src/main/protobuf/org/apache/texera/amber/engine/architecture/worker/statistics.proto | Adds state_version to WorkerMetrics for causal state ordering. |
| amber/src/main/protobuf/org/apache/texera/amber/engine/architecture/rpc/controlreturns.proto | Adds state_version to WorkerStateResponse. |
| amber/src/main/protobuf/org/apache/texera/amber/engine/architecture/rpc/controlcommands.proto | Adds state_version to WorkerStateUpdatedRequest. |
| amber/src/main/python/core/architecture/managers/state_manager.py | Adds a monotonic _state_version and get_state_version(); bumps on transitions. |
| amber/src/main/python/core/architecture/handlers/control/start_worker_handler.py | Returns WorkerStateResponse with state_version. |
| amber/src/main/python/core/architecture/handlers/control/pause_worker_handler.py | Returns WorkerStateResponse with state_version. |
| amber/src/main/python/core/architecture/handlers/control/resume_worker_handler.py | Returns WorkerStateResponse with state_version. |
| amber/src/main/python/core/architecture/handlers/control/query_statistics_handler.py | Includes state_version in WorkerMetrics. |
| amber/src/test/scala/org/apache/texera/amber/engine/common/statetransition/WorkerStateManagerSpec.scala | Adds unit tests for initial version, bumping rules, and no-op/rejected transitions. |
| amber/src/test/scala/org/apache/texera/amber/engine/architecture/scheduling/RegionCoordinatorTestSupport.scala | Updates test stub StartWorker response to include stateVersion. |
| amber/src/test/scala/org/apache/texera/amber/engine/architecture/deploysemantics/layer/WorkerExecutionSpec.scala | Adds coverage for version-based ordering, terminal absorption, and #6010 regression. |
| amber/src/test/scala/org/apache/texera/amber/engine/architecture/controller/execution/OperatorExecutionSpec.scala | Updates test helper to use updateState + updateStats with a monotonic surrogate. |
| amber/src/test/python/core/architecture/managers/test_state_manager.py | Adds Python unit tests mirroring Scala version semantics. |
| amber/src/test/python/core/runnables/test_main_loop.py | Updates expectations to carry through the reported state_version instead of hardcoding counts. |
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
|
Hi @Yicong-Huang, it seems this PR cannot be directly backported. Do we still need to backport this change? I noticed that OperatorExecutionSpec.scala does not exist on the release branch. |
|
I think it will be good if we can include the fix in release v1.2, I will do a separate pr to fix v1.2. But I want to first this PR reviewed. cc @Xiao-zhen-Liu can you help take a look as well? |
What changes were proposed in this PR?
A fast source operator stayed orange (RUNNING) in the editor after the run finished (issue #6010).
Why it's a bug. The controller reconstructs each worker's state from several unordered channels (source
RUNNINGvia thestartWorkerresponse, non-sourceRUNNINGviaworkerStateUpdated,COMPLETED/PAUSEDviaqueryStatistics/pauseWorkerresponses) and reconciled them with last-System.nanoTime()-wins inWorkerExecution. Worker state, however, is single-writer and strictly ordered causally. For a tiny source the run finishes almost instantly, so thestartWorkerresponse — carrying the staleRUNNINGit sampled at launch — can reach the controller afterCOMPLETEDwas recorded; its later receipt timestamp wins and clobbersCOMPLETED. Results render fine (separate path), so only the border is stuck.This PR orders worker state causally instead of by wall clock:
WorkerExecution.update(nanoTime, state)— last-write-wins by receipt timeupdateState(version, state)— newest logical version winsWorkerStateManagerbumps a monotonicstateVersionon everytransitTo(its state-machine logical clock).WorkerStateResponse,WorkerStateUpdatedRequest,WorkerMetrics(3 new proto fields).StateManagernow bumps the same monotonic version and the four handlers include it; otherwise version-0 reports would be dropped after the first and reconfiguration (RUNNING→PAUSED→RUNNING) would hang.Any related issues, documentation, discussions?
Closes #6010. The timestamp-based
updatewas introduced in #3557.How was this PR tested?
JDK 17. Scala unit + Scala/Python integration + Python unit:
WorkerExecutionSpec: version ordering (positive + stale/equal-version negatives), terminal-state absorption,forceTerminate, independent stats-vs-state ordering, and a named regression for Fast source operator stays orange (RUNNING) after the workflow completes #6010 (COMPLETEDsurvives a lateRUNNING). Verified the regression goes red whenupdateStateis reverted to last-write-wins.WorkerStateManagerSpec/test_state_manager.py: version starts at 0, bumps per successful transition, no bump on no-op self-transition or rejected transition.ReconfigurationIntegrationSpecreproduced the failure (timeout) before the pyamber fix and passes after.Was this PR authored or co-authored using generative AI tooling?
Generated-by: Claude Opus 4.8 (1M context), via Claude Code