fix(amber): correct WorkerSpec input port to stop flaky timeouts#6037
fix(amber): correct WorkerSpec input port to stop flaky timeouts#6037Ma77Ball wants to merge 2 commits into
Conversation
Automated Reviewer SuggestionsBased on the
|
|
| config | throughput | MB/s | latency | max Δ latest / 7d | |
|---|---|---|---|---|---|
| 🔴 | bs=10 sw=10 sl=64 | 373 | 0.228 | 26,220/36,930/36,930 us | 🔴 +9.0% / 🔴 +145.1% |
| 🔴 | bs=100 sw=10 sl=64 | 784 | 0.478 | 127,638/151,215/151,215 us | 🔴 +5.1% / 🔴 +40.5% |
| 🟢 | bs=1000 sw=10 sl=64 | 919 | 0.561 | 1,092,826/1,125,265/1,125,265 us | 🟢 -7.4% / 🔴 +10.7% |
Baseline details
Latest main 1073b22 from same runner
| config | metric | PR | latest main | 7d avg | Δ latest | Δ 7d |
|---|---|---|---|---|---|---|
| bs=10 sw=10 sl=64 | throughput | 373 tuples/sec | 408 tuples/sec | 777.62 tuples/sec | -8.6% | -52.0% |
| bs=10 sw=10 sl=64 | MB/s | 0.228 MB/s | 0.249 MB/s | 0.475 MB/s | -8.4% | -52.0% |
| bs=10 sw=10 sl=64 | p50 | 26,220 us | 24,060 us | 12,612 us | +9.0% | +107.9% |
| bs=10 sw=10 sl=64 | p95 | 36,930 us | 37,749 us | 15,070 us | -2.2% | +145.1% |
| bs=10 sw=10 sl=64 | p99 | 36,930 us | 37,749 us | 18,360 us | -2.2% | +101.1% |
| bs=100 sw=10 sl=64 | throughput | 784 tuples/sec | 799 tuples/sec | 988.31 tuples/sec | -1.9% | -20.7% |
| bs=100 sw=10 sl=64 | MB/s | 0.478 MB/s | 0.488 MB/s | 0.603 MB/s | -2.0% | -20.8% |
| bs=100 sw=10 sl=64 | p50 | 127,638 us | 121,435 us | 101,066 us | +5.1% | +26.3% |
| bs=100 sw=10 sl=64 | p95 | 151,215 us | 154,061 us | 107,594 us | -1.8% | +40.5% |
| bs=100 sw=10 sl=64 | p99 | 151,215 us | 154,061 us | 115,830 us | -1.8% | +30.5% |
| bs=1000 sw=10 sl=64 | throughput | 919 tuples/sec | 918 tuples/sec | 1,019 tuples/sec | +0.1% | -9.9% |
| bs=1000 sw=10 sl=64 | MB/s | 0.561 MB/s | 0.56 MB/s | 0.622 MB/s | +0.2% | -9.8% |
| bs=1000 sw=10 sl=64 | p50 | 1,092,826 us | 1,085,200 us | 986,982 us | +0.7% | +10.7% |
| bs=1000 sw=10 sl=64 | p95 | 1,125,265 us | 1,215,807 us | 1,028,491 us | -7.4% | +9.4% |
| bs=1000 sw=10 sl=64 | p99 | 1,125,265 us | 1,215,807 us | 1,058,493 us | -7.4% | +6.3% |
Raw CSV
config_idx,batch_size,schema_width,string_len,num_batches,total_ms,total_tuples,total_bytes,tuples_per_sec,mb_per_sec,lat_p50_us,lat_p95_us,lat_p99_us
0,10,10,64,20,535.81,200,128000,373,0.228,26219.98,36930.48,36930.48
1,100,10,64,20,2551.66,2000,1280000,784,0.478,127637.86,151214.59,151214.59
2,1000,10,64,20,21772.61,20000,12800000,919,0.561,1092826.24,1125264.82,1125264.82
Codecov Report❌ Patch coverage is
Additional details and impacted files@@ Coverage Diff @@
## main #6037 +/- ##
============================================
+ Coverage 54.50% 55.37% +0.86%
- Complexity 2915 3002 +87
============================================
Files 1108 1111 +3
Lines 42807 42914 +107
Branches 4604 4614 +10
============================================
+ Hits 23332 23763 +431
+ Misses 18119 17762 -357
- Partials 1356 1389 +33
*This pull request uses carry forward flags. Click here to find out more. ☔ View full report in Codecov by Harness. 🚀 New features to boost your workflow:
|
Updated error handling comment for clarity. Signed-off-by: Matthew B. <mgball@uci.edu>
What changes were proposed in this PR?
WorkerSpec's input-portAssignPortRequestnow passes an emptystorageUris(List()) matching its emptypartitionings, satisfyingInputManager.addPort'ssizeinvariant, so the input port registers and the three worker tests pass deterministically.AsyncRPCServer.invokeMethodnow re-throwsErrors (e.g. failed assertions) after replying to the sender, instead of swallowing them; normalExceptions are still returned to the sender unchanged, so genuine invariant violations surface loudly rather than degrading into flaky timeouts.Any related issues, documentation, discussions?
Closes: #6036
How was this PR tested?
sbt "WorkflowExecutionService/testOnly *WorkerSpec"; expectTests: succeeded 3, failed 0.assertion failed. Before this fix the count is 3 (the assertion fired and was swallowed every run); after it is 0. Verified locally under Java 17.AsyncRPCServerchange is left to the amber CI job to exercise across all RPC handlers.Was this PR authored or co-authored using generative AI tooling?
Co-authored with Claude Opus 4.8 in compliance with ASF