Add cross-process sc sharing as another reserved extension scenario.
Append the second reserved extension scenario for the ring path
"cross-process sc sharing (where the worker count exceeds the fstack
instance count)" alongside the existing "multi-threaded sc sharing
within a single process". Both Chinese and English versions of
docs/ld_preload_ring_spec/ring_ipc_perf_offline_analysis.md are
updated in 4 locations (revision history, §1.4.4 final verdict,
§1.4.4 enable-conditions list, §10.8 final verdict).
docs/ld_preload_ring_spec/ring_ipc_perf_offline_analysis.md (4 additions, 3 deletions)
@@ -1,6 +1,6 @@
# Ring IPC Performance Regression — Offline Deep Analysis (v3.6 · Final · Short/Long Connection Full-Scenario Convergence)
-
> Revision history: v1 root cause H10/H11 (drain not present in sem) was falsified by the user; v2 root cause H15 (cache miss) was falsified by perf stat; v3 relocated the root cause to H17 based on F-Stack's official "event aggregation" theory; v3.1 (2026-05-21 morning) falsified H18 in measurement, plan A archived to §5.A; v3.2 (2026-05-21 evening) three sets of measurements jointly falsified H17/H21/H24, and the root cause converged to H19-final + H23; v3.3 (2026-05-22) plan C measured 4% regression and was discarded (H25 falsified), plan C+/D2 measured +9.7% QPS, plan D5 added; **v3.4 (2026-05-22 evening) plan D5 (+1.3%) + D6 (+0.9%) implementation closed out, QPS 91k → 102.2k for total +12.3% (reaching 97.3% of sem). The remaining 2.7% has been identified as ring SPSC architectural inherent overhead and cannot be eliminated**; v3.4.1 (2026-05-25) added §9 Appendix D documenting the multi-worker sem-mode `idle_sleep=0` startup starvation phenomenon; v3.4.2 (2026-05-25) §9.6 synced upstream fix progress (commit `8125beece6`, zero overhead under normal load); v3.5 (2026-05-25 evening) multi-core short-connection measurements across three groups (1/2/4 cores) jointly confirmed "ring has no performance advantage over sem under FF_MULTI_SC multi-worker short-connection scenarios"; **v3.6 (2026-05-25 evening) multi-core long-connection measurements across three groups (1/2/4 cores) showed ring consistently 2.4%–4.5% worse than sem, with stable direction and beyond the noise band of short-connection. Final convergence: the ring path has no performance advantage in any scenario under LD_PRELOAD + FF_MULTI_SC; the code is retained only as a reserve capability for future "multi-threaded sc sharing within a single process" extension scenarios. Production recommendation reverts to sem. See §1.4 and §10 Appendix E**. Full lessons summary in §4.
+
> Revision history: v1 root cause H10/H11 (drain not present in sem) was falsified by the user; v2 root cause H15 (cache miss) was falsified by perf stat; v3 relocated the root cause to H17 based on F-Stack's official "event aggregation" theory; v3.1 (2026-05-21 morning) falsified H18 in measurement, plan A archived to §5.A; v3.2 (2026-05-21 evening) three sets of measurements jointly falsified H17/H21/H24, and the root cause converged to H19-final + H23; v3.3 (2026-05-22) plan C measured 4% regression and was discarded (H25 falsified), plan C+/D2 measured +9.7% QPS, plan D5 added; **v3.4 (2026-05-22 evening) plan D5 (+1.3%) + D6 (+0.9%) implementation closed out, QPS 91k → 102.2k for total +12.3% (reaching 97.3% of sem). The remaining 2.7% has been identified as ring SPSC architectural inherent overhead and cannot be eliminated**; v3.4.1 (2026-05-25) added §9 Appendix D documenting the multi-worker sem-mode `idle_sleep=0` startup starvation phenomenon; v3.4.2 (2026-05-25) §9.6 synced upstream fix progress (commit `8125beece6`, zero overhead under normal load); v3.5 (2026-05-25 evening) multi-core short-connection measurements across three groups (1/2/4 cores) jointly confirmed "ring has no performance advantage over sem under FF_MULTI_SC multi-worker short-connection scenarios"; **v3.6 (2026-05-25 evening) multi-core long-connection measurements across three groups (1/2/4 cores) showed ring consistently 2.4%–4.5% worse than sem, with stable direction and beyond the noise band of short-connection. Final convergence: the ring path has no performance advantage in any scenario under LD_PRELOAD + FF_MULTI_SC; the code is retained only as a reserve capability for future "multi-threaded sc sharing within a single process" and "cross-process sc sharing (where the worker count exceeds the fstack instance count)" extension scenarios. Production recommendation reverts to sem. See §1.4 and §10 Appendix E**. Full lessons summary in §4.
---
@@ -104,7 +104,7 @@ The author predicted in v3.5 §10.5: "Under long connections, sem holds the zone
**Final verdict**:
1. **Performance**: sem remains the optimal configuration for LD_PRELOAD + FF_MULTI_SC, ring has **no performance net win in any tested scenario**
2. **Robustness**: the theoretical value of ring's lock-free main loop (immunity to startup starvation) has been fixed at the source on the sem path by commit `8125beece6`, **so the robustness advantage has also been eliminated**
-
3. **Architecture**: the ring path **retains the code and compile flags** (`FF_USE_RING_IPC` + D2/D5/D6) as a reserve capability for future "multi-threaded sc sharing within a single process" extension scenarios. The current LD_PRELOAD fork-based multi-process scenario **does not enable ring by default**
+
3. **Architecture**: the ring path **retains the code and compile flags** (`FF_USE_RING_IPC` + D2/D5/D6) as a reserve capability for future "multi-threaded sc sharing within a single process" and "cross-process sc sharing (where the worker count exceeds the fstack instance count)" extension scenarios. The current LD_PRELOAD fork-based multi-process scenario **does not enable ring by default**
@@ -116,6 +116,7 @@ make FF_KERNEL_EVENT=1 FF_MULTI_SC=1
The ring path is enabled only in either of the following cases:
- Multiple threads inside a single process need to share sc (current LD_PRELOAD does not match this)
+
- Cross-process sc sharing (worker count exceeds fstack instance count, the current LD_PRELOAD 1:1 deployment does not match this)
- The user accepts a -2.4%~-4.5% performance loss in exchange for the lock-free main-loop design
---
@@ -1012,4 +1013,4 @@ The author predicted in v3.5 §10.5: "Under long connections, sem holds the zone
**Final verdict (already written into §1.4.4)**:
- Ring has **no performance net win in any tested scenario** under LD_PRELOAD + FF_MULTI_SC
-
- Sem is the production recommended configuration; ring code is retained only as a reserve for future "multi-threaded sc sharing within a single process" extension scenarios
+
- Sem is the production recommended configuration; ring code is retained only as a reserve for future "multi-threaded sc sharing within a single process" and "cross-process sc sharing (where the worker count exceeds the fstack instance count)" extension scenarios
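
For readers skimming the diff, the updated enable-conditions list can be read as a small decision rule. The following is only an illustrative sketch, not code from the repository: the `Deployment` class, its fields, and both helper functions are hypothetical names introduced here, and the strings "ring" and "sem" simply label the two IPC modes discussed in the document.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class Deployment:
    """Hypothetical deployment description; these fields do not exist in F-Stack."""
    worker_count: int            # application worker processes
    fstack_instance_count: int   # fstack instances serving those workers
    threads_share_sc: bool       # several threads inside one process share an sc
    accepts_ring_overhead: bool  # user explicitly accepts the measured ring loss


def ring_reasons(d: Deployment) -> List[str]:
    """Return which of the three documented enable conditions apply."""
    reasons = []
    if d.threads_share_sc:
        reasons.append("multi-threaded sc sharing within a single process")
    if d.worker_count > d.fstack_instance_count:
        # The scenario added by this commit: more workers than fstack instances.
        reasons.append("cross-process sc sharing")
    if d.accepts_ring_overhead:
        reasons.append("explicit opt-in to the lock-free main loop")
    return reasons


def recommended_ipc(d: Deployment) -> str:
    """sem is the default; ring only when at least one condition applies."""
    return "ring" if ring_reasons(d) else "sem"
```

Under these assumptions, the current LD_PRELOAD 1:1 deployment (for example 4 workers and 4 fstack instances, no thread-level sharing, no opt-in) matches none of the conditions and stays on sem.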