Commit 07de2f4

Add cross-process sc sharing as another reserved extension scenario.
Add a second reserved extension scenario for the ring path, "cross-process sc sharing (where the worker count exceeds the fstack instance count)", alongside the existing "multi-threaded sc sharing within a single process". The Chinese and English versions of docs/ld_preload_ring_spec/ring_ipc_perf_offline_analysis.md are each updated in four places (revision history, §1.4.4 final verdict, §1.4.4 enable-conditions list, §10.8 final verdict).
1 parent d9d6c74 commit 07de2f4

2 files changed

Lines changed: 8 additions & 6 deletions

docs/ld_preload_ring_spec/ring_ipc_perf_offline_analysis.md

Lines changed: 4 additions & 3 deletions
@@ -1,6 +1,6 @@
# Ring IPC Performance Regression — Offline Deep Analysis (v3.6 · Final · Short/Long Connection Full-Scenario Convergence)

-> Revision history: v1 root cause H10/H11 (drain not present in sem) was falsified by the user; v2 root cause H15 (cache miss) was falsified by perf stat; v3 relocated the root cause to H17 based on F-Stack's official "event aggregation" theory; v3.1 (2026-05-21 morning) falsified H18 in measurement, plan A archived to §5.A; v3.2 (2026-05-21 evening) three sets of measurements jointly falsified H17/H21/H24, and the root cause converged to H19-final + H23; v3.3 (2026-05-22) plan C measured 4% regression and was discarded (H25 falsified), plan C+/D2 measured +9.7% QPS, plan D5 added; **v3.4 (2026-05-22 evening) plan D5 (+1.3%) + D6 (+0.9%) implementation closed out, QPS 91k → 102.2k for total +12.3% (reaching 97.3% of sem). The remaining 2.7% has been identified as ring SPSC architectural inherent overhead and cannot be eliminated**; v3.4.1 (2026-05-25) added §9 Appendix D documenting the multi-worker sem-mode `idle_sleep=0` startup starvation phenomenon; v3.4.2 (2026-05-25) §9.6 synced upstream fix progress (commit `8125beece6`, zero overhead under normal load); v3.5 (2026-05-25 evening) multi-core short-connection measurements across three groups (1/2/4 cores) jointly confirmed "ring has no performance advantage over sem under FF_MULTI_SC multi-worker short-connection scenarios"; **v3.6 (2026-05-25 evening) multi-core long-connection measurements across three groups (1/2/4 cores) showed ring consistently 2.4%–4.5% worse than sem, with stable direction and beyond the noise band of short-connection. Final convergence: the ring path has no performance advantage in any scenario under LD_PRELOAD + FF_MULTI_SC; the code is retained only as a reserve capability for future "multi-threaded sc sharing within a single process" extension scenarios. Production recommendation reverts to sem. See §1.4 and §10 Appendix E**. Full lessons summary in §4.
+> Revision history: v1 root cause H10/H11 (drain not present in sem) was falsified by the user; v2 root cause H15 (cache miss) was falsified by perf stat; v3 relocated the root cause to H17 based on F-Stack's official "event aggregation" theory; v3.1 (2026-05-21 morning) falsified H18 in measurement, plan A archived to §5.A; v3.2 (2026-05-21 evening) three sets of measurements jointly falsified H17/H21/H24, and the root cause converged to H19-final + H23; v3.3 (2026-05-22) plan C measured 4% regression and was discarded (H25 falsified), plan C+/D2 measured +9.7% QPS, plan D5 added; **v3.4 (2026-05-22 evening) plan D5 (+1.3%) + D6 (+0.9%) implementation closed out, QPS 91k → 102.2k for total +12.3% (reaching 97.3% of sem). The remaining 2.7% has been identified as ring SPSC architectural inherent overhead and cannot be eliminated**; v3.4.1 (2026-05-25) added §9 Appendix D documenting the multi-worker sem-mode `idle_sleep=0` startup starvation phenomenon; v3.4.2 (2026-05-25) §9.6 synced upstream fix progress (commit `8125beece6`, zero overhead under normal load); v3.5 (2026-05-25 evening) multi-core short-connection measurements across three groups (1/2/4 cores) jointly confirmed "ring has no performance advantage over sem under FF_MULTI_SC multi-worker short-connection scenarios"; **v3.6 (2026-05-25 evening) multi-core long-connection measurements across three groups (1/2/4 cores) showed ring consistently 2.4%–4.5% worse than sem, with stable direction and beyond the noise band of short-connection. Final convergence: the ring path has no performance advantage in any scenario under LD_PRELOAD + FF_MULTI_SC; the code is retained only as a reserve capability for future "multi-threaded sc sharing within a single process" and "cross-process sc sharing (where the worker count exceeds the fstack instance count)" extension scenarios. Production recommendation reverts to sem. See §1.4 and §10 Appendix E**. Full lessons summary in §4.

---
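As a quick sanity check on the v3.4 figures quoted in the revision history of this hunk, the following minimal Python sketch recomputes the total gain and the remaining gap from the stated 91k and 102.2k QPS values. The implied sem baseline is derived here from the "97.3% of sem" statement and is not a measured number taken from the document; all variable and function names are illustrative assumptions.

```python
def pct_gain(before: float, after: float) -> float:
    """Relative change from `before` to `after`, in percent."""
    return (after - before) / before * 100.0


# Values quoted in the revision history (v3.4 entry).
ring_before_kqps = 91.0   # ring QPS before plans C+/D2, D5, D6
ring_after_kqps = 102.2   # ring QPS after those plans
ring_vs_sem = 0.973       # "reaching 97.3% of sem"

total_gain = pct_gain(ring_before_kqps, ring_after_kqps)

# Derived value (an assumption of this sketch, not a measurement):
# the sem throughput implied by the 97.3% ratio.
implied_sem_kqps = ring_after_kqps / ring_vs_sem

remaining_gap = (1.0 - ring_vs_sem) * 100.0

print(f"total gain: {total_gain:+.1f}%")
print(f"implied sem baseline: {implied_sem_kqps:.1f}k QPS")
print(f"remaining gap: {remaining_gap:.1f}%")
```

Rounded to one decimal place, this reproduces the +12.3% total and the 2.7% remaining gap stated in the revision history.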

@@ -104,7 +104,7 @@ The author predicted in v3.5 §10.5: "Under long connections, sem holds the zone
**Final verdict**:
1. **Performance**: sem remains the optimal configuration for LD_PRELOAD + FF_MULTI_SC, ring has **no performance net win in any tested scenario**
2. **Robustness**: the theoretical value of ring's lock-free main loop (immunity to startup starvation) has been fixed at the source on the sem path by commit `8125beece6`, **so the robustness advantage has also been eliminated**
-3. **Architecture**: the ring path **retains the code and compile flags** (`FF_USE_RING_IPC` + D2/D5/D6) as a reserve capability for future "multi-threaded sc sharing within a single process" extension scenarios. The current LD_PRELOAD fork-based multi-process scenario **does not enable ring by default**
+3. **Architecture**: the ring path **retains the code and compile flags** (`FF_USE_RING_IPC` + D2/D5/D6) as a reserve capability for future "multi-threaded sc sharing within a single process" and "cross-process sc sharing (where the worker count exceeds the fstack instance count)" extension scenarios. The current LD_PRELOAD fork-based multi-process scenario **does not enable ring by default**

**Production recommended configuration (2026-05-25 final)**:

@@ -116,6 +116,7 @@ make FF_KERNEL_EVENT=1 FF_MULTI_SC=1

 The ring path is enabled only in either of the following cases:
 - Multiple threads inside a single process need to share sc (current LD_PRELOAD does not match this)
+- Cross-process sc sharing (worker count exceeds fstack instance count, the current LD_PRELOAD 1:1 deployment does not match this)
 - The user accepts a -2.4%~-4.5% performance loss in exchange for the lock-free main-loop design

---
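The enable-conditions list in this hunk amounts to a three-way "any of" check. A minimal Python sketch of that decision follows, assuming the conditions can be reduced to a few deployment facts. The `Deployment` type, its field names, and `ring_path_applicable` are hypothetical names made up for illustration; they are not part of F-Stack or of the build flags named in the document.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Deployment:
    # Every field name here is a hypothetical label for this sketch only.
    threads_share_sc_in_process: bool  # multiple threads in one process share sc
    worker_count: int                  # number of application workers
    fstack_instance_count: int         # number of fstack instances
    accepts_ring_perf_loss: bool       # user accepts the -2.4%~-4.5% loss


def ring_path_applicable(d: Deployment) -> bool:
    """Return True if any of the three listed enable conditions holds."""
    # Cross-process sc sharing: more workers than fstack instances.
    cross_process_sharing = d.worker_count > d.fstack_instance_count
    return (
        d.threads_share_sc_in_process
        or cross_process_sharing
        or d.accepts_ring_perf_loss
    )


# The current LD_PRELOAD deployment is described as 1:1 (workers == instances).
current_ld_preload = Deployment(False, 4, 4, False)
# A hypothetical deployment where workers outnumber fstack instances.
oversubscribed = Deployment(False, 8, 4, False)

print(ring_path_applicable(current_ld_preload))
print(ring_path_applicable(oversubscribed))
```

Under these assumptions the 1:1 deployment matches none of the conditions, which is consistent with the document's recommendation to stay on sem, while the oversubscribed case matches the newly added cross-process condition.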
@@ -1012,4 +1013,4 @@ The author predicted in v3.5 §10.5: "Under long connections, sem holds the zone
 **Final verdict (already written into §1.4.4)**:
 - Ring has **no performance net win in any tested scenario** under LD_PRELOAD + FF_MULTI_SC
-- Sem is the production recommended configuration; ring code is retained only as a reserve for future "multi-threaded sc sharing within a single process" extension scenarios
+- Sem is the production recommended configuration; ring code is retained only as a reserve for future "multi-threaded sc sharing within a single process" and "cross-process sc sharing (where the worker count exceeds the fstack instance count)" extension scenarios

docs/ld_preload_ring_spec/zh_cn/ring_ipc_perf_offline_analysis.md

Lines changed: 4 additions & 3 deletions
@@ -1,6 +1,6 @@
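# Ring IPC Performance Regression: Offline Deep Analysis (v3.6 · Final · Short/Long Connection Full-Scenario Convergence)

-> Revision history: v1 root cause H10/H11 (drain not present in sem) was falsified by the user; v2 root cause H15 (cache miss) was falsified by perf stat; v3 relocated the root cause to H17 based on F-Stack's official "event matching degree" theory; v3.1 (2026-05-21 morning) falsified H18 in measurement, plan A discarded and moved to §5.A; v3.2 (2026-05-21 evening) three sets of measurements jointly falsified H17/H21/H24, and the root cause converged to H19-final + H23; v3.3 (2026-05-22) plan C measured a 4% regression and was discarded (H25 falsified), plan C+/D2 measured a successful +9.7% QPS, plan D5 added; **v3.4 (2026-05-22 evening) plan D5 (+1.3%) + D6 (+0.9%) implementation closed out, QPS 91k → 102.2k for a total gain of +12.3% (reaching 97.3% of sem); the remaining 2.7% has been identified as inherent overhead of the ring SPSC architecture and cannot be eliminated**; v3.4.1 (2026-05-25) added §9 Appendix D, documenting the multi-worker sem-mode `idle_sleep=0` startup starvation phenomenon; v3.4.2 (2026-05-25) §9.6 synced the progress of the fix at the source (commit `8125beece6`, zero overhead under normal load); v3.5 (2026-05-25 evening) three groups of multi-core short-connection measurements (1/2/4 cores) jointly confirmed that "ring has no performance advantage over sem in FF_MULTI_SC multi-worker short-connection scenarios"; **v3.6 (2026-05-25 evening) three groups of multi-core long-connection measurements (1/2/4 cores) showed ring consistently 2.4%–4.5% worse than sem, with a stable direction and a gap larger than the short-connection noise range. Final convergence: the ring path has no performance advantage in any scenario under LD_PRELOAD + FF_MULTI_SC; the code is retained only as a reserve capability for the future "multi-threaded sc sharing within a single process" extension scenario. The production recommended configuration reverts to sem. See §1.4 and §10 Appendix E**. Full lessons summary in §4.
+> Revision history: v1 root cause H10/H11 (drain not present in sem) was falsified by the user; v2 root cause H15 (cache miss) was falsified by perf stat; v3 relocated the root cause to H17 based on F-Stack's official "event matching degree" theory; v3.1 (2026-05-21 morning) falsified H18 in measurement, plan A discarded and moved to §5.A; v3.2 (2026-05-21 evening) three sets of measurements jointly falsified H17/H21/H24, and the root cause converged to H19-final + H23; v3.3 (2026-05-22) plan C measured a 4% regression and was discarded (H25 falsified), plan C+/D2 measured a successful +9.7% QPS, plan D5 added; **v3.4 (2026-05-22 evening) plan D5 (+1.3%) + D6 (+0.9%) implementation closed out, QPS 91k → 102.2k for a total gain of +12.3% (reaching 97.3% of sem); the remaining 2.7% has been identified as inherent overhead of the ring SPSC architecture and cannot be eliminated**; v3.4.1 (2026-05-25) added §9 Appendix D, documenting the multi-worker sem-mode `idle_sleep=0` startup starvation phenomenon; v3.4.2 (2026-05-25) §9.6 synced the progress of the fix at the source (commit `8125beece6`, zero overhead under normal load); v3.5 (2026-05-25 evening) three groups of multi-core short-connection measurements (1/2/4 cores) jointly confirmed that "ring has no performance advantage over sem in FF_MULTI_SC multi-worker short-connection scenarios"; **v3.6 (2026-05-25 evening) three groups of multi-core long-connection measurements (1/2/4 cores) showed ring consistently 2.4%–4.5% worse than sem, with a stable direction and a gap larger than the short-connection noise range. Final convergence: the ring path has no performance advantage in any scenario under LD_PRELOAD + FF_MULTI_SC; the code is retained only as a reserve capability for the future "multi-threaded sc sharing within a single process" and "cross-process sc sharing (where the worker count exceeds the fstack instance count)" extension scenarios. The production recommended configuration reverts to sem. See §1.4 and §10 Appendix E**. Full lessons summary in §4.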

---

@@ -104,7 +104,7 @@
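**Final verdict**
1. **Performance**: sem remains the optimal configuration for LD_PRELOAD + FF_MULTI_SC; ring has **no net performance win in any tested scenario**
2. **Robustness**: the theoretical value of ring's lock-free main loop (immunity to startup starvation) has been neutralized by commit `8125beece6`, which fixes startup starvation at the source on the sem path, **so the robustness advantage has also been eliminated**
-3. **Architecture**: the ring path **retains the code and compile flags** (`FF_USE_RING_IPC` + D2/D5/D6) as a reserve capability for the future "multi-threaded sc sharing within a single process" extension scenario. The current LD_PRELOAD fork-based multi-process scenario **does not enable ring by default**
+3. **Architecture**: the ring path **retains the code and compile flags** (`FF_USE_RING_IPC` + D2/D5/D6) as a reserve capability for the future "multi-threaded sc sharing within a single process" and "cross-process sc sharing (where the worker count exceeds the fstack instance count)" extension scenarios. The current LD_PRELOAD fork-based multi-process scenario **does not enable ring by default**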

**Production recommended configuration (2026-05-25 final)**

@@ -116,6 +116,7 @@ make FF_KERNEL_EVENT=1 FF_MULTI_SC=1

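 The ring path is enabled only in any of the following cases:
 - Multiple threads inside a single process need to share sc (the current LD_PRELOAD deployment does not match this)
+- Cross-process sc sharing (the worker count exceeds the fstack instance count; the current LD_PRELOAD 1:1 deployment does not match this)
 - The user accepts a -2.4%~-4.5% performance loss in exchange for the lock-free main-loop design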

---
@@ -1011,4 +1012,4 @@ The decay coefficients of ring and sem are fully consistent (Ring ×3.51 vs Sem actual ×3.45 = 35.9/
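 **Final verdict (already written into §1.4.4)**
 - ring has **no net performance win in any tested scenario** under LD_PRELOAD + FF_MULTI_SC
-- sem is the production recommended configuration; the ring code is retained only as a reserve for the future "multi-threaded sc sharing within a single process" extension scenario
+- sem is the production recommended configuration; the ring code is retained only as a reserve for the future "multi-threaded sc sharing within a single process" and "cross-process sc sharing (where the worker count exceeds the fstack instance count)" extension scenarios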
