Add cross-process sc sharing as another reserved extension scenario.

jfb8856606 · jfb8856606 · commit 07de2f45eb72 · 2026-05-25T20:26:15.000+08:00
Add "cross-process sc sharing (where the worker count exceeds the
fstack instance count)" as the second reserved extension scenario for
the ring path, alongside the existing "multi-threaded sc sharing
within a single process". The Chinese and English versions of
docs/ld_preload_ring_spec/ring_ipc_perf_offline_analysis.md are each
updated in the same 4 locations (revision history, §1.4.4 final
verdict, §1.4.4 enable-conditions list, §10.8 final verdict).
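
For reviewers, the §1.4.4 enable-conditions list after this change can
be read as the decision rule sketched below. This is an illustration
only: `Deployment`, its fields, and `choose_ipc_path` are hypothetical
names that do not exist in F-Stack, and the real choice is made at
build time via the compile flags named in the document.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Deployment:
    """Hypothetical description of a deployment (not an F-Stack type)."""
    workers: int                      # application worker processes
    fstack_instances: int             # fstack instance processes
    threads_share_sc: bool = False    # several threads in one process share an sc
    accepts_ring_loss: bool = False   # user accepts the measured 2.4%-4.5% loss


def choose_ipc_path(d: Deployment) -> str:
    """Return "ring" if any enable condition from §1.4.4 holds, else "sem"."""
    if d.threads_share_sc:
        # Reserved scenario 1: multi-threaded sc sharing within a single process.
        return "ring"
    if d.workers > d.fstack_instances:
        # Reserved scenario 2 (added by this commit): cross-process sc sharing.
        return "ring"
    if d.accepts_ring_loss:
        # Explicit opt-in to the lock-free main-loop design.
        return "ring"
    # Default, including the current LD_PRELOAD 1:1 deployment.
    return "sem"
```

Under this reading, a 1:1 deployment such as `Deployment(workers=4,
fstack_instances=4)` stays on sem, while `Deployment(workers=8,
fstack_instances=4)` falls into the newly reserved scenario.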
diff --git a/docs/ld_preload_ring_spec/ring_ipc_perf_offline_analysis.md b/docs/ld_preload_ring_spec/ring_ipc_perf_offline_analysis.md
@@ -1,6 +1,6 @@
 # Ring IPC Performance Regression — Offline Deep Analysis (v3.6 · Final · Short/Long Connection Full-Scenario Convergence)
 
-> Revision history: v1 root cause H10/H11 (drain not present in sem) was falsified by the user; v2 root cause H15 (cache miss) was falsified by perf stat; v3 relocated the root cause to H17 based on F-Stack's official "event aggregation" theory; v3.1 (2026-05-21 morning) falsified H18 in measurement, plan A archived to §5.A; v3.2 (2026-05-21 evening) three sets of measurements jointly falsified H17/H21/H24, and the root cause converged to H19-final + H23; v3.3 (2026-05-22) plan C measured 4% regression and was discarded (H25 falsified), plan C+/D2 measured +9.7% QPS, plan D5 added; **v3.4 (2026-05-22 evening) plan D5 (+1.3%) + D6 (+0.9%) implementation closed out, QPS 91k → 102.2k for total +12.3% (reaching 97.3% of sem). The remaining 2.7% has been identified as ring SPSC architectural inherent overhead and cannot be eliminated**; v3.4.1 (2026-05-25) added §9 Appendix D documenting the multi-worker sem-mode `idle_sleep=0` startup starvation phenomenon; v3.4.2 (2026-05-25) §9.6 synced upstream fix progress (commit `8125beece6`, zero overhead under normal load); v3.5 (2026-05-25 evening) multi-core short-connection measurements across three groups (1/2/4 cores) jointly confirmed "ring has no performance advantage over sem under FF_MULTI_SC multi-worker short-connection scenarios"; **v3.6 (2026-05-25 evening) multi-core long-connection measurements across three groups (1/2/4 cores) showed ring consistently 2.4%–4.5% worse than sem, with stable direction and beyond the noise band of short-connection. Final convergence: the ring path has no performance advantage in any scenario under LD_PRELOAD + FF_MULTI_SC; the code is retained only as a reserve capability for future "multi-threaded sc sharing within a single process" extension scenarios. Production recommendation reverts to sem. See §1.4 and §10 Appendix E**. Full lessons summary in §4.
+> Revision history: v1 root cause H10/H11 (drain not present in sem) was falsified by the user; v2 root cause H15 (cache miss) was falsified by perf stat; v3 relocated the root cause to H17 based on F-Stack's official "event aggregation" theory; v3.1 (2026-05-21 morning) falsified H18 in measurement, plan A archived to §5.A; v3.2 (2026-05-21 evening) three sets of measurements jointly falsified H17/H21/H24, and the root cause converged to H19-final + H23; v3.3 (2026-05-22) plan C measured 4% regression and was discarded (H25 falsified), plan C+/D2 measured +9.7% QPS, plan D5 added; **v3.4 (2026-05-22 evening) plan D5 (+1.3%) + D6 (+0.9%) implementation closed out, QPS 91k → 102.2k for total +12.3% (reaching 97.3% of sem). The remaining 2.7% has been identified as ring SPSC architectural inherent overhead and cannot be eliminated**; v3.4.1 (2026-05-25) added §9 Appendix D documenting the multi-worker sem-mode `idle_sleep=0` startup starvation phenomenon; v3.4.2 (2026-05-25) §9.6 synced upstream fix progress (commit `8125beece6`, zero overhead under normal load); v3.5 (2026-05-25 evening) multi-core short-connection measurements across three groups (1/2/4 cores) jointly confirmed "ring has no performance advantage over sem under FF_MULTI_SC multi-worker short-connection scenarios"; **v3.6 (2026-05-25 evening) multi-core long-connection measurements across three groups (1/2/4 cores) showed ring consistently 2.4%–4.5% worse than sem, with stable direction and beyond the noise band of short-connection. Final convergence: the ring path has no performance advantage in any scenario under LD_PRELOAD + FF_MULTI_SC; the code is retained only as a reserve capability for future "multi-threaded sc sharing within a single process" and "cross-process sc sharing (where the worker count exceeds the fstack instance count)" extension scenarios. Production recommendation reverts to sem. See §1.4 and §10 Appendix E**. Full lessons summary in §4.
 
 ---
 
@@ -104,7 +104,7 @@ The author predicted in v3.5 §10.5: "Under long connections, sem holds the zone
 **Final verdict**:
 1. **Performance**: sem remains the optimal configuration for LD_PRELOAD + FF_MULTI_SC, ring has **no performance net win in any tested scenario**
 2. **Robustness**: the theoretical value of ring's lock-free main loop (immunity to startup starvation) has been fixed at the source on the sem path by commit `8125beece6`, **so the robustness advantage has also been eliminated**
-3. **Architecture**: the ring path **retains the code and compile flags** (`FF_USE_RING_IPC` + D2/D5/D6) as a reserve capability for future "multi-threaded sc sharing within a single process" extension scenarios. The current LD_PRELOAD fork-based multi-process scenario **does not enable ring by default**
+3. **Architecture**: the ring path **retains the code and compile flags** (`FF_USE_RING_IPC` + D2/D5/D6) as a reserve capability for future "multi-threaded sc sharing within a single process" and "cross-process sc sharing (where the worker count exceeds the fstack instance count)" extension scenarios. The current LD_PRELOAD fork-based multi-process scenario **does not enable ring by default**
 
 **Production recommended configuration (2026-05-25 final)**:
 
@@ -116,6 +116,7 @@ make FF_KERNEL_EVENT=1 FF_MULTI_SC=1
 
 The ring path is enabled only in either of the following cases:
 - Multiple threads inside a single process need to share sc (current LD_PRELOAD does not match this)
+- Cross-process sc sharing (worker count exceeds fstack instance count, the current LD_PRELOAD 1:1 deployment does not match this)
 - The user accepts a -2.4%~-4.5% performance loss in exchange for the lock-free main-loop design
 
 ---
@@ -1012,4 +1013,4 @@ The author predicted in v3.5 §10.5: "Under long connections, sem holds the zone
 
 **Final verdict (already written into §1.4.4)**:
 - Ring has **no performance net win in any tested scenario** under LD_PRELOAD + FF_MULTI_SC
-- Sem is the production recommended configuration; ring code is retained only as a reserve for future "multi-threaded sc sharing within a single process" extension scenarios
+- Sem is the production recommended configuration; ring code is retained only as a reserve for future "multi-threaded sc sharing within a single process" and "cross-process sc sharing (where the worker count exceeds the fstack instance count)" extension scenarios
diff --git a/docs/ld_preload_ring_spec/zh_cn/ring_ipc_perf_offline_analysis.md b/docs/ld_preload_ring_spec/zh_cn/ring_ipc_perf_offline_analysis.md
@@ -1,6 +1,6 @@
 # Ring IPC 性能劣化离线深度分析（v3.6 · 终版 · 短/长连接全场景收敛）
 
-> 修订历史：v1 主因 H10/H11（drain 不存在于 sem）已被用户证伪；v2 主因 H15（cache miss）已被 perf stat 证伪；v3 基于 F-Stack 官方"事件匹配度"理论重定位主因为 H17；v3.1（2026-05-21 上午）实测证伪 H18，方案 A 废弃为 §5.A；v3.2（2026-05-21 晚）三组实测协同证伪 H17/H21/H24，主因收敛到 H19-final + H23；v3.3（2026-05-22）方案 C 实测劣化 4% 废弃（H25 证伪），方案 C+/D2 实测成功 +9.7% QPS，新增方案 D5；**v3.4（2026-05-22 晚）方案 D5 (+1.3%) + D6 (+0.9%) 实施收尾，QPS 9.1w → 10.22w 总收益 +12.3%（达 sem 97.3%），剩余 2.7% 已识别为 ring SPSC 架构固有开销不可消除**；v3.4.1（2026-05-25）补充 §9 附录 D，记录多 worker sem 模式 `idle_sleep=0` 启动饥饿现象；v3.4.2（2026-05-25）§9.6 同步源头修复进展（提交 `8125beece6`，正常负载零开销）；v3.5（2026-05-25 晚）多核短连接实测三组（1/2/4 核）联合证实"ring 在 FF_MULTI_SC 多 worker 短连接场景下相对 sem 无性能优势"；**v3.6（2026-05-25 晚）多核长连接实测三组（1/2/4 核）显示 ring 持续劣于 sem 2.4%–4.5%，差距方向稳定且大于短连接噪声范围。最终收敛：ring 路径在 LD_PRELOAD + FF_MULTI_SC 任何场景下均无性能优势，仅保留代码作为未来"多线程同进程共享 sc"扩展场景的预留能力。生产推荐配置回归 sem。详见 §1.4 与 §10 附录 E**。完整教训总结见 §4。
+> 修订历史：v1 主因 H10/H11（drain 不存在于 sem）已被用户证伪；v2 主因 H15（cache miss）已被 perf stat 证伪；v3 基于 F-Stack 官方"事件匹配度"理论重定位主因为 H17；v3.1（2026-05-21 上午）实测证伪 H18，方案 A 废弃为 §5.A；v3.2（2026-05-21 晚）三组实测协同证伪 H17/H21/H24，主因收敛到 H19-final + H23；v3.3（2026-05-22）方案 C 实测劣化 4% 废弃（H25 证伪），方案 C+/D2 实测成功 +9.7% QPS，新增方案 D5；**v3.4（2026-05-22 晚）方案 D5 (+1.3%) + D6 (+0.9%) 实施收尾，QPS 9.1w → 10.22w 总收益 +12.3%（达 sem 97.3%），剩余 2.7% 已识别为 ring SPSC 架构固有开销不可消除**；v3.4.1（2026-05-25）补充 §9 附录 D，记录多 worker sem 模式 `idle_sleep=0` 启动饥饿现象；v3.4.2（2026-05-25）§9.6 同步源头修复进展（提交 `8125beece6`，正常负载零开销）；v3.5（2026-05-25 晚）多核短连接实测三组（1/2/4 核）联合证实"ring 在 FF_MULTI_SC 多 worker 短连接场景下相对 sem 无性能优势"；**v3.6（2026-05-25 晚）多核长连接实测三组（1/2/4 核）显示 ring 持续劣于 sem 2.4%–4.5%，差距方向稳定且大于短连接噪声范围。最终收敛：ring 路径在 LD_PRELOAD + FF_MULTI_SC 任何场景下均无性能优势，仅保留代码作为未来"多线程同进程共享 sc"和"多进程间共享 sc（worker 数量多于 fstack 实例数量）"扩展场景的预留能力。生产推荐配置回归 sem。详见 §1.4 与 §10 附录 E**。完整教训总结见 §4。
 
 ---
 
@@ -104,7 +104,7 @@
 **最终判定**：
 1. **性能层面**：sem 仍是 LD_PRELOAD + FF_MULTI_SC 的最优配置，ring 在**任何已测场景下均无性能 net win**
 2. **鲁棒性层面**：ring 主循环 lock-free 的理论价值（启动饥饿免疫）已被提交 `8125beece6` 在 sem 源头修复，**鲁棒性优势也已被消除**
-3. **架构层面**：ring 路径**保留代码与编译开关**（`FF_USE_RING_IPC` + D2/D5/D6），作为"多线程同进程共享 sc"未来扩展场景的预留能力。当前 LD_PRELOAD fork 多进程场景**默认不启用 ring**
+3. **架构层面**：ring 路径**保留代码与编译开关**（`FF_USE_RING_IPC` + D2/D5/D6），作为"多线程同进程共享 sc"和"多进程间共享 sc（worker 数量多于 fstack 实例数量）"未来扩展场景的预留能力。当前 LD_PRELOAD fork 多进程场景**默认不启用 ring**
 
 **生产推荐配置（2026-05-25 终版）**：
 
@@ -116,6 +116,7 @@ make FF_KERNEL_EVENT=1 FF_MULTI_SC=1
 
 ring 路径仅在以下任一情况启用：
 - 单进程内有多线程需共享 sc（当前 LD_PRELOAD 不命中）
+- 多进程间共享 sc（worker 数量多于 fstack 实例数量，当前 LD_PRELOAD 1:1 部署不命中）
 - 用户接受 -2.4%~-4.5% 性能损失换取主循环 lock-free 设计
 
 ---
@@ -1011,4 +1012,4 @@ ring 与 sem 衰减系数完全一致（Ring ×3.51 vs Sem 实际 ×3.45 = 35.9/
 
 **最终判定（已写入 §1.4.4）**：
 - ring 在 LD_PRELOAD + FF_MULTI_SC **任何已测场景下均无性能 net win**
-- sem 是生产推荐配置；ring 仅保留代码作未来"多线程同进程共享 sc"扩展场景预留
+- sem 是生产推荐配置；ring 仅保留代码作未来"多线程同进程共享 sc"和"多进程间共享 sc（worker 数量多于 fstack 实例数量）"扩展场景预留