drbd: fix false positive resync throttling in drbd_rs_c_min_rate_throttle #832
blktests-ci[bot] wants to merge 1 commit into
Upstream branch: aa54b1d
b1870f6 to ca57796
Upstream branch: 70eda68
ec9698e to eb52697
ca57796 to c1feb59
Upstream branch: 8bc67e4
eb52697 to 3bba6f6
c1feb59 to ea833a1
Upstream branch: 6779b50
3bba6f6 to 6d5ca9d
ea833a1 to 7af85d1
Upstream branch: 79bd2dd
6d5ca9d to 7b07a8b
7af85d1 to de94ac7
Upstream branch: eed108e
7b07a8b to 154f397
de94ac7 to 86d8d37
Upstream branch: e8c2f9f
154f397 to 7b95cb4
86d8d37 to 9805659
Upstream branch: eb3f4b7
7b95cb4 to 58fd6b6
9805659 to 3f4a345
Upstream branch: 8fde5d1
58fd6b6 to 0b612a8
3f4a345 to c6dc343
Upstream branch: e43ffb6
0b612a8 to 2bff906
c6dc343 to fc36596
Upstream branch: ba3e43a
…ttle

drbd_rs_c_min_rate_throttle() is intended to slow down resync when
genuine application I/O is competing for the backing device. It used to
detect "application I/O" by comparing the total sector count from the
backing device (part_stat_read_accum) against the resync sector counter
(rs_sect_ev), and throttling when the resync speed exceeds c-min-rate.

That curr_events heuristic produces false positives:

1) On the receiver path, rs_sect_ev is incremented *after* the throttle
   check. The current resync I/O is already reflected in part_stat
   counters but not yet in rs_sect_ev, creating a persistent positive
   delta that looks like application I/O.

2) The per-cpu part_stat counters and the atomic rs_sect_ev are not
   read under any common lock, so transient skew between them can push
   the delta above 64 sectors even when no application I/O is present.

When the false positive fires, the function compares the resync speed
against c-min-rate (default 35840 KB/s ~ 35 MB/s). On modern hardware
capable of 300+ MB/s resync the condition is almost always true, so the
caller sleeps 100 ms (HZ/10) per resync request or stops issuing new
requests, capping throughput at roughly c-min-rate.

This was observed in production on a Distributed Cloud controller where
drbd-dc-vault (100 GB) resynced at ~30 MB/s instead of the expected
~360 MB/s. Setting c-min-rate above the actual resync speed (e.g.
350 MB/s) or disabling the feature (c-min-rate 0) restored full
throughput, confirming false-positive throttling as root cause.

Switch the gate to ap_bio_cnt. inc_ap_bio() is called for every
application bio at the top of drbd_make_request(), before any
activity-log handling, and dec_ap_bio() runs on completion. That makes
ap_bio_cnt the authoritative "application I/O in flight" signal,
independent of part_stat update timing, per-cpu skew, and activity-log
fastpath outcomes.

Backport of the drbd 9.x fix to the in-tree drbd 8.4 driver.

Suggested-by: Ionut Nechita <ionut.nechita@windriver.com>
Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com>
[inechita: backport to drbd 8.4 - ap_bio_cnt is scalar, not array]
Signed-off-by: Ionut Nechita <ionut.nechita@windriver.com>
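
The difference between the two gates described in the commit message can be illustrated with a small user-space model. This is only a sketch under stated assumptions, not the driver code: `struct rs_model`, `old_gate()`, `new_gate()` and `EVENT_SLACK` are hypothetical names invented for illustration, the fields merely borrow the names of the counters mentioned above, and the real function also checks the resync rate against c-min-rate, which is left out here.

```c
#include <stdbool.h>

/* Threshold in sectors, as quoted in the commit message. */
#define EVENT_SLACK 64

/* Hypothetical stand-in for the per-device counters; not the kernel struct. */
struct rs_model {
	long part_stat_sectors; /* all sectors seen by the backing device */
	long rs_sect_ev;        /* sectors already accounted to resync */
	long rs_last_events;    /* delta remembered from an earlier check */
	int  ap_bio_cnt;        /* application bios currently in flight */
};

/*
 * Old-style gate (simplified): any backing-device activity that the
 * resync counter does not explain is assumed to be application I/O.
 */
static bool old_gate(const struct rs_model *m)
{
	long curr_events = m->part_stat_sectors - m->rs_sect_ev;

	return curr_events - m->rs_last_events > EVENT_SLACK;
}

/* New-style gate (simplified): only in-flight application bios count. */
static bool new_gate(const struct rs_model *m)
{
	return m->ap_bio_cnt > 0;
}
```

In this model, adding 1024 sectors to `part_stat_sectors` without yet adding them to `rs_sect_ev` (the ordering described in point 1) makes `old_gate()` report application I/O while `new_gate()` stays false, because `ap_bio_cnt` is still zero. Once `rs_sect_ev` catches up, the old gate clears again. The block has no `main()`, so it needs a small caller to be run on its own.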
2bff906 to 0afd4dc
Pull request for series with
subject: drbd: fix false positive resync throttling in drbd_rs_c_min_rate_throttle
version: 1
url: https://patchwork.kernel.org/project/linux-block/list/?series=1094336