fix(satellite+drbd): per-peer SetGi --node-id + bounded DRBD teardown (Bug 81 + Bug 82)#2
fix(satellite+drbd): per-peer SetGi --node-id + bounded DRBD teardown (Bug 81 + Bug 82)#2kvaps wants to merge 2 commits into
Conversation
… (Bug 81 + Bug 82)
Two related satellite-side fixes:
**Bug 81**: DRBD 9.2+'s drbdmeta refuses the legacy single-call
set-gi form ("requires the --node-id option") because the GI tuple
lives in per-peer bitmap slots. The previous mitigation caught the
error and skipped the day0 GI seed, silently disabling Bug 77's
"skip initial sync" optimisation. SetGi now takes a peerNodeID and
the satellite iterates over every peer of every diskful volume,
stamping the deterministic day0 GI per peer — restoring the
optimisation on DRBD 9.2+.
**Bug 82**: An RD delete on a resource with `suspended:quorum` left
the DRBD kernel slot orphaned because the satellite's Adm.Down call
had no timeout — handleDelete blocked indefinitely, the finalizer
never released, and operators force-stripped the finalizer leaving
/dev/drbd<minor> stuck. The next RD that recycled the port/minor
failed create-md with "Device 'N' is configured!". New DownForce
helper wraps Down with a 15s budget and falls back to
`drbdsetup detach --force <minor>` per volume + `drbdsetup down`
on timeout — both bypass the quorum wait. handleDelete now always
returns within DownForceTimeout, leaving the kernel slot gone
before finalizer-strip.
Also covers the satellite preStop hang during DaemonSet rollouts —
the same DownForce path is now what preStop calls, so a stuck
quorum slot no longer blocks pod termination.
Co-Authored-By: Claude <noreply@anthropic.com>
Signed-off-by: Andrei Kvapil <kvapss@gmail.com>
|
Important Review skippedDraft detected. Please check the settings in the CodeRabbit UI or the ⚙️ Run configurationConfiguration used: defaults Review profile: CHILL Plan: Pro Run ID: You can disable this status message by setting the Use the checkbox below for a quick retry:
✨ Finishing Touches🧪 Generate unit tests (beta)
Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. Comment |
There was a problem hiding this comment.
Code Review
This pull request introduces a DownForce mechanism to prevent resource deletion from hanging when DRBD quorum is suspended, utilizing a 15-second timeout and falling back to kernel-direct commands. It also updates SetGi to support DRBD 9.2+ by iterating over peers and providing the required --node-id flag. Review feedback suggests refactoring the error-string matching in DownForce for better maintainability and surfacing strconv.Atoi errors during peer ID parsing to prevent silent configuration failures.
| if strings.Contains(downErr.Error(), "No such resource") || | ||
| strings.Contains(downErr.Error(), "No currently configured DRBD") || | ||
| strings.Contains(downErr.Error(), "Unknown resource") { | ||
| return nil | ||
| } |
There was a problem hiding this comment.
The condition to check for expected errors from drbdsetup down is a bit long and relies on hardcoded strings. This is fragile and can be hard to maintain if the error messages from drbdsetup change in the future. To improve readability and maintainability, you could iterate over a slice of expected error substrings. This makes it easier to add or remove expected error messages in the future.
for _, msg := range []string{
"No such resource",
"No currently configured DRBD",
"Unknown resource",
} {
if strings.Contains(downErr.Error(), msg) {
return nil
}
}| peerID, parseErr := strconv.Atoi(peerIDStr) | ||
| if parseErr != nil { | ||
| continue | ||
| } |
There was a problem hiding this comment.
The code silently ignores an error when parsing peerIDStr using strconv.Atoi. If peer.node-id from DrbdOptions is present but not a valid integer, it's a configuration error that should be surfaced. Silently continuing could lead to a peer being skipped for GI seeding without any warning, causing an unexpected full resync for that peer. This contradicts the PR's goal of making failures "real errors". I suggest returning the parsing error to make misconfigurations visible.
| peerID, parseErr := strconv.Atoi(peerIDStr) | |
| if parseErr != nil { | |
| continue | |
| } | |
| peerID, parseErr := strconv.Atoi(peerIDStr) | |
| if parseErr != nil { | |
| return errors.Wrapf(parseErr, "invalid node-id for peer %s: %q", peer, peerIDStr) | |
| } |
Companion to the satellite-side bounded DownForce: the DaemonSet preStop hook now also handles the suspended:quorum case. Per resource, it tries `drbdadm down` with a 5s budget; on failure falls back to `drbdsetup detach --force <res>/<vol>` per volume plus `drbdsetup down <res>` — both bypass DRBD's quorum-aware wait so the kernel slot is gone before the pod terminates. Co-Authored-By: Claude <noreply@anthropic.com> Signed-off-by: Andrei Kvapil <kvapss@gmail.com>
…ardown) Add tests/e2e/recovery-suspended-quorum.sh — a Tier 4 real-DRBD scenario that pins the bounded-teardown contract introduced by PR #2 (commits 795346c satellite DownForce + 27feb7e preStop kernel-direct fallback). The test creates a 3-replica RD, force-disconnects the local node from both peers so DRBD reports `suspended:quorum`, then issues an RD delete via the python-linstor CLI. With the fix the RD CRD must vanish inside 60s (DownForce 15s budget + cleanup), every satellite must report no kernel slot for the RD, and a fresh RD with the same backing size must reach UpToDate immediately — exercising the port/minor recycle path that pre-fix would trip `drbdadm create-md: Device 'N' is configured!`. Pre-fix this would hang forever on `drbdadm down` (no timeout) and leave /dev/drbd<minor> orphaned; the new envelope is enforced by the 60s teardown budget and the recycle assertion. Cleanup trap reconnects the disconnected peer + force-deletes both RDs via lib.sh's delete_rd so a partial-failure rerun finds a clean stand. Validated on the e2e-dualpri stand (3 workers, pool=stand): - RD-delete envelope: 1s (well inside 60s budget) - port/minor recycle: PASS - total wall time: 23s Co-Authored-By: Claude <noreply@anthropic.com> Signed-off-by: Andrei Kvapil <kvapss@gmail.com>
Summary
Two related satellite-side fixes surfaced by hands-on testing on the dev-kvaps stand today.
Bug 81 — DRBD 9.2+'s drbdmeta refuses the legacy single-call set-gi form (per-peer bitmap layout). The previous mitigation (
424728661) silently disabled Bug 77's skip-initial-sync optimisation by catching the error. This proper fix:Adm.SetGi(ctx, rsc, vol, dev, peerNodeID, gi)emits--node-id <peer>.seedInitialGiiterates overdr.GetPeers(), looking up each peer's NodeID frompeer.<name>.node-idin drbdOptions, and stamps the deterministic day0 GI per peer.Bug 82 — RD delete on a
suspended:quorumresource left the DRBD kernel slot orphaned becauseAdm.Downhad no timeout. Force-strip of the satellite finalizer then bricked the next port/minor allocation withDevice 'N' is configured!. The same pattern hung satellite podpreStopduring DaemonSet rollouts.Adm.DownForce(ctx, rsc, volumeNumbers)wraps Down with a 15s budget (DownForceTimeout).drbdsetup detach --force <minor>per volume +drbdsetup down— both bypass quorum.Test plan
go test ./pkg/satellite/... ./pkg/drbd/...— all green locally.TestApplyFirstActivationSeedsGiBeforeAdjustandTestApplyFirstActivationSkipsInitialSyncOnThinOrZFSto use a 2-replica DRBD shape so the per-peer set-gi loop fires; assertions now pin--node-id 1in the command.linstor rd c X; linstor vd c X 32M; linstor r c X --auto-place 2reaches UpToDate without falling through to full initial sync (checkdrbdsetup events2).linstor rd d Xon a quorum-suspended resource completes within 15s and removes/dev/drbd<minor>.