Skip to content

fix(satellite+drbd): per-peer SetGi --node-id + bounded DRBD teardown (Bug 81 + Bug 82)#2

Draft
kvaps wants to merge 2 commits into
mainfrom
fix/setgi-per-peer-node-id
Draft

fix(satellite+drbd): per-peer SetGi --node-id + bounded DRBD teardown (Bug 81 + Bug 82)#2
kvaps wants to merge 2 commits into
mainfrom
fix/setgi-per-peer-node-id

Conversation

@kvaps
Copy link
Copy Markdown
Member

@kvaps kvaps commented May 14, 2026

Summary

Two related satellite-side fixes surfaced by hands-on testing on the dev-kvaps stand today.

Bug 81 — DRBD 9.2+'s drbdmeta refuses the legacy single-call set-gi form (per-peer bitmap layout). The previous mitigation (424728661) silently disabled Bug 77's skip-initial-sync optimisation by catching the error. This proper fix:

  • Adm.SetGi(ctx, rsc, vol, dev, peerNodeID, gi) emits --node-id <peer>.
  • Satellite's seedInitialGi iterates over dr.GetPeers(), looking up each peer's NodeID from peer.<name>.node-id in drbdOptions, and stamps the deterministic day0 GI per peer.
  • Removes the catch-and-skip fallback — failures are now real errors.

Bug 82 — RD delete on a suspended:quorum resource left the DRBD kernel slot orphaned because Adm.Down had no timeout. Force-strip of the satellite finalizer then bricked the next port/minor allocation with Device 'N' is configured!. The same pattern hung satellite pod preStop during DaemonSet rollouts.

  • New Adm.DownForce(ctx, rsc, volumeNumbers) wraps Down with a 15s budget (DownForceTimeout).
  • On timeout falls back to drbdsetup detach --force <minor> per volume + drbdsetup down — both bypass quorum.
  • handleDelete / preStop both call DownForce, so the kernel slot is gone before the finalizer releases.

Test plan

  • go test ./pkg/satellite/... ./pkg/drbd/... — all green locally.
  • Updated unit tests TestApplyFirstActivationSeedsGiBeforeAdjust and TestApplyFirstActivationSkipsInitialSyncOnThinOrZFS to use a 2-replica DRBD shape so the per-peer set-gi loop fires; assertions now pin --node-id 1 in the command.
  • Verify on stand: linstor rd c X; linstor vd c X 32M; linstor r c X --auto-place 2 reaches UpToDate without falling through to full initial sync (check drbdsetup events2).
  • Verify on stand: linstor rd d X on a quorum-suspended resource completes within 15s and removes /dev/drbd<minor>.

… (Bug 81 + Bug 82)

Two related satellite-side fixes:

**Bug 81**: DRBD 9.2+'s drbdmeta refuses the legacy single-call
set-gi form ("requires the --node-id option") because the GI tuple
lives in per-peer bitmap slots. The previous mitigation caught the
error and skipped the day0 GI seed, silently disabling Bug 77's
"skip initial sync" optimisation. SetGi now takes a peerNodeID and
the satellite iterates over every peer of every diskful volume,
stamping the deterministic day0 GI per peer — restoring the
optimisation on DRBD 9.2+.

**Bug 82**: An RD delete on a resource with `suspended:quorum` left
the DRBD kernel slot orphaned because the satellite's Adm.Down call
had no timeout — handleDelete blocked indefinitely, the finalizer
never released, and operators force-stripped the finalizer leaving
/dev/drbd<minor> stuck. The next RD that recycled the port/minor
failed create-md with "Device 'N' is configured!". New DownForce
helper wraps Down with a 15s budget and falls back to
`drbdsetup detach --force <minor>` per volume + `drbdsetup down`
on timeout — both bypass the quorum wait. handleDelete now always
returns within DownForceTimeout, leaving the kernel slot gone
before finalizer-strip.

Also covers the satellite preStop hang during DaemonSet rollouts —
the same DownForce path is now what preStop calls, so a stuck
quorum slot no longer blocks pod termination.

Co-Authored-By: Claude <noreply@anthropic.com>
Signed-off-by: Andrei Kvapil <kvapss@gmail.com>
@coderabbitai
Copy link
Copy Markdown

coderabbitai Bot commented May 14, 2026

Important

Review skipped

Draft detected.

Please check the settings in the CodeRabbit UI or the .coderabbit.yaml file in this repository. To trigger a single review, invoke the @coderabbitai review command.

⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: 7bf96bb6-4aa3-4f2f-8684-a26e55246162

You can disable this status message by setting the reviews.review_status to false in the CodeRabbit configuration file.

Use the checkbox below for a quick retry:

  • 🔍 Trigger review
✨ Finishing Touches
🧪 Generate unit tests (beta)
  • Create PR with unit tests
  • Commit unit tests in branch fix/setgi-per-peer-node-id

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

Comment @coderabbitai help to get the list of available commands and usage tips.

Copy link
Copy Markdown

@gemini-code-assist gemini-code-assist Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Code Review

This pull request introduces a DownForce mechanism to prevent resource deletion from hanging when DRBD quorum is suspended, utilizing a 15-second timeout and falling back to kernel-direct commands. It also updates SetGi to support DRBD 9.2+ by iterating over peers and providing the required --node-id flag. Review feedback suggests refactoring the error-string matching in DownForce for better maintainability and surfacing strconv.Atoi errors during peer ID parsing to prevent silent configuration failures.

Comment thread pkg/drbd/drbdadm.go
Comment on lines +125 to +129
if strings.Contains(downErr.Error(), "No such resource") ||
strings.Contains(downErr.Error(), "No currently configured DRBD") ||
strings.Contains(downErr.Error(), "Unknown resource") {
return nil
}
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

medium

The condition to check for expected errors from drbdsetup down is a bit long and relies on hardcoded strings. This is fragile and can be hard to maintain if the error messages from drbdsetup change in the future. To improve readability and maintainability, you could iterate over a slice of expected error substrings. This makes it easier to add or remove expected error messages in the future.

	for _, msg := range []string{
		"No such resource",
		"No currently configured DRBD",
		"Unknown resource",
	} {
		if strings.Contains(downErr.Error(), msg) {
			return nil
		}
	}

Comment on lines +1199 to +1202
peerID, parseErr := strconv.Atoi(peerIDStr)
if parseErr != nil {
continue
}
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

medium

The code silently ignores an error when parsing peerIDStr using strconv.Atoi. If peer.node-id from DrbdOptions is present but not a valid integer, it's a configuration error that should be surfaced. Silently continuing could lead to a peer being skipped for GI seeding without any warning, causing an unexpected full resync for that peer. This contradicts the PR's goal of making failures "real errors". I suggest returning the parsing error to make misconfigurations visible.

Suggested change
peerID, parseErr := strconv.Atoi(peerIDStr)
if parseErr != nil {
continue
}
peerID, parseErr := strconv.Atoi(peerIDStr)
if parseErr != nil {
return errors.Wrapf(parseErr, "invalid node-id for peer %s: %q", peer, peerIDStr)
}

Companion to the satellite-side bounded DownForce: the DaemonSet
preStop hook now also handles the suspended:quorum case. Per
resource, it tries `drbdadm down` with a 5s budget; on failure
falls back to `drbdsetup detach --force <res>/<vol>` per volume
plus `drbdsetup down <res>` — both bypass DRBD's quorum-aware
wait so the kernel slot is gone before the pod terminates.

Co-Authored-By: Claude <noreply@anthropic.com>
Signed-off-by: Andrei Kvapil <kvapss@gmail.com>
kvaps added a commit that referenced this pull request May 14, 2026
…ardown)

Add tests/e2e/recovery-suspended-quorum.sh — a Tier 4 real-DRBD scenario
that pins the bounded-teardown contract introduced by PR #2 (commits
795346c satellite DownForce + 27feb7e preStop kernel-direct
fallback).

The test creates a 3-replica RD, force-disconnects the local node from
both peers so DRBD reports `suspended:quorum`, then issues an RD delete
via the python-linstor CLI. With the fix the RD CRD must vanish inside
60s (DownForce 15s budget + cleanup), every satellite must report no
kernel slot for the RD, and a fresh RD with the same backing size must
reach UpToDate immediately — exercising the port/minor recycle path
that pre-fix would trip `drbdadm create-md: Device 'N' is configured!`.

Pre-fix this would hang forever on `drbdadm down` (no timeout) and
leave /dev/drbd<minor> orphaned; the new envelope is enforced by the
60s teardown budget and the recycle assertion.

Cleanup trap reconnects the disconnected peer + force-deletes both RDs
via lib.sh's delete_rd so a partial-failure rerun finds a clean stand.

Validated on the e2e-dualpri stand (3 workers, pool=stand):
  - RD-delete envelope: 1s (well inside 60s budget)
  - port/minor recycle: PASS
  - total wall time: 23s

Co-Authored-By: Claude <noreply@anthropic.com>
Signed-off-by: Andrei Kvapil <kvapss@gmail.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant