
[agent] Replace drbdsetup status polling with events2 state store#643

Open
astef wants to merge 1 commit into astef-prototype from astef-optimize-with-events-state-store


@astef astef commented Apr 3, 2026

Description

Replace per-reconcile drbdsetup status + drbdsetup show subprocess calls with an in-memory state store backed by the events2 stream, plus a lazy show cache for configuration data.

Two new components are introduced in the drbdr package:

  • DRBDStateStore — thread-safe store of DRBD runtime state (role, disk-state, connections, paths, etc.), built incrementally from events2 by the Scanner. Provides Snapshot(resourceName) that returns a *drbdutils.Resource compatible with the existing actualState consumption.
  • ShowCache — per-resource cache of drbdsetup show results (configuration options, backing disk, net options). Fetched lazily on first access, invalidated after convergence actions.
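A minimal Go sketch of how such a store could look, assuming a trimmed-down `Resource` type in place of `*drbdutils.Resource`. The names `StateStore`, `ApplyResource`, `ApplyDevice`, and `Destroy` are hypothetical and chosen for illustration only; `Snapshot` and `ResourceExists` mirror the method names mentioned in this PR, but their bodies here are guesses. The point it illustrates is that `Snapshot` returns a deep copy, so a reconciler can keep reading its snapshot while the Scanner continues to apply events.

```go
package main

import "sync"

// Resource is a hypothetical, trimmed-down stand-in for *drbdutils.Resource.
type Resource struct {
	Name      string
	Role      string
	DiskState map[int]string // volume number -> disk state
}

// StateStore is an illustrative sketch of an events2-backed state store,
// not the actual DRBDStateStore from this PR.
type StateStore struct {
	mu        sync.RWMutex
	resources map[string]*Resource
}

func NewStateStore() *StateStore {
	return &StateStore{resources: make(map[string]*Resource)}
}

// getOrCreate must be called with s.mu held for writing.
func (s *StateStore) getOrCreate(name string) *Resource {
	r, ok := s.resources[name]
	if !ok {
		r = &Resource{Name: name, DiskState: make(map[int]string)}
		s.resources[name] = r
	}
	return r
}

// ApplyResource records a resource-level event (exists, create, or change).
func (s *StateStore) ApplyResource(name, role string) {
	s.mu.Lock()
	defer s.mu.Unlock()
	s.getOrCreate(name).Role = role
}

// ApplyDevice records a device-level event for one volume.
func (s *StateStore) ApplyDevice(name string, volume int, diskState string) {
	s.mu.Lock()
	defer s.mu.Unlock()
	s.getOrCreate(name).DiskState[volume] = diskState
}

// Destroy removes a resource after a destroy event.
func (s *StateStore) Destroy(name string) {
	s.mu.Lock()
	defer s.mu.Unlock()
	delete(s.resources, name)
}

// ResourceExists reports whether the store currently knows the resource.
func (s *StateStore) ResourceExists(name string) bool {
	s.mu.RLock()
	defer s.mu.RUnlock()
	_, ok := s.resources[name]
	return ok
}

// Snapshot returns a deep copy of the resource state, or nil if unknown.
// Later events do not affect a snapshot that was already handed out.
func (s *StateStore) Snapshot(name string) *Resource {
	s.mu.RLock()
	defer s.mu.RUnlock()
	r, ok := s.resources[name]
	if !ok {
		return nil
	}
	cp := &Resource{Name: r.Name, Role: r.Role, DiskState: make(map[int]string, len(r.DiskState))}
	for volume, state := range r.DiskState {
		cp.DiskState[volume] = state
	}
	return cp
}
```

In this sketch the Scanner would be the only writer (`Apply*`, `Destroy`), while reconcilers only call `Snapshot` and `ResourceExists`, which is why a read-write mutex is used.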

The reconciler's initial observe phase reads from the store + show cache (zero subprocess calls in steady state). The post-convergence refresh still calls drbdsetup status directly for immediate accuracy, then re-fetches show from the invalidated cache.
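The lazy-fetch and invalidate behavior described above can be sketched as follows. This is an illustration under assumptions, not the PR's code: `ShowResult`, `FetchFunc`, `NewShowCache`, `Get`, and `Invalidate` are hypothetical names, and the fetch hook stands in for whatever actually runs `drbdsetup show`.

```go
package main

import "sync"

// ShowResult is a hypothetical placeholder for parsed `drbdsetup show` output.
type ShowResult struct {
	BackingDisk string
}

// FetchFunc is a hypothetical hook that would run `drbdsetup show <resource>`.
type FetchFunc func(resource string) (ShowResult, error)

// ShowCache is an illustrative per-resource cache with lazy population.
type ShowCache struct {
	mu      sync.Mutex
	fetch   FetchFunc
	entries map[string]ShowResult
}

func NewShowCache(fetch FetchFunc) *ShowCache {
	return &ShowCache{fetch: fetch, entries: make(map[string]ShowResult)}
}

// Get returns the cached result, fetching it on first access.
// Errors are not cached, so the next call retries the fetch.
// For simplicity the lock is held during the fetch, which serializes
// fetches across resources; a real implementation might avoid that.
func (c *ShowCache) Get(resource string) (ShowResult, error) {
	c.mu.Lock()
	defer c.mu.Unlock()
	if res, ok := c.entries[resource]; ok {
		return res, nil
	}
	res, err := c.fetch(resource)
	if err != nil {
		return ShowResult{}, err
	}
	c.entries[resource] = res
	return res, nil
}

// Invalidate drops the entry so the next Get fetches again,
// for example after convergence actions changed the configuration.
func (c *ShowCache) Invalidate(resource string) {
	c.mu.Lock()
	defer c.mu.Unlock()
	delete(c.entries, resource)
}
```

With this shape, a steady-state reconcile calls `Get` and hits the cache (no subprocess), while a converging reconcile calls `Invalidate` followed by `Get`, which accounts for the single refresh `show` call.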

The --statistics flag is added to the events2 arguments so that the device size is available in device events.
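As a rough illustration of how a device event carrying `size` might be parsed into typed kinds and objects, here is a sketch in Go. The `Event` struct layout, the `Attrs` map, and the listed enum values are assumptions (the value sets are deliberately a subset), and the sample line used in the usage note is illustrative rather than captured output. The real `parseLine` in this PR may differ.

```go
package main

import (
	"fmt"
	"strings"
)

// Kind and Object are typed enums; the value sets below are an assumed subset.
type Kind string
type Object string

const (
	KindExists  Kind = "exists"
	KindCreate  Kind = "create"
	KindChange  Kind = "change"
	KindDestroy Kind = "destroy"

	ObjectResource   Object = "resource"
	ObjectDevice     Object = "device"
	ObjectConnection Object = "connection"
	ObjectPeerDevice Object = "peer-device"
	ObjectPath       Object = "path"
)

var validKinds = map[Kind]bool{
	KindExists: true, KindCreate: true, KindChange: true, KindDestroy: true,
}

var validObjects = map[Object]bool{
	ObjectResource: true, ObjectDevice: true, ObjectConnection: true,
	ObjectPeerDevice: true, ObjectPath: true,
}

// Event is a hypothetical parsed events2 line.
type Event struct {
	Kind   Kind
	Object Object
	Attrs  map[string]string // key:value pairs that follow kind and object
}

// parseLine splits "<kind> <object> key:value ..." and validates the enums.
func parseLine(line string) (Event, error) {
	fields := strings.Fields(line)
	if len(fields) < 2 {
		return Event{}, fmt.Errorf("events2 line too short: %q", line)
	}
	kind, object := Kind(fields[0]), Object(fields[1])
	if !validKinds[kind] {
		return Event{}, fmt.Errorf("unknown event kind %q", fields[0])
	}
	if !validObjects[object] {
		return Event{}, fmt.Errorf("unknown event object %q", fields[1])
	}
	attrs := make(map[string]string, len(fields)-2)
	for _, field := range fields[2:] {
		// Cut at the first colon so values may themselves contain colons.
		key, value, ok := strings.Cut(field, ":")
		if !ok {
			return Event{}, fmt.Errorf("malformed attribute %q", field)
		}
		attrs[key] = value
	}
	return Event{Kind: kind, Object: object, Attrs: attrs}, nil
}
```

For an illustrative line such as `exists device name:r0 volume:0 minor:1000 disk:UpToDate size:1048576`, `parseLine` returns an event whose `Attrs["size"]` is `"1048576"`, which is the field a store would need in order to report the device size without calling `drbdsetup status`.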

Why do we need it, and what problem does it solve?

Previously, every reconcile of a DRBDResource spawned two subprocesses (drbdsetup status + drbdsetup show), and up to two more after convergence. With 20 concurrent reconcilers and frequent events, this produced a large number of short-lived processes and the overhead that comes with spawning them.

The events2 stream already carries all runtime state data — the same data that drbdsetup status returns. By maintaining this state in memory and reading from it during reconciliation, we eliminate redundant subprocess calls.

What is the expected result?

Subprocess calls per reconcile cycle:

| Scenario | Before | After |
| --- | --- | --- |
| Non-converging (steady state) | 2 (status + show) | 0 (first time: 1 show) |
| Converging | 4 (status + show + refresh status + refresh show) | 2 (refresh status + refresh show) |
| Orphan check | 1 (status) | 0 |
| Rename recovery | 1 (status) | 0 |

Functional behavior is preserved — the same ActualDRBDState interface is used by downstream code (Report, computeTargetDRBDActions, etc.) with no changes.

Checklist

  • The code is covered by unit tests.
  • e2e tests passed.
  • Documentation updated according to the changes.
  • Changes were tested in the Kubernetes cluster manually.

@astef astef self-assigned this Apr 3, 2026
@astef astef force-pushed the astef-optimize-with-events-state-store branch from 3e7ad1d to ec3685e Compare April 3, 2026 07:48
Introduce DRBDStateStore — an in-memory representation of DRBD runtime
state maintained from the events2 stream by the Scanner — and ShowCache
for lazy per-resource caching of drbdsetup show results.

The reconciler's initial observe phase now reads from the store and show
cache (zero subprocess calls in steady state), while the post-convergence
refresh still calls drbdsetup status directly for immediate accuracy.

Key changes:
- Add --statistics to events2 args so device size is available
- Type Event.Kind and Event.Object as enums with validation in parseLine
- Scanner populates DRBDStateStore alongside the existing DRBDPortCache
- observeActualDRBDState reads from store + show cache
- New observeActualDRBDStateFresh for post-convergence (direct status)
- Orphan and rename existence checks use store.ResourceExists()
- ShowCache invalidated after convergence actions

Subprocess calls per reconcile:
  Non-converging: 2 → 0  (first time: 1 show)
  Converging:     4 → 2  (refresh status + refresh show)

Signed-off-by: Aleksandr Stefurishin <aleksandr.stefurishin@flant.com>
@astef astef force-pushed the astef-optimize-with-events-state-store branch from ec3685e to f620d04 Compare April 3, 2026 07:49
