
Fix device plugin pod not evicted during node drain #1802

Merged
openshift-merge-bot[bot] merged 1 commit into rh-ecosystem-edge:main from TomerNewman:bugfix/devicepluginnotevicted on Jun 2, 2026

Conversation

@TomerNewman
Member

@TomerNewman TomerNewman commented Jun 2, 2026

When a node is drained, the device plugin DaemonSet pod keeps running: DaemonSet pods automatically tolerate the node.kubernetes.io/unschedulable:NoSchedule taint that cordoning adds, and the only nodeSelector was the kmm-ready label, which stays on the node while the kmod is loaded. This creates a deadlock: the unloader cannot run while the device plugin holds the device files, and the device plugin pod is never removed because kmm-ready is never removed.

Introduce a new device-plugin-target node label managed by the DevicePluginReconciler. The DaemonSet nodeSelector now requires both kmm-ready AND device-plugin-target. The DevicePluginReconciler watches node taint changes: it adds device-plugin-target to schedulable nodes matching the Module selector, and removes it from unschedulable nodes. This causes the DaemonSet controller to evict the device plugin pod during drain, breaking the deadlock.
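The AND semantics of the new nodeSelector can be illustrated with a minimal, stdlib-only Go sketch. It uses hypothetical label keys (`example.io/...`) and placeholder values; KMM generates the real keys with its own label helpers. Because a nodeSelector matches only when every entry is present, removing the target label alone is enough for the DaemonSet controller to stop running the pod on that node, even though kmm-ready is still set.

```go
package main

import "fmt"

// matchesNodeSelector reports whether every key/value pair in selector is
// present in nodeLabels. A DaemonSet's pod-template nodeSelector is evaluated
// the same way: all entries must match (logical AND).
func matchesNodeSelector(nodeLabels, selector map[string]string) bool {
	for k, v := range selector {
		if got, ok := nodeLabels[k]; !ok || got != v {
			return false
		}
	}
	return true
}

func main() {
	// Hypothetical keys for a Module "ns/mod"; real KMM label keys may differ.
	const readyLabel = "example.io/ns.mod.ready"
	const targetLabel = "example.io/ns.mod.device-plugin-target"

	selector := map[string]string{readyLabel: "", targetLabel: ""}
	node := map[string]string{readyLabel: "", targetLabel: ""}

	// Both labels present: the device plugin pod may run here.
	fmt.Println(matchesNodeSelector(node, selector))

	// Drain: the reconciler drops only the target label; kmm-ready stays.
	delete(node, targetLabel)

	// The node no longer matches, so the DaemonSet controller removes the pod.
	fmt.Println(matchesNodeSelector(node, selector))
}
```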


fixes #1801


/cc @yevgeny-shnaidman @ybettan

Summary by CodeRabbit

  • New Features

    • Nodes targeted by a Module now receive a device-plugin-target label, which lets the device plugin pod be removed from draining nodes.
    • The device-plugin DaemonSet nodeSelector now requires both the kernel-module-ready label and the device-plugin-target label.
  • Improvements

    • Module reconciliation now targets only nodes that are schedulable given the Module's tolerations.
    • The device-plugin reconciler now watches node taint changes and re-reconciles the affected Modules.

@netlify

netlify Bot commented Jun 2, 2026

Deploy Preview for openshift-kmm ready!

Name Link
🔨 Latest commit 6f58208
🔍 Latest deploy log https://app.netlify.com/projects/openshift-kmm/deploys/6a1e76ff78bd7400072233d1
😎 Deploy Preview https://deploy-preview-1802--openshift-kmm.netlify.app

@openshift-ci

openshift-ci Bot commented Jun 2, 2026

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: TomerNewman

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@openshift-ci openshift-ci Bot added the approved label Jun 2, 2026
@coderabbitai

coderabbitai Bot commented Jun 2, 2026


Walkthrough

The PR implements device-plugin pod eviction during node drain by refactoring node selection into separate list and filter operations, introducing a device-plugin-target label managed by the DevicePluginReconciler, and updating the device-plugin DaemonSet to require this label alongside the existing kernel-module-ready label. The device-plugin reconciler now watches node taint changes and dynamically labels schedulable nodes while removing the label from unschedulable nodes.

Changes

Device Plugin Pod Eviction During Node Drain

Layer / File(s) Summary
Node API Refactoring
internal/node/node.go, internal/node/mock_node.go, internal/node/node_test.go, internal/utils/kmmlabels.go
Node interface split from combined GetNodesListBySelector into separate GetAllNodesBySelector (list only) and GetSchedulableNodesBySelector (filter by tolerations) methods. Implementations refactored to apply IsNodeSchedulable selectively. Label utility GetDevicePluginTargetNodeLabel added to format device-plugin node labels.
Device Plugin Target Label Helpers
internal/controllers/device_plugin_reconciler.go (lines 140–265)
Reconciler helper interface extended with handleDevicePluginTargetLabels and removeDevicePluginTargetLabels methods. Implementations iterate nodes, determine schedulability, and accumulate per-node label update errors via errors.Join.
Device Plugin Reconciler Core
internal/controllers/device_plugin_reconciler.go (lines 37–113, 483–486), internal/filter/filter.go
Reconcile flow updated: deletion handling removes target labels before module deletion; non-deletion reconciliation calls handleDevicePluginTargetLabels. DaemonSet nodeSelector extended to require both kernel-module-ready and device-plugin-target labels. SetupWithManager adds Node watch with DevicePluginReconcilerNodePredicate to enqueue modules when node taints change.
Module Reconciler Update
internal/controllers/module_reconciler.go, internal/controllers/module_reconciler_test.go
Module reconciler uses new GetSchedulableNodesBySelector instead of combined method, ensuring nodes are filtered by schedulability when determining targeted nodes for MIC/NMC handling.
Test Infrastructure
internal/controllers/device_plugin_reconciler_test.go, internal/controllers/mock_device_plugin_reconciler.go
Device plugin reconciler tests expanded to verify target-label handling in error and success paths, deletion sequencing with label removal, and multi-node label updates with aggregated errors. Mock methods handleDevicePluginTargetLabels and removeDevicePluginTargetLabels added to support test expectations. All node selection test expectations updated to use new GetSchedulableNodesBySelector call.
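The per-node label handling summarized above can be sketched as follows. This is an illustration under assumptions, not the code from `device_plugin_reconciler.go`: `Node`, `labelUpdater`, `mapUpdater`, and `syncTargetLabel` are hypothetical names, schedulability is precomputed instead of calling `IsNodeSchedulable`, and the label key is a placeholder. It shows the two behaviors the summary describes: adding the label on schedulable nodes and removing it from unschedulable ones, and continuing after a failure while collecting every per-node error with `errors.Join` (Go 1.20+).

```go
package main

import (
	"errors"
	"fmt"
)

// Node is a simplified stand-in for a Kubernetes node (hypothetical).
type Node struct {
	Name        string
	Labels      map[string]string
	Schedulable bool // precomputed here; the real code checks taints/tolerations
}

// labelUpdater abstracts the API calls that add or remove a node label.
// It is a hypothetical interface, not KMM's actual one.
type labelUpdater interface {
	AddLabel(node *Node, key, value string) error
	RemoveLabel(node *Node, key string) error
}

// syncTargetLabel adds targetLabel to schedulable nodes and removes it from
// unschedulable ones. It does not stop at the first failure; all per-node
// errors are combined with errors.Join, which returns nil when there are none.
func syncTargetLabel(u labelUpdater, nodes []*Node, targetLabel string) error {
	var errs []error
	for _, n := range nodes {
		var err error
		if n.Schedulable {
			err = u.AddLabel(n, targetLabel, "")
		} else {
			err = u.RemoveLabel(n, targetLabel)
		}
		if err != nil {
			errs = append(errs, fmt.Errorf("node %s: %w", n.Name, err))
		}
	}
	return errors.Join(errs...)
}

// mapUpdater edits the in-memory label map; nodes listed in fail return an error.
type mapUpdater struct{ fail map[string]bool }

func (m mapUpdater) AddLabel(n *Node, k, v string) error {
	if m.fail[n.Name] {
		return errors.New("update rejected")
	}
	n.Labels[k] = v
	return nil
}

func (m mapUpdater) RemoveLabel(n *Node, k string) error {
	if m.fail[n.Name] {
		return errors.New("update rejected")
	}
	delete(n.Labels, k)
	return nil
}

func main() {
	const target = "example.io/ns.mod.device-plugin-target" // hypothetical key
	nodes := []*Node{
		{Name: "worker-0", Labels: map[string]string{}, Schedulable: true},
		{Name: "worker-1", Labels: map[string]string{target: ""}, Schedulable: false},
		{Name: "worker-2", Labels: map[string]string{}, Schedulable: true},
	}
	err := syncTargetLabel(mapUpdater{fail: map[string]bool{"worker-2": true}}, nodes, target)
	fmt.Println(err)
}
```

Continuing past failures means one node with a failing update does not prevent the label from being removed on other draining nodes, while the joined error still reports every failure to the caller.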

Sequence Diagram

sequenceDiagram
  participant DevicePluginReconciler
  participant NodeWatch as Node Event Watch
  participant NodeAPI as Node API
  participant LabelHandler as Label Handler
  participant Kubernetes as Kubernetes API
  
  NodeWatch->>DevicePluginReconciler: Node taint changed (non-deletion)
  DevicePluginReconciler->>DevicePluginReconciler: Reconcile module
  DevicePluginReconciler->>LabelHandler: handleDevicePluginTargetLabels
  LabelHandler->>NodeAPI: GetAllNodesBySelector (module selector)
  NodeAPI-->>LabelHandler: nodes list
  LabelHandler->>NodeAPI: IsNodeSchedulable (per-node check)
  alt Schedulable
    LabelHandler->>Kubernetes: Add device-plugin-target label
  else Unschedulable
    LabelHandler->>Kubernetes: Remove device-plugin-target label
  end
  LabelHandler-->>DevicePluginReconciler: aggregated errors
  DevicePluginReconciler->>Kubernetes: Update DaemonSet (requires device-plugin-target label)
  Kubernetes->>Kubernetes: Evict pod from unschedulable nodes
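The Node watch only needs to requeue Modules when taints actually change. Below is a minimal sketch of such a comparison, assuming a simplified `Taint` type (the real `core/v1` type also carries `TimeAdded`) and that taint order is irrelevant; whether `DevicePluginReconcilerNodePredicate` compares taints this way is an assumption. In controller-runtime, a check like this would typically sit in the `UpdateFunc` of a `predicate.Funcs`.

```go
package main

import "fmt"

// Taint is a simplified stand-in for the core/v1 Taint type.
type Taint struct {
	Key, Value, Effect string
}

// taintsChanged reports whether two taint lists differ as multisets,
// ignoring order. Using it as an update filter keeps unrelated node
// updates (for example, status changes) from requeueing every Module.
func taintsChanged(oldTaints, newTaints []Taint) bool {
	if len(oldTaints) != len(newTaints) {
		return true
	}
	counts := make(map[Taint]int, len(oldTaints))
	for _, t := range oldTaints {
		counts[t]++
	}
	for _, t := range newTaints {
		if counts[t] == 0 {
			return true
		}
		counts[t]--
	}
	return false
}

func main() {
	cordon := Taint{Key: "node.kubernetes.io/unschedulable", Effect: "NoSchedule"}
	fmt.Println(taintsChanged(nil, []Taint{cordon}))            // taint added
	fmt.Println(taintsChanged([]Taint{cordon}, []Taint{cordon})) // unchanged
}
```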

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~45 minutes

🚥 Pre-merge checks | ✅ 4 | ❌ 1

❌ Failed checks (1 warning)

Check name Status Explanation Resolution
Docstring Coverage ⚠️ Warning Docstring coverage is 0.00% which is insufficient. The required threshold is 80.00%. Write docstrings for the functions missing them to satisfy the coverage threshold.
✅ Passed checks (4 passed)
Check name Status Explanation
Description Check ✅ Passed Check skipped - CodeRabbit’s high-level summary is enabled.
Title check ✅ Passed The title accurately and concisely describes the primary fix: preventing device plugin pod eviction failure during node drain.
Linked Issues check ✅ Passed The changes implement the upstream commit dcc567e0 to fix device plugin pod eviction during node drain by introducing device-plugin-target label management, addressing the requirement in issue #1801.
Out of Scope Changes check ✅ Passed All changes are within scope: node interface refactoring, DevicePluginReconciler enhancements for label management, filter predicate for node taint watching, and supporting infrastructure changes align with the device-plugin-target label feature.


@yevgeny-shnaidman
Member

/lgtm


@coderabbitai coderabbitai Bot left a comment


Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)
internal/node/node.go (1)

34-47: ⚠️ Potential issue | 🟠 Major | ⚡ Quick win

Handle cordoned nodes in IsNodeSchedulable.

  • internal/node/node.go’s IsNodeSchedulable only checks node.Spec.Taints and ignores node.Spec.Unschedulable (set by kubectl cordon).
  • handleDevicePluginTargetLabels uses IsNodeSchedulable to decide when to add/remove the device-plugin-target label; since DaemonSets tolerate the implicit node.kubernetes.io/unschedulable:NoSchedule taint, cordoned nodes can keep the label unless Spec.Unschedulable is accounted for.
  • Add a regression test in internal/node/node_test.go asserting IsNodeSchedulable(...)=false when Spec.Unschedulable=true (even if there are no blocking taints).
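A sketch of the suggested fix, written against simplified stand-ins for the `core/v1` `Node`, `Taint`, and `Toleration` types so it runs with the standard library only. The `tolerates` helper follows the matching rules of `Toleration.ToleratesTaint` in `k8s.io/api/core/v1`; treating only `NoSchedule` and `NoExecute` taints as blocking is an assumption about KMM's `IsNodeSchedulable`, not a copy of it.

```go
package main

import "fmt"

// Simplified stand-ins for core/v1 types.
type Taint struct{ Key, Value, Effect string }

// Toleration.Operator is "Exists", "Equal", or "" (treated as "Equal").
type Toleration struct{ Key, Operator, Value, Effect string }

type NodeSpec struct {
	Unschedulable bool // set by `kubectl cordon`
	Taints        []Taint
}

type Node struct{ Spec NodeSpec }

// tolerates mirrors core/v1 Toleration.ToleratesTaint: an empty Effect or Key
// matches anything, "Exists" ignores Value, and "Equal" compares Value.
func tolerates(tol Toleration, t Taint) bool {
	if tol.Effect != "" && tol.Effect != t.Effect {
		return false
	}
	if tol.Key != "" && tol.Key != t.Key {
		return false
	}
	switch tol.Operator {
	case "Exists":
		return true
	case "", "Equal":
		return tol.Value == t.Value
	default:
		return false
	}
}

// isNodeSchedulable treats a cordoned node as unschedulable first, then
// requires every NoSchedule/NoExecute taint to be tolerated.
// PreferNoSchedule taints are ignored here (an assumption of this sketch).
func isNodeSchedulable(node Node, tolerations []Toleration) bool {
	if node.Spec.Unschedulable {
		return false
	}
	for _, t := range node.Spec.Taints {
		if t.Effect != "NoSchedule" && t.Effect != "NoExecute" {
			continue
		}
		tolerated := false
		for _, tol := range tolerations {
			if tolerates(tol, t) {
				tolerated = true
				break
			}
		}
		if !tolerated {
			return false
		}
	}
	return true
}

func main() {
	cordoned := Node{Spec: NodeSpec{Unschedulable: true}} // no taints at all
	fmt.Println(isNodeSchedulable(cordoned, nil))
}
```

Checking `Spec.Unschedulable` first makes a cordon take effect immediately, without depending on the node lifecycle controller having already added the `node.kubernetes.io/unschedulable` taint.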
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@internal/node/node.go` around lines 34 - 47, The IsNodeSchedulable function
currently only examines node.Spec.Taints; update IsNodeSchedulable to return
false when node.Spec.Unschedulable == true (i.e., treat cordoned nodes as
unschedulable before checking taints) so handleDevicePluginTargetLabels will
drop the label for cordoned nodes; then add a unit test in
internal/node/node_test.go that constructs a v1.Node with Spec.Unschedulable =
true (and no blocking taints) and asserts IsNodeSchedulable(...) == false to
prevent regressions.
🧹 Nitpick comments (1)
internal/controllers/module_reconciler_test.go (1)

125-128: ⚡ Quick win

Add one case that exercises non-empty mod.Spec.Tolerations.

These expectations only cover the nil/empty case, so they won't catch a regression where Reconcile stops forwarding user-defined tolerations and only passes module.InternalTolerations. A single happy-path test with a custom module toleration and an expectation on the combined slice would lock down the new contract.

Also applies to: 189-190, 209-209

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@internal/controllers/module_reconciler_test.go` around lines 125 - 128, Add a
test case that sets mod.Spec.Tolerations to a non-empty slice and assert
Reconcile forwards the combined tolerations (user-defined tolerations followed
by module.InternalTolerations) to mn.GetSchedulableNodesBySelector;
specifically, in the test create a module with a custom toleration, set an
expectation on GetSchedulableNodesBySelector to receive mod.Spec.Selector and
the combined slice (e.g., append(mod.Spec.Tolerations, module.InternalTolerations...))
and return targetedNodes, nil, then exercise the Reconcile
path; apply the same pattern to the other similar test blocks that currently
only cover the nil/empty tolerations case.

ℹ️ Review info
⚙️ Run configuration

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

Run ID: 97a01d47-d5f3-4c2f-b9b6-20b5187aebd7

📥 Commits

Reviewing files that changed from the base of the PR and between 88d1433 and 6f58208.

📒 Files selected for processing (10)
  • internal/controllers/device_plugin_reconciler.go
  • internal/controllers/device_plugin_reconciler_test.go
  • internal/controllers/mock_device_plugin_reconciler.go
  • internal/controllers/module_reconciler.go
  • internal/controllers/module_reconciler_test.go
  • internal/filter/filter.go
  • internal/node/mock_node.go
  • internal/node/node.go
  • internal/node/node_test.go
  • internal/utils/kmmlabels.go

@openshift-merge-bot openshift-merge-bot Bot merged commit a4a5d0b into rh-ecosystem-edge:main Jun 2, 2026
21 checks passed
@TomerNewman
Member Author

/cherry-pick release-2.16

@TomerNewman TomerNewman deleted the bugfix/devicepluginnotevicted branch June 2, 2026 12:55
@openshift-cherrypick-robot

@TomerNewman: cannot checkout release-2.16: error checking out "release-2.16": exit status 1 error: pathspec 'release-2.16' did not match any file(s) known to git


In response to this:

/cherry-pick release-2.16

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository.

@TomerNewman
Member Author

/cherry-pick release-2.6

@openshift-cherrypick-robot

@TomerNewman: new pull request created: #1803


In response to this:

/cherry-pick release-2.6


@TomerNewman
Copy link
Copy Markdown
Member Author

/cherry-pick release-2.5

@openshift-cherrypick-robot

@TomerNewman: #1802 failed to apply on top of branch "release-2.5":

Applying: Fix device plugin pod not evicted during node drain
Using index info to reconstruct a base tree...
M	internal/controllers/device_plugin_reconciler.go
M	internal/controllers/device_plugin_reconciler_test.go
M	internal/controllers/module_reconciler.go
M	internal/controllers/module_reconciler_test.go
M	internal/node/node.go
M	internal/utils/kmmlabels.go
Falling back to patching base and 3-way merge...
Auto-merging internal/utils/kmmlabels.go
Auto-merging internal/node/node.go
Auto-merging internal/controllers/module_reconciler_test.go
CONFLICT (content): Merge conflict in internal/controllers/module_reconciler_test.go
Auto-merging internal/controllers/module_reconciler.go
CONFLICT (content): Merge conflict in internal/controllers/module_reconciler.go
Auto-merging internal/controllers/device_plugin_reconciler_test.go
Auto-merging internal/controllers/device_plugin_reconciler.go
error: Failed to merge in the changes.
hint: Use 'git am --show-current-patch=diff' to see the failed patch
hint: When you have resolved this problem, run "git am --continue".
hint: If you prefer to skip this patch, run "git am --skip" instead.
hint: To restore the original branch and stop patching, run "git am --abort".
hint: Disable this message with "git config set advice.mergeConflict false"
Patch failed at 0001 Fix device plugin pod not evicted during node drain


In response to this:

/cherry-pick release-2.5


@TomerNewman
Member Author

/cherry-pick release-2.5

@openshift-cherrypick-robot

@TomerNewman: #1802 failed to apply on top of branch "release-2.5":

Applying: Fix device plugin pod not evicted during node drain
Using index info to reconstruct a base tree...
M	internal/controllers/device_plugin_reconciler.go
M	internal/controllers/device_plugin_reconciler_test.go
M	internal/controllers/module_reconciler.go
M	internal/controllers/module_reconciler_test.go
M	internal/node/node.go
M	internal/utils/kmmlabels.go
Falling back to patching base and 3-way merge...
Auto-merging internal/utils/kmmlabels.go
Auto-merging internal/node/node.go
Auto-merging internal/controllers/module_reconciler_test.go
CONFLICT (content): Merge conflict in internal/controllers/module_reconciler_test.go
Auto-merging internal/controllers/module_reconciler.go
CONFLICT (content): Merge conflict in internal/controllers/module_reconciler.go
Auto-merging internal/controllers/device_plugin_reconciler_test.go
Auto-merging internal/controllers/device_plugin_reconciler.go
error: Failed to merge in the changes.
hint: Use 'git am --show-current-patch=diff' to see the failed patch
hint: When you have resolved this problem, run "git am --continue".
hint: If you prefer to skip this patch, run "git am --skip" instead.
hint: To restore the original branch and stop patching, run "git am --abort".
hint: Disable this message with "git config set advice.mergeConflict false"
Patch failed at 0001 Fix device plugin pod not evicted during node drain


In response to this:

/cherry-pick release-2.5



Projects

None yet

Development

Successfully merging this pull request may close these issues.

Cherry-picking error for dcc567e05b604f88b589b6dab7f73a17e7d4ecbc

3 participants