[release-2.6] Fix device plugin pod not evicted during node drain #1803

Open
openshift-cherrypick-robot wants to merge 1 commit into rh-ecosystem-edge:release-2.6 from openshift-cherrypick-robot:cherry-pick-1802-to-release-2.6
Conversation

@openshift-cherrypick-robot

This is an automated cherry-pick of #1802

/assign TomerNewman

When a node is drained, the device plugin DaemonSet pod remains running
because DaemonSet pods auto-tolerate NoSchedule taints and the only
nodeSelector was the kmm-ready label (which stays while the kmod is
loaded). This creates a deadlock: the unloader cannot run while the
device plugin holds device files, and the device plugin won't leave
because kmm-ready is never removed.

Introduce a new device-plugin-target node label managed by the
DevicePluginReconciler. The DaemonSet nodeSelector now requires both
kmm-ready AND device-plugin-target. The DevicePluginReconciler watches
node taint changes: it adds device-plugin-target to schedulable nodes
matching the Module selector, and removes it from unschedulable nodes.
This causes the DaemonSet controller to evict the device plugin pod
during drain, breaking the deadlock.
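
The labeling rule described above can be sketched in a few lines of Go. This is only an illustration under stated assumptions, not the code from this PR: the label keys, the `Node` struct, and the functions `reconcileTargetLabel` and `devicePluginEligible` are hypothetical stand-ins, a node is treated as draining when a simple `Unschedulable` flag is set, and the nodeSelector is reduced to an exact key/value match.

```go
package main

import "fmt"

// Hypothetical label keys. The real keys used by KMM are not given in the
// PR description, so these are placeholders for illustration only.
const (
	kmmReadyLabel           = "example.com/kmm-ready"
	devicePluginTargetLabel = "example.com/device-plugin-target"
)

// Node is a simplified stand-in for a Kubernetes Node object.
type Node struct {
	Name          string
	Labels        map[string]string
	Unschedulable bool // assumed to be true while the node is cordoned or drained
}

// matches reports whether every key/value pair in selector is present in labels.
func matches(selector, labels map[string]string) bool {
	for k, v := range selector {
		if got, ok := labels[k]; !ok || got != v {
			return false
		}
	}
	return true
}

// reconcileTargetLabel adds the device-plugin-target label to schedulable
// nodes that match the Module selector and removes it otherwise.
// It returns true when the node's labels were changed.
func reconcileTargetLabel(n *Node, moduleSelector map[string]string) bool {
	want := !n.Unschedulable && matches(moduleSelector, n.Labels)
	_, has := n.Labels[devicePluginTargetLabel]
	switch {
	case want && !has:
		if n.Labels == nil {
			n.Labels = map[string]string{}
		}
		n.Labels[devicePluginTargetLabel] = ""
		return true
	case !want && has:
		delete(n.Labels, devicePluginTargetLabel)
		return true
	}
	return false
}

// devicePluginEligible mimics a DaemonSet nodeSelector that requires both
// the kmm-ready label and the device-plugin-target label.
func devicePluginEligible(n *Node) bool {
	required := map[string]string{kmmReadyLabel: "", devicePluginTargetLabel: ""}
	return matches(required, n.Labels)
}

func main() {
	moduleSelector := map[string]string{"hardware": "accel"}
	node := &Node{
		Name:   "worker-0",
		Labels: map[string]string{kmmReadyLabel: "", "hardware": "accel"},
	}

	reconcileTargetLabel(node, moduleSelector)
	fmt.Println("eligible before drain:", devicePluginEligible(node))

	// Drain starts: the node becomes unschedulable, but kmm-ready stays.
	node.Unschedulable = true
	reconcileTargetLabel(node, moduleSelector)
	fmt.Println("eligible during drain:", devicePluginEligible(node))
}
```

In this sketch the kmm-ready label is never touched, so the node stops matching the selector only because device-plugin-target is removed. That mirrors how the change described above is meant to break the deadlock.
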
@openshift-ci

openshift-ci Bot commented Jun 2, 2026

Hi @openshift-cherrypick-robot. Thanks for your PR.

I'm waiting for a rh-ecosystem-edge member to verify that this patch is reasonable to test. If it is, they should reply with /ok-to-test on its own line. Until that is done, I will not automatically test new commits in this PR, but the usual testing commands by org members will still work.

Tip

We noticed you've done this a few times! Consider joining the org to skip this step and gain /lgtm and other bot rights. We recommend asking approvers on your previous PRs to sponsor you.

Once the patch is verified, the new status will be reflected by the ok-to-test label.

I understand the commands that are listed here.

Details

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository.

@TomerNewman
Member

/ok-to-test
/approve

@openshift-ci

openshift-ci Bot commented Jun 2, 2026

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: openshift-cherrypick-robot, TomerNewman

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Details Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@openshift-ci openshift-ci Bot added the approved label Jun 2, 2026
@coderabbitai

coderabbitai Bot commented Jun 2, 2026

Important

Review skipped

Auto reviews are disabled on base/target branches other than the default branch.

Please check the settings in the CodeRabbit UI or the .coderabbit.yaml file in this repository. To trigger a single review, invoke the @coderabbitai review command.

⚙️ Run configuration

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

Run ID: dacf15df-3e74-4f43-ac96-9ed355468c3c

You can disable this status message by setting the reviews.review_status to false in the CodeRabbit configuration file.

Comment @coderabbitai help to get the list of available commands and usage tips.

@TomerNewman
Member

/retest

@TomerNewman
Member

/override ci/prow/security
failed on a test file, does not affect the actual KMM code

@openshift-ci

openshift-ci Bot commented Jun 3, 2026

@TomerNewman: Overrode contexts on behalf of TomerNewman: ci/prow/security

@TomerNewman
Member

/assign @ybettan @yevgeny-shnaidman
