[release-2.6] Fix device plugin pod not evicted during node drain #1803
When a node is drained, the device plugin DaemonSet pod remains running because DaemonSet pods auto-tolerate NoSchedule taints and the only nodeSelector was the kmm-ready label (which stays while the kmod is loaded). This creates a deadlock: the unloader cannot run while the device plugin holds device files, and the device plugin won't leave because kmm-ready is never removed.

Introduce a new device-plugin-target node label managed by the DevicePluginReconciler. The DaemonSet nodeSelector now requires both kmm-ready AND device-plugin-target. The DevicePluginReconciler watches node taint changes: it adds device-plugin-target to schedulable nodes matching the Module selector, and removes it from unschedulable nodes. This causes the DaemonSet controller to evict the device plugin pod during drain, breaking the deadlock.
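The labeling rule described above can be sketched roughly as follows. This is a minimal, self-contained illustration and not the code from this PR: the `Node` struct, the label keys, and the functions `reconcileTargetLabel` and `devicePluginEligible` are all hypothetical stand-ins, and the sketch models "unschedulable" as a simple boolean rather than inspecting real node taints. It also assumes the label is removed from nodes that no longer match the Module selector, which the description does not state explicitly.

```go
package main

import "fmt"

// Hypothetical label keys; the real keys used by the operator are not
// shown in the PR description.
const (
	kmmReadyLabel           = "example.com/kmm-ready"
	devicePluginTargetLabel = "example.com/device-plugin-target"
)

// Node is a simplified stand-in for a Kubernetes node (an assumption made
// for this sketch, not the real API type).
type Node struct {
	Name          string
	Labels        map[string]string
	Unschedulable bool // set when the node is cordoned as part of a drain
}

// matchesSelector reports whether every key/value pair in selector is
// present in labels.
func matchesSelector(labels, selector map[string]string) bool {
	for k, v := range selector {
		if got, ok := labels[k]; !ok || got != v {
			return false
		}
	}
	return true
}

// reconcileTargetLabel adds the target label to schedulable nodes that match
// the Module selector and removes it otherwise. It returns true if the node's
// labels were changed.
func reconcileTargetLabel(n *Node, moduleSelector map[string]string) bool {
	want := !n.Unschedulable && matchesSelector(n.Labels, moduleSelector)
	_, has := n.Labels[devicePluginTargetLabel]
	switch {
	case want && !has:
		if n.Labels == nil {
			n.Labels = map[string]string{}
		}
		n.Labels[devicePluginTargetLabel] = ""
		return true
	case !want && has:
		delete(n.Labels, devicePluginTargetLabel)
		return true
	}
	return false
}

// devicePluginEligible mirrors a DaemonSet nodeSelector that requires both
// labels to be present on the node.
func devicePluginEligible(n *Node) bool {
	_, ready := n.Labels[kmmReadyLabel]
	_, target := n.Labels[devicePluginTargetLabel]
	return ready && target
}

func main() {
	selector := map[string]string{"gpu": "true"}
	node := &Node{
		Name:   "worker-0",
		Labels: map[string]string{"gpu": "true", kmmReadyLabel: ""},
	}

	reconcileTargetLabel(node, selector)
	fmt.Println("eligible before drain:", devicePluginEligible(node))

	// Drain cordons the node; kmm-ready stays, but the target label is removed.
	node.Unschedulable = true
	reconcileTargetLabel(node, selector)
	fmt.Println("eligible during drain:", devicePluginEligible(node))
}
```

In this sketch the kmm-ready label is never touched, so eligibility changes only through the target label, which is the property that lets the device plugin pod leave the node before the unloader runs.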
|
/ok-to-test |
|
[APPROVALNOTIFIER] This PR is APPROVED. This pull-request has been approved by: openshift-cherrypick-robot, TomerNewman |
|
/retest |
|
/override ci/prow/security |
|
@TomerNewman: Overrode contexts on behalf of TomerNewman: ci/prow/security |
|
/assign @ybettan @yevgeny-shnaidman |
This is an automated cherry-pick of #1802
/assign TomerNewman