Remove sh -c wrappers from operand-asset DaemonSets (Part 2) #2437

@rajathagasthya

Part of NVIDIA/cloud-native-team#299.

Once the operand images (mig-parted, nvidia-container-toolkit,
k8s-driver-manager, k8s-device-plugin) drop their /bin/sh busybox
symlink, the remaining sh -c wrappers in gpu-operator's operand
asset DaemonSets will break. These need to be converted to direct
binary invocations or to rmglob-style static helpers (modeled on
the rmglob introduced in PR #2434).

Scope (assets/state-*/):

  • state-driver/0500_daemonset.yaml: nvidia-driver probe_nvidia_peermem, lsmod | grep nvidia_fs, lsmod | grep gdrdrv, rm -f /run/.../driver-ctr-ready preStop
  • state-vfio-manager/0500_daemonset.yaml: vfio-manage bind --all && while true; do sleep …
  • state-mig-manager/0600_daemonset.yaml
  • state-vgpu-manager/0500_daemonset.yaml
  • state-vgpu-device-manager/0600_daemonset.yaml
  • state-sandbox-device-plugin/0500_daemonset.yaml
  • state-cc-manager/0500_daemonset.yaml
  • state-dcgm/0400_dcgm.yml
  • state-dcgm-exporter/0800_daemonset.yaml
  • state-mps-control-daemon/0400_daemonset.yaml
  • state-container-toolkit/0500_daemonset.yaml
  • state-device-plugin/0500_daemonset.yaml
  • gpu-feature-discovery/0500_daemonset.yaml

Acceptance:

  • All listed manifests no longer wrap operand binaries in sh -c
  • lsmod | grep <module> checks replaced by Go-based module
    checks or sentinel-file-based readiness
  • preStop rm -f calls replaced with rmglob or equivalent
    static binary
  • e2e against a real GPU node passes

Labels: enhancement