
Remove shell dependency from validator pods #2434

Draft
rajathagasthya wants to merge 1 commit into NVIDIA:main from
rajathagasthya:worktree-distroless-dev

Conversation

Contributor

rajathagasthya commented May 6, 2026

Description

This is Part 1 of a multi-part effort to move the gpu-operator image off the
nvcr.io/nvidia/distroless/cc:v4.0.0-dev base. NVIDIA's STIG policy is dropping
`-dev` distroless tags as approved parent images. The non-`-dev` variants ship
no shell. A primary motivation for the migration is the recurring shell-related
CVE backlog the operator inherits today, so re-adding a shell to sidestep the
move would defeat the purpose.

This PR addresses the shell dependencies inside the gpu-operator image itself —
specifically the validator daemonsets and the workload validation pods packaged
into the image.

What's changed in this PR

Replace shell wrappers with direct binary invocation. The
operator-validator and sandbox-validator init containers invoke
nvidia-validator directly. Their pause containers use a new top-level
--sleep flag that prints the validator-success message and blocks on
SIGTERM. Workload pod main containers run nvidia-validator --version
as a no-op exit-0; the per-workload success message now prints from
(c *CUDA).runWorkload and (p *Plugin).runWorkload after waitForPod
succeeds — surfacing in the operator-validator init container logs where
success is actually established.

For preStop cleanup, add a small static helper rmglob that takes glob
patterns and removes matching paths. Modeled on k8s-cc-manager's vendored
static /bin/rm, shipped at /usr/bin/rmglob. Both validator daemonsets
keep their lifecycle.preStop blocks; they now call this binary instead
of sh -c rm.

Flip the Dockerfile base to nvcr.io/nvidia/distroless/cc:v4.0.0.

What's remaining (follow-up PRs)

  • Operand image Dockerfile cleanups. Each of these repos uses the
    same pattern in its runtime Dockerfile — FROM ...:vX-dev plus
    SHELL ["/busybox/sh", "-c"] plus RUN ln -s /busybox/sh /bin/sh
    and needs the -dev tag dropped along with the busybox symlink.
    Mechanical, ~5-line Dockerfile diff per repo, independent of each
    other:
    - [ ] NVIDIA/mig-parted deployments/container/Dockerfile
    - [ ] NVIDIA/nvidia-container-toolkit deployments/container/Dockerfile
      (×2 stages: packaging, application)
    - [ ] NVIDIA/k8s-driver-manager deployments/container/Dockerfile.distroless
    - [ ] NVIDIA/k8s-device-plugin deployments/container/Dockerfile
  • Second-pass gpu-operator manifest cleanup. Once the operand
    images above stop shipping /bin/sh, the remaining sh -c
    wrappers in gpu-operator's operand asset DaemonSets will break.
    These need to be converted to direct binary invocations or to
    rmglob-style static helpers (modeled on the same pattern as
    this PR):
    - [ ] assets/state-driver/0500_daemonset.yaml: nvidia-driver
      probe_nvidia_peermem, lsmod | grep nvidia_fs, lsmod | grep gdrdrv,
      rm -f /run/.../driver-ctr-ready preStop
    - [ ] assets/state-vfio-manager/0500_daemonset.yaml:
      vfio-manage bind --all && while true; do sleep …
    - [ ] assets/state-mig-manager/0600_daemonset.yaml
    - [ ] assets/state-vgpu-manager/0500_daemonset.yaml
    - [ ] assets/state-vgpu-device-manager/0600_daemonset.yaml
    - [ ] assets/state-sandbox-device-plugin/0500_daemonset.yaml
    - [ ] assets/state-cc-manager/0500_daemonset.yaml
    - [ ] assets/state-dcgm/0400_dcgm.yml,
    assets/state-dcgm-exporter/0800_daemonset.yaml
    - [ ] assets/state-mps-control-daemon/0400_daemonset.yaml
    - [ ] assets/state-container-toolkit/0500_daemonset.yaml
    - [ ] assets/state-device-plugin/0500_daemonset.yaml
    - [ ] assets/gpu-feature-discovery/0500_daemonset.yaml

Checklist

  • No secrets, sensitive information, or unrelated changes
  • Lint checks passing (make lint)
  • Generated assets in-sync (make validate-generated-assets)
  • Go mod artifacts in-sync (make validate-modules)
  • Test cases are added for new code paths

Testing

  • New unit tests in cmd/nvidia-validator/main_test.go:
    • Test_validateFlags_standaloneSleep — validates that an empty
      --component is permitted only when --sleep is set.
    • Test_runSleep_returnsOnSignal — sends SIGTERM and confirms
      runSleep returns nil within 2s.
    • Test_runSleep_contextCancel — confirms ctx cancellation also
      unblocks runSleep cleanly.
  • New unit tests in cmd/rmglob/main_test.go:
    • TestRmglob — builds the binary, creates a-ready, b-ready,
      keep.txt in a tempdir, runs rmglob "<tempdir>/*-ready",
      asserts the -ready files are gone and keep.txt remains.
    • TestRmglobNoArgs — confirms invocation with no args exits
      non-zero.
  • make cmds builds all five binaries (gpu-operator, gpuop-cfg,
    manage-crds, nvidia-validator, rmglob) cleanly.
  • Host smoke test: nvidia-validator --version exits 0 with the
    expected version output.
  • grep -rn "sh -c\|/bin/sh" assets/state-{operator,sandbox}-validation/ validator/manifests/ returns zero hits.
  • e2e on a real GPU node is not yet performed.

Part of NVIDIA/cloud-native-team#299

Resolves #2435
Resolves #2436

NVIDIA's distroless-cc `-dev` tag (the gpu-operator image base) will no
longer be approved as a STIG parent image. The non-`-dev` variant ships
no shell, so the validator daemonsets and workload validation pods —
which wrapped binaries in `sh -c` and used shell-based preStop hooks —
would break on the new base. Re-adding a shell to the image would only
swap one CVE source for another.

Drop `hack/must-gather.sh` from the image entrypoint at
`/usr/bin/gather`. It depended on `bash`, `kubectl`, and `oc` — none
of which ship in the distroless base. Customers already run the
script from outside the cluster against an existing kubeconfig;
removing the in-image copy doesn't change that workflow.

Flip the Dockerfile base to `nvcr.io/nvidia/distroless/cc:v4.0.4`.

Signed-off-by: Rajath Agasthya <ragasthya@nvidia.com>
@rajathagasthya rajathagasthya force-pushed the worktree-distroless-dev branch from 14f5202 to 20e9691 Compare May 7, 2026 15:50
Successfully merging this pull request may close these issues:

  • Decide on must-gather.sh inclusion in Dockerfile after distroless migration
  • Remove shell dependency from validator pods (Part 1)