
Remove shell dependency from validator pods #2434

Draft
rajathagasthya wants to merge 1 commit into NVIDIA:main from
rajathagasthya:worktree-distroless-dev

Conversation

Contributor

rajathagasthya commented May 6, 2026

Description

This is Part 1 of a multi-part effort to move the gpu-operator image off the
nvcr.io/nvidia/distroless/cc:v4.0.0-dev base. NVIDIA's STIG policy is dropping
`-dev` distroless tags as approved parent images. The non-`-dev` variants ship
no shell. A primary motivation for the migration is the recurring shell-related
CVE backlog the operator inherits today, so re-adding a shell to sidestep the
move would defeat the purpose.

This PR addresses the shell dependencies inside the gpu-operator image itself —
specifically the validator daemonsets and the workload validation pods packaged
into the image.

What's changed in this PR

Replace shell wrappers with direct binary invocation. The
operator-validator and sandbox-validator init containers invoke
nvidia-validator directly. Their pause containers use a new top-level
--sleep flag that prints the validator-success message and blocks on
SIGTERM. Workload pod main containers run nvidia-validator --version
as a no-op exit-0; the per-workload success message now prints from
(c *CUDA).runWorkload and (p *Plugin).runWorkload after waitForPod
succeeds — surfacing in the operator-validator init container logs where
success is actually established.

For preStop cleanup, add a small static helper rmglob that takes glob
patterns and removes matching paths. Modeled on k8s-cc-manager's vendored
static /bin/rm, shipped at /usr/bin/rmglob. Both validator daemonsets
keep their lifecycle.preStop blocks; they now call this binary instead
of sh -c rm.

Flip the Dockerfile base to nvcr.io/nvidia/distroless/cc:v4.0.0.

What's remaining (follow-up PRs)

  • Operand image Dockerfile cleanups. Each of these repos uses the
    same pattern in its runtime Dockerfile — FROM ...:vX-dev plus
    SHELL ["/busybox/sh", "-c"] plus RUN ln -s /busybox/sh /bin/sh
    and needs the -dev tag dropped along with the busybox symlink.
    Mechanical, ~5-line Dockerfile diff per repo, independent of each
    other:
    - [ ] NVIDIA/mig-parted deployments/container/Dockerfile
    - [ ] NVIDIA/nvidia-container-toolkit deployments/container/Dockerfile
      (×2 stages: packaging, application)
    - [ ] NVIDIA/k8s-driver-manager deployments/container/Dockerfile.distroless
    - [ ] NVIDIA/k8s-device-plugin deployments/container/Dockerfile
  • Second-pass gpu-operator manifest cleanup. Once the operand
    images above stop shipping /bin/sh, the remaining sh -c
    wrappers in gpu-operator's operand asset DaemonSets will break.
    These need to be converted to direct binary invocations or to
    rmglob-style static helpers (modeled on the same pattern as
    this PR):
    - [ ] assets/state-driver/0500_daemonset.yaml: nvidia-driver
      probe_nvidia_peermem, lsmod | grep nvidia_fs, lsmod | grep gdrdrv,
      rm -f /run/.../driver-ctr-ready preStop
    - [ ] assets/state-vfio-manager/0500_daemonset.yaml:
      vfio-manage bind --all && while true; do sleep …
    - [ ] assets/state-mig-manager/0600_daemonset.yaml
    - [ ] assets/state-vgpu-manager/0500_daemonset.yaml
    - [ ] assets/state-vgpu-device-manager/0600_daemonset.yaml
    - [ ] assets/state-sandbox-device-plugin/0500_daemonset.yaml
    - [ ] assets/state-cc-manager/0500_daemonset.yaml
    - [ ] assets/state-dcgm/0400_dcgm.yml,
    assets/state-dcgm-exporter/0800_daemonset.yaml
    - [ ] assets/state-mps-control-daemon/0400_daemonset.yaml
    - [ ] assets/state-container-toolkit/0500_daemonset.yaml
    - [ ] assets/state-device-plugin/0500_daemonset.yaml
    - [ ] assets/gpu-feature-discovery/0500_daemonset.yaml

Checklist

  • No secrets, sensitive information, or unrelated changes
  • Lint checks passing (make lint)
  • Generated assets in-sync (make validate-generated-assets)
  • Go mod artifacts in-sync (make validate-modules)
  • Test cases are added for new code paths

Testing

  • New unit tests in cmd/nvidia-validator/main_test.go:
    • Test_validateFlags_standaloneSleep — validates that an empty
      --component is permitted only when --sleep is set.
    • Test_runSleep_returnsOnSignal — sends SIGTERM and confirms
      runSleep returns nil within 2s.
    • Test_runSleep_contextCancel — confirms ctx cancellation also
      unblocks runSleep cleanly.
  • New unit tests in cmd/rmglob/main_test.go:
    • TestRmglob — builds the binary, creates a-ready, b-ready,
      keep.txt in a tempdir, runs rmglob "<tempdir>/*-ready",
      asserts the -ready files are gone and keep.txt remains.
    • TestRmglobNoArgs — confirms invocation with no args exits
      non-zero.
  • make cmds builds all five binaries (gpu-operator, gpuop-cfg,
    manage-crds, nvidia-validator, rmglob) cleanly.
  • Host smoke test: nvidia-validator --version exits 0 with the
    expected version output.
  • grep -rn "sh -c\|/bin/sh" assets/state-{operator,sandbox}-validation/ validator/manifests/ returns zero hits.
  • e2e on a real GPU node is not yet performed.

Part of NVIDIA/cloud-native-team#299

Resolves #2435
Resolves #2436

NVIDIA's distroless-cc `-dev` tag (the gpu-operator image base) will no
longer be approved as a STIG parent image. The non-`-dev` variant ships
no shell, so the validator daemonsets and workload validation pods —
which wrapped binaries in `sh -c` and used shell-based preStop hooks —
would break on the new base. Re-adding a shell to the image would only
swap one CVE source for another.

Drop `hack/must-gather.sh` from the image entrypoint at
`/usr/bin/gather`. It depended on `bash`, `kubectl`, and `oc` — none
of which ship in the distroless base. Customers already run the
script from outside the cluster against an existing kubeconfig;
removing the in-image copy doesn't change that workflow.

Flip the Dockerfile base to `nvcr.io/nvidia/distroless/cc:v4.0.4`.

Signed-off-by: Rajath Agasthya <ragasthya@nvidia.com>
@rajathagasthya rajathagasthya force-pushed the worktree-distroless-dev branch from 14f5202 to 20e9691 Compare May 7, 2026 15:50
Successfully merging this pull request may close these issues:

  • Decide on must-gather.sh inclusion in Dockerfile after distroless migration
  • Remove shell dependency from validator pods (Part 1)