On the sandbox / vGPU workload path, MIG-mode changes and SR-IOV vGPU VFs do not survive a node reboot, and the operator does not re-establish this state on boot. After a reboot of an NVSwitch node configured for MIG-backed vGPU:
MIG mode / MIG instances are not present until re-applied;
SR-IOV VFs (created via sriov-manage -e) are gone, because VFs are runtime state that SR-IOV never persists;
the vGPU devices created on those VFs are gone.
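To see which GPUs are affected after a boot, the current and pending MIG mode can be queried with nvidia-smi --query-gpu=index,mig.mode.current,mig.mode.pending --format=csv,noheader. The sketch below is a minimal parser for that output, assuming the usual "0, Enabled, Enabled" line shape; the function name and the status labels are hypothetical. A GPU whose pending mode is Enabled while its current mode is still Disabled is waiting for a GPU reset to commit the change.

```python
from typing import Dict


def classify_mig_state(csv_output: str) -> Dict[int, str]:
    """Map GPU index to a hypothetical status label.

    Expects nvidia-smi output in the form "index, mig.mode.current, mig.mode.pending",
    one GPU per line, without a header row.
    """
    status: Dict[int, str] = {}
    for line in csv_output.strip().splitlines():
        index, current, pending = (field.strip() for field in line.split(","))
        if current == "Enabled":
            # MIG mode is active now (only the enable direction is considered here).
            status[int(index)] = "mig-enabled"
        elif pending == "Enabled":
            # Mode change requested but not committed yet: needs a GPU reset.
            status[int(index)] = "needs-reset"
        else:
            # MIG mode has to be enabled again (e.g. after the reboot).
            status[int(index)] = "needs-enable"
    return status


sample = "0, Enabled, Enabled\n1, Disabled, Enabled\n2, Disabled, Disabled\n"
print(classify_mig_state(sample))
```

Values such as "[N/A]" (for example on GPUs without MIG support) fall into "needs-enable" in this sketch, so a real check would filter those GPUs first.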
The operator's MIG manager can enable MIG mode, but committing a MIG-mode change on these GPUs requires a GPU reset. The only reset mechanism the Kubernetes mig-manager has is WITH_REBOOT ("reboot the node if changing the MIG mode fails for any reason"); it never performs a targeted nvidia-smi --gpu-reset. VF creation (sriov-manage) is not performed by the operator at all: sriov-manage appears nowhere in gpu-operator. So on the vGPU path, after a reboot, MIG, VFs, and vGPUs have to be re-established out-of-band, by host units that enable MIG, run --gpu-reset, run sriov-manage -e on each PF, and then release the operand readiness gate.
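The out-of-band sequence the host units perform can be sketched as an ordered command plan. This is a minimal illustration, not the actual host units: the Gpu type, the PCI addresses, and both helpers are hypothetical, sriov-manage is assumed to be on PATH (it is often installed under /usr/lib/nvidia), and the plan issues one reset per GPU index purely for illustration, even though reset granularity on NVSwitch systems may differ. Releasing the readiness gate is deployment-specific and is left as a comment.

```python
import subprocess
from dataclasses import dataclass
from typing import List, Sequence


@dataclass(frozen=True)
class Gpu:
    index: int   # nvidia-smi GPU index
    pf_bdf: str  # PCI address of the physical function, e.g. "0000:3b:00.0"


def recovery_plan(gpus: Sequence[Gpu]) -> List[List[str]]:
    """Return the commands in the order the host units would run them."""
    plan: List[List[str]] = []
    for gpu in gpus:
        # 1. Request MIG mode on every GPU.
        plan.append(["nvidia-smi", "-i", str(gpu.index), "-mig", "1"])
    for gpu in gpus:
        # 2. Commit the pending MIG-mode change. This can fail while
        #    nvidia-persistenced holds a handle on the GPU (#118).
        plan.append(["nvidia-smi", "-i", str(gpu.index), "--gpu-reset"])
    for gpu in gpus:
        # 3. Recreate the SR-IOV VFs on each PF.
        plan.append(["sriov-manage", "-e", gpu.pf_bdf])
    return plan


def run_plan(plan: Sequence[Sequence[str]], dry_run: bool = True) -> None:
    for cmd in plan:
        if dry_run:
            print(" ".join(cmd))
        else:
            subprocess.run(list(cmd), check=True)
    # 4. Releasing the operand readiness gate would follow here; its
    #    mechanism is deployment-specific and is not modeled.


run_plan(recovery_plan([Gpu(0, "0000:3b:00.0"), Gpu(1, "0000:5e:00.0")]))
```

Grouping the steps by phase rather than by GPU means every reset is issued only after all MIG-mode requests are in place, and no VF is created until every reset has completed.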
By contrast, the systemd deployment of mig-parted does handle reboot persistence: nvidia-mig-manager.service persists the selected config across reboot (persist_config_across_reboot) and orders itself via nvidia-gpu-reset.target (After=nvidia-fabricmanager.service, Before=nvidia-gpu-reset.target) so MIG reconfiguration, GPU reset, and driver-service ordering are coordinated at boot. The Kubernetes / operator path has no equivalent.
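The effect of these After=/Before= directives on boot order can be checked with a topological sort. In this sketch, the directives on nvidia-mig-manager.service are the ones quoted above. vgpu-vf-enable.service is a hypothetical unit that stands in for VF re-enablement ordered after the reset target. The helper name is also hypothetical. Note that in systemd, After= and Before= only order units that are already part of the same transaction; they do not pull units in (that is the job of Wants=/Requires=).

```python
from graphlib import TopologicalSorter
from typing import Dict, List

# Ordering directives per unit. "vgpu-vf-enable.service" is hypothetical.
UNITS: Dict[str, Dict[str, List[str]]] = {
    "nvidia-fabricmanager.service": {},
    "nvidia-mig-manager.service": {
        "After": ["nvidia-fabricmanager.service"],
        "Before": ["nvidia-gpu-reset.target"],
    },
    "nvidia-gpu-reset.target": {},
    "vgpu-vf-enable.service": {"After": ["nvidia-gpu-reset.target"]},
}


def boot_order(units: Dict[str, Dict[str, List[str]]]) -> List[str]:
    """Translate After=/Before= into predecessor sets and sort them."""
    preds: Dict[str, set] = {name: set() for name in units}
    for name, directives in units.items():
        for other in directives.get("After", []):
            # "A After=B": B starts before A.
            preds.setdefault(name, set()).add(other)
            preds.setdefault(other, set())
        for other in directives.get("Before", []):
            # "A Before=C": A starts before C.
            preds.setdefault(other, set()).add(name)
    return list(TopologicalSorter(preds).static_order())


print(boot_order(UNITS))
```

Because this example forms a single chain, the sort has only one valid result. The Kubernetes operands have no comparable ordering edge to the reset target.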
Proposal / discussion
Give the operator a way to commit MIG-mode changes via a targeted GPU reset instead of only a full node reboot (WITH_REBOOT). This likely needs coordination with nvidia-persistenced, which holds a handle on the GPU that blocks --gpu-reset (see #118, "nvidia-persistenced keeps a handle on the GPU and blocks --gpu-reset").
On the vGPU / sandbox path, have the operator own re-establishing SR-IOV VFs on boot (the equivalent of sriov-manage -e per PF) and re-applying the MIG + vGPU device configuration, so that a reboot recovers hands-off without host units. This is the smallest useful seam and could land on its own as a first change: VF re-enablement in the vGPU device-manager operand before it applies the vGPU config.
Port the systemd deployment's boot-ordering guarantees (config persistence + nvidia-gpu-reset.target ordering) to the operator's operands, or document the boot units as a supported companion for the Kubernetes path.
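For the targeted-reset proposal, one possible commit flow is to enable MIG, reset only that GPU, and fall back to a reboot only if the reset fails. This is a hypothetical sketch, not the mig-manager's code. It assumes that stopping the nvidia-persistenced systemd service is enough to release the handle described in #118. The helper name and its return labels are illustrative. The runner is injectable so the flow can be exercised without a GPU.

```python
import subprocess
from typing import Callable, Sequence

Runner = Callable[[Sequence[str]], int]


def run_command(cmd: Sequence[str]) -> int:
    """Default runner: execute the command and return its exit status."""
    return subprocess.run(list(cmd), check=False).returncode


def commit_mig_mode(gpu_index: int, with_reboot: bool, runner: Runner = run_command) -> str:
    """Enable MIG on one GPU and commit it with a targeted reset (hypothetical helper).

    Returns "reset" if the targeted reset succeeded, "reboot" if the caller should
    fall back to a node reboot (mirroring WITH_REBOOT), or "failed" otherwise.
    """
    if runner(["nvidia-smi", "-i", str(gpu_index), "-mig", "1"]) != 0:
        return "reboot" if with_reboot else "failed"
    # Assumption: stopping nvidia-persistenced releases the handle that blocks
    # --gpu-reset (#118). Note this affects the whole node, not just one GPU.
    runner(["systemctl", "stop", "nvidia-persistenced"])
    try:
        reset_rc = runner(["nvidia-smi", "-i", str(gpu_index), "--gpu-reset"])
    finally:
        # Restart the daemon whether or not the reset worked.
        runner(["systemctl", "start", "nvidia-persistenced"])
    if reset_rc == 0:
        return "reset"
    return "reboot" if with_reboot else "failed"
```

With this structure, the full node reboot becomes the last resort rather than the only commit mechanism.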
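The proposed first change (VF re-enablement in the vGPU device-manager operand before it applies the vGPU config) could look roughly like this. It is a minimal sketch under stated assumptions: PFs without VFs are detected through the standard sysfs attribute sriov_numvfs, sriov-manage is assumed to be on PATH, and the function names and the apply_vgpu_config callback are hypothetical placeholders for the operand's existing logic.

```python
import os
import subprocess
from typing import Callable, List, Sequence

Runner = Callable[[Sequence[str]], int]
SYSFS_PCI = "/sys/bus/pci/devices"


def _run(cmd: Sequence[str]) -> int:
    return subprocess.run(list(cmd), check=False).returncode


def pfs_missing_vfs(pf_bdfs: Sequence[str], sysfs_root: str = SYSFS_PCI) -> List[str]:
    """Return the PFs that currently have no VFs (the state right after a reboot)."""
    missing = []
    for bdf in pf_bdfs:
        path = os.path.join(sysfs_root, bdf, "sriov_numvfs")
        try:
            with open(path) as f:
                count = int(f.read().strip() or 0)
        except FileNotFoundError:
            # Treat an absent attribute as "no VFs"; sriov-manage reports the real error.
            count = 0
        if count == 0:
            missing.append(bdf)
    return missing


def reenable_vfs_then_apply(
    pf_bdfs: Sequence[str],
    apply_vgpu_config: Callable[[], None],
    runner: Runner = _run,
    sysfs_root: str = SYSFS_PCI,
) -> List[str]:
    """Recreate missing VFs, then apply the vGPU config; return the PFs touched."""
    enabled = []
    for bdf in pfs_missing_vfs(pf_bdfs, sysfs_root):
        if runner(["sriov-manage", "-e", bdf]) != 0:
            raise RuntimeError(f"sriov-manage -e {bdf} failed")
        enabled.append(bdf)
    # Only configure vGPU devices once every PF has its VFs back.
    apply_vgpu_config()
    return enabled
```

Checking sriov_numvfs first keeps the step idempotent: on a node whose VFs already exist, it simply proceeds to the vGPU config.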
Related
nvidia-persistenced keeps a handle on the GPU and blocks --gpu-reset (#118).
This is the reboot-recovery glue the operator lacks on the vGPU path today.
Environment
NVSwitch HGX 8-GPU node, MIG-backed and whole-card vGPU, host-installed driver, GPU Operator sandbox path (gpu.workload.config=vm-vgpu).