Summary
The vGPU Device Manager (nvidia-vgpu-device-manager) is built around the legacy mediated-device (mdev) framework only. On Ada Lovelace and newer GPUs (L40/L40S, RTX Ada, H100, H200, Blackwell) running the vGPU release 17/18+ host driver, the driver no longer exposes mdev. Instead, vGPU profiles are assigned per SR-IOV Virtual Function through the vendor-specific VFIO sysfs (/sys/bus/pci/devices/<VF>/nvidia/current_vgpu_type). As a result, the GPU Operator has no way to create SR-IOV vGPU devices on these GPUs.
Environment
GPU PCI ID: 10de:2335
Driver: 580.159.01 (NVIDIA vGPU release 18, -vgpu-kvm)
GPU Operator in sandbox mode: sandboxWorkloads.enabled=true, node label nvidia.com/gpu.workload.config=vm-vgpu
vgpu-device-manager image v0.4.2
Problem
On this GPU /sys/bus/mdev/ does not exist by design — the driver uses the vendor-specific VFIO framework (nvidia-smi -q reports Host VGPU Mode : SR-IOV). vGPU is configured per VF: after /usr/lib/nvidia/sriov-manage -e <BDF>, each VF exposes /sys/bus/pci/devices/<VF>/nvidia/{creatable_vgpu_types, current_vgpu_type, gpu_instance_id, placement_id, ...}, and a vGPU is created by writing a type id to current_vgpu_type. There is no mdev bus for the device manager to walk, so it cannot enumerate or create vGPU devices — matching the failure reported in #591:
error getting vGPU config: error getting all vGPU devices: unable to read MDEV devices directory: open /sys/bus/mdev/devices: no such file or directory
With a host-installed driver, the vgpu-device-manager init container instead blocks indefinitely on "waiting for NVIDIA vGPU Manager to be setup". Either way, there is no operator path to create SR-IOV vGPUs on Ada/Hopper.
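The difference between the two models can be sketched as a small probe. This is a minimal illustration in Python, not the device manager's actual code: the helper names `has_mdev_bus` and `list_vgpu_capable_vfs` and the `sysfs_root` parameter are hypothetical, and the sketch relies only on the sysfs paths named above.

```python
from pathlib import Path
from typing import List


def has_mdev_bus(sysfs_root: str = "/sys") -> bool:
    """Return True if the legacy mdev devices directory exists."""
    return (Path(sysfs_root) / "bus" / "mdev" / "devices").is_dir()


def list_vgpu_capable_vfs(sysfs_root: str = "/sys") -> List[str]:
    """Return PCI addresses of devices that expose nvidia/current_vgpu_type."""
    pci_devices = Path(sysfs_root) / "bus" / "pci" / "devices"
    if not pci_devices.is_dir():
        return []
    return sorted(
        dev.name
        for dev in pci_devices.iterdir()
        # Only VFs enabled for vGPU carry the vendor-specific nvidia/ directory.
        if (dev / "nvidia" / "current_vgpu_type").is_file()
    )
```

On a node like the one described here, `has_mdev_bus()` would presumably return False while `list_vgpu_capable_vfs()` would list the VFs created by sriov-manage. A component that supports both models could branch on these two checks.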
The rest of the stack is ready — only operator-side creation is missing
KubeVirt consumes these VFs. Its PCI device plugin recognizes a VF bound to the nvidia driver once current_vgpu_type != 0 and advertises it: see "vGPU: SRIOV support" (kubevirt/kubevirt#16890, merged 2026-04-14), available in KubeVirt v1.9 (v1.9.0-beta.0; the added file pkg/virt-handler/device-manager/nvidia.go is present in v1.9.0-beta.0 but not in the v1.8.x releases / release-1.8). The design discussion in 'Support NVIDIA vGPU "vendor-specific VFIO framework" (SR-IOV vGPU on Ada Lovelace, Hopper, kernel 6.8+)' (kubevirt/kubevirt#17642) explicitly scopes vGPU profile assignment (current_vgpu_type) as a node-level concern for gpu-operator or a custom DaemonSet, which is exactly what this issue asks the operator to do.
Manual creation works: running sriov-manage -e and then echo <type-id> > .../nvidia/current_vgpu_type yields a working, KubeVirt-consumable vGPU (validated on H200 with a whole-card profile).
Rebinding VFs to vfio-pci is not a workaround — unbinding the VF resets current_vgpu_type to 0.
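For illustration, the manual write to current_vgpu_type could be wrapped as follows. This is a hedged sketch rather than an existing tool: `set_vgpu_type` is a hypothetical helper, and the parsing assumes that each creatable type appears in creatable_vgpu_types on a line beginning with its numeric id (optionally followed by `: name`), which should be checked against the actual driver output. It does not run sriov-manage and does not handle re-typing a VF that already carries a different type.

```python
from pathlib import Path


def set_vgpu_type(vf_nvidia_dir: str, type_id: int) -> None:
    """Write type_id to current_vgpu_type if the VF reports it as creatable."""
    vf = Path(vf_nvidia_dir)
    current = vf / "current_vgpu_type"
    if int(current.read_text().strip()) == type_id:
        return  # already configured, nothing to do

    # Assumption: lines look like "<id> : <name>"; header lines are skipped.
    creatable = set()
    for line in (vf / "creatable_vgpu_types").read_text().splitlines():
        token = line.split(":")[0].strip()
        if token.isdigit():
            creatable.add(int(token))

    if type_id not in creatable:
        raise ValueError(f"vGPU type {type_id} is not creatable on {vf}")
    current.write_text(f"{type_id}\n")
```

Checking creatable_vgpu_types before writing keeps a bad type id from reaching the driver and makes the call idempotent, which matters if a node-level agent retries it.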
Request
Add support in vgpu-device-manager (or a dedicated component) for the vendor-specific VFIO / SR-IOV per-VF model (current_vgpu_type), so vGPU devices are declaratively created on Ada Lovelace / Hopper / Blackwell, analogous to what already works for mdev GPUs today. The downstream discovery/advertisement side is already handled by KubeVirt.
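To make "declaratively created" concrete, one possible shape is a reconcile step that compares a desired VF-to-type mapping with what the node currently reports. The sketch below is only an assumption about how such a component might plan its writes: `plan_vgpu_changes` and both dictionaries are hypothetical, and it assumes that writing 0 to current_vgpu_type clears a VF.

```python
from typing import Dict, List, Tuple


def plan_vgpu_changes(
    desired: Dict[str, int], current: Dict[str, int]
) -> List[Tuple[str, int]]:
    """Return the (vf_address, type_id) writes needed to reach the desired state."""
    plan: List[Tuple[str, int]] = []
    for vf, want in sorted(desired.items()):
        # A VF missing from `current` is treated as unconfigured (type 0).
        if current.get(vf, 0) != want:
            plan.append((vf, want))
    # Assumption: VFs absent from the desired state are cleared by writing 0.
    for vf, have in sorted(current.items()):
        if vf not in desired and have != 0:
            plan.append((vf, 0))
    return plan
```

For example, `plan_vgpu_changes({"0000:41:00.4": 558}, {"0000:41:00.4": 0, "0000:41:00.5": 559})` returns `[("0000:41:00.4", 558), ("0000:41:00.5", 0)]`. Keeping the planning step free of sysfs access makes it easy to test separately from the code that performs the writes.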