I ran into a problem installing GPU Operator v26.3.3 with Helm. The NVIDIA driver and container toolkit are already installed on the host, so I set driver.enabled: false and toolkit.enabled: false when installing the operator. After installation, the nvidia-operator-validator pod stays stuck in the Init stage because its driver-validation container never passes.
Environment
- GPU Operator Version: v26.3.3
- OS: Ubuntu 26.04
- Container Runtime Version: containerd://2.3.2
- Kubernetes Distro and Version: v1.35.6
The log of the "driver-validation" container shows that nvidia-smi runs and prints the GPU table, but the validator then reports: "No pre-installed driver detected on the host: exit status 14".
time="2026-07-03T07:54:31Z" level=info msg="version: b0a49c0e-amd64, commit: b0a49c0"
time="2026-07-03T07:54:31Z" level=info msg="Attempting to validate a pre-installed driver on the host"
Fri Jul 3 15:54:32 2026
+-----------------------------------------------------------------------------------------+
| NVIDIA-SMI 580.159.03 Driver Version: 580.159.03 CUDA Version: 13.0 |
+-----------------------------------------+------------------------+----------------------+
| GPU Name Persistence-M | Bus-Id Disp.A | Volatile Uncorr. ECC |
| Fan Temp Perf Pwr:Usage/Cap | Memory-Usage | GPU-Util Compute M. |
| | | MIG M. |
|=========================================+========================+======================|
| 0 Tesla M60 Off | 00000000:06:00.0 Off | Off |
| N/A 64C P0 43W / 150W | 0MiB / 8192MiB | 0% Default |
| | | N/A |
+-----------------------------------------+------------------------+----------------------+
| 1 Tesla M60 Off | 00000000:07:00.0 Off | Off |
| N/A 52C P0 40W / 150W | 0MiB / 8192MiB | 0% Default |
| | | N/A |
+-----------------------------------------+------------------------+----------------------+
| 2 Tesla M60 Off | 00000000:84:00.0 Off | Off |
| N/A 61C P0 41W / 150W | 0MiB / 8192MiB | 0% Default |
| | | N/A |
+-----------------------------------------+------------------------+----------------------+
| 3 Tesla M60 Off | 00000000:85:00.0 Off | Off |
| N/A 47C P0 40W / 150W | 0MiB / 8192MiB | 4% Default |
| | | N/A |
+-----------------------------------------+------------------------+----------------------+
+-----------------------------------------------------------------------------------------+
| Processes: |
| GPU GI CI PID Type Process name GPU Memory |
| ID ID Usage |
|=========================================================================================|
| No running processes found |
+-----------------------------------------------------------------------------------------+
WARNING: infoROM is corrupted at gpu 0000:07:00.0
time="2026-07-03T07:54:33Z" level=info msg="No pre-installed driver detected on the host: exit status 14"
time="2026-07-03T07:54:33Z" level=info msg="Validating containerized driver installation"
time="2026-07-03T07:54:33Z" level=info msg="Attempting to validate a driver container installation"
time="2026-07-03T07:54:33Z" level=warning msg="failed to validate the driver, retrying after 5 seconds\n"
time="2026-07-03T07:54:38Z" level=info msg="Attempting to validate a driver container installation"
time="2026-07-03T07:54:38Z" level=warning msg="failed to validate the driver, retrying after 5 seconds\n"
time="2026-07-03T07:54:43Z" level=info msg="Attempting to validate a driver container installation"
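One possible lead, which I have not confirmed: the nvidia-smi man page lists return code 14 as "infoROM is corrupted", which matches the WARNING line for GPU 0000:07:00.0 in the log above. If that is the cause, the validator may be treating any non-zero nvidia-smi exit status as a missing driver even though the driver is loaded. The sketch below is a small Python check that could be run directly on the host to confirm the exit status. The code table is a subset taken from the man page (assumed unchanged for driver 580.x), and the helper names are only illustrative.

```python
import subprocess

# Subset of the return codes documented in the nvidia-smi man page.
# Assumption: these meanings are unchanged for the 580.x driver series.
NVIDIA_SMI_RETURN_CODES = {
    0: "success",
    9: "NVIDIA driver is not loaded",
    12: "NVML shared library could not be found or loaded",
    14: "infoROM is corrupted",
    15: "GPU has fallen off the bus or is otherwise inaccessible",
    255: "other error or internal driver error",
}


def describe_exit_code(code: int) -> str:
    """Translate an nvidia-smi exit status into a human-readable reason."""
    return NVIDIA_SMI_RETURN_CODES.get(code, "code not listed in this table")


def run_nvidia_smi(binary: str = "nvidia-smi") -> int:
    """Run nvidia-smi and return its exit status (127 if the binary is missing)."""
    try:
        result = subprocess.run(
            [binary],
            stdout=subprocess.DEVNULL,
            stderr=subprocess.DEVNULL,
            check=False,
        )
    except FileNotFoundError:
        return 127
    return result.returncode


if __name__ == "__main__":
    status = run_nvidia_smi()
    print(f"nvidia-smi exit status {status}: {describe_exit_code(status)}")
```

If this prints exit status 14 on the host as well, the failure would come from the GPU's infoROM state rather than from the operator or containerd configuration.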
Below are my Helm values, followed by the containerd configuration on the node.
ccManager:
defaultMode: 'on'
enabled: true
hostNetwork: false
image: k8s-cc-manager
imagePullPolicy: IfNotPresent
imagePullSecrets: []
repository: nvcr.io/nvidia/cloud-native
resources: {}
version: v0.4.0
cdi:
enabled: true
nriPluginEnabled: false
daemonsets:
annotations: {}
labels: {}
priorityClassName: system-node-critical
rollingUpdate:
maxUnavailable: '1'
tolerations:
- effect: NoSchedule
key: nvidia.com/gpu
operator: Exists
updateStrategy: RollingUpdate
dcgm:
args: []
enabled: false
env: []
hostNetwork: false
image: dcgm
imagePullPolicy: IfNotPresent
repository: nvcr.io/nvidia/cloud-native
resources: {}
version: 4.5.2-1-ubuntu22.04
dcgmExporter:
annotations: {}
enablePodLabels: false
enablePodUID: false
enabled: true
env: []
hostNetwork: false
hostPID: false
image: dcgm-exporter
imagePullPolicy: IfNotPresent
repository: nvcr.io/nvidia/k8s
resources: {}
service:
internalTrafficPolicy: Cluster
serviceMonitor:
additionalLabels: {}
enabled: false
honorLabels: false
interval: 15s
relabelings: []
version: 4.5.3-4.8.2-distroless
devicePlugin:
args: []
config:
create: false
data: {}
default: ''
name: ''
enabled: true
env: []
hostNetwork: false
image: k8s-device-plugin
imagePullPolicy: IfNotPresent
imagePullSecrets: []
mps:
root: /run/nvidia/mps
repository: nvcr.io/nvidia
resources: {}
version: v0.19.3
driver:
certConfig:
name: ''
enabled: false
env: []
hostNetwork: false
image: driver
imagePullPolicy: IfNotPresent
imagePullSecrets: []
kernelModuleConfig:
name: ''
kernelModuleType: auto
licensingConfig:
nlsEnabled: true
secretName: ''
manager:
env: []
image: k8s-driver-manager
imagePullPolicy: IfNotPresent
repository: nvcr.io/nvidia/cloud-native
version: v0.11.0
nvidiaDriverCRD:
deployDefaultCR: true
driverType: gpu
enabled: false
nodeSelector: {}
rdma:
enabled: false
useHostMofed: false
repoConfig:
configMapName: ''
repository: nvcr.io/nvidia
resources: {}
secretEnv: ''
startupProbe:
failureThreshold: 120
initialDelaySeconds: 60
periodSeconds: 10
timeoutSeconds: 60
upgradePolicy:
autoUpgrade: true
drain:
deleteEmptyDir: false
enable: false
force: false
podSelector: ''
timeoutSeconds: 300
gpuPodDeletion:
deleteEmptyDir: false
force: false
timeoutSeconds: 300
maxParallelUpgrades: 1
maxUnavailable: 25%
waitForCompletion:
podSelector: ''
timeoutSeconds: 0
usePrecompiled: false
version: 580.126.20
virtualTopology:
config: ''
extraObjects: []
gdrcopy:
args: []
enabled: false
env: []
image: gdrdrv
imagePullPolicy: IfNotPresent
imagePullSecrets: []
repository: nvcr.io/nvidia/cloud-native
version: v2.5.2
gds:
args: []
enabled: false
env: []
image: nvidia-fs
imagePullPolicy: IfNotPresent
imagePullSecrets: []
repository: nvcr.io/nvidia/cloud-native
version: 2.27.3
gfd:
enabled: true
env: []
hostNetwork: false
image: k8s-device-plugin
imagePullPolicy: IfNotPresent
imagePullSecrets: []
repository: nvcr.io/nvidia
resources: {}
version: v0.19.3
hostPaths:
driverInstallDir: /run/nvidia/driver
rootFS: /
kataManager:
config: {}
enabled: false
env: []
hostNetwork: false
imagePullPolicy: IfNotPresent
imagePullSecrets: []
resources: {}
kataSandboxDevicePlugin:
args: []
enabled: false
env: []
hostNetwork: false
image: nvidia-sandbox-device-plugin
imagePullPolicy: IfNotPresent
imagePullSecrets: []
repository: nvcr.io/nvidia/cloud-native
resources: {}
version: v0.0.3
mig:
strategy: none
migManager:
config:
create: false
data: {}
default: all-disabled
name: ''
enabled: false
env: []
gpuClientsConfig:
name: ''
hostNetwork: false
image: k8s-mig-manager
imagePullPolicy: IfNotPresent
imagePullSecrets: []
repository: nvcr.io/nvidia/cloud-native
resources: {}
version: v0.14.2
nfd:
enabled: true
nodefeaturerules: false
node-feature-discovery:
gc:
enable: true
replicaCount: 1
serviceAccount:
create: false
name: node-feature-discovery
master:
config:
extraLabelNs:
- nvidia.com
serviceAccount:
create: true
name: node-feature-discovery
priorityClassName: system-node-critical
worker:
config:
sources:
pci:
deviceClassWhitelist:
- '02'
- '0200'
- '0207'
- '0300'
- '0302'
deviceLabelFields:
- vendor
serviceAccount:
create: false
name: node-feature-discovery
tolerations:
- effect: NoSchedule
key: node-role.kubernetes.io/control-plane
operator: Equal
value: ''
- effect: NoSchedule
key: nvidia.com/gpu
operator: Exists
nodeStatusExporter:
enabled: false
hostNetwork: false
image: gpu-operator
imagePullPolicy: IfNotPresent
imagePullSecrets: []
repository: nvcr.io/nvidia
resources: {}
operator:
affinity:
nodeAffinity:
preferredDuringSchedulingIgnoredDuringExecution:
- preference:
matchExpressions:
- key: node-role.kubernetes.io/control-plane
operator: In
values:
- ''
weight: 1
annotations:
openshift.io/scc: restricted-readonly
cleanupCRD: false
image: gpu-operator
imagePullPolicy: IfNotPresent
imagePullSecrets: []
logging:
develMode: false
level: info
timeEncoding: epoch
priorityClassName: system-node-critical
repository: nvcr.io/nvidia
resources:
limits:
cpu: 500m
memory: 350Mi
requests:
cpu: 200m
memory: 100Mi
runtimeClass: nvidia
tolerations:
- effect: NoSchedule
key: node-role.kubernetes.io/control-plane
operator: Equal
value: ''
upgradeCRD: true
use_ocp_driver_toolkit: false
platform:
openshift: false
psa:
enabled: false
sandboxDevicePlugin:
args: []
enabled: false
env: []
hostNetwork: false
image: kubevirt-gpu-device-plugin
imagePullPolicy: IfNotPresent
imagePullSecrets: []
repository: nvcr.io/nvidia
resources: {}
version: v1.5.0
sandboxWorkloads:
defaultWorkload: container
enabled: false
mode: kubevirt
toolkit:
enabled: false
env: []
hostNetwork: false
image: container-toolkit
imagePullPolicy: IfNotPresent
imagePullSecrets: []
installDir: /usr/local/nvidia
repository: nvcr.io/nvidia/k8s
resources: {}
version: v1.19.1
validator:
args: []
env: []
hostNetwork: false
image: gpu-operator
imagePullPolicy: IfNotPresent
imagePullSecrets: []
plugin:
env: []
repository: nvcr.io/nvidia
resources: {}
vfioManager:
driverManager:
env: []
image: k8s-driver-manager
imagePullPolicy: IfNotPresent
repository: nvcr.io/nvidia/cloud-native
version: v0.11.0
enabled: false
env: []
hostNetwork: false
image: k8s-driver-manager
imagePullPolicy: IfNotPresent
imagePullSecrets: []
repository: nvcr.io/nvidia/cloud-native
resources: {}
version: v0.11.0
vgpuDeviceManager:
config:
default: default
name: ''
enabled: false
env: []
hostNetwork: false
image: vgpu-device-manager
imagePullPolicy: IfNotPresent
imagePullSecrets: []
repository: nvcr.io/nvidia/cloud-native
version: v0.4.2
vgpuManager:
driverManager:
env: []
image: k8s-driver-manager
imagePullPolicy: IfNotPresent
repository: nvcr.io/nvidia/cloud-native
version: v0.11.0
enabled: false
env: []
hostNetwork: false
image: vgpu-manager
imagePullPolicy: IfNotPresent
imagePullSecrets: []
kernelModuleConfig:
name: ''
repository: ''
resources: {}
version: ''
global:
cattle:
systemProjectId: p-5hmjt
root@dell:/data/tmp# cat /etc/containerd/config.toml
disabled_plugins = []
imports = ["/etc/containerd/conf.d/*.toml"]
oom_score = 0
required_plugins = []
root = "/var/lib/containerd"
state = "/run/containerd"
temp = ""
version = 4
[cgroup]
path = ""
[debug]
format = ""
level = ""
log_trace_id = false
[plugins]
[plugins."io.containerd.cri.v1.images"]
concurrent_layer_fetch_buffer = 0
disable_snapshot_annotations = true
discard_unpacked_layers = false
image_pull_progress_timeout = "5m0s"
image_pull_with_sync_fs = false
max_concurrent_downloads = 3
snapshotter = "overlayfs"
stats_collect_period = 10
use_local_image_pull = false
[plugins."io.containerd.cri.v1.images".image_decryption]
key_model = "node"
[plugins."io.containerd.cri.v1.images".pinned_images]
sandbox = "registry.aliyuncs.com/chenby/pause:3.10.2"
[plugins."io.containerd.cri.v1.images".registry]
config_path = "/etc/containerd/certs.d"
[plugins."io.containerd.cri.v1.runtime"]
cdi_spec_dirs = ["/etc/cdi", "/var/run/cdi"]
device_ownership_from_security_context = false
disable_apparmor = false
disable_hugetlb_controller = true
disable_proc_mount = false
drain_exec_sync_io_timeout = "0s"
enable_cdi = true
enable_selinux = false
enable_unprivileged_icmp = true
enable_unprivileged_ports = true
ignore_deprecation_warnings = []
ignore_image_defined_volumes = false
max_container_log_line_size = 16384
netns_mounts_under_state_dir = false
restrict_oom_score_adj = false
selinux_category_range = 1024
stats_collect_period = ""
stats_retention_period = ""
tolerate_missing_hugetlb_controller = true
unset_seccomp_profile = ""
[plugins."io.containerd.cri.v1.runtime".cni]
bin_dir = ""
bin_dirs = ["/opt/cni/bin"]
conf_dir = "/etc/cni/net.d"
conf_template = ""
ip_pref = ""
max_conf_num = 1
setup_serially = false
use_internal_loopback = false
[plugins."io.containerd.cri.v1.runtime".containerd]
default_runtime_name = "runc"
ignore_blockio_not_enabled_errors = false
ignore_rdt_not_enabled_errors = false
[plugins."io.containerd.cri.v1.runtime".containerd.runtimes]
[plugins."io.containerd.cri.v1.runtime".containerd.runtimes.runc]
base_runtime_spec = ""
cgroup_writable = false
cni_conf_dir = ""
cni_max_conf_num = 0
container_annotations = []
io_type = ""
pod_annotations = []
privileged_without_host_devices = false
privileged_without_host_devices_all_devices_allowed = false
runtime_path = ""
runtime_type = "io.containerd.runc.v2"
sandboxer = "podsandbox"
snapshotter = ""
[plugins."io.containerd.cri.v1.runtime".containerd.runtimes.runc.options]
BinaryName = ""
CriuImagePath = ""
CriuWorkPath = ""
IoGid = 0
IoUid = 0
NoNewKeyring = false
Root = ""
ShimCgroup = ""
SystemdCgroup = true
[plugins."io.containerd.differ.v1.erofs"]
enable_dmverity = false
enable_tar_index = false
mkfs_options = []
[plugins."io.containerd.gc.v1.scheduler"]
deletion_threshold = 0
mutation_threshold = 100
pause_threshold = 0.02
schedule_delay = "0s"
startup_delay = "100ms"
[plugins."io.containerd.grpc.v1.cri"]
disable_tcp_service = true
enable_tls_streaming = false
stream_idle_timeout = "4h0m0s"
stream_server_address = "127.0.0.1"
stream_server_port = "0"
[plugins."io.containerd.grpc.v1.cri".x509_key_pair_streaming]
tls_cert_file = ""
tls_key_file = ""
[plugins."io.containerd.image-verifier.v1.bindir"]
bin_dir = "/opt/containerd/image-verifier/bin"
max_verifiers = 10
per_verifier_timeout = "10s"
[plugins."io.containerd.internal.v1.opt"]
path = "/opt/containerd"
[plugins."io.containerd.internal.v1.tracing"]
[plugins."io.containerd.metadata.v1.bolt"]
content_sharing_policy = "shared"
no_sync = false
[plugins."io.containerd.metrics.v1.grpc-prometheus"]
grpc_histogram = false
[plugins."io.containerd.monitor.container.v1.restart"]
interval = "10s"
[plugins."io.containerd.monitor.task.v1.cgroups"]
no_prometheus = false
[plugins."io.containerd.mount-handler.v1.erofs"]
[plugins."io.containerd.nri.v1.nri"]
disable = false
disable_connections = false
plugin_config_path = "/etc/nri/conf.d"
plugin_path = "/opt/nri/plugins"
plugin_registration_timeout = "5s"
plugin_request_timeout = "2s"
socket_path = "/var/run/nri/nri.sock"
[plugins."io.containerd.nri.v1.nri".default_validator]
enable = false
reject_custom_seccomp_adjustment = false
reject_namespace_adjustment = false
reject_oci_hook_adjustment = false
reject_runtime_default_seccomp_adjustment = false
reject_sysctl_adjustment = false
reject_unconfined_seccomp_adjustment = false
required_plugins = []
tolerate_missing_plugins_annotation = ""
[plugins."io.containerd.runtime.v2.task"]
platforms = ["linux/amd64"]
[plugins."io.containerd.server.v1.debug"]
address = ""
gid = 0
uid = 0
[plugins."io.containerd.server.v1.grpc"]
address = "/run/containerd/containerd.sock"
gid = 0
max_recv_message_size = 16777216
max_send_message_size = 16777216
uid = 0
[plugins."io.containerd.server.v1.grpc-tcp"]
address = ""
max_recv_message_size = 16777216
max_send_message_size = 16777216
tls_ca = ""
tls_cert = ""
tls_common_name = ""
tls_key = ""
[plugins."io.containerd.server.v1.metrics"]
address = ""
[plugins."io.containerd.server.v1.ttrpc"]
address = "/run/containerd/containerd.sock.ttrpc"
gid = 0
uid = 0
[plugins."io.containerd.service.v1.diff-service"]
default = ["walking"]
sync_fs = false
[plugins."io.containerd.service.v1.tasks-service"]
blockio_config_file = ""
rdt_config_file = ""
[plugins."io.containerd.shim.v1.manager"]
env = []
socket_dir = ""
[plugins."io.containerd.snapshotter.v1.blockfile"]
fs_type = ""
mount_options = []
recreate_scratch = false
root_path = ""
scratch_file = ""
[plugins."io.containerd.snapshotter.v1.btrfs"]
root_path = ""
[plugins."io.containerd.snapshotter.v1.devmapper"]
async_remove = false
base_image_size = ""
discard_blocks = false
fs_options = ""
fs_type = ""
pool_name = ""
root_path = ""
[plugins."io.containerd.snapshotter.v1.erofs"]
default_size = ""
dmverity_mode = ""
enable_fsverity = false
ovl_mount_options = []
root_path = ""
set_immutable = false
[plugins."io.containerd.snapshotter.v1.native"]
root_path = ""
[plugins."io.containerd.snapshotter.v1.overlayfs"]
mount_options = []
root_path = ""
slow_chown = false
sync_remove = false
upperdir_label = false
[plugins."io.containerd.snapshotter.v1.zfs"]
root_path = ""
[plugins."io.containerd.tracing.processor.v1.otlp"]
[plugins."io.containerd.transfer.v1.local"]
check_platform_supported = false
concurrent_layer_fetch_buffer = 0
config_path = ""
max_concurrent_downloads = 3
max_concurrent_unpacks = 1
max_concurrent_uploaded_layers = 3
[stream_processors]
[stream_processors."io.containerd.ocicrypt.decoder.v1.tar"]
accepts = ["application/vnd.oci.image.layer.v1.tar+encrypted"]
args = ["--decryption-keys-path", "/etc/containerd/ocicrypt/keys"]
env = ["OCICRYPT_KEYPROVIDER_CONFIG=/etc/containerd/ocicrypt/ocicrypt_keyprovider.conf"]
path = "ctd-decoder"
returns = "application/vnd.oci.image.layer.v1.tar"
[stream_processors."io.containerd.ocicrypt.decoder.v1.tar.gzip"]
accepts = ["application/vnd.oci.image.layer.v1.tar+gzip+encrypted"]
args = ["--decryption-keys-path", "/etc/containerd/ocicrypt/keys"]
env = ["OCICRYPT_KEYPROVIDER_CONFIG=/etc/containerd/ocicrypt/ocicrypt_keyprovider.conf"]
path = "ctd-decoder"
returns = "application/vnd.oci.image.layer.v1.tar+gzip"
[timeouts]
"io.containerd.timeout.bolt.open" = "0s"
"io.containerd.timeout.cri.defercleanup" = "1m0s"
"io.containerd.timeout.metrics.shimstats" = "2s"
"io.containerd.timeout.shim.cleanup" = "5s"
"io.containerd.timeout.shim.load" = "5s"
"io.containerd.timeout.shim.shutdown" = "3s"
"io.containerd.timeout.task.state" = "2s"
root@dell:/data/tmp# cat /etc/containerd/c
certs.d/ conf.d/ config.toml config.toml.bak
root@dell:/data/tmp# cat /etc/containerd/conf.d/99-nvidia.toml
version = 4
[plugins]
[plugins."io.containerd.cri.v1.runtime"]
cdi_spec_dirs = ["/etc/cdi", "/var/run/cdi"]
device_ownership_from_security_context = false
disable_apparmor = false
disable_hugetlb_controller = true
disable_proc_mount = false
drain_exec_sync_io_timeout = "0s"
enable_cdi = true
enable_selinux = false
enable_unprivileged_icmp = true
enable_unprivileged_ports = true
ignore_deprecation_warnings = []
ignore_image_defined_volumes = false
max_container_log_line_size = 16384
netns_mounts_under_state_dir = false
restrict_oom_score_adj = false
selinux_category_range = 1024
stats_collect_period = ""
stats_retention_period = ""
tolerate_missing_hugetlb_controller = true
unset_seccomp_profile = ""
[plugins."io.containerd.cri.v1.runtime".cni]
bin_dir = ""
bin_dirs = ["/opt/cni/bin"]
conf_dir = "/etc/cni/net.d"
conf_template = ""
ip_pref = ""
max_conf_num = 1
setup_serially = false
use_internal_loopback = false
[plugins."io.containerd.cri.v1.runtime".containerd]
default_runtime_name = "runc"
ignore_blockio_not_enabled_errors = false
ignore_rdt_not_enabled_errors = false
[plugins."io.containerd.cri.v1.runtime".containerd.runtimes]
[plugins."io.containerd.cri.v1.runtime".containerd.runtimes.nvidia]
base_runtime_spec = ""
cgroup_writable = false
cni_conf_dir = ""
cni_max_conf_num = 0
container_annotations = []
io_type = ""
pod_annotations = []
privileged_without_host_devices = false
privileged_without_host_devices_all_devices_allowed = false
runtime_path = ""
runtime_type = "io.containerd.runc.v2"
sandboxer = "podsandbox"
snapshotter = ""
[plugins."io.containerd.cri.v1.runtime".containerd.runtimes.nvidia.options]
BinaryName = "/usr/bin/nvidia-container-runtime"
CriuImagePath = ""
CriuWorkPath = ""
IoGid = 0
IoUid = 0
NoNewKeyring = false
Root = ""
ShimCgroup = ""
SystemdCgroup = true
[plugins."io.containerd.cri.v1.runtime".containerd.runtimes.runc]
base_runtime_spec = ""
cgroup_writable = false
cni_conf_dir = ""
cni_max_conf_num = 0
container_annotations = []
io_type = ""
pod_annotations = []
privileged_without_host_devices = false
privileged_without_host_devices_all_devices_allowed = false
runtime_path = ""
runtime_type = "io.containerd.runc.v2"
sandboxer = "podsandbox"
snapshotter = ""
[plugins."io.containerd.cri.v1.runtime".containerd.runtimes.runc.options]
BinaryName = ""
CriuImagePath = ""
CriuWorkPath = ""
IoGid = 0
IoUid = 0
NoNewKeyring = false
Root = ""
ShimCgroup = ""
SystemdCgroup = true