Open
Labels
RCA done, bug, enhancement
Description
Problem Description
Hi! The operator starts but then fails with this error:
E0709 16:15:22.567458 1 cmdutils.go:43] "problem running manager" err="failed to wait for DriverAndPluginReconciler caches to sync kind source: *v1beta1.NodeModulesConfig: timed out waiting for cache to be synced for Kind *v1beta1.NodeModulesConfig" logger="amd-gpu.setup"
The Helm chart is installed with these values:
node-feature-discovery:
  enabled: true
kmm:
  enabled: false
installdefaultNFDRule: true
controllerManager:
  manager:
    image:
      repository: registry.${var.secrets.general.localdomainname}/images/rocm-gpu-operator
    imagePullSecrets: "regcred"
Note that kmm is disabled.
Both the operator and the Helm chart are v1.3.0.
Even so, the operator looks for the CRDs that belong to the disabled kmm chart, for example:
https://github.com/ROCm/gpu-operator/blob/main/helm-charts-k8s/charts/kmm/crds/nodemodulesconfig-crd.yaml
Extended logs:
I0709 16:20:23.975313 1 main.go:97] "Creating manager" logger="amd-gpu.setup" version="v1.3.0" git commit="b6ec613e" build tag="latest"
I0709 16:20:23.975392 1 main.go:99] "Parsing configuration file" logger="amd-gpu.setup" path="controller_manager_config.yaml"
I0709 16:20:23.982549 1 utils.go:198] "IsOpenShift: false" logger="amd-gpu.setup"
I0709 16:20:23.982821 1 main.go:145] "starting manager" logger="amd-gpu.setup"
I0709 16:20:23.982898 1 server.go:208] "Starting metrics server" logger="amd-gpu.controller-runtime.metrics"
I0709 16:20:23.982948 1 server.go:83] "starting server" logger="amd-gpu" name="health probe" addr="[::]:8081"
I0709 16:20:23.982960 1 server.go:247] "Serving metrics server" logger="amd-gpu.controller-runtime.metrics" bindAddress="127.0.0.1:8080" secure=false
I0709 16:20:23.983058 1 leaderelection.go:257] attempting to acquire leader lease infrastructure/gpu.amd.com...
I0709 16:20:39.195249 1 leaderelection.go:271] successfully acquired lease infrastructure/gpu.amd.com
I0709 16:20:39.195631 1 controller.go:198] "Starting EventSource" logger="amd-gpu" controller="DriverAndPluginReconciler" controllerGroup="amd.com" controllerKind="DeviceConfig" source="kind source: *v1alpha1.DeviceConfig"
I0709 16:20:39.195656 1 controller.go:198] "Starting EventSource" logger="amd-gpu" controller="DriverAndPluginReconciler" controllerGroup="amd.com" controllerKind="DeviceConfig" source="kind source: *v1.Pod"
I0709 16:20:39.195681 1 controller.go:198] "Starting EventSource" logger="amd-gpu" controller="DriverAndPluginReconciler" controllerGroup="amd.com" controllerKind="DeviceConfig" source="kind source: *v1.Service"
I0709 16:20:39.195731 1 controller.go:198] "Starting EventSource" logger="amd-gpu" controller="DriverAndPluginReconciler" controllerGroup="amd.com" controllerKind="DeviceConfig" source="kind source: *v1.Node"
I0709 16:20:39.195727 1 controller.go:198] "Starting EventSource" logger="amd-gpu" controller="DriverAndPluginReconciler" controllerGroup="amd.com" controllerKind="DeviceConfig" source="kind source: *v1.Secret"
I0709 16:20:39.195791 1 controller.go:198] "Starting EventSource" logger="amd-gpu" controller="DriverAndPluginReconciler" controllerGroup="amd.com" controllerKind="DeviceConfig" source="kind source: *v1beta1.NodeModulesConfig"
I0709 16:20:39.195828 1 controller.go:198] "Starting EventSource" logger="amd-gpu" controller="DriverAndPluginReconciler" controllerGroup="amd.com" controllerKind="DeviceConfig" source="kind source: *v1.DaemonSet"
I0709 16:20:39.195825 1 controller.go:198] "Starting EventSource" logger="amd-gpu" controller="DriverAndPluginReconciler" controllerGroup="amd.com" controllerKind="DeviceConfig" source="kind source: *v1beta1.Module"
E0709 16:20:39.212154 1 kind.go:71] "if kind is a CRD, it should be installed before calling Start" err="no matches for kind \"Module\" in version \"kmm.sigs.x-k8s.io/v1beta1\"" logger="amd-gpu.controller-runtime.source.EventHandler" kind="Module.kmm.sigs.x-k8s.io"
E0709 16:20:39.215704 1 kind.go:71] "if kind is a CRD, it should be installed before calling Start" err="no matches for kind \"NodeModulesConfig\" in version \"kmm.sigs.x-k8s.io/v1beta1\"" logger="amd-gpu.controller-runtime.source.EventHandler" kind="NodeModulesConfig.kmm.sigs.x-k8s.io"
E0709 16:20:49.197898 1 kind.go:71] "if kind is a CRD, it should be installed before calling Start" err="no matches for kind \"NodeModulesConfig\" in version \"kmm.sigs.x-k8s.io/v1beta1\"" logger="amd-gpu.controller-runtime.source.EventHandler" kind="NodeModulesConfig.kmm.sigs.x-k8s.io"
E0709 16:20:49.199870 1 kind.go:71] "if kind is a CRD, it should be installed before calling Start" err="no matches for kind \"Module\" in version \"kmm.sigs.x-k8s.io/v1beta1\"" logger="amd-gpu.controller-runtime.source.EventHandler" kind="Module.kmm.sigs.x-k8s.io"
E0709 16:20:59.198931 1 kind.go:71] "if kind is a CRD, it should be installed before calling Start" err="no matches for kind \"NodeModulesConfig\" in version \"kmm.sigs.x-k8s.io/v1beta1\"" logger="amd-gpu.controller-runtime.source.EventHandler" kind="NodeModulesConfig.kmm.sigs.x-k8s.io"
E0709 16:20:59.200973 1 kind.go:71] "if kind is a CRD, it should be installed before calling Start" err="no matches for kind \"Module\" in version \"kmm.sigs.x-k8s.io/v1beta1\"" logger="amd-gpu.controller-runtime.source.EventHandler" kind="Module.kmm.sigs.x-k8s.io"
E0709 16:21:09.198204 1 kind.go:71] "if kind is a CRD, it should be installed before calling Start" err="no matches for kind \"NodeModulesConfig\" in version \"kmm.sigs.x-k8s.io/v1beta1\"" logger="amd-gpu.controller-runtime.source.EventHandler" kind="NodeModulesConfig.kmm.sigs.x-k8s.io"
E0709 16:21:09.200110 1 kind.go:71] "if kind is a CRD, it should be installed before calling Start" err="no matches for kind \"Module\" in version \"kmm.sigs.x-k8s.io/v1beta1\"" logger="amd-gpu.controller-runtime.source.EventHandler" kind="Module.kmm.sigs.x-k8s.io"
E0709 16:21:19.198266 1 kind.go:71] "if kind is a CRD, it should be installed before calling Start" err="no matches for kind \"Module\" in version \"kmm.sigs.x-k8s.io/v1beta1\"" logger="amd-gpu.controller-runtime.source.EventHandler" kind="Module.kmm.sigs.x-k8s.io"
E0709 16:21:19.200857 1 kind.go:71] "if kind is a CRD, it should be installed before calling Start" err="no matches for kind \"NodeModulesConfig\" in version \"kmm.sigs.x-k8s.io/v1beta1\"" logger="amd-gpu.controller-runtime.source.EventHandler" kind="NodeModulesConfig.kmm.sigs.x-k8s.io"
E0709 16:21:29.201430 1 kind.go:71] "if kind is a CRD, it should be installed before calling Start" err="no matches for kind \"NodeModulesConfig\" in version \"kmm.sigs.x-k8s.io/v1beta1\"" logger="amd-gpu.controller-runtime.source.EventHandler" kind="NodeModulesConfig.kmm.sigs.x-k8s.io"
E0709 16:21:29.203700 1 kind.go:71] "if kind is a CRD, it should be installed before calling Start" err="no matches for kind \"Module\" in version \"kmm.sigs.x-k8s.io/v1beta1\"" logger="amd-gpu.controller-runtime.source.EventHandler" kind="Module.kmm.sigs.x-k8s.io"
E0709 16:21:39.199018 1 kind.go:71] "if kind is a CRD, it should be installed before calling Start" err="no matches for kind \"Module\" in version \"kmm.sigs.x-k8s.io/v1beta1\"" logger="amd-gpu.controller-runtime.source.EventHandler" kind="Module.kmm.sigs.x-k8s.io"
E0709 16:21:39.201032 1 kind.go:71] "if kind is a CRD, it should be installed before calling Start" err="no matches for kind \"NodeModulesConfig\" in version \"kmm.sigs.x-k8s.io/v1beta1\"" logger="amd-gpu.controller-runtime.source.EventHandler" kind="NodeModulesConfig.kmm.sigs.x-k8s.io"
E0709 16:21:49.198098 1 kind.go:71] "if kind is a CRD, it should be installed before calling Start" err="no matches for kind \"Module\" in version \"kmm.sigs.x-k8s.io/v1beta1\"" logger="amd-gpu.controller-runtime.source.EventHandler" kind="Module.kmm.sigs.x-k8s.io"
E0709 16:21:49.200081 1 kind.go:71] "if kind is a CRD, it should be installed before calling Start" err="no matches for kind \"NodeModulesConfig\" in version \"kmm.sigs.x-k8s.io/v1beta1\"" logger="amd-gpu.controller-runtime.source.EventHandler" kind="NodeModulesConfig.kmm.sigs.x-k8s.io"
E0709 16:21:59.198257 1 kind.go:71] "if kind is a CRD, it should be installed before calling Start" err="no matches for kind \"NodeModulesConfig\" in version \"kmm.sigs.x-k8s.io/v1beta1\"" logger="amd-gpu.controller-runtime.source.EventHandler" kind="NodeModulesConfig.kmm.sigs.x-k8s.io"
E0709 16:21:59.200101 1 kind.go:71] "if kind is a CRD, it should be installed before calling Start" err="no matches for kind \"Module\" in version \"kmm.sigs.x-k8s.io/v1beta1\"" logger="amd-gpu.controller-runtime.source.EventHandler" kind="Module.kmm.sigs.x-k8s.io"
E0709 16:22:09.198809 1 kind.go:71] "if kind is a CRD, it should be installed before calling Start" err="no matches for kind \"NodeModulesConfig\" in version \"kmm.sigs.x-k8s.io/v1beta1\"" logger="amd-gpu.controller-runtime.source.EventHandler" kind="NodeModulesConfig.kmm.sigs.x-k8s.io"
E0709 16:22:09.202906 1 kind.go:71] "if kind is a CRD, it should be installed before calling Start" err="no matches for kind \"Module\" in version \"kmm.sigs.x-k8s.io/v1beta1\"" logger="amd-gpu.controller-runtime.source.EventHandler" kind="Module.kmm.sigs.x-k8s.io"
E0709 16:22:19.198572 1 kind.go:71] "if kind is a CRD, it should be installed before calling Start" err="no matches for kind \"Module\" in version \"kmm.sigs.x-k8s.io/v1beta1\"" logger="amd-gpu.controller-runtime.source.EventHandler" kind="Module.kmm.sigs.x-k8s.io"
E0709 16:22:19.200475 1 kind.go:71] "if kind is a CRD, it should be installed before calling Start" err="no matches for kind \"NodeModulesConfig\" in version \"kmm.sigs.x-k8s.io/v1beta1\"" logger="amd-gpu.controller-runtime.source.EventHandler" kind="NodeModulesConfig.kmm.sigs.x-k8s.io"
E0709 16:22:29.198003 1 kind.go:71] "if kind is a CRD, it should be installed before calling Start" err="no matches for kind \"NodeModulesConfig\" in version \"kmm.sigs.x-k8s.io/v1beta1\"" logger="amd-gpu.controller-runtime.source.EventHandler" kind="NodeModulesConfig.kmm.sigs.x-k8s.io"
E0709 16:22:29.199931 1 kind.go:71] "if kind is a CRD, it should be installed before calling Start" err="no matches for kind \"Module\" in version \"kmm.sigs.x-k8s.io/v1beta1\"" logger="amd-gpu.controller-runtime.source.EventHandler" kind="Module.kmm.sigs.x-k8s.io"
E0709 16:22:39.196885 1 controller.go:210] "Could not wait for Cache to sync" err="failed to wait for DriverAndPluginReconciler caches to sync kind source: *v1beta1.Module: timed out waiting for cache to be synced for Kind *v1beta1.Module" logger="amd-gpu" controller="DriverAndPluginReconciler" controllerGroup="amd.com" controllerKind="DeviceConfig" source="kind source: *v1beta1.Module"
E0709 16:22:39.196986 1 controller.go:210] "Could not wait for Cache to sync" err="failed to wait for DriverAndPluginReconciler caches to sync kind source: *v1beta1.NodeModulesConfig: timed out waiting for cache to be synced for Kind *v1beta1.NodeModulesConfig" logger="amd-gpu" controller="DriverAndPluginReconciler" controllerGroup="amd.com" controllerKind="DeviceConfig" source="kind source: *v1beta1.NodeModulesConfig"
I0709 16:22:39.197067 1 internal.go:538] "Stopping and waiting for non leader election runnables" logger="amd-gpu"
I0709 16:22:39.197097 1 internal.go:542] "Stopping and waiting for leader election runnables" logger="amd-gpu"
I0709 16:22:39.197119 1 internal.go:550] "Stopping and waiting for caches" logger="amd-gpu"
E0709 16:22:39.199161 1 kind.go:71] "if kind is a CRD, it should be installed before calling Start" err="no matches for kind \"NodeModulesConfig\" in version \"kmm.sigs.x-k8s.io/v1beta1\"" logger="amd-gpu.controller-runtime.source.EventHandler" kind="NodeModulesConfig.kmm.sigs.x-k8s.io"
I0709 16:22:39.199214 1 internal.go:554] "Stopping and waiting for webhooks" logger="amd-gpu"
I0709 16:22:39.199231 1 internal.go:557] "Stopping and waiting for HTTP servers" logger="amd-gpu"
I0709 16:22:39.199256 1 server.go:254] "Shutting down metrics server with timeout of 1 minute" logger="amd-gpu.controller-runtime.metrics"
I0709 16:22:39.199263 1 server.go:68] "shutting down server" logger="amd-gpu" name="health probe" addr="[::]:8081"
I0709 16:22:39.199344 1 internal.go:561] "Wait completed, proceeding to shutdown the manager" logger="amd-gpu"
E0709 16:22:39.199392 1 cmdutils.go:43] "problem running manager" err="failed to wait for DriverAndPluginReconciler caches to sync kind source: *v1beta1.Module: timed out waiting for cache to be synced for Kind *v1beta1.Module" logger="amd-gpu.setup"
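To make the repeated errors above easier to read, here is a minimal Python sketch that counts the "no matches for kind" errors per group/version and kind. It is not part of the operator; the function name `missing_kinds` and the regular expression are my own assumptions, written against the log format shown in this issue.

```python
import re
from collections import Counter

# Assumed pattern for the controller-runtime "no matches for kind" error.
# In the raw log the inner quotes are backslash-escaped, so each backslash is optional.
NO_MATCH = re.compile(
    r'no matches for kind \\?"(?P<kind>\w+)\\?" in version \\?"(?P<gv>[\w./-]+)\\?"'
)


def missing_kinds(log_text: str) -> Counter:
    """Count 'no matches for kind' errors per (group/version, kind) pair."""
    counts = Counter()
    for line in log_text.splitlines():
        match = NO_MATCH.search(line)
        if match:
            counts[(match.group("gv"), match.group("kind"))] += 1
    return counts


if __name__ == "__main__":
    # Three lines copied from the log in this issue: one startup line, two errors.
    sample = r'''
I0709 16:20:23.982821 1 main.go:145] "starting manager" logger="amd-gpu.setup"
E0709 16:20:39.212154 1 kind.go:71] "if kind is a CRD, it should be installed before calling Start" err="no matches for kind \"Module\" in version \"kmm.sigs.x-k8s.io/v1beta1\"" logger="amd-gpu.controller-runtime.source.EventHandler" kind="Module.kmm.sigs.x-k8s.io"
E0709 16:20:39.215704 1 kind.go:71] "if kind is a CRD, it should be installed before calling Start" err="no matches for kind \"NodeModulesConfig\" in version \"kmm.sigs.x-k8s.io/v1beta1\"" logger="amd-gpu.controller-runtime.source.EventHandler" kind="NodeModulesConfig.kmm.sigs.x-k8s.io"
'''
    for (group_version, kind), count in missing_kinds(sample).items():
        print(f"{group_version} {kind}: {count}")
```

Run against the full log, this should report only the two kmm kinds, Module and NodeModulesConfig, both in kmm.sigs.x-k8s.io/v1beta1. That matches the two caches that time out at 16:22:39.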
Operating System
Gentoo
CPU
Intel(R) Core(TM) i7-6700HQ
GPU
not on master node
ROCm Version
not on master node
ROCm Component
No response
Steps to Reproduce
No response
(Optional for Linux users) Output of /opt/rocm/bin/rocminfo --support
No response
Additional Information
No response