The AMD GPU fails to initialize in the RHEL 9.6 VM #18

@yanghangliu

Test environment (VM):
Packages:
rocm-7.1.1.70101-38.el8.x86_64
amdgpu-dkms-firmware-30.20.1.0.30200100-2255209.el9.noarch
amdgpu-dkms-6.16.6-2255209.el9.noarch
amdgpu-core-7.1.70101-2255337.el9.noarch
Kernel:
5.14.0-570.75.1.el9_6.x86_64

Test steps to reproduce the issue:
[1] Make sure the in-box amdgpu driver is blacklisted
[2] Start a RHEL 9.6 VM with an AMD MI300X GPU passed through
[3] Install the amdgpu-related packages

[VM] # os_version=${1:-"9.6"}
[VM] # cat > /etc/yum.repos.d/amdgpu.repo <<EOF
[amdgpu]
name=amdgpu
baseurl=https://repo.radeon.com/amdgpu/latest/rhel/$os_version/main/x86_64/
enabled=1
priority=50
gpgcheck=1
gpgkey=https://repo.radeon.com/rocm/rocm.gpg.key
[ROCm]
name=ROCm
baseurl=https://repo.radeon.com/rocm/el9/latest/main/
enabled=1
priority=50
gpgcheck=1
gpgkey=https://repo.radeon.com/rocm/rocm.gpg.key
EOF
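
For step [1], a minimal sketch of one way to confirm that a blacklist entry is in place. It assumes the blacklist was added as a `blacklist amdgpu` line in a `*.conf` file under `/etc/modprobe.d` (the issue does not say how or on which machine it was done), and `is_amdgpu_blacklisted` is a hypothetical helper name, not part of any AMD tooling.

```shell
#!/usr/bin/env bash
# Hypothetical helper: succeeds (exit 0) if any *.conf file in the given
# directory contains an uncommented "blacklist amdgpu" line.
# Assumption: the blacklist lives in modprobe.d, not on the kernel command line.
is_amdgpu_blacklisted() {
    local dir="${1:-/etc/modprobe.d}"
    # -q: no output, -s: stay silent if no *.conf file exists
    grep -Eqs '^[[:space:]]*blacklist[[:space:]]+amdgpu[[:space:]]*$' "$dir"/*.conf
}
```

Possible usage: `is_amdgpu_blacklisted && echo "amdgpu is blacklisted"`. A blacklist set through the `modprobe.blacklist=amdgpu` kernel parameter would not be detected by this check.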

[4] Reboot the VM
[5] Load the amdgpu driver in the VM
[6] Check dmesg in the VM

[VM] # dmesg
...
[ 70.449258] amdkcl: loading out-of-tree module taints kernel.
[ 70.449304] amdkcl: module verification failed: signature and/or required key missing - tainting kernel
[ 73.210070] [drm] amdgpu kernel modesetting enabled.
[ 73.210096] [drm] amdgpu version: 6.16.6
[ 73.210107] [drm] OS DRM version: 6.12.0
[ 73.211415] amdgpu: Virtual CRAT table created for CPU
[ 73.211500] amdgpu: Topology: Add CPU node
[ 73.222175] amdgpu 0000:04:00.0: amdgpu: initializing kernel modesetting (IP DISCOVERY 0x1002:0x74A1 0x1002:0x74A1 0x00).
[ 73.222518] amdgpu 0000:04:00.0: amdgpu: register mmio base: 0x82400000
[ 73.222545] amdgpu 0000:04:00.0: amdgpu: register mmio size: 2097152
[ 79.226575] amdgpu 0000:04:00.0: amdgpu: failed to read discovery info from memory, vram size read: 0
[ 79.226669] amdgpu 0000:04:00.0: amdgpu: [drm] ERROR discovery failed: -2
[ 79.226707] amdgpu 0000:04:00.0: amdgpu: Fatal error during GPU init
[ 79.226820] amdgpu 0000:04:00.0: amdgpu: amdgpu: finishing device.
[ 79.229911] amdgpu: probe of 0000:04:00.0 failed with error -2
[ 79.230021] amdgpu: legacy kernel without apple_gmux_detect()
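
To pull only the failure lines out of a long dmesg capture like the one above, a small filter can help. This is a sketch, not an official diagnostic: `amdgpu_init_errors` is a hypothetical helper name, and the pattern is based only on the wording of the messages in this log.

```shell
#!/usr/bin/env bash
# Hypothetical helper: reads kernel log text on stdin and prints only the
# amdgpu lines that report a failure ("failed", "ERROR" or "Fatal error").
# Exits non-zero when no such line is present (grep's normal behaviour).
amdgpu_init_errors() {
    grep -E 'amdgpu.*(failed|ERROR|Fatal error)'
}
```

Possible usage in the VM: `dmesg | amdgpu_init_errors`. On the log above, this would keep the four lines from "failed to read discovery info from memory" through "probe of 0000:04:00.0 failed with error -2", except the "finishing device" line, and drop the informational ones.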
