summary refs log tree commit diff stats
path: root/results/classifier/accel-gemma3:12b/kvm/1998
blob: 7849f2057ca2b2832f18d02f9ae6b59ee692d151 (plain) (blame)
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
acpihp does not work with some common guest kernels
Description of problem:
for pc-q35 6.1, 7.2, any guest kernel with `ACPI: Core revision` < 20230331, can not hot plug the nvidia GPUs.
So basically only guest kernel >= 6.5 can make it work so far.
But majority of server kernels are still at 4.18, 5.x. I wonder if it possible to be fixed?
I also don't know is this qemu bug? bios bug? or actually ACPIA's bug?

journal -k report error like following:
```
Nov 11 17:53:00 VMTEST kernel: pci 0000:08:00.0: BAR 0: no space for [mem size 0x01000000]
Nov 11 17:53:00 VMTEST kernel: pci 0000:08:00.0: BAR 0: failed to assign [mem size 0x01000000]
Nov 11 17:53:00 VMTEST kernel: pci 0000:08:00.0: BAR 6: assigned [mem 0x81800000-0x8187ffff pref]
Nov 11 17:53:00 VMTEST kernel: pci 0000:08:00.0: BAR 5: assigned [io  0xa000-0xa07f]
Nov 11 17:53:00 VMTEST kernel: nvidia 0000:08:00.0: enabling device (0000 -> 0003)
Nov 11 17:53:00 VMTEST kernel: NVRM: This PCI I/O region assigned to your NVIDIA device is invalid:
                                                NVRM: BAR0 is 0M @ 0x0 (PCI:0000:08:00.0)
Nov 11 17:53:00 VMTEST kernel: nvidia: probe of 0000:08:00.0 failed with error -1
```
Steps to reproduce:
1. run the instance as I described above
2. in qemu monitor: device_add vfio-pci,host=0000:06:00.0,id=gpu0,bus=pci.8
3. login to the vm console then nvidia-smi to see the failure

workaround:
`ICH9-LPC.acpi-pci-hotplug-with-bridge-support=off` to disable the acpihp then pciehp can make it work.
Additional information: