Regression: QEMU 4.0 hangs the host (*bisect included*)
The commit b2fc91db84470a78f8e93f5b5f913c17188792c8 seemingly introduced a regression on my system.
When I start QEMU, the guest and the host hang (I need a hard reset to get back to a working system), before anything shows on the guest.
I use QEMU with GPU passthrough (which worked perfectly until the commit above). This is the command I use:
```
/path/to/qemu-system-x86_64
-drive if=pflash,format=raw,readonly,file=/path/to/OVMF_CODE.fd
-drive if=pflash,format=raw,file=/tmp/OVMF_VARS.fd.tmp
-enable-kvm
-machine q35,accel=kvm,mem-merge=off
-cpu host,kvm=off,hv_vendor_id=vgaptrocks,hv_relaxed,hv_spinlocks=0x1fff,hv_vapic,hv_time
-smp 4,cores=4,sockets=1,threads=1
-m 10240
-vga none
-rtc base=localtime
-serial none
-parallel none
-usb
-device usb-tablet
-device vfio-pci,host=01:00.0,multifunction=on
-device vfio-pci,host=01:00.1
-device usb-host,vendorid=<vid>,productid=<pid>
-device usb-host,vendorid=<vid>,productid=<pid>
-device usb-host,vendorid=<vid>,productid=<pid>
-device usb-host,vendorid=<vid>,productid=<pid>
-device usb-host,vendorid=<vid>,productid=<pid>
-device usb-host,vendorid=<vid>,productid=<pid>
-device virtio-scsi-pci,id=scsi
-drive file=/path/to/guest.img,id=hdd1,format=qcow2,if=none,cache=writeback
-device scsi-hd,drive=hdd1
-net nic,model=virtio
-net user,smb=/path/to/shared
```
If I run QEMU without GPU passthrough, it runs fine.
Some details about my system:
- O/S: Mint 19.1 x86-64 (it's based on Ubuntu 18.04)
- Kernel: 4.15
- `configure` options: `--target-list=x86_64-softmmu --enable-gtk --enable-spice --audio-drv-list=pa`
- EDK2 version: 1a734ed85fda71630c795832e6d24ea560caf739 (20/Apr/2019)
- CPU: i7-6700k
- Motherboard: ASRock Z170 Gaming-ITX/ac
- VGA: Gigabyte GTX 960 Mini-ITX
Does adding "kernel_irqchip=on" to the comma separated list of options for -machine resolve it?
> Does adding "kernel_irqchip=on" to the comma separated list of options for -machine resolve it?
Yes, that solved it, thanks!
This seems related to INTx (legacy) interrupt mode, which NVIDIA GeForce will use by default. Using regedit in a Windows VM, or adjusting nvidia.ko module parameters in a Linux VM, can switch the driver to MSI, which seems unaffected. We also have the vfio-pci device option x-no-kvm-intx=on, which is probably a good complement to configuring the driver to use MSI until we get this figured out, as the Windows driver likes to occasionally switch MSI off, which would leave you in a bad state. Routing INTx through QEMU is a performance regression, though, so while it works as a workaround, routing through QEMU without MSI is not a great combination.
Not just NVIDIA: forcing a NIC to use INTx also fails, and it's apparent from the host that the device is stuck with DisINTx+. It looks like the resampling mechanism that allows KVM to unmask the interrupt is broken with the split irqchip.
OK, so if I understand correctly, the workaround is:
- set `x-no-kvm-intx=on` and enable MSI in the guest (via regedit or module parameters)
which may lead to a performance regression (at least under certain circumstances).
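For concreteness, here is a sketch of that workaround applied to the command line at the top of this report (`x-no-kvm-intx` is an experimental vfio-pci option; `NVreg_EnableMSI` is the NVIDIA kernel module parameter for enabling MSI in a Linux guest; the device addresses are the ones from the original command):

```
# Route INTx through QEMU instead of KVM for the passed-through GPU:
-device vfio-pci,host=01:00.0,multifunction=on,x-no-kvm-intx=on
-device vfio-pci,host=01:00.1,x-no-kvm-intx=on

# In a Linux guest, switch the NVIDIA driver to MSI, e.g. in
# /etc/modprobe.d/nvidia-msi.conf:
options nvidia NVreg_EnableMSI=1
```

For a Windows guest, MSI is enabled per-device in the registry instead of via a module parameter.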
Is it therefore preferable, performance- and configuration-wise, to use QEMU 3.1.0 (if there are no 4.0.0 feature requirements) until this issue is sorted out?
The change in QEMU 4.0 is only a change in defaults of the machine type, it can be entirely reverted in the VM config with kernel_irqchip=on or <ioapic driver='kvm'/> with libvirt. Using a machine type prior to the q35 4.0 machine type would also avoid it. There are no performance issues with these configurations that would favor using 3.1 over 4.0.
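Applied to the command line at the top of this report, that means extending the `-machine` option (`kernel_irqchip=on` restores the pre-4.0 in-kernel irqchip default):

```
-machine q35,accel=kvm,mem-merge=off,kernel_irqchip=on
```

With libvirt, the equivalent, as noted, is `<ioapic driver='kvm'/>` inside the domain's `<features>` element.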
> The change in QEMU 4.0 is only a change in defaults of the machine type, it can be entirely reverted in the VM config with kernel_irqchip=on or <ioapic driver='kvm'/> with libvirt. Using a machine type prior to the q35 4.0 machine type would also avoid it. There are no performance issues with these configurations that would favor using 3.1 over 4.0.
Thanks for the detailed answer :-)
Just to provide an update: patches are posted to revert this change in the q35 4.1 machine type for the next release, as well as to introduce a q35 4.0.1 machine type making the same change for 4.0-stable. References:
https://patchwork.ozlabs.org/patch/1099695/
https://patchwork.ozlabs.org/patch/1099659/
Fix has been included here:
https://git.qemu.org/?p=qemu.git;a=commitdiff;h=c87759ce876a7a0b17c2b