diff options
Diffstat (limited to 'results/classifier/118/permissions/1867519')
| -rw-r--r-- | results/classifier/118/permissions/1867519 | 280 |
1 files changed, 280 insertions, 0 deletions
diff --git a/results/classifier/118/permissions/1867519 b/results/classifier/118/permissions/1867519 new file mode 100644 index 000000000..188b01e92 --- /dev/null +++ b/results/classifier/118/permissions/1867519 @@ -0,0 +1,280 @@ +permissions: 0.858 +user-level: 0.843 +semantic: 0.842 +register: 0.837 +architecture: 0.835 +socket: 0.830 +peripherals: 0.829 +risc-v: 0.826 +debug: 0.826 +ppc: 0.826 +performance: 0.819 +VMM: 0.810 +hypervisor: 0.805 +assembly: 0.801 +device: 0.791 +vnc: 0.787 +graphic: 0.776 +TCG: 0.773 +network: 0.773 +mistranslation: 0.766 +arm: 0.754 +PID: 0.751 +files: 0.750 +virtual: 0.744 +kernel: 0.726 +KVM: 0.721 +boot: 0.662 +x86: 0.644 +i386: 0.570 + +qemu 4.2 segfaults on VF detach + +After updating Ubuntu 20.04 to the Beta version, we get the following error and the virtual machines stucks when detaching PCI devices using virsh command: + +Error: +error: Failed to detach device from /tmp/vf_interface_attached.xml +error: internal error: End of file from qemu monitor + +steps to reproduce: + 1. create a VM over Ubuntu 20.04 (5.4.0-14-generic) + 2. attach PCI device to this VM (Mellanox VF for example) + 3. try to detaching the PCI device using virsh command: + a. create a pci interface xml file: + + <hostdev mode='subsystem' type='pci' managed='yes'> + <driver name='vfio'/> + <source> + <address type='pci' domain='0x0000' bus='0x11' slot='0x00' function='0x2' /> + </source> + </hostdev> + + b. #virsh detach-device <VM-Doman-name> <pci interface xml file> + + + +- Ubuntu release: + Description: Ubuntu Focal Fossa (development branch) + Release: 20.04 + +- Package ver: + libvirt0: + Installed: 6.0.0-0ubuntu3 + Candidate: 6.0.0-0ubuntu5 + Version table: + 6.0.0-0ubuntu5 500 + 500 http://il.archive.ubuntu.com/ubuntu focal/main amd64 Packages + *** 6.0.0-0ubuntu3 100 + 100 /var/lib/dpkg/status + +- What you expected to happen: + PCI device detached without any errors. + +- What happened instead: + getting the errors above and he VM stuck + +additional info: +after downgrading the libvirt0 package and all the dependent packages to 5.4 the previous, version, seems that the issue disappeared + +Hi Mohammad, +I'll to recreate your issue, but while that goes on I already would want to ask if you could report the following tracked while you try to detach the device: + +1. host dmesg -w +2. journalctl -f +3. /var/log/libvirt/qemu/<questname>.log + +Please report all these in case there is something that helps to pinpoint the root cause. + +Could you also please try more devices to know which kind this issue it restricted to? +1. try other VFs on the same device +2. try VFs on a different device (if you have any) +3. try non-VF but full PCI passthrough + +Does the issue occor on your system for all of these ? + +For the time being I only found a system with a full device to passthrough not a VF. +That worked fine for me, the guest gets the dev and can load its drivers. +[ 297.340525] mlx5_core 0000:00:08.0: Port module event: module 0, Cable unplugged +[ 297.361111] mlx5_core 0000:00:08.0: MLX5E: StrdRq(0) RqSz(1024) StrdSz(256) RxCqeCmprss(0) +[ 297.572313] mlx5_core 0000:00:08.0 ens8: renamed from eth0 + +But since that was "only" PCI-passthrough and not yet VFs I'm looking forward to your answers on your case. + +Once more thing you could track and report is the guests "dmesg -w" while trying to attach to see if there is anything appearing there or aborting much earlier. + +I got VFs enabled now, attach works fine as well. + +But I can confirm that detach breaks it. + +XML used for the device: + <interface type='hostdev' managed='yes'> + <driver name='vfio'/> + <mac address='52:54:00:c3:0e:32'/> + <source> + <address type='pci' domain='0x0000' bus='0x08' slot='0x00' function='0x2'/> + </source> + </interface> + +$ virsh detach-device focal-pttest VF-pass-through.xml +error: Failed to detach device from VF-pass-through.xml +error: internal error: End of file from qemu monitor + +Logs: +1. Guest dmesg (doesn't exist as it dies immediately). + +2. qemu log: +2020-03-18 11:02:18.221+0000: shutting down, reason=crashed + +This one is interesting, Host dmesg: +[ 5819.223023] CPU 0/KVM[2763]: segfault at 0 ip 000055d37b4b245d sp 00007f2f5fffe188 error 6 in qemu-system-x86_64[55d37b008000+529000] +[ 5819.223030] Code: 08 48 89 50 10 48 89 37 48 89 7e 10 c3 f3 0f 1e fa 48 8b 47 08 48 8b 57 10 48 85 c0 74 0c 48 89 50 10 48 8b 57 10 48 8b 47 08 <48> 89 02 c3 66 66 2e 0f 1f 84 00 00 00 00 00 0f 1f 40 00 f3 0f 1e + +Afterwards I see the device come back to the host, but the segfault is the reason it died. + +I re-run the above, full PCI passthrough still attaches/detaches fine. + +VFs attach fine +VFs break on detach + +I've thrown qemu into GDB and this is the backtrace +Thread 4 "CPU 0/KVM" received signal SIGSEGV, Segmentation fault. +[Switching to Thread 0x7f82f0e31700 (LWP 3998)] +0x000055d2f322d45d in notifier_remove (notifier=notifier@entry=0x55d2f40c5078) at ./util/notify.c:31 +31 QLIST_REMOVE(notifier, node); +(gdb) bt +#0 0x000055d2f322d45d in notifier_remove (notifier=notifier@entry=0x55d2f40c5078) at ./util/notify.c:31 +#1 0x000055d2f2df8df9 in kvm_irqchip_remove_change_notifier (n=n@entry=0x55d2f40c5078) at ./accel/kvm/kvm-all.c:1409 +#2 0x000055d2f2e56989 in vfio_exitfn (pdev=<optimized out>) at ./hw/vfio/pci.c:3079 +#3 0x000055d2f3025c1b in pci_qdev_unrealize (dev=<optimized out>, errp=<optimized out>) at ./hw/pci/pci.c:1131 +#4 0x000055d2f2f8c6e2 in device_set_realized (obj=<optimized out>, value=<optimized out>, errp=0x0) at ./hw/core/qdev.c:932 +#5 0x000055d2f312449b in property_set_bool (obj=0x55d2f40c4430, v=<optimized out>, name=<optimized out>, opaque=0x55d2f4083ee0, errp=0x0) at ./qom/object.c:2078 +#6 0x000055d2f3128c84 in object_property_set_qobject (obj=obj@entry=0x55d2f40c4430, value=value@entry=0x7f82dc2f7130, name=name@entry=0x55d2f330d85d "realized", errp=errp@entry=0x0) + at ./qom/qom-qobject.c:26 +#7 0x000055d2f31264ba in object_property_set_bool (obj=0x55d2f40c4430, value=<optimized out>, name=0x55d2f330d85d "realized", errp=0x0) at ./qom/object.c:1336 +#8 0x000055d2f2f56bca in acpi_pcihp_device_unplug_cb (hotplug_dev=<optimized out>, s=<optimized out>, dev=0x55d2f40c4430, errp=<optimized out>) at ./hw/acpi/pcihp.c:269 +#9 0x000055d2f2f56253 in acpi_pcihp_eject_slot (s=<optimized out>, bsel=<optimized out>, slots=slots@entry=256) at ./hw/acpi/pcihp.c:170 +#10 0x000055d2f2f56383 in pci_write (size=<optimized out>, data=256, addr=8, opaque=<optimized out>) at ./hw/acpi/pcihp.c:341 +#11 pci_write (opaque=<optimized out>, addr=<optimized out>, data=256, size=<optimized out>) at ./hw/acpi/pcihp.c:332 +#12 0x000055d2f2de9cfb in memory_region_write_accessor (mr=mr@entry=0x55d2f4780970, addr=8, value=value@entry=0x7f82f0e304f8, size=size@entry=4, shift=<optimized out>, + mask=mask@entry=4294967295, attrs=...) at ./memory.c:483 +#13 0x000055d2f2de79ee in access_with_adjusted_size (addr=addr@entry=8, value=value@entry=0x7f82f0e304f8, size=size@entry=4, access_size_min=<optimized out>, + access_size_max=<optimized out>, access_fn=access_fn@entry=0x55d2f2de9bd0 <memory_region_write_accessor>, mr=0x55d2f4780970, attrs=...) at ./memory.c:544 +#14 0x000055d2f2debfc3 in memory_region_dispatch_write (mr=mr@entry=0x55d2f4780970, addr=8, data=<optimized out>, op=<optimized out>, attrs=attrs@entry=...) at ./memory.c:1475 +#15 0x000055d2f2d96a30 in flatview_write_continue (fv=fv@entry=0x7f82dc14bbc0, addr=addr@entry=44552, attrs=..., buf=buf@entry=0x7f82f17e9000 "", len=len@entry=4, addr1=<optimized out>, + l=<optimized out>, mr=0x55d2f4780970) at ./include/qemu/host-utils.h:164 +#16 0x000055d2f2d96c46 in flatview_write (fv=0x7f82dc14bbc0, addr=44552, attrs=..., buf=0x7f82f17e9000 "", len=4) at ./exec.c:3169 +#17 0x000055d2f2d9b01f in address_space_write (as=as@entry=0x55d2f3956960 <address_space_io>, addr=addr@entry=44552, attrs=..., buf=<optimized out>, len=len@entry=4) at ./exec.c:3259 +#18 0x000055d2f2d9b09e in address_space_rw (as=as@entry=0x55d2f3956960 <address_space_io>, addr=addr@entry=44552, attrs=..., attrs@entry=..., buf=<optimized out>, len=len@entry=4, + is_write=is_write@entry=true) at ./exec.c:3269 +#19 0x000055d2f2dfc94f in kvm_handle_io (count=1, size=4, direction=<optimized out>, data=<optimized out>, attrs=..., port=44552) at ./accel/kvm/kvm-all.c:2104 +#20 kvm_cpu_exec (cpu=cpu@entry=0x55d2f3dc9090) at ./accel/kvm/kvm-all.c:2350 +#21 0x000055d2f2dde53e in qemu_kvm_cpu_thread_fn (arg=0x55d2f3dc9090) at ./cpus.c:1318 +#22 qemu_kvm_cpu_thread_fn (arg=arg@entry=0x55d2f3dc9090) at ./cpus.c:1290 +#23 0x000055d2f321fe13 in qemu_thread_start (args=<optimized out>) at ./util/qemu-thread-posix.c:519 +#24 0x00007f82f4290609 in start_thread (arg=<optimized out>) at pthread_create.c:477 +#25 0x00007f82f41b7153 in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:95 + +I changed the bug task to Qemu (Ubuntu) as this isn't a libvirt error. + +I also added an upstream qemu task in case this is a known issue for the developers there. Someone might be able to point us at a known discussion/fix. + +The Backtrace I added in the last comment should help to identify known cases. + +Hi Christian, + +yes that exactly what we see in our tests, +so are the logs that you asked for in comment#1 still needed? +also if you fix it can you please provide us a link for a package or even a workaround until the issue resolved, since this issue stuck our QA from testing ASAP over Focal. + +At the breaking function we have: + +29 void notifier_remove(Notifier *notifier) +30 { +31 QLIST_REMOVE(notifier, node); +32 } + + + +(gdb) p notifier +$1 = (Notifier *) 0x55d2f40c5078 +(gdb) p *notifier +$2 = {notify = 0x0, node = {le_next = 0x0, le_prev = 0x0}} + +And since QLIST_REMOVE is defined as: +140 #define QLIST_REMOVE(elm, field) do { \ +141 if ((elm)->field.le_next != NULL) \ +142 (elm)->field.le_next->field.le_prev = \ +143 (elm)->field.le_prev; \ +144 *(elm)->field.le_prev = (elm)->field.le_next; \ +145 } while (/*CONSTCOND*/0) + +(gdb) p (notifier)->node.le_next +$5 = (struct Notifier *) 0x0 +(gdb) p &(notifier->node) +$11 = (struct {...} *) 0x55d2f40c5080 + +There actually is a != NULL check, might it have changed on the fly. +I need to look at it more thoroughly, but it should be enough to recognize a known issue. + +Might be https://git.qemu.org/?p=qemu.git;a=commit;h=0446f8121723b134ca1d1ed0b73e96d4a0a8689d + +This would also match the backtrace path. + +That commit you mention is confirmed to solve a bug reported against Fedora with almost the same stack trace you see here. + +Hi Christian, + +Yes, +seems that the patch you mentioned fixing the issue i rebuilt the qemu with the patch and it's work fine now. +Thank you guys. + + +I regularly before a release pull in fixes that were posted for qemu-stable. +This is one of them, I'll again do such a build and retest this issue with it. + +I identified and backported (only one needed modification) 33 patches. +But as usual there might be some context needed on top - I have build that over night in [1] + +Testing that on my reproducer + +Attach-host: +[84652.671123] vfio-pci 0000:08:00.2: enabling device (0000 -> 0002) + +Attach-guest: +[ 45.199920] pci 0000:00:08.0: [15b3:1016] type 00 class 0x020000 +[ 45.200374] pci 0000:00:08.0: reg 0x10: [mem 0x00000000-0x000fffff 64bit pref] +[ 45.201358] pci 0000:00:08.0: enabling Extended Tags +[ 45.202726] pci 0000:00:08.0: 0.000 Gb/s available PCIe bandwidth, limited by Unknown speed x0 link at 0000:00:08.0 (capable of 63.008 Gb/s with 8 GT/s x8 link) +[ 45.208316] pci 0000:00:08.0: BAR 0: assigned [mem 0x100000000-0x1000fffff 64bit pref] +[ 45.256566] mlx5_core 0000:00:08.0: enabling device (0000 -> 0002) +[ 45.262103] mlx5_core 0000:00:08.0: firmware version: 14.27.1016 +[ 45.544010] mlx5_core 0000:00:08.0: MLX5E: StrdRq(0) RqSz(1024) StrdSz(256) RxCqeCmprss(0) +[ 45.710123] mlx5_core 0000:00:08.0 ens8: renamed from eth0 +[ 60.992547] random: crng init done +[ 60.992552] random: 3 urandom warning(s) missed due to ratelimiting + +Detach-host: +[84926.767411] mlx5_core 0000:08:00.2: enabling device (0000 -> 0002) +[84926.767514] mlx5_core 0000:08:00.2: firmware version: 14.27.1016 +[84927.036146] mlx5_core 0000:08:00.2: MLX5E: StrdRq(0) RqSz(1024) StrdSz(256) RxCqeCmprss(0) +[84927.208523] mlx5_core 0000:08:00.2 ens1v1: renamed from eth0 + +Detach-guest: +<nothing> + + +So yes, these changes fix the issue here (and a bunch of others). +I'll open up an MP to get these changes into Ubuntu 20.04. + +[1]: https://launchpad.net/~ci-train-ppa-service/+archive/ubuntu/3981 + +This bug was fixed in the package qemu - 1:4.2-3ubuntu3 + +--------------- +qemu (1:4.2-3ubuntu3) focal; urgency=medium + + * d/p/stable/lp-1867519-*: Stabilize qemu 4.2 with upstream + patches @qemu-stable (LP: #1867519) + + -- Christian Ehrhardt <email address hidden> Wed, 18 Mar 2020 13:57:57 +0100 + |