Diffstat (limited to 'results/classifier/001/other/88225572')

 results/classifier/001/other/88225572 | 2900 ++++++++++++++++++++++++++++++++
 1 file changed, 2900 insertions(+), 0 deletions(-)
diff --git a/results/classifier/001/other/88225572 b/results/classifier/001/other/88225572
new file mode 100644
index 000000000..b3cb5b265
--- /dev/null
+++ b/results/classifier/001/other/88225572
@@ -0,0 +1,2900 @@
other: 0.987
semantic: 0.976
instruction: 0.974
mistranslation: 0.942

[BUG qemu 4.0] segfault when unplugging virtio-blk-pci device

Hi,

I'm using qemu 4.0 and hit a segfault when tearing down a kata sandbox. I
think it's because an I/O completion hits a use-after-free when the device
is already gone. Is this a known bug that has been fixed? (I went through
the git log but didn't find anything obvious.)

gdb backtrace is:

Core was generated by `/usr/local/libexec/qemu-kvm -name
sandbox-5b8df8c6c6901c3c0a9b02879be10fe8d69d6'.
Program terminated with signal 11, Segmentation fault.
#0  object_get_class (obj=obj@entry=0x0)
    at /usr/src/debug/qemu-4.0/qom/object.c:903
903         return obj->class;
(gdb) bt
#0  object_get_class (obj=obj@entry=0x0)
    at /usr/src/debug/qemu-4.0/qom/object.c:903
#1  0x0000558a2c009e9b in virtio_notify_vector (vdev=0x558a2e7751d0,
    vector=<optimized out>) at /usr/src/debug/qemu-4.0/hw/virtio/virtio.c:1118
#2  0x0000558a2bfdcb1e in virtio_blk_discard_write_zeroes_complete (
    opaque=0x558a2f2fd420, ret=0)
    at /usr/src/debug/qemu-4.0/hw/block/virtio-blk.c:186
#3  0x0000558a2c261c7e in blk_aio_complete (acb=0x558a2eed7420)
    at /usr/src/debug/qemu-4.0/block/block-backend.c:1305
#4  0x0000558a2c3031db in coroutine_trampoline (i0=<optimized out>,
    i1=<optimized out>) at /usr/src/debug/qemu-4.0/util/coroutine-ucontext.c:116
#5  0x00007f45b2f8b080 in ?? () from /lib64/libc.so.6
#6  0x00007fff9ed75780 in ?? ()
#7  0x0000000000000000 in ?? ()

It seems like qemu was completing a discard/write_zeroes request, but the
parent BusState was already freed & set to NULL.

Do we need to drain all pending requests before unrealizing the virtio-blk
device? Like the following patch proposed?
https://lists.gnu.org/archive/html/qemu-devel/2017-06/msg02945.html

If more info is needed, please let me know.

Thanks,
Eryu
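For reference, the drain-before-unrealize idea raised above is, in spirit,
what commit 7bfde688fb1b ("virtio-blk: Add blk_drain() to
virtio_blk_device_unrealize()"), mentioned later in this thread, implements.
A minimal sketch assuming the QEMU 4.0 shape of the function; this is an
illustration, not the verbatim upstream patch:

static void virtio_blk_device_unrealize(DeviceState *dev, Error **errp)
{
    VirtIODevice *vdev = VIRTIO_DEVICE(dev);
    VirtIOBlock *s = VIRTIO_BLK(dev);

    /* Wait for all in-flight requests to complete while the device and
     * its parent bus are still valid, so no completion callback (e.g.
     * virtio_blk_discard_write_zeroes_complete) can run afterwards. */
    blk_drain(s->blk);

    virtio_blk_data_plane_destroy(s->dataplane);
    s->dataplane = NULL;
    qemu_del_vm_change_state_handler(s->change);
    blockdev_mark_auto_del(s->blk);
    virtio_cleanup(vdev);
}

With the drain in place, completions like the one in frame #2 of the
backtrace above finish before the bus pointer can be cleared.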
On Tue, 31 Dec 2019 18:34:34 +0800
Eryu Guan <address@hidden> wrote:

> Hi,
>
> I'm using qemu 4.0 and hit a segfault when tearing down a kata sandbox. I
> think it's because an I/O completion hits a use-after-free when the device
> is already gone. Is this a known bug that has been fixed?
>
> [gdb backtrace snipped; quoted in full above]
>
> Do we need to drain all pending requests before unrealizing the virtio-blk
> device? Like the following patch proposed?
> https://lists.gnu.org/archive/html/qemu-devel/2017-06/msg02945.html
>
> If more info is needed, please let me know.

maybe this will help: https://patchwork.kernel.org/patch/11213047/

> Thanks,
> Eryu

On Tue, Dec 31, 2019 at 11:51:35AM +0100, Igor Mammedov wrote:
> On Tue, 31 Dec 2019 18:34:34 +0800
> Eryu Guan <address@hidden> wrote:
> > [...]
> > If more info is needed, please let me know.
>
> maybe this will help: https://patchwork.kernel.org/patch/11213047/

Yeah, this looks promising! I'll try it out (though it's a one-time
crash for me). Thanks!

Eryu
On Thu, Jan 02, 2020 at 10:08:50AM +0800, Eryu Guan wrote:
> On Tue, Dec 31, 2019 at 11:51:35AM +0100, Igor Mammedov wrote:
> > [...]
> > maybe this will help: https://patchwork.kernel.org/patch/11213047/
>
> Yeah, this looks promising! I'll try it out (though it's a one-time
> crash for me). Thanks!

After applying this patch, I don't see the original segfault and
backtrace, but I see this crash

[Thread debugging using libthread_db enabled]
Using host libthread_db library "/lib64/libthread_db.so.1".
Core was generated by `/usr/local/libexec/qemu-kvm -name
sandbox-a2f34a11a7e1449496503bbc4050ae040c0d3'.
Program terminated with signal 11, Segmentation fault.
#0  0x0000561216a57609 in virtio_pci_notify_write (opaque=0x5612184747e0,
    addr=0, val=<optimized out>, size=<optimized out>)
    at /usr/src/debug/qemu-4.0/hw/virtio/virtio-pci.c:1324
1324        VirtIOPCIProxy *proxy = VIRTIO_PCI(DEVICE(vdev)->parent_bus->parent);
Missing separate debuginfos, use: debuginfo-install
glib2-2.42.2-5.1.alios7.x86_64 glibc-2.17-260.alios7.x86_64
libgcc-4.8.5-28.alios7.1.x86_64 libseccomp-2.3.1-3.alios7.x86_64
libstdc++-4.8.5-28.alios7.1.x86_64 numactl-libs-2.0.9-5.1.alios7.x86_64
pixman-0.32.6-3.1.alios7.x86_64 zlib-1.2.7-16.2.alios7.x86_64
(gdb) bt
#0  0x0000561216a57609 in virtio_pci_notify_write (opaque=0x5612184747e0,
    addr=0, val=<optimized out>, size=<optimized out>)
    at /usr/src/debug/qemu-4.0/hw/virtio/virtio-pci.c:1324
#1  0x0000561216835b22 in memory_region_write_accessor (mr=<optimized out>,
    addr=<optimized out>, value=<optimized out>, size=<optimized out>,
    shift=<optimized out>, mask=<optimized out>, attrs=...)
    at /usr/src/debug/qemu-4.0/memory.c:502
#2  0x0000561216833c5d in access_with_adjusted_size (addr=addr@entry=0,
    value=value@entry=0x7fcdeab1b8a8, size=size@entry=2,
    access_size_min=<optimized out>, access_size_max=<optimized out>,
    access_fn=0x561216835ac0 <memory_region_write_accessor>,
    mr=0x56121846d340, attrs=...) at /usr/src/debug/qemu-4.0/memory.c:568
#3  0x0000561216837c66 in memory_region_dispatch_write
    (mr=mr@entry=0x56121846d340, addr=0, data=<optimized out>, size=2,
    attrs=attrs@entry=...) at /usr/src/debug/qemu-4.0/memory.c:1503
#4  0x00005612167e036f in flatview_write_continue (fv=fv@entry=0x56121852edd0,
    addr=addr@entry=841813602304, attrs=...,
    buf=buf@entry=0x7fce7dd97028 <Address 0x7fce7dd97028 out of bounds>,
    len=len@entry=2, addr1=<optimized out>, l=<optimized out>,
    mr=0x56121846d340) at /usr/src/debug/qemu-4.0/exec.c:3279
#5  0x00005612167e0506 in flatview_write (fv=0x56121852edd0,
    addr=841813602304, attrs=...,
    buf=0x7fce7dd97028 <Address 0x7fce7dd97028 out of bounds>, len=2)
    at /usr/src/debug/qemu-4.0/exec.c:3318
#6  0x00005612167e4a1b in address_space_write (as=<optimized out>,
    addr=<optimized out>, attrs=..., buf=<optimized out>, len=<optimized out>)
    at /usr/src/debug/qemu-4.0/exec.c:3408
#7  0x00005612167e4aa5 in address_space_rw (as=<optimized out>,
    addr=<optimized out>, attrs=..., attrs@entry=...,
    buf=buf@entry=0x7fce7dd97028 <Address 0x7fce7dd97028 out of bounds>,
    len=<optimized out>, is_write=<optimized out>)
    at /usr/src/debug/qemu-4.0/exec.c:3419
#8  0x0000561216849da1 in kvm_cpu_exec (cpu=cpu@entry=0x56121849aa00)
    at /usr/src/debug/qemu-4.0/accel/kvm/kvm-all.c:2034
#9  0x000056121682255e in qemu_kvm_cpu_thread_fn (arg=arg@entry=0x56121849aa00)
    at /usr/src/debug/qemu-4.0/cpus.c:1281
#10 0x0000561216b794d6 in qemu_thread_start (args=<optimized out>)
    at /usr/src/debug/qemu-4.0/util/qemu-thread-posix.c:502
#11 0x00007fce7bef6e25 in start_thread () from /lib64/libpthread.so.0
#12 0x00007fce7bc1ef1d in clone () from /lib64/libc.so.6

And I searched and found
https://bugzilla.redhat.com/show_bug.cgi?id=1706759, which has the same
backtrace as above, and it seems commit 7bfde688fb1b ("virtio-blk: Add
blk_drain() to virtio_blk_device_unrealize()") is to fix this particular
bug.

But I can still hit the bug even after applying the commit. Do I miss
anything?

Thanks,
Eryu
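For context, the faulting statement is the first line of the notify
handler. A sketch of its approximate QEMU 4.0 shape, with comments
interpreting the failure mode (the comments are an interpretation, not
part of the source):

static void virtio_pci_notify_write(void *opaque, hwaddr addr,
                                    uint64_t val, unsigned size)
{
    VirtIODevice *vdev = opaque;
    /* hw/virtio/virtio-pci.c:1324: if the guest kicks the notify BAR
     * after device_del has unparented the device, parent_bus is NULL
     * and this dereference segfaults, matching frame #0 above. */
    VirtIOPCIProxy *proxy = VIRTIO_PCI(DEVICE(vdev)->parent_bus->parent);
    unsigned queue = addr / virtio_pci_queue_mem_mult(proxy);

    if (queue < VIRTIO_QUEUE_MAX) {
        virtio_queue_notify(vdev, queue);
    }
}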
On Tue, Jan 7, 2020 at 2:06 PM Eryu Guan <address@hidden> wrote:
> On Thu, Jan 02, 2020 at 10:08:50AM +0800, Eryu Guan wrote:
> > [...]
>
> After applying this patch, I don't see the original segfault and
> backtrace, but I see this crash
>
> [virtio_pci_notify_write backtrace snipped; quoted in full above]
>
> And I searched and found
> https://bugzilla.redhat.com/show_bug.cgi?id=1706759, which has the same
> backtrace as above, and it seems commit 7bfde688fb1b ("virtio-blk: Add
> blk_drain() to virtio_blk_device_unrealize()") is to fix this particular
> bug.
>
> But I can still hit the bug even after applying the commit. Do I miss
> anything?

Hi Eryu,
This backtrace seems to be caused by this bug (there were two bugs in
1706759): https://bugzilla.redhat.com/show_bug.cgi?id=1708480
Although the solution hasn't been tested on virtio-blk yet, you may
want to apply this patch:
https://lists.nongnu.org/archive/html/qemu-devel/2019-12/msg05197.html
Let me know if this works.

Best regards, Julia Suvorova.

On Tue, Jan 07, 2020 at 03:01:01PM +0100, Julia Suvorova wrote:
> [...]
> Although the solution hasn't been tested on virtio-blk yet, you may
> want to apply this patch:
> https://lists.nongnu.org/archive/html/qemu-devel/2019-12/msg05197.html
> Let me know if this works.

Will try it out, thanks a lot!
Eryu

On Tue, Jan 07, 2020 at 03:01:01PM +0100, Julia Suvorova wrote:
> On Tue, Jan 7, 2020 at 2:06 PM Eryu Guan <address@hidden> wrote:
> > [...]
> > But I can still hit the bug even after applying the commit. Do I miss
> > anything?
>
> Hi Eryu,
> This backtrace seems to be caused by this bug (there were two bugs in
> 1706759): https://bugzilla.redhat.com/show_bug.cgi?id=1708480
> Although the solution hasn't been tested on virtio-blk yet, you may
> want to apply this patch:
> https://lists.nongnu.org/archive/html/qemu-devel/2019-12/msg05197.html
> Let me know if this works.

Unfortunately, I still see the same segfault & backtrace after applying
commit 421afd2fe8dd ("virtio: reset region cache when on queue
deletion")

Anything I can help to debug?

Thanks,
Eryu
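A minimal sketch of what commit 421afd2fe8dd ("virtio: reset region cache
when on queue deletion"), named above, does; this assumes the change lands
in virtio_del_queue() and uses the virtio_virtqueue_reset_region_cache()
helper from hw/virtio/virtio.c, and is an illustration rather than the
verbatim upstream change:

void virtio_del_queue(VirtIODevice *vdev, int n)
{
    if (n < 0 || n >= VIRTIO_QUEUE_MAX) {
        abort();
    }

    vdev->vq[n].vring.num = 0;
    vdev->vq[n].vring.num_default = 0;
    vdev->vq[n].handle_output = NULL;
    vdev->vq[n].handle_aio_output = NULL;
    /* Drop the cached guest mapping of the vring so a stale
     * MemoryRegionCache cannot be dereferenced after the queue
     * is deleted. */
    virtio_virtqueue_reset_region_cache(&vdev->vq[n]);
}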
On Thu, Jan 09, 2020 at 12:58:06PM +0800, Eryu Guan wrote:
> On Tue, Jan 07, 2020 at 03:01:01PM +0100, Julia Suvorova wrote:
> > [...]
> > Let me know if this works.
>
> Unfortunately, I still see the same segfault & backtrace after applying
> commit 421afd2fe8dd ("virtio: reset region cache when on queue
> deletion")
>
> Anything I can help to debug?

Please post the QEMU command-line and the QMP commands used to remove the
device.

The backtrace shows a vcpu thread submitting a request. The device
seems to be partially destroyed. That's surprising because the monitor
and the vcpu thread should use the QEMU global mutex to avoid race
conditions. Maybe seeing the QMP commands will make it clearer...

Stefan

signature.asc
Description: PGP signature
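The serialization Stefan describes comes from MMIO dispatch: before a vcpu
thread calls into a device's write handler, the memory core takes the big
QEMU lock for regions that use global locking. A condensed sketch of
prepare_mmio_access() roughly as it appears around QEMU 4.0 (exec.c);
details are approximate:

static bool prepare_mmio_access(MemoryRegion *mr)
{
    bool release_lock = false;

    if (!qemu_mutex_iothread_locked() && mr->global_locking) {
        /* Take the BQL, which the monitor (device_del) also holds,
         * before calling into the device's write handler, e.g.
         * virtio_pci_notify_write() above. */
        qemu_mutex_lock_iothread();
        release_lock = true;
    }

    return release_lock;
}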
On Mon, Jan 13, 2020 at 04:38:55PM +0000, Stefan Hajnoczi wrote:
> On Thu, Jan 09, 2020 at 12:58:06PM +0800, Eryu Guan wrote:
> > [...]
> > Anything I can help to debug?
>
> Please post the QEMU command-line and the QMP commands used to remove the
> device.

It's a normal kata instance using virtio-fs as rootfs.
/usr/local/libexec/qemu-kvm -name sandbox-a670786fcb1758d2348eb120939d90ffacf9f049f10b337284ad49bbcd60936d \
    -uuid e03f6b6b-b80b-40c0-8d5b-0cbfed1305d2 \
    -machine q35,accel=kvm,kernel_irqchip,nvdimm,nosmm,nosmbus,nosata,nopit \
    -cpu host \
    -qmp unix:/run/vc/vm/a670786fcb1758d2348eb120939d90ffacf9f049f10b337284ad49bbcd60936d/qmp.sock,server,nowait \
    -qmp unix:/run/vc/vm/debug-a670786fcb1758d2348eb120939d90ffacf9f049f10b337284ad49bbcd60936d/qmp.sock,server,nowait \
    -m 2048M,slots=10,maxmem=773893M \
    -device pci-bridge,bus=pcie.0,id=pci-bridge-0,chassis_nr=1,shpc=on,addr=2,romfile= \
    -device virtio-serial-pci,disable-modern=false,id=serial0,romfile= \
    -device virtconsole,chardev=charconsole0,id=console0 \
    -chardev socket,id=charconsole0,path=/run/vc/vm/a670786fcb1758d2348eb120939d90ffacf9f049f10b337284ad49bbcd60936d/console.sock,server,nowait \
    -device virtserialport,chardev=metricagent,id=channel10,name=metric.agent.channel.10 \
    -chardev socket,id=metricagent,path=/run/vc/vm/a670786fcb1758d2348eb120939d90ffacf9f049f10b337284ad49bbcd60936d/metric.agent.channel.sock,server,nowait \
    -device nvdimm,id=nv0,memdev=mem0 \
    -object memory-backend-file,id=mem0,mem-path=/usr/local/share/containers-image-1.9.0.img,size=268435456 \
    -object rng-random,id=rng0,filename=/dev/urandom \
    -device virtio-rng,rng=rng0,romfile= \
    -device virtserialport,chardev=charch0,id=channel0,name=agent.channel.0 \
    -chardev socket,id=charch0,path=/run/vc/vm/a670786fcb1758d2348eb120939d90ffacf9f049f10b337284ad49bbcd60936d/kata.sock,server,nowait \
    -chardev socket,id=char-6fca044b801a78a1,path=/run/vc/vm/a670786fcb1758d2348eb120939d90ffacf9f049f10b337284ad49bbcd60936d/vhost-fs.sock \
    -device vhost-user-fs-pci,chardev=char-6fca044b801a78a1,tag=kataShared,cache-size=8192M \
    -netdev tap,id=network-0,vhost=on,vhostfds=3,fds=4 \
    -device driver=virtio-net-pci,netdev=network-0,mac=76:57:f1:ab:51:5c,disable-modern=false,mq=on,vectors=4,romfile= \
    -global kvm-pit.lost_tick_policy=discard \
    -vga none -no-user-config -nodefaults -nographic -daemonize \
    -object memory-backend-file,id=dimm1,size=2048M,mem-path=/dev/shm,share=on \
    -numa node,memdev=dimm1 \
    -kernel /usr/local/share/kernel \
    -append tsc=reliable no_timer_check rcupdate.rcu_expedited=1 i8042.direct=1 i8042.dumbkbd=1 i8042.nopnp=1 i8042.noaux=1 noreplace-smp reboot=k console=hvc0 console=hvc1 iommu=off cryptomgr.notests net.ifnames=0 pci=lastbus=0 root=/dev/pmem0p1 rootflags=dax,data=ordered,errors=remount-ro ro rootfstype=ext4 quiet systemd.show_status=false panic=1 nr_cpus=96 agent.use_vsock=false init=/usr/lib/systemd/systemd systemd.unit=kata-containers.target systemd.mask=systemd-networkd.service systemd.mask=systemd-networkd.socket \
    -pidfile /run/vc/vm/a670786fcb1758d2348eb120939d90ffacf9f049f10b337284ad49bbcd60936d/pid \
    -smp 1,cores=1,threads=1,sockets=96,maxcpus=96

QMP command to delete the device (the device id is just an example, not the
one that caused the crash):

"{\"arguments\":{\"id\":\"virtio-drive-5967abfb917c8da6\"},\"execute\":\"device_del\"}"

which has been hot-plugged by:

"{\"arguments\":{\"cache\":{\"direct\":true,\"no-flush\":false},\"driver\":\"raw\",\"file\":{\"driver\":\"file\",\"filename\":\"/dev/dm-18\"},\"node-name\":\"drive-5967abfb917c8da6\"},\"execute\":\"blockdev-add\"}"
"{\"return\": {}}"
"{\"arguments\":{\"addr\":\"01\",\"bus\":\"pci-bridge-0\",\"drive\":\"drive-5967abfb917c8da6\",\"driver\":\"virtio-blk-pci\",\"id\":\"virtio-drive-5967abfb917c8da6\",\"romfile\":\"\",\"share-rw\":\"on\"},\"execute\":\"device_add\"}"
"{\"return\": {}}"
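The same exchange with the JSON escaping removed, for readability (content
identical to the log lines above):

{"execute": "blockdev-add",
 "arguments": {"driver": "raw",
               "cache": {"direct": true, "no-flush": false},
               "file": {"driver": "file", "filename": "/dev/dm-18"},
               "node-name": "drive-5967abfb917c8da6"}}
{"execute": "device_add",
 "arguments": {"driver": "virtio-blk-pci",
               "id": "virtio-drive-5967abfb917c8da6",
               "drive": "drive-5967abfb917c8da6",
               "bus": "pci-bridge-0",
               "addr": "01",
               "romfile": "",
               "share-rw": "on"}}
{"execute": "device_del",
 "arguments": {"id": "virtio-drive-5967abfb917c8da6"}}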
+"{\"arguments\":{\"addr\":\"01\",\"bus\":\"pci-bridge-0\",\"drive\":\"drive-5967abfb917c8da6\",\"driver\":\"virtio-blk-pci\",\"id\":\"virtio-drive-5967abfb917c8da6\",\"romfile\":\"\",\"share-rw\":\"on\"},\"execute\":\"device_add\"}" +"{\"return\": {}}" + +> +> +The backtrace shows a vcpu thread submitting a request. The device +> +seems to be partially destroyed. That's surprising because the monitor +> +and the vcpu thread should use the QEMU global mutex to avoid race +> +conditions. Maybe seeing the QMP commands will make it clearer... +> +> +Stefan +Thanks! + +Eryu + +On Tue, Jan 14, 2020 at 10:50:58AM +0800, Eryu Guan wrote: +> +On Mon, Jan 13, 2020 at 04:38:55PM +0000, Stefan Hajnoczi wrote: +> +> On Thu, Jan 09, 2020 at 12:58:06PM +0800, Eryu Guan wrote: +> +> > On Tue, Jan 07, 2020 at 03:01:01PM +0100, Julia Suvorova wrote: +> +> > > On Tue, Jan 7, 2020 at 2:06 PM Eryu Guan <address@hidden> wrote: +> +> > > > +> +> > > > On Thu, Jan 02, 2020 at 10:08:50AM +0800, Eryu Guan wrote: +> +> > > > > On Tue, Dec 31, 2019 at 11:51:35AM +0100, Igor Mammedov wrote: +> +> > > > > > On Tue, 31 Dec 2019 18:34:34 +0800 +> +> > > > > > Eryu Guan <address@hidden> wrote: +> +> > > > > > +> +> > > > > > > Hi, +> +> > > > > > > +> +> > > > > > > I'm using qemu 4.0 and hit segfault when tearing down kata +> +> > > > > > > sandbox, I +> +> > > > > > > think it's because io completion hits use-after-free when +> +> > > > > > > device is +> +> > > > > > > already gone. Is this a known bug that has been fixed? (I went +> +> > > > > > > through +> +> > > > > > > the git log but didn't find anything obvious). +> +> > > > > > > +> +> > > > > > > gdb backtrace is: +> +> > > > > > > +> +> > > > > > > Core was generated by `/usr/local/libexec/qemu-kvm -name +> +> > > > > > > sandbox-5b8df8c6c6901c3c0a9b02879be10fe8d69d6'. +> +> > > > > > > Program terminated with signal 11, Segmentation fault. +> +> > > > > > > #0 object_get_class (obj=obj@entry=0x0) at +> +> > > > > > > /usr/src/debug/qemu-4.0/qom/object.c:903 +> +> > > > > > > 903 return obj->class; +> +> > > > > > > (gdb) bt +> +> > > > > > > #0 object_get_class (obj=obj@entry=0x0) at +> +> > > > > > > /usr/src/debug/qemu-4.0/qom/object.c:903 +> +> > > > > > > #1 0x0000558a2c009e9b in virtio_notify_vector +> +> > > > > > > (vdev=0x558a2e7751d0, +> +> > > > > > > vector=<optimized out>) at +> +> > > > > > > /usr/src/debug/qemu-4.0/hw/virtio/virtio.c:1118 +> +> > > > > > > #2 0x0000558a2bfdcb1e in +> +> > > > > > > virtio_blk_discard_write_zeroes_complete ( +> +> > > > > > > opaque=0x558a2f2fd420, ret=0) +> +> > > > > > > at /usr/src/debug/qemu-4.0/hw/block/virtio-blk.c:186 +> +> > > > > > > #3 0x0000558a2c261c7e in blk_aio_complete (acb=0x558a2eed7420) +> +> > > > > > > at /usr/src/debug/qemu-4.0/block/block-backend.c:1305 +> +> > > > > > > #4 0x0000558a2c3031db in coroutine_trampoline (i0=<optimized +> +> > > > > > > out>, +> +> > > > > > > i1=<optimized out>) at +> +> > > > > > > /usr/src/debug/qemu-4.0/util/coroutine-ucontext.c:116 +> +> > > > > > > #5 0x00007f45b2f8b080 in ?? () from /lib64/libc.so.6 +> +> > > > > > > #6 0x00007fff9ed75780 in ?? () +> +> > > > > > > #7 0x0000000000000000 in ?? () +> +> > > > > > > +> +> > > > > > > It seems like qemu was completing a discard/write_zero request, +> +> > > > > > > but +> +> > > > > > > parent BusState was already freed & set to NULL. +> +> > > > > > > +> +> > > > > > > Do we need to drain all pending request before unrealizing +> +> > > > > > > virtio-blk +> +> > > > > > > device? 
+On Tue, Jan 14, 2020 at 10:50:58AM +0800, Eryu Guan wrote:
+> On Mon, Jan 13, 2020 at 04:38:55PM +0000, Stefan Hajnoczi wrote:
+> > On Thu, Jan 09, 2020 at 12:58:06PM +0800, Eryu Guan wrote:
+> > > On Tue, Jan 07, 2020 at 03:01:01PM +0100, Julia Suvorova wrote:
+> > > > On Tue, Jan 7, 2020 at 2:06 PM Eryu Guan <address@hidden> wrote:
+> > > > > On Thu, Jan 02, 2020 at 10:08:50AM +0800, Eryu Guan wrote:
+> > > > > > On Tue, Dec 31, 2019 at 11:51:35AM +0100, Igor Mammedov wrote:
+> > > > > > > [original report and first backtrace elided; quoted in full
+> > > > > > > at the top of this thread]
+> > > > > > >
+> > > > > > > may be this will help:
+> > > > > > > https://patchwork.kernel.org/patch/11213047/
+> > > > > >
+> > > > > > Yeah, this looks promising! I'll try it out (though it's a
+> > > > > > one-time crash for me). Thanks!
+> > > > >
+> > > > > After applying this patch, I don't see the original segfault and
+> > > > > backtrace, but I see this crash
+> > > > >
+> > > > > Core was generated by `/usr/local/libexec/qemu-kvm -name
+> > > > > sandbox-a2f34a11a7e1449496503bbc4050ae040c0d3'.
+> > > > > Program terminated with signal 11, Segmentation fault.
+> > > > > #0  0x0000561216a57609 in virtio_pci_notify_write
+> > > > > (opaque=0x5612184747e0, addr=0, val=<optimized out>,
+> > > > > size=<optimized out>) at
+> > > > > /usr/src/debug/qemu-4.0/hw/virtio/virtio-pci.c:1324
+> > > > > 1324    VirtIOPCIProxy *proxy =
+> > > > > VIRTIO_PCI(DEVICE(vdev)->parent_bus->parent);
+> > > > > (gdb) bt
+> > > > > #0  0x0000561216a57609 in virtio_pci_notify_write
+> > > > > (opaque=0x5612184747e0, addr=0, val=<optimized out>,
+> > > > > size=<optimized out>) at
+> > > > > /usr/src/debug/qemu-4.0/hw/virtio/virtio-pci.c:1324
+> > > > > #1  0x0000561216835b22 in memory_region_write_accessor
+> > > > > (mr=<optimized out>, addr=<optimized out>, value=<optimized out>,
+> > > > > size=<optimized out>, shift=<optimized out>, mask=<optimized out>,
+> > > > > attrs=...) at /usr/src/debug/qemu-4.0/memory.c:502
+> > > > > #2  0x0000561216833c5d in access_with_adjusted_size
+> > > > > (addr=addr@entry=0, value=value@entry=0x7fcdeab1b8a8,
+> > > > > size=size@entry=2, access_size_min=<optimized out>,
+> > > > > access_size_max=<optimized out>, access_fn=0x561216835ac0
+> > > > > <memory_region_write_accessor>, mr=0x56121846d340, attrs=...)
+> > > > > at /usr/src/debug/qemu-4.0/memory.c:568
+> > > > > #3  0x0000561216837c66 in memory_region_dispatch_write
+> > > > > (mr=mr@entry=0x56121846d340, addr=0, data=<optimized out>, size=2,
+> > > > > attrs=attrs@entry=...) at /usr/src/debug/qemu-4.0/memory.c:1503
+> > > > > #4  0x00005612167e036f in flatview_write_continue
+> > > > > (fv=fv@entry=0x56121852edd0, addr=addr@entry=841813602304,
+> > > > > attrs=..., buf=buf@entry=0x7fce7dd97028 <Address 0x7fce7dd97028
+> > > > > out of bounds>, len=len@entry=2, addr1=<optimized out>,
+> > > > > l=<optimized out>, mr=0x56121846d340)
+> > > > > at /usr/src/debug/qemu-4.0/exec.c:3279
+> > > > > #5  0x00005612167e0506 in flatview_write (fv=0x56121852edd0,
+> > > > > addr=841813602304, attrs=..., buf=0x7fce7dd97028 <Address
+> > > > > 0x7fce7dd97028 out of bounds>, len=2) at
+> > > > > /usr/src/debug/qemu-4.0/exec.c:3318
+> > > > > #6  0x00005612167e4a1b in address_space_write (as=<optimized out>,
+> > > > > addr=<optimized out>, attrs=..., buf=<optimized out>,
+> > > > > len=<optimized out>) at /usr/src/debug/qemu-4.0/exec.c:3408
+> > > > > #7  0x00005612167e4aa5 in address_space_rw (as=<optimized out>,
+> > > > > addr=<optimized out>, attrs=..., attrs@entry=...,
+> > > > > buf=buf@entry=0x7fce7dd97028 <Address 0x7fce7dd97028 out of
+> > > > > bounds>, len=<optimized out>, is_write=<optimized out>) at
+> > > > > /usr/src/debug/qemu-4.0/exec.c:3419
+> > > > > #8  0x0000561216849da1 in kvm_cpu_exec (cpu=cpu@entry=0x56121849aa00)
+> > > > > at /usr/src/debug/qemu-4.0/accel/kvm/kvm-all.c:2034
+> > > > > #9  0x000056121682255e in qemu_kvm_cpu_thread_fn
+> > > > > (arg=arg@entry=0x56121849aa00) at /usr/src/debug/qemu-4.0/cpus.c:1281
+> > > > > #10 0x0000561216b794d6 in qemu_thread_start (args=<optimized out>)
+> > > > > at /usr/src/debug/qemu-4.0/util/qemu-thread-posix.c:502
+> > > > > #11 0x00007fce7bef6e25 in start_thread () from /lib64/libpthread.so.0
+> > > > > #12 0x00007fce7bc1ef1d in clone () from /lib64/libc.so.6
+> > > > >
+> > > > > And I searched and found
+> > > > > https://bugzilla.redhat.com/show_bug.cgi?id=1706759, which has the
+> > > > > same backtrace as above, and it seems commit 7bfde688fb1b
+> > > > > ("virtio-blk: Add blk_drain() to virtio_blk_device_unrealize()")
+> > > > > is the fix for this particular bug.
+> > > > >
+> > > > > But I can still hit the bug even after applying the commit. Did I
+> > > > > miss anything?
+> > > >
+> > > > Hi Eryu,
+> > > > This backtrace seems to be caused by this bug (there were two bugs
+> > > > in 1706759): https://bugzilla.redhat.com/show_bug.cgi?id=1708480
+> > > > Although the solution hasn't been tested on virtio-blk yet, you may
+> > > > want to apply this patch:
+> > > >
+> > > > https://lists.nongnu.org/archive/html/qemu-devel/2019-12/msg05197.html
+> > > > Let me know if this works.
+> > >
+> > > Unfortunately, I still see the same segfault & backtrace after
+> > > applying commit 421afd2fe8dd ("virtio: reset region cache when on
+> > > queue deletion")
+> > >
+> > > Anything I can help to debug?
+> >
+> > Please post the QEMU command-line and the QMP commands used to remove
+> > the device.
+>
+> It's a normal kata instance using virtio-fs as rootfs.
+> [full QEMU command line and the device_del / blockdev-add / device_add
+> QMP commands elided; identical to those posted earlier in this thread]
+
+Thanks. I wasn't able to reproduce this crash with qemu.git/master.
+
+One thing that is strange about the latest backtrace you posted: QEMU is
+dispatching the memory access instead of using the ioeventfd code path
+that virtio-blk-pci normally takes when a virtqueue is notified. I guess
+this means ioeventfd has already been disabled due to the hot unplug.
+
+Could you try with machine type "i440fx" instead of "q35"? I wonder if
+pci-bridge/shpc is part of the problem.
+
+Stefan
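+The crash line quoted above makes the ordering problem concrete:
+virtio_pci_notify_write() dereferences DEVICE(vdev)->parent_bus->parent,
+and parent_bus is NULL once the device has been unparented during hot
+unplug. The fixes discussed in this thread aim to close that window by
+ensuring no request can complete after teardown begins. A minimal sketch
+of the unrealize ordering that commit 7bfde688fb1b introduces
+(abbreviated, not the literal qemu-4.0 code; dataplane teardown and the
+other cleanup steps are omitted):
+
+    /* hw/block/virtio-blk.c (sketch only, modeled on the commit above) */
+    static void virtio_blk_device_unrealize(DeviceState *dev, Error **errp)
+    {
+        VirtIODevice *vdev = VIRTIO_DEVICE(dev);
+        VirtIOBlock *s = VIRTIO_BLK(dev);
+
+        /* Flush all in-flight I/O while vdev->parent_bus is still valid;
+         * completions may still notify the guest via the PCI proxy. */
+        blk_drain(s->blk);
+
+        /* Only then tear down the virtqueues and detach the device. */
+        virtio_del_queue(vdev, 0);
+        virtio_cleanup(vdev);
+    }
+
+Commit 421afd2fe8dd ("virtio: reset region cache when on queue deletion")
+attacks the same window from the other side: deleting a virtqueue also
+invalidates its cached guest-memory mapping, so a late notification
+cannot reach a half-torn-down queue through stale cached state.
+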
+On Tue, Jan 14, 2020 at 04:16:24PM +0000, Stefan Hajnoczi wrote:
+> [quote of the full thread so far elided; identical to the quotes,
+> backtraces and command line above]
+>
+> Thanks. I wasn't able to reproduce this crash with qemu.git/master.
+>
+> One thing that is strange about the latest backtrace you posted: QEMU is
+> dispatching the memory access instead of using the ioeventfd code path
+> that virtio-blk-pci normally takes when a virtqueue is notified. I guess
+> this means ioeventfd has already been disabled due to the hot unplug.
+>
+> Could you try with machine type "i440fx" instead of "q35"? I wonder if
+> pci-bridge/shpc is part of the problem.
+
+Sure, will try it. But it may take some time, as the test bed is busy
+with other testing tasks. I'll report back once I get the results.
+
+Thanks,
+Eryu