[BUG qemu 4.0] segfault when unplugging virtio-blk-pci device

Hi,

I'm using qemu 4.0 and hit a segfault when tearing down a kata sandbox. I
think it's because an I/O completion hits a use-after-free when the device
is already gone. Is this a known bug that has been fixed? (I went through
the git log but didn't find anything obvious.)

gdb backtrace is:

Core was generated by `/usr/local/libexec/qemu-kvm -name sandbox-5b8df8c6c6901c3c0a9b02879be10fe8d69d6'.
Program terminated with signal 11, Segmentation fault.
#0  object_get_class (obj=obj@entry=0x0) at /usr/src/debug/qemu-4.0/qom/object.c:903
903         return obj->class;
(gdb) bt
#0  object_get_class (obj=obj@entry=0x0) at /usr/src/debug/qemu-4.0/qom/object.c:903
#1  0x0000558a2c009e9b in virtio_notify_vector (vdev=0x558a2e7751d0,
    vector=<optimized out>) at /usr/src/debug/qemu-4.0/hw/virtio/virtio.c:1118
#2  0x0000558a2bfdcb1e in virtio_blk_discard_write_zeroes_complete (
    opaque=0x558a2f2fd420, ret=0)
    at /usr/src/debug/qemu-4.0/hw/block/virtio-blk.c:186
#3  0x0000558a2c261c7e in blk_aio_complete (acb=0x558a2eed7420)
    at /usr/src/debug/qemu-4.0/block/block-backend.c:1305
#4  0x0000558a2c3031db in coroutine_trampoline (i0=<optimized out>,
    i1=<optimized out>) at /usr/src/debug/qemu-4.0/util/coroutine-ucontext.c:116
#5  0x00007f45b2f8b080 in ?? () from /lib64/libc.so.6
#6  0x00007fff9ed75780 in ?? ()
#7  0x0000000000000000 in ?? ()

It seems like qemu was completing a discard/write_zeroes request, but the
parent BusState was already freed & set to NULL.
Do we need to drain all pending requests before unrealizing the virtio-blk
device? Like the following patch proposed?

https://lists.gnu.org/archive/html/qemu-devel/2017-06/msg02945.html

If more info is needed, please let me know.

Thanks,
Eryu

On Tue, 31 Dec 2019 18:34:34 +0800 Eryu Guan wrote:
> [original report and backtrace snipped]
>
> Do we need to drain all pending requests before unrealizing the
> virtio-blk device? Like the following patch proposed?
>
> https://lists.gnu.org/archive/html/qemu-devel/2017-06/msg02945.html
>
> If more info is needed, please let me know.

maybe this will help: https://patchwork.kernel.org/patch/11213047/

On Tue, Dec 31, 2019 at 11:51:35AM +0100, Igor Mammedov wrote:
> On Tue, 31 Dec 2019 18:34:34 +0800 Eryu Guan wrote:
> > [original report and backtrace snipped]
> >
> > Do we need to drain all pending requests before unrealizing the
> > virtio-blk device?
> > Like the following patch proposed?
> >
> > https://lists.gnu.org/archive/html/qemu-devel/2017-06/msg02945.html
> >
> > If more info is needed, please let me know.
>
> maybe this will help: https://patchwork.kernel.org/patch/11213047/

Yeah, this looks promising! I'll try it out (though it's a one-time crash
for me). Thanks!

Eryu

On Thu, Jan 02, 2020 at 10:08:50AM +0800, Eryu Guan wrote:
> On Tue, Dec 31, 2019 at 11:51:35AM +0100, Igor Mammedov wrote:
> > [quoted report and backtrace snipped]
> >
> > maybe this will help: https://patchwork.kernel.org/patch/11213047/
>
> Yeah, this looks promising! I'll try it out (though it's a one-time
> crash for me). Thanks!

After applying this patch, I don't see the original segfault and
backtrace, but I see this crash:

[Thread debugging using libthread_db enabled]
Using host libthread_db library "/lib64/libthread_db.so.1".
Core was generated by `/usr/local/libexec/qemu-kvm -name sandbox-a2f34a11a7e1449496503bbc4050ae040c0d3'.
Program terminated with signal 11, Segmentation fault.
#0  0x0000561216a57609 in virtio_pci_notify_write (opaque=0x5612184747e0,
    addr=0, val=<optimized out>, size=<optimized out>)
    at /usr/src/debug/qemu-4.0/hw/virtio/virtio-pci.c:1324
1324        VirtIOPCIProxy *proxy = VIRTIO_PCI(DEVICE(vdev)->parent_bus->parent);
Missing separate debuginfos, use: debuginfo-install
glib2-2.42.2-5.1.alios7.x86_64 glibc-2.17-260.alios7.x86_64
libgcc-4.8.5-28.alios7.1.x86_64 libseccomp-2.3.1-3.alios7.x86_64
libstdc++-4.8.5-28.alios7.1.x86_64 numactl-libs-2.0.9-5.1.alios7.x86_64
pixman-0.32.6-3.1.alios7.x86_64 zlib-1.2.7-16.2.alios7.x86_64
(gdb) bt
#0  0x0000561216a57609 in virtio_pci_notify_write (opaque=0x5612184747e0,
    addr=0, val=<optimized out>, size=<optimized out>)
    at /usr/src/debug/qemu-4.0/hw/virtio/virtio-pci.c:1324
#1  0x0000561216835b22 in memory_region_write_accessor (mr=<optimized out>,
    addr=<optimized out>, value=<optimized out>, size=<optimized out>,
    shift=<optimized out>, mask=<optimized out>, attrs=...)
    at /usr/src/debug/qemu-4.0/memory.c:502
#2  0x0000561216833c5d in access_with_adjusted_size (addr=addr@entry=0,
    value=value@entry=0x7fcdeab1b8a8, size=size@entry=2,
    access_size_min=<optimized out>, access_size_max=<optimized out>,
    access_fn=0x561216835ac0 <memory_region_write_accessor>,
    mr=0x56121846d340, attrs=...)
    at /usr/src/debug/qemu-4.0/memory.c:568
#3  0x0000561216837c66 in memory_region_dispatch_write
    (mr=mr@entry=0x56121846d340, addr=0, data=<optimized out>, size=2,
    attrs=attrs@entry=...) at /usr/src/debug/qemu-4.0/memory.c:1503
#4  0x00005612167e036f in flatview_write_continue (fv=fv@entry=0x56121852edd0,
    addr=addr@entry=841813602304, attrs=...,
    buf=buf@entry=0x7fce7dd97028 <Address 0x7fce7dd97028 out of bounds>,
    len=len@entry=2, addr1=<optimized out>, l=<optimized out>,
    mr=0x56121846d340) at /usr/src/debug/qemu-4.0/exec.c:3279
#5  0x00005612167e0506 in flatview_write (fv=0x56121852edd0,
    addr=841813602304, attrs=...,
    buf=0x7fce7dd97028 <Address 0x7fce7dd97028 out of bounds>, len=2)
    at /usr/src/debug/qemu-4.0/exec.c:3318
#6  0x00005612167e4a1b in address_space_write (as=<optimized out>,
    addr=<optimized out>, attrs=..., buf=<optimized out>,
    len=<optimized out>) at /usr/src/debug/qemu-4.0/exec.c:3408
#7  0x00005612167e4aa5 in address_space_rw (as=<optimized out>,
    addr=<optimized out>, attrs=..., attrs@entry=...,
    buf=buf@entry=0x7fce7dd97028 <Address 0x7fce7dd97028 out of bounds>,
    len=<optimized out>, is_write=<optimized out>)
    at /usr/src/debug/qemu-4.0/exec.c:3419
#8  0x0000561216849da1 in kvm_cpu_exec (cpu=cpu@entry=0x56121849aa00)
    at /usr/src/debug/qemu-4.0/accel/kvm/kvm-all.c:2034
#9  0x000056121682255e in qemu_kvm_cpu_thread_fn (arg=arg@entry=0x56121849aa00)
    at /usr/src/debug/qemu-4.0/cpus.c:1281
#10 0x0000561216b794d6 in qemu_thread_start (args=<optimized out>)
    at /usr/src/debug/qemu-4.0/util/qemu-thread-posix.c:502
#11 0x00007fce7bef6e25 in start_thread () from /lib64/libpthread.so.0
#12 0x00007fce7bc1ef1d in clone () from /lib64/libc.so.6

And I searched and found
https://bugzilla.redhat.com/show_bug.cgi?id=1706759 , which has the same
backtrace as above, and it seems commit 7bfde688fb1b ("virtio-blk: Add
blk_drain() to virtio_blk_device_unrealize()") is meant to fix this
particular bug.

But I can still hit the bug even after applying that commit. Am I missing
anything?

Thanks,
Eryu

On Tue, Jan 7, 2020 at 2:06 PM Eryu Guan wrote:
> [quoted text and backtrace snipped]

Hi Eryu,

This backtrace seems to be caused by this bug (there were two bugs in
1706759): https://bugzilla.redhat.com/show_bug.cgi?id=1708480

Although the solution hasn't been tested on virtio-blk yet, you may want
to apply this patch:
https://lists.nongnu.org/archive/html/qemu-devel/2019-12/msg05197.html

Let me know if this works.

Best regards, Julia Suvorova.

On Tue, Jan 07, 2020 at 03:01:01PM +0100, Julia Suvorova wrote:
> [quoted text and backtrace snipped]
>
> Although the solution hasn't been tested on virtio-blk yet, you may
> want to apply this patch:
> https://lists.nongnu.org/archive/html/qemu-devel/2019-12/msg05197.html
>
> Let me know if this works.

Will try it out, thanks a lot!

Eryu

On Tue, Jan 07, 2020 at 03:01:01PM +0100, Julia Suvorova wrote:
> [quoted text and backtrace snipped]
>
> Although the solution hasn't been tested on virtio-blk yet, you may
> want to apply this patch:
> https://lists.nongnu.org/archive/html/qemu-devel/2019-12/msg05197.html
>
> Let me know if this works.

Unfortunately, I still see the same segfault & backtrace after applying
commit 421afd2fe8dd ("virtio: reset region cache when on queue deletion").

Anything I can help to debug?
Thanks,
Eryu

On Thu, Jan 09, 2020 at 12:58:06PM +0800, Eryu Guan wrote:
> [quoted text and backtrace snipped]
, > > > len=len@entry=2, addr1=, l=, > > > mr=0x56121846d340) > > > at /usr/src/debug/qemu-4.0/exec.c:3279 > > > #5 0x00005612167e0506 in flatview_write (fv=0x56121852edd0, > > > addr=841813602304, attrs=..., buf=0x7fce7dd97028
> > out of bounds>, len=2) at /usr/src/debug/qemu-4.0/exec.c:3318 > > > #6 0x00005612167e4a1b in address_space_write (as=, > > > addr=, attrs=..., buf=, len= > > out>) at /usr/src/debug/qemu-4.0/exec.c:3408 > > > #7 0x00005612167e4aa5 in address_space_rw (as=, > > > addr=, attrs=..., attrs@entry=..., > > > buf=buf@entry=0x7fce7dd97028
, > > > len=, is_write=) at > > > /usr/src/debug/qemu-4.0/exec.c:3419 > > > #8 0x0000561216849da1 in kvm_cpu_exec (cpu=cpu@entry=0x56121849aa00) at > > > /usr/src/debug/qemu-4.0/accel/kvm/kvm-all.c:2034 > > > #9 0x000056121682255e in qemu_kvm_cpu_thread_fn > > > (arg=arg@entry=0x56121849aa00) at /usr/src/debug/qemu-4.0/cpus.c:1281 > > > #10 0x0000561216b794d6 in qemu_thread_start (args=) at > > > /usr/src/debug/qemu-4.0/util/qemu-thread-posix.c:502 > > > #11 0x00007fce7bef6e25 in start_thread () from /lib64/libpthread.so.0 > > > #12 0x00007fce7bc1ef1d in clone () from /lib64/libc.so.6 > > > > > > And I searched and found > > > https://bugzilla.redhat.com/show_bug.cgi?id=1706759 , which has the same > > > backtrace as above, and it seems commit 7bfde688fb1b ("virtio-blk: Add > > > blk_drain() to virtio_blk_device_unrealize()") is to fix this particular > > > bug. > > > > > > But I can still hit the bug even after applying the commit. Do I miss > > > anything? > > > > Hi Eryu, > > This backtrace seems to be caused by this bug (there were two bugs in > > 1706759): https://bugzilla.redhat.com/show_bug.cgi?id=1708480 > > Although the solution hasn't been tested on virtio-blk yet, you may > > want to apply this patch: > > https://lists.nongnu.org/archive/html/qemu-devel/2019-12/msg05197.html > > Let me know if this works. > > Unfortunately, I still see the same segfault & backtrace after applying > commit 421afd2fe8dd ("virtio: reset region cache when on queue > deletion") > > Anything I can help to debug? Please post the QEMU command-line and the QMP commands used to remove the device. The backtrace shows a vcpu thread submitting a request. The device seems to be partially destroyed. That's surprising because the monitor and the vcpu thread should use the QEMU global mutex to avoid race conditions. Maybe seeing the QMP commands will make it clearer... 
Stefan On Mon, Jan 13, 2020 at 04:38:55PM +0000, Stefan Hajnoczi wrote: > On Thu, Jan 09, 2020 at 12:58:06PM +0800, Eryu Guan wrote: > > On Tue, Jan 07, 2020 at 03:01:01PM +0100, Julia Suvorova wrote: > > > On Tue, Jan 7, 2020 at 2:06 PM Eryu Guan wrote: > > > > > > > > On Thu, Jan 02, 2020 at 10:08:50AM +0800, Eryu Guan wrote: > > > > > On Tue, Dec 31, 2019 at 11:51:35AM +0100, Igor Mammedov wrote: > > > > > > On Tue, 31 Dec 2019 18:34:34 +0800 > > > > > > Eryu Guan wrote: > > > > > > > > > > > > > Hi, > > > > > > > > > > > > > > I'm using qemu 4.0 and hit segfault when tearing down kata > > > > > > > sandbox, I > > > > > > > think it's because io completion hits use-after-free when device > > > > > > > is > > > > > > > already gone. Is this a known bug that has been fixed? (I went > > > > > > > through > > > > > > > the git log but didn't find anything obvious). > > > > > > > > > > > > > > gdb backtrace is: > > > > > > > > > > > > > > Core was generated by `/usr/local/libexec/qemu-kvm -name > > > > > > > sandbox-5b8df8c6c6901c3c0a9b02879be10fe8d69d6'. > > > > > > > Program terminated with signal 11, Segmentation fault. 
> > > > > > > #0 object_get_class (obj=obj@entry=0x0) at > > > > > > > /usr/src/debug/qemu-4.0/qom/object.c:903 > > > > > > > 903 return obj->class; > > > > > > > (gdb) bt > > > > > > > #0 object_get_class (obj=obj@entry=0x0) at > > > > > > > /usr/src/debug/qemu-4.0/qom/object.c:903 > > > > > > > #1 0x0000558a2c009e9b in virtio_notify_vector > > > > > > > (vdev=0x558a2e7751d0, > > > > > > > vector=) at > > > > > > > /usr/src/debug/qemu-4.0/hw/virtio/virtio.c:1118 > > > > > > > #2 0x0000558a2bfdcb1e in > > > > > > > virtio_blk_discard_write_zeroes_complete ( > > > > > > > opaque=0x558a2f2fd420, ret=0) > > > > > > > at /usr/src/debug/qemu-4.0/hw/block/virtio-blk.c:186 > > > > > > > #3 0x0000558a2c261c7e in blk_aio_complete (acb=0x558a2eed7420) > > > > > > > at /usr/src/debug/qemu-4.0/block/block-backend.c:1305 > > > > > > > #4 0x0000558a2c3031db in coroutine_trampoline (i0= > > > > > > out>, > > > > > > > i1=) at > > > > > > > /usr/src/debug/qemu-4.0/util/coroutine-ucontext.c:116 > > > > > > > #5 0x00007f45b2f8b080 in ?? () from /lib64/libc.so.6 > > > > > > > #6 0x00007fff9ed75780 in ?? () > > > > > > > #7 0x0000000000000000 in ?? () > > > > > > > > > > > > > > It seems like qemu was completing a discard/write_zero request, > > > > > > > but > > > > > > > parent BusState was already freed & set to NULL. > > > > > > > > > > > > > > Do we need to drain all pending request before unrealizing > > > > > > > virtio-blk > > > > > > > device? Like the following patch proposed? > > > > > > > > > > > > > > https://lists.gnu.org/archive/html/qemu-devel/2017-06/msg02945.html > > > > > > > > > > > > > > If more info is needed, please let me know. > > > > > > > > > > > > may be this will help: https://patchwork.kernel.org/patch/11213047/ > > > > > > > > > > Yeah, this looks promising! I'll try it out (though it's a one-time > > > > > crash for me). Thanks! 
> > > > > > > > After applying this patch, I don't see the original segfaut and > > > > backtrace, but I see this crash > > > > > > > > [Thread debugging using libthread_db enabled] > > > > Using host libthread_db library "/lib64/libthread_db.so.1". > > > > Core was generated by `/usr/local/libexec/qemu-kvm -name > > > > sandbox-a2f34a11a7e1449496503bbc4050ae040c0d3'. > > > > Program terminated with signal 11, Segmentation fault. > > > > #0 0x0000561216a57609 in virtio_pci_notify_write > > > > (opaque=0x5612184747e0, addr=0, val=, size= > > > out>) at /usr/src/debug/qemu-4.0/hw/virtio/virtio-pci.c:1324 > > > > 1324 VirtIOPCIProxy *proxy = > > > > VIRTIO_PCI(DEVICE(vdev)->parent_bus->parent); > > > > Missing separate debuginfos, use: debuginfo-install > > > > glib2-2.42.2-5.1.alios7.x86_64 glibc-2.17-260.alios7.x86_64 > > > > libgcc-4.8.5-28.alios7.1.x86_64 libseccomp-2.3.1-3.alios7.x86_64 > > > > libstdc++-4.8.5-28.alios7.1.x86_64 numactl-libs-2.0.9-5.1.alios7.x86_64 > > > > pixman-0.32.6-3.1.alios7.x86_64 zlib-1.2.7-16.2.alios7.x86_64 > > > > (gdb) bt > > > > #0 0x0000561216a57609 in virtio_pci_notify_write > > > > (opaque=0x5612184747e0, addr=0, val=, size= > > > out>) at /usr/src/debug/qemu-4.0/hw/virtio/virtio-pci.c:1324 > > > > #1 0x0000561216835b22 in memory_region_write_accessor (mr= > > > out>, addr=, value=, size= > > > out>, shift=, mask=, attrs=...) at > > > > /usr/src/debug/qemu-4.0/memory.c:502 > > > > #2 0x0000561216833c5d in access_with_adjusted_size (addr=addr@entry=0, > > > > value=value@entry=0x7fcdeab1b8a8, size=size@entry=2, > > > > access_size_min=, access_size_max=, > > > > access_fn=0x561216835ac0 , > > > > mr=0x56121846d340, attrs=...) > > > > at /usr/src/debug/qemu-4.0/memory.c:568 > > > > #3 0x0000561216837c66 in memory_region_dispatch_write > > > > (mr=mr@entry=0x56121846d340, addr=0, data=, size=2, > > > > attrs=attrs@entry=...) 
at /usr/src/debug/qemu-4.0/memory.c:1503 > > > > #4 0x00005612167e036f in flatview_write_continue > > > > (fv=fv@entry=0x56121852edd0, addr=addr@entry=841813602304, attrs=..., > > > > buf=buf@entry=0x7fce7dd97028
, > > > > len=len@entry=2, addr1=, l=, > > > > mr=0x56121846d340) > > > > at /usr/src/debug/qemu-4.0/exec.c:3279 > > > > #5 0x00005612167e0506 in flatview_write (fv=0x56121852edd0, > > > > addr=841813602304, attrs=..., buf=0x7fce7dd97028
> > > 0x7fce7dd97028 out of bounds>, len=2) at > > > > /usr/src/debug/qemu-4.0/exec.c:3318 > > > > #6 0x00005612167e4a1b in address_space_write (as=, > > > > addr=, attrs=..., buf=, len= > > > out>) at /usr/src/debug/qemu-4.0/exec.c:3408 > > > > #7 0x00005612167e4aa5 in address_space_rw (as=, > > > > addr=, attrs=..., attrs@entry=..., > > > > buf=buf@entry=0x7fce7dd97028
, > > > > len=, is_write=) at > > > > /usr/src/debug/qemu-4.0/exec.c:3419 > > > > #8 0x0000561216849da1 in kvm_cpu_exec (cpu=cpu@entry=0x56121849aa00) > > > > at /usr/src/debug/qemu-4.0/accel/kvm/kvm-all.c:2034 > > > > #9 0x000056121682255e in qemu_kvm_cpu_thread_fn > > > > (arg=arg@entry=0x56121849aa00) at /usr/src/debug/qemu-4.0/cpus.c:1281 > > > > #10 0x0000561216b794d6 in qemu_thread_start (args=) at > > > > /usr/src/debug/qemu-4.0/util/qemu-thread-posix.c:502 > > > > #11 0x00007fce7bef6e25 in start_thread () from /lib64/libpthread.so.0 > > > > #12 0x00007fce7bc1ef1d in clone () from /lib64/libc.so.6 > > > > > > > > And I searched and found > > > > https://bugzilla.redhat.com/show_bug.cgi?id=1706759 , which has the same > > > > backtrace as above, and it seems commit 7bfde688fb1b ("virtio-blk: Add > > > > blk_drain() to virtio_blk_device_unrealize()") is to fix this particular > > > > bug. > > > > > > > > But I can still hit the bug even after applying the commit. Do I miss > > > > anything? > > > > > > Hi Eryu, > > > This backtrace seems to be caused by this bug (there were two bugs in > > > 1706759): https://bugzilla.redhat.com/show_bug.cgi?id=1708480 > > > Although the solution hasn't been tested on virtio-blk yet, you may > > > want to apply this patch: > > > https://lists.nongnu.org/archive/html/qemu-devel/2019-12/msg05197.html > > > Let me know if this works. > > > > Unfortunately, I still see the same segfault & backtrace after applying > > commit 421afd2fe8dd ("virtio: reset region cache when on queue > > deletion") > > > > Anything I can help to debug? > > Please post the QEMU command-line and the QMP commands use to remove the > device. It's a normal kata instance using virtio-fs as rootfs. 
/usr/local/libexec/qemu-kvm -name sandbox-a670786fcb1758d2348eb120939d90ffacf9f049f10b337284ad49bbcd60936d \ -uuid e03f6b6b-b80b-40c0-8d5b-0cbfed1305d2 -machine q35,accel=kvm,kernel_irqchip,nvdimm,nosmm,nosmbus,nosata,nopit \ -cpu host -qmp unix:/run/vc/vm/a670786fcb1758d2348eb120939d90ffacf9f049f10b337284ad49bbcd60936d/qmp.sock,server,nowait \ -qmp unix:/run/vc/vm/debug-a670786fcb1758d2348eb120939d90ffacf9f049f10b337284ad49bbcd60936d/qmp.sock,server,nowait \ -m 2048M,slots=10,maxmem=773893M -device pci-bridge,bus=pcie.0,id=pci-bridge-0,chassis_nr=1,shpc=on,addr=2,romfile= \ -device virtio-serial-pci,disable-modern=false,id=serial0,romfile= -device virtconsole,chardev=charconsole0,id=console0 \ -chardev socket,id=charconsole0,path=/run/vc/vm/a670786fcb1758d2348eb120939d90ffacf9f049f10b337284ad49bbcd60936d/console.sock,server,nowait \ -device virtserialport,chardev=metricagent,id=channel10,name=metric.agent.channel.10 \ -chardev socket,id=metricagent,path=/run/vc/vm/a670786fcb1758d2348eb120939d90ffacf9f049f10b337284ad49bbcd60936d/metric.agent.channel.sock,server,nowait \ -device nvdimm,id=nv0,memdev=mem0 -object memory-backend-file,id=mem0,mem-path=/usr/local/share/containers-image-1.9.0.img,size=268435456 \ -object rng-random,id=rng0,filename=/dev/urandom -device virtio-rng,rng=rng0,romfile= \ -device virtserialport,chardev=charch0,id=channel0,name=agent.channel.0 \ -chardev socket,id=charch0,path=/run/vc/vm/a670786fcb1758d2348eb120939d90ffacf9f049f10b337284ad49bbcd60936d/kata.sock,server,nowait \ -chardev socket,id=char-6fca044b801a78a1,path=/run/vc/vm/a670786fcb1758d2348eb120939d90ffacf9f049f10b337284ad49bbcd60936d/vhost-fs.sock \ -device vhost-user-fs-pci,chardev=char-6fca044b801a78a1,tag=kataShared,cache-size=8192M -netdev tap,id=network-0,vhost=on,vhostfds=3,fds=4 \ -device driver=virtio-net-pci,netdev=network-0,mac=76:57:f1:ab:51:5c,disable-modern=false,mq=on,vectors=4,romfile= \ -global kvm-pit.lost_tick_policy=discard -vga none -no-user-config -nodefaults 
-nographic -daemonize \ -object memory-backend-file,id=dimm1,size=2048M,mem-path=/dev/shm,share=on -numa node,memdev=dimm1 -kernel /usr/local/share/kernel \ -append tsc=reliable no_timer_check rcupdate.rcu_expedited=1 i8042.direct=1 i8042.dumbkbd=1 i8042.nopnp=1 i8042.noaux=1 noreplace-smp reboot=k console=hvc0 console=hvc1 iommu=off cryptomgr.notests net.ifnames=0 pci=lastbus=0 root=/dev/pmem0p1 rootflags=dax,data=ordered,errors=remount-ro ro rootfstype=ext4 quiet systemd.show_status=false panic=1 nr_cpus=96 agent.use_vsock=false init=/usr/lib/systemd/systemd systemd.unit=kata-containers.target systemd.mask=systemd-networkd.service systemd.mask=systemd-networkd.socket \ -pidfile /run/vc/vm/a670786fcb1758d2348eb120939d90ffacf9f049f10b337284ad49bbcd60936d/pid \ -smp 1,cores=1,threads=1,sockets=96,maxcpus=96 QMP command to delete device (the device id is just an example, not the one caused the crash): "{\"arguments\":{\"id\":\"virtio-drive-5967abfb917c8da6\"},\"execute\":\"device_del\"}" which has been hot plugged by: "{\"arguments\":{\"cache\":{\"direct\":true,\"no-flush\":false},\"driver\":\"raw\",\"file\":{\"driver\":\"file\",\"filename\":\"/dev/dm-18\"},\"node-name\":\"drive-5967abfb917c8da6\"},\"execute\":\"blockdev-add\"}" "{\"return\": {}}" "{\"arguments\":{\"addr\":\"01\",\"bus\":\"pci-bridge-0\",\"drive\":\"drive-5967abfb917c8da6\",\"driver\":\"virtio-blk-pci\",\"id\":\"virtio-drive-5967abfb917c8da6\",\"romfile\":\"\",\"share-rw\":\"on\"},\"execute\":\"device_add\"}" "{\"return\": {}}" > > The backtrace shows a vcpu thread submitting a request. The device > seems to be partially destroyed. That's surprising because the monitor > and the vcpu thread should use the QEMU global mutex to avoid race > conditions. Maybe seeing the QMP commands will make it clearer... > > Stefan Thanks! 
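Unescaped, the QMP strings above are the following commands (the same content, reformatted for readability; IDs are the example ones from the log):

```json
{"execute": "device_del",
 "arguments": {"id": "virtio-drive-5967abfb917c8da6"}}

{"execute": "blockdev-add",
 "arguments": {"node-name": "drive-5967abfb917c8da6",
               "driver": "raw",
               "cache": {"direct": true, "no-flush": false},
               "file": {"driver": "file", "filename": "/dev/dm-18"}}}

{"execute": "device_add",
 "arguments": {"driver": "virtio-blk-pci",
               "id": "virtio-drive-5967abfb917c8da6",
               "drive": "drive-5967abfb917c8da6",
               "bus": "pci-bridge-0",
               "addr": "01",
               "romfile": "",
               "share-rw": "on"}}
```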
Eryu On Tue, Jan 14, 2020 at 10:50:58AM +0800, Eryu Guan wrote: > On Mon, Jan 13, 2020 at 04:38:55PM +0000, Stefan Hajnoczi wrote: > > On Thu, Jan 09, 2020 at 12:58:06PM +0800, Eryu Guan wrote: > > > On Tue, Jan 07, 2020 at 03:01:01PM +0100, Julia Suvorova wrote: > > > > On Tue, Jan 7, 2020 at 2:06 PM Eryu Guan wrote: > > > > > > > > > > On Thu, Jan 02, 2020 at 10:08:50AM +0800, Eryu Guan wrote: > > > > > > On Tue, Dec 31, 2019 at 11:51:35AM +0100, Igor Mammedov wrote: > > > > > > > On Tue, 31 Dec 2019 18:34:34 +0800 > > > > > > > Eryu Guan wrote: > > > > > > > > > > > > > > > Hi, > > > > > > > > > > > > > > > > I'm using qemu 4.0 and hit segfault when tearing down kata > > > > > > > > sandbox, I > > > > > > > > think it's because io completion hits use-after-free when > > > > > > > > device is > > > > > > > > already gone. Is this a known bug that has been fixed? (I went > > > > > > > > through > > > > > > > > the git log but didn't find anything obvious). > > > > > > > > > > > > > > > > gdb backtrace is: > > > > > > > > > > > > > > > > Core was generated by `/usr/local/libexec/qemu-kvm -name > > > > > > > > sandbox-5b8df8c6c6901c3c0a9b02879be10fe8d69d6'. > > > > > > > > Program terminated with signal 11, Segmentation fault. 
> > > > > > > > #0 object_get_class (obj=obj@entry=0x0) at > > > > > > > > /usr/src/debug/qemu-4.0/qom/object.c:903 > > > > > > > > 903 return obj->class; > > > > > > > > (gdb) bt > > > > > > > > #0 object_get_class (obj=obj@entry=0x0) at > > > > > > > > /usr/src/debug/qemu-4.0/qom/object.c:903 > > > > > > > > #1 0x0000558a2c009e9b in virtio_notify_vector > > > > > > > > (vdev=0x558a2e7751d0, > > > > > > > > vector=) at > > > > > > > > /usr/src/debug/qemu-4.0/hw/virtio/virtio.c:1118 > > > > > > > > #2 0x0000558a2bfdcb1e in > > > > > > > > virtio_blk_discard_write_zeroes_complete ( > > > > > > > > opaque=0x558a2f2fd420, ret=0) > > > > > > > > at /usr/src/debug/qemu-4.0/hw/block/virtio-blk.c:186 > > > > > > > > #3 0x0000558a2c261c7e in blk_aio_complete (acb=0x558a2eed7420) > > > > > > > > at /usr/src/debug/qemu-4.0/block/block-backend.c:1305 > > > > > > > > #4 0x0000558a2c3031db in coroutine_trampoline (i0= > > > > > > > out>, > > > > > > > > i1=) at > > > > > > > > /usr/src/debug/qemu-4.0/util/coroutine-ucontext.c:116 > > > > > > > > #5 0x00007f45b2f8b080 in ?? () from /lib64/libc.so.6 > > > > > > > > #6 0x00007fff9ed75780 in ?? () > > > > > > > > #7 0x0000000000000000 in ?? () > > > > > > > > > > > > > > > > It seems like qemu was completing a discard/write_zero request, > > > > > > > > but > > > > > > > > parent BusState was already freed & set to NULL. > > > > > > > > > > > > > > > > Do we need to drain all pending request before unrealizing > > > > > > > > virtio-blk > > > > > > > > device? Like the following patch proposed? > > > > > > > > > > > > > > > > https://lists.gnu.org/archive/html/qemu-devel/2017-06/msg02945.html > > > > > > > > > > > > > > > > If more info is needed, please let me know. > > > > > > > > > > > > > > may be this will help: > > > > > > > https://patchwork.kernel.org/patch/11213047/ > > > > > > > > > > > > Yeah, this looks promising! I'll try it out (though it's a one-time > > > > > > crash for me). Thanks! 
> > > > > > > > > > After applying this patch, I don't see the original segfaut and > > > > > backtrace, but I see this crash > > > > > > > > > > [Thread debugging using libthread_db enabled] > > > > > Using host libthread_db library "/lib64/libthread_db.so.1". > > > > > Core was generated by `/usr/local/libexec/qemu-kvm -name > > > > > sandbox-a2f34a11a7e1449496503bbc4050ae040c0d3'. > > > > > Program terminated with signal 11, Segmentation fault. > > > > > #0 0x0000561216a57609 in virtio_pci_notify_write > > > > > (opaque=0x5612184747e0, addr=0, val=, size= > > > > out>) at /usr/src/debug/qemu-4.0/hw/virtio/virtio-pci.c:1324 > > > > > 1324 VirtIOPCIProxy *proxy = > > > > > VIRTIO_PCI(DEVICE(vdev)->parent_bus->parent); > > > > > Missing separate debuginfos, use: debuginfo-install > > > > > glib2-2.42.2-5.1.alios7.x86_64 glibc-2.17-260.alios7.x86_64 > > > > > libgcc-4.8.5-28.alios7.1.x86_64 libseccomp-2.3.1-3.alios7.x86_64 > > > > > libstdc++-4.8.5-28.alios7.1.x86_64 > > > > > numactl-libs-2.0.9-5.1.alios7.x86_64 pixman-0.32.6-3.1.alios7.x86_64 > > > > > zlib-1.2.7-16.2.alios7.x86_64 > > > > > (gdb) bt > > > > > #0 0x0000561216a57609 in virtio_pci_notify_write > > > > > (opaque=0x5612184747e0, addr=0, val=, size= > > > > out>) at /usr/src/debug/qemu-4.0/hw/virtio/virtio-pci.c:1324 > > > > > #1 0x0000561216835b22 in memory_region_write_accessor (mr= > > > > out>, addr=, value=, size= > > > > out>, shift=, mask=, attrs=...) at > > > > > /usr/src/debug/qemu-4.0/memory.c:502 > > > > > #2 0x0000561216833c5d in access_with_adjusted_size > > > > > (addr=addr@entry=0, value=value@entry=0x7fcdeab1b8a8, > > > > > size=size@entry=2, access_size_min=, > > > > > access_size_max=, access_fn=0x561216835ac0 > > > > > , mr=0x56121846d340, attrs=...) > > > > > at /usr/src/debug/qemu-4.0/memory.c:568 > > > > > #3 0x0000561216837c66 in memory_region_dispatch_write > > > > > (mr=mr@entry=0x56121846d340, addr=0, data=, size=2, > > > > > attrs=attrs@entry=...) 
at /usr/src/debug/qemu-4.0/memory.c:1503 > > > > > #4 0x00005612167e036f in flatview_write_continue > > > > > (fv=fv@entry=0x56121852edd0, addr=addr@entry=841813602304, attrs=..., > > > > > buf=buf@entry=0x7fce7dd97028
, > > > > > len=len@entry=2, addr1=, l=, > > > > > mr=0x56121846d340) > > > > > at /usr/src/debug/qemu-4.0/exec.c:3279 > > > > > #5 0x00005612167e0506 in flatview_write (fv=0x56121852edd0, > > > > > addr=841813602304, attrs=..., buf=0x7fce7dd97028
> > > > 0x7fce7dd97028 out of bounds>, len=2) at > > > > > /usr/src/debug/qemu-4.0/exec.c:3318 > > > > > #6 0x00005612167e4a1b in address_space_write (as=, > > > > > addr=, attrs=..., buf=, len= > > > > out>) at /usr/src/debug/qemu-4.0/exec.c:3408 > > > > > #7 0x00005612167e4aa5 in address_space_rw (as=, > > > > > addr=, attrs=..., attrs@entry=..., > > > > > buf=buf@entry=0x7fce7dd97028
, > > > > > len=, is_write=) at > > > > > /usr/src/debug/qemu-4.0/exec.c:3419 > > > > > #8 0x0000561216849da1 in kvm_cpu_exec (cpu=cpu@entry=0x56121849aa00) > > > > > at /usr/src/debug/qemu-4.0/accel/kvm/kvm-all.c:2034 > > > > > #9 0x000056121682255e in qemu_kvm_cpu_thread_fn > > > > > (arg=arg@entry=0x56121849aa00) at /usr/src/debug/qemu-4.0/cpus.c:1281 > > > > > #10 0x0000561216b794d6 in qemu_thread_start (args=) at > > > > > /usr/src/debug/qemu-4.0/util/qemu-thread-posix.c:502 > > > > > #11 0x00007fce7bef6e25 in start_thread () from /lib64/libpthread.so.0 > > > > > #12 0x00007fce7bc1ef1d in clone () from /lib64/libc.so.6 > > > > > > > > > > And I searched and found > > > > > https://bugzilla.redhat.com/show_bug.cgi?id=1706759 , which has the > > > > > same > > > > > backtrace as above, and it seems commit 7bfde688fb1b ("virtio-blk: Add > > > > > blk_drain() to virtio_blk_device_unrealize()") is to fix this > > > > > particular > > > > > bug. > > > > > > > > > > But I can still hit the bug even after applying the commit. Do I miss > > > > > anything? > > > > > > > > Hi Eryu, > > > > This backtrace seems to be caused by this bug (there were two bugs in > > > > 1706759): https://bugzilla.redhat.com/show_bug.cgi?id=1708480 > > > > Although the solution hasn't been tested on virtio-blk yet, you may > > > > want to apply this patch: > > > > > > > > https://lists.nongnu.org/archive/html/qemu-devel/2019-12/msg05197.html > > > > Let me know if this works. > > > > > > Unfortunately, I still see the same segfault & backtrace after applying > > > commit 421afd2fe8dd ("virtio: reset region cache when on queue > > > deletion") > > > > > > Anything I can help to debug? > > > > Please post the QEMU command-line and the QMP commands use to remove the > > device. > > It's a normal kata instance using virtio-fs as rootfs. 
> > /usr/local/libexec/qemu-kvm -name > sandbox-a670786fcb1758d2348eb120939d90ffacf9f049f10b337284ad49bbcd60936d \ > -uuid e03f6b6b-b80b-40c0-8d5b-0cbfed1305d2 -machine > q35,accel=kvm,kernel_irqchip,nvdimm,nosmm,nosmbus,nosata,nopit \ > -cpu host -qmp > unix:/run/vc/vm/a670786fcb1758d2348eb120939d90ffacf9f049f10b337284ad49bbcd60936d/qmp.sock,server,nowait > \ > -qmp > unix:/run/vc/vm/debug-a670786fcb1758d2348eb120939d90ffacf9f049f10b337284ad49bbcd60936d/qmp.sock,server,nowait > \ > -m 2048M,slots=10,maxmem=773893M -device > pci-bridge,bus=pcie.0,id=pci-bridge-0,chassis_nr=1,shpc=on,addr=2,romfile= \ > -device virtio-serial-pci,disable-modern=false,id=serial0,romfile= -device > virtconsole,chardev=charconsole0,id=console0 \ > -chardev > socket,id=charconsole0,path=/run/vc/vm/a670786fcb1758d2348eb120939d90ffacf9f049f10b337284ad49bbcd60936d/console.sock,server,nowait > \ > -device > virtserialport,chardev=metricagent,id=channel10,name=metric.agent.channel.10 \ > -chardev > socket,id=metricagent,path=/run/vc/vm/a670786fcb1758d2348eb120939d90ffacf9f049f10b337284ad49bbcd60936d/metric.agent.channel.sock,server,nowait > \ > -device nvdimm,id=nv0,memdev=mem0 -object > memory-backend-file,id=mem0,mem-path=/usr/local/share/containers-image-1.9.0.img,size=268435456 > \ > -object rng-random,id=rng0,filename=/dev/urandom -device > virtio-rng,rng=rng0,romfile= \ > -device virtserialport,chardev=charch0,id=channel0,name=agent.channel.0 \ > -chardev > socket,id=charch0,path=/run/vc/vm/a670786fcb1758d2348eb120939d90ffacf9f049f10b337284ad49bbcd60936d/kata.sock,server,nowait > \ > -chardev > socket,id=char-6fca044b801a78a1,path=/run/vc/vm/a670786fcb1758d2348eb120939d90ffacf9f049f10b337284ad49bbcd60936d/vhost-fs.sock > \ > -device > vhost-user-fs-pci,chardev=char-6fca044b801a78a1,tag=kataShared,cache-size=8192M > -netdev tap,id=network-0,vhost=on,vhostfds=3,fds=4 \ > -device > driver=virtio-net-pci,netdev=network-0,mac=76:57:f1:ab:51:5c,disable-modern=false,mq=on,vectors=4,romfile= > 
\ > -global kvm-pit.lost_tick_policy=discard -vga none -no-user-config > -nodefaults -nographic -daemonize \ > -object memory-backend-file,id=dimm1,size=2048M,mem-path=/dev/shm,share=on > -numa node,memdev=dimm1 -kernel /usr/local/share/kernel \ > -append tsc=reliable no_timer_check rcupdate.rcu_expedited=1 i8042.direct=1 > i8042.dumbkbd=1 i8042.nopnp=1 i8042.noaux=1 noreplace-smp reboot=k > console=hvc0 console=hvc1 iommu=off cryptomgr.notests net.ifnames=0 > pci=lastbus=0 root=/dev/pmem0p1 rootflags=dax,data=ordered,errors=remount-ro > ro rootfstype=ext4 quiet systemd.show_status=false panic=1 nr_cpus=96 > agent.use_vsock=false init=/usr/lib/systemd/systemd > systemd.unit=kata-containers.target systemd.mask=systemd-networkd.service > systemd.mask=systemd-networkd.socket \ > -pidfile > /run/vc/vm/a670786fcb1758d2348eb120939d90ffacf9f049f10b337284ad49bbcd60936d/pid > \ > -smp 1,cores=1,threads=1,sockets=96,maxcpus=96 > > QMP command to delete device (the device id is just an example, not the > one caused the crash): > > "{\"arguments\":{\"id\":\"virtio-drive-5967abfb917c8da6\"},\"execute\":\"device_del\"}" > > which has been hot plugged by: > "{\"arguments\":{\"cache\":{\"direct\":true,\"no-flush\":false},\"driver\":\"raw\",\"file\":{\"driver\":\"file\",\"filename\":\"/dev/dm-18\"},\"node-name\":\"drive-5967abfb917c8da6\"},\"execute\":\"blockdev-add\"}" > "{\"return\": {}}" > "{\"arguments\":{\"addr\":\"01\",\"bus\":\"pci-bridge-0\",\"drive\":\"drive-5967abfb917c8da6\",\"driver\":\"virtio-blk-pci\",\"id\":\"virtio-drive-5967abfb917c8da6\",\"romfile\":\"\",\"share-rw\":\"on\"},\"execute\":\"device_add\"}" > "{\"return\": {}}" Thanks. I wasn't able to reproduce this crash with qemu.git/master. One thing that is strange about the latest backtrace you posted: QEMU is dispatching the memory access instead of using the ioeventfd code that virtio-blk-pci normally takes when a virtqueue is notified. 
I guess this means ioeventfd has already been disabled due to the hot unplug. Could you try with machine type "i440fx" instead of "q35"? I wonder if pci-bridge/shpc is part of the problem. Stefan On Tue, Jan 14, 2020 at 04:16:24PM +0000, Stefan Hajnoczi wrote: > On Tue, Jan 14, 2020 at 10:50:58AM +0800, Eryu Guan wrote: > > On Mon, Jan 13, 2020 at 04:38:55PM +0000, Stefan Hajnoczi wrote: > > > On Thu, Jan 09, 2020 at 12:58:06PM +0800, Eryu Guan wrote: > > > > On Tue, Jan 07, 2020 at 03:01:01PM +0100, Julia Suvorova wrote: > > > > > On Tue, Jan 7, 2020 at 2:06 PM Eryu Guan wrote: > > > > > > > > > > > > On Thu, Jan 02, 2020 at 10:08:50AM +0800, Eryu Guan wrote: > > > > > > > On Tue, Dec 31, 2019 at 11:51:35AM +0100, Igor Mammedov wrote: > > > > > > > > On Tue, 31 Dec 2019 18:34:34 +0800 > > > > > > > > Eryu Guan wrote: > > > > > > > > > > > > > > > > > Hi, > > > > > > > > > > > > > > > > > > I'm using qemu 4.0 and hit segfault when tearing down kata > > > > > > > > > sandbox, I > > > > > > > > > think it's because io completion hits use-after-free when > > > > > > > > > device is > > > > > > > > > already gone. Is this a known bug that has been fixed? (I > > > > > > > > > went through > > > > > > > > > the git log but didn't find anything obvious). > > > > > > > > > > > > > > > > > > gdb backtrace is: > > > > > > > > > > > > > > > > > > Core was generated by `/usr/local/libexec/qemu-kvm -name > > > > > > > > > sandbox-5b8df8c6c6901c3c0a9b02879be10fe8d69d6'. > > > > > > > > > Program terminated with signal 11, Segmentation fault. 
> > > > > > > > > #0 object_get_class (obj=obj@entry=0x0) at > > > > > > > > > /usr/src/debug/qemu-4.0/qom/object.c:903 > > > > > > > > > 903 return obj->class; > > > > > > > > > (gdb) bt > > > > > > > > > #0 object_get_class (obj=obj@entry=0x0) at > > > > > > > > > /usr/src/debug/qemu-4.0/qom/object.c:903 > > > > > > > > > #1 0x0000558a2c009e9b in virtio_notify_vector > > > > > > > > > (vdev=0x558a2e7751d0, > > > > > > > > > vector=) at > > > > > > > > > /usr/src/debug/qemu-4.0/hw/virtio/virtio.c:1118 > > > > > > > > > #2 0x0000558a2bfdcb1e in > > > > > > > > > virtio_blk_discard_write_zeroes_complete ( > > > > > > > > > opaque=0x558a2f2fd420, ret=0) > > > > > > > > > at /usr/src/debug/qemu-4.0/hw/block/virtio-blk.c:186 > > > > > > > > > #3 0x0000558a2c261c7e in blk_aio_complete > > > > > > > > > (acb=0x558a2eed7420) > > > > > > > > > at /usr/src/debug/qemu-4.0/block/block-backend.c:1305 > > > > > > > > > #4 0x0000558a2c3031db in coroutine_trampoline (i0= > > > > > > > > out>, > > > > > > > > > i1=) at > > > > > > > > > /usr/src/debug/qemu-4.0/util/coroutine-ucontext.c:116 > > > > > > > > > #5 0x00007f45b2f8b080 in ?? () from /lib64/libc.so.6 > > > > > > > > > #6 0x00007fff9ed75780 in ?? () > > > > > > > > > #7 0x0000000000000000 in ?? () > > > > > > > > > > > > > > > > > > It seems like qemu was completing a discard/write_zero > > > > > > > > > request, but > > > > > > > > > parent BusState was already freed & set to NULL. > > > > > > > > > > > > > > > > > > Do we need to drain all pending request before unrealizing > > > > > > > > > virtio-blk > > > > > > > > > device? Like the following patch proposed? > > > > > > > > > > > > > > > > > > https://lists.gnu.org/archive/html/qemu-devel/2017-06/msg02945.html > > > > > > > > > > > > > > > > > > If more info is needed, please let me know. > > > > > > > > > > > > > > > > may be this will help: > > > > > > > > https://patchwork.kernel.org/patch/11213047/ > > > > > > > > > > > > > > Yeah, this looks promising! 
> > > > > > > I'll try it out (though it's a one-time crash for me). Thanks!
> > > > > >
> > > > > > After applying this patch, I don't see the original segfault and
> > > > > > backtrace, but I see this crash:
> > > > > >
> > > > > > [Thread debugging using libthread_db enabled]
> > > > > > Using host libthread_db library "/lib64/libthread_db.so.1".
> > > > > > Core was generated by `/usr/local/libexec/qemu-kvm -name sandbox-a2f34a11a7e1449496503bbc4050ae040c0d3'.
> > > > > > Program terminated with signal 11, Segmentation fault.
> > > > > > #0  0x0000561216a57609 in virtio_pci_notify_write (opaque=0x5612184747e0, addr=0, val=<optimized out>, size=<optimized out>) at /usr/src/debug/qemu-4.0/hw/virtio/virtio-pci.c:1324
> > > > > > 1324        VirtIOPCIProxy *proxy = VIRTIO_PCI(DEVICE(vdev)->parent_bus->parent);
> > > > > > Missing separate debuginfos, use: debuginfo-install glib2-2.42.2-5.1.alios7.x86_64 glibc-2.17-260.alios7.x86_64 libgcc-4.8.5-28.alios7.1.x86_64 libseccomp-2.3.1-3.alios7.x86_64 libstdc++-4.8.5-28.alios7.1.x86_64 numactl-libs-2.0.9-5.1.alios7.x86_64 pixman-0.32.6-3.1.alios7.x86_64 zlib-1.2.7-16.2.alios7.x86_64
> > > > > > (gdb) bt
> > > > > > #0  0x0000561216a57609 in virtio_pci_notify_write (opaque=0x5612184747e0, addr=0, val=<optimized out>, size=<optimized out>) at /usr/src/debug/qemu-4.0/hw/virtio/virtio-pci.c:1324
> > > > > > #1  0x0000561216835b22 in memory_region_write_accessor (mr=<optimized out>, addr=<optimized out>, value=<optimized out>, size=<optimized out>, shift=<optimized out>, mask=<optimized out>, attrs=...) at /usr/src/debug/qemu-4.0/memory.c:502
> > > > > > #2  0x0000561216833c5d in access_with_adjusted_size (addr=addr@entry=0, value=value@entry=0x7fcdeab1b8a8, size=size@entry=2, access_size_min=<optimized out>, access_size_max=<optimized out>, access_fn=0x561216835ac0 <memory_region_write_accessor>, mr=0x56121846d340, attrs=...) at /usr/src/debug/qemu-4.0/memory.c:568
> > > > > > #3  0x0000561216837c66 in memory_region_dispatch_write (mr=mr@entry=0x56121846d340, addr=0, data=<optimized out>, size=2, attrs=attrs@entry=...) at /usr/src/debug/qemu-4.0/memory.c:1503
> > > > > > #4  0x00005612167e036f in flatview_write_continue (fv=fv@entry=0x56121852edd0, addr=addr@entry=841813602304, attrs=..., buf=buf@entry=0x7fce7dd97028 <Address 0x7fce7dd97028 out of bounds>, len=len@entry=2, addr1=<optimized out>, l=<optimized out>, mr=0x56121846d340) at /usr/src/debug/qemu-4.0/exec.c:3279
> > > > > > #5  0x00005612167e0506 in flatview_write (fv=0x56121852edd0, addr=841813602304, attrs=..., buf=0x7fce7dd97028 <Address 0x7fce7dd97028 out of bounds>, len=2) at /usr/src/debug/qemu-4.0/exec.c:3318
> > > > > > #6  0x00005612167e4a1b in address_space_write (as=<optimized out>, addr=<optimized out>, attrs=..., buf=<optimized out>, len=<optimized out>) at /usr/src/debug/qemu-4.0/exec.c:3408
> > > > > > #7  0x00005612167e4aa5 in address_space_rw (as=<optimized out>, addr=<optimized out>, attrs=..., attrs@entry=..., buf=buf@entry=0x7fce7dd97028 <Address 0x7fce7dd97028 out of bounds>, len=<optimized out>, is_write=<optimized out>) at /usr/src/debug/qemu-4.0/exec.c:3419
> > > > > > #8  0x0000561216849da1 in kvm_cpu_exec (cpu=cpu@entry=0x56121849aa00) at /usr/src/debug/qemu-4.0/accel/kvm/kvm-all.c:2034
> > > > > > #9  0x000056121682255e in qemu_kvm_cpu_thread_fn (arg=arg@entry=0x56121849aa00) at /usr/src/debug/qemu-4.0/cpus.c:1281
> > > > > > #10 0x0000561216b794d6 in qemu_thread_start (args=<optimized out>) at /usr/src/debug/qemu-4.0/util/qemu-thread-posix.c:502
> > > > > > #11 0x00007fce7bef6e25 in start_thread () from /lib64/libpthread.so.0
> > > > > > #12 0x00007fce7bc1ef1d in clone () from /lib64/libc.so.6
> > > > > >
> > > > > > And I searched and found
> > > > > > https://bugzilla.redhat.com/show_bug.cgi?id=1706759 , which has
> > > > > > the same backtrace as above, and it seems commit 7bfde688fb1b
> > > > > > ("virtio-blk: Add blk_drain() to virtio_blk_device_unrealize()")
> > > > > > is meant to fix this particular bug.
> > > > > >
> > > > > > But I can still hit the bug even after applying the commit. Am I
> > > > > > missing anything?
> > > > >
> > > > > Hi Eryu,
> > > > > This backtrace seems to be caused by this bug (there were two bugs
> > > > > in 1706759): https://bugzilla.redhat.com/show_bug.cgi?id=1708480
> > > > > Although the solution hasn't been tested on virtio-blk yet, you may
> > > > > want to apply this patch:
> > > > >
> > > > > https://lists.nongnu.org/archive/html/qemu-devel/2019-12/msg05197.html
> > > > > Let me know if this works.
> > > >
> > > > Unfortunately, I still see the same segfault & backtrace after
> > > > applying commit 421afd2fe8dd ("virtio: reset region cache when on
> > > > queue deletion").
> > > >
> > > > Anything I can do to help debug?
> > >
> > > Please post the QEMU command-line and the QMP commands used to remove
> > > the device.
> >
> > It's a normal kata instance using virtio-fs as rootfs.
> >
> > /usr/local/libexec/qemu-kvm -name sandbox-a670786fcb1758d2348eb120939d90ffacf9f049f10b337284ad49bbcd60936d \
> > -uuid e03f6b6b-b80b-40c0-8d5b-0cbfed1305d2 \
> > -machine q35,accel=kvm,kernel_irqchip,nvdimm,nosmm,nosmbus,nosata,nopit \
> > -cpu host \
> > -qmp unix:/run/vc/vm/a670786fcb1758d2348eb120939d90ffacf9f049f10b337284ad49bbcd60936d/qmp.sock,server,nowait \
> > -qmp unix:/run/vc/vm/debug-a670786fcb1758d2348eb120939d90ffacf9f049f10b337284ad49bbcd60936d/qmp.sock,server,nowait \
> > -m 2048M,slots=10,maxmem=773893M \
> > -device pci-bridge,bus=pcie.0,id=pci-bridge-0,chassis_nr=1,shpc=on,addr=2,romfile= \
> > -device virtio-serial-pci,disable-modern=false,id=serial0,romfile= \
> > -device virtconsole,chardev=charconsole0,id=console0 \
> > -chardev socket,id=charconsole0,path=/run/vc/vm/a670786fcb1758d2348eb120939d90ffacf9f049f10b337284ad49bbcd60936d/console.sock,server,nowait \
> > -device virtserialport,chardev=metricagent,id=channel10,name=metric.agent.channel.10 \
> > -chardev socket,id=metricagent,path=/run/vc/vm/a670786fcb1758d2348eb120939d90ffacf9f049f10b337284ad49bbcd60936d/metric.agent.channel.sock,server,nowait \
> > -device nvdimm,id=nv0,memdev=mem0 \
> > -object memory-backend-file,id=mem0,mem-path=/usr/local/share/containers-image-1.9.0.img,size=268435456 \
> > -object rng-random,id=rng0,filename=/dev/urandom \
> > -device virtio-rng,rng=rng0,romfile= \
> > -device virtserialport,chardev=charch0,id=channel0,name=agent.channel.0 \
> > -chardev socket,id=charch0,path=/run/vc/vm/a670786fcb1758d2348eb120939d90ffacf9f049f10b337284ad49bbcd60936d/kata.sock,server,nowait \
> > -chardev socket,id=char-6fca044b801a78a1,path=/run/vc/vm/a670786fcb1758d2348eb120939d90ffacf9f049f10b337284ad49bbcd60936d/vhost-fs.sock \
> > -device vhost-user-fs-pci,chardev=char-6fca044b801a78a1,tag=kataShared,cache-size=8192M \
> > -netdev tap,id=network-0,vhost=on,vhostfds=3,fds=4 \
> > -device driver=virtio-net-pci,netdev=network-0,mac=76:57:f1:ab:51:5c,disable-modern=false,mq=on,vectors=4,romfile= \
> > -global kvm-pit.lost_tick_policy=discard -vga none -no-user-config -nodefaults -nographic -daemonize \
> > -object memory-backend-file,id=dimm1,size=2048M,mem-path=/dev/shm,share=on \
> > -numa node,memdev=dimm1 -kernel /usr/local/share/kernel \
> > -append tsc=reliable no_timer_check rcupdate.rcu_expedited=1 i8042.direct=1 i8042.dumbkbd=1 i8042.nopnp=1 i8042.noaux=1 noreplace-smp reboot=k console=hvc0 console=hvc1 iommu=off cryptomgr.notests net.ifnames=0 pci=lastbus=0 root=/dev/pmem0p1 rootflags=dax,data=ordered,errors=remount-ro ro rootfstype=ext4 quiet systemd.show_status=false panic=1 nr_cpus=96 agent.use_vsock=false init=/usr/lib/systemd/systemd systemd.unit=kata-containers.target systemd.mask=systemd-networkd.service systemd.mask=systemd-networkd.socket \
> > -pidfile /run/vc/vm/a670786fcb1758d2348eb120939d90ffacf9f049f10b337284ad49bbcd60936d/pid \
> > -smp 1,cores=1,threads=1,sockets=96,maxcpus=96
> >
> > QMP command to delete the device (the device id is just an example, not
> > the one that caused the crash):
> >
> > "{\"arguments\":{\"id\":\"virtio-drive-5967abfb917c8da6\"},\"execute\":\"device_del\"}"
> >
> > which had been hot-plugged by:
> >
> > "{\"arguments\":{\"cache\":{\"direct\":true,\"no-flush\":false},\"driver\":\"raw\",\"file\":{\"driver\":\"file\",\"filename\":\"/dev/dm-18\"},\"node-name\":\"drive-5967abfb917c8da6\"},\"execute\":\"blockdev-add\"}"
> > "{\"return\": {}}"
> > "{\"arguments\":{\"addr\":\"01\",\"bus\":\"pci-bridge-0\",\"drive\":\"drive-5967abfb917c8da6\",\"driver\":\"virtio-blk-pci\",\"id\":\"virtio-drive-5967abfb917c8da6\",\"romfile\":\"\",\"share-rw\":\"on\"},\"execute\":\"device_add\"}"
> > "{\"return\": {}}"
>
> Thanks.  I wasn't able to reproduce this crash with qemu.git/master.
>
> One thing that is strange about the latest backtrace you posted: QEMU is
> dispatching the memory access instead of using the ioeventfd code path
> that virtio-blk-pci normally takes when a virtqueue is notified.  I guess
> this means ioeventfd has already been disabled due to the hot unplug.
>
> Could you try with machine type "i440fx" instead of "q35"?  I wonder if
> pci-bridge/shpc is part of the problem.

Sure, will try it. But it may take some time, as the test bed is busy with
other testing tasks. I'll report back once I get the results.

Thanks,
Eryu