user-level: 0.896 mistranslation: 0.881 risc-v: 0.859 operating system: 0.844 boot: 0.830 TCG: 0.828 ppc: 0.827 vnc: 0.820 peripherals: 0.820 device: 0.810 hypervisor: 0.806 KVM: 0.803 virtual: 0.802 permissions: 0.802 register: 0.797 VMM: 0.792 debug: 0.788 assembly: 0.768 network: 0.768 graphic: 0.764 arm: 0.761 socket: 0.758 semantic: 0.752 performance: 0.744 architecture: 0.742 kernel: 0.735 PID: 0.731 x86: 0.730 files: 0.730 alpha: 0.712 i386: 0.705 [Qemu-devel] [BUG?] aio_get_linux_aio: Assertion `ctx->linux_aio' failed Hi, I am seeing some strange QEMU assertion failures for qemu on s390x, which prevents a guest from starting. Git bisecting points to the following commit as the source of the error. commit ed6e2161715c527330f936d44af4c547f25f687e Author: Nishanth Aravamudan
Date: Fri Jun 22 12:37:00 2018 -0700 linux-aio: properly bubble up errors from initialization laio_init() can fail for a couple of reasons, which will lead to a NULL pointer dereference in laio_attach_aio_context(). To solve this, add a aio_setup_linux_aio() function which is called early in raw_open_common. If this fails, propagate the error up. The signature of aio_get_linux_aio() was not modified, because it seems preferable to return the actual errno from the possible failing initialization calls. Additionally, when the AioContext changes, we need to associate a LinuxAioState with the new AioContext. Use the bdrv_attach_aio_context callback and call the new aio_setup_linux_aio(), which will allocate a new AioContext if needed, and return errors on failures. If it fails for any reason, fallback to threaded AIO with an error message, as the device is already in-use by the guest. Add an assert that aio_get_linux_aio() cannot return NULL. Signed-off-by: Nishanth Aravamudan Message-id: address@hidden Signed-off-by: Stefan Hajnoczi Not sure what is causing this assertion to fail. Here is the qemu command line of the guest, from qemu log, which throws this error: LC_ALL=C PATH=/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin QEMU_AUDIO_DRV=none /usr/local/bin/qemu-system-s390x -name guest=rt_vm1,debug-threads=on -S -object secret,id=masterKey0,format=raw,file=/var/lib/libvirt/qemu/domain-21-rt_vm1/master-key.aes -machine s390-ccw-virtio-2.12,accel=kvm,usb=off,dump-guest-core=off -m 1024 -realtime mlock=off -smp 4,sockets=4,cores=1,threads=1 -object iothread,id=iothread1 -uuid 0cde16cd-091d-41bd-9ac2-5243df5c9a0d -display none -no-user-config -nodefaults -chardev socket,id=charmonitor,fd=28,server,nowait -mon chardev=charmonitor,id=monitor,mode=control -rtc base=utc -no-shutdown -boot strict=on -drive file=/dev/mapper/360050763998b0883980000002a000031,format=raw,if=none,id=drive-virtio-disk0,cache=none,aio=native -device virtio-blk-ccw,iothread=iothread1,scsi=off,devno=fe.0.0001,drive=drive-virtio-disk0,id=virtio-disk0,bootindex=1,write-cache=on -netdev tap,fd=30,id=hostnet0,vhost=on,vhostfd=31 -device virtio-net-ccw,netdev=hostnet0,id=net0,mac=02:3a:c8:67:95:84,devno=fe.0.0000 -netdev tap,fd=32,id=hostnet1,vhost=on,vhostfd=33 -device virtio-net-ccw,netdev=hostnet1,id=net1,mac=52:54:00:2a:e5:08,devno=fe.0.0002 -chardev pty,id=charconsole0 -device sclpconsole,chardev=charconsole0,id=console0 -device virtio-balloon-ccw,id=balloon0,devno=fe.3.ffba -sandbox on,obsolete=deny,elevateprivileges=deny,spawn=deny,resourcecontrol=deny -msg timestamp=on 2018-07-17 15:48:42.252+0000: Domain id=21 is tainted: high-privileges 2018-07-17T15:48:42.279380Z qemu-system-s390x: -chardev pty,id=charconsole0: char device redirected to /dev/pts/3 (label charconsole0) qemu-system-s390x: util/async.c:339: aio_get_linux_aio: Assertion `ctx->linux_aio' failed. 2018-07-17 15:48:43.309+0000: shutting down, reason=failed Any help debugging this would be greatly appreciated. Thank you Farhan On 17.07.2018 [13:25:53 -0400], Farhan Ali wrote: > Hi, > > I am seeing some strange QEMU assertion failures for qemu on s390x, > which prevents a guest from starting. > > Git bisecting points to the following commit as the source of the error. > > commit ed6e2161715c527330f936d44af4c547f25f687e > Author: Nishanth Aravamudan > Date: Fri Jun 22 12:37:00 2018 -0700 > > linux-aio: properly bubble up errors from initialization > > laio_init() can fail for a couple of reasons, which will lead to a NULL > pointer dereference in laio_attach_aio_context(). > > To solve this, add a aio_setup_linux_aio() function which is called > early in raw_open_common. If this fails, propagate the error up. The > signature of aio_get_linux_aio() was not modified, because it seems > preferable to return the actual errno from the possible failing > initialization calls. > > Additionally, when the AioContext changes, we need to associate a > LinuxAioState with the new AioContext. Use the bdrv_attach_aio_context > callback and call the new aio_setup_linux_aio(), which will allocate a > new AioContext if needed, and return errors on failures. If it fails for > any reason, fallback to threaded AIO with an error message, as the > device is already in-use by the guest. > > Add an assert that aio_get_linux_aio() cannot return NULL. > > Signed-off-by: Nishanth Aravamudan > Message-id: address@hidden > Signed-off-by: Stefan Hajnoczi > > > Not sure what is causing this assertion to fail. Here is the qemu command > line of the guest, from qemu log, which throws this error: > > > LC_ALL=C PATH=/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin > QEMU_AUDIO_DRV=none /usr/local/bin/qemu-system-s390x -name > guest=rt_vm1,debug-threads=on -S -object > secret,id=masterKey0,format=raw,file=/var/lib/libvirt/qemu/domain-21-rt_vm1/master-key.aes > -machine s390-ccw-virtio-2.12,accel=kvm,usb=off,dump-guest-core=off -m 1024 > -realtime mlock=off -smp 4,sockets=4,cores=1,threads=1 -object > iothread,id=iothread1 -uuid 0cde16cd-091d-41bd-9ac2-5243df5c9a0d -display > none -no-user-config -nodefaults -chardev > socket,id=charmonitor,fd=28,server,nowait -mon > chardev=charmonitor,id=monitor,mode=control -rtc base=utc -no-shutdown -boot > strict=on -drive > file=/dev/mapper/360050763998b0883980000002a000031,format=raw,if=none,id=drive-virtio-disk0,cache=none,aio=native > -device > virtio-blk-ccw,iothread=iothread1,scsi=off,devno=fe.0.0001,drive=drive-virtio-disk0,id=virtio-disk0,bootindex=1,write-cache=on > -netdev tap,fd=30,id=hostnet0,vhost=on,vhostfd=31 -device > virtio-net-ccw,netdev=hostnet0,id=net0,mac=02:3a:c8:67:95:84,devno=fe.0.0000 > -netdev tap,fd=32,id=hostnet1,vhost=on,vhostfd=33 -device > virtio-net-ccw,netdev=hostnet1,id=net1,mac=52:54:00:2a:e5:08,devno=fe.0.0002 > -chardev pty,id=charconsole0 -device > sclpconsole,chardev=charconsole0,id=console0 -device > virtio-balloon-ccw,id=balloon0,devno=fe.3.ffba -sandbox > on,obsolete=deny,elevateprivileges=deny,spawn=deny,resourcecontrol=deny -msg > timestamp=on > > > > 2018-07-17 15:48:42.252+0000: Domain id=21 is tainted: high-privileges > 2018-07-17T15:48:42.279380Z qemu-system-s390x: -chardev pty,id=charconsole0: > char device redirected to /dev/pts/3 (label charconsole0) > qemu-system-s390x: util/async.c:339: aio_get_linux_aio: Assertion > `ctx->linux_aio' failed. > 2018-07-17 15:48:43.309+0000: shutting down, reason=failed > > > Any help debugging this would be greatly appreciated. iiuc, this possibly implies AIO was not actually used previously on this guest (it might have silently been falling back to threaded IO?). I don't have access to s390x, but would it be possible to run qemu under gdb and see if aio_setup_linux_aio is being called at all (I think it might not be, but I'm not sure why), and if so, if it's for the context in question? If it's not being called first, could you see what callpath is calling aio_get_linux_aio when this assertion trips? Thanks! -Nish On 07/17/2018 04:52 PM, Nishanth Aravamudan wrote: iiuc, this possibly implies AIO was not actually used previously on this guest (it might have silently been falling back to threaded IO?). I don't have access to s390x, but would it be possible to run qemu under gdb and see if aio_setup_linux_aio is being called at all (I think it might not be, but I'm not sure why), and if so, if it's for the context in question? If it's not being called first, could you see what callpath is calling aio_get_linux_aio when this assertion trips? Thanks! -Nish Hi Nishant, From the coredump of the guest this is the call trace that calls aio_get_linux_aio: Stack trace of thread 145158: #0 0x000003ff94dbe274 raise (libc.so.6) #1 0x000003ff94da39a8 abort (libc.so.6) #2 0x000003ff94db62ce __assert_fail_base (libc.so.6) #3 0x000003ff94db634c __assert_fail (libc.so.6) #4 0x000002aa20db067a aio_get_linux_aio (qemu-system-s390x) #5 0x000002aa20d229a8 raw_aio_plug (qemu-system-s390x) #6 0x000002aa20d309ee bdrv_io_plug (qemu-system-s390x) #7 0x000002aa20b5a8ea virtio_blk_handle_vq (qemu-system-s390x) #8 0x000002aa20db2f6e aio_dispatch_handlers (qemu-system-s390x) #9 0x000002aa20db3c34 aio_poll (qemu-system-s390x) #10 0x000002aa20be32a2 iothread_run (qemu-system-s390x) #11 0x000003ff94f879a8 start_thread (libpthread.so.0) #12 0x000003ff94e797ee thread_start (libc.so.6) Thanks for taking a look and responding. Thanks Farhan On 07/18/2018 09:42 AM, Farhan Ali wrote: On 07/17/2018 04:52 PM, Nishanth Aravamudan wrote: iiuc, this possibly implies AIO was not actually used previously on this guest (it might have silently been falling back to threaded IO?). I don't have access to s390x, but would it be possible to run qemu under gdb and see if aio_setup_linux_aio is being called at all (I think it might not be, but I'm not sure why), and if so, if it's for the context in question? If it's not being called first, could you see what callpath is calling aio_get_linux_aio when this assertion trips? Thanks! -Nish Hi Nishant, From the coredump of the guest this is the call trace that calls aio_get_linux_aio: Stack trace of thread 145158: #0Â 0x000003ff94dbe274 raise (libc.so.6) #1Â 0x000003ff94da39a8 abort (libc.so.6) #2Â 0x000003ff94db62ce __assert_fail_base (libc.so.6) #3Â 0x000003ff94db634c __assert_fail (libc.so.6) #4Â 0x000002aa20db067a aio_get_linux_aio (qemu-system-s390x) #5Â 0x000002aa20d229a8 raw_aio_plug (qemu-system-s390x) #6Â 0x000002aa20d309ee bdrv_io_plug (qemu-system-s390x) #7Â 0x000002aa20b5a8ea virtio_blk_handle_vq (qemu-system-s390x) #8Â 0x000002aa20db2f6e aio_dispatch_handlers (qemu-system-s390x) #9Â 0x000002aa20db3c34 aio_poll (qemu-system-s390x) #10 0x000002aa20be32a2 iothread_run (qemu-system-s390x) #11 0x000003ff94f879a8 start_thread (libpthread.so.0) #12 0x000003ff94e797ee thread_start (libc.so.6) Thanks for taking a look and responding. Thanks Farhan Trying to debug a little further, the block device in this case is a "host device". And looking at your commit carefully you use the bdrv_attach_aio_context callback to setup a Linux AioContext. For some reason the "host device" struct (BlockDriver bdrv_host_device in block/file-posix.c) does not have a bdrv_attach_aio_context defined. So a simple change of adding the callback to the struct solves the issue and the guest starts fine. diff --git a/block/file-posix.c b/block/file-posix.c index 28824aa..b8d59fb 100644 --- a/block/file-posix.c +++ b/block/file-posix.c @@ -3135,6 +3135,7 @@ static BlockDriver bdrv_host_device = { .bdrv_refresh_limits = raw_refresh_limits, .bdrv_io_plug = raw_aio_plug, .bdrv_io_unplug = raw_aio_unplug, + .bdrv_attach_aio_context = raw_aio_attach_aio_context, .bdrv_co_truncate = raw_co_truncate, .bdrv_getlength = raw_getlength, I am not too familiar with block device code in QEMU, so not sure if this is the right fix or if there are some underlying problems. Thanks Farhan On 18.07.2018 [11:10:27 -0400], Farhan Ali wrote: > > > On 07/18/2018 09:42 AM, Farhan Ali wrote: > > > > > > On 07/17/2018 04:52 PM, Nishanth Aravamudan wrote: > > > iiuc, this possibly implies AIO was not actually used previously on this > > > guest (it might have silently been falling back to threaded IO?). I > > > don't have access to s390x, but would it be possible to run qemu under > > > gdb and see if aio_setup_linux_aio is being called at all (I think it > > > might not be, but I'm not sure why), and if so, if it's for the context > > > in question? > > > > > > If it's not being called first, could you see what callpath is calling > > > aio_get_linux_aio when this assertion trips? > > > > > > Thanks! > > > -Nish > > > > > > Hi Nishant, > > > > From the coredump of the guest this is the call trace that calls > > aio_get_linux_aio: > > > > > > Stack trace of thread 145158: > > #0Â 0x000003ff94dbe274 raise (libc.so.6) > > #1Â 0x000003ff94da39a8 abort (libc.so.6) > > #2Â 0x000003ff94db62ce __assert_fail_base (libc.so.6) > > #3Â 0x000003ff94db634c __assert_fail (libc.so.6) > > #4Â 0x000002aa20db067a aio_get_linux_aio (qemu-system-s390x) > > #5Â 0x000002aa20d229a8 raw_aio_plug (qemu-system-s390x) > > #6Â 0x000002aa20d309ee bdrv_io_plug (qemu-system-s390x) > > #7Â 0x000002aa20b5a8ea virtio_blk_handle_vq (qemu-system-s390x) > > #8Â 0x000002aa20db2f6e aio_dispatch_handlers (qemu-system-s390x) > > #9Â 0x000002aa20db3c34 aio_poll (qemu-system-s390x) > > #10 0x000002aa20be32a2 iothread_run (qemu-system-s390x) > > #11 0x000003ff94f879a8 start_thread (libpthread.so.0) > > #12 0x000003ff94e797ee thread_start (libc.so.6) > > > > > > Thanks for taking a look and responding. > > > > Thanks > > Farhan > > > > > > > > Trying to debug a little further, the block device in this case is a "host > device". And looking at your commit carefully you use the > bdrv_attach_aio_context callback to setup a Linux AioContext. > > For some reason the "host device" struct (BlockDriver bdrv_host_device in > block/file-posix.c) does not have a bdrv_attach_aio_context defined. > So a simple change of adding the callback to the struct solves the issue and > the guest starts fine. > > > diff --git a/block/file-posix.c b/block/file-posix.c > index 28824aa..b8d59fb 100644 > --- a/block/file-posix.c > +++ b/block/file-posix.c > @@ -3135,6 +3135,7 @@ static BlockDriver bdrv_host_device = { > .bdrv_refresh_limits = raw_refresh_limits, > .bdrv_io_plug = raw_aio_plug, > .bdrv_io_unplug = raw_aio_unplug, > + .bdrv_attach_aio_context = raw_aio_attach_aio_context, > > .bdrv_co_truncate = raw_co_truncate, > .bdrv_getlength = raw_getlength, > > > > I am not too familiar with block device code in QEMU, so not sure if > this is the right fix or if there are some underlying problems. Oh this is quite embarassing! I only added the bdrv_attach_aio_context callback for the file-backed device. Your fix is definitely corect for host device. Let me make sure there weren't any others missed and I will send out a properly formatted patch. Thank you for the quick testing and turnaround! -Nish On 07/18/2018 08:52 PM, Nishanth Aravamudan wrote: > On 18.07.2018 [11:10:27 -0400], Farhan Ali wrote: > > > > > > On 07/18/2018 09:42 AM, Farhan Ali wrote: > >> > >> > >> On 07/17/2018 04:52 PM, Nishanth Aravamudan wrote: > >>> iiuc, this possibly implies AIO was not actually used previously on this > >>> guest (it might have silently been falling back to threaded IO?). I > >>> don't have access to s390x, but would it be possible to run qemu under > >>> gdb and see if aio_setup_linux_aio is being called at all (I think it > >>> might not be, but I'm not sure why), and if so, if it's for the context > >>> in question? > >>> > >>> If it's not being called first, could you see what callpath is calling > >>> aio_get_linux_aio when this assertion trips? > >>> > >>> Thanks! > >>> -Nish > >> > >> > >> Hi Nishant, > >> > >> From the coredump of the guest this is the call trace that calls > >> aio_get_linux_aio: > >> > >> > >> Stack trace of thread 145158: > >> #0Â 0x000003ff94dbe274 raise (libc.so.6) > >> #1Â 0x000003ff94da39a8 abort (libc.so.6) > >> #2Â 0x000003ff94db62ce __assert_fail_base (libc.so.6) > >> #3Â 0x000003ff94db634c __assert_fail (libc.so.6) > >> #4Â 0x000002aa20db067a aio_get_linux_aio (qemu-system-s390x) > >> #5Â 0x000002aa20d229a8 raw_aio_plug (qemu-system-s390x) > >> #6Â 0x000002aa20d309ee bdrv_io_plug (qemu-system-s390x) > >> #7Â 0x000002aa20b5a8ea virtio_blk_handle_vq (qemu-system-s390x) > >> #8Â 0x000002aa20db2f6e aio_dispatch_handlers (qemu-system-s390x) > >> #9Â 0x000002aa20db3c34 aio_poll (qemu-system-s390x) > >> #10 0x000002aa20be32a2 iothread_run (qemu-system-s390x) > >> #11 0x000003ff94f879a8 start_thread (libpthread.so.0) > >> #12 0x000003ff94e797ee thread_start (libc.so.6) > >> > >> > >> Thanks for taking a look and responding. > >> > >> Thanks > >> Farhan > >> > >> > >> > > > > Trying to debug a little further, the block device in this case is a "host > > device". And looking at your commit carefully you use the > > bdrv_attach_aio_context callback to setup a Linux AioContext. > > > > For some reason the "host device" struct (BlockDriver bdrv_host_device in > > block/file-posix.c) does not have a bdrv_attach_aio_context defined. > > So a simple change of adding the callback to the struct solves the issue and > > the guest starts fine. > > > > > > diff --git a/block/file-posix.c b/block/file-posix.c > > index 28824aa..b8d59fb 100644 > > --- a/block/file-posix.c > > +++ b/block/file-posix.c > > @@ -3135,6 +3135,7 @@ static BlockDriver bdrv_host_device = { > > .bdrv_refresh_limits = raw_refresh_limits, > > .bdrv_io_plug = raw_aio_plug, > > .bdrv_io_unplug = raw_aio_unplug, > > + .bdrv_attach_aio_context = raw_aio_attach_aio_context, > > > > .bdrv_co_truncate = raw_co_truncate, > > .bdrv_getlength = raw_getlength, > > > > > > > > I am not too familiar with block device code in QEMU, so not sure if > > this is the right fix or if there are some underlying problems. > > Oh this is quite embarassing! I only added the bdrv_attach_aio_context > callback for the file-backed device. Your fix is definitely corect for > host device. Let me make sure there weren't any others missed and I will > send out a properly formatted patch. Thank you for the quick testing and > turnaround! Farhan, can you respin your patch with proper sign-off and patch description? Adding qemu-block. Hi Christian, On 19.07.2018 [08:55:20 +0200], Christian Borntraeger wrote: > > > On 07/18/2018 08:52 PM, Nishanth Aravamudan wrote: > > On 18.07.2018 [11:10:27 -0400], Farhan Ali wrote: > >> > >> > >> On 07/18/2018 09:42 AM, Farhan Ali wrote: