qemu 2.7.0 segfaults in qemu_co_queue_run_restart() I've been experiencing frequent segfaults lately with qemu 2.7.0 running Ubuntu 16.04 guests. The crash usually happens in qemu_co_queue_run_restart(). I haven't seen this so far with any other guests or distros. Here is one back trace I obtained from one of the crashing VMs. ------------------------------------------------------------------------------------------------- (gdb) bt #0 qemu_co_queue_run_restart (co=0x7fba8ff05aa0) at /build/pb-qemu-pssKUp/pb-qemu-2.7.0/util/qemu-coroutine-lock.c:59 #1 0x000055c1656f39a9 in qemu_coroutine_enter (co=0x7fba8ff05aa0) at /build/pb-qemu-pssKUp/pb-qemu-2.7.0/util/qemu-coroutine.c:119 #2 0x000055c1656f3e74 in qemu_co_queue_run_restart (co=0x7fba8dd20430) at /build/pb-qemu-pssKUp/pb-qemu-2.7.0/util/qemu-coroutine-lock.c:60 #3 0x000055c1656f39a9 in qemu_coroutine_enter (co=0x7fba8dd20430) at /build/pb-qemu-pssKUp/pb-qemu-2.7.0/util/qemu-coroutine.c:119 #4 0x000055c1656f3e74 in qemu_co_queue_run_restart (co=0x7fba8dd14ea0) at /build/pb-qemu-pssKUp/pb-qemu-2.7.0/util/qemu-coroutine-lock.c:60 #5 0x000055c1656f39a9 in qemu_coroutine_enter (co=0x7fba8dd14ea0) at /build/pb-qemu-pssKUp/pb-qemu-2.7.0/util/qemu-coroutine.c:119 #6 0x000055c1656f3e74 in qemu_co_queue_run_restart (co=0x7fba80c11dc0) at /build/pb-qemu-pssKUp/pb-qemu-2.7.0/util/qemu-coroutine-lock.c:60 #7 0x000055c1656f39a9 in qemu_coroutine_enter (co=0x7fba80c11dc0) at /build/pb-qemu-pssKUp/pb-qemu-2.7.0/util/qemu-coroutine.c:119 #8 0x000055c1656f3e74 in qemu_co_queue_run_restart (co=0x7fba8dd0bd70) at /build/pb-qemu-pssKUp/pb-qemu-2.7.0/util/qemu-coroutine-lock.c:60 #9 0x000055c1656f39a9 in qemu_coroutine_enter (co=0x7fba8dd0bd70) at /build/pb-qemu-pssKUp/pb-qemu-2.7.0/util/qemu-coroutine.c:119 #10 0x000055c1656f3fa0 in qemu_co_enter_next (queue=queue@entry=0x55c1669e75e0) at /build/pb-qemu-pssKUp/pb-qemu-2.7.0/util/qemu-coroutine-lock.c:106 #11 0x000055c165692060 in timer_cb (blk=0x55c1669e7590, is_write=) at /build/pb-qemu-pssKUp/pb-qemu-2.7.0/block/throttle-groups.c:400 #12 0x000055c16564f615 in timerlist_run_timers (timer_list=0x55c166a53e80) at /build/pb-qemu-pssKUp/pb-qemu-2.7.0/qemu-timer.c:528 #13 0x000055c16564f679 in timerlistgroup_run_timers (tlg=tlg@entry=0x55c167c81cf8) at /build/pb-qemu-pssKUp/pb-qemu-2.7.0/qemu-timer.c:564 #14 0x000055c16564ff47 in aio_dispatch (ctx=ctx@entry=0x55c167c81bb0) at /build/pb-qemu-pssKUp/pb-qemu-2.7.0/aio-posix.c:357 #15 0x000055c1656500e8 in aio_poll (ctx=0x55c167c81bb0, blocking=) at /build/pb-qemu-pssKUp/pb-qemu-2.7.0/aio-posix.c:479 #16 0x000055c1654b1c79 in iothread_run (opaque=0x55c167c81960) at /build/pb-qemu-pssKUp/pb-qemu-2.7.0/iothread.c:46 #17 0x00007fbc4b64f0a4 in allocate_stack (stack=, pdp=, attr=0x0) at allocatestack.c:416 #18 __pthread_create_2_1 (newthread=, attr=, start_routine=, arg=) at pthread_create.c:539 Backtrace stopped: Cannot access memory at address 0x8 ------------------------------------------------------------------------------------------------- The code that crashes is this ------------------------------------------------------------------------------------------------- void qemu_co_queue_run_restart(Coroutine *co) { Coroutine *next; trace_qemu_co_queue_run_restart(co); while ((next = QSIMPLEQ_FIRST(&co->co_queue_wakeup))) { QSIMPLEQ_REMOVE_HEAD(&co->co_queue_wakeup, co_queue_next); <--- Crash occurs here this time qemu_coroutine_enter(next); } } ------------------------------------------------------------------------------------------------- Expanding the macro QSIMPLEQ_REMOVE_HEAD gives us ------------------------------------------------------------------------------------------------- #define QSIMPLEQ_REMOVE_HEAD(head, field) do { \ if (((head)->sqh_first = (head)->sqh_first->field.sqe_next) == NULL)\ (head)->sqh_last = &(head)->sqh_first; \ } while (/*CONSTCOND*/0) ------------------------------------------------------------------------------------------------- which corrsponds to ------------------------------------------------------------------------------------------------- if (((&co->co_queue_wakeup)->sqh_first = (&co->co_queue_wakeup)->sqh_first->co_queue_next.sqe_next) == NULL)\ (&co->co_queue_wakeup)->sqh_last = &(&co->co_queue_wakeup)->sqh_first; ------------------------------------------------------------------------------------------------- Debugging the list we see ------------------------------------------------------------------------------------------------- (gdb) print *(&co->co_queue_wakeup->sqh_first) $6 = (struct Coroutine *) 0x1000 (gdb) print *(&co->co_queue_wakeup->sqh_first->co_queue_next) Cannot access memory at address 0x1030 ------------------------------------------------------------------------------------------------- So the data in co->co_queue_wakeup->sqh_first is corrupted and represents an invalid address. Any idea why is that? Another stack trace --------------------------------------------------------------------- (gdb) bt #0 qemu_co_queue_run_restart (co=0x7f668be15260) at /build/pb-qemu-pssKUp/pb-qemu-2.7.0/util/qemu-coroutine-lock.c:59 #1 0x0000564cb19f59a9 in qemu_coroutine_enter (co=0x7f668be15260) at /build/pb-qemu-pssKUp/pb-qemu-2.7.0/util/qemu-coroutine.c:119 #2 0x0000564cb19f5fa0 in qemu_co_enter_next (queue=queue@entry=0x564cb35e55e0) at /build/pb-qemu-pssKUp/pb-qemu-2.7.0/util/qemu-coroutine-lock.c:106 #3 0x0000564cb1994060 in timer_cb (blk=0x564cb35e5590, is_write=) at /build/pb-qemu-pssKUp/pb-qemu-2.7.0/block/throttle-groups.c:400 #4 0x0000564cb1951615 in timerlist_run_timers (timer_list=0x564cb3651e80) at /build/pb-qemu-pssKUp/pb-qemu-2.7.0/qemu-timer.c:528 #5 0x0000564cb1951679 in timerlistgroup_run_timers (tlg=tlg@entry=0x564cb487fcf8) at /build/pb-qemu-pssKUp/pb-qemu-2.7.0/qemu-timer.c:564 #6 0x0000564cb1951f47 in aio_dispatch (ctx=ctx@entry=0x564cb487fbb0) at /build/pb-qemu-pssKUp/pb-qemu-2.7.0/aio-posix.c:357 #7 0x0000564cb19520e8 in aio_poll (ctx=0x564cb487fbb0, blocking=) at /build/pb-qemu-pssKUp/pb-qemu-2.7.0/aio-posix.c:479 #8 0x0000564cb17b3c79 in iothread_run (opaque=0x564cb487f960) at /build/pb-qemu-pssKUp/pb-qemu-2.7.0/iothread.c:46 #9 0x00007f684b0b30a4 in allocate_stack (stack=, pdp=, attr=0x0) at allocatestack.c:416 #10 __pthread_create_2_1 (newthread=, attr=, start_routine=, arg=) at pthread_create.c:539 Backtrace stopped: Cannot access memory at address 0x8 ----------------------------------------------------------------------------------------------- Here is a bit of examination of the data ----------------------------------------------------------------------------------------------- (gdb) print *(&co->co_queue_wakeup->sqh_first) $1 = (struct Coroutine *) 0xc54b578 (gdb) print *(&co->co_queue_wakeup->sqh_first->co_queue_next) Cannot access memory at address 0xc54b5a8 ----------------------------------------------------------------------------------------------- Again seems to be pointing at an invalid address. It's worth noting here that it the number of restarted and re-run co-routines is much smaller. A third stack trace It generates the following stack trace --------------------------------------------------------------------- (gdb) bt #0 qemu_co_queue_run_restart (co=0x7f75ed30dbc0) at /build/pb-qemu-pssKUp/pb-qemu-2.7.0/util/qemu-coroutine-lock.c:59 #1 0x00005619274829a9 in qemu_coroutine_enter (co=0x7f75ed30dbc0) at /build/pb-qemu-pssKUp/pb-qemu-2.7.0/util/qemu-coroutine.c:119 #2 0x0000561927482e74 in qemu_co_queue_run_restart (co=0x7f75f1c0f200) at /build/pb-qemu-pssKUp/pb-qemu-2.7.0/util/qemu-coroutine-lock.c:60 #3 0x00005619274829a9 in qemu_coroutine_enter (co=0x7f75f1c0f200) at /build/pb-qemu-pssKUp/pb-qemu-2.7.0/util/qemu-coroutine.c:119 #4 0x0000561927482e74 in qemu_co_queue_run_restart (co=0x7f75ed304870) at /build/pb-qemu-pssKUp/pb-qemu-2.7.0/util/qemu-coroutine-lock.c:60 #5 0x00005619274829a9 in qemu_coroutine_enter (co=0x7f75ed304870) at /build/pb-qemu-pssKUp/pb-qemu-2.7.0/util/qemu-coroutine.c:119 #6 0x0000561927482e74 in qemu_co_queue_run_restart (co=0x7f75e800fcd0) at /build/pb-qemu-pssKUp/pb-qemu-2.7.0/util/qemu-coroutine-lock.c:60 #7 0x00005619274829a9 in qemu_coroutine_enter (co=0x7f75e800fcd0) at /build/pb-qemu-pssKUp/pb-qemu-2.7.0/util/qemu-coroutine.c:119 #8 0x0000561927482e74 in qemu_co_queue_run_restart (co=0x7f75e800fac0) at /build/pb-qemu-pssKUp/pb-qemu-2.7.0/util/qemu-coroutine-lock.c:60 #9 0x00005619274829a9 in qemu_coroutine_enter (co=0x7f75e800fac0) at /build/pb-qemu-pssKUp/pb-qemu-2.7.0/util/qemu-coroutine.c:119 #10 0x0000561927482e74 in qemu_co_queue_run_restart (co=0x7f75e800f8b0) at /build/pb-qemu-pssKUp/pb-qemu-2.7.0/util/qemu-coroutine-lock.c:60 #11 0x00005619274829a9 in qemu_coroutine_enter (co=0x7f75e800f8b0) at /build/pb-qemu-pssKUp/pb-qemu-2.7.0/util/qemu-coroutine.c:119 #12 0x0000561927482e74 in qemu_co_queue_run_restart (co=0x7f75fbf05570) at /build/pb-qemu-pssKUp/pb-qemu-2.7.0/util/qemu-coroutine-lock.c:60 #13 0x00005619274829a9 in qemu_coroutine_enter (co=0x7f75fbf05570) at /build/pb-qemu-pssKUp/pb-qemu-2.7.0/util/qemu-coroutine.c:119 #14 0x0000561927482e74 in qemu_co_queue_run_restart (co=0x7f75e8009b70) at /build/pb-qemu-pssKUp/pb-qemu-2.7.0/util/qemu-coroutine-lock.c:60 #15 0x00005619274829a9 in qemu_coroutine_enter (co=0x7f75e8009b70) at /build/pb-qemu-pssKUp/pb-qemu-2.7.0/util/qemu-coroutine.c:119 #16 0x0000561927482e74 in qemu_co_queue_run_restart (co=0x7f75e800b5d0) at /build/pb-qemu-pssKUp/pb-qemu-2.7.0/util/qemu-coroutine-lock.c:60 #17 0x00005619274829a9 in qemu_coroutine_enter (co=0x7f75e800b5d0) at /build/pb-qemu-pssKUp/pb-qemu-2.7.0/util/qemu-coroutine.c:119 #18 0x0000561927482e74 in qemu_co_queue_run_restart (co=0x7f75e8008910) at /build/pb-qemu-pssKUp/pb-qemu-2.7.0/util/qemu-coroutine-lock.c:60 #19 0x00005619274829a9 in qemu_coroutine_enter (co=0x7f75e8008910) at /build/pb-qemu-pssKUp/pb-qemu-2.7.0/util/qemu-coroutine.c:119 #20 0x0000561927482e74 in qemu_co_queue_run_restart (co=0x7f75e800f6a0) at /build/pb-qemu-pssKUp/pb-qemu-2.7.0/util/qemu-coroutine-lock.c:60 #21 0x00005619274829a9 in qemu_coroutine_enter (co=0x7f75e800f6a0) at /build/pb-qemu-pssKUp/pb-qemu-2.7.0/util/qemu-coroutine.c:119 #22 0x0000561927482e74 in qemu_co_queue_run_restart (co=0x7f75fbf05100) at /build/pb-qemu-pssKUp/pb-qemu-2.7.0/util/qemu-coroutine-lock.c:60 #23 0x00005619274829a9 in qemu_coroutine_enter (co=0x7f75fbf05100) at /build/pb-qemu-pssKUp/pb-qemu-2.7.0/util/qemu-coroutine.c:119 #24 0x0000561927482e74 in qemu_co_queue_run_restart (co=0x7f75fbf04ee0) at /build/pb-qemu-pssKUp/pb-qemu-2.7.0/util/qemu-coroutine-lock.c:60 #25 0x00005619274829a9 in qemu_coroutine_enter (co=0x7f75fbf04ee0) at /build/pb-qemu-pssKUp/pb-qemu-2.7.0/util/qemu-coroutine.c:119 #26 0x0000561927482e74 in qemu_co_queue_run_restart (co=0x7f75ed301c50) at /build/pb-qemu-pssKUp/pb-qemu-2.7.0/util/qemu-coroutine-lock.c:60 #27 0x00005619274829a9 in qemu_coroutine_enter (co=0x7f75ed301c50) at /build/pb-qemu-pssKUp/pb-qemu-2.7.0/util/qemu-coroutine.c:119 #28 0x0000561927482e74 in qemu_co_queue_run_restart (co=0x7f75ed315270) at /build/pb-qemu-pssKUp/pb-qemu-2.7.0/util/qemu-coroutine-lock.c:60 #29 0x00005619274829a9 in qemu_coroutine_enter (co=0x7f75ed315270) at /build/pb-qemu-pssKUp/pb-qemu-2.7.0/util/qemu-coroutine.c:119 #30 0x0000561927482e74 in qemu_co_queue_run_restart (co=0x7f75ed31cf10) at /build/pb-qemu-pssKUp/pb-qemu-2.7.0/util/qemu-coroutine-lock.c:60 #31 0x00005619274829a9 in qemu_coroutine_enter (co=0x7f75ed31cf10) at /build/pb-qemu-pssKUp/pb-qemu-2.7.0/util/qemu-coroutine.c:119 #32 0x0000561927482e74 in qemu_co_queue_run_restart (co=0x7f75e800a970) at /build/pb-qemu-pssKUp/pb-qemu-2.7.0/util/qemu-coroutine-lock.c:60 #33 0x00005619274829a9 in qemu_coroutine_enter (co=0x7f75e800a970) at /build/pb-qemu-pssKUp/pb-qemu-2.7.0/util/qemu-coroutine.c:119 #34 0x0000561927482e74 in qemu_co_queue_run_restart (co=0x7f75e8007df0) at /build/pb-qemu-pssKUp/pb-qemu-2.7.0/util/qemu-coroutine-lock.c:60 #35 0x00005619274829a9 in qemu_coroutine_enter (co=0x7f75e8007df0) at /build/pb-qemu-pssKUp/pb-qemu-2.7.0/util/qemu-coroutine.c:119 #36 0x0000561927482e74 in qemu_co_queue_run_restart (co=0x7f75e8005960) at /build/pb-qemu-pssKUp/pb-qemu-2.7.0/util/qemu-coroutine-lock.c:60 #37 0x00005619274829a9 in qemu_coroutine_enter (co=0x7f75e8005960) at /build/pb-qemu-pssKUp/pb-qemu-2.7.0/util/qemu-coroutine.c:119 #38 0x0000561927482e74 in qemu_co_queue_run_restart (co=0x7f75e800e1b0) at /build/pb-qemu-pssKUp/pb-qemu-2.7.0/util/qemu-coroutine-lock.c:60 #39 0x00005619274829a9 in qemu_coroutine_enter (co=0x7f75e800e1b0) at /build/pb-qemu-pssKUp/pb-qemu-2.7.0/util/qemu-coroutine.c:119 #40 0x0000561927482e74 in qemu_co_queue_run_restart (co=0x7f75e8000a00) at /build/pb-qemu-pssKUp/pb-qemu-2.7.0/util/qemu-coroutine-lock.c:60 #41 0x00005619274829a9 in qemu_coroutine_enter (co=0x7f75e8000a00) at /build/pb-qemu-pssKUp/pb-qemu-2.7.0/util/qemu-coroutine.c:119 #42 0x0000561927482e74 in qemu_co_queue_run_restart (co=0x7f75e8007900) at /build/pb-qemu-pssKUp/pb-qemu-2.7.0/util/qemu-coroutine-lock.c:60 #43 0x00005619274829a9 in qemu_coroutine_enter (co=0x7f75e8007900) at /build/pb-qemu-pssKUp/pb-qemu-2.7.0/util/qemu-coroutine.c:119 #44 0x0000561927482fa0 in qemu_co_enter_next (queue=queue@entry=0x5619288d15e0) at /build/pb-qemu-pssKUp/pb-qemu-2.7.0/util/qemu-coroutine-lock.c:106 #45 0x0000561927421060 in timer_cb (blk=0x5619288d1590, is_write=) at /build/pb-qemu-pssKUp/pb-qemu-2.7.0/block/throttle-groups.c:400 #46 0x00005619273de615 in timerlist_run_timers (timer_list=0x56192893de80) at /build/pb-qemu-pssKUp/pb-qemu-2.7.0/qemu-timer.c:528 #47 0x00005619273de679 in timerlistgroup_run_timers (tlg=tlg@entry=0x561929b6bcf8) at /build/pb-qemu-pssKUp/pb-qemu-2.7.0/qemu-timer.c:564 #48 0x00005619273def47 in aio_dispatch (ctx=ctx@entry=0x561929b6bbb0) at /build/pb-qemu-pssKUp/pb-qemu-2.7.0/aio-posix.c:357 #49 0x00005619273df0e8 in aio_poll (ctx=0x561929b6bbb0, blocking=) at /build/pb-qemu-pssKUp/pb-qemu-2.7.0/aio-posix.c:479 #50 0x0000561927240c79 in iothread_run (opaque=0x561929b6b960) at /build/pb-qemu-pssKUp/pb-qemu-2.7.0/iothread.c:46 #51 0x00007f77b32160a4 in start_thread (arg=0x7f77997ff700) at pthread_create.c:403 #52 0x00007f77b2f4b62d in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:111 --------------------------------------------------------------------- It's also crashing in list traversal. Looking at the contained data we see: --------------------------------------------------------------------- (gdb) print *(&co->co_queue_wakeup->sqh_first) $1 = (struct Coroutine *) 0x1 (gdb) print *(&co->co_queue_wakeup->sqh_first->co_queue_next) Cannot access memory at address 0x31 --------------------------------------------------------------------- So again. Segfault is caused by apparently invalid addresses. And this time it occurs after so many invocations of qemu_co_queue_run_restart() The VMs were running with the following arguments --------------------------------------------------------------------- -m 1024,slots=255,maxmem=256G -M pc-i440fx-2.7 -enable-kvm -nodefconfig -nodefaults -rtc base=utc -netdev tap,ifname=n020133f0895e,id=hostnet6,vhost=on,vhostforce=on,vnet_hdr=off,script=no,downscript=no -device virtio-net-pci,netdev=hostnet6,id=net6,mac=02:01:33:f0:89:5e,bus=pci.0,addr=0x6 -chardev pty,id=charserial0 -device isa-serial,chardev=charserial0,id=serial0 -usb -device usb-tablet,id=input0 -vnc 0.0.0.0:94 -vga qxl -cpu Haswell,+vmx -smp 6,sockets=32,cores=1,maxcpus=64,threads=2 -drive file=/dev/md10,if=none,id=drive-virtio-disk5,format=raw,snapshot=off,aio=native,cache=none -device virtio-blk-pci,bus=pci.0,addr=0x5,drive=drive-virtio-disk5,num-queues=3,id=virtio-disk5,bootindex=1 -S --------------------------------------------------------------------- Could you please retry with the latest stable version (either 2.8.0 or 2.7.1) ... maybe the problem is already fixed there. Unfortunately it'd not be possible to use another version at the moment. Is it possible that someone takes a look at the stack traces? Fixed by commit 528f449f590829b53ea01ed91817a695b540421d