KVM: 0.799
mistranslation: 0.735
vnc: 0.708
network: 0.577
boot: 0.557
semantic: 0.552
assembly: 0.539
device: 0.523
instruction: 0.504
other: 0.504
socket: 0.494
graphic: 0.438

virtio-serial loses writes when used over virtio-mmio

virtio-serial appears to lose writes, but only when used on top of virtio-mmio. The scenario is this:

/home/rjones/d/qemu/arm-softmmu/qemu-system-arm \
    -global virtio-blk-device.scsi=off \
    -nodefconfig \
    -nodefaults \
    -nographic \
    -M vexpress-a15 \
    -machine accel=kvm:tcg \
    -m 500 \
    -no-reboot \
    -kernel /home/rjones/d/libguestfs/tmp/.guestfs-1001/kernel.27944 \
    -dtb /home/rjones/d/libguestfs/tmp/.guestfs-1001/dtb.27944 \
    -initrd /home/rjones/d/libguestfs/tmp/.guestfs-1001/initrd.27944 \
    -device virtio-scsi-device,id=scsi \
    -drive file=/home/rjones/d/libguestfs/tmp/libguestfsLa9dE2/scratch.1,cache=unsafe,format=raw,id=hd0,if=none \
    -device scsi-hd,drive=hd0 \
    -drive file=/home/rjones/d/libguestfs/tmp/.guestfs-1001/root.27944,snapshot=on,id=appliance,cache=unsafe,if=none \
    -device scsi-hd,drive=appliance \
    -device virtio-serial-device \
    -serial stdio \
    -chardev socket,path=/home/rjones/d/libguestfs/tmp/libguestfsLa9dE2/guestfsd.sock,id=channel0 \
    -device virtserialport,chardev=channel0,name=org.libguestfs.channel.0 \
    -append 'panic=1 mem=500M console=ttyAMA0 udevtimeout=600 no_timer_check acpi=off printk.time=1 cgroup_disable=memory root=/dev/sdb selinux=0 guestfs_verbose=1 TERM=xterm-256color'

After the guest starts up, a daemon writes 4 bytes to a virtio-serial socket. The host side reads these 4 bytes correctly and writes back a 64-byte message. The guest never sees this message.

I enabled virtio-mmio debugging, and this is what is printed (## = my comment):

## guest opens the socket:
trying to open virtio-serial channel '/dev/virtio-ports/org.libguestfs.channel.0'
virtio_mmio: virtio_mmio_write offset 0x50 value 0x3
opened the socket, sock = 3
udevadm settle
## guest writes 4 bytes to the socket:
virtio_mmio: virtio_mmio_write offset 0x50 value 0x5
virtio_mmio: virtio_mmio setting IRQ 1
virtio_mmio: virtio_mmio_read offset 0x60
virtio_mmio: virtio_mmio_write offset 0x64 value 0x1
virtio_mmio: virtio_mmio setting IRQ 0
sent magic GUESTFS_LAUNCH_FLAG
## host reads 4 bytes successfully:
main_loop libguestfs: recv_from_daemon: received GUESTFS_LAUNCH_FLAG
libguestfs: [14605ms] appliance is up
Guest launched OK.
## host writes 64 bytes to socket:
libguestfs: writing the data to the socket (size = 64)
waiting for next request
libguestfs: data written OK
## hangs here forever with guest in read() call never receiving any data

I am using qemu from git today (2d1fe1873a984).

strace -f of qemu when it fails.

Notes:

 - fd = 6 is the Unix domain socket connected to virtio-serial
 - only one 4-byte write occurs on this socket (the expected guest -> host communication)
 - the socket isn't read at all (even though the library on the other side has written to it)
 - the socket is never added to any poll/ppoll syscall, so it's no wonder that qemu never sees any data on the socket

Recall that this bug only happens intermittently. This is an strace -f of qemu when it happens to work.
Notes:

 - fd = 6 is the Unix domain socket
 - there are the expected number of recvmsg & write calls, all with the correct sizes
 - this time qemu adds the socket to ppoll

I can reproduce this bug on a second ARM machine which doesn't have KVM (ie. using TCG). Note it's still linked to virtio-mmio.

On 09/12/13 14:04, Richard Jones wrote:

>   -chardev socket,path=/home/rjones/d/libguestfs/tmp/libguestfsLa9dE2/guestfsd.sock,id=channel0 \

Is this a socket that libguestfs pre-creates on the host side?

> the socket is never added to any poll/ppoll syscall, so it's no
> wonder that qemu never sees any data on the socket

This should be happening:

  qemu_chr_open_socket() [qemu-char.c]
    unix_connect_opts() [util/qemu-sockets.c]
      qemu_socket()
      connect()
      qemu_set_nonblock() [util/oslib-posix.c]
    qemu_chr_open_socket_fd()
      socket_set_nodelay() [util/osdep.c]
      io_channel_from_socket()
        g_io_channel_unix_new()
      tcp_chr_connect()
        io_add_watch_poll()
          g_source_new()
          g_source_attach()
          g_source_unref()
        qemu_chr_be_generic_open()

io_add_watch_poll() should make sure the fd is polled starting with the
next main loop iteration.

Interestingly, even in the "successful" case, there's a slew of ppoll()
calls between connect() returning 6, and the first ppoll() that actually
covers fd=6.

Laszlo

> Is this a socket that libguestfs pre-creates on the host side?

Yes it is:
https://github.com/libguestfs/libguestfs/blob/master/src/launch-direct.c#L208

You mention a scenario that might cause this, but that appears to apply
when the socket is opened. Note that the guest did send 4 bytes
successfully (received OK at the host). The lost write occurs when the
host next tries to send a message back to the guest.

On 09/16/13 16:39, Richard Jones wrote:
>> Is this a socket that libguestfs pre-creates on the host side?
>
> Yes it is:
> https://github.com/libguestfs/libguestfs/blob/master/src/launch-direct.c#L208
>
> You mention a scenario that might cause this, but that appears to apply
> when the socket is opened. Note that the guest did send 4 bytes
> successfully (received OK at the host). The lost write occurs when the
> host next tries to send a message back to the guest.

Which would be the first time ever that a GLib event loop context messed
up only for reading gets exposed.

In other words, if the action

  register fd 6 for reading in the GLib main loop context

fails, that wouldn't prevent qemu from *writing* to the UNIX domain socket.

In both traces, the IO-thread (thread-id 8488 in the successful case,
and thread-id 7586 in the failing case) is the one opening / registering
etc. fd 6. The IO-thread is also the one calling ppoll().

However, all write(6, ...) syscalls are issued by one of the VCPU
threads (thread-id 8490 in the successful case, and thread-id 7588 in
the failing case).

Hmmmm. Normally (as in, virtio-pci), when a VCPU thread (running KVM)
executes guest code that sends data to the host via virtio, KVM kicks
the "host notifier" eventfd.

Once this "host notifier" eventfd is kicked, the IO-thread should do:

  virtio_queue_host_notifier_read()
    virtio_queue_notify_vq()
      vq->handle_output()
        handle_output() [hw/char/virtio-serial-bus.c]
          do_flush_queued_data()
            vsc->have_data()
              flush_buf() [hw/char/virtio-console.c]
                qemu_chr_fe_write()
                ... goes to the unix domain socket ...
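That eventfd kick is nothing more exotic than one fd in the IO-thread's
pollfd set becoming readable. A self-contained sketch of the mechanism
(plain Linux/POSIX with hypothetical names, not QEMU code) shows why a
kicked host notifier reliably wakes a ppoll()ing thread:

#define _GNU_SOURCE
#include <sys/eventfd.h>
#include <poll.h>
#include <pthread.h>
#include <stdint.h>
#include <stdio.h>
#include <unistd.h>

static int host_notifier;

static void *vcpu_thread(void *arg)
{
    uint64_t one = 1;
    sleep(1);                                  /* guest does some work */
    write(host_notifier, &one, sizeof(one));   /* the "kick"           */
    return NULL;
}

int main(void)
{
    pthread_t t;
    host_notifier = eventfd(0, EFD_NONBLOCK | EFD_CLOEXEC);
    pthread_create(&t, NULL, vcpu_thread, NULL);

    /* IO-thread: the eventfd is part of the pollfd set, so the kick
     * from the other thread breaks ppoll() out of its sleep. */
    struct pollfd fds[] = { { .fd = host_notifier, .events = POLLIN } };
    ppoll(fds, 1, NULL, NULL);
    printf("woken by host notifier\n");
    pthread_join(t, NULL);
    return 0;
}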
When virtio-mmio is used though, the same seems to happen in the VCPU thread:

  virtio_mmio_write()
    virtio_queue_notify()
      virtio_queue_notify_vq()
        ...same as above...

A long shot:

(a) With virtio-pci:

(a1) guest writes to virtio-serial port,
(a2) KVM sets the host notifier eventfd "pending",
(a3) the IO-thread sees that in the main loop / ppoll(), and copies the
data to the UNIX domain socket (the backend),
(a4) host-side libguestfs reads the data and responds,
(a5) the IO-thread reads the data from the UNIX domain socket,
(a6) the IO-thread pushes the data to the guest.

(b) With virtio-mmio:

(b1) guest writes to virtio-serial port,
(b2) the VCPU thread in qemu reads the data (virtio-mmio) and copies it
to the UNIX domain socket,
(b3) host-side libguestfs reads the data and responds,
(b4) the IO-thread is not (yet?) ready to read the data from the UNIX
domain socket.

I can't quite pin it down, but I think that in the virtio-pci case, the
fact that everything runs through the IO-thread automatically serializes
the connection to the UNIX domain socket (and its addition to the GLib
main loop context) with the message from the guest. Due to the KVM
eventfd (the "host notifier") everything goes through the same ppoll().
Maybe it doesn't enforce any theoretical serialization; it might just
add a sufficiently long delay that there's never a problem in practice.

Whereas in the virtio-mmio case, the initial write to the UNIX domain
socket, and the response from host-side libguestfs, run unfettered. I
imagine something like:

- (IO-thread) connect to socket
- (IO-thread) add fd to main loop context
- (guest) write to virtio-serial port
- (VCPU thread) copy data to UNIX domain socket
- (host libguestfs) read request, write response to UNIX domain socket
- (IO-thread) "I should probably check readiness on that socket
  sometime"

I don't know why the IO-thread doesn't get there *eventually*.

What happens if you add a five second delay to libguestfs, before
writing the response?

Laszlo

On 16 September 2013 17:13, Laszlo Ersek <email address hidden> wrote:
> Hmmmm. Normally (as in, virtio-pci), when a VCPU thread (running KVM)
> executes guest code that sends data to the host via virtio, KVM kicks
> the "host notifier" eventfd.

What happens in the virtio-pci without eventfd case?
(eg virtio-pci on a non-x86 host)

Also, IIRC Alex said they'd had an annoying "data gets lost"
issue with the s390 virtio transports too...

-- PMM

> What happens if you add a five second delay to libguestfs,
> before writing the response?

No change. Still hangs in the same place.

On 09/17/13 10:09, Peter Maydell wrote:
> On 16 September 2013 17:13, Laszlo Ersek <email address hidden> wrote:
>> Hmmmm. Normally (as in, virtio-pci), when a VCPU thread (running KVM)
>> executes guest code that sends data to the host via virtio, KVM kicks
>> the "host notifier" eventfd.
>
> What happens in the virtio-pci without eventfd case?
> (eg virtio-pci on a non-x86 host)

I'm confused. I think Anthony or Michael could answer better.

There's at least three cases here I guess (KVM + eventfd, KVM without
eventfd (enforceable eg. with the "ioeventfd" property for virtio
devices), and TCG). We're probably talking about the third case.

I think we end up in

  virtio_pci_config_ops.write == virtio_pci_config_write
    virtio_ioport_write()
      virtio_queue_notify()
      ... the "usual" stuff ...
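Either way (virtio-mmio, or virtio-pci without an eventfd), the notify
lands in virtio_queue_notify() synchronously on the VCPU thread. A loose
sketch of the virtio-mmio variant seen in the traces (modeled on
hw/virtio/virtio-mmio.c but simplified, with a hypothetical
opaque_to_vdev() helper; not the verbatim source):

/* Loose sketch, not the verbatim QEMU code: the MMIO write is
 * dispatched synchronously on the VCPU thread, so everything
 * downstream -- including qemu_chr_fe_write() to the UNIX domain
 * socket -- also runs on the VCPU thread.  No eventfd is kicked that
 * could wake the IO-thread's ppoll(). */
static void virtio_mmio_write(void *opaque, hwaddr offset,
                              uint64_t value, unsigned size)
{
    VirtIODevice *vdev = opaque_to_vdev(opaque);   /* hypothetical helper */

    switch (offset) {
    case 0x50:  /* QueueNotify -- matches "offset 0x50" in the log */
        virtio_queue_notify(vdev, value);  /* runs vq->handle_output() here */
        break;
    /* ... other registers elided ... */
    }
}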
As far as I know TCG supports exactly one VCPU thread, but that's still
separate from the IO-thread. In that case the above could trigger the
problem similarly to virtio-mmio, I guess...

I think we should debug into GLib, independently of virtio. What annoys
me most is the huge number of ppoll()s in Rich's trace between
connecting to the UNIX domain socket and actually checking it for
read-readiness. The fd in question should show up in the first ppoll()
after connect().

My email might not make any sense. Sorry.
Laszlo

> There's at least three cases here I guess (KVM + eventfd, KVM without
> eventfd (enforceable eg. with the "ioeventfd" property for virtio
> devices), and TCG). We're probably talking about the third case.

To clarify on this point: I have reproduced this bug on two different ARM
machines, one using KVM and one using TCG.

In both cases they are ./configure'd without any special ioeventfd-related
options, which appears to mean CONFIG_EVENTFD=y (in both cases).

In both cases I'm using a single vCPU.

On 09/17/13 11:51, Richard Jones wrote:
>> There's at least three cases here I guess (KVM + eventfd, KVM without
>> eventfd (enforceable eg. with the "ioeventfd" property for virtio
>> devices), and TCG). We're probably talking about the third case.
>
> To clarify on this point: I have reproduced this bug on two different ARM
> machines, one using KVM and one using TCG.
>
> In both cases they are ./configure'd without any special ioeventfd-related
> options, which appears to mean CONFIG_EVENTFD=y (in both cases).
>
> In both cases I'm using a single vCPU.
>

I think I have a theory now; it's quite convoluted.

The problem is a deadlock in ppoll() that is *masked* by unrelated file
descriptor traffic in all of the apparently working cases.

I wrote some ad-hoc debug patches, and this is the log leading up to the
hang:

  io_watch_poll_prepare: chardev:channel0 was_active:0 now_active:0
  qemu_poll_ns: timeout=4281013151888
    poll entry #0 fd 3
    poll entry #1 fd 5
    poll entry #2 fd 0
    poll entry #3 fd 11
    poll entry #4 fd 4
  trying to open virtio-serial channel '/dev/virtio-ports/org.libguestfs.channel.0'
  opened the socket, sock = 3
  udevadm settle
  libguestfs: recv_from_daemon: received GUESTFS_LAUNCH_FLAG
  libguestfs: [21734ms] appliance is up
  Guest launched OK.
  libguestfs: writing the data to the socket (size = 64)
  sent magic GUESTFS_LAUNCH_FLAG
  main_loop waiting for next request
  libguestfs: data written OK
  <HANG>

Setup call tree for the backend (ie. the UNIX domain socket):

   1  qemu_chr_open_socket() [qemu-char.c]
   2    unix_connect_opts() [util/qemu-sockets.c]
   3      qemu_socket()
   4      connect()
   5    qemu_chr_open_socket_fd() [qemu-char.c]
   6      io_channel_from_socket()
   7        g_io_channel_unix_new()
   8      tcp_chr_connect()
   9        io_add_watch_poll()
  10          g_source_new()
  11          g_source_attach()

This part connects to libguestfs's UNIX domain socket (the new socket
file descriptor, returned on line 3, is fd 6), and it registers a few
callbacks. Notably, the above doesn't try to add fd 6 to the set of
polled file descriptors.

Then, the setup call tree for the frontend (the virtio-serial port) is
as follows:

  12  virtconsole_initfn() [hw/char/virtio-console.c]
  13    qemu_chr_add_handlers() [qemu-char.c]

This reaches into the chardev (ie. the backend referenced by the
frontend, label "channel0"), and sets further callbacks.
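For reference, the pattern that io_add_watch_poll() sets up can be
sketched in a few lines of GLib. This is a minimal sketch with
hypothetical names (the real qemu-char.c code creates and destroys a
child GIOChannel watch via g_io_create_watch() rather than calling
g_source_add_poll() directly): a GSource whose prepare() callback
decides, on every main loop iteration, whether the backend fd is handed
to poll() at all.

#include <glib.h>

/* Minimal sketch of the io_watch_poll_prepare() pattern; hypothetical
 * names, not the verbatim QEMU code. */
typedef struct {
    GSource  source;
    GPollFD  pfd;                           /* backend fd, e.g. fd 6      */
    gboolean polled;                        /* fd currently in poll set?  */
    gboolean (*frontend_can_read)(void);    /* plays chr_can_read()       */
} WatchPollSource;

static gboolean watch_poll_prepare(GSource *src, gint *timeout)
{
    WatchPollSource *wps = (WatchPollSource *)src;
    gboolean now_active = wps->frontend_can_read();

    if (now_active && !wps->polled) {
        g_source_add_poll(src, &wps->pfd);     /* fd enters the poll set */
        wps->polled = TRUE;
    } else if (!now_active && wps->polled) {
        g_source_remove_poll(src, &wps->pfd);  /* fd leaves the poll set */
        wps->polled = FALSE;
    }
    *timeout = -1;   /* block indefinitely; only polled fds can wake us */
    return FALSE;
}

static gboolean watch_poll_check(GSource *src)
{
    WatchPollSource *wps = (WatchPollSource *)src;
    return wps->polled && (wps->pfd.revents & G_IO_IN);
}

static gboolean watch_poll_dispatch(GSource *src, GSourceFunc cb,
                                    gpointer user_data)
{
    return cb ? cb(user_data) : TRUE;       /* read the backend fd here */
}

static GSourceFuncs watch_poll_funcs = {
    watch_poll_prepare, watch_poll_check, watch_poll_dispatch, NULL,
};

Such a source would be created with g_source_new(&watch_poll_funcs,
sizeof(WatchPollSource)) and attached with g_source_attach(). The point
to notice: if frontend_can_read() returns FALSE when prepare() runs,
the fd is simply absent from the subsequent ppoll(), and only traffic
on some *other* fd can give prepare() another chance to run.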
The following seems to lead up to the hang:

  14  os_host_main_loop_wait() [main-loop.c]
  15    glib_pollfds_fill()
  16      g_main_context_prepare()
  17        io_watch_poll_prepare() [qemu-char.c]
  18          chr_can_read() [hw/char/virtio-console.c]
  19            virtio_serial_guest_ready() [hw/char/virtio-serial-bus.c]
  20
  21              if (use_multiport(port->vser) && !port->guest_connected) {
  22                  return 0;
  23              }
  24
  25              virtqueue_get_avail_bytes()
  26          g_io_create_watch() // conditionally
  27    qemu_poll_ns() [qemu-timer.c]
  28      ppoll()

Line 15: glib_pollfds_fill() prepares the array of file descriptors for
polling. As a first step,

Line 16: it calls g_main_context_prepare(). This GLib function runs the
"prepare" callbacks for the GSources in the main context.

The GSource for fd 6 has been allocated on line 10 above, and its
"prepare" callback has been set to io_watch_poll_prepare() there. It is
called on line 17.

Line 17: io_watch_poll_prepare() is a crucial function. It decides
whether fd 6 (the backend fd) will be added to the set of pollfds or
not.

It checks whether the frontend has become writeable (ie. it must have
been unwriteable up to now, but it must be writeable now). If so, a
(persistent) watch is created (on line 26), which is the action that
includes fd 6 in the set of pollfds after all. If there is no change in
the status of the frontend, the watch is not changed.

io_watch_poll_prepare() checks for the writeability of the frontend (ie.
the virtio-serial port) via the "fd_can_read" callback. This has been
set to chr_can_read() on line 13, inside virtconsole_initfn().

So, the frontend-writeability check happens in chr_can_read(), which
simply calls:

Line 19: virtio_serial_guest_ready(). This function *normally* checks
for the available room in the virtqueue (the guest receives serial port
data from the host by submitting "receive requests" that must be filled
in by the host); see line 25.

However, virtio_serial_guest_ready() first verifies whether the guest
has connected to the virtio-serial port at all. If not, then the
function reports the frontend unwriteable (lines 21-23).

Now, right before the hang, the guest hasn't yet connected to the
virtio-serial port. Therefore line 22 fires (= the virtio-serial port is
unwriteable), which in turn results in *no* watch being created for the
backend. Consequently, the UNIX domain socket (fd 6) is not added to the
set of pollfds:

  io_watch_poll_prepare: chardev:channel0 was_active:0 now_active:0
  qemu_poll_ns: timeout=4281013151888
    poll entry #0 fd 3
    poll entry #1 fd 5
    poll entry #2 fd 0
    poll entry #3 fd 11
    poll entry #4 fd 4

At this point the IO-thread is blocked in ppoll().

Then, the guest connects to the serial port, and sends data.

  trying to open virtio-serial channel '/dev/virtio-ports/org.libguestfs.channel.0'
  opened the socket, sock = 3
  udevadm settle

As discussed before, this guest-to-host transfer is handled by the VCPU
thread, and the data is written to fd 6 (the UNIX domain socket). The
host-side libguestfs component reads it, and answers.

  libguestfs: recv_from_daemon: received GUESTFS_LAUNCH_FLAG
  libguestfs: [21734ms] appliance is up
  Guest launched OK.
  libguestfs: writing the data to the socket (size = 64)
  sent magic GUESTFS_LAUNCH_FLAG
  main_loop waiting for next request   # note, this is a libguestfs message!
  libguestfs: data written OK
  <HANG>

Unfortunately, ppoll() is not watching fd 6 at all, hence this
deadlocks.
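The lost wakeup can be reproduced outside QEMU in a few lines. A
self-contained demonstration (plain POSIX with hypothetical names, not
QEMU code): the poller fixes its fd set while the frontend is still
"not ready", and since nothing ever interrupts ppoll(), data arriving
later on the unwatched fd is never noticed.

#define _GNU_SOURCE
#include <poll.h>
#include <pthread.h>
#include <stdio.h>
#include <unistd.h>

static int backend[2];                  /* plays the UNIX domain socket */
static volatile int frontend_ready;     /* plays port->guest_connected  */

static void *vcpu_thread(void *arg)
{
    sleep(1);
    frontend_ready = 1;                 /* guest connects the port...   */
    write(backend[1], "resp", 4);       /* ...and the response arrives  */
    return NULL;
}

int main(void)
{
    pthread_t t;
    pipe(backend);
    pthread_create(&t, NULL, vcpu_thread, NULL);

    /* IO-thread: "prepare" runs while frontend_ready is still 0, so
     * the backend fd is left out of the poll set... */
    struct pollfd fds[2] = { { .fd = 0 /* stdin */, .events = POLLIN } };
    nfds_t nfds = 1;
    if (frontend_ready) {               /* too early: still 0 here */
        fds[nfds++] = (struct pollfd){ .fd = backend[0], .events = POLLIN };
    }

    /* ...and with no other fd traffic, this sleeps forever. */
    ppoll(fds, nfds, NULL, NULL);
    printf("woke up\n");                /* never reached in the hang */
    return 0;
}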
What about the successful cases though? A good proportion of the
attempts succeed.

This is explained by the fact that *other* file descriptors can break
the IO-thread out of ppoll().

- The most common example is KVM *with* eventfd support. The KVM eventfd
  (the host notifier) is part of the pollfd set, and whenever the guest
  sends some data, KVM kicks the eventfd, and ppoll() returns. This
  masks the problem universally.

- In the other two cases, we either have KVM *without* eventfd support,
  or TCG. Rich reproduced the hang under both, and he's seen successful
  (actually: masked deadlock) cases as well, on both.

  In these setups the file descriptor traffic that masks the problem is
  not from a KVM eventfd, hence the wakeup is quite random. There is
  sometimes a perceivable pause between ppoll() going to sleep and
  waking up. At other times there's no other fd traffic, and the
  deadlock persists.

In my testing on Rich's ARM box, the unrelated fd that breaks the
IO-thread out of ppoll() is the eventfd that belongs to the AIO thread
pool. It is fd 11, and it is allocated in:

  0x00512d3c in event_notifier_init (e=0x1f4ad80, active=0) at util/event_notifier-posix.c:34
  34          ret = eventfd(0, EFD_NONBLOCK | EFD_CLOEXEC);
  (gdb) n
  39          if (ret >= 0) {
  (gdb) print ret
  $2 = 11
  (gdb) where
  #0  event_notifier_init (e=0x1f4ad80, active=0) at util/event_notifier-posix.c:39
  #1  0x00300b4c in thread_pool_init_one (pool=0x1f4ad80, ctx=0x1f39e18) at thread-pool.c:295
  #2  0x00300c6c in thread_pool_new (ctx=0x1f39e18) at thread-pool.c:313
  #3  0x0001686c in aio_get_thread_pool (ctx=0x1f39e18) at async.c:238
  #4  0x0006f9d8 in paio_submit (bs=0x1f4a020, fd=10, sector_num=0, qiov=0xbe88c164, nb_sectors=4,
      cb=0x389d8 <bdrv_co_io_em_complete>, opaque=0xb6505ec4, type=1) at block/raw-posix.c:799
  #5  0x0006fb84 in raw_aio_submit (bs=0x1f4a020, sector_num=0, qiov=0xbe88c164, nb_sectors=4,
      cb=0x389d8 <bdrv_co_io_em_complete>, opaque=0xb6505ec4, type=1) at block/raw-posix.c:828
  #6  0x0006fc28 in raw_aio_readv (bs=0x1f4a020, sector_num=0, qiov=0xbe88c164, nb_sectors=4,
      cb=0x389d8 <bdrv_co_io_em_complete>, opaque=0xb6505ec4) at block/raw-posix.c:836
  #7  0x00038b2c in bdrv_co_io_em (bs=0x1f4a020, sector_num=0, nb_sectors=4, iov=0xbe88c164, is_write=false)
      at block.c:3957
  #8  0x00038bf0 in bdrv_co_readv_em (bs=0x1f4a020, sector_num=0, nb_sectors=4, iov=0xbe88c164) at block.c:3974
  #9  0x000349d4 in bdrv_co_do_readv (bs=0x1f4a020, sector_num=0, nb_sectors=4, qiov=0xbe88c164, flags=(unknown: 0))
      at block.c:2619
  #10 0x00033804 in bdrv_rw_co_entry (opaque=0xbe88c0e8) at block.c:2236
  #11 0x0009ba54 in coroutine_trampoline (i0=32811528, i1=0) at coroutine-ucontext.c:118
  #12 0x492fd160 in setcontext () from /usr/lib/libc.so.6
  #13 0x492fd160 in setcontext () from /usr/lib/libc.so.6
  Backtrace stopped: previous frame identical to this frame (corrupt stack?)

And the transitory hang looks like:

  io_watch_poll_prepare: chardev:channel0 was_active:0 now_active:0
  qemu_poll_ns: timeout=4281193192443
    poll entry #0 fd 3
    poll entry #1 fd 5
    poll entry #2 fd 0
    poll entry #3 fd 11
    poll entry #4 fd 4

Again, at this point the IO-thread is blocked in ppoll(),

  trying to open virtio-serial channel '/dev/virtio-ports/org.libguestfs.channel.0'
  opened the socket, sock = 3
  udevadm settle

the guest transferred out some data,

  libguestfs: recv_from_daemon: received GUESTFS_LAUNCH_FLAG
  libguestfs: [20921ms] appliance is up
  Guest launched OK.
  libguestfs: writing the data to the socket (size = 64)
  sent magic GUESTFS_LAUNCH_FLAG
  main_loop waiting for next request
  libguestfs: data written OK

and the host-side libguestfs has responded. The IO-thread is blocked in
ppoll(), guaranteed, and it doesn't notice the readiness of fd 6 for
reading.

However, the (completely unrelated) AIO thread-pool eventfd is kicked at
that point, and poll returns:

  ppoll(): 1, errno=Success
    poll entry #0 fd 3 events 25 revents 0
    poll entry #1 fd 5 events 1 revents 0
    poll entry #2 fd 0 events 1 revents 0
    poll entry #3 fd 11 events 1 revents 1
    poll entry #4 fd 4 events 1 revents 0

This in turn allows the IO-thread to run os_host_main_loop_wait()
again, and *now* we're seeing the activation of fd 6 (its frontend, the
virtio-serial port, has been connected by the guest in the meantime and
is now writeable):

  io_watch_poll_prepare: chardev:channel0 was_active:0 now_active:1
  qemu_poll_ns: timeout=0
    poll entry #0 fd 3
    poll entry #1 fd 5
    poll entry #2 fd 0
    poll entry #3 fd 11
    poll entry #4 fd 4
    poll entry #5 fd 6

And stuff works as expected from here on.

The VCPU thread needs to interrupt the IO-thread's ppoll() call
explicitly.

Basically, when the chardev's attached frontend (in this case, the
virtio-serial port) experiences a change that would cause it to report
writeability in io_watch_poll_prepare() -- lines 17-18 --, it must
interrupt ppoll().

The following call tree seems relevant, but I'm not sure if it would be
appropriate. When the guest message

  trying to open virtio-serial channel '/dev/virtio-ports/org.libguestfs.channel.0'

is printed, the following call chain is executed in the VCPU thread:

  #0  qemu_chr_fe_set_open (chr=0xf22190, fe_open=1) at qemu-char.c:3404
  #1  0x001079dc in set_guest_connected (port=0x1134f00, guest_connected=1) at hw/char/virtio-console.c:83
  #2  0x003cfd94 in handle_control_message (vser=0x1124360, buf=0xb50005f8, len=8)
      at /home/rjones/d/qemu/hw/char/virtio-serial-bus.c:379
  #3  0x003d0020 in control_out (vdev=0x1124360, vq=0x11246b0) at /home/rjones/d/qemu/hw/char/virtio-serial-bus.c:416
  #4  0x0044afe4 in virtio_queue_notify_vq (vq=0x11246b0) at /home/rjones/d/qemu/hw/virtio/virtio.c:720
  #5  0x0044b054 in virtio_queue_notify (vdev=0x1124360, n=3) at /home/rjones/d/qemu/hw/virtio/virtio.c:726
  #6  0x00271f30 in virtio_mmio_write (opaque=0x11278c8, offset=80, value=3, size=4) at hw/virtio/virtio-mmio.c:264
  #7  0x00456aac in memory_region_write_accessor (mr=0x1128ba8, addr=80, value=0xb5972b08, size=4, shift=0,
      mask=4294967295) at /home/rjones/d/qemu/memory.c:440
  #8  0x00456c90 in access_with_adjusted_size (addr=80, value=0xb5972b08, size=4, access_size_min=1,
      access_size_max=4, access=0x4569d0 <memory_region_write_accessor>, mr=0x1128ba8)
      at /home/rjones/d/qemu/memory.c:477
  #9  0x0045955c in memory_region_dispatch_write (mr=0x1128ba8, addr=80, data=3, size=4)
      at /home/rjones/d/qemu/memory.c:984
  #10 0x0045cee0 in io_mem_write (mr=0x1128ba8, addr=80, val=3, size=4) at /home/rjones/d/qemu/memory.c:1748
  #11 0x0035d8dc in address_space_rw (as=0xa982f8 <address_space_memory>, addr=471008336, buf=0xb6f3d028 "\003",
      len=4, is_write=true) at /home/rjones/d/qemu/exec.c:1954
  #12 0x0035ddf0 in cpu_physical_memory_rw (addr=471008336, buf=0xb6f3d028 "\003", len=4, is_write=1)
      at /home/rjones/d/qemu/exec.c:2033
  #13 0x00453000 in kvm_cpu_exec (cpu=0x1097020) at /home/rjones/d/qemu/kvm-all.c:1665
  #14 0x0034ca94 in qemu_kvm_cpu_thread_fn (arg=0x1097020) at /home/rjones/d/qemu/cpus.c:802
  #15 0x494c6bc0 in start_thread () from /usr/lib/libpthread.so.0

Unfortunately, the leaf (ie. qemu_chr_fe_set_open()) doesn't do anything
here; the only chardev that sets the "chr_set_fe_open" callback is the
spicevmc backend.

I think the "socket" chardev might want to implement "chr_set_fe_open",
kicking a (new) global eventfd, sending some signal to the IO-thread, or
interrupting ppoll() in some other way. A new global eventfd just for
this purpose seems quite the kludge, but it shouldn't be hard to
implement. It needs no handler at all.
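A minimal sketch of that idea (hypothetical, not the actual upstream
fix): give the socket chardev a chr_set_fe_open hook that interrupts the
IO-thread's ppoll() whenever the frontend's open state changes. QEMU's
existing qemu_notify_event() already serves as exactly this kind of
cross-thread main-loop kick, which would avoid a brand-new eventfd.

/* Hypothetical sketch for qemu-char.c, not the actual upstream fix.
 * After the kick, the next main loop iteration re-runs
 * io_watch_poll_prepare(), sees the frontend writable, and finally
 * adds the backend fd to the pollfd set. */
static void tcp_chr_set_fe_open(struct CharDriverState *chr, int fe_open)
{
    if (fe_open) {
        qemu_notify_event();   /* wake the IO-thread out of ppoll() */
    }
}

/* ...wired up in qemu_chr_open_socket():
 *     chr->chr_set_fe_open = tcp_chr_set_fe_open;
 */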
Thanks
Laszlo

FWIW I am able to reproduce this quite easily on aarch64 too.

My test program is:
https://github.com/libguestfs/libguestfs/blob/master/tests/qemu/qemu-speed-test.c

and you use it like this:

  qemu-speed-test --virtio-serial-upload

(You can also test virtio-serial downloads and a few other things, but
those don't appear to deadlock.)

Slowing down the upload, even just by enabling debugging, is sufficient
to make the problem go away most of the time.

I am testing with qemu from git (f45c56e0166e86d3b309ae72f4cb8e3d0949c7ef).

I don't know how to close bugs in Launchpad, but this one can be closed
for a couple of reasons:

(1) I benchmarked virtio-mmio the other day using qemu-speed-test on aarch64
and I did not encounter the bug.

(2) aarch64 has supported virtio-pci for a while, so virtio-mmio is effectively
obsolete.

Fixed upstream, see previous comment.