path: root/results/classifier/zero-shot/111/KVM/1224444
KVM: 0.129
vnc: 0.101
boot: 0.076
network: 0.074
debug: 0.073
PID: 0.071
socket: 0.070
other: 0.069
device: 0.067
permissions: 0.065
semantic: 0.064
performance: 0.055
files: 0.046
graphic: 0.039
KVM: 0.359
debug: 0.215
device: 0.092
boot: 0.085
socket: 0.061
semantic: 0.034
PID: 0.031
files: 0.030
other: 0.029
performance: 0.026
network: 0.012
permissions: 0.011
graphic: 0.009
vnc: 0.005

virtio-serial loses writes when used over virtio-mmio

virtio-serial appears to lose writes, but only when used on top of virtio-mmio.  The scenario is this:

/home/rjones/d/qemu/arm-softmmu/qemu-system-arm \
    -global virtio-blk-device.scsi=off \
    -nodefconfig \
    -nodefaults \
    -nographic \
    -M vexpress-a15 \
    -machine accel=kvm:tcg \
    -m 500 \
    -no-reboot \
    -kernel /home/rjones/d/libguestfs/tmp/.guestfs-1001/kernel.27944 \
    -dtb /home/rjones/d/libguestfs/tmp/.guestfs-1001/dtb.27944 \
    -initrd /home/rjones/d/libguestfs/tmp/.guestfs-1001/initrd.27944 \
    -device virtio-scsi-device,id=scsi \
    -drive file=/home/rjones/d/libguestfs/tmp/libguestfsLa9dE2/scratch.1,cache=unsafe,format=raw,id=hd0,if=none \
    -device scsi-hd,drive=hd0 \
    -drive file=/home/rjones/d/libguestfs/tmp/.guestfs-1001/root.27944,snapshot=on,id=appliance,cache=unsafe,if=none \
    -device scsi-hd,drive=appliance \
    -device virtio-serial-device \
    -serial stdio \
    -chardev socket,path=/home/rjones/d/libguestfs/tmp/libguestfsLa9dE2/guestfsd.sock,id=channel0 \
    -device virtserialport,chardev=channel0,name=org.libguestfs.channel.0 \
    -append 'panic=1 mem=500M console=ttyAMA0 udevtimeout=600 no_timer_check acpi=off printk.time=1 cgroup_disable=memory root=/dev/sdb selinux=0 guestfs_verbose=1 TERM=xterm-256color'

After the guest starts up, a daemon writes 4 bytes to a virtio-serial socket.  The host side reads these 4 bytes correctly and writes a 64 byte message.  The guest never sees this message.

I enabled virtio-mmio debugging, and this is what is printed (## = my comment):

## guest opens the socket:
trying to open virtio-serial channel '/dev/virtio-ports/org.libguestfs.channel.0'
virtio_mmio: virtio_mmio_write offset 0x50 value 0x3
opened the socket, sock = 3
udevadm settle
## guest writes 4 bytes to the socket:
virtio_mmio: virtio_mmio_write offset 0x50 value 0x5
virtio_mmio: virtio_mmio setting IRQ 1
virtio_mmio: virtio_mmio_read offset 0x60
virtio_mmio: virtio_mmio_write offset 0x64 value 0x1
virtio_mmio: virtio_mmio setting IRQ 0
sent magic GUESTFS_LAUNCH_FLAG
## host reads 4 bytes successfully:
main_loop libguestfs: recv_from_daemon: received GUESTFS_LAUNCH_FLAG
libguestfs: [14605ms] appliance is up
Guest launched OK.
## host writes 64 bytes to socket:
libguestfs: writing the data to the socket (size = 64)
waiting for next request
libguestfs: data written OK
## hangs here forever with guest in read() call never receiving any data

I am using qemu from git today (2d1fe1873a984).

strace -f of qemu when it fails.

Notes:

 - fd = 6 is the Unix domain socket connected to virtio-serial
 - only one 4 byte write occurs to this socket (expected guest -> host communication)
 - the socket isn't read at all (even though the library on the other side has written)
 - the socket is never added to any poll/ppoll syscall, so it's no wonder that qemu never sees any data on the socket


Recall this bug only happens intermittently.  This is an strace -f of qemu when it happens to work.

Notes:

 - fd = 6 is the Unix domain socket
 - there are an expected number of recvmsg & writes, all with the correct sizes
 - this time qemu adds the socket to ppoll

I can reproduce this bug on a second ARM machine which doesn't have KVM (ie. using TCG).  Note it's still linked to virtio-mmio.

On 09/12/13 14:04, Richard Jones wrote:

> +     -chardev socket,path=/home/rjones/d/libguestfs/tmp/libguestfsLa9dE2/guestfsd.sock,id=channel0 \

Is this a socket that libguestfs pre-creates on the host-side?

> the socket is never added to any poll/ppoll syscall, so it's no
> wonder that qemu never sees any data on the socket


This should be happening:

qemu_chr_open_socket() [qemu-char.c]
  unix_connect_opts() [util/qemu-sockets.c]
    qemu_socket()
    connect()
  qemu_set_nonblock() [util/oslib-posix.c]
  qemu_chr_open_socket_fd()
    socket_set_nodelay() [util/osdep.c]
    io_channel_from_socket()
      g_io_channel_unix_new()
    tcp_chr_connect()
      io_add_watch_poll()
        g_source_new()
        g_source_attach()
        g_source_unref()
      qemu_chr_be_generic_open()

io_add_watch_poll() should make sure the fd is polled starting with the
next main loop iteration.

Interestingly, even in the "successful" case, there's a slew of ppoll()
calls between connect() returning 6, and the first ppoll() that actually
covers fd=6.

Laszlo


> Is this a socket that libguestfs pre-creates on the host-side?

Yes it is:
https://github.com/libguestfs/libguestfs/blob/master/src/launch-direct.c#L208

You mention a scenario that might cause this.  But that appears to be when the socket is opened.  Note that the guest did send 4 bytes successfully (received OK at the host).  The lost write occurs when the host next tries to send a message back to the guest.

On 09/16/13 16:39, Richard Jones wrote:
>> Is this a socket that libguestfs pre-creates on the host-side?
> 
> Yes it is:
> https://github.com/libguestfs/libguestfs/blob/master/src/launch-direct.c#L208
> 
> You mention a scenario that might cause this.  But that appears to be
> when the socket is opened.  Note that the guest did send 4 bytes
> successfully (received OK at the host).  The lost write occurs when the
> host next tries to send a message back to the guest.

Which would be the first point ever where a GLib event loop context
messed up only for reading gets exposed.

In other words, if the action

  register fd 6 for reading in the GLib main loop context

fails, that wouldn't prevent qemu from *writing* to the UNIX domain socket.

In both traces, the IO-thread (thread-id 8488 in the successful case,
and thread-id 7586 in the failing case) is the one opening / registering
etc. fd 6. The IO-thread is also the one calling ppoll().

However, all write(6, ...) syscalls are issued by one of the VCPU
threads (thread-id 8490 in the successful case, and thread-id 7588 in
the failing case).

Hmmmm. Normally (as in, virtio-pci), when a VCPU thread (running KVM)
executes guest code that sends data to the host via virtio, KVM kicks
the "host notifier" eventfd.

Once this "host notifier" eventfd is kicked, the IO thread should do:

  virtio_queue_host_notifier_read()
    virtio_queue_notify_vq()
      vq->handle_output()
        handle_output() [hw/char/virtio-serial-bus.c]
          do_flush_queued_data()
            vsc->have_data()
              flush_buf() [hw/char/virtio-console.c]
                qemu_chr_fe_write()
                  ... goes to the unix domain socket ...

When virtio-mmio is used though, the same seems to happen in the VCPU thread:

  virtio_mmio_write()
    virtio_queue_notify()
      virtio_queue_notify_vq()
        ...same as above...



A long shot:

(a) With virtio-pci:

(a1) guest writes to virtio-serial port,
(a2) KVM sets the host notifier eventfd "pending",
(a3) the IO thread sees that in the main loop / ppoll(), and copies the
data to the UNIX domain socket (the backend),
(a4) host-side libguestfs reads the data and responds,
(a5) the IO-thread reads the data from the UNIX domain socket,
(a6) the IO-thread pushes the data to the guest.

(b) with virtio-mmio:

(b1) guest writes to virtio-serial port,
(b2) the VCPU thread in qemu reads the data (virtio-mmio) and copies it
to the UNIX domain socket,
(b3) host-side libguestfs reads the data and responds,
(b4) the IO-thread is not (yet?) ready to read the data from the UNIX
domain socket.

I can't quite pin it down, but I think that in the virtio-pci case, the
fact that everything runs through the IO-thread automatically serializes
the connection to the UNIX domain socket (and its addition to the GLib
main loop context) with the message from the guest. Due to the KVM
eventfd (the "host notifier") everything goes through the same ppoll().
Maybe it doesn't enforce any theoretical serialization, it might just
add a sufficiently long delay that there's never a problem in practice.

Whereas in the virtio-mmio case, the initial write to the UNIX domain
socket, and the response from host-side libguestfs, runs unfettered. I
imagine something like:

- (IO thread)       connect to socket
- (IO thread)       add fd to main loop context
- (guest)           write to virtio-serial port
- (VCPU thread)     copy data to UNIX domain socket
- (host libguestfs) read req, write resp to UNIX domain socket
- (IO thread)       "I should probably check readiness on that socket
                    sometime"

I don't know why the IO-thread doesn't get there *eventually*.

What happens if you add a five second delay to libguestfs, before
writing the response?

Laszlo



On 16 September 2013 17:13, Laszlo Ersek <email address hidden> wrote:
> Hmmmm. Normally (as in, virtio-pci), when a VCPU thread (running KVM)
> executes guest code that sends data to the host via virtio, KVM kicks
> the "host notifier" eventfd.

What happens in the virtio-pci without eventfd case?
(eg virtio-pci on a non-x86 host)

Also, IIRC Alex said they'd had an annoying "data gets lost"
issue with the s390 virtio transports too...

-- PMM


> What happens if you add a five second delay to libguestfs,
> before writing the response?

No change.  Still hangs in the same place.

On 09/17/13 10:09, Peter Maydell wrote:
> On 16 September 2013 17:13, Laszlo Ersek <email address hidden> wrote:
>> Hmmmm. Normally (as in, virtio-pci), when a VCPU thread (running KVM)
>> executes guest code that sends data to the host via virtio, KVM kicks
>> the "host notifier" eventfd.
> 
> What happens in the virtio-pci without eventfd case?
> (eg virtio-pci on a non-x86 host)

I'm confused. I think Anthony or Michael could answer better.

There's at least three cases here I guess (KVM + eventfd, KVM without
eventfd (enforceable eg. with the "ioeventfd" property for virtio
devices), and TCG). We're probably talking about the third case.

I think we end up in

  virtio_pci_config_ops.write == virtio_pci_config_write
    virtio_ioport_write()
      virtio_queue_notify()
        ... the "usual" stuff ...

As far as I know TCG supports exactly one VCPU thread but that's still
separate from the IO-thread. In that case the above could trigger the
problem similarly to virtio-mmio I guess...

I think we should debug into GLib, independently of virtio. What annoys
me mostly is the huge number of ppoll()s in Rich's trace between
connecting to the UNIX domain socket and actually checking it for
read-readiness. The fd in question should show up in the first ppoll()
after connect().

My email might not make any sense. Sorry.
Laszlo


> There's at least three cases here I guess (KVM + eventfd, KVM without
> eventfd (enforceable eg. with the "ioeventfd" property for virtio
> devices), and TCG). We're probably talking about the third case.

To clarify on this point: I have reproduced this bug on two different ARM
machines, one using KVM and one using TCG.

In both cases they are ./configure'd without any special ioeventfd-related
options, which appears to mean CONFIG_EVENTFD=y (in both cases).

In both cases I'm using a single vCPU.

On 09/17/13 11:51, Richard Jones wrote:
>> There's at least three cases here I guess (KVM + eventfd, KVM without
>> eventfd (enforceable eg. with the "ioeventfd" property for virtio
>> devices), and TCG). We're probably talking about the third case.
> 
> To clarify on this point: I have reproduced this bug on two different ARM
> machines, one using KVM and one using TCG.
> 
> In both cases they are ./configure'd without any special ioeventfd-related
> options, which appears to mean CONFIG_EVENTFD=y (in both cases).
> 
> In both cases I'm using a single vCPU.
> 

I think I have a theory now; it's quite convoluted.

The problem is a deadlock in ppoll() that is *masked* by unrelated file
descriptor traffic in all of the apparently working cases.

I wrote some ad-hoc debug patches, and this is the log leading up to the
hang:

  io_watch_poll_prepare: chardev:channel0 was_active:0 now_active:0
  qemu_poll_ns: timeout=4281013151888
  poll entry #0 fd 3
  poll entry #1 fd 5
  poll entry #2 fd 0
  poll entry #3 fd 11
  poll entry #4 fd 4
  trying to open virtio-serial channel '/dev/virtio-ports/org.libguestfs.channel.0'
  opened the socket, sock = 3
  udevadm settle
  libguestfs: recv_from_daemon: received GUESTFS_LAUNCH_FLAG
  libguestfs: [21734ms] appliance is up
  Guest launched OK.
  libguestfs: writing the data to the socket (size = 64)
  sent magic GUESTFS_LAUNCH_FLAG
  main_loop waiting for next request
  libguestfs: data written OK
  <HANG>

Setup call tree for the backend (ie. the UNIX domain socket):

   1  qemu_chr_open_socket() [qemu-char.c]
   2    unix_connect_opts() [util/qemu-sockets.c]
   3      qemu_socket()
   4      connect()
   5    qemu_chr_open_socket_fd() [qemu-char.c]
   6      io_channel_from_socket()
   7        g_io_channel_unix_new()
   8      tcp_chr_connect()
   9        io_add_watch_poll()
  10          g_source_new()
  11          g_source_attach()

This part connects to libguestfs's UNIX domain socket (the new socket
file descriptor, returned on line 3, is fd 6), and it registers a few
callbacks. Notably, the above doesn't try to add fd 6 to the set of
polled file descriptors.


Then, the setup call tree for the frontend (the virtio-serial port) is
as follows:

  12  virtconsole_initfn() [hw/char/virtio-console.c]
  13    qemu_chr_add_handlers() [qemu-char.c]

This reaches into the chardev (ie. the backend referenced by the
frontend, label "channel0"), and sets further callbacks.


The following seems to lead up to the hang:

  14 os_host_main_loop_wait() [main-loop.c]
  15   glib_pollfds_fill()
  16     g_main_context_prepare()
  17       io_watch_poll_prepare() [qemu-char.c]
  18         chr_can_read() [hw/char/virtio-console.c]
  19           virtio_serial_guest_ready() [hw/char/virtio-serial-bus.c]
  20
  21             if (use_multiport(port->vser) && !port->guest_connected) {
  22                 return 0;
  23             }
  24
  25             virtqueue_get_avail_bytes()
  26         g_io_create_watch() // conditionally
  27   qemu_poll_ns() [qemu-timer.c]
  28     ppoll()

Line 15: glib_pollfds_fill() prepares the array of file descriptors for
polling. As a first step,

Line 16: it calls g_main_context_prepare(). This GLib function runs the
"prepare" callbacks for the GSource's in the main context.

The GSource for fd 6 has been allocated on line 10 above, and its
"prepare" callback has been set to io_watch_poll_prepare() there. It is
called on line 17.

Line 17: io_watch_poll_prepare() is a crucial function. It decides
whether fd 6 (the backend fd) will be added to the set of pollfds or
not.

It checks whether the frontend has become writeable (ie. it must have
been unwriteable up to now, but it must be writeable now). If so, a
(persistent) watch is created (on line 26), which is the action that
includes fd 6 in the set of pollfds after all. If there is no change in
the status of the frontend, the watch is not changed.

io_watch_poll_prepare() checks for the writeability of the frontend (ie.
virtio serial port) by the "fd_can_read" callback. This has been set to
chr_can_read() on line 13, inside virtconsole_initfn().

So, the frontend-writeability check happens in chr_can_read(), which
simply calls:

Line 19: virtio_serial_guest_ready(). This function *normally* checks
for the available room in the virtqueue (the guest receives serial port
data from the host by submitting "receive requests" that must be filled
in by the host); see line 25.

However, virtio_serial_guest_ready() first verifies whether the guest
has connected to the virtio-serial port at all. If not, then the
function will report the frontend unwriteable (lines 21-23).


Now, right before the hang, the guest hasn't yet connected to the
virtio-serial port. Therefore line 22 fires (= virtio-serial port is
unwriteable), which in turn results in *no* watch being created for the
backend. Consequently, the UNIX domain socket (fd 6) is not added to the
set of pollfds:

  io_watch_poll_prepare: chardev:channel0 was_active:0 now_active:0
  qemu_poll_ns: timeout=4281013151888
  poll entry #0 fd 3
  poll entry #1 fd 5
  poll entry #2 fd 0
  poll entry #3 fd 11
  poll entry #4 fd 4

At this point the IO thread is blocked in ppoll().

Then, the guest connects to the serial port, and sends data.

  trying to open virtio-serial channel '/dev/virtio-ports/org.libguestfs.channel.0'
  opened the socket, sock = 3
  udevadm settle

As discussed before, this guest-to-host transfer is handled by the VCPU
thread, and the data is written to fd 6 (the UNIX domain socket). The
host-side libguestfs component reads it, and answers.

  libguestfs: recv_from_daemon: received GUESTFS_LAUNCH_FLAG
  libguestfs: [21734ms] appliance is up
  Guest launched OK.
  libguestfs: writing the data to the socket (size = 64)
  sent magic GUESTFS_LAUNCH_FLAG
  main_loop waiting for next request  # note, this is a libguestfs message!
  libguestfs: data written OK
  <HANG>

Unfortunately, ppoll() is not watching out for fd 6 at all, hence this
deadlocks.


What about the successful cases though? A good proportion of the
attempts succeed.

This is explained by the fact that *other* file descriptors can break
out the IO-thread from ppoll().

- The most common example is KVM *with* eventfd support. The KVM eventfd
  (the host notifier) is part of the pollfd set, and whenever the guest
  sends some data, KVM kicks the eventfd, and ppoll() returns. This
  masks the problem universally.

- In the other two cases, we either have KVM *without* eventfd support,
  or TCG. Rich reproduced the hang under both, and he's seen successful
  (actually: masked deadlock) cases as well, on both.

  In these setups the file descriptor traffic that masks the problem is
  not from a KVM eventfd, hence the wakeup is quite random. There is
  sometimes a perceivable pause between ppoll() going to sleep and
  waking up. At other times there's no other fd traffic, and the
  deadlock persists.

In my testing on Rich's ARM box, the unrelated fd that breaks out the IO-thread
from ppoll() is the eventfd that belongs to the AIO thread pool. It is fd 11,
and it is allocated in:

  0x00512d3c in event_notifier_init (e=0x1f4ad80, active=0) at util/event_notifier-posix.c:34
  34          ret = eventfd(0, EFD_NONBLOCK | EFD_CLOEXEC);
  (gdb) n
  39          if (ret >= 0) {
  (gdb) print ret
  $2 = 11
  (gdb) where
  #0  event_notifier_init (e=0x1f4ad80, active=0) at util/event_notifier-posix.c:39
  #1  0x00300b4c in thread_pool_init_one (pool=0x1f4ad80, ctx=0x1f39e18) at thread-pool.c:295
  #2  0x00300c6c in thread_pool_new (ctx=0x1f39e18) at thread-pool.c:313
  #3  0x0001686c in aio_get_thread_pool (ctx=0x1f39e18) at async.c:238
  #4  0x0006f9d8 in paio_submit (bs=0x1f4a020, fd=10, sector_num=0, qiov=0xbe88c164, nb_sectors=4,
      cb=0x389d8 <bdrv_co_io_em_complete>, opaque=0xb6505ec4, type=1) at block/raw-posix.c:799
  #5  0x0006fb84 in raw_aio_submit (bs=0x1f4a020, sector_num=0, qiov=0xbe88c164, nb_sectors=4,
      cb=0x389d8 <bdrv_co_io_em_complete>, opaque=0xb6505ec4, type=1) at block/raw-posix.c:828
  #6  0x0006fc28 in raw_aio_readv (bs=0x1f4a020, sector_num=0, qiov=0xbe88c164, nb_sectors=4,
      cb=0x389d8 <bdrv_co_io_em_complete>, opaque=0xb6505ec4) at block/raw-posix.c:836
  #7  0x00038b2c in bdrv_co_io_em (bs=0x1f4a020, sector_num=0, nb_sectors=4, iov=0xbe88c164, is_write=false)
      at block.c:3957
  #8  0x00038bf0 in bdrv_co_readv_em (bs=0x1f4a020, sector_num=0, nb_sectors=4, iov=0xbe88c164) at block.c:3974
  #9  0x000349d4 in bdrv_co_do_readv (bs=0x1f4a020, sector_num=0, nb_sectors=4, qiov=0xbe88c164, flags=(unknown: 0))
      at block.c:2619
  #10 0x00033804 in bdrv_rw_co_entry (opaque=0xbe88c0e8) at block.c:2236
  #11 0x0009ba54 in coroutine_trampoline (i0=32811528, i1=0) at coroutine-ucontext.c:118
  #12 0x492fd160 in setcontext () from /usr/lib/libc.so.6
  #13 0x492fd160 in setcontext () from /usr/lib/libc.so.6
  Backtrace stopped: previous frame identical to this frame (corrupt stack?)

And the transitory hang looks like:

  io_watch_poll_prepare: chardev:channel0 was_active:0 now_active:0
  qemu_poll_ns: timeout=4281193192443
  poll entry #0 fd 3
  poll entry #1 fd 5
  poll entry #2 fd 0
  poll entry #3 fd 11
  poll entry #4 fd 4

Again, at this point the IO thread is blocked in ppoll(),

  trying to open virtio-serial channel   '/dev/virtio-ports/org.libguestfs.channel.0'
  opened the socket, sock = 3
  udevadm settle

the guest transferred out some data,

  libguestfs: recv_from_daemon: received GUESTFS_LAUNCH_FLAG
  libguestfs: [20921ms] appliance is up
  Guest launched OK.
  libguestfs: writing the data to the socket (size = 64)
  sent magic GUESTFS_LAUNCH_FLAG
  main_loop waiting for next request
  libguestfs: data written OK

and the host-side libguestfs has responded. The IO-thread is blocked in
ppoll(), guaranteed, and it doesn't notice readiness of fd 6 for
reading.

However, the (completely unrelated) AIO thread-pool eventfd is kicked at
that point, and poll returns:

  ppoll(): 1, errno=Success
  poll entry #0 fd 3 events 25 revents 0
  poll entry #1 fd 5 events 1 revents 0
  poll entry #2 fd 0 events 1 revents 0
  poll entry #3 fd 11 events 1 revents 1
  poll entry #4 fd 4 events 1 revents 0

Which in turn allows the IO-thread to run os_host_main_loop_wait()
again, and *now* we're seeing the activation of fd 6 (its frontend, the
virtio-serial port has been connected by the guest in the meantime and
is now writeable):

  io_watch_poll_prepare: chardev:channel0 was_active:0 now_active:1
  qemu_poll_ns: timeout=0
  poll entry #0 fd 3
  poll entry #1 fd 5
  poll entry #2 fd 0
  poll entry #3 fd 11
  poll entry #4 fd 4
  poll entry #5 fd 6

And stuff works as expected from here on.


The VCPU thread needs to interrupt the IO-thread's ppoll() call
explicitly.

Basically, when the chardev's attached frontend (in this case, the
virtio serial port) experiences a change that would cause it to report
writeability in io_watch_poll_prepare() -- lines 17-18 --, it must
interrupt ppoll().

The following call tree seems relevant, but I'm not sure if it would be
appropriate. When the guest message

  trying to open virtio-serial channel '/dev/virtio-ports/org.libguestfs.channel.0'

is printed, the following call chain is executed in the VCPU thread:

  #0  qemu_chr_fe_set_open (chr=0xf22190, fe_open=1) at qemu-char.c:3404
  #1  0x001079dc in set_guest_connected (port=0x1134f00, guest_connected=1) at hw/char/virtio-console.c:83
  #2  0x003cfd94 in handle_control_message (vser=0x1124360, buf=0xb50005f8, len=8)
      at /home/rjones/d/qemu/hw/char/virtio-serial-bus.c:379
  #3  0x003d0020 in control_out (vdev=0x1124360, vq=0x11246b0) at /home/rjones/d/qemu/hw/char/virtio-serial-bus.c:416
  #4  0x0044afe4 in virtio_queue_notify_vq (vq=0x11246b0) at /home/rjones/d/qemu/hw/virtio/virtio.c:720
  #5  0x0044b054 in virtio_queue_notify (vdev=0x1124360, n=3) at /home/rjones/d/qemu/hw/virtio/virtio.c:726
  #6  0x00271f30 in virtio_mmio_write (opaque=0x11278c8, offset=80, value=3, size=4) at hw/virtio/virtio-mmio.c:264
  #7  0x00456aac in memory_region_write_accessor (mr=0x1128ba8, addr=80, value=0xb5972b08, size=4, shift=0,
      mask=4294967295) at /home/rjones/d/qemu/memory.c:440
  #8  0x00456c90 in access_with_adjusted_size (addr=80, value=0xb5972b08, size=4, access_size_min=1,
      access_size_max=4, access=0x4569d0 <memory_region_write_accessor>, mr=0x1128ba8)
      at /home/rjones/d/qemu/memory.c:477
  #9  0x0045955c in memory_region_dispatch_write (mr=0x1128ba8, addr=80, data=3, size=4)
      at /home/rjones/d/qemu/memory.c:984
  #10 0x0045cee0 in io_mem_write (mr=0x1128ba8, addr=80, val=3, size=4) at /home/rjones/d/qemu/memory.c:1748
  #11 0x0035d8dc in address_space_rw (as=0xa982f8 <address_space_memory>, addr=471008336, buf=0xb6f3d028 "\003",
      len=4, is_write=true) at /home/rjones/d/qemu/exec.c:1954
  #12 0x0035ddf0 in cpu_physical_memory_rw (addr=471008336, buf=0xb6f3d028 "\003", len=4, is_write=1)
      at /home/rjones/d/qemu/exec.c:2033
  #13 0x00453000 in kvm_cpu_exec (cpu=0x1097020) at /home/rjones/d/qemu/kvm-all.c:1665
  #14 0x0034ca94 in qemu_kvm_cpu_thread_fn (arg=0x1097020) at /home/rjones/d/qemu/cpus.c:802
  #15 0x494c6bc0 in start_thread () from /usr/lib/libpthread.so.0

Unfortunately, the leaf (ie. qemu_chr_fe_set_open()) doesn't do anything
here; the only chardev that sets the "chr_set_fe_open" callback is the
spicevmc backend.

I think the "socket" chardev might want to implement "chr_set_fe_open",
kicking a (new) global eventfd, sending some signal to the IO-thread, or
interrupting ppoll() in some other way. A new global eventfd just for
this purpose seems quite the kludge, but it shouldn't be hard to
implement. It needs no handler at all.

Thanks
Laszlo



FWIW I am able to reproduce this quite easily on aarch64 too.

My test program is:
https://github.com/libguestfs/libguestfs/blob/master/tests/qemu/qemu-speed-test.c

and you use it like this:
qemu-speed-test --virtio-serial-upload

(You can also test virtio-serial downloads and a few other things, but those don't appear to deadlock)

Slowing down the upload, even just by enabling debugging, is sufficient to make the problem go away most of the time.

I am testing with qemu from git (f45c56e0166e86d3b309ae72f4cb8e3d0949c7ef).

I don't know how to close bugs in Launchpad, but this one can be closed
for a couple of reasons:

(1) I benchmarked virtio-mmio the other day using qemu-speed-test on aarch64
and I did not encounter the bug.

(2) aarch64 has supported virtio-pci for a while, so virtio-mmio is effectively
obsolete.

Fixed upstream, see previous comment.