1 files changed, 112 insertions, 0 deletions
diff --git a/results/classifier/105/semantic/1670377 b/results/classifier/105/semantic/1670377
new file mode 100644
index 000000000..b9d41afaf
--- /dev/null
+++ b/results/classifier/105/semantic/1670377
@@ -0,0 +1,112 @@
+semantic: 0.751
+graphic: 0.735
+instruction: 0.719
+assembly: 0.707
+device: 0.683
+network: 0.681
+other: 0.665
+vnc: 0.624
+boot: 0.513
+KVM: 0.505
+socket: 0.486
+mistranslation: 0.437
+
+ VNC: short read for zlre data/RDR EndOfStream
+
+In openQA we have a custom VNC client (https://github.com/os-autoinst/os-autoinst/tree/master/consoles), which connects to a QEMU guest and performs actions there (sends keys, handles the pointer, ...). We have several backends (https://github.com/os-autoinst/os-autoinst/tree/master/backend). With the qemu backend we start the QEMU guest *locally* on the openQA worker, which connects to it via VNC and sends commands. That works fine.
+
+However, with the svirt backend we start QEMU on a KVM or Xen host and then connect to it remotely from the openQA worker, so the guest and the worker are different systems. In this scenario it happens fairly often that QEMU stops sending data via VNC while the system is in Grub2:
+
+...
+15:24:15.5341 Debug: /var/lib/openqa/share/tests/sle-12-SP1/tests/installation/bootloader_uefi.pm:50 called testapi::send_key
+15:24:15.5342 27074 <<< testapi::send_key(key='c')
+15:24:15.7361 Debug: /var/lib/openqa/share/tests/sle-12-SP1/tests/installation/bootloader_uefi.pm:51 called testapi::type_string
+15:24:15.7362 27074 <<< testapi::type_string(string='gfxmode=1024x768; terminal_output console; terminal_output gfxterm
+', max_interval=250, wait_screen_changes=0)
+15:24:22.2243 Debug: /var/lib/openqa/share/tests/sle-12-SP1/tests/installation/bootloader_uefi.pm:53 called testapi::send_key
+15:24:22.2244 27074 <<< testapi::send_key(key='esc')
+15:24:22.4255 Debug: /var/lib/openqa/share/tests/sle-12-SP1/tests/installation/bootloader_uefi.pm:79 called testapi::send_key
+15:24:22.4256 27074 <<< testapi::send_key(key='e')
+15:24:22.6264 Debug: /var/lib/openqa/share/tests/sle-12-SP1/tests/installation/bootloader_uefi.pm:81 called testapi::send_key
+15:24:22.6265 27074 <<< testapi::send_key(key='down')
+15:24:22.8273 Debug: /var/lib/openqa/share/tests/sle-12-SP1/tests/installation/bootloader_uefi.pm:81 called testapi::send_key
+15:24:22.8274 27074 <<< testapi::send_key(key='down')
+15:24:23.0282 Debug: /var/lib/openqa/share/tests/sle-12-SP1/tests/installation/bootloader_uefi.pm:81 called testapi::send_key
+15:24:23.0283 27074 <<< testapi::send_key(key='down')
+DIE short read for zlre data 107132 - 995002 at /usr/lib/os-autoinst/consoles/VNC.pm line 978.
+
+ at /usr/lib/os-autoinst/backend/baseclass.pm line 73.
+...
+
+My observation is that it happens only while in Grub, shortly after the resolution has changed. See attached video and log.
+
+Prior to QEMU 2.8.0 I was able to reproduce a similar issue with vncviewer. I started QEMU with a SLES JeOS image, pressed the 'down' key several times in Grub, and vncviewer (Tiger VNC 1.6.0 from openSUSE Leap 42.2) crashed with an rdr::EndOfStream exception. This does not happen with QEMU 2.8.0, but I am still able to reproduce a similar issue via openQA.
+
+/usr/bin/qemu-system-x86_64 -name guest=openQA-SUT-20,debug-threads=on -S -machine pc-i440fx-2.6,accel=kvm,usb=off -m 1024 -realtime mlock=off -smp 1,sockets=1,cores=1,threads=1 -uuid 87535fc1-e693-41b9-813e-834d6fc4cb5a -no-user-config -nodefaults   -rtc base=utc -no-reboot -boot strict=on -device piix3-usb-uhci,id=usb,bus=pci.0,addr=0x1.0x2 -drive file=/var/lib/libvirt/images/openQA-SUT-20.img,format=qcow2,if=none,id=drive-virtio-disk0,cache=unsafe -device virtio-blk-pci,scsi=off,bus=pci.0,addr=0x4,drive=drive-virtio-disk0,id=virtio-disk0,bootindex=1 -netdev user,id=hostnet0 -device virtio-net-pci,netdev=hostnet0,id=net0,mac=52:54:00:12:34:56,bus=pci.0,addr=0x3 -chardev pty,id=charserial0 -device isa-serial,chardev=charserial0,id=serial0 -device virtio-tablet-pci,id=input0,bus=pci.0,addr=0x6 -device virtio-keyboard-pci,id=input1,bus=pci.0,addr=0x7 -vnc 0.0.0.0:20,share=force-shared -device cirrus-vga,id=video0,bus=pci.0,addr=0x2 -device virtio-balloon-pci,id=balloon0,bus=pci.0,addr=0x5 -msg timestamp=on -monitor stdio
+
+Host: openSUSE Leap 42.2 x86_64 KVM or Xen on x86_64 Intel with QEMU 2.6.0.
+Guest: Leap 42.2.
+
+I can't reproduce the problem with QEMU 2.5.0, but I can with any QEMU version from 2.6 RC1 on.
+
+
+
+
+
+It isn't 100% clear from the info provided, but this is almost certainly fixed in 2.9.0 by
+
+commit 537848ee62195fc06c328b1cd64f4218f404a7f1
+Author: Michael Tokarev <email address hidden>
+Date:   Fri Feb 3 12:52:29 2017 +0300
+
+    vnc: do not disconnect on EAGAIN
+    
+    When qemu vnc server is trying to send large update to clients,
+    there might be a situation when system responds with something
+    like EAGAIN, indicating that there's no system memory to send
+    that much data (depending on the network speed, client and server
+    and what is happening).  In this case, something like this happens
+    on qemu side (from strace):
+    
+    sendmsg(16, {msg_name(0)=NULL,
+            msg_iov(1)=[{"\244\"..., 729186}],
+            msg_controllen=0, msg_flags=0}, 0) = 103950
+    sendmsg(16, {msg_name(0)=NULL,
+            msg_iov(1)=[{"lz\346"..., 1559618}],
+            msg_controllen=0, msg_flags=0}, 0) = -1 EAGAIN
+    sendmsg(-1, {msg_name(0)=NULL,
+            msg_iov(1)=[{"lz\346"..., 1559618}],
+            msg_controllen=0, msg_flags=0}, 0) = -1 EBADF
+    
+    qemu closes the socket before the retry, and obviously it gets EBADF
+    when trying to send to -1.
+    
+    This is because there WAS a special handling for EAGAIN, but now it doesn't
+    work anymore, after commit 04d2529da27db512dcbd5e99d0e26d333f16efcc, because
+    now in all error-like cases we initiate vnc disconnect.
+    
+    This change were introduced in qemu 2.6, and caused numerous grief for many
+    people, resulting in their vnc clients reporting sporadic random disconnects
+    from vnc server.
+    
+    Fix that by doing the disconnect only when necessary, i.e. omitting this
+    very case of EAGAIN.
+    
+    Hopefully the existing condition (comparing with QIO_CHANNEL_ERR_BLOCK)
+    is sufficient, as the original code (before the above commit) were
+    checking for other errno values too.
+    
+    Apparently there's another (semi?)bug exist somewhere here, since the
+    code tries to write to fd# -1, it probably should check if the connection
+    is open before. But this isn't important.
+    
+    Signed-off-by: Michael Tokarev <email address hidden>
+    Reviewed-by: Daniel P. Berrange <email address hidden>
+    Message-id: <email address hidden>
+    Fixes: 04d2529da27db512dcbd5e99d0e26d333f16efcc
+    Cc: Daniel P. Berrange <email address hidden>
+    Cc: Gerd Hoffmann <email address hidden>
+    Cc: <email address hidden>
+    Signed-off-by: Gerd Hoffmann <email address hidden>
+
+