Diffstat (limited to 'results/classifier/118/all/1207686')
-rw-r--r--  results/classifier/118/all/1207686  273
1 files changed, 273 insertions, 0 deletions
diff --git a/results/classifier/118/all/1207686 b/results/classifier/118/all/1207686
new file mode 100644
index 000000000..7ca4ccceb
--- /dev/null
+++ b/results/classifier/118/all/1207686
@@ -0,0 +1,273 @@
+permissions: 0.972
+vnc: 0.959
+user-level: 0.951
+network: 0.949
+peripherals: 0.946
+debug: 0.942
+PID: 0.940
+semantic: 0.932
+register: 0.930
+socket: 0.930
+device: 0.928
+performance: 0.923
+architecture: 0.920
+KVM: 0.917
+boot: 0.916
+kernel: 0.897
+mistranslation: 0.892
+assembly: 0.889
+files: 0.878
+graphic: 0.870
+virtual: 0.867
+risc-v: 0.860
+hypervisor: 0.847
+ppc: 0.813
+VMM: 0.807
+arm: 0.794
+x86: 0.793
+TCG: 0.791
+i386: 0.414
+
+qemu-1.4.0 and onwards, linux kernel 3.2.x, heavy I/O leads to kernel_hung_tasks_timout_secs message and unresponsive qemu-process
+
+Hi,
+
+after some testing I tried to narrow down a problem which was initially reported by some users.
+Seen on different distros - Debian 7.1, Ubuntu 12.04 LTS, IPFire 2.3 as reported so far.
+
+All using some flavour of the Linux 3.2.x kernel.
+
+Under Ubuntu, for example, an upgrade to "Linux 3.8.0-27-generic x86_64" solves the problem.
+The problem can be triggered with a workload like:
+
+spew -v --raw -P -t -i 3 -b 4k -p random -B 4k 1G /tmp/doof.dat
+and, in parallel, some apt-get install/remove/whatever.
+
+That results in a stuck qemu session with the dreaded "kernel_hung_task..." messages.
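+
+[A minimal sketch of the reproducer described above; the package names are the ones from the apt loop quoted later in this thread, everything else is from this report:]
+
+# terminal 1 (in the guest): sustained 4k random write I/O
+spew -v --raw -P -t -i 3 -b 4k -p random -B 4k 1G /tmp/doof.dat
+
+# terminal 2 (in the guest): concurrent package churn
+while true; do
+    apt-get install -y ntp libopts25
+    apt-get remove -y ntp libopts25
+done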
+
+A typical command line is as follows:
+
+/usr/local/qemu-1.6.0/bin/qemu-system-x86_64 -usbdevice tablet -enable-kvm -daemonize -pidfile /var/run/qemu-server/760.pid -monitor unix:/var/run/qemu-server/760.mon,server,nowait -vnc unix:/var/run/qemu-server/760.vnc,password -qmp unix:/var/run/qemu-server/760.qmp,server,nowait -nodefaults -serial none -parallel none -device virtio-net-pci,mac=00:F1:70:00:2F:80,netdev=vlan0d0 -netdev type=tap,id=vlan0d0,ifname=tap760i0d0,script=/etc/fcms/add_if.sh,downscript=/etc/fcms/downscript.sh -name 1155823384-4 -m 512 -vga cirrus -k de -smp sockets=1,cores=1 -device virtio-blk-pci,drive=virtio0 -drive format=raw,file=rbd:1155823384/vm-760-disk-1.rbd:rbd_cache=false,cache=writeback,if=none,id=virtio0,media=disk,index=0,aio=native -drive format=raw,file=rbd:1155823384/vm-760-swap-1.rbd:rbd_cache=false,cache=writeback,if=virtio,media=disk,index=1,aio=native -drive if=ide,media=cdrom,id=ide1-cd0,readonly=on -drive if=ide,media=cdrom,id=ide1-cd1,readonly=on -boot order=dc
+
+no "system_reset", "sendkey ctrl-alt-delete" or "q" in monitoring-session is accepted, need to hard-kill the process.
+
+Please give any advice on what to do for tracing/debugging, because the number of tickets here is rising, and no one knows what users are doing inside their VMs.
+
+Kind regards,
+
+Oliver Francke.
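+
+[One common way to capture state from a hung QEMU process - a generic debugging sketch, not advice from this thread - is to attach gdb and dump all thread backtraces; the pidfile path is the one from the command line above:]
+
+# attach to the stuck process without killing it
+gdb -p "$(cat /var/run/qemu-server/760.pid)"
+# inside gdb: dump a backtrace of every thread to see where each one is blocked
+#   (gdb) thread apply all bt
+# then detach so the process is left as it was
+#   (gdb) detach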
+
+On Fri, Aug 02, 2013 at 09:58:29AM -0000, Oliver Francke wrote:
+> after some testing I tried to narrow down a problem which was initially reported by some users.
+> Seen on different distros - Debian 7.1, Ubuntu 12.04 LTS, IPFire 2.3 as reported so far.
+> 
+> All using some flavour of the Linux 3.2.x kernel.
+> 
+> Under Ubuntu, for example, an upgrade to "Linux 3.8.0-27-generic x86_64" solves the problem.
+
+Is that a guest kernel upgrade?
+
+> The problem can be triggered with a workload like:
+> 
+> spew -v --raw -P -t -i 3 -b 4k -p random -B 4k 1G /tmp/doof.dat
+> and, in parallel, some apt-get install/remove/whatever.
+> 
+> That results in a stuck qemu session with the dreaded
+> "kernel_hung_task..." messages.
+> 
+> A typical command line is as follows:
+> 
+> /usr/local/qemu-1.6.0/bin/qemu-system-x86_64 -usbdevice tablet -enable-
+> kvm -daemonize -pidfile /var/run/qemu-server/760.pid -monitor
+> unix:/var/run/qemu-server/760.mon,server,nowait -vnc unix:/var/run/qemu-
+> server/760.vnc,password -qmp unix:/var/run/qemu-
+> server/760.qmp,server,nowait -nodefaults -serial none -parallel none
+> -device virtio-net-pci,mac=00:F1:70:00:2F:80,netdev=vlan0d0 -netdev
+> type=tap,id=vlan0d0,ifname=tap760i0d0,script=/etc/fcms/add_if.sh,downscript=/etc/fcms/downscript.sh
+> -name 1155823384-4 -m 512 -vga cirrus -k de -smp sockets=1,cores=1
+> -device virtio-blk-pci,drive=virtio0 -drive
+> format=raw,file=rbd:1155823384/vm-760-disk-1.rbd:rbd_cache=false,cache=writeback,if=none,id=virtio0,media=disk,index=0,aio=native
+> -drive
+> format=raw,file=rbd:1155823384/vm-760-swap-1.rbd:rbd_cache=false,cache=writeback,if=virtio,media=disk,index=1,aio=native
+> -drive if=ide,media=cdrom,id=ide1-cd0,readonly=on -drive
+> if=ide,media=cdrom,id=ide1-cd1,readonly=on -boot order=dc
+> 
+> no "system_reset", "sendkey ctrl-alt-delete" or "q" in monitoring-
+> session is accepted, need to hard-kill the process.
+
+Yesterday I saw a possibly related report on IRC.  It was a Windows
+guest running under OpenStack with images on Ceph.
+
+They reported that the QEMU process would lock up - ping would not work
+and their management tools showed 0 CPU activity for the guest.
+However, they were able to "kick" the guest by taking a VNC screenshot
+(I think).  Then it would come back to life.
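+
+[A sketch of such a "kick" via QMP instead of VNC, assuming the -qmp unix:... socket from the command line quoted above; screendump is a standard QMP command and the output filename is arbitrary:]
+
+# connect to the QMP socket; the server first prints a greeting banner
+socat - UNIX-CONNECT:/var/run/qemu-server/760.qmp
+# then type, one per line:
+{"execute": "qmp_capabilities"}
+{"execute": "screendump", "arguments": {"filename": "/tmp/vm760-kick.ppm"}}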
+
+If you have a Linux guest that is reporting kernel_hung_task, then it
+could be a similar scenario.
+
+Please confirm that the hung task message is from inside the guest.
+
+If you are able to reproduce this and have an alternative non-Ceph
+storage pool, please try that since Ceph is common to both these bug
+reports.
+
+Stefan
+
+
+Hi Stefan,
+
+On 02.08.2013 at 17:24, Stefan Hajnoczi <email address hidden> wrote:
+
+> On Fri, Aug 02, 2013 at 09:58:29AM -0000, Oliver Francke wrote:
+>> after some testing I tried to narrow down a problem which was initially reported by some users.
+>> Seen on different distros - Debian 7.1, Ubuntu 12.04 LTS, IPFire 2.3 as reported so far.
+>> 
+>> All using some flavour of the Linux 3.2.x kernel.
+>> 
+>> Under Ubuntu, for example, an upgrade to "Linux 3.8.0-27-generic x86_64" solves the problem.
+> 
+> Is that a guest kernel upgrade?
+
+yeah, sorry if that was not clear enough.
+
+> 
+>> The problem can be triggered with a workload like:
+>> 
+>> spew -v --raw -P -t -i 3 -b 4k -p random -B 4k 1G /tmp/doof.dat
+>> and, in parallel, some apt-get install/remove/whatever.
+>> 
+>> That results in a stuck qemu session with the dreaded
+>> "kernel_hung_task..." messages.
+>> 
+>> A typical command line is as follows:
+>> 
+>> /usr/local/qemu-1.6.0/bin/qemu-system-x86_64 -usbdevice tablet -enable-
+>> kvm -daemonize -pidfile /var/run/qemu-server/760.pid -monitor
+>> unix:/var/run/qemu-server/760.mon,server,nowait -vnc unix:/var/run/qemu-
+>> server/760.vnc,password -qmp unix:/var/run/qemu-
+>> server/760.qmp,server,nowait -nodefaults -serial none -parallel none
+>> -device virtio-net-pci,mac=00:F1:70:00:2F:80,netdev=vlan0d0 -netdev
+>> type=tap,id=vlan0d0,ifname=tap760i0d0,script=/etc/fcms/add_if.sh,downscript=/etc/fcms/downscript.sh
+>> -name 1155823384-4 -m 512 -vga cirrus -k de -smp sockets=1,cores=1
+>> -device virtio-blk-pci,drive=virtio0 -drive
+>> format=raw,file=rbd:1155823384/vm-760-disk-1.rbd:rbd_cache=false,cache=writeback,if=none,id=virtio0,media=disk,index=0,aio=native
+>> -drive
+>> format=raw,file=rbd:1155823384/vm-760-swap-1.rbd:rbd_cache=false,cache=writeback,if=virtio,media=disk,index=1,aio=native
+>> -drive if=ide,media=cdrom,id=ide1-cd0,readonly=on -drive
+>> if=ide,media=cdrom,id=ide1-cd1,readonly=on -boot order=dc
+>> 
+>> no "system_reset", "sendkey ctrl-alt-delete" or "q" in monitoring-
+>> session is accepted, need to hard-kill the process.
+> 
+> Yesterday I saw a possibly related report on IRC.  It was a Windows
+> guest running under OpenStack with images on Ceph.
+> 
+> They reported that the QEMU process would lock up - ping would not work
+> and their management tools showed 0 CPU activity for the guest.
+> However, they were able to "kick" the guest by taking a VNC screenshot
+> (I think).  Then it would come back to life.
+> 
+> If you have a Linux guest that is reporting kernel_hung_task, then it
+> could be a similar scenario.
+> 
+> Please confirm that the hung task message is from inside the guest.
+> 
+
+confirmed.
+
+> If you are able to reproduce this and have an alternative non-Ceph
+> storage pool, please try that since Ceph is common to both these bug
+> reports.
+> 
+
+I can reproduce it with kernel 3.2.something + qemu-1.[456] (never spent much time on 1.3) and high I/O.
+Later that day I took this VM and converted it to local-storage qcow2; no problem with any kernel. I have already asked on the ceph-users list for assistance, especially from Josh (if he's not on summer holiday ;) )
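+
+[A sketch of that conversion, assuming a qemu-img built with rbd support; the pool/image name is taken from the -drive option above, the target path is arbitrary:]
+
+# copy the image out of Ceph into a local qcow2 file
+qemu-img convert -f raw -O qcow2 \
+    rbd:1155823384/vm-760-disk-1.rbd \
+    /var/lib/images/vm-760-disk-1.qcow2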
+
+What is strange: I have a session open via the VNC console and run a loop like:
+
+while true; do apt-get install -y ntp libopts25; apt-get remove -y ntp libopts25; done
+and, in parallel, spew as described; the apt "session" dies and one can see the hung_task messages, but I can still restart the spew test.
+Just for completeness.
+
+Thanks for your attention,
+
+Oliver.
+
+> Stefan
+
+
+Hi,
+
+I opened a ticket with the Ceph guys, and it turned out to be a bug in "librados aio flush".
+
+With the latest "wip-librados-aio-flush (bobtail)" branch I get no error even under _very_ high load.
+
+Thanks for the attention ;)
+
+Oliver.
+
+
+Closing as "Invalid" since this was not a QEMU bug according to comment #3.
+