Diffstat (limited to 'results/classifier/105/vnc/1207686')
-rw-r--r--   results/classifier/105/vnc/1207686   256
1 file changed, 256 insertions, 0 deletions
diff --git a/results/classifier/105/vnc/1207686 b/results/classifier/105/vnc/1207686
new file mode 100644
index 000000000..64f75b022
--- /dev/null
+++ b/results/classifier/105/vnc/1207686
@@ -0,0 +1,256 @@

vnc: 0.959
network: 0.949
semantic: 0.932
socket: 0.930
device: 0.928
other: 0.919
KVM: 0.917
boot: 0.916
instruction: 0.906
mistranslation: 0.892
assembly: 0.889
graphic: 0.870

qemu-1.4.0 and onwards, linux kernel 3.2.x, heavy I/O leads to kernel_hung_tasks_timout_secs message and unresponsive qemu-process

Hi,

after some testing I tried to narrow down a problem which was initially reported by some users.
Seen on different distros - Debian 7.1, Ubuntu 12.04 LTS, IPFire-2.3 - reported so far.

All use some flavour of the linux-3.2.x kernel.

Under Ubuntu, for example, an upgrade to "Linux 3.8.0-27-generic x86_64" solves the problem.
The problem can be triggered with a workload like:

spew -v --raw -P -t -i 3 -b 4k -p random -B 4k 1G /tmp/doof.dat

and in parallel some apt-get install/remove/whatever.

That results in a stuck qemu session with the dreaded "kernel_hung_task..." messages.

A typical command line is as follows:

/usr/local/qemu-1.6.0/bin/qemu-system-x86_64 -usbdevice tablet -enable-kvm -daemonize -pidfile /var/run/qemu-server/760.pid -monitor unix:/var/run/qemu-server/760.mon,server,nowait -vnc unix:/var/run/qemu-server/760.vnc,password -qmp unix:/var/run/qemu-server/760.qmp,server,nowait -nodefaults -serial none -parallel none -device virtio-net-pci,mac=00:F1:70:00:2F:80,netdev=vlan0d0 -netdev type=tap,id=vlan0d0,ifname=tap760i0d0,script=/etc/fcms/add_if.sh,downscript=/etc/fcms/downscript.sh -name 1155823384-4 -m 512 -vga cirrus -k de -smp sockets=1,cores=1 -device virtio-blk-pci,drive=virtio0 -drive format=raw,file=rbd:1155823384/vm-760-disk-1.rbd:rbd_cache=false,cache=writeback,if=none,id=virtio0,media=disk,index=0,aio=native -drive format=raw,file=rbd:1155823384/vm-760-swap-1.rbd:rbd_cache=false,cache=writeback,if=virtio,media=disk,index=1,aio=native -drive if=ide,media=cdrom,id=ide1-cd0,readonly=on -drive if=ide,media=cdrom,id=ide1-cd1,readonly=on -boot order=dc

No "system_reset", "sendkey ctrl-alt-delete" or "q" in the monitor session is accepted; the process has to be hard-killed.

Please give any advice on what to do for tracing/debugging, because the number of tickets here is rising and no one knows what users are doing inside their VMs.

Kind regards,

Oliver Francke.
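A first data point when this happens is whether the QEMU main loop is still alive and where its threads are blocked. A minimal sketch of what could be collected on the host, assuming gdb and socat are installed there; the pidfile and monitor socket paths are the ones from the command line above:

# Backtrace of all QEMU threads while the guest appears hung
gdb -p "$(cat /var/run/qemu-server/760.pid)" -batch -ex "thread apply all bt"

# Does the human monitor still answer? (sleep gives QEMU time to reply before socat exits)
(echo "info status"; sleep 1) | socat - UNIX-CONNECT:/var/run/qemu-server/760.mon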
On Fri, Aug 02, 2013 at 09:58:29AM -0000, Oliver Francke wrote:
> after some testing I tried to narrow down a problem which was initially reported by some users.
> Seen on different distros - Debian 7.1, Ubuntu 12.04 LTS, IPFire-2.3 - reported so far.
>
> All use some flavour of the linux-3.2.x kernel.
>
> Under Ubuntu, for example, an upgrade to "Linux 3.8.0-27-generic x86_64" solves the problem.

Is that a guest kernel upgrade?

> The problem can be triggered with a workload like:
>
> spew -v --raw -P -t -i 3 -b 4k -p random -B 4k 1G /tmp/doof.dat
>
> and in parallel some apt-get install/remove/whatever.
>
> That results in a stuck qemu session with the dreaded "kernel_hung_task..." messages.
>
> A typical command line is as follows:
>
> /usr/local/qemu-1.6.0/bin/qemu-system-x86_64 -usbdevice tablet -enable-kvm -daemonize -pidfile /var/run/qemu-server/760.pid -monitor unix:/var/run/qemu-server/760.mon,server,nowait -vnc unix:/var/run/qemu-server/760.vnc,password -qmp unix:/var/run/qemu-server/760.qmp,server,nowait -nodefaults -serial none -parallel none -device virtio-net-pci,mac=00:F1:70:00:2F:80,netdev=vlan0d0 -netdev type=tap,id=vlan0d0,ifname=tap760i0d0,script=/etc/fcms/add_if.sh,downscript=/etc/fcms/downscript.sh -name 1155823384-4 -m 512 -vga cirrus -k de -smp sockets=1,cores=1 -device virtio-blk-pci,drive=virtio0 -drive format=raw,file=rbd:1155823384/vm-760-disk-1.rbd:rbd_cache=false,cache=writeback,if=none,id=virtio0,media=disk,index=0,aio=native -drive format=raw,file=rbd:1155823384/vm-760-swap-1.rbd:rbd_cache=false,cache=writeback,if=virtio,media=disk,index=1,aio=native -drive if=ide,media=cdrom,id=ide1-cd0,readonly=on -drive if=ide,media=cdrom,id=ide1-cd1,readonly=on -boot order=dc
>
> No "system_reset", "sendkey ctrl-alt-delete" or "q" in the monitor session is accepted; the process has to be hard-killed.

Yesterday I saw a possibly related report on IRC. It was a Windows guest running under OpenStack with images on Ceph.

They reported that the QEMU process would lock up - ping would not work and their management tools showed 0 CPU activity for the guest. However, they were able to "kick" the guest by taking a VNC screenshot (I think). Then it would come back to life.

If you have a Linux guest that is reporting kernel_hung_task, then it could be a similar scenario.

Please confirm that the hung task message is from inside the guest.

If you are able to reproduce this and have an alternative non-Ceph storage pool, please try that since Ceph is common to both these bug reports.

Stefan
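Regarding the "kick via VNC screenshot" observation: the same nudge can probably be produced without a VNC client by requesting a screendump through the human monitor socket. A minimal sketch, assuming socat is available on the host and reusing the monitor socket path from the command line above; the output file name is arbitrary:

# Ask the stuck guest for a screendump, which has reportedly woken it up
(echo "screendump /tmp/vm-760-kick.ppm"; sleep 1) | socat - UNIX-CONNECT:/var/run/qemu-server/760.mon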
Hi Stefan,

On 02.08.2013 at 17:24, Stefan Hajnoczi <email address hidden> wrote:

> On Fri, Aug 02, 2013 at 09:58:29AM -0000, Oliver Francke wrote:
>> after some testing I tried to narrow down a problem which was initially reported by some users.
>> Seen on different distros - Debian 7.1, Ubuntu 12.04 LTS, IPFire-2.3 - reported so far.
>>
>> All use some flavour of the linux-3.2.x kernel.
>>
>> Under Ubuntu, for example, an upgrade to "Linux 3.8.0-27-generic x86_64" solves the problem.
>
> Is that a guest kernel upgrade?

yeah, sorry if that was not clear enough.

>
>> The problem can be triggered with a workload like:
>>
>> spew -v --raw -P -t -i 3 -b 4k -p random -B 4k 1G /tmp/doof.dat
>>
>> and in parallel some apt-get install/remove/whatever.
>>
>> That results in a stuck qemu session with the dreaded "kernel_hung_task..." messages.
>>
>> A typical command line is as follows:
>>
>> /usr/local/qemu-1.6.0/bin/qemu-system-x86_64 -usbdevice tablet -enable-kvm -daemonize -pidfile /var/run/qemu-server/760.pid -monitor unix:/var/run/qemu-server/760.mon,server,nowait -vnc unix:/var/run/qemu-server/760.vnc,password -qmp unix:/var/run/qemu-server/760.qmp,server,nowait -nodefaults -serial none -parallel none -device virtio-net-pci,mac=00:F1:70:00:2F:80,netdev=vlan0d0 -netdev type=tap,id=vlan0d0,ifname=tap760i0d0,script=/etc/fcms/add_if.sh,downscript=/etc/fcms/downscript.sh -name 1155823384-4 -m 512 -vga cirrus -k de -smp sockets=1,cores=1 -device virtio-blk-pci,drive=virtio0 -drive format=raw,file=rbd:1155823384/vm-760-disk-1.rbd:rbd_cache=false,cache=writeback,if=none,id=virtio0,media=disk,index=0,aio=native -drive format=raw,file=rbd:1155823384/vm-760-swap-1.rbd:rbd_cache=false,cache=writeback,if=virtio,media=disk,index=1,aio=native -drive if=ide,media=cdrom,id=ide1-cd0,readonly=on -drive if=ide,media=cdrom,id=ide1-cd1,readonly=on -boot order=dc
>>
>> No "system_reset", "sendkey ctrl-alt-delete" or "q" in the monitor session is accepted; the process has to be hard-killed.
>
> Yesterday I saw a possibly related report on IRC. It was a Windows guest running under OpenStack with images on Ceph.
>
> They reported that the QEMU process would lock up - ping would not work and their management tools showed 0 CPU activity for the guest. However, they were able to "kick" the guest by taking a VNC screenshot (I think). Then it would come back to life.
>
> If you have a Linux guest that is reporting kernel_hung_task, then it could be a similar scenario.
>
> Please confirm that the hung task message is from inside the guest.
>

confirmed.

> If you are able to reproduce this and have an alternative non-Ceph storage pool, please try that since Ceph is common to both these bug reports.
>

I can reproduce it with kernel 3.2.something + qemu-1.[456] (never spent much time on 1.3) and high I/O.
I took this VM later that day and converted it to local-storage qcow2: no problem with any kernel. I have already asked on the ceph-users list for assistance, especially from Josh (if he's not on summer holiday ;) ).

What is strange: I have a session open via the VNC console and run a loop like

while true; do apt-get install -y ntp libopts25; apt-get remove -y ntp libopts25; done

and, in parallel, spew as described; the apt "session" dies and one can see the hung_task thingy, but I can still restart the spew test.
Just for completeness.

Thanks for your attention,

Oliver.
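For anyone retrying this on a non-Ceph storage pool, the two halves of the reproducer above can be wrapped into one guest-side script. A minimal sketch, assuming the spew tool is installed in the guest; the package names and the /tmp/doof.dat target are simply the ones used in the commands above:

#!/bin/sh
# Random 4k writes in the background, as in the original report
spew -v --raw -P -t -i 3 -b 4k -p random -B 4k 1G /tmp/doof.dat &

# Hammer dpkg/apt in parallel until the guest logs hung-task messages
while true; do
    apt-get install -y ntp libopts25
    apt-get remove  -y ntp libopts25
done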
> Stefan
>
> --
> You received this bug notification because you are subscribed to the bug
> report.
> https://bugs.launchpad.net/bugs/1207686
>
> Title:
>   qemu-1.4.0 and onwards, linux kernel 3.2.x, heavy I/O leads to
>   kernel_hung_tasks_timout_secs message and unresponsive qemu-process
>
> Status in QEMU:
>   New
>
> To manage notifications about this bug go to:
> https://bugs.launchpad.net/qemu/+bug/1207686/+subscriptions


Hi,

I opened a ticket with the Ceph guys, and it turned out to be a bug in "librados aio flush".

With the latest "wip-librados-aio-flush (bobtail)" I get no error even under _very_ high load.

Thanks for the attention ;)

Oliver.


Closing as "Invalid" since this was not a QEMU bug according to comment #3.