| author | Christian Krinitsin <mail@krinitsin.com> | 2025-07-03 19:39:53 +0200 |
|---|---|---|
| committer | Christian Krinitsin <mail@krinitsin.com> | 2025-07-03 19:39:53 +0200 |
| commit | dee4dcba78baf712cab403d47d9db319ab7f95d6 (patch) | |
| tree | 418478faf06786701a56268672f73d6b0b4eb239 /results/classifier/015/unknown | |
| parent | 4d9e26c0333abd39bdbd039dcdb30ed429c475ba (diff) | |
| download | qemu-analysis-dee4dcba78baf712cab403d47d9db319ab7f95d6.tar.gz qemu-analysis-dee4dcba78baf712cab403d47d9db319ab7f95d6.zip | |
restructure results
Diffstat (limited to 'results/classifier/015/unknown')
| -rw-r--r-- | results/classifier/015/unknown/02572177 | 448 |
| -rw-r--r-- | results/classifier/015/unknown/04472277 | 603 |
| -rw-r--r-- | results/classifier/015/unknown/13442371 | 396 |
| -rw-r--r-- | results/classifier/015/unknown/23270873 | 719 |
| -rw-r--r-- | results/classifier/015/unknown/25842545 | 229 |
| -rw-r--r-- | results/classifier/015/unknown/25892827 | 1104 |
| -rw-r--r-- | results/classifier/015/unknown/31349848 | 181 |
| -rw-r--r-- | results/classifier/015/unknown/32484936 | 250 |
| -rw-r--r-- | results/classifier/015/unknown/57756589 | 1448 |
| -rw-r--r-- | results/classifier/015/unknown/70294255 | 1088 |
| -rw-r--r-- | results/classifier/015/unknown/80615920 | 375 |
11 files changed, 0 insertions, 6841 deletions
diff --git a/results/classifier/015/unknown/02572177 b/results/classifier/015/unknown/02572177
deleted file mode 100644
index a19ec8566..000000000
--- a/results/classifier/015/unknown/02572177
+++ /dev/null
@@ -1,448 +0,0 @@
-operating system: 0.816
-permissions: 0.812
-device: 0.791
-architecture: 0.788
-performance: 0.781
-peripherals: 0.775
-virtual: 0.774
-semantic: 0.770
-register: 0.767
-risc-v: 0.761
-debug: 0.756
-assembly: 0.756
-ppc: 0.753
-arm: 0.749
-graphic: 0.747
-socket: 0.742
-user-level: 0.735
-PID: 0.731
-hypervisor: 0.723
-TCG: 0.722
-x86: 0.719
-network: 0.708
-vnc: 0.706
-mistranslation: 0.693
-VMM: 0.692
-kernel: 0.689
-alpha: 0.679
-KVM: 0.669
-boot: 0.658
-files: 0.640
-i386: 0.635
-
-[Qemu-devel] 答复 (Reply): Re: [BUG] COLO failover hang
-
-hi.
-
-I tested the current qemu git master and hit the same problem.
-
-(gdb) bt
-#0  qio_channel_socket_readv (ioc=0x7f65911b4e50, iov=0x7f64ef3fd880, niov=1, fds=0x0, nfds=0x0, errp=0x0) at io/channel-socket.c:461
-#1  0x00007f658e4aa0c2 in qio_channel_read (address@hidden, address@hidden "", address@hidden, address@hidden) at io/channel.c:114
-#2  0x00007f658e3ea990 in channel_get_buffer (opaque=<optimized out>, buf=0x7f65907cb838 "", pos=<optimized out>, size=32768) at migration/qemu-file-channel.c:78
-#3  0x00007f658e3e97fc in qemu_fill_buffer (f=0x7f65907cb800) at migration/qemu-file.c:295
-#4  0x00007f658e3ea2e1 in qemu_peek_byte (address@hidden, address@hidden) at migration/qemu-file.c:555
-#5  0x00007f658e3ea34b in qemu_get_byte (address@hidden) at migration/qemu-file.c:568
-#6  0x00007f658e3ea552 in qemu_get_be32 (address@hidden) at migration/qemu-file.c:648
-#7  0x00007f658e3e66e5 in colo_receive_message (f=0x7f65907cb800, address@hidden) at migration/colo.c:244
-#8  0x00007f658e3e681e in colo_receive_check_message (f=<optimized out>, address@hidden, address@hidden) at migration/colo.c:264
-#9  0x00007f658e3e740e in colo_process_incoming_thread (opaque=0x7f658eb30360 <mis_current.31286>) at migration/colo.c:577
-#10 0x00007f658be09df3 in start_thread () from /lib64/libpthread.so.0
-#11 0x00007f65881983ed in clone () from /lib64/libc.so.6
-
-(gdb) p ioc->name
-$2 = 0x7f658ff7d5c0 "migration-socket-incoming"
-
-(gdb) p ioc->features        <-- does not support QIO_CHANNEL_FEATURE_SHUTDOWN
-$3 = 0
-
-(gdb) bt
-#0  socket_accept_incoming_migration (ioc=0x7fdcceeafa90, condition=G_IO_IN, opaque=0x7fdcceeafa90) at migration/socket.c:137
-#1  0x00007fdcc6966350 in g_main_dispatch (context=<optimized out>) at gmain.c:3054
-#2  g_main_context_dispatch (context=<optimized out>, address@hidden) at gmain.c:3630
-#3  0x00007fdccb8a6dcc in glib_pollfds_poll () at util/main-loop.c:213
-#4  os_host_main_loop_wait (timeout=<optimized out>) at util/main-loop.c:258
-#5  main_loop_wait (address@hidden) at util/main-loop.c:506
-#6  0x00007fdccb526187 in main_loop () at vl.c:1898
-#7  main (argc=<optimized out>, argv=<optimized out>, envp=<optimized out>) at vl.c:4709
-
-(gdb) p ioc->features
-$1 = 6
-
-(gdb) p ioc->name
-$2 = 0x7fdcce1b1ab0 "migration-socket-listener"
-
-Maybe socket_accept_incoming_migration should call qio_channel_set_feature(ioc, QIO_CHANNEL_FEATURE_SHUTDOWN)?
-
-thank you.
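[Editorial note: the following is a minimal, excerpt-style sketch of the suggestion made above, not the actual upstream patch. It assumes the migration/socket.c accept handler of that era and only the qio_channel_set_feature() line is the proposed change; the hand-off to the incoming-migration code is elided.]

```
/*
 * Sketch only: mark the accepted incoming-migration channel as
 * shutdown-capable so a COLO failover can call qio_channel_shutdown()
 * on it and unblock the recvmsg() shown in the first backtrace.
 */
static gboolean socket_accept_incoming_migration(QIOChannel *listen_ioc,
                                                 GIOCondition condition,
                                                 gpointer opaque)
{
    Error *err = NULL;
    QIOChannelSocket *sioc =
        qio_channel_socket_accept(QIO_CHANNEL_SOCKET(listen_ioc), &err);

    if (!sioc) {
        error_report_err(err);
        return G_SOURCE_REMOVE;
    }

    /* proposed addition */
    qio_channel_set_feature(QIO_CHANNEL(sioc), QIO_CHANNEL_FEATURE_SHUTDOWN);

    /* ... hand the channel to the migration/COLO incoming path as before ... */

    object_unref(OBJECT(sioc));
    return G_SOURCE_REMOVE;   /* only one connection is accepted */
}
```

With the feature bit set on the accepted channel, qio_channel_shutdown() becomes legal on it, which is what lets "x-colo-lost-heartbeat" interrupt the otherwise blocking read.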
- - - - - - - - - - - - - - - -åå§é®ä»¶ - - - -åä»¶äººï¼ address@hidden -æ¶ä»¶äººï¼ç广10165992 address@hidden -æéäººï¼ address@hidden address@hidden -æ¥ æ ï¼2017å¹´03æ16æ¥ 14:46 -主 é¢ ï¼Re: [Qemu-devel] COLO failover hang - - - - - - - -On 03/15/2017 05:06 PM, wangguang wrote: -ï¼ am testing QEMU COLO feature described here [QEMU -ï¼ Wiki]( -http://wiki.qemu-project.org/Features/COLO -). -ï¼ -ï¼ When the Primary Node panic,the Secondary Node qemu hang. -ï¼ hang at recvmsg in qio_channel_socket_readv. -ï¼ And I run { 'execute': 'nbd-server-stop' } and { "execute": -ï¼ "x-colo-lost-heartbeat" } in Secondary VM's -ï¼ monitor,the Secondary Node qemu still hang at recvmsg . -ï¼ -ï¼ I found that the colo in qemu is not complete yet. -ï¼ Do the colo have any plan for development? - -Yes, We are developing. You can see some of patch we pushing. - -ï¼ Has anyone ever run it successfully? Any help is appreciated! - -In our internal version can run it successfully, -The failover detail you can ask Zhanghailiang for help. -Next time if you have some question about COLO, -please cc me and zhanghailiang address@hidden - - -Thanks -Zhang Chen - - -ï¼ -ï¼ -ï¼ -ï¼ centos7.2+qemu2.7.50 -ï¼ (gdb) bt -ï¼ #0 0x00007f3e00cc86ad in recvmsg () from /lib64/libpthread.so.0 -ï¼ #1 0x00007f3e0332b738 in qio_channel_socket_readv (ioc=ï¼optimized outï¼, -ï¼ iov=ï¼optimized outï¼, niov=ï¼optimized outï¼, fds=0x0, nfds=0x0, errp=0x0) at -ï¼ io/channel-socket.c:497 -ï¼ #2 0x00007f3e03329472 in qio_channel_read (address@hidden, -ï¼ address@hidden "", address@hidden, -ï¼ address@hidden) at io/channel.c:97 -ï¼ #3 0x00007f3e032750e0 in channel_get_buffer (opaque=ï¼optimized outï¼, -ï¼ buf=0x7f3e05910f38 "", pos=ï¼optimized outï¼, size=32768) at -ï¼ migration/qemu-file-channel.c:78 -ï¼ #4 0x00007f3e0327412c in qemu_fill_buffer (f=0x7f3e05910f00) at -ï¼ migration/qemu-file.c:257 -ï¼ #5 0x00007f3e03274a41 in qemu_peek_byte (address@hidden, -ï¼ address@hidden) at migration/qemu-file.c:510 -ï¼ #6 0x00007f3e03274aab in qemu_get_byte (address@hidden) at -ï¼ migration/qemu-file.c:523 -ï¼ #7 0x00007f3e03274cb2 in qemu_get_be32 (address@hidden) at -ï¼ migration/qemu-file.c:603 -ï¼ #8 0x00007f3e03271735 in colo_receive_message (f=0x7f3e05910f00, -ï¼ address@hidden) at migration/colo.c:215 -ï¼ #9 0x00007f3e0327250d in colo_wait_handle_message (errp=0x7f3d62bfaa48, -ï¼ checkpoint_request=ï¼synthetic pointerï¼, f=ï¼optimized outï¼) at -ï¼ migration/colo.c:546 -ï¼ #10 colo_process_incoming_thread (opaque=0x7f3e067245e0) at -ï¼ migration/colo.c:649 -ï¼ #11 0x00007f3e00cc1df3 in start_thread () from /lib64/libpthread.so.0 -ï¼ #12 0x00007f3dfc9c03ed in clone () from /lib64/libc.so.6 -ï¼ -ï¼ -ï¼ -ï¼ -ï¼ -ï¼ -- -ï¼ View this message in context: -http://qemu.11.n7.nabble.com/COLO-failover-hang-tp473250.html -ï¼ Sent from the Developer mailing list archive at Nabble.com. -ï¼ -ï¼ -ï¼ -ï¼ - --- -Thanks -Zhang Chen - -Hi,Wang. - -You can test this branch: -https://github.com/coloft/qemu/tree/colo-v5.1-developing-COLO-frame-v21-with-shared-disk -and please follow wiki ensure your own configuration correctly. -http://wiki.qemu-project.org/Features/COLO -Thanks - -Zhang Chen - - -On 03/21/2017 03:27 PM, address@hidden wrote: -hi. - -I test the git qemu master have the same problem. 
- -(gdb) bt -#0 qio_channel_socket_readv (ioc=0x7f65911b4e50, iov=0x7f64ef3fd880, -niov=1, fds=0x0, nfds=0x0, errp=0x0) at io/channel-socket.c:461 -#1 0x00007f658e4aa0c2 in qio_channel_read -(address@hidden, address@hidden "", -address@hidden, address@hidden) at io/channel.c:114 -#2 0x00007f658e3ea990 in channel_get_buffer (opaque=ï¼optimized outï¼, -buf=0x7f65907cb838 "", pos=ï¼optimized outï¼, size=32768) at -migration/qemu-file-channel.c:78 -#3 0x00007f658e3e97fc in qemu_fill_buffer (f=0x7f65907cb800) at -migration/qemu-file.c:295 -#4 0x00007f658e3ea2e1 in qemu_peek_byte (address@hidden, -address@hidden) at migration/qemu-file.c:555 -#5 0x00007f658e3ea34b in qemu_get_byte (address@hidden) at -migration/qemu-file.c:568 -#6 0x00007f658e3ea552 in qemu_get_be32 (address@hidden) at -migration/qemu-file.c:648 -#7 0x00007f658e3e66e5 in colo_receive_message (f=0x7f65907cb800, -address@hidden) at migration/colo.c:244 -#8 0x00007f658e3e681e in colo_receive_check_message (f=ï¼optimized -outï¼, address@hidden, -address@hidden) -at migration/colo.c:264 -#9 0x00007f658e3e740e in colo_process_incoming_thread -(opaque=0x7f658eb30360 ï¼mis_current.31286ï¼) at migration/colo.c:577 -#10 0x00007f658be09df3 in start_thread () from /lib64/libpthread.so.0 - -#11 0x00007f65881983ed in clone () from /lib64/libc.so.6 - -(gdb) p ioc-ï¼name - -$2 = 0x7f658ff7d5c0 "migration-socket-incoming" - -(gdb) p ioc-ï¼features Do not support QIO_CHANNEL_FEATURE_SHUTDOWN - -$3 = 0 - - -(gdb) bt -#0 socket_accept_incoming_migration (ioc=0x7fdcceeafa90, -condition=G_IO_IN, opaque=0x7fdcceeafa90) at migration/socket.c:137 -#1 0x00007fdcc6966350 in g_main_dispatch (context=ï¼optimized outï¼) at -gmain.c:3054 -#2 g_main_context_dispatch (context=ï¼optimized outï¼, -address@hidden) at gmain.c:3630 -#3 0x00007fdccb8a6dcc in glib_pollfds_poll () at util/main-loop.c:213 -#4 os_host_main_loop_wait (timeout=ï¼optimized outï¼) at -util/main-loop.c:258 -#5 main_loop_wait (address@hidden) at -util/main-loop.c:506 -#6 0x00007fdccb526187 in main_loop () at vl.c:1898 -#7 main (argc=ï¼optimized outï¼, argv=ï¼optimized outï¼, envp=ï¼optimized -outï¼) at vl.c:4709 -(gdb) p ioc-ï¼features - -$1 = 6 - -(gdb) p ioc-ï¼name - -$2 = 0x7fdcce1b1ab0 "migration-socket-listener" -May be socket_accept_incoming_migration should -call qio_channel_set_feature(ioc, QIO_CHANNEL_FEATURE_SHUTDOWN)?? -thank you. - - - - - -åå§é®ä»¶ -address@hidden; -*æ¶ä»¶äººï¼*ç广10165992;address@hidden; -address@hidden;address@hidden; -*æ¥ æ ï¼*2017å¹´03æ16æ¥ 14:46 -*主 é¢ ï¼**Re: [Qemu-devel] COLO failover hang* - - - - -On 03/15/2017 05:06 PM, wangguang wrote: -ï¼ am testing QEMU COLO feature described here [QEMU -ï¼ Wiki]( -http://wiki.qemu-project.org/Features/COLO -). -ï¼ -ï¼ When the Primary Node panic,the Secondary Node qemu hang. -ï¼ hang at recvmsg in qio_channel_socket_readv. -ï¼ And I run { 'execute': 'nbd-server-stop' } and { "execute": -ï¼ "x-colo-lost-heartbeat" } in Secondary VM's -ï¼ monitor,the Secondary Node qemu still hang at recvmsg . -ï¼ -ï¼ I found that the colo in qemu is not complete yet. -ï¼ Do the colo have any plan for development? - -Yes, We are developing. You can see some of patch we pushing. - -ï¼ Has anyone ever run it successfully? Any help is appreciated! - -In our internal version can run it successfully, -The failover detail you can ask Zhanghailiang for help. 
-Next time if you have some question about COLO, -please cc me and zhanghailiang address@hidden - - -Thanks -Zhang Chen - - -ï¼ -ï¼ -ï¼ -ï¼ centos7.2+qemu2.7.50 -ï¼ (gdb) bt -ï¼ #0 0x00007f3e00cc86ad in recvmsg () from /lib64/libpthread.so.0 -ï¼ #1 0x00007f3e0332b738 in qio_channel_socket_readv (ioc=ï¼optimized outï¼, -ï¼ iov=ï¼optimized outï¼, niov=ï¼optimized outï¼, fds=0x0, nfds=0x0, errp=0x0) at -ï¼ io/channel-socket.c:497 -ï¼ #2 0x00007f3e03329472 in qio_channel_read (address@hidden, -ï¼ address@hidden "", address@hidden, -ï¼ address@hidden) at io/channel.c:97 -ï¼ #3 0x00007f3e032750e0 in channel_get_buffer (opaque=ï¼optimized outï¼, -ï¼ buf=0x7f3e05910f38 "", pos=ï¼optimized outï¼, size=32768) at -ï¼ migration/qemu-file-channel.c:78 -ï¼ #4 0x00007f3e0327412c in qemu_fill_buffer (f=0x7f3e05910f00) at -ï¼ migration/qemu-file.c:257 -ï¼ #5 0x00007f3e03274a41 in qemu_peek_byte (address@hidden, -ï¼ address@hidden) at migration/qemu-file.c:510 -ï¼ #6 0x00007f3e03274aab in qemu_get_byte (address@hidden) at -ï¼ migration/qemu-file.c:523 -ï¼ #7 0x00007f3e03274cb2 in qemu_get_be32 (address@hidden) at -ï¼ migration/qemu-file.c:603 -ï¼ #8 0x00007f3e03271735 in colo_receive_message (f=0x7f3e05910f00, -ï¼ address@hidden) at migration/colo.c:215 -ï¼ #9 0x00007f3e0327250d in colo_wait_handle_message (errp=0x7f3d62bfaa48, -ï¼ checkpoint_request=ï¼synthetic pointerï¼, f=ï¼optimized outï¼) at -ï¼ migration/colo.c:546 -ï¼ #10 colo_process_incoming_thread (opaque=0x7f3e067245e0) at -ï¼ migration/colo.c:649 -ï¼ #11 0x00007f3e00cc1df3 in start_thread () from /lib64/libpthread.so.0 -ï¼ #12 0x00007f3dfc9c03ed in clone () from /lib64/libc.so.6 -ï¼ -ï¼ -ï¼ -ï¼ -ï¼ -ï¼ -- -ï¼ View this message in context: -http://qemu.11.n7.nabble.com/COLO-failover-hang-tp473250.html -ï¼ Sent from the Developer mailing list archive at Nabble.com. -ï¼ -ï¼ -ï¼ -ï¼ - --- -Thanks -Zhang Chen --- -Thanks -Zhang Chen - diff --git a/results/classifier/015/unknown/04472277 b/results/classifier/015/unknown/04472277 deleted file mode 100644 index 62b09c6a4..000000000 --- a/results/classifier/015/unknown/04472277 +++ /dev/null @@ -1,603 +0,0 @@ -KVM: 0.890 -user-level: 0.889 -register: 0.886 -virtual: 0.876 -operating system: 0.865 -risc-v: 0.864 -VMM: 0.858 -architecture: 0.857 -hypervisor: 0.854 -permissions: 0.851 -device: 0.849 -debug: 0.849 -ppc: 0.848 -network: 0.847 -graphic: 0.846 -x86: 0.841 -performance: 0.841 -assembly: 0.841 -kernel: 0.839 -peripherals: 0.838 -boot: 0.831 -vnc: 0.828 -PID: 0.826 -TCG: 0.825 -socket: 0.824 -arm: 0.821 -mistranslation: 0.817 -semantic: 0.815 -i386: 0.805 -alpha: 0.804 -files: 0.790 - -[BUG][KVM_SET_USER_MEMORY_REGION] KVM_SET_USER_MEMORY_REGION failed - -Hi all, -I start a VM in openstack, and openstack use libvirt to start qemu VM, but now log show this ERROR. -Is there any one know this? -The ERROR log from /var/log/libvirt/qemu/instance-0000000e.log -``` -2023-03-14T10:09:17.674114Z qemu-system-x86_64: kvm_set_user_memory_region: KVM_SET_USER_MEMORY_REGION failed, slot=4, start=0xfffffffffe000000, size=0x2000: Invalid argument -kvm_set_phys_mem: error registering slot: Invalid argument -2023-03-14 10:09:18.198+0000: shutting down, reason=crashed -``` -The xml file -``` -root@c1c2:~# cat /etc/libvirt/qemu/instance-0000000e.xml -<!-- -WARNING: THIS IS AN AUTO-GENERATED FILE. CHANGES TO IT ARE LIKELY TO BE -OVERWRITTEN AND LOST. Changes to this xml configuration should be made using: - virsh edit instance-0000000e -or other application using the libvirt API. 
---> -<domain type='kvm'> - <name>instance-0000000e</name> - <uuid>ff91d2dc-69a1-43ef-abde-c9e4e9a0305b</uuid> - <metadata> -  <nova:instance xmlns:nova=" -http://openstack.org/xmlns/libvirt/nova/1.1 -"> -   <nova:package version="25.1.0"/> -   <nova:name>provider-instance</nova:name> -   <nova:creationTime>2023-03-14 10:09:13</nova:creationTime> -   <nova:flavor name="cirros-os-dpu-test-1"> -    <nova:memory>64</nova:memory> -    <nova:disk>1</nova:disk> -    <nova:swap>0</nova:swap> -    <nova:ephemeral>0</nova:ephemeral> -    <nova:vcpus>1</nova:vcpus> -   </nova:flavor> -   <nova:owner> -    <nova:user uuid="ff627ad39ed94479b9c5033bc462cf78">admin</nova:user> -    <nova:project uuid="512866f9994f4ad8916d8539a7cdeec9">admin</nova:project> -   </nova:owner> -   <nova:root type="image" uuid="9e58cb69-316a-4093-9f23-c1d1bd8edffe"/> -   <nova:ports> -    <nova:port uuid="77c1dc00-af39-4463-bea0-12808f4bc340"> -     <nova:ip type="fixed" address="172.1.1.43" ipVersion="4"/> -    </nova:port> -   </nova:ports> -  </nova:instance> - </metadata> - <memory unit='KiB'>65536</memory> - <currentMemory unit='KiB'>65536</currentMemory> - <vcpu placement='static'>1</vcpu> - <sysinfo type='smbios'> -  <system> -   <entry name='manufacturer'>OpenStack Foundation</entry> -   <entry name='product'>OpenStack Nova</entry> -   <entry name='version'>25.1.0</entry> -   <entry name='serial'>ff91d2dc-69a1-43ef-abde-c9e4e9a0305b</entry> -   <entry name='uuid'>ff91d2dc-69a1-43ef-abde-c9e4e9a0305b</entry> -   <entry name='family'>Virtual Machine</entry> -  </system> - </sysinfo> - <os> -  <type arch='x86_64' machine='pc-i440fx-6.2'>hvm</type> -  <boot dev='hd'/> -  <smbios mode='sysinfo'/> - </os> - <features> -  <acpi/> -  <apic/> -  <vmcoreinfo state='on'/> - </features> - <cpu mode='host-model' check='partial'> -  <topology sockets='1' dies='1' cores='1' threads='1'/> - </cpu> - <clock offset='utc'> -  <timer name='pit' tickpolicy='delay'/> -  <timer name='rtc' tickpolicy='catchup'/> -  <timer name='hpet' present='no'/> - </clock> - <on_poweroff>destroy</on_poweroff> - <on_reboot>restart</on_reboot> - <on_crash>destroy</on_crash> - <devices> -  <emulator>/usr/bin/qemu-system-x86_64</emulator> -  <disk type='file' device='disk'> -   <driver name='qemu' type='qcow2' cache='none'/> -   <source file='/var/lib/nova/instances/ff91d2dc-69a1-43ef-abde-c9e4e9a0305b/disk'/> -   <target dev='vda' bus='virtio'/> -   <address type='pci' domain='0x0000' bus='0x00' slot='0x03' function='0x0'/> -  </disk> -  <controller type='usb' index='0' model='piix3-uhci'> -   <address type='pci' domain='0x0000' bus='0x00' slot='0x01' function='0x2'/> -  </controller> -  <controller type='pci' index='0' model='pci-root'/> -  <interface type='hostdev' managed='yes'> -   <mac address='fa:16:3e:aa:d9:23'/> -   <source> -    <address type='pci' domain='0x0000' bus='0x01' slot='0x00' function='0x5'/> -   </source> -   <address type='pci' domain='0x0000' bus='0x00' slot='0x04' function='0x0'/> -  </interface> -  <serial type='pty'> -   <log file='/var/lib/nova/instances/ff91d2dc-69a1-43ef-abde-c9e4e9a0305b/console.log' append='off'/> -   <target type='isa-serial' port='0'> -    <model name='isa-serial'/> -   </target> -  </serial> -  <console type='pty'> -   <log file='/var/lib/nova/instances/ff91d2dc-69a1-43ef-abde-c9e4e9a0305b/console.log' append='off'/> -   <target type='serial' port='0'/> -  </console> -  <input type='tablet' bus='usb'> -   <address type='usb' bus='0' port='1'/> -  </input> -  <input type='mouse' bus='ps2'/> -  <input 
type='keyboard' bus='ps2'/> -  <graphics type='vnc' port='-1' autoport='yes' listen='0.0.0.0'> -   <listen type='address' address='0.0.0.0'/> -  </graphics> -  <audio id='1' type='none'/> -  <video> -   <model type='virtio' heads='1' primary='yes'/> -   <address type='pci' domain='0x0000' bus='0x00' slot='0x02' function='0x0'/> -  </video> -  <hostdev mode='subsystem' type='pci' managed='yes'> -   <source> -    <address domain='0x0000' bus='0x01' slot='0x00' function='0x6'/> -   </source> -   <address type='pci' domain='0x0000' bus='0x00' slot='0x05' function='0x0'/> -  </hostdev> -  <memballoon model='virtio'> -   <stats period='10'/> -   <address type='pci' domain='0x0000' bus='0x00' slot='0x06' function='0x0'/> -  </memballoon> -  <rng model='virtio'> -   <backend model='random'>/dev/urandom</backend> -   <address type='pci' domain='0x0000' bus='0x00' slot='0x07' function='0x0'/> -  </rng> - </devices> -</domain> -``` ----- -Simon Jones - -This is happened in ubuntu22.04. -QEMU is install by apt like this: -apt install -y qemu qemu-kvm qemu-system -and QEMU version is 6.2.0 ----- -Simon Jones -Simon Jones < -batmanustc@gmail.com -> äº2023å¹´3æ21æ¥å¨äº 08:40åéï¼ -Hi all, -I start a VM in openstack, and openstack use libvirt to start qemu VM, but now log show this ERROR. -Is there any one know this? -The ERROR log from /var/log/libvirt/qemu/instance-0000000e.log -``` -2023-03-14T10:09:17.674114Z qemu-system-x86_64: kvm_set_user_memory_region: KVM_SET_USER_MEMORY_REGION failed, slot=4, start=0xfffffffffe000000, size=0x2000: Invalid argument -kvm_set_phys_mem: error registering slot: Invalid argument -2023-03-14 10:09:18.198+0000: shutting down, reason=crashed -``` -The xml file -``` -root@c1c2:~# cat /etc/libvirt/qemu/instance-0000000e.xml -<!-- -WARNING: THIS IS AN AUTO-GENERATED FILE. CHANGES TO IT ARE LIKELY TO BE -OVERWRITTEN AND LOST. Changes to this xml configuration should be made using: - virsh edit instance-0000000e -or other application using the libvirt API. 
---> -<domain type='kvm'> - <name>instance-0000000e</name> - <uuid>ff91d2dc-69a1-43ef-abde-c9e4e9a0305b</uuid> - <metadata> -  <nova:instance xmlns:nova=" -http://openstack.org/xmlns/libvirt/nova/1.1 -"> -   <nova:package version="25.1.0"/> -   <nova:name>provider-instance</nova:name> -   <nova:creationTime>2023-03-14 10:09:13</nova:creationTime> -   <nova:flavor name="cirros-os-dpu-test-1"> -    <nova:memory>64</nova:memory> -    <nova:disk>1</nova:disk> -    <nova:swap>0</nova:swap> -    <nova:ephemeral>0</nova:ephemeral> -    <nova:vcpus>1</nova:vcpus> -   </nova:flavor> -   <nova:owner> -    <nova:user uuid="ff627ad39ed94479b9c5033bc462cf78">admin</nova:user> -    <nova:project uuid="512866f9994f4ad8916d8539a7cdeec9">admin</nova:project> -   </nova:owner> -   <nova:root type="image" uuid="9e58cb69-316a-4093-9f23-c1d1bd8edffe"/> -   <nova:ports> -    <nova:port uuid="77c1dc00-af39-4463-bea0-12808f4bc340"> -     <nova:ip type="fixed" address="172.1.1.43" ipVersion="4"/> -    </nova:port> -   </nova:ports> -  </nova:instance> - </metadata> - <memory unit='KiB'>65536</memory> - <currentMemory unit='KiB'>65536</currentMemory> - <vcpu placement='static'>1</vcpu> - <sysinfo type='smbios'> -  <system> -   <entry name='manufacturer'>OpenStack Foundation</entry> -   <entry name='product'>OpenStack Nova</entry> -   <entry name='version'>25.1.0</entry> -   <entry name='serial'>ff91d2dc-69a1-43ef-abde-c9e4e9a0305b</entry> -   <entry name='uuid'>ff91d2dc-69a1-43ef-abde-c9e4e9a0305b</entry> -   <entry name='family'>Virtual Machine</entry> -  </system> - </sysinfo> - <os> -  <type arch='x86_64' machine='pc-i440fx-6.2'>hvm</type> -  <boot dev='hd'/> -  <smbios mode='sysinfo'/> - </os> - <features> -  <acpi/> -  <apic/> -  <vmcoreinfo state='on'/> - </features> - <cpu mode='host-model' check='partial'> -  <topology sockets='1' dies='1' cores='1' threads='1'/> - </cpu> - <clock offset='utc'> -  <timer name='pit' tickpolicy='delay'/> -  <timer name='rtc' tickpolicy='catchup'/> -  <timer name='hpet' present='no'/> - </clock> - <on_poweroff>destroy</on_poweroff> - <on_reboot>restart</on_reboot> - <on_crash>destroy</on_crash> - <devices> -  <emulator>/usr/bin/qemu-system-x86_64</emulator> -  <disk type='file' device='disk'> -   <driver name='qemu' type='qcow2' cache='none'/> -   <source file='/var/lib/nova/instances/ff91d2dc-69a1-43ef-abde-c9e4e9a0305b/disk'/> -   <target dev='vda' bus='virtio'/> -   <address type='pci' domain='0x0000' bus='0x00' slot='0x03' function='0x0'/> -  </disk> -  <controller type='usb' index='0' model='piix3-uhci'> -   <address type='pci' domain='0x0000' bus='0x00' slot='0x01' function='0x2'/> -  </controller> -  <controller type='pci' index='0' model='pci-root'/> -  <interface type='hostdev' managed='yes'> -   <mac address='fa:16:3e:aa:d9:23'/> -   <source> -    <address type='pci' domain='0x0000' bus='0x01' slot='0x00' function='0x5'/> -   </source> -   <address type='pci' domain='0x0000' bus='0x00' slot='0x04' function='0x0'/> -  </interface> -  <serial type='pty'> -   <log file='/var/lib/nova/instances/ff91d2dc-69a1-43ef-abde-c9e4e9a0305b/console.log' append='off'/> -   <target type='isa-serial' port='0'> -    <model name='isa-serial'/> -   </target> -  </serial> -  <console type='pty'> -   <log file='/var/lib/nova/instances/ff91d2dc-69a1-43ef-abde-c9e4e9a0305b/console.log' append='off'/> -   <target type='serial' port='0'/> -  </console> -  <input type='tablet' bus='usb'> -   <address type='usb' bus='0' port='1'/> -  </input> -  <input type='mouse' bus='ps2'/> -  <input 
type='keyboard' bus='ps2'/> -  <graphics type='vnc' port='-1' autoport='yes' listen='0.0.0.0'> -   <listen type='address' address='0.0.0.0'/> -  </graphics> -  <audio id='1' type='none'/> -  <video> -   <model type='virtio' heads='1' primary='yes'/> -   <address type='pci' domain='0x0000' bus='0x00' slot='0x02' function='0x0'/> -  </video> -  <hostdev mode='subsystem' type='pci' managed='yes'> -   <source> -    <address domain='0x0000' bus='0x01' slot='0x00' function='0x6'/> -   </source> -   <address type='pci' domain='0x0000' bus='0x00' slot='0x05' function='0x0'/> -  </hostdev> -  <memballoon model='virtio'> -   <stats period='10'/> -   <address type='pci' domain='0x0000' bus='0x00' slot='0x06' function='0x0'/> -  </memballoon> -  <rng model='virtio'> -   <backend model='random'>/dev/urandom</backend> -   <address type='pci' domain='0x0000' bus='0x00' slot='0x07' function='0x0'/> -  </rng> - </devices> -</domain> -``` ----- -Simon Jones - -This is full ERROR log -2023-03-23 08:00:52.362+0000: starting up libvirt version: 8.0.0, package: 1ubuntu7.4 (Christian Ehrhardt < -christian.ehrhardt@canonical.com -> Tue, 22 Nov 2022 15:59:28 +0100), qemu version: 6.2.0Debian 1:6.2+dfsg-2ubuntu6.6, kernel: 5.19.0-35-generic, hostname: c1c2 -LC_ALL=C \ -PATH=/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin:/snap/bin \ -HOME=/var/lib/libvirt/qemu/domain-4-instance-0000000e \ -XDG_DATA_HOME=/var/lib/libvirt/qemu/domain-4-instance-0000000e/.local/share \ -XDG_CACHE_HOME=/var/lib/libvirt/qemu/domain-4-instance-0000000e/.cache \ -XDG_CONFIG_HOME=/var/lib/libvirt/qemu/domain-4-instance-0000000e/.config \ -/usr/bin/qemu-system-x86_64 \ --name guest=instance-0000000e,debug-threads=on \ --S \ --object '{"qom-type":"secret","id":"masterKey0","format":"raw","file":"/var/lib/libvirt/qemu/domain-4-instance-0000000e/master-key.aes"}' \ --machine pc-i440fx-6.2,usb=off,dump-guest-core=off,memory-backend=pc.ram \ --accel kvm \ --cpu Cooperlake,ss=on,vmx=on,pdcm=on,hypervisor=on,tsc-adjust=on,sha-ni=on,umip=on,waitpkg=on,gfni=on,vaes=on,vpclmulqdq=on,rdpid=on,movdiri=on,movdir64b=on,fsrm=on,md-clear=on,avx-vnni=on,xsaves=on,ibpb=on,ibrs=on,amd-stibp=on,amd-ssbd=on,hle=off,rtm=off,avx512f=off,avx512dq=off,avx512cd=off,avx512bw=off,avx512vl=off,avx512vnni=off,avx512-bf16=off,taa-no=off \ --m 64 \ --object '{"qom-type":"memory-backend-ram","id":"pc.ram","size":67108864}' \ --overcommit mem-lock=off \ --smp 1,sockets=1,dies=1,cores=1,threads=1 \ --uuid ff91d2dc-69a1-43ef-abde-c9e4e9a0305b \ --smbios 'type=1,manufacturer=OpenStack Foundation,product=OpenStack Nova,version=25.1.0,serial=ff91d2dc-69a1-43ef-abde-c9e4e9a0305b,uuid=ff91d2dc-69a1-43ef-abde-c9e4e9a0305b,family=Virtual Machine' \ --no-user-config \ --nodefaults \ --chardev socket,id=charmonitor,fd=33,server=on,wait=off \ --mon chardev=charmonitor,id=monitor,mode=control \ --rtc base=utc,driftfix=slew \ --global kvm-pit.lost_tick_policy=delay \ --no-hpet \ --no-shutdown \ --boot strict=on \ --device piix3-usb-uhci,id=usb,bus=pci.0,addr=0x1.0x2 \ --blockdev '{"driver":"file","filename":"/var/lib/nova/instances/_base/8b58db82a488248e7c5e769599954adaa47a5314","node-name":"libvirt-2-storage","cache":{"direct":true,"no-flush":false},"auto-read-only":true,"discard":"unmap"}' \ --blockdev '{"node-name":"libvirt-2-format","read-only":true,"cache":{"direct":true,"no-flush":false},"driver":"raw","file":"libvirt-2-storage"}' \ --blockdev 
'{"driver":"file","filename":"/var/lib/nova/instances/ff91d2dc-69a1-43ef-abde-c9e4e9a0305b/disk","node-name":"libvirt-1-storage","cache":{"direct":true,"no-flush":false},"auto-read-only":true,"discard":"unmap"}' \ --blockdev '{"node-name":"libvirt-1-format","read-only":false,"cache":{"direct":true,"no-flush":false},"driver":"qcow2","file":"libvirt-1-storage","backing":"libvirt-2-format"}' \ --device virtio-blk-pci,bus=pci.0,addr=0x3,drive=libvirt-1-format,id=virtio-disk0,bootindex=1,write-cache=on \ --add-fd set=1,fd=34 \ --chardev pty,id=charserial0,logfile=/dev/fdset/1,logappend=on \ --device isa-serial,chardev=charserial0,id=serial0 \ --device usb-tablet,id=input0,bus=usb.0,port=1 \ --audiodev '{"id":"audio1","driver":"none"}' \ --vnc -0.0.0.0:0 -,audiodev=audio1 \ --device virtio-vga,id=video0,max_outputs=1,bus=pci.0,addr=0x2 \ --device vfio-pci,host=0000:01:00.5,id=hostdev0,bus=pci.0,addr=0x4 \ --device vfio-pci,host=0000:01:00.6,id=hostdev1,bus=pci.0,addr=0x5 \ --device virtio-balloon-pci,id=balloon0,bus=pci.0,addr=0x6 \ --object '{"qom-type":"rng-random","id":"objrng0","filename":"/dev/urandom"}' \ --device virtio-rng-pci,rng=objrng0,id=rng0,bus=pci.0,addr=0x7 \ --device vmcoreinfo \ --sandbox on,obsolete=deny,elevateprivileges=deny,spawn=deny,resourcecontrol=deny \ --msg timestamp=on -char device redirected to /dev/pts/3 (label charserial0) -2023-03-23T08:00:53.728550Z qemu-system-x86_64: kvm_set_user_memory_region: KVM_SET_USER_MEMORY_REGION failed, slot=4, start=0xfffffffffe000000, size=0x2000: Invalid argument -kvm_set_phys_mem: error registering slot: Invalid argument -2023-03-23 08:00:54.201+0000: shutting down, reason=crashed -2023-03-23 08:54:43.468+0000: starting up libvirt version: 8.0.0, package: 1ubuntu7.4 (Christian Ehrhardt < -christian.ehrhardt@canonical.com -> Tue, 22 Nov 2022 15:59:28 +0100), qemu version: 6.2.0Debian 1:6.2+dfsg-2ubuntu6.6, kernel: 5.19.0-35-generic, hostname: c1c2 -LC_ALL=C \ -PATH=/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin:/snap/bin \ -HOME=/var/lib/libvirt/qemu/domain-5-instance-0000000e \ -XDG_DATA_HOME=/var/lib/libvirt/qemu/domain-5-instance-0000000e/.local/share \ -XDG_CACHE_HOME=/var/lib/libvirt/qemu/domain-5-instance-0000000e/.cache \ -XDG_CONFIG_HOME=/var/lib/libvirt/qemu/domain-5-instance-0000000e/.config \ -/usr/bin/qemu-system-x86_64 \ --name guest=instance-0000000e,debug-threads=on \ --S \ --object '{"qom-type":"secret","id":"masterKey0","format":"raw","file":"/var/lib/libvirt/qemu/domain-5-instance-0000000e/master-key.aes"}' \ --machine pc-i440fx-6.2,usb=off,dump-guest-core=off,memory-backend=pc.ram \ --accel kvm \ --cpu Cooperlake,ss=on,vmx=on,pdcm=on,hypervisor=on,tsc-adjust=on,sha-ni=on,umip=on,waitpkg=on,gfni=on,vaes=on,vpclmulqdq=on,rdpid=on,movdiri=on,movdir64b=on,fsrm=on,md-clear=on,avx-vnni=on,xsaves=on,ibpb=on,ibrs=on,amd-stibp=on,amd-ssbd=on,hle=off,rtm=off,avx512f=off,avx512dq=off,avx512cd=off,avx512bw=off,avx512vl=off,avx512vnni=off,avx512-bf16=off,taa-no=off \ --m 64 \ --object '{"qom-type":"memory-backend-ram","id":"pc.ram","size":67108864}' \ --overcommit mem-lock=off \ --smp 1,sockets=1,dies=1,cores=1,threads=1 \ --uuid ff91d2dc-69a1-43ef-abde-c9e4e9a0305b \ --smbios 'type=1,manufacturer=OpenStack Foundation,product=OpenStack Nova,version=25.1.0,serial=ff91d2dc-69a1-43ef-abde-c9e4e9a0305b,uuid=ff91d2dc-69a1-43ef-abde-c9e4e9a0305b,family=Virtual Machine' \ --no-user-config \ --nodefaults \ --chardev socket,id=charmonitor,fd=33,server=on,wait=off \ --mon chardev=charmonitor,id=monitor,mode=control \ 
--rtc base=utc,driftfix=slew \ --global kvm-pit.lost_tick_policy=delay \ --no-hpet \ --no-shutdown \ --boot strict=on \ --device piix3-usb-uhci,id=usb,bus=pci.0,addr=0x1.0x2 \ --blockdev '{"driver":"file","filename":"/var/lib/nova/instances/_base/8b58db82a488248e7c5e769599954adaa47a5314","node-name":"libvirt-2-storage","cache":{"direct":true,"no-flush":false},"auto-read-only":true,"discard":"unmap"}' \ --blockdev '{"node-name":"libvirt-2-format","read-only":true,"cache":{"direct":true,"no-flush":false},"driver":"raw","file":"libvirt-2-storage"}' \ --blockdev '{"driver":"file","filename":"/var/lib/nova/instances/ff91d2dc-69a1-43ef-abde-c9e4e9a0305b/disk","node-name":"libvirt-1-storage","cache":{"direct":true,"no-flush":false},"auto-read-only":true,"discard":"unmap"}' \ --blockdev '{"node-name":"libvirt-1-format","read-only":false,"cache":{"direct":true,"no-flush":false},"driver":"qcow2","file":"libvirt-1-storage","backing":"libvirt-2-format"}' \ --device virtio-blk-pci,bus=pci.0,addr=0x3,drive=libvirt-1-format,id=virtio-disk0,bootindex=1,write-cache=on \ --add-fd set=1,fd=34 \ --chardev pty,id=charserial0,logfile=/dev/fdset/1,logappend=on \ --device isa-serial,chardev=charserial0,id=serial0 \ --device usb-tablet,id=input0,bus=usb.0,port=1 \ --audiodev '{"id":"audio1","driver":"none"}' \ --vnc -0.0.0.0:0 -,audiodev=audio1 \ --device virtio-vga,id=video0,max_outputs=1,bus=pci.0,addr=0x2 \ --device vfio-pci,host=0000:01:00.5,id=hostdev0,bus=pci.0,addr=0x4 \ --device vfio-pci,host=0000:01:00.6,id=hostdev1,bus=pci.0,addr=0x5 \ --device virtio-balloon-pci,id=balloon0,bus=pci.0,addr=0x6 \ --object '{"qom-type":"rng-random","id":"objrng0","filename":"/dev/urandom"}' \ --device virtio-rng-pci,rng=objrng0,id=rng0,bus=pci.0,addr=0x7 \ --device vmcoreinfo \ --sandbox on,obsolete=deny,elevateprivileges=deny,spawn=deny,resourcecontrol=deny \ --msg timestamp=on -char device redirected to /dev/pts/3 (label charserial0) -2023-03-23T08:54:44.755039Z qemu-system-x86_64: kvm_set_user_memory_region: KVM_SET_USER_MEMORY_REGION failed, slot=4, start=0xfffffffffe000000, size=0x2000: Invalid argument -kvm_set_phys_mem: error registering slot: Invalid argument -2023-03-23 08:54:45.230+0000: shutting down, reason=crashed ----- -Simon Jones -Simon Jones < -batmanustc@gmail.com -> äº2023å¹´3æ23æ¥å¨å 05:49åéï¼ -This is happened in ubuntu22.04. -QEMU is install by apt like this: -apt install -y qemu qemu-kvm qemu-system -and QEMU version is 6.2.0 ----- -Simon Jones -Simon Jones < -batmanustc@gmail.com -> äº2023å¹´3æ21æ¥å¨äº 08:40åéï¼ -Hi all, -I start a VM in openstack, and openstack use libvirt to start qemu VM, but now log show this ERROR. -Is there any one know this? -The ERROR log from /var/log/libvirt/qemu/instance-0000000e.log -``` -2023-03-14T10:09:17.674114Z qemu-system-x86_64: kvm_set_user_memory_region: KVM_SET_USER_MEMORY_REGION failed, slot=4, start=0xfffffffffe000000, size=0x2000: Invalid argument -kvm_set_phys_mem: error registering slot: Invalid argument -2023-03-14 10:09:18.198+0000: shutting down, reason=crashed -``` -The xml file -``` -root@c1c2:~# cat /etc/libvirt/qemu/instance-0000000e.xml -<!-- -WARNING: THIS IS AN AUTO-GENERATED FILE. CHANGES TO IT ARE LIKELY TO BE -OVERWRITTEN AND LOST. Changes to this xml configuration should be made using: - virsh edit instance-0000000e -or other application using the libvirt API. 
---> -<domain type='kvm'> - <name>instance-0000000e</name> - <uuid>ff91d2dc-69a1-43ef-abde-c9e4e9a0305b</uuid> - <metadata> -  <nova:instance xmlns:nova=" -http://openstack.org/xmlns/libvirt/nova/1.1 -"> -   <nova:package version="25.1.0"/> -   <nova:name>provider-instance</nova:name> -   <nova:creationTime>2023-03-14 10:09:13</nova:creationTime> -   <nova:flavor name="cirros-os-dpu-test-1"> -    <nova:memory>64</nova:memory> -    <nova:disk>1</nova:disk> -    <nova:swap>0</nova:swap> -    <nova:ephemeral>0</nova:ephemeral> -    <nova:vcpus>1</nova:vcpus> -   </nova:flavor> -   <nova:owner> -    <nova:user uuid="ff627ad39ed94479b9c5033bc462cf78">admin</nova:user> -    <nova:project uuid="512866f9994f4ad8916d8539a7cdeec9">admin</nova:project> -   </nova:owner> -   <nova:root type="image" uuid="9e58cb69-316a-4093-9f23-c1d1bd8edffe"/> -   <nova:ports> -    <nova:port uuid="77c1dc00-af39-4463-bea0-12808f4bc340"> -     <nova:ip type="fixed" address="172.1.1.43" ipVersion="4"/> -    </nova:port> -   </nova:ports> -  </nova:instance> - </metadata> - <memory unit='KiB'>65536</memory> - <currentMemory unit='KiB'>65536</currentMemory> - <vcpu placement='static'>1</vcpu> - <sysinfo type='smbios'> -  <system> -   <entry name='manufacturer'>OpenStack Foundation</entry> -   <entry name='product'>OpenStack Nova</entry> -   <entry name='version'>25.1.0</entry> -   <entry name='serial'>ff91d2dc-69a1-43ef-abde-c9e4e9a0305b</entry> -   <entry name='uuid'>ff91d2dc-69a1-43ef-abde-c9e4e9a0305b</entry> -   <entry name='family'>Virtual Machine</entry> -  </system> - </sysinfo> - <os> -  <type arch='x86_64' machine='pc-i440fx-6.2'>hvm</type> -  <boot dev='hd'/> -  <smbios mode='sysinfo'/> - </os> - <features> -  <acpi/> -  <apic/> -  <vmcoreinfo state='on'/> - </features> - <cpu mode='host-model' check='partial'> -  <topology sockets='1' dies='1' cores='1' threads='1'/> - </cpu> - <clock offset='utc'> -  <timer name='pit' tickpolicy='delay'/> -  <timer name='rtc' tickpolicy='catchup'/> -  <timer name='hpet' present='no'/> - </clock> - <on_poweroff>destroy</on_poweroff> - <on_reboot>restart</on_reboot> - <on_crash>destroy</on_crash> - <devices> -  <emulator>/usr/bin/qemu-system-x86_64</emulator> -  <disk type='file' device='disk'> -   <driver name='qemu' type='qcow2' cache='none'/> -   <source file='/var/lib/nova/instances/ff91d2dc-69a1-43ef-abde-c9e4e9a0305b/disk'/> -   <target dev='vda' bus='virtio'/> -   <address type='pci' domain='0x0000' bus='0x00' slot='0x03' function='0x0'/> -  </disk> -  <controller type='usb' index='0' model='piix3-uhci'> -   <address type='pci' domain='0x0000' bus='0x00' slot='0x01' function='0x2'/> -  </controller> -  <controller type='pci' index='0' model='pci-root'/> -  <interface type='hostdev' managed='yes'> -   <mac address='fa:16:3e:aa:d9:23'/> -   <source> -    <address type='pci' domain='0x0000' bus='0x01' slot='0x00' function='0x5'/> -   </source> -   <address type='pci' domain='0x0000' bus='0x00' slot='0x04' function='0x0'/> -  </interface> -  <serial type='pty'> -   <log file='/var/lib/nova/instances/ff91d2dc-69a1-43ef-abde-c9e4e9a0305b/console.log' append='off'/> -   <target type='isa-serial' port='0'> -    <model name='isa-serial'/> -   </target> -  </serial> -  <console type='pty'> -   <log file='/var/lib/nova/instances/ff91d2dc-69a1-43ef-abde-c9e4e9a0305b/console.log' append='off'/> -   <target type='serial' port='0'/> -  </console> -  <input type='tablet' bus='usb'> -   <address type='usb' bus='0' port='1'/> -  </input> -  <input type='mouse' bus='ps2'/> -  <input 
type='keyboard' bus='ps2'/> -  <graphics type='vnc' port='-1' autoport='yes' listen='0.0.0.0'> -   <listen type='address' address='0.0.0.0'/> -  </graphics> -  <audio id='1' type='none'/> -  <video> -   <model type='virtio' heads='1' primary='yes'/> -   <address type='pci' domain='0x0000' bus='0x00' slot='0x02' function='0x0'/> -  </video> -  <hostdev mode='subsystem' type='pci' managed='yes'> -   <source> -    <address domain='0x0000' bus='0x01' slot='0x00' function='0x6'/> -   </source> -   <address type='pci' domain='0x0000' bus='0x00' slot='0x05' function='0x0'/> -  </hostdev> -  <memballoon model='virtio'> -   <stats period='10'/> -   <address type='pci' domain='0x0000' bus='0x00' slot='0x06' function='0x0'/> -  </memballoon> -  <rng model='virtio'> -   <backend model='random'>/dev/urandom</backend> -   <address type='pci' domain='0x0000' bus='0x00' slot='0x07' function='0x0'/> -  </rng> - </devices> -</domain> -``` ----- -Simon Jones - diff --git a/results/classifier/015/unknown/13442371 b/results/classifier/015/unknown/13442371 deleted file mode 100644 index 2e6a8bc91..000000000 --- a/results/classifier/015/unknown/13442371 +++ /dev/null @@ -1,396 +0,0 @@ -TCG: 0.901 -user-level: 0.899 -peripherals: 0.889 -ppc: 0.884 -device: 0.883 -assembly: 0.877 -virtual: 0.872 -KVM: 0.872 -hypervisor: 0.872 -register: 0.871 -i386: 0.870 -operating system: 0.870 -arm: 0.867 -debug: 0.866 -mistranslation: 0.859 -vnc: 0.858 -permissions: 0.854 -graphic: 0.850 -semantic: 0.850 -PID: 0.849 -architecture: 0.846 -risc-v: 0.846 -VMM: 0.843 -kernel: 0.842 -performance: 0.841 -files: 0.837 -x86: 0.836 -socket: 0.831 -alpha: 0.819 -boot: 0.815 -network: 0.811 - -[Qemu-devel] [BUG] nanoMIPS support problem related to extract2 support for i386 TCG target - -Hello, Richard, Peter, and others. - -As a part of activities before 4.1 release, I tested nanoMIPS support -in QEMU (which was officially fully integrated in 4.0, is currently -limited to system mode only, and was tested in a similar fashion right -prior to 4.0). - -This support appears to be broken now. Following command line works in -4.0, but results in kernel panic for the current tip of the tree: - -~/Build/qemu-test-revert-c6fb8c0cf704/mipsel-softmmu/qemu-system-mipsel --cpu I7200 -kernel generic_nano32r6el_page4k -M malta -serial stdio -m -1G -hda nanomips32r6_le_sf_2017.05-03-59-gf5595d6.ext4 -append -"mem=256m@0x0 rw console=ttyS0 vga=cirrus vesa=0x111 root=/dev/sda" - -(kernel and rootfs image files used in this commend line can be -downloaded from the locations mentioned in our user guide) - -The quick bisect points to the commit: - -commit c6fb8c0cf704c4a1a48c3e99e995ad4c58150dab -Author: Richard Henderson <address@hidden> -Date: Mon Feb 25 11:42:35 2019 -0800 - - tcg/i386: Support INDEX_op_extract2_{i32,i64} - - Signed-off-by: Richard Henderson <address@hidden> - -Please advise on further actions. - -Yours, -Aleksandar - -On Fri, Jul 12, 2019 at 8:09 PM Aleksandar Markovic -<address@hidden> wrote: -> -> -Hello, Richard, Peter, and others. -> -> -As a part of activities before 4.1 release, I tested nanoMIPS support -> -in QEMU (which was officially fully integrated in 4.0, is currently -> -limited to system mode only, and was tested in a similar fashion right -> -prior to 4.0). -> -> -This support appears to be broken now. 
Following command line works in -> -4.0, but results in kernel panic for the current tip of the tree: -> -> -~/Build/qemu-test-revert-c6fb8c0cf704/mipsel-softmmu/qemu-system-mipsel -> --cpu I7200 -kernel generic_nano32r6el_page4k -M malta -serial stdio -m -> -1G -hda nanomips32r6_le_sf_2017.05-03-59-gf5595d6.ext4 -append -> -"mem=256m@0x0 rw console=ttyS0 vga=cirrus vesa=0x111 root=/dev/sda" -> -> -(kernel and rootfs image files used in this commend line can be -> -downloaded from the locations mentioned in our user guide) -> -> -The quick bisect points to the commit: -> -> -commit c6fb8c0cf704c4a1a48c3e99e995ad4c58150dab -> -Author: Richard Henderson <address@hidden> -> -Date: Mon Feb 25 11:42:35 2019 -0800 -> -> -tcg/i386: Support INDEX_op_extract2_{i32,i64} -> -> -Signed-off-by: Richard Henderson <address@hidden> -> -> -Please advise on further actions. -> -Just to add a data point: - -If the following change is applied: - -diff --git a/tcg/i386/tcg-target.h b/tcg/i386/tcg-target.h -index 928e8b8..b6a4cf2 100644 ---- a/tcg/i386/tcg-target.h -+++ b/tcg/i386/tcg-target.h -@@ -124,7 +124,7 @@ extern bool have_avx2; - #define TCG_TARGET_HAS_deposit_i32 1 - #define TCG_TARGET_HAS_extract_i32 1 - #define TCG_TARGET_HAS_sextract_i32 1 --#define TCG_TARGET_HAS_extract2_i32 1 -+#define TCG_TARGET_HAS_extract2_i32 0 - #define TCG_TARGET_HAS_movcond_i32 1 - #define TCG_TARGET_HAS_add2_i32 1 - #define TCG_TARGET_HAS_sub2_i32 1 -@@ -163,7 +163,7 @@ extern bool have_avx2; - #define TCG_TARGET_HAS_deposit_i64 1 - #define TCG_TARGET_HAS_extract_i64 1 - #define TCG_TARGET_HAS_sextract_i64 0 --#define TCG_TARGET_HAS_extract2_i64 1 -+#define TCG_TARGET_HAS_extract2_i64 0 - #define TCG_TARGET_HAS_movcond_i64 1 - #define TCG_TARGET_HAS_add2_i64 1 - #define TCG_TARGET_HAS_sub2_i64 1 - -... the problem disappears. - - -> -Yours, -> -Aleksandar - -On Fri, Jul 12, 2019 at 8:19 PM Aleksandar Markovic -<address@hidden> wrote: -> -> -On Fri, Jul 12, 2019 at 8:09 PM Aleksandar Markovic -> -<address@hidden> wrote: -> -> -> -> Hello, Richard, Peter, and others. -> -> -> -> As a part of activities before 4.1 release, I tested nanoMIPS support -> -> in QEMU (which was officially fully integrated in 4.0, is currently -> -> limited to system mode only, and was tested in a similar fashion right -> -> prior to 4.0). -> -> -> -> This support appears to be broken now. Following command line works in -> -> 4.0, but results in kernel panic for the current tip of the tree: -> -> -> -> ~/Build/qemu-test-revert-c6fb8c0cf704/mipsel-softmmu/qemu-system-mipsel -> -> -cpu I7200 -kernel generic_nano32r6el_page4k -M malta -serial stdio -m -> -> 1G -hda nanomips32r6_le_sf_2017.05-03-59-gf5595d6.ext4 -append -> -> "mem=256m@0x0 rw console=ttyS0 vga=cirrus vesa=0x111 root=/dev/sda" -> -> -> -> (kernel and rootfs image files used in this commend line can be -> -> downloaded from the locations mentioned in our user guide) -> -> -> -> The quick bisect points to the commit: -> -> -> -> commit c6fb8c0cf704c4a1a48c3e99e995ad4c58150dab -> -> Author: Richard Henderson <address@hidden> -> -> Date: Mon Feb 25 11:42:35 2019 -0800 -> -> -> -> tcg/i386: Support INDEX_op_extract2_{i32,i64} -> -> -> -> Signed-off-by: Richard Henderson <address@hidden> -> -> -> -> Please advise on further actions. 
-> -> -> -> -Just to add a data point: -> -> -If the following change is applied: -> -> -diff --git a/tcg/i386/tcg-target.h b/tcg/i386/tcg-target.h -> -index 928e8b8..b6a4cf2 100644 -> ---- a/tcg/i386/tcg-target.h -> -+++ b/tcg/i386/tcg-target.h -> -@@ -124,7 +124,7 @@ extern bool have_avx2; -> -#define TCG_TARGET_HAS_deposit_i32 1 -> -#define TCG_TARGET_HAS_extract_i32 1 -> -#define TCG_TARGET_HAS_sextract_i32 1 -> --#define TCG_TARGET_HAS_extract2_i32 1 -> -+#define TCG_TARGET_HAS_extract2_i32 0 -> -#define TCG_TARGET_HAS_movcond_i32 1 -> -#define TCG_TARGET_HAS_add2_i32 1 -> -#define TCG_TARGET_HAS_sub2_i32 1 -> -@@ -163,7 +163,7 @@ extern bool have_avx2; -> -#define TCG_TARGET_HAS_deposit_i64 1 -> -#define TCG_TARGET_HAS_extract_i64 1 -> -#define TCG_TARGET_HAS_sextract_i64 0 -> --#define TCG_TARGET_HAS_extract2_i64 1 -> -+#define TCG_TARGET_HAS_extract2_i64 0 -> -#define TCG_TARGET_HAS_movcond_i64 1 -> -#define TCG_TARGET_HAS_add2_i64 1 -> -#define TCG_TARGET_HAS_sub2_i64 1 -> -> -... the problem disappears. -> -It looks the problem is in this code segment in of tcg_gen_deposit_i32(): - - if (ofs == 0) { - tcg_gen_extract2_i32(ret, arg1, arg2, len); - tcg_gen_rotli_i32(ret, ret, len); - goto done; - } - -) - -If that code segment is deleted altogether (which effectively forces -usage of "fallback" part of tcg_gen_deposit_i32()), the problem also -vanishes (without changes from my previous mail). - -> -> -> Yours, -> -> Aleksandar - -Aleksandar Markovic <address@hidden> writes: - -> -Hello, Richard, Peter, and others. -> -> -As a part of activities before 4.1 release, I tested nanoMIPS support -> -in QEMU (which was officially fully integrated in 4.0, is currently -> -limited to system mode only, and was tested in a similar fashion right -> -prior to 4.0). -> -> -This support appears to be broken now. Following command line works in -> -4.0, but results in kernel panic for the current tip of the tree: -> -> -~/Build/qemu-test-revert-c6fb8c0cf704/mipsel-softmmu/qemu-system-mipsel -> --cpu I7200 -kernel generic_nano32r6el_page4k -M malta -serial stdio -m -> -1G -hda nanomips32r6_le_sf_2017.05-03-59-gf5595d6.ext4 -append -> -"mem=256m@0x0 rw console=ttyS0 vga=cirrus vesa=0x111 root=/dev/sda" -> -> -(kernel and rootfs image files used in this commend line can be -> -downloaded from the locations mentioned in our user guide) -> -> -The quick bisect points to the commit: -> -> -commit c6fb8c0cf704c4a1a48c3e99e995ad4c58150dab -> -Author: Richard Henderson <address@hidden> -> -Date: Mon Feb 25 11:42:35 2019 -0800 -> -> -tcg/i386: Support INDEX_op_extract2_{i32,i64} -> -> -Signed-off-by: Richard Henderson <address@hidden> -> -> -Please advise on further actions. -Please see the fix: - - Subject: [PATCH for-4.1] tcg: Fix constant folding of INDEX_op_extract2_i32 - Date: Tue, 9 Jul 2019 14:19:00 +0200 - Message-Id: <address@hidden> - -> -> -Yours, -> -Aleksandar --- -Alex Bennée - -On Sat, Jul 13, 2019 at 9:21 AM Alex Bennée <address@hidden> wrote: -> -> -Please see the fix: -> -> -Subject: [PATCH for-4.1] tcg: Fix constant folding of INDEX_op_extract2_i32 -> -Date: Tue, 9 Jul 2019 14:19:00 +0200 -> -Message-Id: <address@hidden> -> -Thanks, this fixed the behavior. 
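[Editorial aside: the regression was narrowed above to the ofs == 0 fast path of tcg_gen_deposit_i32(), which builds its result from extract2 plus a rotate. Below is a small standalone reference model of what that path is meant to compute, based on my reading of the generic expansion rather than QEMU source; it shows why a wrong constant folding of extract2 silently corrupts every deposit with ofs == 0, consistent with the kernel panic reported here.]

```
#include <stdint.h>
#include <stdio.h>
#include <assert.h>

/* Reference semantics: extract 32 bits at offset 'ofs' from the 64-bit
 * concatenation hi:lo (valid here for 0 < ofs < 32). */
static uint32_t extract2_i32(uint32_t lo, uint32_t hi, unsigned ofs)
{
    assert(ofs > 0 && ofs < 32);
    return (uint32_t)((((uint64_t)hi << 32) | lo) >> ofs);
}

static uint32_t rotl32(uint32_t v, unsigned n)
{
    return (v << n) | (v >> (32 - n));
}

/* The ofs == 0 fast path quoted above: deposit arg2's low 'len' bits into
 * bits [len-1:0] of arg1. */
static uint32_t deposit_ofs0(uint32_t arg1, uint32_t arg2, unsigned len)
{
    /* extract2 gives (arg1 >> len) | (arg2 << (32 - len)) ... */
    uint32_t t = extract2_i32(arg1, arg2, len);
    /* ... and the rotate puts arg2's low bits back at the bottom. */
    return rotl32(t, len);
}

int main(void)
{
    uint32_t arg1 = 0xaabbccdd, arg2 = 0x12345678;
    unsigned len = 8;
    uint32_t mask = (1u << len) - 1;
    uint32_t expect = (arg1 & ~mask) | (arg2 & mask);

    assert(deposit_ofs0(arg1, arg2, len) == expect);
    printf("deposit(ofs=0, len=%u) = %08x\n", len,
           (unsigned)deposit_ofs0(arg1, arg2, len));
    return 0;
}
```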
- -Sincerely, -Aleksandar - -> -> -> -> -> Yours, -> -> Aleksandar -> -> -> --- -> -Alex Bennée -> - diff --git a/results/classifier/015/unknown/23270873 b/results/classifier/015/unknown/23270873 deleted file mode 100644 index 3cf889b8d..000000000 --- a/results/classifier/015/unknown/23270873 +++ /dev/null @@ -1,719 +0,0 @@ -user-level: 0.896 -mistranslation: 0.881 -risc-v: 0.859 -operating system: 0.844 -boot: 0.830 -TCG: 0.828 -ppc: 0.827 -vnc: 0.820 -peripherals: 0.820 -device: 0.810 -hypervisor: 0.806 -KVM: 0.803 -virtual: 0.802 -permissions: 0.802 -register: 0.797 -VMM: 0.792 -debug: 0.788 -assembly: 0.768 -network: 0.768 -graphic: 0.764 -arm: 0.761 -socket: 0.758 -semantic: 0.752 -performance: 0.744 -architecture: 0.742 -kernel: 0.735 -PID: 0.731 -x86: 0.730 -files: 0.730 -alpha: 0.712 -i386: 0.705 - -[Qemu-devel] [BUG?] aio_get_linux_aio: Assertion `ctx->linux_aio' failed - -Hi, - -I am seeing some strange QEMU assertion failures for qemu on s390x, -which prevents a guest from starting. - -Git bisecting points to the following commit as the source of the error. - -commit ed6e2161715c527330f936d44af4c547f25f687e -Author: Nishanth Aravamudan <address@hidden> -Date: Fri Jun 22 12:37:00 2018 -0700 - - linux-aio: properly bubble up errors from initialization - - laio_init() can fail for a couple of reasons, which will lead to a NULL - pointer dereference in laio_attach_aio_context(). - - To solve this, add a aio_setup_linux_aio() function which is called - early in raw_open_common. If this fails, propagate the error up. The - signature of aio_get_linux_aio() was not modified, because it seems - preferable to return the actual errno from the possible failing - initialization calls. - - Additionally, when the AioContext changes, we need to associate a - LinuxAioState with the new AioContext. Use the bdrv_attach_aio_context - callback and call the new aio_setup_linux_aio(), which will allocate a -new AioContext if needed, and return errors on failures. If it -fails for -any reason, fallback to threaded AIO with an error message, as the - device is already in-use by the guest. - - Add an assert that aio_get_linux_aio() cannot return NULL. - - Signed-off-by: Nishanth Aravamudan <address@hidden> - Message-id: address@hidden - Signed-off-by: Stefan Hajnoczi <address@hidden> -Not sure what is causing this assertion to fail. 
Here is the qemu -command line of the guest, from qemu log, which throws this error: -LC_ALL=C PATH=/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin -QEMU_AUDIO_DRV=none /usr/local/bin/qemu-system-s390x -name -guest=rt_vm1,debug-threads=on -S -object -secret,id=masterKey0,format=raw,file=/var/lib/libvirt/qemu/domain-21-rt_vm1/master-key.aes --machine s390-ccw-virtio-2.12,accel=kvm,usb=off,dump-guest-core=off -m -1024 -realtime mlock=off -smp 4,sockets=4,cores=1,threads=1 -object -iothread,id=iothread1 -uuid 0cde16cd-091d-41bd-9ac2-5243df5c9a0d --display none -no-user-config -nodefaults -chardev -socket,id=charmonitor,fd=28,server,nowait -mon -chardev=charmonitor,id=monitor,mode=control -rtc base=utc -no-shutdown --boot strict=on -drive -file=/dev/mapper/360050763998b0883980000002a000031,format=raw,if=none,id=drive-virtio-disk0,cache=none,aio=native --device -virtio-blk-ccw,iothread=iothread1,scsi=off,devno=fe.0.0001,drive=drive-virtio-disk0,id=virtio-disk0,bootindex=1,write-cache=on --netdev tap,fd=30,id=hostnet0,vhost=on,vhostfd=31 -device -virtio-net-ccw,netdev=hostnet0,id=net0,mac=02:3a:c8:67:95:84,devno=fe.0.0000 --netdev tap,fd=32,id=hostnet1,vhost=on,vhostfd=33 -device -virtio-net-ccw,netdev=hostnet1,id=net1,mac=52:54:00:2a:e5:08,devno=fe.0.0002 --chardev pty,id=charconsole0 -device -sclpconsole,chardev=charconsole0,id=console0 -device -virtio-balloon-ccw,id=balloon0,devno=fe.3.ffba -sandbox -on,obsolete=deny,elevateprivileges=deny,spawn=deny,resourcecontrol=deny --msg timestamp=on -2018-07-17 15:48:42.252+0000: Domain id=21 is tainted: high-privileges -2018-07-17T15:48:42.279380Z qemu-system-s390x: -chardev -pty,id=charconsole0: char device redirected to /dev/pts/3 (label -charconsole0) -qemu-system-s390x: util/async.c:339: aio_get_linux_aio: Assertion -`ctx->linux_aio' failed. -2018-07-17 15:48:43.309+0000: shutting down, reason=failed - - -Any help debugging this would be greatly appreciated. - -Thank you -Farhan - -On 17.07.2018 [13:25:53 -0400], Farhan Ali wrote: -> -Hi, -> -> -I am seeing some strange QEMU assertion failures for qemu on s390x, -> -which prevents a guest from starting. -> -> -Git bisecting points to the following commit as the source of the error. -> -> -commit ed6e2161715c527330f936d44af4c547f25f687e -> -Author: Nishanth Aravamudan <address@hidden> -> -Date: Fri Jun 22 12:37:00 2018 -0700 -> -> -linux-aio: properly bubble up errors from initialization -> -> -laio_init() can fail for a couple of reasons, which will lead to a NULL -> -pointer dereference in laio_attach_aio_context(). -> -> -To solve this, add a aio_setup_linux_aio() function which is called -> -early in raw_open_common. If this fails, propagate the error up. The -> -signature of aio_get_linux_aio() was not modified, because it seems -> -preferable to return the actual errno from the possible failing -> -initialization calls. -> -> -Additionally, when the AioContext changes, we need to associate a -> -LinuxAioState with the new AioContext. Use the bdrv_attach_aio_context -> -callback and call the new aio_setup_linux_aio(), which will allocate a -> -new AioContext if needed, and return errors on failures. If it fails for -> -any reason, fallback to threaded AIO with an error message, as the -> -device is already in-use by the guest. -> -> -Add an assert that aio_get_linux_aio() cannot return NULL. -> -> -Signed-off-by: Nishanth Aravamudan <address@hidden> -> -Message-id: address@hidden -> -Signed-off-by: Stefan Hajnoczi <address@hidden> -> -> -> -Not sure what is causing this assertion to fail. 
Here is the qemu command -> -line of the guest, from qemu log, which throws this error: -> -> -> -LC_ALL=C PATH=/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin -> -QEMU_AUDIO_DRV=none /usr/local/bin/qemu-system-s390x -name -> -guest=rt_vm1,debug-threads=on -S -object -> -secret,id=masterKey0,format=raw,file=/var/lib/libvirt/qemu/domain-21-rt_vm1/master-key.aes -> --machine s390-ccw-virtio-2.12,accel=kvm,usb=off,dump-guest-core=off -m 1024 -> --realtime mlock=off -smp 4,sockets=4,cores=1,threads=1 -object -> -iothread,id=iothread1 -uuid 0cde16cd-091d-41bd-9ac2-5243df5c9a0d -display -> -none -no-user-config -nodefaults -chardev -> -socket,id=charmonitor,fd=28,server,nowait -mon -> -chardev=charmonitor,id=monitor,mode=control -rtc base=utc -no-shutdown -boot -> -strict=on -drive -> -file=/dev/mapper/360050763998b0883980000002a000031,format=raw,if=none,id=drive-virtio-disk0,cache=none,aio=native -> --device -> -virtio-blk-ccw,iothread=iothread1,scsi=off,devno=fe.0.0001,drive=drive-virtio-disk0,id=virtio-disk0,bootindex=1,write-cache=on -> --netdev tap,fd=30,id=hostnet0,vhost=on,vhostfd=31 -device -> -virtio-net-ccw,netdev=hostnet0,id=net0,mac=02:3a:c8:67:95:84,devno=fe.0.0000 -> --netdev tap,fd=32,id=hostnet1,vhost=on,vhostfd=33 -device -> -virtio-net-ccw,netdev=hostnet1,id=net1,mac=52:54:00:2a:e5:08,devno=fe.0.0002 -> --chardev pty,id=charconsole0 -device -> -sclpconsole,chardev=charconsole0,id=console0 -device -> -virtio-balloon-ccw,id=balloon0,devno=fe.3.ffba -sandbox -> -on,obsolete=deny,elevateprivileges=deny,spawn=deny,resourcecontrol=deny -msg -> -timestamp=on -> -> -> -> -2018-07-17 15:48:42.252+0000: Domain id=21 is tainted: high-privileges -> -2018-07-17T15:48:42.279380Z qemu-system-s390x: -chardev pty,id=charconsole0: -> -char device redirected to /dev/pts/3 (label charconsole0) -> -qemu-system-s390x: util/async.c:339: aio_get_linux_aio: Assertion -> -`ctx->linux_aio' failed. -> -2018-07-17 15:48:43.309+0000: shutting down, reason=failed -> -> -> -Any help debugging this would be greatly appreciated. -iiuc, this possibly implies AIO was not actually used previously on this -guest (it might have silently been falling back to threaded IO?). I -don't have access to s390x, but would it be possible to run qemu under -gdb and see if aio_setup_linux_aio is being called at all (I think it -might not be, but I'm not sure why), and if so, if it's for the context -in question? - -If it's not being called first, could you see what callpath is calling -aio_get_linux_aio when this assertion trips? - -Thanks! --Nish - -On 07/17/2018 04:52 PM, Nishanth Aravamudan wrote: -iiuc, this possibly implies AIO was not actually used previously on this -guest (it might have silently been falling back to threaded IO?). I -don't have access to s390x, but would it be possible to run qemu under -gdb and see if aio_setup_linux_aio is being called at all (I think it -might not be, but I'm not sure why), and if so, if it's for the context -in question? - -If it's not being called first, could you see what callpath is calling -aio_get_linux_aio when this assertion trips? - -Thanks! 
--Nish -Hi Nishant, -From the coredump of the guest this is the call trace that calls -aio_get_linux_aio: -Stack trace of thread 145158: -#0 0x000003ff94dbe274 raise (libc.so.6) -#1 0x000003ff94da39a8 abort (libc.so.6) -#2 0x000003ff94db62ce __assert_fail_base (libc.so.6) -#3 0x000003ff94db634c __assert_fail (libc.so.6) -#4 0x000002aa20db067a aio_get_linux_aio (qemu-system-s390x) -#5 0x000002aa20d229a8 raw_aio_plug (qemu-system-s390x) -#6 0x000002aa20d309ee bdrv_io_plug (qemu-system-s390x) -#7 0x000002aa20b5a8ea virtio_blk_handle_vq (qemu-system-s390x) -#8 0x000002aa20db2f6e aio_dispatch_handlers (qemu-system-s390x) -#9 0x000002aa20db3c34 aio_poll (qemu-system-s390x) -#10 0x000002aa20be32a2 iothread_run (qemu-system-s390x) -#11 0x000003ff94f879a8 start_thread (libpthread.so.0) -#12 0x000003ff94e797ee thread_start (libc.so.6) - - -Thanks for taking a look and responding. - -Thanks -Farhan - -On 07/18/2018 09:42 AM, Farhan Ali wrote: -On 07/17/2018 04:52 PM, Nishanth Aravamudan wrote: -iiuc, this possibly implies AIO was not actually used previously on this -guest (it might have silently been falling back to threaded IO?). I -don't have access to s390x, but would it be possible to run qemu under -gdb and see if aio_setup_linux_aio is being called at all (I think it -might not be, but I'm not sure why), and if so, if it's for the context -in question? - -If it's not being called first, could you see what callpath is calling -aio_get_linux_aio when this assertion trips? - -Thanks! --Nish -Hi Nishant, -From the coredump of the guest this is the call trace that calls -aio_get_linux_aio: -Stack trace of thread 145158: -#0 0x000003ff94dbe274 raise (libc.so.6) -#1 0x000003ff94da39a8 abort (libc.so.6) -#2 0x000003ff94db62ce __assert_fail_base (libc.so.6) -#3 0x000003ff94db634c __assert_fail (libc.so.6) -#4 0x000002aa20db067a aio_get_linux_aio (qemu-system-s390x) -#5 0x000002aa20d229a8 raw_aio_plug (qemu-system-s390x) -#6 0x000002aa20d309ee bdrv_io_plug (qemu-system-s390x) -#7 0x000002aa20b5a8ea virtio_blk_handle_vq (qemu-system-s390x) -#8 0x000002aa20db2f6e aio_dispatch_handlers (qemu-system-s390x) -#9 0x000002aa20db3c34 aio_poll (qemu-system-s390x) -#10 0x000002aa20be32a2 iothread_run (qemu-system-s390x) -#11 0x000003ff94f879a8 start_thread (libpthread.so.0) -#12 0x000003ff94e797ee thread_start (libc.so.6) - - -Thanks for taking a look and responding. - -Thanks -Farhan -Trying to debug a little further, the block device in this case is a -"host device". And looking at your commit carefully you use the -bdrv_attach_aio_context callback to setup a Linux AioContext. -For some reason the "host device" struct (BlockDriver bdrv_host_device -in block/file-posix.c) does not have a bdrv_attach_aio_context defined. -So a simple change of adding the callback to the struct solves the issue -and the guest starts fine. -diff --git a/block/file-posix.c b/block/file-posix.c -index 28824aa..b8d59fb 100644 ---- a/block/file-posix.c -+++ b/block/file-posix.c -@@ -3135,6 +3135,7 @@ static BlockDriver bdrv_host_device = { - .bdrv_refresh_limits = raw_refresh_limits, - .bdrv_io_plug = raw_aio_plug, - .bdrv_io_unplug = raw_aio_unplug, -+ .bdrv_attach_aio_context = raw_aio_attach_aio_context, - - .bdrv_co_truncate = raw_co_truncate, - .bdrv_getlength = raw_getlength, -I am not too familiar with block device code in QEMU, so not sure if -this is the right fix or if there are some underlying problems. 
-Thanks -Farhan - -On 18.07.2018 [11:10:27 -0400], Farhan Ali wrote: -> -> -> -On 07/18/2018 09:42 AM, Farhan Ali wrote: -> -> -> -> -> -> On 07/17/2018 04:52 PM, Nishanth Aravamudan wrote: -> -> > iiuc, this possibly implies AIO was not actually used previously on this -> -> > guest (it might have silently been falling back to threaded IO?). I -> -> > don't have access to s390x, but would it be possible to run qemu under -> -> > gdb and see if aio_setup_linux_aio is being called at all (I think it -> -> > might not be, but I'm not sure why), and if so, if it's for the context -> -> > in question? -> -> > -> -> > If it's not being called first, could you see what callpath is calling -> -> > aio_get_linux_aio when this assertion trips? -> -> > -> -> > Thanks! -> -> > -Nish -> -> -> -> -> -> Hi Nishant, -> -> -> -> From the coredump of the guest this is the call trace that calls -> -> aio_get_linux_aio: -> -> -> -> -> -> Stack trace of thread 145158: -> -> #0 0x000003ff94dbe274 raise (libc.so.6) -> -> #1 0x000003ff94da39a8 abort (libc.so.6) -> -> #2 0x000003ff94db62ce __assert_fail_base (libc.so.6) -> -> #3 0x000003ff94db634c __assert_fail (libc.so.6) -> -> #4 0x000002aa20db067a aio_get_linux_aio (qemu-system-s390x) -> -> #5 0x000002aa20d229a8 raw_aio_plug (qemu-system-s390x) -> -> #6 0x000002aa20d309ee bdrv_io_plug (qemu-system-s390x) -> -> #7 0x000002aa20b5a8ea virtio_blk_handle_vq (qemu-system-s390x) -> -> #8 0x000002aa20db2f6e aio_dispatch_handlers (qemu-system-s390x) -> -> #9 0x000002aa20db3c34 aio_poll (qemu-system-s390x) -> -> #10 0x000002aa20be32a2 iothread_run (qemu-system-s390x) -> -> #11 0x000003ff94f879a8 start_thread (libpthread.so.0) -> -> #12 0x000003ff94e797ee thread_start (libc.so.6) -> -> -> -> -> -> Thanks for taking a look and responding. -> -> -> -> Thanks -> -> Farhan -> -> -> -> -> -> -> -> -Trying to debug a little further, the block device in this case is a "host -> -device". And looking at your commit carefully you use the -> -bdrv_attach_aio_context callback to setup a Linux AioContext. -> -> -For some reason the "host device" struct (BlockDriver bdrv_host_device in -> -block/file-posix.c) does not have a bdrv_attach_aio_context defined. -> -So a simple change of adding the callback to the struct solves the issue and -> -the guest starts fine. -> -> -> -diff --git a/block/file-posix.c b/block/file-posix.c -> -index 28824aa..b8d59fb 100644 -> ---- a/block/file-posix.c -> -+++ b/block/file-posix.c -> -@@ -3135,6 +3135,7 @@ static BlockDriver bdrv_host_device = { -> -.bdrv_refresh_limits = raw_refresh_limits, -> -.bdrv_io_plug = raw_aio_plug, -> -.bdrv_io_unplug = raw_aio_unplug, -> -+ .bdrv_attach_aio_context = raw_aio_attach_aio_context, -> -> -.bdrv_co_truncate = raw_co_truncate, -> -.bdrv_getlength = raw_getlength, -> -> -> -> -I am not too familiar with block device code in QEMU, so not sure if -> -this is the right fix or if there are some underlying problems. -Oh this is quite embarassing! I only added the bdrv_attach_aio_context -callback for the file-backed device. Your fix is definitely corect for -host device. Let me make sure there weren't any others missed and I will -send out a properly formatted patch. Thank you for the quick testing and -turnaround! 
- --Nish - -On 07/18/2018 08:52 PM, Nishanth Aravamudan wrote: -> -On 18.07.2018 [11:10:27 -0400], Farhan Ali wrote: -> -> -> -> -> -> On 07/18/2018 09:42 AM, Farhan Ali wrote: -> ->> -> ->> -> ->> On 07/17/2018 04:52 PM, Nishanth Aravamudan wrote: -> ->>> iiuc, this possibly implies AIO was not actually used previously on this -> ->>> guest (it might have silently been falling back to threaded IO?). I -> ->>> don't have access to s390x, but would it be possible to run qemu under -> ->>> gdb and see if aio_setup_linux_aio is being called at all (I think it -> ->>> might not be, but I'm not sure why), and if so, if it's for the context -> ->>> in question? -> ->>> -> ->>> If it's not being called first, could you see what callpath is calling -> ->>> aio_get_linux_aio when this assertion trips? -> ->>> -> ->>> Thanks! -> ->>> -Nish -> ->> -> ->> -> ->> Hi Nishant, -> ->> -> ->> From the coredump of the guest this is the call trace that calls -> ->> aio_get_linux_aio: -> ->> -> ->> -> ->> Stack trace of thread 145158: -> ->> #0 0x000003ff94dbe274 raise (libc.so.6) -> ->> #1 0x000003ff94da39a8 abort (libc.so.6) -> ->> #2 0x000003ff94db62ce __assert_fail_base (libc.so.6) -> ->> #3 0x000003ff94db634c __assert_fail (libc.so.6) -> ->> #4 0x000002aa20db067a aio_get_linux_aio (qemu-system-s390x) -> ->> #5 0x000002aa20d229a8 raw_aio_plug (qemu-system-s390x) -> ->> #6 0x000002aa20d309ee bdrv_io_plug (qemu-system-s390x) -> ->> #7 0x000002aa20b5a8ea virtio_blk_handle_vq (qemu-system-s390x) -> ->> #8 0x000002aa20db2f6e aio_dispatch_handlers (qemu-system-s390x) -> ->> #9 0x000002aa20db3c34 aio_poll (qemu-system-s390x) -> ->> #10 0x000002aa20be32a2 iothread_run (qemu-system-s390x) -> ->> #11 0x000003ff94f879a8 start_thread (libpthread.so.0) -> ->> #12 0x000003ff94e797ee thread_start (libc.so.6) -> ->> -> ->> -> ->> Thanks for taking a look and responding. -> ->> -> ->> Thanks -> ->> Farhan -> ->> -> ->> -> ->> -> -> -> -> Trying to debug a little further, the block device in this case is a "host -> -> device". And looking at your commit carefully you use the -> -> bdrv_attach_aio_context callback to setup a Linux AioContext. -> -> -> -> For some reason the "host device" struct (BlockDriver bdrv_host_device in -> -> block/file-posix.c) does not have a bdrv_attach_aio_context defined. -> -> So a simple change of adding the callback to the struct solves the issue and -> -> the guest starts fine. -> -> -> -> -> -> diff --git a/block/file-posix.c b/block/file-posix.c -> -> index 28824aa..b8d59fb 100644 -> -> --- a/block/file-posix.c -> -> +++ b/block/file-posix.c -> -> @@ -3135,6 +3135,7 @@ static BlockDriver bdrv_host_device = { -> -> .bdrv_refresh_limits = raw_refresh_limits, -> -> .bdrv_io_plug = raw_aio_plug, -> -> .bdrv_io_unplug = raw_aio_unplug, -> -> + .bdrv_attach_aio_context = raw_aio_attach_aio_context, -> -> -> -> .bdrv_co_truncate = raw_co_truncate, -> -> .bdrv_getlength = raw_getlength, -> -> -> -> -> -> -> -> I am not too familiar with block device code in QEMU, so not sure if -> -> this is the right fix or if there are some underlying problems. -> -> -Oh this is quite embarassing! I only added the bdrv_attach_aio_context -> -callback for the file-backed device. Your fix is definitely corect for -> -host device. Let me make sure there weren't any others missed and I will -> -send out a properly formatted patch. Thank you for the quick testing and -> -turnaround! -Farhan, can you respin your patch with proper sign-off and patch description? -Adding qemu-block. 
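For reference, a minimal gdb session to check Nishanth's earlier question -- whether aio_setup_linux_aio() ever runs for the iothread's AioContext before aio_get_linux_aio() trips the assertion -- might look like the sketch below. The command line is abbreviated and hypothetical; the function names are the ones quoted in the backtraces above.

    $ gdb --args /usr/local/bin/qemu-system-s390x ... -drive file=/dev/mapper/...,cache=none,aio=native ...
    (gdb) break aio_setup_linux_aio
    (gdb) break aio_get_linux_aio
    (gdb) run
    # If only the aio_get_linux_aio breakpoint fires, the attach callback was
    # never invoked for this BlockDriver; 'bt' at that point shows the path
    # (raw_aio_plug -> bdrv_io_plug -> virtio_blk_handle_vq -> ...).
    (gdb) bt

With the one-line bdrv_host_device change quoted above, the aio_setup_linux_aio breakpoint should be reached from the bdrv_attach_aio_context callback before any I/O is plugged.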
- -Hi Christian, - -On 19.07.2018 [08:55:20 +0200], Christian Borntraeger wrote: -> -> -> -On 07/18/2018 08:52 PM, Nishanth Aravamudan wrote: -> -> On 18.07.2018 [11:10:27 -0400], Farhan Ali wrote: -> ->> -> ->> -> ->> On 07/18/2018 09:42 AM, Farhan Ali wrote: -<snip> - -> ->> I am not too familiar with block device code in QEMU, so not sure if -> ->> this is the right fix or if there are some underlying problems. -> -> -> -> Oh this is quite embarassing! I only added the bdrv_attach_aio_context -> -> callback for the file-backed device. Your fix is definitely corect for -> -> host device. Let me make sure there weren't any others missed and I will -> -> send out a properly formatted patch. Thank you for the quick testing and -> -> turnaround! -> -> -Farhan, can you respin your patch with proper sign-off and patch description? -> -Adding qemu-block. -I sent it yesterday, sorry I didn't cc everyone from this e-mail: -http://lists.nongnu.org/archive/html/qemu-block/2018-07/msg00516.html -Thanks, -Nish - diff --git a/results/classifier/015/unknown/25842545 b/results/classifier/015/unknown/25842545 deleted file mode 100644 index 3d1966907..000000000 --- a/results/classifier/015/unknown/25842545 +++ /dev/null @@ -1,229 +0,0 @@ -user-level: 0.958 -risc-v: 0.929 -mistranslation: 0.928 -register: 0.879 -VMM: 0.871 -KVM: 0.867 -ppc: 0.866 -vnc: 0.862 -TCG: 0.857 -device: 0.847 -virtual: 0.843 -hypervisor: 0.841 -debug: 0.836 -performance: 0.831 -semantic: 0.829 -PID: 0.829 -peripherals: 0.829 -boot: 0.824 -assembly: 0.824 -graphic: 0.822 -x86: 0.821 -i386: 0.819 -arm: 0.818 -permissions: 0.817 -alpha: 0.809 -socket: 0.808 -kernel: 0.808 -files: 0.806 -architecture: 0.804 -operating system: 0.798 -network: 0.796 - -[Qemu-devel] [Bug?] Guest pause because VMPTRLD failed in KVM - -Hello, - - We encountered a problem that a guest paused because the KMOD report VMPTRLD -failed. 
- -The related information is as follows: - -1) Qemu command: - /usr/bin/qemu-kvm -name omu1 -S -machine pc-i440fx-2.3,accel=kvm,usb=off -cpu -host -m 15625 -realtime mlock=off -smp 8,sockets=1,cores=8,threads=1 -uuid -a2aacfff-6583-48b4-b6a4-e6830e519931 -no-user-config -nodefaults -chardev -socket,id=charmonitor,path=/var/lib/libvirt/qemu/omu1.monitor,server,nowait --mon chardev=charmonitor,id=monitor,mode=control -rtc base=utc -no-shutdown --boot strict=on -device piix3-usb-uhci,id=usb,bus=pci.0,addr=0x1.0x2 -device -virtio-serial-pci,id=virtio-serial0,bus=pci.0,addr=0x5 -drive -file=/home/env/guest1.qcow2,if=none,id=drive-virtio-disk0,format=qcow2,cache=none,aio=native - -device -virtio-blk-pci,scsi=off,bus=pci.0,addr=0x6,drive=drive-virtio-disk0,id=virtio-disk0 - -drive -file=/home/env/guest_300G.img,if=none,id=drive-virtio-disk1,format=raw,cache=none,aio=native - -device -virtio-blk-pci,scsi=off,bus=pci.0,addr=0x7,drive=drive-virtio-disk1,id=virtio-disk1 - -netdev tap,fd=25,id=hostnet0,vhost=on,vhostfd=26 -device -virtio-net-pci,netdev=hostnet0,id=net0,mac=00:00:80:05:00:00,bus=pci.0,addr=0x3 --netdev tap,fd=27,id=hostnet1,vhost=on,vhostfd=28 -device -virtio-net-pci,netdev=hostnet1,id=net1,mac=00:00:80:05:00:01,bus=pci.0,addr=0x4 --chardev pty,id=charserial0 -device isa-serial,chardev=charserial0,id=serial0 --device usb-tablet,id=input0 -vnc 0.0.0.0:0 -device -cirrus-vga,id=video0,vgamem_mb=16,bus=pci.0,addr=0x2 -device -virtio-balloon-pci,id=balloon0,bus=pci.0,addr=0x8 -msg timestamp=on - - 2) Qemu log: - KVM: entry failed, hardware error 0x4 - RAX=00000000ffffffed RBX=ffff8803fa2d7fd8 RCX=0100000000000000 -RDX=0000000000000000 - RSI=0000000000000000 RDI=0000000000000046 RBP=ffff8803fa2d7e90 -RSP=ffff8803fa2efe90 - R8 =0000000000000000 R9 =0000000000000000 R10=0000000000000000 -R11=000000000000b69a - R12=0000000000000001 R13=ffffffff81a25b40 R14=0000000000000000 -R15=ffff8803fa2d7fd8 - RIP=ffffffff81053e16 RFL=00000286 [--S--P-] CPL=0 II=0 A20=1 SMM=0 HLT=0 - ES =0000 0000000000000000 ffffffff 00c00000 - CS =0010 0000000000000000 ffffffff 00a09b00 DPL=0 CS64 [-RA] - SS =0018 0000000000000000 ffffffff 00c09300 DPL=0 DS [-WA] - DS =0000 0000000000000000 ffffffff 00c00000 - FS =0000 0000000000000000 ffffffff 00c00000 - GS =0000 ffff88040f540000 ffffffff 00c00000 - LDT=0000 0000000000000000 ffffffff 00c00000 - TR =0040 ffff88040f550a40 00002087 00008b00 DPL=0 TSS64-busy - GDT= ffff88040f549000 0000007f - IDT= ffffffffff529000 00000fff - CR0=80050033 CR2=00007f81ca0c5000 CR3=00000003f5081000 CR4=000407e0 - DR0=0000000000000000 DR1=0000000000000000 DR2=0000000000000000 -DR3=0000000000000000 - DR6=00000000ffff0ff0 DR7=0000000000000400 - EFER=0000000000000d01 - Code=?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? <??> ?? ?? -?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? 
- - 3) Demsg - [347315.028339] kvm: vmptrld ffff8817ec5f0000/17ec5f0000 failed - klogd 1.4.1, ---------- state change ---------- - [347315.039506] kvm: vmptrld ffff8817ec5f0000/17ec5f0000 failed - [347315.051728] kvm: vmptrld ffff8817ec5f0000/17ec5f0000 failed - [347315.057472] vmwrite error: reg 6c0a value ffff88307e66e480 (err -2120672384) - [347315.064567] Pid: 69523, comm: qemu-kvm Tainted: GF X -3.0.93-0.8-default #1 - [347315.064569] Call Trace: - [347315.064587] [<ffffffff810049d5>] dump_trace+0x75/0x300 - [347315.064595] [<ffffffff8145e3e3>] dump_stack+0x69/0x6f - [347315.064617] [<ffffffffa03738de>] vmx_vcpu_load+0x11e/0x1d0 [kvm_intel] - [347315.064647] [<ffffffffa029a204>] kvm_arch_vcpu_load+0x44/0x1d0 [kvm] - [347315.064669] [<ffffffff81054ee1>] finish_task_switch+0x81/0xe0 - [347315.064676] [<ffffffff8145f0b4>] thread_return+0x3b/0x2a7 - [347315.064687] [<ffffffffa028d9b5>] kvm_vcpu_block+0x65/0xa0 [kvm] - [347315.064703] [<ffffffffa02a16d1>] __vcpu_run+0xd1/0x260 [kvm] - [347315.064732] [<ffffffffa02a2418>] kvm_arch_vcpu_ioctl_run+0x68/0x1a0 -[kvm] - [347315.064759] [<ffffffffa028ecee>] kvm_vcpu_ioctl+0x38e/0x580 [kvm] - [347315.064771] [<ffffffff8116bdfb>] do_vfs_ioctl+0x8b/0x3b0 - [347315.064776] [<ffffffff8116c1c1>] sys_ioctl+0xa1/0xb0 - [347315.064783] [<ffffffff81469272>] system_call_fastpath+0x16/0x1b - [347315.064797] [<00007fee51969ce7>] 0x7fee51969ce6 - [347315.064799] vmwrite error: reg 6c0c value ffff88307e664000 (err -2120630272) - [347315.064802] Pid: 69523, comm: qemu-kvm Tainted: GF X -3.0.93-0.8-default #1 - [347315.064803] Call Trace: - [347315.064807] [<ffffffff810049d5>] dump_trace+0x75/0x300 - [347315.064811] [<ffffffff8145e3e3>] dump_stack+0x69/0x6f - [347315.064817] [<ffffffffa03738ec>] vmx_vcpu_load+0x12c/0x1d0 [kvm_intel] - [347315.064832] [<ffffffffa029a204>] kvm_arch_vcpu_load+0x44/0x1d0 [kvm] - [347315.064851] [<ffffffff81054ee1>] finish_task_switch+0x81/0xe0 - [347315.064855] [<ffffffff8145f0b4>] thread_return+0x3b/0x2a7 - [347315.064865] [<ffffffffa028d9b5>] kvm_vcpu_block+0x65/0xa0 [kvm] - [347315.064880] [<ffffffffa02a16d1>] __vcpu_run+0xd1/0x260 [kvm] - [347315.064907] [<ffffffffa02a2418>] kvm_arch_vcpu_ioctl_run+0x68/0x1a0 -[kvm] - [347315.064933] [<ffffffffa028ecee>] kvm_vcpu_ioctl+0x38e/0x580 [kvm] - [347315.064943] [<ffffffff8116bdfb>] do_vfs_ioctl+0x8b/0x3b0 - [347315.064947] [<ffffffff8116c1c1>] sys_ioctl+0xa1/0xb0 - [347315.064951] [<ffffffff81469272>] system_call_fastpath+0x16/0x1b - [347315.064957] [<00007fee51969ce7>] 0x7fee51969ce6 - [347315.064959] vmwrite error: reg 6c10 value 0 (err 0) - - 4) The isssue can't be reporduced. I search the Intel VMX sepc about reaseons -of vmptrld failure: - The instruction fails if its operand is not properly aligned, sets -unsupported physical-address bits, or is equal to the VMXON - pointer. In addition, the instruction fails if the 32 bits in memory -referenced by the operand do not match the VMCS - revision identifier supported by this processor. - - But I can't find any cues from the KVM source code. It seems each - error conditions is impossible in theory. :( - -Any suggestions will be appreciated! Paolo? - --- -Regards, --Gonglei - -On 10/11/2016 15:10, gong lei wrote: -> -4) The isssue can't be reporduced. I search the Intel VMX sepc about -> -reaseons -> -of vmptrld failure: -> -The instruction fails if its operand is not properly aligned, sets -> -unsupported physical-address bits, or is equal to the VMXON -> -pointer. 
In addition, the instruction fails if the 32 bits in memory -> -referenced by the operand do not match the VMCS -> -revision identifier supported by this processor. -> -> -But I can't find any cues from the KVM source code. It seems each -> -error conditions is impossible in theory. :( -Yes, it should not happen. :( - -If it's not reproducible, it's really hard to say what it was, except a -random memory corruption elsewhere or even a bit flip (!). - -Paolo - -On 2016/11/17 20:39, Paolo Bonzini wrote: -> -> -On 10/11/2016 15:10, gong lei wrote: -> -> 4) The isssue can't be reporduced. I search the Intel VMX sepc about -> -> reaseons -> -> of vmptrld failure: -> -> The instruction fails if its operand is not properly aligned, sets -> -> unsupported physical-address bits, or is equal to the VMXON -> -> pointer. In addition, the instruction fails if the 32 bits in memory -> -> referenced by the operand do not match the VMCS -> -> revision identifier supported by this processor. -> -> -> -> But I can't find any cues from the KVM source code. It seems each -> -> error conditions is impossible in theory. :( -> -Yes, it should not happen. :( -> -> -If it's not reproducible, it's really hard to say what it was, except a -> -random memory corruption elsewhere or even a bit flip (!). -> -> -Paolo -Thanks for your reply, Paolo :) - --- -Regards, --Gonglei - diff --git a/results/classifier/015/unknown/25892827 b/results/classifier/015/unknown/25892827 deleted file mode 100644 index 6346b0c1f..000000000 --- a/results/classifier/015/unknown/25892827 +++ /dev/null @@ -1,1104 +0,0 @@ -risc-v: 0.908 -user-level: 0.889 -permissions: 0.881 -register: 0.876 -KVM: 0.872 -hypervisor: 0.871 -operating system: 0.871 -debug: 0.868 -x86: 0.849 -vnc: 0.846 -mistranslation: 0.842 -boot: 0.839 -network: 0.839 -VMM: 0.839 -device: 0.839 -TCG: 0.837 -virtual: 0.835 -i386: 0.835 -peripherals: 0.833 -graphic: 0.832 -assembly: 0.829 -architecture: 0.825 -semantic: 0.825 -ppc: 0.824 -socket: 0.822 -arm: 0.821 -performance: 0.819 -alpha: 0.816 -kernel: 0.810 -files: 0.804 -PID: 0.792 - -[Qemu-devel] [BUG/RFC] Two cpus are not brought up normally in SLES11 sp3 VM after reboot - -Hi, - -Recently we encountered a problem in our project: 2 CPUs in VM are not brought -up normally after reboot. - -Our host is using KVM kmod 3.6 and QEMU 2.1. -A SLES 11 sp3 VM configured with 8 vcpus, -cpu model is configured with 'host-passthrough'. - -After VM's first time started up, everything seems to be OK. -and then VM is paniced and rebooted. -After reboot, only 6 cpus are brought up in VM, cpu1 and cpu7 are not online. - -This is the only message we can get from VM: -VM dmesg shows: -[ 0.069867] Booting Node 0, Processors #1 -[ 5.060042] CPU1: Stuck ?? -[ 5.060499] #2 -[ 5.088322] kvm-clock: cpu 2, msr 6:3fc90901, secondary cpu clock -[ 5.088335] KVM setup async PF for cpu 2 -[ 5.092967] NMI watchdog enabled, takes one hw-pmu counter. -[ 5.094405] #3 -[ 5.108324] kvm-clock: cpu 3, msr 6:3fcd0901, secondary cpu clock -[ 5.108333] KVM setup async PF for cpu 3 -[ 5.113553] NMI watchdog enabled, takes one hw-pmu counter. -[ 5.114970] #4 -[ 5.128325] kvm-clock: cpu 4, msr 6:3fd10901, secondary cpu clock -[ 5.128336] KVM setup async PF for cpu 4 -[ 5.134576] NMI watchdog enabled, takes one hw-pmu counter. -[ 5.135998] #5 -[ 5.152324] kvm-clock: cpu 5, msr 6:3fd50901, secondary cpu clock -[ 5.152334] KVM setup async PF for cpu 5 -[ 5.154764] NMI watchdog enabled, takes one hw-pmu counter. 
-[ 5.156467] #6 -[ 5.172327] kvm-clock: cpu 6, msr 6:3fd90901, secondary cpu clock -[ 5.172341] KVM setup async PF for cpu 6 -[ 5.180738] NMI watchdog enabled, takes one hw-pmu counter. -[ 5.182173] #7 Ok. -[ 10.170815] CPU7: Stuck ?? -[ 10.171648] Brought up 6 CPUs -[ 10.172394] Total of 6 processors activated (28799.97 BogoMIPS). - -From host, we found that QEMU vcpu1 thread and vcpu7 thread were not consuming -any cpu (Should be in idle state), -All of VCPUs' stacks in host is like bellow: - -[<ffffffffa07089b5>] kvm_vcpu_block+0x65/0xa0 [kvm] -[<ffffffffa071c7c1>] __vcpu_run+0xd1/0x260 [kvm] -[<ffffffffa071d508>] kvm_arch_vcpu_ioctl_run+0x68/0x1a0 [kvm] -[<ffffffffa0709cee>] kvm_vcpu_ioctl+0x38e/0x580 [kvm] -[<ffffffff8116be8b>] do_vfs_ioctl+0x8b/0x3b0 -[<ffffffff8116c251>] sys_ioctl+0xa1/0xb0 -[<ffffffff81468092>] system_call_fastpath+0x16/0x1b -[<00002ab9fe1f99a7>] 0x2ab9fe1f99a7 -[<ffffffffffffffff>] 0xffffffffffffffff - -We looked into the kernel codes that could leading to the above 'Stuck' warning, -and found that the only possible is the emulation of 'cpuid' instruct in -kvm/qemu has something wrong. -But since we canât reproduce this problem, we are not quite sure. -Is there any possible that the cupid emulation in kvm/qemu has some bug ? - -Has anyone come across these problem before? Or any idea? - -Thanks, -zhanghailiang - -On 06/07/2015 09:54, zhanghailiang wrote: -> -> -From host, we found that QEMU vcpu1 thread and vcpu7 thread were not -> -consuming any cpu (Should be in idle state), -> -All of VCPUs' stacks in host is like bellow: -> -> -[<ffffffffa07089b5>] kvm_vcpu_block+0x65/0xa0 [kvm] -> -[<ffffffffa071c7c1>] __vcpu_run+0xd1/0x260 [kvm] -> -[<ffffffffa071d508>] kvm_arch_vcpu_ioctl_run+0x68/0x1a0 [kvm] -> -[<ffffffffa0709cee>] kvm_vcpu_ioctl+0x38e/0x580 [kvm] -> -[<ffffffff8116be8b>] do_vfs_ioctl+0x8b/0x3b0 -> -[<ffffffff8116c251>] sys_ioctl+0xa1/0xb0 -> -[<ffffffff81468092>] system_call_fastpath+0x16/0x1b -> -[<00002ab9fe1f99a7>] 0x2ab9fe1f99a7 -> -[<ffffffffffffffff>] 0xffffffffffffffff -> -> -We looked into the kernel codes that could leading to the above 'Stuck' -> -warning, -> -and found that the only possible is the emulation of 'cpuid' instruct in -> -kvm/qemu has something wrong. -> -But since we canât reproduce this problem, we are not quite sure. -> -Is there any possible that the cupid emulation in kvm/qemu has some bug ? -Can you explain the relationship to the cpuid emulation? What do the -traces say about vcpus 1 and 7? - -Paolo - -On 2015/7/6 16:45, Paolo Bonzini wrote: -On 06/07/2015 09:54, zhanghailiang wrote: -From host, we found that QEMU vcpu1 thread and vcpu7 thread were not -consuming any cpu (Should be in idle state), -All of VCPUs' stacks in host is like bellow: - -[<ffffffffa07089b5>] kvm_vcpu_block+0x65/0xa0 [kvm] -[<ffffffffa071c7c1>] __vcpu_run+0xd1/0x260 [kvm] -[<ffffffffa071d508>] kvm_arch_vcpu_ioctl_run+0x68/0x1a0 [kvm] -[<ffffffffa0709cee>] kvm_vcpu_ioctl+0x38e/0x580 [kvm] -[<ffffffff8116be8b>] do_vfs_ioctl+0x8b/0x3b0 -[<ffffffff8116c251>] sys_ioctl+0xa1/0xb0 -[<ffffffff81468092>] system_call_fastpath+0x16/0x1b -[<00002ab9fe1f99a7>] 0x2ab9fe1f99a7 -[<ffffffffffffffff>] 0xffffffffffffffff - -We looked into the kernel codes that could leading to the above 'Stuck' -warning, -and found that the only possible is the emulation of 'cpuid' instruct in -kvm/qemu has something wrong. -But since we canât reproduce this problem, we are not quite sure. -Is there any possible that the cupid emulation in kvm/qemu has some bug ? 
-Can you explain the relationship to the cpuid emulation? What do the -traces say about vcpus 1 and 7? -OK, we searched the VM's kernel codes with the 'Stuck' message, and it is -located in -do_boot_cpu(). It's in BSP context, the call process is: -BSP executes start_kernel() -> smp_init() -> smp_boot_cpus() -> do_boot_cpu() --> wakeup_secondary_via_INIT() to trigger APs. -It will wait 5s for APs to startup, if some AP not startup normally, it will -print 'CPU%d Stuck' or 'CPU%d: Not responding'. - -If it prints 'Stuck', it means the AP has received the SIPI interrupt and -begins to execute the code -'ENTRY(trampoline_data)' (trampoline_64.S) , but be stuck in some places before -smp_callin()(smpboot.c). -The follow is the starup process of BSP and AP. -BSP: -start_kernel() - ->smp_init() - ->smp_boot_cpus() - ->do_boot_cpu() - ->start_ip = trampoline_address(); //set the address that AP will go -to execute - ->wakeup_secondary_cpu_via_init(); // kick the secondary CPU - ->for (timeout = 0; timeout < 50000; timeout++) - if (cpumask_test_cpu(cpu, cpu_callin_mask)) break;// check if AP -startup or not - -APs: -ENTRY(trampoline_data) (trampoline_64.S) - ->ENTRY(secondary_startup_64) (head_64.S) - ->start_secondary() (smpboot.c) - ->cpu_init(); - ->smp_callin(); - ->cpumask_set_cpu(cpuid, cpu_callin_mask); ->Note: if AP comes -here, the BSP will not prints the error message. - -From above call process, we can be sure that, the AP has been stuck between -trampoline_data and the cpumask_set_cpu() in -smp_callin(), we look through these codes path carefully, and only found a -'hlt' instruct that could block the process. -It is located in trampoline_data(): - -ENTRY(trampoline_data) - ... - - call verify_cpu # Verify the cpu supports long mode - testl %eax, %eax # Check for return code - jnz no_longmode - - ... - -no_longmode: - hlt - jmp no_longmode - -For the process verify_cpu(), -we can only find the 'cpuid' sensitive instruct that could lead VM exit from -No-root mode. -This is why we doubt if cpuid emulation is wrong in KVM/QEMU that leading to -the fail in verify_cpu. - -From the message in VM, we know vcpu1 and vcpu7 is something wrong. -[ 5.060042] CPU1: Stuck ?? -[ 10.170815] CPU7: Stuck ?? -[ 10.171648] Brought up 6 CPUs - -Besides, the follow is the cpus message got from host. -80FF72F5-FF6D-E411-A8C8-000000821800:/home/fsp/hrg # virsh qemu-monitor-command -instance-0000000 -* CPU #0: pc=0x00007f64160c683d thread_id=68570 - CPU #1: pc=0xffffffff810301f1 (halted) thread_id=68573 - CPU #2: pc=0xffffffff810301e2 (halted) thread_id=68575 - CPU #3: pc=0xffffffff810301e2 (halted) thread_id=68576 - CPU #4: pc=0xffffffff810301e2 (halted) thread_id=68577 - CPU #5: pc=0xffffffff810301e2 (halted) thread_id=68578 - CPU #6: pc=0xffffffff810301e2 (halted) thread_id=68583 - CPU #7: pc=0xffffffff810301f1 (halted) thread_id=68584 - -Oh, i also forgot to mention in the above message that, we have bond each vCPU -to different physical CPU in -host. - -Thanks, -zhanghailiang - -On 06/07/2015 11:59, zhanghailiang wrote: -> -> -> -Besides, the follow is the cpus message got from host. 
-> -80FF72F5-FF6D-E411-A8C8-000000821800:/home/fsp/hrg # virsh -> -qemu-monitor-command instance-0000000 -> -* CPU #0: pc=0x00007f64160c683d thread_id=68570 -> -CPU #1: pc=0xffffffff810301f1 (halted) thread_id=68573 -> -CPU #2: pc=0xffffffff810301e2 (halted) thread_id=68575 -> -CPU #3: pc=0xffffffff810301e2 (halted) thread_id=68576 -> -CPU #4: pc=0xffffffff810301e2 (halted) thread_id=68577 -> -CPU #5: pc=0xffffffff810301e2 (halted) thread_id=68578 -> -CPU #6: pc=0xffffffff810301e2 (halted) thread_id=68583 -> -CPU #7: pc=0xffffffff810301f1 (halted) thread_id=68584 -> -> -Oh, i also forgot to mention in the above message that, we have bond -> -each vCPU to different physical CPU in -> -host. -Can you capture a trace on the host (trace-cmd record -e kvm) and send -it privately? Please note which CPUs get stuck, since I guess it's not -always 1 and 7. - -Paolo - -On Mon, 6 Jul 2015 17:59:10 +0800 -zhanghailiang <address@hidden> wrote: - -> -On 2015/7/6 16:45, Paolo Bonzini wrote: -> -> -> -> -> -> On 06/07/2015 09:54, zhanghailiang wrote: -> ->> -> ->> From host, we found that QEMU vcpu1 thread and vcpu7 thread were not -> ->> consuming any cpu (Should be in idle state), -> ->> All of VCPUs' stacks in host is like bellow: -> ->> -> ->> [<ffffffffa07089b5>] kvm_vcpu_block+0x65/0xa0 [kvm] -> ->> [<ffffffffa071c7c1>] __vcpu_run+0xd1/0x260 [kvm] -> ->> [<ffffffffa071d508>] kvm_arch_vcpu_ioctl_run+0x68/0x1a0 [kvm] -> ->> [<ffffffffa0709cee>] kvm_vcpu_ioctl+0x38e/0x580 [kvm] -> ->> [<ffffffff8116be8b>] do_vfs_ioctl+0x8b/0x3b0 -> ->> [<ffffffff8116c251>] sys_ioctl+0xa1/0xb0 -> ->> [<ffffffff81468092>] system_call_fastpath+0x16/0x1b -> ->> [<00002ab9fe1f99a7>] 0x2ab9fe1f99a7 -> ->> [<ffffffffffffffff>] 0xffffffffffffffff -> ->> -> ->> We looked into the kernel codes that could leading to the above 'Stuck' -> ->> warning, -in current upstream there isn't any printk(...Stuck...) left since that code -path -has been reworked. -I've often seen this on over-committed host during guest CPUs up/down torture -test. -Could you update guest kernel to upstream and see if issue reproduces? - -> ->> and found that the only possible is the emulation of 'cpuid' instruct in -> ->> kvm/qemu has something wrong. -> ->> But since we canât reproduce this problem, we are not quite sure. -> ->> Is there any possible that the cupid emulation in kvm/qemu has some bug ? -> -> -> -> Can you explain the relationship to the cpuid emulation? What do the -> -> traces say about vcpus 1 and 7? -> -> -OK, we searched the VM's kernel codes with the 'Stuck' message, and it is -> -located in -> -do_boot_cpu(). It's in BSP context, the call process is: -> -BSP executes start_kernel() -> smp_init() -> smp_boot_cpus() -> do_boot_cpu() -> --> wakeup_secondary_via_INIT() to trigger APs. -> -It will wait 5s for APs to startup, if some AP not startup normally, it will -> -print 'CPU%d Stuck' or 'CPU%d: Not responding'. -> -> -If it prints 'Stuck', it means the AP has received the SIPI interrupt and -> -begins to execute the code -> -'ENTRY(trampoline_data)' (trampoline_64.S) , but be stuck in some places -> -before smp_callin()(smpboot.c). -> -The follow is the starup process of BSP and AP. 
-> -BSP: -> -start_kernel() -> -->smp_init() -> -->smp_boot_cpus() -> -->do_boot_cpu() -> -->start_ip = trampoline_address(); //set the address that AP will -> -go to execute -> -->wakeup_secondary_cpu_via_init(); // kick the secondary CPU -> -->for (timeout = 0; timeout < 50000; timeout++) -> -if (cpumask_test_cpu(cpu, cpu_callin_mask)) break;// check if -> -AP startup or not -> -> -APs: -> -ENTRY(trampoline_data) (trampoline_64.S) -> -->ENTRY(secondary_startup_64) (head_64.S) -> -->start_secondary() (smpboot.c) -> -->cpu_init(); -> -->smp_callin(); -> -->cpumask_set_cpu(cpuid, cpu_callin_mask); ->Note: if AP -> -comes here, the BSP will not prints the error message. -> -> -From above call process, we can be sure that, the AP has been stuck between -> -trampoline_data and the cpumask_set_cpu() in -> -smp_callin(), we look through these codes path carefully, and only found a -> -'hlt' instruct that could block the process. -> -It is located in trampoline_data(): -> -> -ENTRY(trampoline_data) -> -... -> -> -call verify_cpu # Verify the cpu supports long mode -> -testl %eax, %eax # Check for return code -> -jnz no_longmode -> -> -... -> -> -no_longmode: -> -hlt -> -jmp no_longmode -> -> -For the process verify_cpu(), -> -we can only find the 'cpuid' sensitive instruct that could lead VM exit from -> -No-root mode. -> -This is why we doubt if cpuid emulation is wrong in KVM/QEMU that leading to -> -the fail in verify_cpu. -> -> -From the message in VM, we know vcpu1 and vcpu7 is something wrong. -> -[ 5.060042] CPU1: Stuck ?? -> -[ 10.170815] CPU7: Stuck ?? -> -[ 10.171648] Brought up 6 CPUs -> -> -Besides, the follow is the cpus message got from host. -> -80FF72F5-FF6D-E411-A8C8-000000821800:/home/fsp/hrg # virsh -> -qemu-monitor-command instance-0000000 -> -* CPU #0: pc=0x00007f64160c683d thread_id=68570 -> -CPU #1: pc=0xffffffff810301f1 (halted) thread_id=68573 -> -CPU #2: pc=0xffffffff810301e2 (halted) thread_id=68575 -> -CPU #3: pc=0xffffffff810301e2 (halted) thread_id=68576 -> -CPU #4: pc=0xffffffff810301e2 (halted) thread_id=68577 -> -CPU #5: pc=0xffffffff810301e2 (halted) thread_id=68578 -> -CPU #6: pc=0xffffffff810301e2 (halted) thread_id=68583 -> -CPU #7: pc=0xffffffff810301f1 (halted) thread_id=68584 -> -> -Oh, i also forgot to mention in the above message that, we have bond each -> -vCPU to different physical CPU in -> -host. 
-> -> -Thanks, -> -zhanghailiang -> -> -> -> -> --- -> -To unsubscribe from this list: send the line "unsubscribe kvm" in -> -the body of a message to address@hidden -> -More majordomo info at -http://vger.kernel.org/majordomo-info.html - -On 2015/7/7 19:23, Igor Mammedov wrote: -On Mon, 6 Jul 2015 17:59:10 +0800 -zhanghailiang <address@hidden> wrote: -On 2015/7/6 16:45, Paolo Bonzini wrote: -On 06/07/2015 09:54, zhanghailiang wrote: -From host, we found that QEMU vcpu1 thread and vcpu7 thread were not -consuming any cpu (Should be in idle state), -All of VCPUs' stacks in host is like bellow: - -[<ffffffffa07089b5>] kvm_vcpu_block+0x65/0xa0 [kvm] -[<ffffffffa071c7c1>] __vcpu_run+0xd1/0x260 [kvm] -[<ffffffffa071d508>] kvm_arch_vcpu_ioctl_run+0x68/0x1a0 [kvm] -[<ffffffffa0709cee>] kvm_vcpu_ioctl+0x38e/0x580 [kvm] -[<ffffffff8116be8b>] do_vfs_ioctl+0x8b/0x3b0 -[<ffffffff8116c251>] sys_ioctl+0xa1/0xb0 -[<ffffffff81468092>] system_call_fastpath+0x16/0x1b -[<00002ab9fe1f99a7>] 0x2ab9fe1f99a7 -[<ffffffffffffffff>] 0xffffffffffffffff - -We looked into the kernel codes that could leading to the above 'Stuck' -warning, -in current upstream there isn't any printk(...Stuck...) left since that code -path -has been reworked. -I've often seen this on over-committed host during guest CPUs up/down torture -test. -Could you update guest kernel to upstream and see if issue reproduces? -Hmm, Unfortunately, it is very hard to reproduce, and we are still trying to -reproduce it. - -For your test case, is it a kernel bug? -Or is there any related patch could solve your test problem been merged into -upstream ? - -Thanks, -zhanghailiang -and found that the only possible is the emulation of 'cpuid' instruct in -kvm/qemu has something wrong. -But since we canât reproduce this problem, we are not quite sure. -Is there any possible that the cupid emulation in kvm/qemu has some bug ? -Can you explain the relationship to the cpuid emulation? What do the -traces say about vcpus 1 and 7? -OK, we searched the VM's kernel codes with the 'Stuck' message, and it is -located in -do_boot_cpu(). It's in BSP context, the call process is: -BSP executes start_kernel() -> smp_init() -> smp_boot_cpus() -> do_boot_cpu() --> wakeup_secondary_via_INIT() to trigger APs. -It will wait 5s for APs to startup, if some AP not startup normally, it will -print 'CPU%d Stuck' or 'CPU%d: Not responding'. - -If it prints 'Stuck', it means the AP has received the SIPI interrupt and -begins to execute the code -'ENTRY(trampoline_data)' (trampoline_64.S) , but be stuck in some places before -smp_callin()(smpboot.c). -The follow is the starup process of BSP and AP. -BSP: -start_kernel() - ->smp_init() - ->smp_boot_cpus() - ->do_boot_cpu() - ->start_ip = trampoline_address(); //set the address that AP will -go to execute - ->wakeup_secondary_cpu_via_init(); // kick the secondary CPU - ->for (timeout = 0; timeout < 50000; timeout++) - if (cpumask_test_cpu(cpu, cpu_callin_mask)) break;// check if -AP startup or not - -APs: -ENTRY(trampoline_data) (trampoline_64.S) - ->ENTRY(secondary_startup_64) (head_64.S) - ->start_secondary() (smpboot.c) - ->cpu_init(); - ->smp_callin(); - ->cpumask_set_cpu(cpuid, cpu_callin_mask); ->Note: if AP -comes here, the BSP will not prints the error message. - - From above call process, we can be sure that, the AP has been stuck between -trampoline_data and the cpumask_set_cpu() in -smp_callin(), we look through these codes path carefully, and only found a -'hlt' instruct that could block the process. 
-It is located in trampoline_data(): - -ENTRY(trampoline_data) - ... - - call verify_cpu # Verify the cpu supports long mode - testl %eax, %eax # Check for return code - jnz no_longmode - - ... - -no_longmode: - hlt - jmp no_longmode - -For the process verify_cpu(), -we can only find the 'cpuid' sensitive instruct that could lead VM exit from -No-root mode. -This is why we doubt if cpuid emulation is wrong in KVM/QEMU that leading to -the fail in verify_cpu. - - From the message in VM, we know vcpu1 and vcpu7 is something wrong. -[ 5.060042] CPU1: Stuck ?? -[ 10.170815] CPU7: Stuck ?? -[ 10.171648] Brought up 6 CPUs - -Besides, the follow is the cpus message got from host. -80FF72F5-FF6D-E411-A8C8-000000821800:/home/fsp/hrg # virsh qemu-monitor-command -instance-0000000 -* CPU #0: pc=0x00007f64160c683d thread_id=68570 - CPU #1: pc=0xffffffff810301f1 (halted) thread_id=68573 - CPU #2: pc=0xffffffff810301e2 (halted) thread_id=68575 - CPU #3: pc=0xffffffff810301e2 (halted) thread_id=68576 - CPU #4: pc=0xffffffff810301e2 (halted) thread_id=68577 - CPU #5: pc=0xffffffff810301e2 (halted) thread_id=68578 - CPU #6: pc=0xffffffff810301e2 (halted) thread_id=68583 - CPU #7: pc=0xffffffff810301f1 (halted) thread_id=68584 - -Oh, i also forgot to mention in the above message that, we have bond each vCPU -to different physical CPU in -host. - -Thanks, -zhanghailiang - - - - --- -To unsubscribe from this list: send the line "unsubscribe kvm" in -the body of a message to address@hidden -More majordomo info at -http://vger.kernel.org/majordomo-info.html -. - -On Tue, 7 Jul 2015 19:43:35 +0800 -zhanghailiang <address@hidden> wrote: - -> -On 2015/7/7 19:23, Igor Mammedov wrote: -> -> On Mon, 6 Jul 2015 17:59:10 +0800 -> -> zhanghailiang <address@hidden> wrote: -> -> -> ->> On 2015/7/6 16:45, Paolo Bonzini wrote: -> ->>> -> ->>> -> ->>> On 06/07/2015 09:54, zhanghailiang wrote: -> ->>>> -> ->>>> From host, we found that QEMU vcpu1 thread and vcpu7 thread were not -> ->>>> consuming any cpu (Should be in idle state), -> ->>>> All of VCPUs' stacks in host is like bellow: -> ->>>> -> ->>>> [<ffffffffa07089b5>] kvm_vcpu_block+0x65/0xa0 [kvm] -> ->>>> [<ffffffffa071c7c1>] __vcpu_run+0xd1/0x260 [kvm] -> ->>>> [<ffffffffa071d508>] kvm_arch_vcpu_ioctl_run+0x68/0x1a0 [kvm] -> ->>>> [<ffffffffa0709cee>] kvm_vcpu_ioctl+0x38e/0x580 [kvm] -> ->>>> [<ffffffff8116be8b>] do_vfs_ioctl+0x8b/0x3b0 -> ->>>> [<ffffffff8116c251>] sys_ioctl+0xa1/0xb0 -> ->>>> [<ffffffff81468092>] system_call_fastpath+0x16/0x1b -> ->>>> [<00002ab9fe1f99a7>] 0x2ab9fe1f99a7 -> ->>>> [<ffffffffffffffff>] 0xffffffffffffffff -> ->>>> -> ->>>> We looked into the kernel codes that could leading to the above 'Stuck' -> ->>>> warning, -> -> in current upstream there isn't any printk(...Stuck...) left since that -> -> code path -> -> has been reworked. -> -> I've often seen this on over-committed host during guest CPUs up/down -> -> torture test. -> -> Could you update guest kernel to upstream and see if issue reproduces? -> -> -> -> -Hmm, Unfortunately, it is very hard to reproduce, and we are still trying to -> -reproduce it. -> -> -For your test case, is it a kernel bug? -> -Or is there any related patch could solve your test problem been merged into -> -upstream ? -I don't remember all prerequisite patches but you should be able to find -http://marc.info/?l=linux-kernel&m=140326703108009&w=2 -"x86/smpboot: Initialize secondary CPU only if master CPU will wait for it" -and then look for dependencies. 
- - -> -> -Thanks, -> -zhanghailiang -> -> ->>>> and found that the only possible is the emulation of 'cpuid' instruct in -> ->>>> kvm/qemu has something wrong. -> ->>>> But since we canât reproduce this problem, we are not quite sure. -> ->>>> Is there any possible that the cupid emulation in kvm/qemu has some bug ? -> ->>> -> ->>> Can you explain the relationship to the cpuid emulation? What do the -> ->>> traces say about vcpus 1 and 7? -> ->> -> ->> OK, we searched the VM's kernel codes with the 'Stuck' message, and it is -> ->> located in -> ->> do_boot_cpu(). It's in BSP context, the call process is: -> ->> BSP executes start_kernel() -> smp_init() -> smp_boot_cpus() -> -> ->> do_boot_cpu() -> wakeup_secondary_via_INIT() to trigger APs. -> ->> It will wait 5s for APs to startup, if some AP not startup normally, it -> ->> will print 'CPU%d Stuck' or 'CPU%d: Not responding'. -> ->> -> ->> If it prints 'Stuck', it means the AP has received the SIPI interrupt and -> ->> begins to execute the code -> ->> 'ENTRY(trampoline_data)' (trampoline_64.S) , but be stuck in some places -> ->> before smp_callin()(smpboot.c). -> ->> The follow is the starup process of BSP and AP. -> ->> BSP: -> ->> start_kernel() -> ->> ->smp_init() -> ->> ->smp_boot_cpus() -> ->> ->do_boot_cpu() -> ->> ->start_ip = trampoline_address(); //set the address that AP -> ->> will go to execute -> ->> ->wakeup_secondary_cpu_via_init(); // kick the secondary CPU -> ->> ->for (timeout = 0; timeout < 50000; timeout++) -> ->> if (cpumask_test_cpu(cpu, cpu_callin_mask)) break;// -> ->> check if AP startup or not -> ->> -> ->> APs: -> ->> ENTRY(trampoline_data) (trampoline_64.S) -> ->> ->ENTRY(secondary_startup_64) (head_64.S) -> ->> ->start_secondary() (smpboot.c) -> ->> ->cpu_init(); -> ->> ->smp_callin(); -> ->> ->cpumask_set_cpu(cpuid, cpu_callin_mask); ->Note: if AP -> ->> comes here, the BSP will not prints the error message. -> ->> -> ->> From above call process, we can be sure that, the AP has been stuck -> ->> between trampoline_data and the cpumask_set_cpu() in -> ->> smp_callin(), we look through these codes path carefully, and only found a -> ->> 'hlt' instruct that could block the process. -> ->> It is located in trampoline_data(): -> ->> -> ->> ENTRY(trampoline_data) -> ->> ... -> ->> -> ->> call verify_cpu # Verify the cpu supports long mode -> ->> testl %eax, %eax # Check for return code -> ->> jnz no_longmode -> ->> -> ->> ... -> ->> -> ->> no_longmode: -> ->> hlt -> ->> jmp no_longmode -> ->> -> ->> For the process verify_cpu(), -> ->> we can only find the 'cpuid' sensitive instruct that could lead VM exit -> ->> from No-root mode. -> ->> This is why we doubt if cpuid emulation is wrong in KVM/QEMU that leading -> ->> to the fail in verify_cpu. -> ->> -> ->> From the message in VM, we know vcpu1 and vcpu7 is something wrong. -> ->> [ 5.060042] CPU1: Stuck ?? -> ->> [ 10.170815] CPU7: Stuck ?? -> ->> [ 10.171648] Brought up 6 CPUs -> ->> -> ->> Besides, the follow is the cpus message got from host. 
-> ->> 80FF72F5-FF6D-E411-A8C8-000000821800:/home/fsp/hrg # virsh -> ->> qemu-monitor-command instance-0000000 -> ->> * CPU #0: pc=0x00007f64160c683d thread_id=68570 -> ->> CPU #1: pc=0xffffffff810301f1 (halted) thread_id=68573 -> ->> CPU #2: pc=0xffffffff810301e2 (halted) thread_id=68575 -> ->> CPU #3: pc=0xffffffff810301e2 (halted) thread_id=68576 -> ->> CPU #4: pc=0xffffffff810301e2 (halted) thread_id=68577 -> ->> CPU #5: pc=0xffffffff810301e2 (halted) thread_id=68578 -> ->> CPU #6: pc=0xffffffff810301e2 (halted) thread_id=68583 -> ->> CPU #7: pc=0xffffffff810301f1 (halted) thread_id=68584 -> ->> -> ->> Oh, i also forgot to mention in the above message that, we have bond each -> ->> vCPU to different physical CPU in -> ->> host. -> ->> -> ->> Thanks, -> ->> zhanghailiang -> ->> -> ->> -> ->> -> ->> -> ->> -- -> ->> To unsubscribe from this list: send the line "unsubscribe kvm" in -> ->> the body of a message to address@hidden -> ->> More majordomo info at -http://vger.kernel.org/majordomo-info.html -> -> -> -> -> -> . -> -> -> -> -> - -On 2015/7/7 20:21, Igor Mammedov wrote: -On Tue, 7 Jul 2015 19:43:35 +0800 -zhanghailiang <address@hidden> wrote: -On 2015/7/7 19:23, Igor Mammedov wrote: -On Mon, 6 Jul 2015 17:59:10 +0800 -zhanghailiang <address@hidden> wrote: -On 2015/7/6 16:45, Paolo Bonzini wrote: -On 06/07/2015 09:54, zhanghailiang wrote: -From host, we found that QEMU vcpu1 thread and vcpu7 thread were not -consuming any cpu (Should be in idle state), -All of VCPUs' stacks in host is like bellow: - -[<ffffffffa07089b5>] kvm_vcpu_block+0x65/0xa0 [kvm] -[<ffffffffa071c7c1>] __vcpu_run+0xd1/0x260 [kvm] -[<ffffffffa071d508>] kvm_arch_vcpu_ioctl_run+0x68/0x1a0 [kvm] -[<ffffffffa0709cee>] kvm_vcpu_ioctl+0x38e/0x580 [kvm] -[<ffffffff8116be8b>] do_vfs_ioctl+0x8b/0x3b0 -[<ffffffff8116c251>] sys_ioctl+0xa1/0xb0 -[<ffffffff81468092>] system_call_fastpath+0x16/0x1b -[<00002ab9fe1f99a7>] 0x2ab9fe1f99a7 -[<ffffffffffffffff>] 0xffffffffffffffff - -We looked into the kernel codes that could leading to the above 'Stuck' -warning, -in current upstream there isn't any printk(...Stuck...) left since that code -path -has been reworked. -I've often seen this on over-committed host during guest CPUs up/down torture -test. -Could you update guest kernel to upstream and see if issue reproduces? -Hmm, Unfortunately, it is very hard to reproduce, and we are still trying to -reproduce it. - -For your test case, is it a kernel bug? -Or is there any related patch could solve your test problem been merged into -upstream ? -I don't remember all prerequisite patches but you should be able to find -http://marc.info/?l=linux-kernel&m=140326703108009&w=2 -"x86/smpboot: Initialize secondary CPU only if master CPU will wait for it" -and then look for dependencies. -Er, we have investigated this patch, and it is not related to our problem, :) - -Thanks. -Thanks, -zhanghailiang -and found that the only possible is the emulation of 'cpuid' instruct in -kvm/qemu has something wrong. -But since we canât reproduce this problem, we are not quite sure. -Is there any possible that the cupid emulation in kvm/qemu has some bug ? -Can you explain the relationship to the cpuid emulation? What do the -traces say about vcpus 1 and 7? -OK, we searched the VM's kernel codes with the 'Stuck' message, and it is -located in -do_boot_cpu(). It's in BSP context, the call process is: -BSP executes start_kernel() -> smp_init() -> smp_boot_cpus() -> do_boot_cpu() --> wakeup_secondary_via_INIT() to trigger APs. 
-It will wait 5s for APs to startup, if some AP not startup normally, it will -print 'CPU%d Stuck' or 'CPU%d: Not responding'. - -If it prints 'Stuck', it means the AP has received the SIPI interrupt and -begins to execute the code -'ENTRY(trampoline_data)' (trampoline_64.S) , but be stuck in some places before -smp_callin()(smpboot.c). -The follow is the starup process of BSP and AP. -BSP: -start_kernel() - ->smp_init() - ->smp_boot_cpus() - ->do_boot_cpu() - ->start_ip = trampoline_address(); //set the address that AP will -go to execute - ->wakeup_secondary_cpu_via_init(); // kick the secondary CPU - ->for (timeout = 0; timeout < 50000; timeout++) - if (cpumask_test_cpu(cpu, cpu_callin_mask)) break;// check if -AP startup or not - -APs: -ENTRY(trampoline_data) (trampoline_64.S) - ->ENTRY(secondary_startup_64) (head_64.S) - ->start_secondary() (smpboot.c) - ->cpu_init(); - ->smp_callin(); - ->cpumask_set_cpu(cpuid, cpu_callin_mask); ->Note: if AP -comes here, the BSP will not prints the error message. - - From above call process, we can be sure that, the AP has been stuck between -trampoline_data and the cpumask_set_cpu() in -smp_callin(), we look through these codes path carefully, and only found a -'hlt' instruct that could block the process. -It is located in trampoline_data(): - -ENTRY(trampoline_data) - ... - - call verify_cpu # Verify the cpu supports long mode - testl %eax, %eax # Check for return code - jnz no_longmode - - ... - -no_longmode: - hlt - jmp no_longmode - -For the process verify_cpu(), -we can only find the 'cpuid' sensitive instruct that could lead VM exit from -No-root mode. -This is why we doubt if cpuid emulation is wrong in KVM/QEMU that leading to -the fail in verify_cpu. - - From the message in VM, we know vcpu1 and vcpu7 is something wrong. -[ 5.060042] CPU1: Stuck ?? -[ 10.170815] CPU7: Stuck ?? -[ 10.171648] Brought up 6 CPUs - -Besides, the follow is the cpus message got from host. -80FF72F5-FF6D-E411-A8C8-000000821800:/home/fsp/hrg # virsh qemu-monitor-command -instance-0000000 -* CPU #0: pc=0x00007f64160c683d thread_id=68570 - CPU #1: pc=0xffffffff810301f1 (halted) thread_id=68573 - CPU #2: pc=0xffffffff810301e2 (halted) thread_id=68575 - CPU #3: pc=0xffffffff810301e2 (halted) thread_id=68576 - CPU #4: pc=0xffffffff810301e2 (halted) thread_id=68577 - CPU #5: pc=0xffffffff810301e2 (halted) thread_id=68578 - CPU #6: pc=0xffffffff810301e2 (halted) thread_id=68583 - CPU #7: pc=0xffffffff810301f1 (halted) thread_id=68584 - -Oh, i also forgot to mention in the above message that, we have bond each vCPU -to different physical CPU in -host. - -Thanks, -zhanghailiang - - - - --- -To unsubscribe from this list: send the line "unsubscribe kvm" in -the body of a message to address@hidden -More majordomo info at -http://vger.kernel.org/majordomo-info.html -. -. 
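Since the discussion above centers on verify_cpu() and its use of CPUID, a rough user-space approximation of the long-mode test may help readers who do not have trampoline_64.S at hand. This is only a sketch: the real verify_cpu runs in the AP's early boot environment, before paging is set up, and checks more than the LM bit.

    #include <stdio.h>
    #include <cpuid.h>

    int main(void)
    {
        unsigned int eax, ebx, ecx, edx;

        /* Leaf 0x80000000 reports the highest extended CPUID leaf available. */
        if (!__get_cpuid(0x80000000, &eax, &ebx, &ecx, &edx) || eax < 0x80000001) {
            puts("extended CPUID leaves not available");
            return 1;
        }

        /* Bit 29 of EDX in leaf 0x80000001 is the Long Mode (LM) flag that
         * verify_cpu tests before the AP continues to secondary_startup_64. */
        __get_cpuid(0x80000001, &eax, &ebx, &ecx, &edx);
        printf("long mode: %s\n", (edx & (1u << 29)) ? "supported" : "NOT supported");
        return 0;
    }

If that check failed inside the guest, verify_cpu would land in the no_longmode 'hlt' loop quoted earlier in the thread, which matches the observed symptom of a silently parked AP.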
- diff --git a/results/classifier/015/unknown/31349848 b/results/classifier/015/unknown/31349848 deleted file mode 100644 index 760572b3c..000000000 --- a/results/classifier/015/unknown/31349848 +++ /dev/null @@ -1,181 +0,0 @@ -permissions: 0.908 -PID: 0.903 -device: 0.881 -graphic: 0.876 -register: 0.870 -virtual: 0.869 -risc-v: 0.865 -performance: 0.864 -assembly: 0.855 -vnc: 0.854 -arm: 0.852 -semantic: 0.846 -socket: 0.846 -user-level: 0.845 -architecture: 0.845 -alpha: 0.837 -operating system: 0.830 -TCG: 0.828 -KVM: 0.827 -debug: 0.826 -files: 0.820 -boot: 0.815 -hypervisor: 0.815 -x86: 0.815 -kernel: 0.809 -VMM: 0.801 -ppc: 0.792 -i386: 0.782 -mistranslation: 0.781 -peripherals: 0.770 -network: 0.769 - -[Qemu-devel] [BUG] qemu stuck when detach host-usb device - -Description of problem: -The guest has a host-usb device(Kingston Technology DataTraveler 100 G3/G4/SE9 -G2), which is attached -to xhci controller(on host). Qemu will stuck if I detach it from guest. - -How reproducible: -100% - -Steps to Reproduce: -1. Use usb stick to copy files in guest , make it busy working. -2. virsh detach-device vm_name usb.xml - -Then qemu will stuck for 20s, I found this is because libusb_release_interface -block for 20s. -Dmesg prints: - -[35442.034861] usb 4-2.1: Disable of device-initiated U1 failed. -[35447.034993] usb 4-2.1: Disable of device-initiated U2 failed. -[35452.035131] usb 4-2.1: Set SEL for device-initiated U1 failed. -[35457.035259] usb 4-2.1: Set SEL for device-initiated U2 failed. - -Is this a hardware error or software's bug? - -On Tue, Nov 27, 2018 at 01:26:24AM +0000, linzhecheng wrote: -> -Description of problem: -> -The guest has a host-usb device(Kingston Technology DataTraveler 100 -> -G3/G4/SE9 G2), which is attached -> -to xhci controller(on host). Qemu will stuck if I detach it from guest. -> -> -How reproducible: -> -100% -> -> -Steps to Reproduce: -> -1. Use usb stick to copy files in guest , make it busy working. -> -2. virsh detach-device vm_name usb.xml -> -> -Then qemu will stuck for 20s, I found this is because -> -libusb_release_interface block for 20s. -> -Dmesg prints: -> -> -[35442.034861] usb 4-2.1: Disable of device-initiated U1 failed. -> -[35447.034993] usb 4-2.1: Disable of device-initiated U2 failed. -> -[35452.035131] usb 4-2.1: Set SEL for device-initiated U1 failed. -> -[35457.035259] usb 4-2.1: Set SEL for device-initiated U2 failed. -> -> -Is this a hardware error or software's bug? -I'd guess software error, could be is libusb or (host) linux kernel. -Cc'ing libusb-devel. - -cheers, - Gerd - -> ------Original Message----- -> -From: Gerd Hoffmann [ -mailto:address@hidden -> -Sent: Tuesday, November 27, 2018 2:09 PM -> -To: linzhecheng <address@hidden> -> -Cc: address@hidden; wangxin (U) <address@hidden>; -> -Zhoujian (jay) <address@hidden>; address@hidden -> -Subject: Re: [Qemu-devel] [BUG] qemu stuck when detach host-usb device -> -> -On Tue, Nov 27, 2018 at 01:26:24AM +0000, linzhecheng wrote: -> -> Description of problem: -> -> The guest has a host-usb device(Kingston Technology DataTraveler 100 -> -> G3/G4/SE9 G2), which is attached to xhci controller(on host). Qemu will -> -> stuck -> -if I detach it from guest. -> -> -> -> How reproducible: -> -> 100% -> -> -> -> Steps to Reproduce: -> -> 1. Use usb stick to copy files in guest , make it busy working. -> -> 2. virsh detach-device vm_name usb.xml -> -> -> -> Then qemu will stuck for 20s, I found this is because -> -> libusb_release_interface -> -block for 20s. 
-> -> Dmesg prints: -> -> -> -> [35442.034861] usb 4-2.1: Disable of device-initiated U1 failed. -> -> [35447.034993] usb 4-2.1: Disable of device-initiated U2 failed. -> -> [35452.035131] usb 4-2.1: Set SEL for device-initiated U1 failed. -> -> [35457.035259] usb 4-2.1: Set SEL for device-initiated U2 failed. -> -> -> -> Is this a hardware error or software's bug? -> -> -I'd guess software error, could be is libusb or (host) linux kernel. -> -Cc'ing libusb-devel. -Perhaps it's usb driver's bug. Could you also reproduce it? -> -> -cheers, -> -Gerd - diff --git a/results/classifier/015/unknown/32484936 b/results/classifier/015/unknown/32484936 deleted file mode 100644 index 5eb961a23..000000000 --- a/results/classifier/015/unknown/32484936 +++ /dev/null @@ -1,250 +0,0 @@ -PID: 0.839 -i386: 0.838 -x86: 0.836 -assembly: 0.835 -semantic: 0.832 -register: 0.831 -vnc: 0.830 -device: 0.830 -alpha: 0.829 -socket: 0.829 -kernel: 0.828 -permissions: 0.826 -debug: 0.825 -arm: 0.824 -virtual: 0.822 -hypervisor: 0.821 -architecture: 0.821 -files: 0.816 -performance: 0.815 -peripherals: 0.815 -graphic: 0.813 -risc-v: 0.813 -ppc: 0.813 -network: 0.811 -VMM: 0.810 -boot: 0.810 -TCG: 0.809 -operating system: 0.805 -user-level: 0.796 -mistranslation: 0.794 -KVM: 0.793 - -[Qemu-devel] [Snapshot Bug?]Qcow2 meta data corruption - -Hi all, -There was a problem about qcow2 image file happened in my serval vms and I could not figure it out, -so have to ask for some help. -Here is the thing: -At first, I found there were some data corruption in a vm, so I did qemu-img check to all my vms. -parts of check report: -3-Leaked cluster 2926229 refcount=1 reference=0 -4-Leaked cluster 3021181 refcount=1 reference=0 -5-Leaked cluster 3021182 refcount=1 reference=0 -6-Leaked cluster 3021183 refcount=1 reference=0 -7-Leaked cluster 3021184 refcount=1 reference=0 -8-ERROR cluster 3102547 refcount=3 reference=4 -9-ERROR cluster 3111536 refcount=3 reference=4 -10-ERROR cluster 3113369 refcount=3 reference=4 -11-ERROR cluster 3235590 refcount=10 reference=11 -12-ERROR cluster 3235591 refcount=10 reference=11 -423-Warning: cluster offset=0xc000c00020000 is after the end of the image file, can't properly check refcounts. -424-Warning: cluster offset=0xc000c000c0000 is after the end of the image file, can't properly check refcounts. -425-Warning: cluster offset=0xc0001000c0000 is after the end of the image file, can't properly check refcounts. -426-Warning: cluster offset=0xc000c000c0000 is after the end of the image file, can't properly check refcounts. -427-Warning: cluster offset=0xc000c000c0000 is after the end of the image file, can't properly check refcounts. -428-Warning: cluster offset=0xc000c000c0000 is after the end of the image file, can't properly check refcounts. -429-Warning: cluster offset=0xc000c000c0000 is after the end of the image file, can't properly check refcounts. -430-Warning: cluster offset=0xc000c00010000 is after the end of the image file, can't properly check refcounts. -After a futher look in, I found two l2 entries point to the same cluster, and that was found in serval qcow2 image files of different vms. 
-Like this:
-table entry conflict (with our qcow2 check tool):
-a table offset : 0x00000093f7080000 level : 2, l1 table entry 100, l2 table entry 7
-b table offset : 0x00000093f7080000 level : 2, l1 table entry 5, l2 table entry 7
-table entry conflict :
-a table offset : 0x00000000a01e0000 level : 2, l1 table entry 100, l2 table entry 19
-b table offset : 0x00000000a01e0000 level : 2, l1 table entry 5, l2 table entry 19
-table entry conflict :
-a table offset : 0x00000000a01d0000 level : 2, l1 table entry 100, l2 table entry 18
-b table offset : 0x00000000a01d0000 level : 2, l1 table entry 5, l2 table entry 18
-table entry conflict :
-a table offset : 0x00000000a01c0000 level : 2, l1 table entry 100, l2 table entry 17
-b table offset : 0x00000000a01c0000 level : 2, l1 table entry 5, l2 table entry 17
-table entry conflict :
-a table offset : 0x00000000a01b0000 level : 2, l1 table entry 100, l2 table entry 16
-b table offset : 0x00000000a01b0000 level : 2, l1 table entry 5, l2 table entry 16
-I think the problem is related to snapshot creation and deletion, but I can't reproduce it.
-Can anyone give a hint about how this could happen?
-QEMU version 2.0.1; I downloaded the source code, built it and ran 'make install'.
-Qemu parameters:
-/usr/bin/kvm -chardev socket,id=qmp,path=/var/run/qemu-server/5855899639838.qmp,server,nowait -mon chardev=qmp,mode=control -vnc :0,websocket,to=200 -enable-kvm -pidfile /var/run/qemu-server/5855899639838.pid -daemonize -name yfMailSvr-200.200.0.14 -smp sockets=1,cores=4 -cpu core2duo,hv_spinlocks=0xffff,hv_relaxed,hv_time,hv_vapic,+sse4.1,+sse4.2,+x2apic,+erms,+smep,+fsgsbase,+f16c,+dca,+pcid,+pdcm,+xtpr,+ht,+ss,+acpi,+ds -nodefaults -vga cirrus -k en-us -boot menu=on,splash-time=8000 -m 8192 -usb -drive if=none,id=drive-ide0,media=cdrom,aio=native -device ide-cd,bus=ide.0,unit=0,drive=drive-ide0,id=ide0 -drive file=/sf/data/36b82a720d3a278001ba904e80c20c13e_ecf4bbbf3e94/images/host-ecf4bbbf3e94/784f3f08532a/yfMailSvr-200.200.0.14.vm/vm-disk-1.qcow2,if=none,id=drive-virtio1,cache=none,aio=native -device virtio-blk-pci,drive=drive-virtio1,id=virtio1,bus=pci.0,addr=0xb -drive file=/sf/data/36b82a720d3a278001ba904e80c20c13e_ecf4bbbf3e94/images/host-ecf4bbbf3e94/784f3f08532a/yfMailSvr-200.200.0.14.vm/vm-disk-2.qcow2,if=none,id=drive-virtio2,cache=none,aio=native -device virtio-blk-pci,drive=drive-virtio2,id=virtio2,bus=pci.0,addr=0xc,bootindex=101 -netdev type=tap,id=net0,ifname=585589963983800,script=/sf/etc/kvm/vtp-bridge,vhost=on,vhostforce=on -device virtio-net-pci,romfile=,mac=FE:FC:FE:F0:AB:BA,netdev=net0,bus=pci.0,addr=0x12,id=net0 -rtc driftfix=slew,clock=rt,base=localtime -global kvm-pit.lost_tick_policy=discard -global PIIX4_PM.disable_s3=1 -global PIIX4_PM.disable_s4=1
-Thanks
-Sangfor VT.
-leijian
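To help interpret the "l1 table entry / l2 table entry" pairs in the report above, here is a small sketch of how a guest offset maps to those indices. It assumes the QEMU default 64 KiB cluster size (consistent with the 0x10000-aligned offsets in the report); the layout follows the qcow2 specification shipped with QEMU (docs/specs/qcow2.txt).

    #include <stdio.h>
    #include <inttypes.h>

    int main(void)
    {
        const unsigned cluster_bits = 16;            /* 64 KiB clusters (assumed) */
        const unsigned l2_bits = cluster_bits - 3;   /* 8-byte L2 entries per cluster */

        /* Example: the guest offset covered by l1 entry 100, l2 entry 7
         * from the conflict report above. */
        uint64_t guest_offset = ((uint64_t)100 << (cluster_bits + l2_bits))
                              | ((uint64_t)7 << cluster_bits);

        uint64_t l1_index = guest_offset >> (cluster_bits + l2_bits);
        uint64_t l2_index = (guest_offset >> cluster_bits) & ((1ULL << l2_bits) - 1);

        /* Prints: offset 0xc80070000 -> l1 100, l2 7 */
        printf("offset 0x%" PRIx64 " -> l1 %" PRIu64 ", l2 %" PRIu64 "\n",
               guest_offset, l1_index, l2_index);
        return 0;
    }

If the two conflicting entries belong to the same L1 table (rather than to a snapshot's copy of it), they should never map to the same data cluster, so writes through one mapping would corrupt data read back through the other.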
-Qemu parameters: -/usr/bin/kvm -chardev socket,id=qmp,path=/var/run/qemu-server/5855899639838.qmp,server,nowait -mon chardev=qmp,mode=control -vnc :0,websocket,to=200 -enable-kvm -pidfile /var/run/qemu-server/5855899639838.pid -daemonize -name yfMailSvr-200.200.0.14 -smp sockets=1,cores=4 -cpu core2duo,hv_spinlocks=0xffff,hv_relaxed,hv_time,hv_vapic,+sse4.1,+sse4.2,+x2apic,+erms,+smep,+fsgsbase,+f16c,+dca,+pcid,+pdcm,+xtpr,+ht,+ss,+acpi,+ds -nodefaults -vga cirrus -k en-us -boot menu=on,splash-time=8000 -m 8192 -usb -drive if=none,id=drive-ide0,media=cdrom,aio=native -device ide-cd,bus=ide.0,unit=0,drive=drive-ide0,id=ide0 -drive file=/sf/data/36b82a720d3a278001ba904e80c20c13e_ecf4bbbf3e94/images/host-ecf4bbbf3e94/784f3f08532a/yfMailSvr-200.200.0.14.vm/vm-disk-1.qcow2,if=none,id=drive-virtio1,cache=none,aio=native -device virtio-blk-pci,drive=drive-virtio1,id=virtio1,bus=pci.0,addr=0xb -drive file=/sf/data/36b82a720d3a278001ba904e80c20c13e_ecf4bbbf3e94/images/host-ecf4bbbf3e94/784f3f08532a/yfMailSvr-200.200.0.14.vm/vm-disk-2.qcow2,if=none,id=drive-virtio2,cache=none,aio=native -device virtio-blk-pci,drive=drive-virtio2,id=virtio2,bus=pci.0,addr=0xc,bootindex=101 -netdev type=tap,id=net0,ifname=585589963983800,script=/sf/etc/kvm/vtp-bridge,vhost=on,vhostforce=on -device virtio-net-pci,romfile=,mac=FE:FC:FE:F0:AB:BA,netdev=net0,bus=pci.0,addr=0x12,id=net0 -rtc driftfix=slew,clock=rt,base=localtime -global kvm-pit.lost_tick_policy=discard -global PIIX4_PM.disable_s3=1 -global PIIX4_PM.disable_s4=1 -Thanks -Sangfor VT. -leijian - -Am 03.04.2015 um 12:04 hat leijian geschrieben: -> -Hi all, -> -> -There was a problem about qcow2 image file happened in my serval vms and I -> -could not figure it out, -> -so have to ask for some help. -> -[...] -> -I think the problem is relate to the snapshot create, delete. But I cant -> -reproduce it . -> -Can Anyone give a hint about how this happen? -How did you create/delete your snapshots? - -More specifically, did you take care to never access your image from -more than one process (except if both are read-only)? It happens -occasionally that people use 'qemu-img snapshot' while the VM is -running. This is wrong and can corrupt the image. - -Kevin - -On 04/07/2015 03:33 AM, Kevin Wolf wrote: -> -More specifically, did you take care to never access your image from -> -more than one process (except if both are read-only)? It happens -> -occasionally that people use 'qemu-img snapshot' while the VM is -> -running. This is wrong and can corrupt the image. -Since this has been done by more than one person, I'm wondering if there -is something we can do in the qcow2 format itself to make it harder for -the casual user to cause corruption. Maybe if we declare some bit or -extension header for an image open for writing, which other readers can -use as a warning ("this image is being actively modified; reading it may -fail"), and other writers can use to deny access ("another process is -already modifying this image"), where a writer should set that bit -before writing anything else in the file, then clear it on exit. Of -course, you'd need a way to override the bit to actively clear it to -recover from the case of a writer dying unexpectedly without resetting -it normally. And it won't help the case of a reader opening the file -first, followed by a writer, where the reader could still get thrown off -track. 
-
-Or maybe we could document in the qcow2 format that all readers and
-writers should attempt to obtain the appropriate flock() permissions [or
-other appropriate advisory locking scheme] over the file header, so that
-cooperating processes that both use advisory locking will know when the
-file is in use by another process.
-
---
-Eric Blake eblake redhat com +1-919-301-3266
-Libvirt virtualization library
-http://libvirt.org
-signature.asc: OpenPGP digital signature
-
-
-I created/deleted the snapshots with the QMP commands "snapshot_blkdev_internal"/"snapshot_delete_blkdev_internal", and to avoid the case you mentioned above, I have added flock() locking in qemu_open().
-Here is what happens when qemu-img snapshot is run against a running VM:
-Diskfile:/sf/data/36c81f660e38b3b001b183da50b477d89_f8bc123b3e74/images/host-f8bc123b3e74/4a8d8728fcdc/Devried30030.vm/vm-disk-1.qcow2 is used! errno=Resource temporarily unavailable
-Could the two L2 entries end up pointing to the same cluster because the refcount of a cluster that is still in use unexpectedly dropped to 0, so the cluster was allocated again?
-If the image was not accessed from more than one process, are there any other failure cases I can test for?
-Thanks
-leijian
-From: Eric Blake
-Date: 2015-04-07 23:27
-To: Kevin Wolf; leijian
-CC: qemu-devel; stefanha
-Subject: Re: [Qemu-devel] [Snapshot Bug?]Qcow2 meta data corruption
-On 04/07/2015 03:33 AM, Kevin Wolf wrote:
-> More specifically, did you take care to never access your image from
-> more than one process (except if both are read-only)? It happens
-> occasionally that people use 'qemu-img snapshot' while the VM is
-> running. This is wrong and can corrupt the image.
-Since this has been done by more than one person, I'm wondering if there
-is something we can do in the qcow2 format itself to make it harder for
-the casual user to cause corruption. Maybe if we declare some bit or
-extension header for an image open for writing, which other readers can
-use as a warning ("this image is being actively modified; reading it may
-fail"), and other writers can use to deny access ("another process is
-already modifying this image"), where a writer should set that bit
-before writing anything else in the file, then clear it on exit. Of
-course, you'd need a way to override the bit to actively clear it to
-recover from the case of a writer dying unexpectedly without resetting
-it normally. And it won't help the case of a reader opening the file
-first, followed by a writer, where the reader could still get thrown off
-track.
-Or maybe we could document in the qcow2 format that all readers and
-writers should attempt to obtain the appropriate flock() permissions [or
-other appropriate advisory locking scheme] over the file header, so that
-cooperating processes that both use advisory locking will know when the
-file is in use by another process.
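For illustration only, here is a minimal, self-contained sketch of the advisory-locking scheme described above. This is not QEMU code and not the reporter's actual qemu_open() change; the helper name and error handling are invented for the example. A writer takes an exclusive flock() on the image, a reader takes a shared one, and LOCK_NB turns a conflict into an immediate EWOULDBLOCK, which is exactly the "Resource temporarily unavailable" error shown earlier in this thread.

#include <stdio.h>
#include <fcntl.h>
#include <unistd.h>
#include <sys/file.h>

/* Open an image and take an advisory lock on it: exclusive for writers,
 * shared for readers.  Cooperating processes that all do this will see
 * each other; processes that ignore flock() are unaffected, since the
 * lock is advisory rather than mandatory. */
static int open_image_locked(const char *path, int writable)
{
    int fd = open(path, writable ? O_RDWR : O_RDONLY);
    if (fd < 0) {
        perror("open");
        return -1;
    }
    /* LOCK_NB: fail immediately instead of blocking if another process
     * already holds a conflicting lock. */
    if (flock(fd, (writable ? LOCK_EX : LOCK_SH) | LOCK_NB) < 0) {
        perror("flock");        /* image is in use by another process */
        close(fd);
        return -1;
    }
    return fd;                  /* lock is dropped on close() or exit */
}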
--- -Eric Blake eblake redhat com +1-919-301-3266 -Libvirt virtualization library http://libvirt.org - diff --git a/results/classifier/015/unknown/57756589 b/results/classifier/015/unknown/57756589 deleted file mode 100644 index 5891931d8..000000000 --- a/results/classifier/015/unknown/57756589 +++ /dev/null @@ -1,1448 +0,0 @@ -peripherals: 0.875 -hypervisor: 0.863 -mistranslation: 0.861 -register: 0.858 -architecture: 0.856 -device: 0.853 -vnc: 0.851 -virtual: 0.845 -permissions: 0.842 -assembly: 0.841 -performance: 0.839 -ppc: 0.838 -semantic: 0.835 -operating system: 0.835 -TCG: 0.833 -VMM: 0.833 -arm: 0.828 -boot: 0.827 -user-level: 0.826 -graphic: 0.824 -network: 0.822 -socket: 0.820 -PID: 0.819 -KVM: 0.817 -kernel: 0.817 -files: 0.816 -x86: 0.814 -alpha: 0.810 -debug: 0.803 -i386: 0.782 -risc-v: 0.755 - -[Qemu-devel] 答复: Re: 答复: Re: 答复: Re: [BUG]COLO failover hang - -amost like wikiï¼but panic in Primary Node. - - - - -setp: - -1 - -Primary Node. - -x86_64-softmmu/qemu-system-x86_64 -enable-kvm -boot c -m 2048 -smp 2 -qmp stdio --vnc :7 -name primary -cpu qemu64,+kvmclock -device piix3-usb-uhci -usb --usbdevice tablet\ - - -drive -if=virtio,id=colo-disk0,driver=quorum,read-pattern=fifo,vote-threshold=1, - - -children.0.file.filename=/mnt/sdd/pure_IMG/linux/redhat/rhel_6.5_64_2U_ide,children.0.driver=qcow2 - -S \ - - -netdev -tap,id=hn1,vhost=off,script=/etc/qemu-ifup2,downscript=/etc/qemu-ifdown2 \ - - -device e1000,id=e1,netdev=hn1,mac=52:a4:00:12:78:67 \ - - -netdev -tap,id=hn0,vhost=off,script=/etc/qemu-ifup,downscript=/etc/qemu-ifdown \ - - -device e1000,id=e0,netdev=hn0,mac=52:a4:00:12:78:66 \ - - -chardev socket,id=mirror0,host=9.61.1.8,port=9003,server,nowait -chardev -socket,id=compare1,host=9.61.1.8,port=9004,server,nowait \ - - -chardev socket,id=compare0,host=9.61.1.8,port=9001,server,nowait -chardev -socket,id=compare0-0,host=9.61.1.8,port=9001 \ - - -chardev socket,id=compare_out,host=9.61.1.8,port=9005,server,nowait \ - - -chardev socket,id=compare_out0,host=9.61.1.8,port=9005 \ - - -object filter-mirror,id=m0,netdev=hn0,queue=tx,outdev=mirror0 \ - - -object filter-redirector,netdev=hn0,id=redire0,queue=rx,indev=compare_out --object filter-redirector,netdev=hn0,id=redire1,queue=rx,outdev=compare0 \ - - -object -colo-compare,id=comp0,primary_in=compare0-0,secondary_in=compare1,outdev=compare_out0 - -2 Second node: - -x86_64-softmmu/qemu-system-x86_64 -boot c -m 2048 -smp 2 -qmp stdio -vnc :7 --name secondary -enable-kvm -cpu qemu64,+kvmclock -device piix3-usb-uhci -usb --usbdevice tablet\ - - -drive -if=none,id=colo-disk0,file.filename=/mnt/sdd/pure_IMG/linux/redhat/rhel_6.5_64_2U_ide,driver=qcow2,node-name=node0 - \ - - -drive -if=virtio,id=active-disk0,driver=replication,mode=secondary,file.driver=qcow2,top-id=active-disk0,file.file.filename=/mnt/ramfstest/active_disk.img,file.backing.driver=qcow2,file.backing.file.filename=/mnt/ramfstest/hidden_disk.img,file.backing.backing=colo-disk0 - \ - - -netdev -tap,id=hn1,vhost=off,script=/etc/qemu-ifup2,downscript=/etc/qemu-ifdown2 \ - - -device e1000,id=e1,netdev=hn1,mac=52:a4:00:12:78:67 \ - - -netdev -tap,id=hn0,vhost=off,script=/etc/qemu-ifup,downscript=/etc/qemu-ifdown \ - - -device e1000,netdev=hn0,mac=52:a4:00:12:78:66 -chardev -socket,id=red0,host=9.61.1.8,port=9003 \ - - -chardev socket,id=red1,host=9.61.1.8,port=9004 \ - - -object filter-redirector,id=f1,netdev=hn0,queue=tx,indev=red0 \ - - -object filter-redirector,id=f2,netdev=hn0,queue=rx,outdev=red1 \ - - -object filter-rewriter,id=rew0,netdev=hn0,queue=all 
-incoming tcp:0:8888 - -3 Secondary node: - -{'execute':'qmp_capabilities'} - -{ 'execute': 'nbd-server-start', - - 'arguments': {'addr': {'type': 'inet', 'data': {'host': '9.61.1.7', 'port': -'8889'} } } - -} - -{'execute': 'nbd-server-add', 'arguments': {'device': 'colo-disk0', 'writable': -true } } - -4:Primary Nodeï¼ - -{'execute':'qmp_capabilities'} - - -{ 'execute': 'human-monitor-command', - - 'arguments': {'command-line': 'drive_add -n buddy -driver=replication,mode=primary,file.driver=nbd,file.host=9.61.1.7,file.port=8889,file.export=colo-disk0,node-name=node0'}} - -{ 'execute':'x-blockdev-change', 'arguments':{'parent': 'colo-disk0', 'node': -'node0' } } - -{ 'execute': 'migrate-set-capabilities', - - 'arguments': {'capabilities': [ {'capability': 'x-colo', 'state': true } -] } } - -{ 'execute': 'migrate', 'arguments': {'uri': 'tcp:9.61.1.7:8888' } } - - - - -then can see two runing VMs, whenever you make changes to PVM, SVM will be -synced. - - - - -5ï¼Primary Nodeï¼ - -echo c ï¼ /proc/sysrq-trigger - - - - -ï¼ï¼Secondary node: - -{ 'execute': 'nbd-server-stop' } - -{ "execute": "x-colo-lost-heartbeat" } - - - - -then can see the Secondary node hang at recvmsg recvmsg . - - - - - - - - - - - - -åå§é®ä»¶ - - - -åä»¶äººï¼ address@hidden -æ¶ä»¶äººï¼ç广10165992 address@hidden -æéäººï¼ address@hidden address@hidden -æ¥ æ ï¼2017å¹´03æ21æ¥ 16:27 -主 é¢ ï¼Re: [Qemu-devel] çå¤: Re: çå¤: Re: [BUG]COLO failover hang - - - - - -Hi, - -On 2017/3/21 16:10, address@hidden wrote: -ï¼ Thank youã -ï¼ -ï¼ I have test areadyã -ï¼ -ï¼ When the Primary Node panic,the Secondary Node qemu hang at the same placeã -ï¼ -ï¼ Incorrding -http://wiki.qemu-project.org/Features/COLO -ï¼kill Primary Node qemu -will not produce the problem,but Primary Node panic canã -ï¼ -ï¼ I think due to the feature of channel does not support -QIO_CHANNEL_FEATURE_SHUTDOWN. -ï¼ -ï¼ - -Yes, you are right, when we do failover for primary/secondary VM, we will -shutdown the related -fd in case it is stuck in the read/write fd. - -It seems that you didn't follow the above introduction exactly to do the test. -Could you -share your test procedures ? Especially the commands used in the test. - -Thanks, -Hailiang - -ï¼ when failover,channel_shutdown could not shut down the channel. -ï¼ -ï¼ -ï¼ so the colo_process_incoming_thread will hang at recvmsg. -ï¼ -ï¼ -ï¼ I test a patch: -ï¼ -ï¼ -ï¼ diff --git a/migration/socket.c b/migration/socket.c -ï¼ -ï¼ -ï¼ index 13966f1..d65a0ea 100644 -ï¼ -ï¼ -ï¼ --- a/migration/socket.c -ï¼ -ï¼ -ï¼ +++ b/migration/socket.c -ï¼ -ï¼ -ï¼ @@ -147,8 +147,9 @@ static gboolean -socket_accept_incoming_migration(QIOChannel *ioc, -ï¼ -ï¼ -ï¼ } -ï¼ -ï¼ -ï¼ -ï¼ -ï¼ -ï¼ trace_migration_socket_incoming_accepted() -ï¼ -ï¼ -ï¼ -ï¼ -ï¼ -ï¼ qio_channel_set_name(QIO_CHANNEL(sioc), "migration-socket-incoming") -ï¼ -ï¼ -ï¼ + qio_channel_set_feature(QIO_CHANNEL(sioc), QIO_CHANNEL_FEATURE_SHUTDOWN) -ï¼ -ï¼ -ï¼ migration_channel_process_incoming(migrate_get_current(), -ï¼ -ï¼ -ï¼ QIO_CHANNEL(sioc)) -ï¼ -ï¼ -ï¼ object_unref(OBJECT(sioc)) -ï¼ -ï¼ -ï¼ -ï¼ -ï¼ My test will not hang any more. -ï¼ -ï¼ -ï¼ -ï¼ -ï¼ -ï¼ -ï¼ -ï¼ -ï¼ -ï¼ -ï¼ -ï¼ -ï¼ -ï¼ -ï¼ -ï¼ -ï¼ -ï¼ åå§é®ä»¶ -ï¼ -ï¼ -ï¼ -ï¼ åä»¶äººï¼ address@hidden -ï¼ æ¶ä»¶äººï¼ç广10165992 address@hidden -ï¼ æéäººï¼ address@hidden address@hidden -ï¼ æ¥ æ ï¼2017å¹´03æ21æ¥ 15:58 -ï¼ ä¸» é¢ ï¼Re: [Qemu-devel] çå¤: Re: [BUG]COLO failover hang -ï¼ -ï¼ -ï¼ -ï¼ -ï¼ -ï¼ Hi,Wang. 
-ï¼ -ï¼ You can test this branch: -ï¼ -ï¼ -https://github.com/coloft/qemu/tree/colo-v5.1-developing-COLO-frame-v21-with-shared-disk -ï¼ -ï¼ and please follow wiki ensure your own configuration correctly. -ï¼ -ï¼ -http://wiki.qemu-project.org/Features/COLO -ï¼ -ï¼ -ï¼ Thanks -ï¼ -ï¼ Zhang Chen -ï¼ -ï¼ -ï¼ On 03/21/2017 03:27 PM, address@hidden wrote: -ï¼ ï¼ -ï¼ ï¼ hi. -ï¼ ï¼ -ï¼ ï¼ I test the git qemu master have the same problem. -ï¼ ï¼ -ï¼ ï¼ (gdb) bt -ï¼ ï¼ -ï¼ ï¼ #0 qio_channel_socket_readv (ioc=0x7f65911b4e50, iov=0x7f64ef3fd880, -ï¼ ï¼ niov=1, fds=0x0, nfds=0x0, errp=0x0) at io/channel-socket.c:461 -ï¼ ï¼ -ï¼ ï¼ #1 0x00007f658e4aa0c2 in qio_channel_read -ï¼ ï¼ (address@hidden, address@hidden "", -ï¼ ï¼ address@hidden, address@hidden) at io/channel.c:114 -ï¼ ï¼ -ï¼ ï¼ #2 0x00007f658e3ea990 in channel_get_buffer (opaque=ï¼optimized outï¼, -ï¼ ï¼ buf=0x7f65907cb838 "", pos=ï¼optimized outï¼, size=32768) at -ï¼ ï¼ migration/qemu-file-channel.c:78 -ï¼ ï¼ -ï¼ ï¼ #3 0x00007f658e3e97fc in qemu_fill_buffer (f=0x7f65907cb800) at -ï¼ ï¼ migration/qemu-file.c:295 -ï¼ ï¼ -ï¼ ï¼ #4 0x00007f658e3ea2e1 in qemu_peek_byte (address@hidden, -ï¼ ï¼ address@hidden) at migration/qemu-file.c:555 -ï¼ ï¼ -ï¼ ï¼ #5 0x00007f658e3ea34b in qemu_get_byte (address@hidden) at -ï¼ ï¼ migration/qemu-file.c:568 -ï¼ ï¼ -ï¼ ï¼ #6 0x00007f658e3ea552 in qemu_get_be32 (address@hidden) at -ï¼ ï¼ migration/qemu-file.c:648 -ï¼ ï¼ -ï¼ ï¼ #7 0x00007f658e3e66e5 in colo_receive_message (f=0x7f65907cb800, -ï¼ ï¼ address@hidden) at migration/colo.c:244 -ï¼ ï¼ -ï¼ ï¼ #8 0x00007f658e3e681e in colo_receive_check_message (f=ï¼optimized -ï¼ ï¼ outï¼, address@hidden, -ï¼ ï¼ address@hidden) -ï¼ ï¼ -ï¼ ï¼ at migration/colo.c:264 -ï¼ ï¼ -ï¼ ï¼ #9 0x00007f658e3e740e in colo_process_incoming_thread -ï¼ ï¼ (opaque=0x7f658eb30360 ï¼mis_current.31286ï¼) at migration/colo.c:577 -ï¼ ï¼ -ï¼ ï¼ #10 0x00007f658be09df3 in start_thread () from /lib64/libpthread.so.0 -ï¼ ï¼ -ï¼ ï¼ #11 0x00007f65881983ed in clone () from /lib64/libc.so.6 -ï¼ ï¼ -ï¼ ï¼ (gdb) p ioc-ï¼name -ï¼ ï¼ -ï¼ ï¼ $2 = 0x7f658ff7d5c0 "migration-socket-incoming" -ï¼ ï¼ -ï¼ ï¼ (gdb) p ioc-ï¼features Do not support QIO_CHANNEL_FEATURE_SHUTDOWN -ï¼ ï¼ -ï¼ ï¼ $3 = 0 -ï¼ ï¼ -ï¼ ï¼ -ï¼ ï¼ (gdb) bt -ï¼ ï¼ -ï¼ ï¼ #0 socket_accept_incoming_migration (ioc=0x7fdcceeafa90, -ï¼ ï¼ condition=G_IO_IN, opaque=0x7fdcceeafa90) at migration/socket.c:137 -ï¼ ï¼ -ï¼ ï¼ #1 0x00007fdcc6966350 in g_main_dispatch (context=ï¼optimized outï¼) at -ï¼ ï¼ gmain.c:3054 -ï¼ ï¼ -ï¼ ï¼ #2 g_main_context_dispatch (context=ï¼optimized outï¼, -ï¼ ï¼ address@hidden) at gmain.c:3630 -ï¼ ï¼ -ï¼ ï¼ #3 0x00007fdccb8a6dcc in glib_pollfds_poll () at util/main-loop.c:213 -ï¼ ï¼ -ï¼ ï¼ #4 os_host_main_loop_wait (timeout=ï¼optimized outï¼) at -ï¼ ï¼ util/main-loop.c:258 -ï¼ ï¼ -ï¼ ï¼ #5 main_loop_wait (address@hidden) at -ï¼ ï¼ util/main-loop.c:506 -ï¼ ï¼ -ï¼ ï¼ #6 0x00007fdccb526187 in main_loop () at vl.c:1898 -ï¼ ï¼ -ï¼ ï¼ #7 main (argc=ï¼optimized outï¼, argv=ï¼optimized outï¼, envp=ï¼optimized -ï¼ ï¼ outï¼) at vl.c:4709 -ï¼ ï¼ -ï¼ ï¼ (gdb) p ioc-ï¼features -ï¼ ï¼ -ï¼ ï¼ $1 = 6 -ï¼ ï¼ -ï¼ ï¼ (gdb) p ioc-ï¼name -ï¼ ï¼ -ï¼ ï¼ $2 = 0x7fdcce1b1ab0 "migration-socket-listener" -ï¼ ï¼ -ï¼ ï¼ -ï¼ ï¼ May be socket_accept_incoming_migration should -ï¼ ï¼ call qio_channel_set_feature(ioc, QIO_CHANNEL_FEATURE_SHUTDOWN)?? -ï¼ ï¼ -ï¼ ï¼ -ï¼ ï¼ thank you. 
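To make the failure mode above concrete, here is a simplified sketch of the shutdown path under discussion. The helper name is invented and the code is condensed, but qio_channel_has_feature(), qio_channel_shutdown() and QIO_CHANNEL_SHUTDOWN_BOTH are the QEMU interfaces referred to in this thread. qemu_file_shutdown() can only wake a thread blocked in recvmsg() if the underlying channel advertises QIO_CHANNEL_FEATURE_SHUTDOWN, and the gdb output above shows the accepted incoming channel has features == 0.

#include "qemu/osdep.h"   /* assumed to be built inside the QEMU tree */
#include "io/channel.h"

/* Condensed sketch: why failover cannot unblock
 * colo_process_incoming_thread() when the accepted migration channel
 * lacks QIO_CHANNEL_FEATURE_SHUTDOWN. */
static int shutdown_incoming_channel(QIOChannel *ioc, Error **errp)
{
    if (!qio_channel_has_feature(ioc, QIO_CHANNEL_FEATURE_SHUTDOWN)) {
        /* Nothing can be done: the socket is never shut down, so the
         * reader stays blocked in qio_channel_read()/recvmsg(). */
        return -ENOSYS;
    }
    /* Roughly shutdown(fd, SHUT_RDWR) underneath: blocked readers and
     * writers return with an error and failover can make progress. */
    return qio_channel_shutdown(ioc, QIO_CHANNEL_SHUTDOWN_BOTH, errp);
}

Both fixes floated in this thread come down to making sure the accepted socket carries that feature flag: either set it explicitly in socket_accept_incoming_migration(), or have qio_channel_socket_accept() create the new channel through qio_channel_socket_new(), which already sets it.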
-ï¼ ï¼ -ï¼ ï¼ -ï¼ ï¼ -ï¼ ï¼ -ï¼ ï¼ -ï¼ ï¼ åå§é®ä»¶ -ï¼ ï¼ address@hidden -ï¼ ï¼ address@hidden -ï¼ ï¼ address@hidden@huawei.comï¼ -ï¼ ï¼ *æ¥ æ ï¼*2017å¹´03æ16æ¥ 14:46 -ï¼ ï¼ *主 é¢ ï¼**Re: [Qemu-devel] COLO failover hang* -ï¼ ï¼ -ï¼ ï¼ -ï¼ ï¼ -ï¼ ï¼ -ï¼ ï¼ On 03/15/2017 05:06 PM, wangguang wrote: -ï¼ ï¼ ï¼ am testing QEMU COLO feature described here [QEMU -ï¼ ï¼ ï¼ Wiki]( -http://wiki.qemu-project.org/Features/COLO -). -ï¼ ï¼ ï¼ -ï¼ ï¼ ï¼ When the Primary Node panic,the Secondary Node qemu hang. -ï¼ ï¼ ï¼ hang at recvmsg in qio_channel_socket_readv. -ï¼ ï¼ ï¼ And I run { 'execute': 'nbd-server-stop' } and { "execute": -ï¼ ï¼ ï¼ "x-colo-lost-heartbeat" } in Secondary VM's -ï¼ ï¼ ï¼ monitor,the Secondary Node qemu still hang at recvmsg . -ï¼ ï¼ ï¼ -ï¼ ï¼ ï¼ I found that the colo in qemu is not complete yet. -ï¼ ï¼ ï¼ Do the colo have any plan for development? -ï¼ ï¼ -ï¼ ï¼ Yes, We are developing. You can see some of patch we pushing. -ï¼ ï¼ -ï¼ ï¼ ï¼ Has anyone ever run it successfully? Any help is appreciated! -ï¼ ï¼ -ï¼ ï¼ In our internal version can run it successfully, -ï¼ ï¼ The failover detail you can ask Zhanghailiang for help. -ï¼ ï¼ Next time if you have some question about COLO, -ï¼ ï¼ please cc me and zhanghailiang address@hidden -ï¼ ï¼ -ï¼ ï¼ -ï¼ ï¼ Thanks -ï¼ ï¼ Zhang Chen -ï¼ ï¼ -ï¼ ï¼ -ï¼ ï¼ ï¼ -ï¼ ï¼ ï¼ -ï¼ ï¼ ï¼ -ï¼ ï¼ ï¼ centos7.2+qemu2.7.50 -ï¼ ï¼ ï¼ (gdb) bt -ï¼ ï¼ ï¼ #0 0x00007f3e00cc86ad in recvmsg () from /lib64/libpthread.so.0 -ï¼ ï¼ ï¼ #1 0x00007f3e0332b738 in qio_channel_socket_readv (ioc=ï¼optimized outï¼, -ï¼ ï¼ ï¼ iov=ï¼optimized outï¼, niov=ï¼optimized outï¼, fds=0x0, nfds=0x0, errp=0x0) at -ï¼ ï¼ ï¼ io/channel-socket.c:497 -ï¼ ï¼ ï¼ #2 0x00007f3e03329472 in qio_channel_read (address@hidden, -ï¼ ï¼ ï¼ address@hidden "", address@hidden, -ï¼ ï¼ ï¼ address@hidden) at io/channel.c:97 -ï¼ ï¼ ï¼ #3 0x00007f3e032750e0 in channel_get_buffer (opaque=ï¼optimized outï¼, -ï¼ ï¼ ï¼ buf=0x7f3e05910f38 "", pos=ï¼optimized outï¼, size=32768) at -ï¼ ï¼ ï¼ migration/qemu-file-channel.c:78 -ï¼ ï¼ ï¼ #4 0x00007f3e0327412c in qemu_fill_buffer (f=0x7f3e05910f00) at -ï¼ ï¼ ï¼ migration/qemu-file.c:257 -ï¼ ï¼ ï¼ #5 0x00007f3e03274a41 in qemu_peek_byte (address@hidden, -ï¼ ï¼ ï¼ address@hidden) at migration/qemu-file.c:510 -ï¼ ï¼ ï¼ #6 0x00007f3e03274aab in qemu_get_byte (address@hidden) at -ï¼ ï¼ ï¼ migration/qemu-file.c:523 -ï¼ ï¼ ï¼ #7 0x00007f3e03274cb2 in qemu_get_be32 (address@hidden) at -ï¼ ï¼ ï¼ migration/qemu-file.c:603 -ï¼ ï¼ ï¼ #8 0x00007f3e03271735 in colo_receive_message (f=0x7f3e05910f00, -ï¼ ï¼ ï¼ address@hidden) at migration/colo..c:215 -ï¼ ï¼ ï¼ #9 0x00007f3e0327250d in colo_wait_handle_message (errp=0x7f3d62bfaa48, -ï¼ ï¼ ï¼ checkpoint_request=ï¼synthetic pointerï¼, f=ï¼optimized outï¼) at -ï¼ ï¼ ï¼ migration/colo.c:546 -ï¼ ï¼ ï¼ #10 colo_process_incoming_thread (opaque=0x7f3e067245e0) at -ï¼ ï¼ ï¼ migration/colo.c:649 -ï¼ ï¼ ï¼ #11 0x00007f3e00cc1df3 in start_thread () from /lib64/libpthread.so.0 -ï¼ ï¼ ï¼ #12 0x00007f3dfc9c03ed in clone () from /lib64/libc.so.6 -ï¼ ï¼ ï¼ -ï¼ ï¼ ï¼ -ï¼ ï¼ ï¼ -ï¼ ï¼ ï¼ -ï¼ ï¼ ï¼ -ï¼ ï¼ ï¼ -- -ï¼ ï¼ ï¼ View this message in context: -http://qemu.11.n7.nabble.com/COLO-failover-hang-tp473250.html -ï¼ ï¼ ï¼ Sent from the Developer mailing list archive at Nabble.com. 
-ï¼ ï¼ ï¼ -ï¼ ï¼ ï¼ -ï¼ ï¼ ï¼ -ï¼ ï¼ ï¼ -ï¼ ï¼ -ï¼ ï¼ -- -ï¼ ï¼ Thanks -ï¼ ï¼ Zhang Chen -ï¼ ï¼ -ï¼ ï¼ -ï¼ ï¼ -ï¼ ï¼ -ï¼ ï¼ -ï¼ - -diff --git a/migration/socket.c b/migration/socket.c - - -index 13966f1..d65a0ea 100644 - - ---- a/migration/socket.c - - -+++ b/migration/socket.c - - -@@ -147,8 +147,9 @@ static gboolean socket_accept_incoming_migration(QIOChannel -*ioc, - - - } - - - - - - trace_migration_socket_incoming_accepted() - - - - - - qio_channel_set_name(QIO_CHANNEL(sioc), "migration-socket-incoming") - - -+ qio_channel_set_feature(QIO_CHANNEL(sioc), QIO_CHANNEL_FEATURE_SHUTDOWN) - - - migration_channel_process_incoming(migrate_get_current(), - - - QIO_CHANNEL(sioc)) - - - object_unref(OBJECT(sioc)) - - - - -Is this patch ok? - -I have test it . The test could not hang any more. - - - - - - - - - - - - -åå§é®ä»¶ - - - -åä»¶äººï¼ address@hidden -æ¶ä»¶äººï¼ address@hidden address@hidden -æéäººï¼ address@hidden address@hidden address@hidden -æ¥ æ ï¼2017å¹´03æ22æ¥ 09:11 -主 é¢ ï¼Re: [Qemu-devel] çå¤: Re: çå¤: Re: [BUG]COLO failover hang - - - - - -On 2017/3/21 19:56, Dr. David Alan Gilbert wrote: -ï¼ * Hailiang Zhang (address@hidden) wrote: -ï¼ï¼ Hi, -ï¼ï¼ -ï¼ï¼ Thanks for reporting this, and i confirmed it in my test, and it is a bug. -ï¼ï¼ -ï¼ï¼ Though we tried to call qemu_file_shutdown() to shutdown the related fd, in -ï¼ï¼ case COLO thread/incoming thread is stuck in read/write() while do failover, -ï¼ï¼ but it didn't take effect, because all the fd used by COLO (also migration) -ï¼ï¼ has been wrapped by qio channel, and it will not call the shutdown API if -ï¼ï¼ we didn't qio_channel_set_feature(QIO_CHANNEL(sioc), -QIO_CHANNEL_FEATURE_SHUTDOWN). -ï¼ï¼ -ï¼ï¼ Cc: Dr. David Alan Gilbert address@hidden -ï¼ï¼ -ï¼ï¼ I doubted migration cancel has the same problem, it may be stuck in write() -ï¼ï¼ if we tried to cancel migration. -ï¼ï¼ -ï¼ï¼ void fd_start_outgoing_migration(MigrationState *s, const char *fdname, -Error **errp) -ï¼ï¼ { -ï¼ï¼ qio_channel_set_name(QIO_CHANNEL(ioc), "migration-fd-outgoing") -ï¼ï¼ migration_channel_connect(s, ioc, NULL) -ï¼ï¼ ... ... -ï¼ï¼ We didn't call qio_channel_set_feature(QIO_CHANNEL(sioc), -QIO_CHANNEL_FEATURE_SHUTDOWN) above, -ï¼ï¼ and the -ï¼ï¼ migrate_fd_cancel() -ï¼ï¼ { -ï¼ï¼ ... ... -ï¼ï¼ if (s-ï¼state == MIGRATION_STATUS_CANCELLING && f) { -ï¼ï¼ qemu_file_shutdown(f) --ï¼ This will not take effect. No ? -ï¼ï¼ } -ï¼ï¼ } -ï¼ -ï¼ (cc'd in Daniel Berrange). -ï¼ I see that we call qio_channel_set_feature(ioc, QIO_CHANNEL_FEATURE_SHUTDOWN) -at the -ï¼ top of qio_channel_socket_new so I think that's safe isn't it? -ï¼ - -Hmm, you are right, this problem is only exist for the migration incoming fd, -thanks. - -ï¼ Dave -ï¼ -ï¼ï¼ Thanks, -ï¼ï¼ Hailiang -ï¼ï¼ -ï¼ï¼ On 2017/3/21 16:10, address@hidden wrote: -ï¼ï¼ï¼ Thank youã -ï¼ï¼ï¼ -ï¼ï¼ï¼ I have test areadyã -ï¼ï¼ï¼ -ï¼ï¼ï¼ When the Primary Node panic,the Secondary Node qemu hang at the same placeã -ï¼ï¼ï¼ -ï¼ï¼ï¼ Incorrding -http://wiki.qemu-project.org/Features/COLO -ï¼kill Primary Node -qemu will not produce the problem,but Primary Node panic canã -ï¼ï¼ï¼ -ï¼ï¼ï¼ I think due to the feature of channel does not support -QIO_CHANNEL_FEATURE_SHUTDOWN. -ï¼ï¼ï¼ -ï¼ï¼ï¼ -ï¼ï¼ï¼ when failover,channel_shutdown could not shut down the channel. -ï¼ï¼ï¼ -ï¼ï¼ï¼ -ï¼ï¼ï¼ so the colo_process_incoming_thread will hang at recvmsg. 
-ï¼ï¼ï¼ -ï¼ï¼ï¼ -ï¼ï¼ï¼ I test a patch: -ï¼ï¼ï¼ -ï¼ï¼ï¼ -ï¼ï¼ï¼ diff --git a/migration/socket.c b/migration/socket.c -ï¼ï¼ï¼ -ï¼ï¼ï¼ -ï¼ï¼ï¼ index 13966f1..d65a0ea 100644 -ï¼ï¼ï¼ -ï¼ï¼ï¼ -ï¼ï¼ï¼ --- a/migration/socket.c -ï¼ï¼ï¼ -ï¼ï¼ï¼ -ï¼ï¼ï¼ +++ b/migration/socket.c -ï¼ï¼ï¼ -ï¼ï¼ï¼ -ï¼ï¼ï¼ @@ -147,8 +147,9 @@ static gboolean -socket_accept_incoming_migration(QIOChannel *ioc, -ï¼ï¼ï¼ -ï¼ï¼ï¼ -ï¼ï¼ï¼ } -ï¼ï¼ï¼ -ï¼ï¼ï¼ -ï¼ï¼ï¼ -ï¼ï¼ï¼ -ï¼ï¼ï¼ -ï¼ï¼ï¼ trace_migration_socket_incoming_accepted() -ï¼ï¼ï¼ -ï¼ï¼ï¼ -ï¼ï¼ï¼ -ï¼ï¼ï¼ -ï¼ï¼ï¼ -ï¼ï¼ï¼ qio_channel_set_name(QIO_CHANNEL(sioc), "migration-socket-incoming") -ï¼ï¼ï¼ -ï¼ï¼ï¼ -ï¼ï¼ï¼ + qio_channel_set_feature(QIO_CHANNEL(sioc), -QIO_CHANNEL_FEATURE_SHUTDOWN) -ï¼ï¼ï¼ -ï¼ï¼ï¼ -ï¼ï¼ï¼ migration_channel_process_incoming(migrate_get_current(), -ï¼ï¼ï¼ -ï¼ï¼ï¼ -ï¼ï¼ï¼ QIO_CHANNEL(sioc)) -ï¼ï¼ï¼ -ï¼ï¼ï¼ -ï¼ï¼ï¼ object_unref(OBJECT(sioc)) -ï¼ï¼ï¼ -ï¼ï¼ï¼ -ï¼ï¼ï¼ -ï¼ï¼ï¼ -ï¼ï¼ï¼ My test will not hang any more. -ï¼ï¼ï¼ -ï¼ï¼ï¼ -ï¼ï¼ï¼ -ï¼ï¼ï¼ -ï¼ï¼ï¼ -ï¼ï¼ï¼ -ï¼ï¼ï¼ -ï¼ï¼ï¼ -ï¼ï¼ï¼ -ï¼ï¼ï¼ -ï¼ï¼ï¼ -ï¼ï¼ï¼ -ï¼ï¼ï¼ -ï¼ï¼ï¼ -ï¼ï¼ï¼ -ï¼ï¼ï¼ -ï¼ï¼ï¼ -ï¼ï¼ï¼ åå§é®ä»¶ -ï¼ï¼ï¼ -ï¼ï¼ï¼ -ï¼ï¼ï¼ -ï¼ï¼ï¼ åä»¶äººï¼ address@hidden -ï¼ï¼ï¼ æ¶ä»¶äººï¼ç广10165992 address@hidden -ï¼ï¼ï¼ æéäººï¼ address@hidden address@hidden -ï¼ï¼ï¼ æ¥ æ ï¼2017å¹´03æ21æ¥ 15:58 -ï¼ï¼ï¼ 主 é¢ ï¼Re: [Qemu-devel] çå¤: Re: [BUG]COLO failover hang -ï¼ï¼ï¼ -ï¼ï¼ï¼ -ï¼ï¼ï¼ -ï¼ï¼ï¼ -ï¼ï¼ï¼ -ï¼ï¼ï¼ Hi,Wang. -ï¼ï¼ï¼ -ï¼ï¼ï¼ You can test this branch: -ï¼ï¼ï¼ -ï¼ï¼ï¼ -https://github.com/coloft/qemu/tree/colo-v5.1-developing-COLO-frame-v21-with-shared-disk -ï¼ï¼ï¼ -ï¼ï¼ï¼ and please follow wiki ensure your own configuration correctly. -ï¼ï¼ï¼ -ï¼ï¼ï¼ -http://wiki.qemu-project.org/Features/COLO -ï¼ï¼ï¼ -ï¼ï¼ï¼ -ï¼ï¼ï¼ Thanks -ï¼ï¼ï¼ -ï¼ï¼ï¼ Zhang Chen -ï¼ï¼ï¼ -ï¼ï¼ï¼ -ï¼ï¼ï¼ On 03/21/2017 03:27 PM, address@hidden wrote: -ï¼ï¼ï¼ ï¼ -ï¼ï¼ï¼ ï¼ hi. -ï¼ï¼ï¼ ï¼ -ï¼ï¼ï¼ ï¼ I test the git qemu master have the same problem. 
-ï¼ï¼ï¼ ï¼ -ï¼ï¼ï¼ ï¼ (gdb) bt -ï¼ï¼ï¼ ï¼ -ï¼ï¼ï¼ ï¼ #0 qio_channel_socket_readv (ioc=0x7f65911b4e50, iov=0x7f64ef3fd880, -ï¼ï¼ï¼ ï¼ niov=1, fds=0x0, nfds=0x0, errp=0x0) at io/channel-socket.c:461 -ï¼ï¼ï¼ ï¼ -ï¼ï¼ï¼ ï¼ #1 0x00007f658e4aa0c2 in qio_channel_read -ï¼ï¼ï¼ ï¼ (address@hidden, address@hidden "", -ï¼ï¼ï¼ ï¼ address@hidden, address@hidden) at io/channel.c:114 -ï¼ï¼ï¼ ï¼ -ï¼ï¼ï¼ ï¼ #2 0x00007f658e3ea990 in channel_get_buffer (opaque=ï¼optimized outï¼, -ï¼ï¼ï¼ ï¼ buf=0x7f65907cb838 "", pos=ï¼optimized outï¼, size=32768) at -ï¼ï¼ï¼ ï¼ migration/qemu-file-channel.c:78 -ï¼ï¼ï¼ ï¼ -ï¼ï¼ï¼ ï¼ #3 0x00007f658e3e97fc in qemu_fill_buffer (f=0x7f65907cb800) at -ï¼ï¼ï¼ ï¼ migration/qemu-file.c:295 -ï¼ï¼ï¼ ï¼ -ï¼ï¼ï¼ ï¼ #4 0x00007f658e3ea2e1 in qemu_peek_byte (address@hidden, -ï¼ï¼ï¼ ï¼ address@hidden) at migration/qemu-file.c:555 -ï¼ï¼ï¼ ï¼ -ï¼ï¼ï¼ ï¼ #5 0x00007f658e3ea34b in qemu_get_byte (address@hidden) at -ï¼ï¼ï¼ ï¼ migration/qemu-file.c:568 -ï¼ï¼ï¼ ï¼ -ï¼ï¼ï¼ ï¼ #6 0x00007f658e3ea552 in qemu_get_be32 (address@hidden) at -ï¼ï¼ï¼ ï¼ migration/qemu-file.c:648 -ï¼ï¼ï¼ ï¼ -ï¼ï¼ï¼ ï¼ #7 0x00007f658e3e66e5 in colo_receive_message (f=0x7f65907cb800, -ï¼ï¼ï¼ ï¼ address@hidden) at migration/colo.c:244 -ï¼ï¼ï¼ ï¼ -ï¼ï¼ï¼ ï¼ #8 0x00007f658e3e681e in colo_receive_check_message (f=ï¼optimized -ï¼ï¼ï¼ ï¼ outï¼, address@hidden, -ï¼ï¼ï¼ ï¼ address@hidden) -ï¼ï¼ï¼ ï¼ -ï¼ï¼ï¼ ï¼ at migration/colo.c:264 -ï¼ï¼ï¼ ï¼ -ï¼ï¼ï¼ ï¼ #9 0x00007f658e3e740e in colo_process_incoming_thread -ï¼ï¼ï¼ ï¼ (opaque=0x7f658eb30360 ï¼mis_current.31286ï¼) at migration/colo.c:577 -ï¼ï¼ï¼ ï¼ -ï¼ï¼ï¼ ï¼ #10 0x00007f658be09df3 in start_thread () from /lib64/libpthread.so.0 -ï¼ï¼ï¼ ï¼ -ï¼ï¼ï¼ ï¼ #11 0x00007f65881983ed in clone () from /lib64/libc.so.6 -ï¼ï¼ï¼ ï¼ -ï¼ï¼ï¼ ï¼ (gdb) p ioc-ï¼name -ï¼ï¼ï¼ ï¼ -ï¼ï¼ï¼ ï¼ $2 = 0x7f658ff7d5c0 "migration-socket-incoming" -ï¼ï¼ï¼ ï¼ -ï¼ï¼ï¼ ï¼ (gdb) p ioc-ï¼features Do not support QIO_CHANNEL_FEATURE_SHUTDOWN -ï¼ï¼ï¼ ï¼ -ï¼ï¼ï¼ ï¼ $3 = 0 -ï¼ï¼ï¼ ï¼ -ï¼ï¼ï¼ ï¼ -ï¼ï¼ï¼ ï¼ (gdb) bt -ï¼ï¼ï¼ ï¼ -ï¼ï¼ï¼ ï¼ #0 socket_accept_incoming_migration (ioc=0x7fdcceeafa90, -ï¼ï¼ï¼ ï¼ condition=G_IO_IN, opaque=0x7fdcceeafa90) at migration/socket.c:137 -ï¼ï¼ï¼ ï¼ -ï¼ï¼ï¼ ï¼ #1 0x00007fdcc6966350 in g_main_dispatch (context=ï¼optimized outï¼) at -ï¼ï¼ï¼ ï¼ gmain.c:3054 -ï¼ï¼ï¼ ï¼ -ï¼ï¼ï¼ ï¼ #2 g_main_context_dispatch (context=ï¼optimized outï¼, -ï¼ï¼ï¼ ï¼ address@hidden) at gmain.c:3630 -ï¼ï¼ï¼ ï¼ -ï¼ï¼ï¼ ï¼ #3 0x00007fdccb8a6dcc in glib_pollfds_poll () at util/main-loop.c:213 -ï¼ï¼ï¼ ï¼ -ï¼ï¼ï¼ ï¼ #4 os_host_main_loop_wait (timeout=ï¼optimized outï¼) at -ï¼ï¼ï¼ ï¼ util/main-loop.c:258 -ï¼ï¼ï¼ ï¼ -ï¼ï¼ï¼ ï¼ #5 main_loop_wait (address@hidden) at -ï¼ï¼ï¼ ï¼ util/main-loop.c:506 -ï¼ï¼ï¼ ï¼ -ï¼ï¼ï¼ ï¼ #6 0x00007fdccb526187 in main_loop () at vl.c:1898 -ï¼ï¼ï¼ ï¼ -ï¼ï¼ï¼ ï¼ #7 main (argc=ï¼optimized outï¼, argv=ï¼optimized outï¼, envp=ï¼optimized -ï¼ï¼ï¼ ï¼ outï¼) at vl.c:4709 -ï¼ï¼ï¼ ï¼ -ï¼ï¼ï¼ ï¼ (gdb) p ioc-ï¼features -ï¼ï¼ï¼ ï¼ -ï¼ï¼ï¼ ï¼ $1 = 6 -ï¼ï¼ï¼ ï¼ -ï¼ï¼ï¼ ï¼ (gdb) p ioc-ï¼name -ï¼ï¼ï¼ ï¼ -ï¼ï¼ï¼ ï¼ $2 = 0x7fdcce1b1ab0 "migration-socket-listener" -ï¼ï¼ï¼ ï¼ -ï¼ï¼ï¼ ï¼ -ï¼ï¼ï¼ ï¼ May be socket_accept_incoming_migration should -ï¼ï¼ï¼ ï¼ call qio_channel_set_feature(ioc, QIO_CHANNEL_FEATURE_SHUTDOWN)?? -ï¼ï¼ï¼ ï¼ -ï¼ï¼ï¼ ï¼ -ï¼ï¼ï¼ ï¼ thank you. 
-ï¼ï¼ï¼ ï¼ -ï¼ï¼ï¼ ï¼ -ï¼ï¼ï¼ ï¼ -ï¼ï¼ï¼ ï¼ -ï¼ï¼ï¼ ï¼ -ï¼ï¼ï¼ ï¼ åå§é®ä»¶ -ï¼ï¼ï¼ ï¼ address@hidden -ï¼ï¼ï¼ ï¼ address@hidden -ï¼ï¼ï¼ ï¼ address@hidden@huawei.comï¼ -ï¼ï¼ï¼ ï¼ *æ¥ æ ï¼*2017å¹´03æ16æ¥ 14:46 -ï¼ï¼ï¼ ï¼ *主 é¢ ï¼**Re: [Qemu-devel] COLO failover hang* -ï¼ï¼ï¼ ï¼ -ï¼ï¼ï¼ ï¼ -ï¼ï¼ï¼ ï¼ -ï¼ï¼ï¼ ï¼ -ï¼ï¼ï¼ ï¼ On 03/15/2017 05:06 PM, wangguang wrote: -ï¼ï¼ï¼ ï¼ ï¼ am testing QEMU COLO feature described here [QEMU -ï¼ï¼ï¼ ï¼ ï¼ Wiki]( -http://wiki.qemu-project.org/Features/COLO -). -ï¼ï¼ï¼ ï¼ ï¼ -ï¼ï¼ï¼ ï¼ ï¼ When the Primary Node panic,the Secondary Node qemu hang. -ï¼ï¼ï¼ ï¼ ï¼ hang at recvmsg in qio_channel_socket_readv. -ï¼ï¼ï¼ ï¼ ï¼ And I run { 'execute': 'nbd-server-stop' } and { "execute": -ï¼ï¼ï¼ ï¼ ï¼ "x-colo-lost-heartbeat" } in Secondary VM's -ï¼ï¼ï¼ ï¼ ï¼ monitor,the Secondary Node qemu still hang at recvmsg . -ï¼ï¼ï¼ ï¼ ï¼ -ï¼ï¼ï¼ ï¼ ï¼ I found that the colo in qemu is not complete yet. -ï¼ï¼ï¼ ï¼ ï¼ Do the colo have any plan for development? -ï¼ï¼ï¼ ï¼ -ï¼ï¼ï¼ ï¼ Yes, We are developing. You can see some of patch we pushing. -ï¼ï¼ï¼ ï¼ -ï¼ï¼ï¼ ï¼ ï¼ Has anyone ever run it successfully? Any help is appreciated! -ï¼ï¼ï¼ ï¼ -ï¼ï¼ï¼ ï¼ In our internal version can run it successfully, -ï¼ï¼ï¼ ï¼ The failover detail you can ask Zhanghailiang for help. -ï¼ï¼ï¼ ï¼ Next time if you have some question about COLO, -ï¼ï¼ï¼ ï¼ please cc me and zhanghailiang address@hidden -ï¼ï¼ï¼ ï¼ -ï¼ï¼ï¼ ï¼ -ï¼ï¼ï¼ ï¼ Thanks -ï¼ï¼ï¼ ï¼ Zhang Chen -ï¼ï¼ï¼ ï¼ -ï¼ï¼ï¼ ï¼ -ï¼ï¼ï¼ ï¼ ï¼ -ï¼ï¼ï¼ ï¼ ï¼ -ï¼ï¼ï¼ ï¼ ï¼ -ï¼ï¼ï¼ ï¼ ï¼ centos7.2+qemu2.7.50 -ï¼ï¼ï¼ ï¼ ï¼ (gdb) bt -ï¼ï¼ï¼ ï¼ ï¼ #0 0x00007f3e00cc86ad in recvmsg () from /lib64/libpthread.so.0 -ï¼ï¼ï¼ ï¼ ï¼ #1 0x00007f3e0332b738 in qio_channel_socket_readv (ioc=ï¼optimized outï¼, -ï¼ï¼ï¼ ï¼ ï¼ iov=ï¼optimized outï¼, niov=ï¼optimized outï¼, fds=0x0, nfds=0x0, errp=0x0) -at -ï¼ï¼ï¼ ï¼ ï¼ io/channel-socket.c:497 -ï¼ï¼ï¼ ï¼ ï¼ #2 0x00007f3e03329472 in qio_channel_read (address@hidden, -ï¼ï¼ï¼ ï¼ ï¼ address@hidden "", address@hidden, -ï¼ï¼ï¼ ï¼ ï¼ address@hidden) at io/channel.c:97 -ï¼ï¼ï¼ ï¼ ï¼ #3 0x00007f3e032750e0 in channel_get_buffer (opaque=ï¼optimized outï¼, -ï¼ï¼ï¼ ï¼ ï¼ buf=0x7f3e05910f38 "", pos=ï¼optimized outï¼, size=32768) at -ï¼ï¼ï¼ ï¼ ï¼ migration/qemu-file-channel.c:78 -ï¼ï¼ï¼ ï¼ ï¼ #4 0x00007f3e0327412c in qemu_fill_buffer (f=0x7f3e05910f00) at -ï¼ï¼ï¼ ï¼ ï¼ migration/qemu-file.c:257 -ï¼ï¼ï¼ ï¼ ï¼ #5 0x00007f3e03274a41 in qemu_peek_byte (address@hidden, -ï¼ï¼ï¼ ï¼ ï¼ address@hidden) at migration/qemu-file.c:510 -ï¼ï¼ï¼ ï¼ ï¼ #6 0x00007f3e03274aab in qemu_get_byte (address@hidden) at -ï¼ï¼ï¼ ï¼ ï¼ migration/qemu-file.c:523 -ï¼ï¼ï¼ ï¼ ï¼ #7 0x00007f3e03274cb2 in qemu_get_be32 (address@hidden) at -ï¼ï¼ï¼ ï¼ ï¼ migration/qemu-file.c:603 -ï¼ï¼ï¼ ï¼ ï¼ #8 0x00007f3e03271735 in colo_receive_message (f=0x7f3e05910f00, -ï¼ï¼ï¼ ï¼ ï¼ address@hidden) at migration/colo.c:215 -ï¼ï¼ï¼ ï¼ ï¼ #9 0x00007f3e0327250d in colo_wait_handle_message (errp=0x7f3d62bfaa48, -ï¼ï¼ï¼ ï¼ ï¼ checkpoint_request=ï¼synthetic pointerï¼, f=ï¼optimized outï¼) at -ï¼ï¼ï¼ ï¼ ï¼ migration/colo.c:546 -ï¼ï¼ï¼ ï¼ ï¼ #10 colo_process_incoming_thread (opaque=0x7f3e067245e0) at -ï¼ï¼ï¼ ï¼ ï¼ migration/colo.c:649 -ï¼ï¼ï¼ ï¼ ï¼ #11 0x00007f3e00cc1df3 in start_thread () from /lib64/libpthread.so.0 -ï¼ï¼ï¼ ï¼ ï¼ #12 0x00007f3dfc9c03ed in clone () from /lib64/libc..so.6 -ï¼ï¼ï¼ ï¼ ï¼ -ï¼ï¼ï¼ ï¼ ï¼ -ï¼ï¼ï¼ ï¼ ï¼ -ï¼ï¼ï¼ ï¼ ï¼ -ï¼ï¼ï¼ ï¼ ï¼ -ï¼ï¼ï¼ ï¼ ï¼ -- -ï¼ï¼ï¼ ï¼ ï¼ View this message in context: -http://qemu.11.n7.nabble.com/COLO-failover-hang-tp473250.html -ï¼ï¼ï¼ ï¼ ï¼ Sent from 
the Developer mailing list archive at Nabble.com. -ï¼ï¼ï¼ ï¼ ï¼ -ï¼ï¼ï¼ ï¼ ï¼ -ï¼ï¼ï¼ ï¼ ï¼ -ï¼ï¼ï¼ ï¼ ï¼ -ï¼ï¼ï¼ ï¼ -ï¼ï¼ï¼ ï¼ -- -ï¼ï¼ï¼ ï¼ Thanks -ï¼ï¼ï¼ ï¼ Zhang Chen -ï¼ï¼ï¼ ï¼ -ï¼ï¼ï¼ ï¼ -ï¼ï¼ï¼ ï¼ -ï¼ï¼ï¼ ï¼ -ï¼ï¼ï¼ ï¼ -ï¼ï¼ï¼ -ï¼ï¼ -ï¼ -- -ï¼ Dr. David Alan Gilbert / address@hidden / Manchester, UK -ï¼ -ï¼ . -ï¼ - -Hi, - -On 2017/3/22 9:42, address@hidden wrote: -diff --git a/migration/socket.c b/migration/socket.c - - -index 13966f1..d65a0ea 100644 - - ---- a/migration/socket.c - - -+++ b/migration/socket.c - - -@@ -147,8 +147,9 @@ static gboolean socket_accept_incoming_migration(QIOChannel -*ioc, - - - } - - - - - - trace_migration_socket_incoming_accepted() - - - - - - qio_channel_set_name(QIO_CHANNEL(sioc), "migration-socket-incoming") - - -+ qio_channel_set_feature(QIO_CHANNEL(sioc), QIO_CHANNEL_FEATURE_SHUTDOWN) - - - migration_channel_process_incoming(migrate_get_current(), - - - QIO_CHANNEL(sioc)) - - - object_unref(OBJECT(sioc)) - - - - -Is this patch ok? -Yes, i think this works, but a better way maybe to call -qio_channel_set_feature() -in qio_channel_socket_accept(), we didn't set the SHUTDOWN feature for the -socket accept fd, -Or fix it by this: - -diff --git a/io/channel-socket.c b/io/channel-socket.c -index f546c68..ce6894c 100644 ---- a/io/channel-socket.c -+++ b/io/channel-socket.c -@@ -330,9 +330,8 @@ qio_channel_socket_accept(QIOChannelSocket *ioc, - Error **errp) - { - QIOChannelSocket *cioc; -- -- cioc = QIO_CHANNEL_SOCKET(object_new(TYPE_QIO_CHANNEL_SOCKET)); -- cioc->fd = -1; -+ -+ cioc = qio_channel_socket_new(); - cioc->remoteAddrLen = sizeof(ioc->remoteAddr); - cioc->localAddrLen = sizeof(ioc->localAddr); - - -Thanks, -Hailiang -I have test it . The test could not hang any more. - - - - - - - - - - - - -åå§é®ä»¶ - - - -åä»¶äººï¼ address@hidden -æ¶ä»¶äººï¼ address@hidden address@hidden -æéäººï¼ address@hidden address@hidden address@hidden -æ¥ æ ï¼2017å¹´03æ22æ¥ 09:11 -主 é¢ ï¼Re: [Qemu-devel] çå¤: Re: çå¤: Re: [BUG]COLO failover hang - - - - - -On 2017/3/21 19:56, Dr. David Alan Gilbert wrote: -ï¼ * Hailiang Zhang (address@hidden) wrote: -ï¼ï¼ Hi, -ï¼ï¼ -ï¼ï¼ Thanks for reporting this, and i confirmed it in my test, and it is a bug. -ï¼ï¼ -ï¼ï¼ Though we tried to call qemu_file_shutdown() to shutdown the related fd, in -ï¼ï¼ case COLO thread/incoming thread is stuck in read/write() while do failover, -ï¼ï¼ but it didn't take effect, because all the fd used by COLO (also migration) -ï¼ï¼ has been wrapped by qio channel, and it will not call the shutdown API if -ï¼ï¼ we didn't qio_channel_set_feature(QIO_CHANNEL(sioc), -QIO_CHANNEL_FEATURE_SHUTDOWN). -ï¼ï¼ -ï¼ï¼ Cc: Dr. David Alan Gilbert address@hidden -ï¼ï¼ -ï¼ï¼ I doubted migration cancel has the same problem, it may be stuck in write() -ï¼ï¼ if we tried to cancel migration. -ï¼ï¼ -ï¼ï¼ void fd_start_outgoing_migration(MigrationState *s, const char *fdname, -Error **errp) -ï¼ï¼ { -ï¼ï¼ qio_channel_set_name(QIO_CHANNEL(ioc), "migration-fd-outgoing") -ï¼ï¼ migration_channel_connect(s, ioc, NULL) -ï¼ï¼ ... ... -ï¼ï¼ We didn't call qio_channel_set_feature(QIO_CHANNEL(sioc), -QIO_CHANNEL_FEATURE_SHUTDOWN) above, -ï¼ï¼ and the -ï¼ï¼ migrate_fd_cancel() -ï¼ï¼ { -ï¼ï¼ ... ... -ï¼ï¼ if (s-ï¼state == MIGRATION_STATUS_CANCELLING && f) { -ï¼ï¼ qemu_file_shutdown(f) --ï¼ This will not take effect. No ? -ï¼ï¼ } -ï¼ï¼ } -ï¼ -ï¼ (cc'd in Daniel Berrange). -ï¼ I see that we call qio_channel_set_feature(ioc, QIO_CHANNEL_FEATURE_SHUTDOWN) -at the -ï¼ top of qio_channel_socket_new so I think that's safe isn't it? 
-ï¼ - -Hmm, you are right, this problem is only exist for the migration incoming fd, -thanks. - -ï¼ Dave -ï¼ -ï¼ï¼ Thanks, -ï¼ï¼ Hailiang -ï¼ï¼ -ï¼ï¼ On 2017/3/21 16:10, address@hidden wrote: -ï¼ï¼ï¼ Thank youã -ï¼ï¼ï¼ -ï¼ï¼ï¼ I have test areadyã -ï¼ï¼ï¼ -ï¼ï¼ï¼ When the Primary Node panic,the Secondary Node qemu hang at the same placeã -ï¼ï¼ï¼ -ï¼ï¼ï¼ Incorrding -http://wiki.qemu-project.org/Features/COLO -ï¼kill Primary Node -qemu will not produce the problem,but Primary Node panic canã -ï¼ï¼ï¼ -ï¼ï¼ï¼ I think due to the feature of channel does not support -QIO_CHANNEL_FEATURE_SHUTDOWN. -ï¼ï¼ï¼ -ï¼ï¼ï¼ -ï¼ï¼ï¼ when failover,channel_shutdown could not shut down the channel. -ï¼ï¼ï¼ -ï¼ï¼ï¼ -ï¼ï¼ï¼ so the colo_process_incoming_thread will hang at recvmsg. -ï¼ï¼ï¼ -ï¼ï¼ï¼ -ï¼ï¼ï¼ I test a patch: -ï¼ï¼ï¼ -ï¼ï¼ï¼ -ï¼ï¼ï¼ diff --git a/migration/socket.c b/migration/socket.c -ï¼ï¼ï¼ -ï¼ï¼ï¼ -ï¼ï¼ï¼ index 13966f1..d65a0ea 100644 -ï¼ï¼ï¼ -ï¼ï¼ï¼ -ï¼ï¼ï¼ --- a/migration/socket.c -ï¼ï¼ï¼ -ï¼ï¼ï¼ -ï¼ï¼ï¼ +++ b/migration/socket.c -ï¼ï¼ï¼ -ï¼ï¼ï¼ -ï¼ï¼ï¼ @@ -147,8 +147,9 @@ static gboolean -socket_accept_incoming_migration(QIOChannel *ioc, -ï¼ï¼ï¼ -ï¼ï¼ï¼ -ï¼ï¼ï¼ } -ï¼ï¼ï¼ -ï¼ï¼ï¼ -ï¼ï¼ï¼ -ï¼ï¼ï¼ -ï¼ï¼ï¼ -ï¼ï¼ï¼ trace_migration_socket_incoming_accepted() -ï¼ï¼ï¼ -ï¼ï¼ï¼ -ï¼ï¼ï¼ -ï¼ï¼ï¼ -ï¼ï¼ï¼ -ï¼ï¼ï¼ qio_channel_set_name(QIO_CHANNEL(sioc), "migration-socket-incoming") -ï¼ï¼ï¼ -ï¼ï¼ï¼ -ï¼ï¼ï¼ + qio_channel_set_feature(QIO_CHANNEL(sioc), -QIO_CHANNEL_FEATURE_SHUTDOWN) -ï¼ï¼ï¼ -ï¼ï¼ï¼ -ï¼ï¼ï¼ migration_channel_process_incoming(migrate_get_current(), -ï¼ï¼ï¼ -ï¼ï¼ï¼ -ï¼ï¼ï¼ QIO_CHANNEL(sioc)) -ï¼ï¼ï¼ -ï¼ï¼ï¼ -ï¼ï¼ï¼ object_unref(OBJECT(sioc)) -ï¼ï¼ï¼ -ï¼ï¼ï¼ -ï¼ï¼ï¼ -ï¼ï¼ï¼ -ï¼ï¼ï¼ My test will not hang any more. -ï¼ï¼ï¼ -ï¼ï¼ï¼ -ï¼ï¼ï¼ -ï¼ï¼ï¼ -ï¼ï¼ï¼ -ï¼ï¼ï¼ -ï¼ï¼ï¼ -ï¼ï¼ï¼ -ï¼ï¼ï¼ -ï¼ï¼ï¼ -ï¼ï¼ï¼ -ï¼ï¼ï¼ -ï¼ï¼ï¼ -ï¼ï¼ï¼ -ï¼ï¼ï¼ -ï¼ï¼ï¼ -ï¼ï¼ï¼ -ï¼ï¼ï¼ åå§é®ä»¶ -ï¼ï¼ï¼ -ï¼ï¼ï¼ -ï¼ï¼ï¼ -ï¼ï¼ï¼ åä»¶äººï¼ address@hidden -ï¼ï¼ï¼ æ¶ä»¶äººï¼ç广10165992 address@hidden -ï¼ï¼ï¼ æéäººï¼ address@hidden address@hidden -ï¼ï¼ï¼ æ¥ æ ï¼2017å¹´03æ21æ¥ 15:58 -ï¼ï¼ï¼ 主 é¢ ï¼Re: [Qemu-devel] çå¤: Re: [BUG]COLO failover hang -ï¼ï¼ï¼ -ï¼ï¼ï¼ -ï¼ï¼ï¼ -ï¼ï¼ï¼ -ï¼ï¼ï¼ -ï¼ï¼ï¼ Hi,Wang. -ï¼ï¼ï¼ -ï¼ï¼ï¼ You can test this branch: -ï¼ï¼ï¼ -ï¼ï¼ï¼ -https://github.com/coloft/qemu/tree/colo-v5.1-developing-COLO-frame-v21-with-shared-disk -ï¼ï¼ï¼ -ï¼ï¼ï¼ and please follow wiki ensure your own configuration correctly. -ï¼ï¼ï¼ -ï¼ï¼ï¼ -http://wiki.qemu-project.org/Features/COLO -ï¼ï¼ï¼ -ï¼ï¼ï¼ -ï¼ï¼ï¼ Thanks -ï¼ï¼ï¼ -ï¼ï¼ï¼ Zhang Chen -ï¼ï¼ï¼ -ï¼ï¼ï¼ -ï¼ï¼ï¼ On 03/21/2017 03:27 PM, address@hidden wrote: -ï¼ï¼ï¼ ï¼ -ï¼ï¼ï¼ ï¼ hi. -ï¼ï¼ï¼ ï¼ -ï¼ï¼ï¼ ï¼ I test the git qemu master have the same problem. 
-ï¼ï¼ï¼ ï¼ -ï¼ï¼ï¼ ï¼ (gdb) bt -ï¼ï¼ï¼ ï¼ -ï¼ï¼ï¼ ï¼ #0 qio_channel_socket_readv (ioc=0x7f65911b4e50, iov=0x7f64ef3fd880, -ï¼ï¼ï¼ ï¼ niov=1, fds=0x0, nfds=0x0, errp=0x0) at io/channel-socket.c:461 -ï¼ï¼ï¼ ï¼ -ï¼ï¼ï¼ ï¼ #1 0x00007f658e4aa0c2 in qio_channel_read -ï¼ï¼ï¼ ï¼ (address@hidden, address@hidden "", -ï¼ï¼ï¼ ï¼ address@hidden, address@hidden) at io/channel.c:114 -ï¼ï¼ï¼ ï¼ -ï¼ï¼ï¼ ï¼ #2 0x00007f658e3ea990 in channel_get_buffer (opaque=ï¼optimized outï¼, -ï¼ï¼ï¼ ï¼ buf=0x7f65907cb838 "", pos=ï¼optimized outï¼, size=32768) at -ï¼ï¼ï¼ ï¼ migration/qemu-file-channel.c:78 -ï¼ï¼ï¼ ï¼ -ï¼ï¼ï¼ ï¼ #3 0x00007f658e3e97fc in qemu_fill_buffer (f=0x7f65907cb800) at -ï¼ï¼ï¼ ï¼ migration/qemu-file.c:295 -ï¼ï¼ï¼ ï¼ -ï¼ï¼ï¼ ï¼ #4 0x00007f658e3ea2e1 in qemu_peek_byte (address@hidden, -ï¼ï¼ï¼ ï¼ address@hidden) at migration/qemu-file.c:555 -ï¼ï¼ï¼ ï¼ -ï¼ï¼ï¼ ï¼ #5 0x00007f658e3ea34b in qemu_get_byte (address@hidden) at -ï¼ï¼ï¼ ï¼ migration/qemu-file.c:568 -ï¼ï¼ï¼ ï¼ -ï¼ï¼ï¼ ï¼ #6 0x00007f658e3ea552 in qemu_get_be32 (address@hidden) at -ï¼ï¼ï¼ ï¼ migration/qemu-file.c:648 -ï¼ï¼ï¼ ï¼ -ï¼ï¼ï¼ ï¼ #7 0x00007f658e3e66e5 in colo_receive_message (f=0x7f65907cb800, -ï¼ï¼ï¼ ï¼ address@hidden) at migration/colo.c:244 -ï¼ï¼ï¼ ï¼ -ï¼ï¼ï¼ ï¼ #8 0x00007f658e3e681e in colo_receive_check_message (f=ï¼optimized -ï¼ï¼ï¼ ï¼ outï¼, address@hidden, -ï¼ï¼ï¼ ï¼ address@hidden) -ï¼ï¼ï¼ ï¼ -ï¼ï¼ï¼ ï¼ at migration/colo.c:264 -ï¼ï¼ï¼ ï¼ -ï¼ï¼ï¼ ï¼ #9 0x00007f658e3e740e in colo_process_incoming_thread -ï¼ï¼ï¼ ï¼ (opaque=0x7f658eb30360 ï¼mis_current.31286ï¼) at migration/colo.c:577 -ï¼ï¼ï¼ ï¼ -ï¼ï¼ï¼ ï¼ #10 0x00007f658be09df3 in start_thread () from /lib64/libpthread.so.0 -ï¼ï¼ï¼ ï¼ -ï¼ï¼ï¼ ï¼ #11 0x00007f65881983ed in clone () from /lib64/libc.so.6 -ï¼ï¼ï¼ ï¼ -ï¼ï¼ï¼ ï¼ (gdb) p ioc-ï¼name -ï¼ï¼ï¼ ï¼ -ï¼ï¼ï¼ ï¼ $2 = 0x7f658ff7d5c0 "migration-socket-incoming" -ï¼ï¼ï¼ ï¼ -ï¼ï¼ï¼ ï¼ (gdb) p ioc-ï¼features Do not support QIO_CHANNEL_FEATURE_SHUTDOWN -ï¼ï¼ï¼ ï¼ -ï¼ï¼ï¼ ï¼ $3 = 0 -ï¼ï¼ï¼ ï¼ -ï¼ï¼ï¼ ï¼ -ï¼ï¼ï¼ ï¼ (gdb) bt -ï¼ï¼ï¼ ï¼ -ï¼ï¼ï¼ ï¼ #0 socket_accept_incoming_migration (ioc=0x7fdcceeafa90, -ï¼ï¼ï¼ ï¼ condition=G_IO_IN, opaque=0x7fdcceeafa90) at migration/socket.c:137 -ï¼ï¼ï¼ ï¼ -ï¼ï¼ï¼ ï¼ #1 0x00007fdcc6966350 in g_main_dispatch (context=ï¼optimized outï¼) at -ï¼ï¼ï¼ ï¼ gmain.c:3054 -ï¼ï¼ï¼ ï¼ -ï¼ï¼ï¼ ï¼ #2 g_main_context_dispatch (context=ï¼optimized outï¼, -ï¼ï¼ï¼ ï¼ address@hidden) at gmain.c:3630 -ï¼ï¼ï¼ ï¼ -ï¼ï¼ï¼ ï¼ #3 0x00007fdccb8a6dcc in glib_pollfds_poll () at util/main-loop.c:213 -ï¼ï¼ï¼ ï¼ -ï¼ï¼ï¼ ï¼ #4 os_host_main_loop_wait (timeout=ï¼optimized outï¼) at -ï¼ï¼ï¼ ï¼ util/main-loop.c:258 -ï¼ï¼ï¼ ï¼ -ï¼ï¼ï¼ ï¼ #5 main_loop_wait (address@hidden) at -ï¼ï¼ï¼ ï¼ util/main-loop.c:506 -ï¼ï¼ï¼ ï¼ -ï¼ï¼ï¼ ï¼ #6 0x00007fdccb526187 in main_loop () at vl.c:1898 -ï¼ï¼ï¼ ï¼ -ï¼ï¼ï¼ ï¼ #7 main (argc=ï¼optimized outï¼, argv=ï¼optimized outï¼, envp=ï¼optimized -ï¼ï¼ï¼ ï¼ outï¼) at vl.c:4709 -ï¼ï¼ï¼ ï¼ -ï¼ï¼ï¼ ï¼ (gdb) p ioc-ï¼features -ï¼ï¼ï¼ ï¼ -ï¼ï¼ï¼ ï¼ $1 = 6 -ï¼ï¼ï¼ ï¼ -ï¼ï¼ï¼ ï¼ (gdb) p ioc-ï¼name -ï¼ï¼ï¼ ï¼ -ï¼ï¼ï¼ ï¼ $2 = 0x7fdcce1b1ab0 "migration-socket-listener" -ï¼ï¼ï¼ ï¼ -ï¼ï¼ï¼ ï¼ -ï¼ï¼ï¼ ï¼ May be socket_accept_incoming_migration should -ï¼ï¼ï¼ ï¼ call qio_channel_set_feature(ioc, QIO_CHANNEL_FEATURE_SHUTDOWN)?? -ï¼ï¼ï¼ ï¼ -ï¼ï¼ï¼ ï¼ -ï¼ï¼ï¼ ï¼ thank you. 
-ï¼ï¼ï¼ ï¼ -ï¼ï¼ï¼ ï¼ -ï¼ï¼ï¼ ï¼ -ï¼ï¼ï¼ ï¼ -ï¼ï¼ï¼ ï¼ -ï¼ï¼ï¼ ï¼ åå§é®ä»¶ -ï¼ï¼ï¼ ï¼ address@hidden -ï¼ï¼ï¼ ï¼ address@hidden -ï¼ï¼ï¼ ï¼ address@hidden@huawei.comï¼ -ï¼ï¼ï¼ ï¼ *æ¥ æ ï¼*2017å¹´03æ16æ¥ 14:46 -ï¼ï¼ï¼ ï¼ *主 é¢ ï¼**Re: [Qemu-devel] COLO failover hang* -ï¼ï¼ï¼ ï¼ -ï¼ï¼ï¼ ï¼ -ï¼ï¼ï¼ ï¼ -ï¼ï¼ï¼ ï¼ -ï¼ï¼ï¼ ï¼ On 03/15/2017 05:06 PM, wangguang wrote: -ï¼ï¼ï¼ ï¼ ï¼ am testing QEMU COLO feature described here [QEMU -ï¼ï¼ï¼ ï¼ ï¼ Wiki]( -http://wiki.qemu-project.org/Features/COLO -). -ï¼ï¼ï¼ ï¼ ï¼ -ï¼ï¼ï¼ ï¼ ï¼ When the Primary Node panic,the Secondary Node qemu hang. -ï¼ï¼ï¼ ï¼ ï¼ hang at recvmsg in qio_channel_socket_readv. -ï¼ï¼ï¼ ï¼ ï¼ And I run { 'execute': 'nbd-server-stop' } and { "execute": -ï¼ï¼ï¼ ï¼ ï¼ "x-colo-lost-heartbeat" } in Secondary VM's -ï¼ï¼ï¼ ï¼ ï¼ monitor,the Secondary Node qemu still hang at recvmsg . -ï¼ï¼ï¼ ï¼ ï¼ -ï¼ï¼ï¼ ï¼ ï¼ I found that the colo in qemu is not complete yet. -ï¼ï¼ï¼ ï¼ ï¼ Do the colo have any plan for development? -ï¼ï¼ï¼ ï¼ -ï¼ï¼ï¼ ï¼ Yes, We are developing. You can see some of patch we pushing. -ï¼ï¼ï¼ ï¼ -ï¼ï¼ï¼ ï¼ ï¼ Has anyone ever run it successfully? Any help is appreciated! -ï¼ï¼ï¼ ï¼ -ï¼ï¼ï¼ ï¼ In our internal version can run it successfully, -ï¼ï¼ï¼ ï¼ The failover detail you can ask Zhanghailiang for help. -ï¼ï¼ï¼ ï¼ Next time if you have some question about COLO, -ï¼ï¼ï¼ ï¼ please cc me and zhanghailiang address@hidden -ï¼ï¼ï¼ ï¼ -ï¼ï¼ï¼ ï¼ -ï¼ï¼ï¼ ï¼ Thanks -ï¼ï¼ï¼ ï¼ Zhang Chen -ï¼ï¼ï¼ ï¼ -ï¼ï¼ï¼ ï¼ -ï¼ï¼ï¼ ï¼ ï¼ -ï¼ï¼ï¼ ï¼ ï¼ -ï¼ï¼ï¼ ï¼ ï¼ -ï¼ï¼ï¼ ï¼ ï¼ centos7.2+qemu2.7.50 -ï¼ï¼ï¼ ï¼ ï¼ (gdb) bt -ï¼ï¼ï¼ ï¼ ï¼ #0 0x00007f3e00cc86ad in recvmsg () from /lib64/libpthread.so.0 -ï¼ï¼ï¼ ï¼ ï¼ #1 0x00007f3e0332b738 in qio_channel_socket_readv (ioc=ï¼optimized outï¼, -ï¼ï¼ï¼ ï¼ ï¼ iov=ï¼optimized outï¼, niov=ï¼optimized outï¼, fds=0x0, nfds=0x0, errp=0x0) -at -ï¼ï¼ï¼ ï¼ ï¼ io/channel-socket.c:497 -ï¼ï¼ï¼ ï¼ ï¼ #2 0x00007f3e03329472 in qio_channel_read (address@hidden, -ï¼ï¼ï¼ ï¼ ï¼ address@hidden "", address@hidden, -ï¼ï¼ï¼ ï¼ ï¼ address@hidden) at io/channel.c:97 -ï¼ï¼ï¼ ï¼ ï¼ #3 0x00007f3e032750e0 in channel_get_buffer (opaque=ï¼optimized outï¼, -ï¼ï¼ï¼ ï¼ ï¼ buf=0x7f3e05910f38 "", pos=ï¼optimized outï¼, size=32768) at -ï¼ï¼ï¼ ï¼ ï¼ migration/qemu-file-channel.c:78 -ï¼ï¼ï¼ ï¼ ï¼ #4 0x00007f3e0327412c in qemu_fill_buffer (f=0x7f3e05910f00) at -ï¼ï¼ï¼ ï¼ ï¼ migration/qemu-file.c:257 -ï¼ï¼ï¼ ï¼ ï¼ #5 0x00007f3e03274a41 in qemu_peek_byte (address@hidden, -ï¼ï¼ï¼ ï¼ ï¼ address@hidden) at migration/qemu-file.c:510 -ï¼ï¼ï¼ ï¼ ï¼ #6 0x00007f3e03274aab in qemu_get_byte (address@hidden) at -ï¼ï¼ï¼ ï¼ ï¼ migration/qemu-file.c:523 -ï¼ï¼ï¼ ï¼ ï¼ #7 0x00007f3e03274cb2 in qemu_get_be32 (address@hidden) at -ï¼ï¼ï¼ ï¼ ï¼ migration/qemu-file.c:603 -ï¼ï¼ï¼ ï¼ ï¼ #8 0x00007f3e03271735 in colo_receive_message (f=0x7f3e05910f00, -ï¼ï¼ï¼ ï¼ ï¼ address@hidden) at migration/colo.c:215 -ï¼ï¼ï¼ ï¼ ï¼ #9 0x00007f3e0327250d in colo_wait_handle_message (errp=0x7f3d62bfaa48, -ï¼ï¼ï¼ ï¼ ï¼ checkpoint_request=ï¼synthetic pointerï¼, f=ï¼optimized outï¼) at -ï¼ï¼ï¼ ï¼ ï¼ migration/colo.c:546 -ï¼ï¼ï¼ ï¼ ï¼ #10 colo_process_incoming_thread (opaque=0x7f3e067245e0) at -ï¼ï¼ï¼ ï¼ ï¼ migration/colo.c:649 -ï¼ï¼ï¼ ï¼ ï¼ #11 0x00007f3e00cc1df3 in start_thread () from /lib64/libpthread.so.0 -ï¼ï¼ï¼ ï¼ ï¼ #12 0x00007f3dfc9c03ed in clone () from /lib64/libc..so.6 -ï¼ï¼ï¼ ï¼ ï¼ -ï¼ï¼ï¼ ï¼ ï¼ -ï¼ï¼ï¼ ï¼ ï¼ -ï¼ï¼ï¼ ï¼ ï¼ -ï¼ï¼ï¼ ï¼ ï¼ -ï¼ï¼ï¼ ï¼ ï¼ -- -ï¼ï¼ï¼ ï¼ ï¼ View this message in context: -http://qemu.11.n7.nabble.com/COLO-failover-hang-tp473250.html -ï¼ï¼ï¼ ï¼ ï¼ Sent from 
the Developer mailing list archive at Nabble.com. -ï¼ï¼ï¼ ï¼ ï¼ -ï¼ï¼ï¼ ï¼ ï¼ -ï¼ï¼ï¼ ï¼ ï¼ -ï¼ï¼ï¼ ï¼ ï¼ -ï¼ï¼ï¼ ï¼ -ï¼ï¼ï¼ ï¼ -- -ï¼ï¼ï¼ ï¼ Thanks -ï¼ï¼ï¼ ï¼ Zhang Chen -ï¼ï¼ï¼ ï¼ -ï¼ï¼ï¼ ï¼ -ï¼ï¼ï¼ ï¼ -ï¼ï¼ï¼ ï¼ -ï¼ï¼ï¼ ï¼ -ï¼ï¼ï¼ -ï¼ï¼ -ï¼ -- -ï¼ Dr. David Alan Gilbert / address@hidden / Manchester, UK -ï¼ -ï¼ . -ï¼ - diff --git a/results/classifier/015/unknown/70294255 b/results/classifier/015/unknown/70294255 deleted file mode 100644 index f8e359531..000000000 --- a/results/classifier/015/unknown/70294255 +++ /dev/null @@ -1,1088 +0,0 @@ -risc-v: 0.863 -mistranslation: 0.862 -assembly: 0.861 -PID: 0.859 -semantic: 0.858 -socket: 0.858 -device: 0.857 -user-level: 0.857 -graphic: 0.857 -arm: 0.856 -debug: 0.854 -permissions: 0.854 -architecture: 0.851 -performance: 0.850 -kernel: 0.848 -network: 0.846 -operating system: 0.844 -register: 0.842 -vnc: 0.837 -alpha: 0.834 -files: 0.832 -virtual: 0.832 -hypervisor: 0.828 -peripherals: 0.819 -boot: 0.811 -i386: 0.811 -KVM: 0.806 -x86: 0.803 -ppc: 0.800 -TCG: 0.792 -VMM: 0.784 - -[Qemu-devel] 答复: Re: 答复: Re: 答复: Re: 答复: Re: [BUG]COLO failover hang - -hi: - -yes.it is better. - -And should we delete - - - - -#ifdef WIN32 - - QIO_CHANNEL(cioc)-ï¼event = CreateEvent(NULL, FALSE, FALSE, NULL) - -#endif - - - - -in qio_channel_socket_acceptï¼ - -qio_channel_socket_new already have it. - - - - - - - - - - - - -åå§é®ä»¶ - - - -åä»¶äººï¼ address@hidden -æ¶ä»¶äººï¼ç广10165992 -æéäººï¼ address@hidden address@hidden address@hidden address@hidden -æ¥ æ ï¼2017å¹´03æ22æ¥ 15:03 -主 é¢ ï¼Re: [Qemu-devel] çå¤: Re: çå¤: Re: çå¤: Re: [BUG]COLO failover hang - - - - - -Hi, - -On 2017/3/22 9:42, address@hidden wrote: -ï¼ diff --git a/migration/socket.c b/migration/socket.c -ï¼ -ï¼ -ï¼ index 13966f1..d65a0ea 100644 -ï¼ -ï¼ -ï¼ --- a/migration/socket.c -ï¼ -ï¼ -ï¼ +++ b/migration/socket.c -ï¼ -ï¼ -ï¼ @@ -147,8 +147,9 @@ static gboolean -socket_accept_incoming_migration(QIOChannel *ioc, -ï¼ -ï¼ -ï¼ } -ï¼ -ï¼ -ï¼ -ï¼ -ï¼ -ï¼ trace_migration_socket_incoming_accepted() -ï¼ -ï¼ -ï¼ -ï¼ -ï¼ -ï¼ qio_channel_set_name(QIO_CHANNEL(sioc), "migration-socket-incoming") -ï¼ -ï¼ -ï¼ + qio_channel_set_feature(QIO_CHANNEL(sioc), QIO_CHANNEL_FEATURE_SHUTDOWN) -ï¼ -ï¼ -ï¼ migration_channel_process_incoming(migrate_get_current(), -ï¼ -ï¼ -ï¼ QIO_CHANNEL(sioc)) -ï¼ -ï¼ -ï¼ object_unref(OBJECT(sioc)) -ï¼ -ï¼ -ï¼ -ï¼ -ï¼ Is this patch ok? -ï¼ - -Yes, i think this works, but a better way maybe to call -qio_channel_set_feature() -in qio_channel_socket_accept(), we didn't set the SHUTDOWN feature for the -socket accept fd, -Or fix it by this: - -diff --git a/io/channel-socket.c b/io/channel-socket.c -index f546c68..ce6894c 100644 ---- a/io/channel-socket.c -+++ b/io/channel-socket.c -@@ -330,9 +330,8 @@ qio_channel_socket_accept(QIOChannelSocket *ioc, - Error **errp) - { - QIOChannelSocket *cioc -- -- cioc = QIO_CHANNEL_SOCKET(object_new(TYPE_QIO_CHANNEL_SOCKET)) -- cioc-ï¼fd = -1 -+ -+ cioc = qio_channel_socket_new() - cioc-ï¼remoteAddrLen = sizeof(ioc-ï¼remoteAddr) - cioc-ï¼localAddrLen = sizeof(ioc-ï¼localAddr) - - -Thanks, -Hailiang - -ï¼ I have test it . The test could not hang any more. -ï¼ -ï¼ -ï¼ -ï¼ -ï¼ -ï¼ -ï¼ -ï¼ -ï¼ -ï¼ -ï¼ -ï¼ -ï¼ åå§é®ä»¶ -ï¼ -ï¼ -ï¼ -ï¼ åä»¶äººï¼ address@hidden -ï¼ æ¶ä»¶äººï¼ address@hidden address@hidden -ï¼ æéäººï¼ address@hidden address@hidden address@hidden -ï¼ æ¥ æ ï¼2017å¹´03æ22æ¥ 09:11 -ï¼ ä¸» é¢ ï¼Re: [Qemu-devel] çå¤: Re: çå¤: Re: [BUG]COLO failover hang -ï¼ -ï¼ -ï¼ -ï¼ -ï¼ -ï¼ On 2017/3/21 19:56, Dr. 
David Alan Gilbert wrote: -ï¼ ï¼ * Hailiang Zhang (address@hidden) wrote: -ï¼ ï¼ï¼ Hi, -ï¼ ï¼ï¼ -ï¼ ï¼ï¼ Thanks for reporting this, and i confirmed it in my test, and it is a bug. -ï¼ ï¼ï¼ -ï¼ ï¼ï¼ Though we tried to call qemu_file_shutdown() to shutdown the related fd, in -ï¼ ï¼ï¼ case COLO thread/incoming thread is stuck in read/write() while do -failover, -ï¼ ï¼ï¼ but it didn't take effect, because all the fd used by COLO (also migration) -ï¼ ï¼ï¼ has been wrapped by qio channel, and it will not call the shutdown API if -ï¼ ï¼ï¼ we didn't qio_channel_set_feature(QIO_CHANNEL(sioc), -QIO_CHANNEL_FEATURE_SHUTDOWN). -ï¼ ï¼ï¼ -ï¼ ï¼ï¼ Cc: Dr. David Alan Gilbert address@hidden -ï¼ ï¼ï¼ -ï¼ ï¼ï¼ I doubted migration cancel has the same problem, it may be stuck in write() -ï¼ ï¼ï¼ if we tried to cancel migration. -ï¼ ï¼ï¼ -ï¼ ï¼ï¼ void fd_start_outgoing_migration(MigrationState *s, const char *fdname, -Error **errp) -ï¼ ï¼ï¼ { -ï¼ ï¼ï¼ qio_channel_set_name(QIO_CHANNEL(ioc), "migration-fd-outgoing") -ï¼ ï¼ï¼ migration_channel_connect(s, ioc, NULL) -ï¼ ï¼ï¼ ... ... -ï¼ ï¼ï¼ We didn't call qio_channel_set_feature(QIO_CHANNEL(sioc), -QIO_CHANNEL_FEATURE_SHUTDOWN) above, -ï¼ ï¼ï¼ and the -ï¼ ï¼ï¼ migrate_fd_cancel() -ï¼ ï¼ï¼ { -ï¼ ï¼ï¼ ... ... -ï¼ ï¼ï¼ if (s-ï¼state == MIGRATION_STATUS_CANCELLING && f) { -ï¼ ï¼ï¼ qemu_file_shutdown(f) --ï¼ This will not take effect. No ? -ï¼ ï¼ï¼ } -ï¼ ï¼ï¼ } -ï¼ ï¼ -ï¼ ï¼ (cc'd in Daniel Berrange). -ï¼ ï¼ I see that we call qio_channel_set_feature(ioc, -QIO_CHANNEL_FEATURE_SHUTDOWN) at the -ï¼ ï¼ top of qio_channel_socket_new so I think that's safe isn't it? -ï¼ ï¼ -ï¼ -ï¼ Hmm, you are right, this problem is only exist for the migration incoming fd, -thanks. -ï¼ -ï¼ ï¼ Dave -ï¼ ï¼ -ï¼ ï¼ï¼ Thanks, -ï¼ ï¼ï¼ Hailiang -ï¼ ï¼ï¼ -ï¼ ï¼ï¼ On 2017/3/21 16:10, address@hidden wrote: -ï¼ ï¼ï¼ï¼ Thank youã -ï¼ ï¼ï¼ï¼ -ï¼ ï¼ï¼ï¼ I have test areadyã -ï¼ ï¼ï¼ï¼ -ï¼ ï¼ï¼ï¼ When the Primary Node panic,the Secondary Node qemu hang at the same -placeã -ï¼ ï¼ï¼ï¼ -ï¼ ï¼ï¼ï¼ Incorrding -http://wiki.qemu-project.org/Features/COLO -ï¼kill Primary Node -qemu will not produce the problem,but Primary Node panic canã -ï¼ ï¼ï¼ï¼ -ï¼ ï¼ï¼ï¼ I think due to the feature of channel does not support -QIO_CHANNEL_FEATURE_SHUTDOWN. -ï¼ ï¼ï¼ï¼ -ï¼ ï¼ï¼ï¼ -ï¼ ï¼ï¼ï¼ when failover,channel_shutdown could not shut down the channel. -ï¼ ï¼ï¼ï¼ -ï¼ ï¼ï¼ï¼ -ï¼ ï¼ï¼ï¼ so the colo_process_incoming_thread will hang at recvmsg. 
-ï¼ ï¼ï¼ï¼ -ï¼ ï¼ï¼ï¼ -ï¼ ï¼ï¼ï¼ I test a patch: -ï¼ ï¼ï¼ï¼ -ï¼ ï¼ï¼ï¼ -ï¼ ï¼ï¼ï¼ diff --git a/migration/socket.c b/migration/socket.c -ï¼ ï¼ï¼ï¼ -ï¼ ï¼ï¼ï¼ -ï¼ ï¼ï¼ï¼ index 13966f1..d65a0ea 100644 -ï¼ ï¼ï¼ï¼ -ï¼ ï¼ï¼ï¼ -ï¼ ï¼ï¼ï¼ --- a/migration/socket.c -ï¼ ï¼ï¼ï¼ -ï¼ ï¼ï¼ï¼ -ï¼ ï¼ï¼ï¼ +++ b/migration/socket.c -ï¼ ï¼ï¼ï¼ -ï¼ ï¼ï¼ï¼ -ï¼ ï¼ï¼ï¼ @@ -147,8 +147,9 @@ static gboolean -socket_accept_incoming_migration(QIOChannel *ioc, -ï¼ ï¼ï¼ï¼ -ï¼ ï¼ï¼ï¼ -ï¼ ï¼ï¼ï¼ } -ï¼ ï¼ï¼ï¼ -ï¼ ï¼ï¼ï¼ -ï¼ ï¼ï¼ï¼ -ï¼ ï¼ï¼ï¼ -ï¼ ï¼ï¼ï¼ -ï¼ ï¼ï¼ï¼ trace_migration_socket_incoming_accepted() -ï¼ ï¼ï¼ï¼ -ï¼ ï¼ï¼ï¼ -ï¼ ï¼ï¼ï¼ -ï¼ ï¼ï¼ï¼ -ï¼ ï¼ï¼ï¼ -ï¼ ï¼ï¼ï¼ qio_channel_set_name(QIO_CHANNEL(sioc), -"migration-socket-incoming") -ï¼ ï¼ï¼ï¼ -ï¼ ï¼ï¼ï¼ -ï¼ ï¼ï¼ï¼ + qio_channel_set_feature(QIO_CHANNEL(sioc), -QIO_CHANNEL_FEATURE_SHUTDOWN) -ï¼ ï¼ï¼ï¼ -ï¼ ï¼ï¼ï¼ -ï¼ ï¼ï¼ï¼ migration_channel_process_incoming(migrate_get_current(), -ï¼ ï¼ï¼ï¼ -ï¼ ï¼ï¼ï¼ -ï¼ ï¼ï¼ï¼ QIO_CHANNEL(sioc)) -ï¼ ï¼ï¼ï¼ -ï¼ ï¼ï¼ï¼ -ï¼ ï¼ï¼ï¼ object_unref(OBJECT(sioc)) -ï¼ ï¼ï¼ï¼ -ï¼ ï¼ï¼ï¼ -ï¼ ï¼ï¼ï¼ -ï¼ ï¼ï¼ï¼ -ï¼ ï¼ï¼ï¼ My test will not hang any more. -ï¼ ï¼ï¼ï¼ -ï¼ ï¼ï¼ï¼ -ï¼ ï¼ï¼ï¼ -ï¼ ï¼ï¼ï¼ -ï¼ ï¼ï¼ï¼ -ï¼ ï¼ï¼ï¼ -ï¼ ï¼ï¼ï¼ -ï¼ ï¼ï¼ï¼ -ï¼ ï¼ï¼ï¼ -ï¼ ï¼ï¼ï¼ -ï¼ ï¼ï¼ï¼ -ï¼ ï¼ï¼ï¼ -ï¼ ï¼ï¼ï¼ -ï¼ ï¼ï¼ï¼ -ï¼ ï¼ï¼ï¼ -ï¼ ï¼ï¼ï¼ -ï¼ ï¼ï¼ï¼ -ï¼ ï¼ï¼ï¼ åå§é®ä»¶ -ï¼ ï¼ï¼ï¼ -ï¼ ï¼ï¼ï¼ -ï¼ ï¼ï¼ï¼ -ï¼ ï¼ï¼ï¼ åä»¶äººï¼ address@hidden -ï¼ ï¼ï¼ï¼ æ¶ä»¶äººï¼ç广10165992 address@hidden -ï¼ ï¼ï¼ï¼ æéäººï¼ address@hidden address@hidden -ï¼ ï¼ï¼ï¼ æ¥ æ ï¼2017å¹´03æ21æ¥ 15:58 -ï¼ ï¼ï¼ï¼ 主 é¢ ï¼Re: [Qemu-devel] çå¤: Re: [BUG]COLO failover hang -ï¼ ï¼ï¼ï¼ -ï¼ ï¼ï¼ï¼ -ï¼ ï¼ï¼ï¼ -ï¼ ï¼ï¼ï¼ -ï¼ ï¼ï¼ï¼ -ï¼ ï¼ï¼ï¼ Hi,Wang. -ï¼ ï¼ï¼ï¼ -ï¼ ï¼ï¼ï¼ You can test this branch: -ï¼ ï¼ï¼ï¼ -ï¼ ï¼ï¼ï¼ -https://github.com/coloft/qemu/tree/colo-v5.1-developing-COLO-frame-v21-with-shared-disk -ï¼ ï¼ï¼ï¼ -ï¼ ï¼ï¼ï¼ and please follow wiki ensure your own configuration correctly. -ï¼ ï¼ï¼ï¼ -ï¼ ï¼ï¼ï¼ -http://wiki.qemu-project.org/Features/COLO -ï¼ ï¼ï¼ï¼ -ï¼ ï¼ï¼ï¼ -ï¼ ï¼ï¼ï¼ Thanks -ï¼ ï¼ï¼ï¼ -ï¼ ï¼ï¼ï¼ Zhang Chen -ï¼ ï¼ï¼ï¼ -ï¼ ï¼ï¼ï¼ -ï¼ ï¼ï¼ï¼ On 03/21/2017 03:27 PM, address@hidden wrote: -ï¼ ï¼ï¼ï¼ ï¼ -ï¼ ï¼ï¼ï¼ ï¼ hi. -ï¼ ï¼ï¼ï¼ ï¼ -ï¼ ï¼ï¼ï¼ ï¼ I test the git qemu master have the same problem. 
-ï¼ ï¼ï¼ï¼ ï¼ -ï¼ ï¼ï¼ï¼ ï¼ (gdb) bt -ï¼ ï¼ï¼ï¼ ï¼ -ï¼ ï¼ï¼ï¼ ï¼ #0 qio_channel_socket_readv (ioc=0x7f65911b4e50, iov=0x7f64ef3fd880, -ï¼ ï¼ï¼ï¼ ï¼ niov=1, fds=0x0, nfds=0x0, errp=0x0) at io/channel-socket.c:461 -ï¼ ï¼ï¼ï¼ ï¼ -ï¼ ï¼ï¼ï¼ ï¼ #1 0x00007f658e4aa0c2 in qio_channel_read -ï¼ ï¼ï¼ï¼ ï¼ (address@hidden, address@hidden "", -ï¼ ï¼ï¼ï¼ ï¼ address@hidden, address@hidden) at io/channel.c:114 -ï¼ ï¼ï¼ï¼ ï¼ -ï¼ ï¼ï¼ï¼ ï¼ #2 0x00007f658e3ea990 in channel_get_buffer (opaque=ï¼optimized outï¼, -ï¼ ï¼ï¼ï¼ ï¼ buf=0x7f65907cb838 "", pos=ï¼optimized outï¼, size=32768) at -ï¼ ï¼ï¼ï¼ ï¼ migration/qemu-file-channel.c:78 -ï¼ ï¼ï¼ï¼ ï¼ -ï¼ ï¼ï¼ï¼ ï¼ #3 0x00007f658e3e97fc in qemu_fill_buffer (f=0x7f65907cb800) at -ï¼ ï¼ï¼ï¼ ï¼ migration/qemu-file.c:295 -ï¼ ï¼ï¼ï¼ ï¼ -ï¼ ï¼ï¼ï¼ ï¼ #4 0x00007f658e3ea2e1 in qemu_peek_byte (address@hidden, -ï¼ ï¼ï¼ï¼ ï¼ address@hidden) at migration/qemu-file.c:555 -ï¼ ï¼ï¼ï¼ ï¼ -ï¼ ï¼ï¼ï¼ ï¼ #5 0x00007f658e3ea34b in qemu_get_byte (address@hidden) at -ï¼ ï¼ï¼ï¼ ï¼ migration/qemu-file.c:568 -ï¼ ï¼ï¼ï¼ ï¼ -ï¼ ï¼ï¼ï¼ ï¼ #6 0x00007f658e3ea552 in qemu_get_be32 (address@hidden) at -ï¼ ï¼ï¼ï¼ ï¼ migration/qemu-file.c:648 -ï¼ ï¼ï¼ï¼ ï¼ -ï¼ ï¼ï¼ï¼ ï¼ #7 0x00007f658e3e66e5 in colo_receive_message (f=0x7f65907cb800, -ï¼ ï¼ï¼ï¼ ï¼ address@hidden) at migration/colo.c:244 -ï¼ ï¼ï¼ï¼ ï¼ -ï¼ ï¼ï¼ï¼ ï¼ #8 0x00007f658e3e681e in colo_receive_check_message (f=ï¼optimized -ï¼ ï¼ï¼ï¼ ï¼ outï¼, address@hidden, -ï¼ ï¼ï¼ï¼ ï¼ address@hidden) -ï¼ ï¼ï¼ï¼ ï¼ -ï¼ ï¼ï¼ï¼ ï¼ at migration/colo.c:264 -ï¼ ï¼ï¼ï¼ ï¼ -ï¼ ï¼ï¼ï¼ ï¼ #9 0x00007f658e3e740e in colo_process_incoming_thread -ï¼ ï¼ï¼ï¼ ï¼ (opaque=0x7f658eb30360 ï¼mis_current.31286ï¼) at migration/colo.c:577 -ï¼ ï¼ï¼ï¼ ï¼ -ï¼ ï¼ï¼ï¼ ï¼ #10 0x00007f658be09df3 in start_thread () from /lib64/libpthread.so.0 -ï¼ ï¼ï¼ï¼ ï¼ -ï¼ ï¼ï¼ï¼ ï¼ #11 0x00007f65881983ed in clone () from /lib64/libc.so.6 -ï¼ ï¼ï¼ï¼ ï¼ -ï¼ ï¼ï¼ï¼ ï¼ (gdb) p ioc-ï¼name -ï¼ ï¼ï¼ï¼ ï¼ -ï¼ ï¼ï¼ï¼ ï¼ $2 = 0x7f658ff7d5c0 "migration-socket-incoming" -ï¼ ï¼ï¼ï¼ ï¼ -ï¼ ï¼ï¼ï¼ ï¼ (gdb) p ioc-ï¼features Do not support QIO_CHANNEL_FEATURE_SHUTDOWN -ï¼ ï¼ï¼ï¼ ï¼ -ï¼ ï¼ï¼ï¼ ï¼ $3 = 0 -ï¼ ï¼ï¼ï¼ ï¼ -ï¼ ï¼ï¼ï¼ ï¼ -ï¼ ï¼ï¼ï¼ ï¼ (gdb) bt -ï¼ ï¼ï¼ï¼ ï¼ -ï¼ ï¼ï¼ï¼ ï¼ #0 socket_accept_incoming_migration (ioc=0x7fdcceeafa90, -ï¼ ï¼ï¼ï¼ ï¼ condition=G_IO_IN, opaque=0x7fdcceeafa90) at migration/socket.c:137 -ï¼ ï¼ï¼ï¼ ï¼ -ï¼ ï¼ï¼ï¼ ï¼ #1 0x00007fdcc6966350 in g_main_dispatch (context=ï¼optimized outï¼) at -ï¼ ï¼ï¼ï¼ ï¼ gmain.c:3054 -ï¼ ï¼ï¼ï¼ ï¼ -ï¼ ï¼ï¼ï¼ ï¼ #2 g_main_context_dispatch (context=ï¼optimized outï¼, -ï¼ ï¼ï¼ï¼ ï¼ address@hidden) at gmain.c:3630 -ï¼ ï¼ï¼ï¼ ï¼ -ï¼ ï¼ï¼ï¼ ï¼ #3 0x00007fdccb8a6dcc in glib_pollfds_poll () at util/main-loop.c:213 -ï¼ ï¼ï¼ï¼ ï¼ -ï¼ ï¼ï¼ï¼ ï¼ #4 os_host_main_loop_wait (timeout=ï¼optimized outï¼) at -ï¼ ï¼ï¼ï¼ ï¼ util/main-loop.c:258 -ï¼ ï¼ï¼ï¼ ï¼ -ï¼ ï¼ï¼ï¼ ï¼ #5 main_loop_wait (address@hidden) at -ï¼ ï¼ï¼ï¼ ï¼ util/main-loop.c:506 -ï¼ ï¼ï¼ï¼ ï¼ -ï¼ ï¼ï¼ï¼ ï¼ #6 0x00007fdccb526187 in main_loop () at vl.c:1898 -ï¼ ï¼ï¼ï¼ ï¼ -ï¼ ï¼ï¼ï¼ ï¼ #7 main (argc=ï¼optimized outï¼, argv=ï¼optimized outï¼, envp=ï¼optimized -ï¼ ï¼ï¼ï¼ ï¼ outï¼) at vl.c:4709 -ï¼ ï¼ï¼ï¼ ï¼ -ï¼ ï¼ï¼ï¼ ï¼ (gdb) p ioc-ï¼features -ï¼ ï¼ï¼ï¼ ï¼ -ï¼ ï¼ï¼ï¼ ï¼ $1 = 6 -ï¼ ï¼ï¼ï¼ ï¼ -ï¼ ï¼ï¼ï¼ ï¼ (gdb) p ioc-ï¼name -ï¼ ï¼ï¼ï¼ ï¼ -ï¼ ï¼ï¼ï¼ ï¼ $2 = 0x7fdcce1b1ab0 "migration-socket-listener" -ï¼ ï¼ï¼ï¼ ï¼ -ï¼ ï¼ï¼ï¼ ï¼ -ï¼ ï¼ï¼ï¼ ï¼ May be socket_accept_incoming_migration should -ï¼ ï¼ï¼ï¼ ï¼ call qio_channel_set_feature(ioc, QIO_CHANNEL_FEATURE_SHUTDOWN)?? -ï¼ ï¼ï¼ï¼ ï¼ -ï¼ ï¼ï¼ï¼ ï¼ -ï¼ ï¼ï¼ï¼ ï¼ thank you. 
-ï¼ ï¼ï¼ï¼ ï¼ -ï¼ ï¼ï¼ï¼ ï¼ -ï¼ ï¼ï¼ï¼ ï¼ -ï¼ ï¼ï¼ï¼ ï¼ -ï¼ ï¼ï¼ï¼ ï¼ -ï¼ ï¼ï¼ï¼ ï¼ åå§é®ä»¶ -ï¼ ï¼ï¼ï¼ ï¼ address@hidden -ï¼ ï¼ï¼ï¼ ï¼ address@hidden -ï¼ ï¼ï¼ï¼ ï¼ address@hidden@huawei.comï¼ -ï¼ ï¼ï¼ï¼ ï¼ *æ¥ æ ï¼*2017å¹´03æ16æ¥ 14:46 -ï¼ ï¼ï¼ï¼ ï¼ *主 é¢ ï¼**Re: [Qemu-devel] COLO failover hang* -ï¼ ï¼ï¼ï¼ ï¼ -ï¼ ï¼ï¼ï¼ ï¼ -ï¼ ï¼ï¼ï¼ ï¼ -ï¼ ï¼ï¼ï¼ ï¼ -ï¼ ï¼ï¼ï¼ ï¼ On 03/15/2017 05:06 PM, wangguang wrote: -ï¼ ï¼ï¼ï¼ ï¼ ï¼ am testing QEMU COLO feature described here [QEMU -ï¼ ï¼ï¼ï¼ ï¼ ï¼ Wiki]( -http://wiki.qemu-project.org/Features/COLO -). -ï¼ ï¼ï¼ï¼ ï¼ ï¼ -ï¼ ï¼ï¼ï¼ ï¼ ï¼ When the Primary Node panic,the Secondary Node qemu hang. -ï¼ ï¼ï¼ï¼ ï¼ ï¼ hang at recvmsg in qio_channel_socket_readv. -ï¼ ï¼ï¼ï¼ ï¼ ï¼ And I run { 'execute': 'nbd-server-stop' } and { "execute": -ï¼ ï¼ï¼ï¼ ï¼ ï¼ "x-colo-lost-heartbeat" } in Secondary VM's -ï¼ ï¼ï¼ï¼ ï¼ ï¼ monitor,the Secondary Node qemu still hang at recvmsg . -ï¼ ï¼ï¼ï¼ ï¼ ï¼ -ï¼ ï¼ï¼ï¼ ï¼ ï¼ I found that the colo in qemu is not complete yet. -ï¼ ï¼ï¼ï¼ ï¼ ï¼ Do the colo have any plan for development? -ï¼ ï¼ï¼ï¼ ï¼ -ï¼ ï¼ï¼ï¼ ï¼ Yes, We are developing. You can see some of patch we pushing. -ï¼ ï¼ï¼ï¼ ï¼ -ï¼ ï¼ï¼ï¼ ï¼ ï¼ Has anyone ever run it successfully? Any help is appreciated! -ï¼ ï¼ï¼ï¼ ï¼ -ï¼ ï¼ï¼ï¼ ï¼ In our internal version can run it successfully, -ï¼ ï¼ï¼ï¼ ï¼ The failover detail you can ask Zhanghailiang for help. -ï¼ ï¼ï¼ï¼ ï¼ Next time if you have some question about COLO, -ï¼ ï¼ï¼ï¼ ï¼ please cc me and zhanghailiang address@hidden -ï¼ ï¼ï¼ï¼ ï¼ -ï¼ ï¼ï¼ï¼ ï¼ -ï¼ ï¼ï¼ï¼ ï¼ Thanks -ï¼ ï¼ï¼ï¼ ï¼ Zhang Chen -ï¼ ï¼ï¼ï¼ ï¼ -ï¼ ï¼ï¼ï¼ ï¼ -ï¼ ï¼ï¼ï¼ ï¼ ï¼ -ï¼ ï¼ï¼ï¼ ï¼ ï¼ -ï¼ ï¼ï¼ï¼ ï¼ ï¼ -ï¼ ï¼ï¼ï¼ ï¼ ï¼ centos7.2+qemu2.7.50 -ï¼ ï¼ï¼ï¼ ï¼ ï¼ (gdb) bt -ï¼ ï¼ï¼ï¼ ï¼ ï¼ #0 0x00007f3e00cc86ad in recvmsg () from /lib64/libpthread.so.0 -ï¼ ï¼ï¼ï¼ ï¼ ï¼ #1 0x00007f3e0332b738 in qio_channel_socket_readv (ioc=ï¼optimized -outï¼, -ï¼ ï¼ï¼ï¼ ï¼ ï¼ iov=ï¼optimized outï¼, niov=ï¼optimized outï¼, fds=0x0, nfds=0x0, -errp=0x0) at -ï¼ ï¼ï¼ï¼ ï¼ ï¼ io/channel-socket.c:497 -ï¼ ï¼ï¼ï¼ ï¼ ï¼ #2 0x00007f3e03329472 in qio_channel_read (address@hidden, -ï¼ ï¼ï¼ï¼ ï¼ ï¼ address@hidden "", address@hidden, -ï¼ ï¼ï¼ï¼ ï¼ ï¼ address@hidden) at io/channel.c:97 -ï¼ ï¼ï¼ï¼ ï¼ ï¼ #3 0x00007f3e032750e0 in channel_get_buffer (opaque=ï¼optimized outï¼, -ï¼ ï¼ï¼ï¼ ï¼ ï¼ buf=0x7f3e05910f38 "", pos=ï¼optimized outï¼, size=32768) at -ï¼ ï¼ï¼ï¼ ï¼ ï¼ migration/qemu-file-channel.c:78 -ï¼ ï¼ï¼ï¼ ï¼ ï¼ #4 0x00007f3e0327412c in qemu_fill_buffer (f=0x7f3e05910f00) at -ï¼ ï¼ï¼ï¼ ï¼ ï¼ migration/qemu-file.c:257 -ï¼ ï¼ï¼ï¼ ï¼ ï¼ #5 0x00007f3e03274a41 in qemu_peek_byte (address@hidden, -ï¼ ï¼ï¼ï¼ ï¼ ï¼ address@hidden) at migration/qemu-file.c:510 -ï¼ ï¼ï¼ï¼ ï¼ ï¼ #6 0x00007f3e03274aab in qemu_get_byte (address@hidden) at -ï¼ ï¼ï¼ï¼ ï¼ ï¼ migration/qemu-file.c:523 -ï¼ ï¼ï¼ï¼ ï¼ ï¼ #7 0x00007f3e03274cb2 in qemu_get_be32 (address@hidden) at -ï¼ ï¼ï¼ï¼ ï¼ ï¼ migration/qemu-file.c:603 -ï¼ ï¼ï¼ï¼ ï¼ ï¼ #8 0x00007f3e03271735 in colo_receive_message (f=0x7f3e05910f00, -ï¼ ï¼ï¼ï¼ ï¼ ï¼ address@hidden) at migration/colo.c:215 -ï¼ ï¼ï¼ï¼ ï¼ ï¼ #9 0x00007f3e0327250d in colo_wait_handle_message -(errp=0x7f3d62bfaa48, -ï¼ ï¼ï¼ï¼ ï¼ ï¼ checkpoint_request=ï¼synthetic pointerï¼, f=ï¼optimized outï¼) at -ï¼ ï¼ï¼ï¼ ï¼ ï¼ migration/colo.c:546 -ï¼ ï¼ï¼ï¼ ï¼ ï¼ #10 colo_process_incoming_thread (opaque=0x7f3e067245e0) at -ï¼ ï¼ï¼ï¼ ï¼ ï¼ migration/colo.c:649 -ï¼ ï¼ï¼ï¼ ï¼ ï¼ #11 0x00007f3e00cc1df3 in start_thread () from /lib64/libpthread.so.0 -ï¼ ï¼ï¼ï¼ ï¼ ï¼ #12 0x00007f3dfc9c03ed in clone () from 
> >
> > --
> > View this message in context: http://qemu.11.n7.nabble.com/COLO-failover-hang-tp473250.html
> > Sent from the Developer mailing list archive at Nabble.com.
>
> --
> Thanks
> Zhang Chen
>
> --
> Dr. David Alan Gilbert / address@hidden / Manchester, UK
> .

On 2017/3/22 16:09, address@hidden wrote:
> hi:
>
> yes, it is better.
>
> And should we delete

Yes, you are right.

> #ifdef WIN32
>     QIO_CHANNEL(cioc)->event = CreateEvent(NULL, FALSE, FALSE, NULL);
> #endif
> in qio_channel_socket_accept?
> qio_channel_socket_new already has it.

Original Mail
From: address@hidden
To: Wang Guang 10165992
Cc: address@hidden, address@hidden, address@hidden, address@hidden
Date: 2017-03-22 15:03
Subject: Re: [Qemu-devel] Reply: Re: Reply: Re: Reply: Re: [BUG]COLO failover hang

Hi,

On 2017/3/22 9:42, address@hidden wrote:
> diff --git a/migration/socket.c b/migration/socket.c
> index 13966f1..d65a0ea 100644
> --- a/migration/socket.c
> +++ b/migration/socket.c
> @@ -147,8 +147,9 @@ static gboolean socket_accept_incoming_migration(QIOChannel *ioc,
>      }
>
>      trace_migration_socket_incoming_accepted();
>
>      qio_channel_set_name(QIO_CHANNEL(sioc), "migration-socket-incoming");
> +    qio_channel_set_feature(QIO_CHANNEL(sioc), QIO_CHANNEL_FEATURE_SHUTDOWN);
>      migration_channel_process_incoming(migrate_get_current(),
>                                         QIO_CHANNEL(sioc));
>      object_unref(OBJECT(sioc));
>
> Is this patch ok?

Yes, I think this works, but a better way may be to call qio_channel_set_feature()
in qio_channel_socket_accept(); we didn't set the SHUTDOWN feature for the
socket accept fd. Or fix it like this:

diff --git a/io/channel-socket.c b/io/channel-socket.c
index f546c68..ce6894c 100644
--- a/io/channel-socket.c
+++ b/io/channel-socket.c
@@ -330,9 +330,8 @@ qio_channel_socket_accept(QIOChannelSocket *ioc,
                           Error **errp)
 {
     QIOChannelSocket *cioc;
-
-    cioc = QIO_CHANNEL_SOCKET(object_new(TYPE_QIO_CHANNEL_SOCKET));
-    cioc->fd = -1;
+
+    cioc = qio_channel_socket_new();
     cioc->remoteAddrLen = sizeof(ioc->remoteAddr);
     cioc->localAddrLen = sizeof(ioc->localAddr);

Thanks,
Hailiang
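A small self-contained demo, not QEMU code, of the mechanism both proposed fixes rely on:
a thread blocked in recv() on a socket does not return on its own, but a shutdown() issued
from another thread on that same fd makes recv() return 0, which is exactly what failover
needs to unstick the blocked COLO incoming thread. The socketpair() setup is purely
illustrative; build with -pthread.

    #include <errno.h>
    #include <pthread.h>
    #include <stdio.h>
    #include <string.h>
    #include <sys/socket.h>
    #include <unistd.h>

    static int fds[2];

    static void *reader(void *arg)
    {
        char buf[16];
        /* Blocks here: no data is ever sent on this socket pair. */
        ssize_t n = recv(fds[0], buf, sizeof(buf), 0);
        printf("recv() returned %zd (%s)\n",
               n, n ? strerror(errno) : "socket shut down");
        return NULL;
    }

    int main(void)
    {
        pthread_t t;

        socketpair(AF_UNIX, SOCK_STREAM, 0, fds);
        pthread_create(&t, NULL, reader, NULL);

        sleep(1);                        /* let the reader block in recv() */
        shutdown(fds[0], SHUT_RDWR);     /* wake it up without sending data */

        pthread_join(t, NULL);
        return 0;
    }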
> I have tested it. The test does not hang any more.
>
> Original Mail
> From: address@hidden
> To: address@hidden, address@hidden
> Cc: address@hidden, address@hidden, address@hidden
> Date: 2017-03-22 09:11
> Subject: Re: [Qemu-devel] Reply: Re: Reply: Re: [BUG]COLO failover hang
>
> On 2017/3/21 19:56, Dr. David Alan Gilbert wrote:
> > * Hailiang Zhang (address@hidden) wrote:
> > > Hi,
> > >
> > > Thanks for reporting this; I confirmed it in my test, and it is a bug.
> > >
> > > Though we tried to call qemu_file_shutdown() to shut down the related fd, in
> > > case the COLO thread/incoming thread is stuck in read()/write() while doing
> > > failover, it didn't take effect, because all the fds used by COLO (and by
> > > migration) have been wrapped by a qio channel, and it will not call the
> > > shutdown API if we didn't qio_channel_set_feature(QIO_CHANNEL(sioc),
> > > QIO_CHANNEL_FEATURE_SHUTDOWN).
> > >
> > > Cc: Dr. David Alan Gilbert address@hidden
> > >
> > > I suspect migration cancel has the same problem; it may be stuck in write()
> > > if we try to cancel migration.
> > >
> > > void fd_start_outgoing_migration(MigrationState *s, const char *fdname, Error **errp)
> > > {
> > >     qio_channel_set_name(QIO_CHANNEL(ioc), "migration-fd-outgoing");
> > >     migration_channel_connect(s, ioc, NULL);
> > >     ... ...
> > >
> > > We didn't call qio_channel_set_feature(QIO_CHANNEL(sioc), QIO_CHANNEL_FEATURE_SHUTDOWN)
> > > above, and then in
> > >
> > > migrate_fd_cancel()
> > > {
> > >     ... ...
> > >     if (s->state == MIGRATION_STATUS_CANCELLING && f) {
> > >         qemu_file_shutdown(f);  --> This will not take effect. No?
> > >     }
> > > }
> >
> > (cc'd in Daniel Berrange).
> > I see that we call qio_channel_set_feature(ioc, QIO_CHANNEL_FEATURE_SHUTDOWN) at the
> > top of qio_channel_socket_new so I think that's safe, isn't it?
>
> Hmm, you are right, this problem only exists for the migration incoming fd, thanks.
>
> > Dave
> >
> > > Thanks,
> > > Hailiang
> > >
> > > On 2017/3/21 16:10, address@hidden wrote:
> > > > Thank you.
> > > >
> > > > I have tested already.
> > > >
> > > > When the Primary Node panics, the Secondary Node qemu hangs at the same place.
> > > >
> > > > According to http://wiki.qemu-project.org/Features/COLO, killing the Primary
> > > > Node qemu does not reproduce the problem, but a Primary Node panic does.
> > > >
> > > > I think this is because the channel does not support QIO_CHANNEL_FEATURE_SHUTDOWN,
> > > > so on failover channel_shutdown() cannot shut the channel down,
> > > > and colo_process_incoming_thread hangs at recvmsg.
> > > >
> > > > I tested a patch:
> > > >
> > > > [Same migration/socket.c hunk as quoted above, adding
> > > > qio_channel_set_feature(QIO_CHANNEL(sioc), QIO_CHANNEL_FEATURE_SHUTDOWN);
> > > > trimmed here as a verbatim duplicate.]
> > > >
> > > > My test does not hang any more.
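A minimal sketch, under assumptions, of the gating behaviour described in the quoted
exchange: the shutdown path can only interrupt a blocked reader if the channel advertises
a shutdown capability, so a freshly accepted channel that never had the feature bit set
silently cannot be shut down. The Channel type, FEATURE_SHUTDOWN constant and
channel_shutdown() helper below are illustrative stand-ins, not QEMU's actual types.

    #include <errno.h>
    #include <stdbool.h>
    #include <stdio.h>
    #include <sys/socket.h>

    enum { FEATURE_SHUTDOWN = 1 << 2 };   /* stand-in for QIO_CHANNEL_FEATURE_SHUTDOWN */

    typedef struct Channel {
        unsigned features;   /* capability bits set when the channel is created */
        int fd;
    } Channel;

    static bool channel_has_feature(const Channel *c, unsigned feature)
    {
        return (c->features & feature) != 0;
    }

    /* Mirrors the behaviour complained about in the thread: without the feature
     * bit the call reports an error instead of shutting the socket down, so a
     * thread blocked in recvmsg() on that fd is never woken up. */
    static int channel_shutdown(Channel *c)
    {
        if (!channel_has_feature(c, FEATURE_SHUTDOWN)) {
            errno = ENOSYS;
            return -1;
        }
        return shutdown(c->fd, SHUT_RDWR);
    }

    int main(void)
    {
        int sv[2];
        socketpair(AF_UNIX, SOCK_STREAM, 0, sv);

        Channel accepted = { .features = 0,                .fd = sv[0] };  /* the buggy accept path */
        Channel fixed    = { .features = FEATURE_SHUTDOWN, .fd = sv[1] };  /* feature set at creation */

        printf("accepted: shutdown -> %d (errno=%d)\n", channel_shutdown(&accepted), errno);
        printf("fixed:    shutdown -> %d\n", channel_shutdown(&fixed));
        return 0;
    }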
> > > > Original Mail
> > > > From: address@hidden
> > > > To: Wang Guang 10165992, address@hidden
> > > > Cc: address@hidden, address@hidden
> > > > Date: 2017-03-21 15:58
> > > > Subject: Re: [Qemu-devel] Reply: Re: [BUG]COLO failover hang
> > > >
> > > > Hi, Wang.
> > > >
> > > > You can test this branch:
> > > > https://github.com/coloft/qemu/tree/colo-v5.1-developing-COLO-frame-v21-with-shared-disk
> > > > and please follow the wiki to make sure your own configuration is correct:
> > > > http://wiki.qemu-project.org/Features/COLO
> > > >
> > > > Thanks
> > > > Zhang Chen
> > > >
> > > > On 03/21/2017 03:27 PM, address@hidden wrote:
> > > > > hi.
> > > > > I tested the git qemu master and have the same problem.
> > > > > [Quoted copy of the two gdb backtraces, the ioc->name / ioc->features output
> > > > > and the qio_channel_set_feature() question from the report at the top of this
> > > > > file, followed by another copy of the 2017-03-16 reply and of the
> > > > > centos7.2 + qemu 2.7.50 recvmsg backtrace; trimmed here as verbatim duplicates.]
diff --git a/results/classifier/015/unknown/80615920 b/results/classifier/015/unknown/80615920
deleted file mode 100644
index 219084c13..000000000
--- a/results/classifier/015/unknown/80615920
+++ /dev/null
@@ -1,375 +0,0 @@
user-level: 0.849
risc-v: 0.809
KVM: 0.803
mistranslation: 0.800
TCG: 0.785
x86: 0.779
operating system: 0.777
peripherals: 0.777
i386: 0.773
vnc: 0.768
ppc: 0.768
hypervisor: 0.764
VMM: 0.759
performance: 0.758
permissions: 0.758
register: 0.756
architecture: 0.755
files: 0.751
boot: 0.750
virtual: 0.749
device: 0.748
assembly: 0.747
debug: 0.746
arm: 0.744
kernel: 0.738
semantic: 0.737
network: 0.732
socket: 0.732
graphic: 0.730
PID: 0.727
alpha: 0.726

[BUG] accel/tcg: cpu_exec_longjmp_cleanup: assertion failed: (cpu == current_cpu)

It seems there is a bug in SIGALRM handling when a 486 system emulates x86_64 code.

This code:

#include <stdio.h>
#include <stdlib.h>
#include <pthread.h>
#include <signal.h>
#include <unistd.h>

pthread_t thread1, thread2;

// Signal handler for SIGALRM
void alarm_handler(int sig) {
    // Do nothing, just wake up the other thread
}

// Thread 1 function
void* thread1_func(void* arg) {
    // Set up the signal handler for SIGALRM
    signal(SIGALRM, alarm_handler);

    // Wait for one second
    sleep(1);

    // Send the SIGALRM signal to thread 2
    pthread_kill(thread2, SIGALRM);

    return NULL;
}

// Thread 2 function
void* thread2_func(void* arg) {
    // Wait for the SIGALRM signal
    pause();

    printf("Thread 2 woke up!\n");

    return NULL;
}

int main() {
    // Create thread 1
    if (pthread_create(&thread1, NULL, thread1_func, NULL) != 0) {
        fprintf(stderr, "Failed to create thread 1\n");
        return 1;
    }

    // Create thread 2
    if (pthread_create(&thread2, NULL, thread2_func, NULL) != 0) {
        fprintf(stderr, "Failed to create thread 2\n");
        return 1;
    }

    // Wait for both threads to finish
    pthread_join(thread1, NULL);
    pthread_join(thread2, NULL);

    return 0;
}

fails with this -strace log (there are also unsupported syscalls 334 and 435,
but it seems they don't affect the code much):

...
736 rt_sigaction(SIGALRM,0x000000001123ec20,0x000000001123ecc0) = 0
736 clock_nanosleep(CLOCK_REALTIME,0,{tv_sec = 1,tv_nsec = 0},{tv_sec = 1,tv_nsec = 0})
736 rt_sigprocmask(SIG_BLOCK,0x00000000109fad20,0x0000000010800b38,8) = 0
736 Unknown syscall 435
736 clone(CLONE_VM|CLONE_FS|CLONE_FILES|CLONE_SIGHAND|CLONE_THREAD|CLONE_SYSVSEM|CLONE_SETTLS|CLONE_PARENT_SETTID| ...
736 rt_sigprocmask(SIG_SETMASK,0x0000000010800b38,NULL,8)
736 set_robust_list(0x11a419a0,0) = -1 errno=38 (Function not implemented)
736 rt_sigprocmask(SIG_SETMASK,0x0000000011a41fb0,NULL,8) = 0
 = 0
736 pause(0,0,2,277186368,0,295966400)
736 futex(0x000000001123f990,FUTEX_CLOCK_REALTIME|FUTEX_WAIT_BITSET,738,NULL,NULL,0)
 = 0
736 rt_sigprocmask(SIG_BLOCK,0x00000000109fad20,0x000000001123ee88,8) = 0
736 getpid() = 736
736 tgkill(736,739,SIGALRM) = 0
 = -1 errno=4 (Interrupted system call)
--- SIGALRM {si_signo=SIGALRM, si_code=SI_TKILL, si_pid=736, si_uid=0} ---
0x48874a != 0x3c69e10
736 rt_sigprocmask(SIG_SETMASK,0x000000001123ee88,NULL,8) = 0
**
ERROR:../accel/tcg/cpu-exec.c:546:cpu_exec_longjmp_cleanup: assertion failed: (cpu == current_cpu)
Bail out! ERROR:../accel/tcg/cpu-exec.c:546:cpu_exec_longjmp_cleanup: assertion failed: (cpu == current_cpu)
0x48874a != 0x3c69e10
**
ERROR:../accel/tcg/cpu-exec.c:546:cpu_exec_longjmp_cleanup: assertion failed: (cpu == current_cpu)
Bail out! ERROR:../accel/tcg/cpu-exec.c:546:cpu_exec_longjmp_cleanup: assertion failed: (cpu == current_cpu)
#

The code fails either with or without -singlestep; the command line:

/usr/bin/qemu-x86_64 -L /opt/x86_64 -strace -singlestep /opt/x86_64/alarm.bin

The source code of QEMU 8.1.1 was modified with the patch "[PATCH] qemu/timer: Don't use RDTSC on i486" [1],
with a few added ioctls (not relevant), and cpu_exec_longjmp_cleanup() now prints the current values of
cpu and current_cpu (the line "0x48874a != 0x3c69e10").

config.log (built as part of buildroot, basically the minimal possible configuration for running x86_64 on a 486):

# Configured with: '/mnt/hd_8tb_p1/p1/home/crossgen/buildroot_486_2/output/build/qemu-8.1.1/configure' '--prefix=/usr' '--cross-prefix=/mnt/hd_8tb_p1/p1/home/crossgen/buildroot_486_2/output/host/bin/i486-buildroot-linux-gnu-' '--audio-drv-list=' '--python=/mnt/hd_8tb_p1/p1/home/crossgen/buildroot_486_2/output/host/bin/python3' '--ninja=/mnt/hd_8tb_p1/p1/home/crossgen/buildroot_486_2/output/host/bin/ninja' '--disable-alsa' '--disable-bpf' '--disable-brlapi' '--disable-bsd-user' '--disable-cap-ng' '--disable-capstone' '--disable-containers' '--disable-coreaudio' '--disable-curl' '--disable-curses' '--disable-dbus-display' '--disable-docs' '--disable-dsound' '--disable-hvf' '--disable-jack' '--disable-libiscsi' '--disable-linux-aio' '--disable-linux-io-uring' '--disable-malloc-trim' '--disable-membarrier' '--disable-mpath' '--disable-netmap' '--disable-opengl' '--disable-oss' '--disable-pa' '--disable-rbd' '--disable-sanitizers' '--disable-selinux' '--disable-sparse' '--disable-strip' '--disable-vde' '--disable-vhost-crypto' '--disable-vhost-user-blk-server' '--disable-virtfs' '--disable-whpx' '--disable-xen' '--disable-attr' '--disable-kvm' '--disable-vhost-net' '--disable-download' '--disable-hexagon-idef-parser' '--disable-system' '--enable-linux-user' '--target-list=x86_64-linux-user' '--disable-vhost-user' '--disable-slirp' '--disable-sdl' '--disable-fdt' '--enable-trace-backends=nop' '--disable-tools' '--disable-guest-agent' '--disable-fuse' '--disable-fuse-lseek' '--disable-seccomp' '--disable-libssh' '--disable-libusb' '--disable-vnc' '--disable-nettle' '--disable-numa' '--disable-pipewire' '--disable-spice' '--disable-usb-redir' '--disable-install-blobs'

Emulation of the same x86_64 code with qemu 6.2.0 installed on another x86_64 native machine works fine.

[1] https://lists.nongnu.org/archive/html/qemu-devel/2023-11/msg05387.html

Best regards,
Petr
On Sat, 25 Nov 2023 at 13:09, Petr Cvek <petrcvekcz@gmail.com> wrote:
>
> It seems there is a bug in SIGALRM handling when a 486 system emulates x86_64 code.

486 host is pretty well out of support currently. Can you reproduce
this on a less ancient host CPU type?

> ERROR:../accel/tcg/cpu-exec.c:546:cpu_exec_longjmp_cleanup: assertion failed: (cpu == current_cpu)
> Bail out! ERROR:../accel/tcg/cpu-exec.c:546:cpu_exec_longjmp_cleanup: assertion failed: (cpu == current_cpu)
> 0x48874a != 0x3c69e10
> **
> ERROR:../accel/tcg/cpu-exec.c:546:cpu_exec_longjmp_cleanup: assertion failed: (cpu == current_cpu)
> Bail out! ERROR:../accel/tcg/cpu-exec.c:546:cpu_exec_longjmp_cleanup: assertion failed: (cpu == current_cpu)

What compiler version do you build QEMU with? That
assert is there because we have seen some buggy compilers
in the past which don't correctly preserve the variable
value as the setjmp/longjmp spec requires them to.

thanks
-- PMM

On 27. 11. 23 at 10:37, Peter Maydell wrote:
> On Sat, 25 Nov 2023 at 13:09, Petr Cvek <petrcvekcz@gmail.com> wrote:
> >
> > It seems there is a bug in SIGALRM handling when a 486 system emulates x86_64 code.
>
> 486 host is pretty well out of support currently. Can you reproduce
> this on a less ancient host CPU type?

It seems it only fails when the code is compiled for i486. QEMU built with the
same compiler with -march=i586 and above runs on the same physical hardware
without a problem. All -march= variants were executed on a Ryzen 3600.

> > ERROR:../accel/tcg/cpu-exec.c:546:cpu_exec_longjmp_cleanup: assertion failed: (cpu == current_cpu)
> > Bail out! ERROR:../accel/tcg/cpu-exec.c:546:cpu_exec_longjmp_cleanup: assertion failed: (cpu == current_cpu)
> > 0x48874a != 0x3c69e10
> > **
> > ERROR:../accel/tcg/cpu-exec.c:546:cpu_exec_longjmp_cleanup: assertion failed: (cpu == current_cpu)
> > Bail out! ERROR:../accel/tcg/cpu-exec.c:546:cpu_exec_longjmp_cleanup: assertion failed: (cpu == current_cpu)
>
> What compiler version do you build QEMU with? That
> assert is there because we have seen some buggy compilers
> in the past which don't correctly preserve the variable
> value as the setjmp/longjmp spec requires them to.

The i486 and i586+ code variants were compiled with GCC 13.2.0 (more exactly, the
slackware64-current multilib distribution).

The i486 binary which runs on the real 486 is also built with GCC 13.2.0, installed as
part of the buildroot cross-compiler (about a two-week-old git snapshot).

> thanks
> -- PMM

Best regards,
Petr
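A self-contained illustration of the setjmp/longjmp rule Peter Maydell refers to above:
a non-volatile local that is modified after setjmp() has an indeterminate value once
longjmp() returns to the setjmp point, so a compiler that keeps it in a register can hand
back a stale value. That is the failure mode the (cpu == current_cpu) assertion is meant
to catch. This is generic C, not QEMU code.

    #include <setjmp.h>
    #include <stdio.h>

    static jmp_buf env;

    static void bail_out(void)
    {
        longjmp(env, 1);
    }

    int main(void)
    {
        volatile int safe = 0;   /* volatile: value is preserved across longjmp */
        int unsafe = 0;          /* modified after setjmp: indeterminate after longjmp */

        if (setjmp(env) == 0) {
            safe = 1;
            unsafe = 1;
            bail_out();          /* jumps back to the setjmp() above */
        } else {
            /* 'safe' is reliably 1 here; 'unsafe' may or may not be, depending
             * on whether the compiler kept it in a register that longjmp
             * restored to its value at setjmp time. */
            printf("safe=%d unsafe=%d\n", safe, unsafe);
        }
        return 0;
    }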
On 11/25/23 07:08, Petr Cvek wrote:
> ERROR:../accel/tcg/cpu-exec.c:546:cpu_exec_longjmp_cleanup: assertion failed: (cpu == current_cpu)
> Bail out! ERROR:../accel/tcg/cpu-exec.c:546:cpu_exec_longjmp_cleanup: assertion failed: (cpu == current_cpu)
> #
>
> The code fails either with or without -singlestep; the command line:
>
> /usr/bin/qemu-x86_64 -L /opt/x86_64 -strace -singlestep /opt/x86_64/alarm.bin
>
> The source code of QEMU 8.1.1 was modified with the patch "[PATCH] qemu/timer: Don't use RDTSC on i486" [1],
> with a few added ioctls (not relevant), and cpu_exec_longjmp_cleanup() now prints the current values of
> cpu and current_cpu (the line "0x48874a != 0x3c69e10").

If you try this again with 8.2-rc2, you should not see an assertion failure.
You should see instead

  QEMU internal SIGILL {code=ILLOPC, addr=0x12345678}

which I think more accurately summarizes the situation of attempting RDTSC on hardware
that does not support it.

r~

On 29. 11. 23 at 15:25, Richard Henderson wrote:
> On 11/25/23 07:08, Petr Cvek wrote:
> > ERROR:../accel/tcg/cpu-exec.c:546:cpu_exec_longjmp_cleanup: assertion failed: (cpu == current_cpu)
> > Bail out! ERROR:../accel/tcg/cpu-exec.c:546:cpu_exec_longjmp_cleanup: assertion failed: (cpu == current_cpu)
> > #
> >
> > The code fails either with or without -singlestep; the command line:
> >
> > /usr/bin/qemu-x86_64 -L /opt/x86_64 -strace -singlestep /opt/x86_64/alarm.bin
> >
> > The source code of QEMU 8.1.1 was modified with the patch "[PATCH] qemu/timer: Don't use RDTSC on i486" [1],
> > with a few added ioctls (not relevant), and cpu_exec_longjmp_cleanup() now prints the current values of
> > cpu and current_cpu (the line "0x48874a != 0x3c69e10").
>
> If you try this again with 8.2-rc2, you should not see an assertion failure.
> You should see instead
>
>   QEMU internal SIGILL {code=ILLOPC, addr=0x12345678}
>
> which I think more accurately summarizes the situation of attempting RDTSC on
> hardware that does not support it.

Compiling vanilla qemu v8.2.0-rc2 with -march=i486 using GCC 13.2.0 and
running the resulting binary on the Ryzen still leads to:

**
ERROR:../accel/tcg/cpu-exec.c:533:cpu_exec_longjmp_cleanup: assertion failed: (cpu == current_cpu)
Bail out! ERROR:../accel/tcg/cpu-exec.c:533:cpu_exec_longjmp_cleanup: assertion failed: (cpu == current_cpu)
Aborted

> r~

Petr
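For context on the RDTSC point above: a plain i486 has no TSC (and may not even have
CPUID), so any code that wants a cycle counter has to probe for it first. The sketch
below shows one conventional way to do that probe on 32-bit x86. It is an illustration
of the idea behind the referenced "Don't use RDTSC on i486" patch, not the patch itself,
and it needs a 32-bit build (-m32 or an i486 toolchain) and GCC-style inline assembly.

    #include <stdint.h>
    #include <stdio.h>

    static int cpu_has_cpuid(void)
    {
        uint32_t before, after;
        /* CPUID exists iff the ID bit (bit 21) of EFLAGS can be toggled. */
        __asm__ volatile(
            "pushfl\n\t"
            "popl %0\n\t"
            "movl %0, %1\n\t"
            "xorl $0x00200000, %0\n\t"
            "pushl %0\n\t"
            "popfl\n\t"
            "pushfl\n\t"
            "popl %0\n\t"
            "pushl %1\n\t"          /* restore the original EFLAGS */
            "popfl\n\t"
            : "=&r"(after), "=&r"(before)
            :
            : "cc");
        return ((after ^ before) & 0x00200000) != 0;
    }

    static int cpu_has_tsc(void)
    {
        uint32_t eax, ebx, ecx, edx;
        if (!cpu_has_cpuid()) {
            return 0;               /* a plain 486: no CPUID, no TSC */
        }
        __asm__ volatile("cpuid"
                         : "=a"(eax), "=b"(ebx), "=c"(ecx), "=d"(edx)
                         : "a"(1));
        return (edx >> 4) & 1;      /* CPUID.1:EDX bit 4 = TSC */
    }

    static uint64_t read_cycles(void)
    {
        uint32_t lo, hi;
        if (!cpu_has_tsc()) {
            return 0;               /* caller should fall back to a portable clock */
        }
        __asm__ volatile("rdtsc" : "=a"(lo), "=d"(hi));
        return ((uint64_t)hi << 32) | lo;
    }

    int main(void)
    {
        printf("TSC available: %d, sample: %llu\n",
               cpu_has_tsc(), (unsigned long long)read_cycles());
        return 0;
    }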