38 files changed, 0 insertions, 31374 deletions
diff --git a/classification_output/04/other/02364653 b/classification_output/04/other/02364653
deleted file mode 100644
index 5b8062a67..000000000
--- a/classification_output/04/other/02364653
+++ /dev/null
@@ -1,371 +0,0 @@
-other: 0.956
-graphic: 0.948
-semantic: 0.942
-assembly: 0.936
-device: 0.928
-instruction: 0.927
-boot: 0.925
-socket: 0.924
-vnc: 0.922
-mistranslation: 0.912
-KVM: 0.911
-network: 0.881
-
-[Qemu-devel] [BUG] Inappropriate size of target_sigset_t
-
-Hello, Peter, Laurent,
-
-While working on another problem yesterday, I think I discovered a 
-long-standing bug in QEMU Linux user mode: our target_sigset_t structure is 
-eight times smaller as it should be!
-
-In this code segment from syscalls_def.h:
-
-#ifdef TARGET_MIPS
-#define TARGET_NSIG        128
-#else
-#define TARGET_NSIG        64
-#endif
-#define TARGET_NSIG_BPW    TARGET_ABI_BITS
-#define TARGET_NSIG_WORDS  (TARGET_NSIG / TARGET_NSIG_BPW)
-
-typedef struct {
-    abi_ulong sig[TARGET_NSIG_WORDS];
-} target_sigset_t;
-
-... TARGET_ABI_BITS should be replaced by eight times smaller constant (in 
-fact, semantically, we need TARGET_ABI_BYTES, but it is not defined) (what is 
-needed is actually "a byte per signal" in target_sigset_t, and we allow "a bit 
-per signal").
-
-All this probably sounds to you like something impossible, since this code is 
-in QEMU "since forever", but I checked everything, and the bug seems real. I 
-wish you can prove me wrong.
-
-I just wanted to let you know about this, given the sensitive timing of current 
-softfreeze, and the fact that I won't be able to do more investigation on this 
-in coming weeks, since I am busy with other tasks, but perhaps you can analyze 
-and do something which you consider appropriate.
-
-Yours,
-Aleksandar
-
-Le 03/07/2019 Ã  21:46, Aleksandar Markovic a Ã©critÂ :
->
-Hello, Peter, Laurent,
->
->
-While working on another problem yesterday, I think I discovered a
->
-long-standing bug in QEMU Linux user mode: our target_sigset_t structure is
->
-eight times smaller as it should be!
->
->
-In this code segment from syscalls_def.h:
->
->
-#ifdef TARGET_MIPS
->
-#define TARGET_NSIG      128
->
-#else
->
-#define TARGET_NSIG      64
->
-#endif
->
-#define TARGET_NSIG_BPW          TARGET_ABI_BITS
->
-#define TARGET_NSIG_WORDS  (TARGET_NSIG / TARGET_NSIG_BPW)
->
->
-typedef struct {
->
-abi_ulong sig[TARGET_NSIG_WORDS];
->
-} target_sigset_t;
->
->
-... TARGET_ABI_BITS should be replaced by eight times smaller constant (in
->
-fact, semantically, we need TARGET_ABI_BYTES, but it is not defined) (what is
->
-needed is actually "a byte per signal" in target_sigset_t, and we allow "a
->
-bit per signal").
-TARGET_NSIG is divided by TARGET_ABI_BITS which gives you the number of
-abi_ulong words we need in target_sigset_t.
-
->
-All this probably sounds to you like something impossible, since this code is
->
-in QEMU "since forever", but I checked everything, and the bug seems real. I
->
-wish you can prove me wrong.
->
->
-I just wanted to let you know about this, given the sensitive timing of
->
-current softfreeze, and the fact that I won't be able to do more
->
-investigation on this in coming weeks, since I am busy with other tasks, but
->
-perhaps you can analyze and do something which you consider appropriate.
-If I compare with kernel, it looks good:
-
-In Linux:
-
-  arch/mips/include/uapi/asm/signal.h
-
-  #define _NSIG           128
-  #define _NSIG_BPW       (sizeof(unsigned long) * 8)
-  #define _NSIG_WORDS     (_NSIG / _NSIG_BPW)
-
-  typedef struct {
-          unsigned long sig[_NSIG_WORDS];
-  } sigset_t;
-
-_NSIG_BPW is 8 * 8 = 64 on MIPS64 or 4 * 8 = 32 on MIPS
-
-In QEMU:
-
-TARGET_NSIG_BPW is TARGET_ABI_BITS which is  TARGET_LONG_BITS which is
-64 on MIPS64 and 32 on MIPS.
-
-I think there is no problem.
-
-Thanks,
-Laurent
-
-From: Laurent Vivier <address@hidden>
->
-If I compare with kernel, it looks good:
->
-...
->
-I think there is no problem.
-Sure, thanks for such fast response - again, I am glad if you are right. 
-However, for some reason, glibc (and musl too) define sigset_t differently than 
-kernel. Please take a look. I am not sure if this is covered fine in our code.
-
-Yours,
-Aleksandar
-
->
-Thanks,
->
-Laurent
-
-On Wed, 3 Jul 2019 at 21:20, Aleksandar Markovic <address@hidden> wrote:
->
->
-From: Laurent Vivier <address@hidden>
->
-> If I compare with kernel, it looks good:
->
-> ...
->
-> I think there is no problem.
->
->
-Sure, thanks for such fast response - again, I am glad if you are right.
->
-However, for some reason, glibc (and musl too) define sigset_t differently
->
-than kernel. Please take a look. I am not sure if this is covered fine in our
->
-code.
-Yeah, the libc definitions of sigset_t don't match the
-kernel ones (this is for obscure historical reasons IIRC).
-We're providing implementations of the target
-syscall interface, so our target_sigset_t should be the
-target kernel's version (and the target libc's version doesn't
-matter to us). On the other hand we will be using the
-host libc version, I think, so a little caution is required
-and it's possible we have some bugs in our code.
-
-thanks
--- PMM
-
->
-From: Peter Maydell <address@hidden>
->
->
-On Wed, 3 Jul 2019 at 21:20, Aleksandar Markovic <address@hidden> wrote:
->
->
->
-> From: Laurent Vivier <address@hidden>
->
-> > If I compare with kernel, it looks good:
->
-> > ...
->
-> > I think there is no problem.
->
->
->
-> Sure, thanks for such fast response - again, I am glad if you are right.
->
-> However, for some reason, glibc (and musl too) define sigset_t differently
->
-> than kernel. Please take a look. I am not sure if this is covered fine in
->
-> our code.
->
->
-Yeah, the libc definitions of sigset_t don't match the
->
-kernel ones (this is for obscure historical reasons IIRC).
->
-We're providing implementations of the target
->
-syscall interface, so our target_sigset_t should be the
->
-target kernel's version (and the target libc's version doesn't
->
-matter to us). On the other hand we will be using the
->
-host libc version, I think, so a little caution is required
->
-and it's possible we have some bugs in our code.
-OK, I gather than this is not something that requires our immediate attention 
-(for 4.1), but we can analyze it later on.
-
-Thanks for response!!
-
-Sincerely,
-Aleksandar
-
->
-thanks
->
--- PMM
-
-Le 03/07/2019 Ã  22:28, Peter Maydell a Ã©critÂ :
->
-On Wed, 3 Jul 2019 at 21:20, Aleksandar Markovic <address@hidden> wrote:
->
->
->
-> From: Laurent Vivier <address@hidden>
->
->> If I compare with kernel, it looks good:
->
->> ...
->
->> I think there is no problem.
->
->
->
-> Sure, thanks for such fast response - again, I am glad if you are right.
->
-> However, for some reason, glibc (and musl too) define sigset_t differently
->
-> than kernel. Please take a look. I am not sure if this is covered fine in
->
-> our code.
->
->
-Yeah, the libc definitions of sigset_t don't match the
->
-kernel ones (this is for obscure historical reasons IIRC).
->
-We're providing implementations of the target
->
-syscall interface, so our target_sigset_t should be the
->
-target kernel's version (and the target libc's version doesn't
->
-matter to us). On the other hand we will be using the
->
-host libc version, I think, so a little caution is required
->
-and it's possible we have some bugs in our code.
-It's why we need host_to_target_sigset_internal() and
-target_to_host_sigset_internal() that translates bits and bytes between
-guest kernel interface and host libc interface.
-
-void host_to_target_sigset_internal(target_sigset_t *d,
-                                    const sigset_t *s)
-{
-    int i;
-    target_sigemptyset(d);
-    for (i = 1; i <= TARGET_NSIG; i++) {
-        if (sigismember(s, i)) {
-            target_sigaddset(d, host_to_target_signal(i));
-        }
-    }
-}
-
-void target_to_host_sigset_internal(sigset_t *d,
-                                    const target_sigset_t *s)
-{
-    int i;
-    sigemptyset(d);
-    for (i = 1; i <= TARGET_NSIG; i++) {
-        if (target_sigismember(s, i)) {
-            sigaddset(d, target_to_host_signal(i));
-        }
-    }
-}
-
-Thanks,
-Laurent
-
-Hi Aleksandar,
-
-On Wed, Jul 3, 2019 at 12:48 PM Aleksandar Markovic
-<address@hidden> wrote:
->
-#define TARGET_NSIG_BPW    TARGET_ABI_BITS
->
-#define TARGET_NSIG_WORDS  (TARGET_NSIG / TARGET_NSIG_BPW)
->
->
-typedef struct {
->
-abi_ulong sig[TARGET_NSIG_WORDS];
->
-} target_sigset_t;
->
->
-... TARGET_ABI_BITS should be replaced by eight times smaller constant (in
->
-fact,
->
-semantically, we need TARGET_ABI_BYTES, but it is not defined) (what is needed
->
-is actually "a byte per signal" in target_sigset_t, and we allow "a bit per
->
-signal").
-Why do we need a byte per target signal, if the functions in linux-user/signal.c
-operate with bits?
-
--- 
-Thanks.
--- Max
-
->
-Why do we need a byte per target signal, if the functions in
->
-linux-user/signal.c
->
-operate with bits?
-Max,
-
-I did not base my findings on code analysis, but on dumping size/offsets of 
-elements of some structures, as they are emulated in QEMU, and in real systems. 
-So, I can't really answer your question.
-
-Yours,
-Aleksandar
-
->
---
->
-Thanks.
->
--- Max
-
diff --git a/classification_output/04/other/02572177 b/classification_output/04/other/02572177
deleted file mode 100644
index 10871daea..000000000
--- a/classification_output/04/other/02572177
+++ /dev/null
@@ -1,429 +0,0 @@
-other: 0.869
-instruction: 0.794
-device: 0.791
-semantic: 0.770
-assembly: 0.756
-graphic: 0.747
-socket: 0.742
-network: 0.708
-vnc: 0.706
-mistranslation: 0.693
-KVM: 0.669
-boot: 0.658
-
-[Qemu-devel] 答复: Re:  [BUG]COLO failover hang
-
-hi.
-
-
-I test the git qemu master have the same problem.
-
-
-(gdb) bt
-
-
-#0  qio_channel_socket_readv (ioc=0x7f65911b4e50, iov=0x7f64ef3fd880, niov=1, 
-fds=0x0, nfds=0x0, errp=0x0) at io/channel-socket.c:461
-
-
-#1  0x00007f658e4aa0c2 in qio_channel_read (address@hidden, address@hidden "", 
-address@hidden, address@hidden) at io/channel.c:114
-
-
-#2  0x00007f658e3ea990 in channel_get_buffer (opaque=ï¼optimized outï¼, 
-buf=0x7f65907cb838 "", pos=ï¼optimized outï¼, size=32768) at 
-migration/qemu-file-channel.c:78
-
-
-#3  0x00007f658e3e97fc in qemu_fill_buffer (f=0x7f65907cb800) at 
-migration/qemu-file.c:295
-
-
-#4  0x00007f658e3ea2e1 in qemu_peek_byte (address@hidden, address@hidden) at 
-migration/qemu-file.c:555
-
-
-#5  0x00007f658e3ea34b in qemu_get_byte (address@hidden) at 
-migration/qemu-file.c:568
-
-
-#6  0x00007f658e3ea552 in qemu_get_be32 (address@hidden) at 
-migration/qemu-file.c:648
-
-
-#7  0x00007f658e3e66e5 in colo_receive_message (f=0x7f65907cb800, 
-address@hidden) at migration/colo.c:244
-
-
-#8  0x00007f658e3e681e in colo_receive_check_message (f=ï¼optimized outï¼, 
-address@hidden, address@hidden)
-
-
-    at migration/colo.c:264
-
-
-#9  0x00007f658e3e740e in colo_process_incoming_thread (opaque=0x7f658eb30360 
-ï¼mis_current.31286ï¼) at migration/colo.c:577
-
-
-#10 0x00007f658be09df3 in start_thread () from /lib64/libpthread.so.0
-
-
-#11 0x00007f65881983ed in clone () from /lib64/libc.so.6
-
-
-(gdb) p ioc-ï¼name
-
-
-$2 = 0x7f658ff7d5c0 "migration-socket-incoming"
-
-
-(gdb) p ioc-ï¼features          Do not support QIO_CHANNEL_FEATURE_SHUTDOWN
-
-
-$3 = 0
-
-
-
-
-
-(gdb) bt
-
-
-#0  socket_accept_incoming_migration (ioc=0x7fdcceeafa90, condition=G_IO_IN, 
-opaque=0x7fdcceeafa90) at migration/socket.c:137
-
-
-#1  0x00007fdcc6966350 in g_main_dispatch (context=ï¼optimized outï¼) at 
-gmain.c:3054
-
-
-#2  g_main_context_dispatch (context=ï¼optimized outï¼, address@hidden) at 
-gmain.c:3630
-
-
-#3  0x00007fdccb8a6dcc in glib_pollfds_poll () at util/main-loop.c:213
-
-
-#4  os_host_main_loop_wait (timeout=ï¼optimized outï¼) at util/main-loop.c:258
-
-
-#5  main_loop_wait (address@hidden) at util/main-loop.c:506
-
-
-#6  0x00007fdccb526187 in main_loop () at vl.c:1898
-
-
-#7  main (argc=ï¼optimized outï¼, argv=ï¼optimized outï¼, envp=ï¼optimized outï¼) at 
-vl.c:4709
-
-
-(gdb) p ioc-ï¼features
-
-
-$1 = 6
-
-
-(gdb) p ioc-ï¼name
-
-
-$2 = 0x7fdcce1b1ab0 "migration-socket-listener"
-
-
-
-
-
-May be socket_accept_incoming_migration should call 
-qio_channel_set_feature(ioc, QIO_CHANNEL_FEATURE_SHUTDOWN)??
-
-
-
-
-
-thank you.
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-åå§é®ä»¶
-
-
-
-åä»¶äººï¼ address@hidden
-æ¶ä»¶äººï¼çå¹¿10165992 address@hidden
-æéäººï¼ address@hidden address@hidden
-æ¥ æ ï¼2017å¹´03æ16æ¥ 14:46
-ä¸» é¢ ï¼Re: [Qemu-devel] COLO failover hang
-
-
-
-
-
-
-
-On 03/15/2017 05:06 PM, wangguang wrote:
-ï¼   am testing QEMU COLO feature described here [QEMU
-ï¼ Wiki](
-http://wiki.qemu-project.org/Features/COLO
-).
-ï¼
-ï¼ When the Primary Node panic,the Secondary Node qemu hang.
-ï¼ hang at recvmsg in qio_channel_socket_readv.
-ï¼ And  I run  { 'execute': 'nbd-server-stop' } and { "execute":
-ï¼ "x-colo-lost-heartbeat" } in Secondary VM's
-ï¼ monitor,the  Secondary Node qemu still hang at recvmsg .
-ï¼
-ï¼ I found that the colo in qemu is not complete yet.
-ï¼ Do the colo have any plan for development?
-
-Yes, We are developing. You can see some of patch we pushing.
-
-ï¼ Has anyone ever run it successfully? Any help is appreciated!
-
-In our internal version can run it successfully,
-The failover detail you can ask Zhanghailiang for help.
-Next time if you have some question about COLO,
-please cc me and zhanghailiang address@hidden
-
-
-Thanks
-Zhang Chen
-
-
-ï¼
-ï¼
-ï¼
-ï¼ centos7.2+qemu2.7.50
-ï¼ (gdb) bt
-ï¼ #0  0x00007f3e00cc86ad in recvmsg () from /lib64/libpthread.so.0
-ï¼ #1  0x00007f3e0332b738 in qio_channel_socket_readv (ioc=ï¼optimized outï¼,
-ï¼ iov=ï¼optimized outï¼, niov=ï¼optimized outï¼, fds=0x0, nfds=0x0, errp=0x0) at
-ï¼ io/channel-socket.c:497
-ï¼ #2  0x00007f3e03329472 in qio_channel_read (address@hidden,
-ï¼ address@hidden "", address@hidden,
-ï¼ address@hidden) at io/channel.c:97
-ï¼ #3  0x00007f3e032750e0 in channel_get_buffer (opaque=ï¼optimized outï¼,
-ï¼ buf=0x7f3e05910f38 "", pos=ï¼optimized outï¼, size=32768) at
-ï¼ migration/qemu-file-channel.c:78
-ï¼ #4  0x00007f3e0327412c in qemu_fill_buffer (f=0x7f3e05910f00) at
-ï¼ migration/qemu-file.c:257
-ï¼ #5  0x00007f3e03274a41 in qemu_peek_byte (address@hidden,
-ï¼ address@hidden) at migration/qemu-file.c:510
-ï¼ #6  0x00007f3e03274aab in qemu_get_byte (address@hidden) at
-ï¼ migration/qemu-file.c:523
-ï¼ #7  0x00007f3e03274cb2 in qemu_get_be32 (address@hidden) at
-ï¼ migration/qemu-file.c:603
-ï¼ #8  0x00007f3e03271735 in colo_receive_message (f=0x7f3e05910f00,
-ï¼ address@hidden) at migration/colo.c:215
-ï¼ #9  0x00007f3e0327250d in colo_wait_handle_message (errp=0x7f3d62bfaa48,
-ï¼ checkpoint_request=ï¼synthetic pointerï¼, f=ï¼optimized outï¼) at
-ï¼ migration/colo.c:546
-ï¼ #10 colo_process_incoming_thread (opaque=0x7f3e067245e0) at
-ï¼ migration/colo.c:649
-ï¼ #11 0x00007f3e00cc1df3 in start_thread () from /lib64/libpthread.so.0
-ï¼ #12 0x00007f3dfc9c03ed in clone () from /lib64/libc.so.6
-ï¼
-ï¼
-ï¼
-ï¼
-ï¼
-ï¼ --
-ï¼ View this message in context:
-http://qemu.11.n7.nabble.com/COLO-failover-hang-tp473250.html
-ï¼ Sent from the Developer mailing list archive at Nabble.com.
-ï¼
-ï¼
-ï¼
-ï¼
-
--- 
-Thanks
-Zhang Chen
-
-Hi,Wang.
-
-You can test this branch:
-https://github.com/coloft/qemu/tree/colo-v5.1-developing-COLO-frame-v21-with-shared-disk
-and please follow wiki ensure your own configuration correctly.
-http://wiki.qemu-project.org/Features/COLO
-Thanks
-
-Zhang Chen
-
-
-On 03/21/2017 03:27 PM, address@hidden wrote:
-hi.
-
-I test the git qemu master have the same problem.
-
-(gdb) bt
-#0  qio_channel_socket_readv (ioc=0x7f65911b4e50, iov=0x7f64ef3fd880,
-niov=1, fds=0x0, nfds=0x0, errp=0x0) at io/channel-socket.c:461
-#1  0x00007f658e4aa0c2 in qio_channel_read
-(address@hidden, address@hidden "",
-address@hidden, address@hidden) at io/channel.c:114
-#2  0x00007f658e3ea990 in channel_get_buffer (opaque=ï¼optimized outï¼,
-buf=0x7f65907cb838 "", pos=ï¼optimized outï¼, size=32768) at
-migration/qemu-file-channel.c:78
-#3  0x00007f658e3e97fc in qemu_fill_buffer (f=0x7f65907cb800) at
-migration/qemu-file.c:295
-#4  0x00007f658e3ea2e1 in qemu_peek_byte (address@hidden,
-address@hidden) at migration/qemu-file.c:555
-#5  0x00007f658e3ea34b in qemu_get_byte (address@hidden) at
-migration/qemu-file.c:568
-#6  0x00007f658e3ea552 in qemu_get_be32 (address@hidden) at
-migration/qemu-file.c:648
-#7  0x00007f658e3e66e5 in colo_receive_message (f=0x7f65907cb800,
-address@hidden) at migration/colo.c:244
-#8  0x00007f658e3e681e in colo_receive_check_message (f=ï¼optimized
-outï¼, address@hidden,
-address@hidden)
-at migration/colo.c:264
-#9  0x00007f658e3e740e in colo_process_incoming_thread
-(opaque=0x7f658eb30360 ï¼mis_current.31286ï¼) at migration/colo.c:577
-#10 0x00007f658be09df3 in start_thread () from /lib64/libpthread.so.0
-
-#11 0x00007f65881983ed in clone () from /lib64/libc.so.6
-
-(gdb) p ioc-ï¼name
-
-$2 = 0x7f658ff7d5c0 "migration-socket-incoming"
-
-(gdb) p ioc-ï¼features        Do not support QIO_CHANNEL_FEATURE_SHUTDOWN
-
-$3 = 0
-
-
-(gdb) bt
-#0  socket_accept_incoming_migration (ioc=0x7fdcceeafa90,
-condition=G_IO_IN, opaque=0x7fdcceeafa90) at migration/socket.c:137
-#1  0x00007fdcc6966350 in g_main_dispatch (context=ï¼optimized outï¼) at
-gmain.c:3054
-#2  g_main_context_dispatch (context=ï¼optimized outï¼,
-address@hidden) at gmain.c:3630
-#3  0x00007fdccb8a6dcc in glib_pollfds_poll () at util/main-loop.c:213
-#4  os_host_main_loop_wait (timeout=ï¼optimized outï¼) at
-util/main-loop.c:258
-#5  main_loop_wait (address@hidden) at
-util/main-loop.c:506
-#6  0x00007fdccb526187 in main_loop () at vl.c:1898
-#7  main (argc=ï¼optimized outï¼, argv=ï¼optimized outï¼, envp=ï¼optimized
-outï¼) at vl.c:4709
-(gdb) p ioc-ï¼features
-
-$1 = 6
-
-(gdb) p ioc-ï¼name
-
-$2 = 0x7fdcce1b1ab0 "migration-socket-listener"
-May be socket_accept_incoming_migration should
-call qio_channel_set_feature(ioc, QIO_CHANNEL_FEATURE_SHUTDOWN)??
-thank you.
-
-
-
-
-
-åå§é®ä»¶
-address@hidden;
-*æ¶ä»¶äººï¼*çå¹¿10165992;address@hidden;
-address@hidden;address@hidden;
-*æ¥ æ ï¼*2017å¹´03æ16æ¥ 14:46
-*ä¸» é¢ ï¼**Re: [Qemu-devel] COLO failover hang*
-
-
-
-
-On 03/15/2017 05:06 PM, wangguang wrote:
-ï¼   am testing QEMU COLO feature described here [QEMU
-ï¼ Wiki](
-http://wiki.qemu-project.org/Features/COLO
-).
-ï¼
-ï¼ When the Primary Node panic,the Secondary Node qemu hang.
-ï¼ hang at recvmsg in qio_channel_socket_readv.
-ï¼ And  I run  { 'execute': 'nbd-server-stop' } and { "execute":
-ï¼ "x-colo-lost-heartbeat" } in Secondary VM's
-ï¼ monitor,the  Secondary Node qemu still hang at recvmsg .
-ï¼
-ï¼ I found that the colo in qemu is not complete yet.
-ï¼ Do the colo have any plan for development?
-
-Yes, We are developing. You can see some of patch we pushing.
-
-ï¼ Has anyone ever run it successfully? Any help is appreciated!
-
-In our internal version can run it successfully,
-The failover detail you can ask Zhanghailiang for help.
-Next time if you have some question about COLO,
-please cc me and zhanghailiang address@hidden
-
-
-Thanks
-Zhang Chen
-
-
-ï¼
-ï¼
-ï¼
-ï¼ centos7.2+qemu2.7.50
-ï¼ (gdb) bt
-ï¼ #0  0x00007f3e00cc86ad in recvmsg () from /lib64/libpthread.so.0
-ï¼ #1  0x00007f3e0332b738 in qio_channel_socket_readv (ioc=ï¼optimized outï¼,
-ï¼ iov=ï¼optimized outï¼, niov=ï¼optimized outï¼, fds=0x0, nfds=0x0, errp=0x0) at
-ï¼ io/channel-socket.c:497
-ï¼ #2  0x00007f3e03329472 in qio_channel_read (address@hidden,
-ï¼ address@hidden "", address@hidden,
-ï¼ address@hidden) at io/channel.c:97
-ï¼ #3  0x00007f3e032750e0 in channel_get_buffer (opaque=ï¼optimized outï¼,
-ï¼ buf=0x7f3e05910f38 "", pos=ï¼optimized outï¼, size=32768) at
-ï¼ migration/qemu-file-channel.c:78
-ï¼ #4  0x00007f3e0327412c in qemu_fill_buffer (f=0x7f3e05910f00) at
-ï¼ migration/qemu-file.c:257
-ï¼ #5  0x00007f3e03274a41 in qemu_peek_byte (address@hidden,
-ï¼ address@hidden) at migration/qemu-file.c:510
-ï¼ #6  0x00007f3e03274aab in qemu_get_byte (address@hidden) at
-ï¼ migration/qemu-file.c:523
-ï¼ #7  0x00007f3e03274cb2 in qemu_get_be32 (address@hidden) at
-ï¼ migration/qemu-file.c:603
-ï¼ #8  0x00007f3e03271735 in colo_receive_message (f=0x7f3e05910f00,
-ï¼ address@hidden) at migration/colo.c:215
-ï¼ #9  0x00007f3e0327250d in colo_wait_handle_message (errp=0x7f3d62bfaa48,
-ï¼ checkpoint_request=ï¼synthetic pointerï¼, f=ï¼optimized outï¼) at
-ï¼ migration/colo.c:546
-ï¼ #10 colo_process_incoming_thread (opaque=0x7f3e067245e0) at
-ï¼ migration/colo.c:649
-ï¼ #11 0x00007f3e00cc1df3 in start_thread () from /lib64/libpthread.so.0
-ï¼ #12 0x00007f3dfc9c03ed in clone () from /lib64/libc.so.6
-ï¼
-ï¼
-ï¼
-ï¼
-ï¼
-ï¼ --
-ï¼ View this message in context:
-http://qemu.11.n7.nabble.com/COLO-failover-hang-tp473250.html
-ï¼ Sent from the Developer mailing list archive at Nabble.com.
-ï¼
-ï¼
-ï¼
-ï¼
-
---
-Thanks
-Zhang Chen
---
-Thanks
-Zhang Chen
-
diff --git a/classification_output/04/other/12869209 b/classification_output/04/other/12869209
deleted file mode 100644
index 6e8310c01..000000000
--- a/classification_output/04/other/12869209
+++ /dev/null
@@ -1,96 +0,0 @@
-other: 0.964
-device: 0.951
-assembly: 0.937
-mistranslation: 0.935
-instruction: 0.919
-socket: 0.906
-semantic: 0.891
-vnc: 0.885
-graphic: 0.879
-network: 0.858
-KVM: 0.857
-boot: 0.837
-
-[BUG FIX][PATCH v3 0/3] vhost-user-blk: fix bug on device disconnection during initialization
-
-This is a series fixing a bug in
-          host-user-blk.
-Is there any chance for it to be considered for the next rc?
-Thanks!
-Denis
-On 29.03.2021 16:44, Denis Plotnikov
-      wrote:
-ping!
-On 25.03.2021 18:12, Denis Plotnikov
-        wrote:
-v3:
-  * 0003: a new patch added fixing the problem on vm shutdown
-    I stumbled on this bug after v2 sending.
-  * 0001: gramma fixing (Raphael)
-  * 0002: commit message fixing (Raphael)
-
-v2:
-  * split the initial patch into two (Raphael)
-  * rename init to realized (Raphael)
-  * remove unrelated comment (Raphael)
-
-When the vhost-user-blk device lose the connection to the daemon during
-the initialization phase it kills qemu because of the assert in the code.
-The series fixes the bug.
-
-0001 is preparation for the fix
-0002 fixes the bug, patch description has the full motivation for the series
-0003 (added in v3) fix bug on vm shutdown
-
-Denis Plotnikov (3):
-  vhost-user-blk: use different event handlers on initialization
-  vhost-user-blk: perform immediate cleanup if disconnect on
-    initialization
-  vhost-user-blk: add immediate cleanup on shutdown
-
- hw/block/vhost-user-blk.c | 79 ++++++++++++++++++++++++---------------
- 1 file changed, 48 insertions(+), 31 deletions(-)
-
-On 01.04.2021 14:21, Denis Plotnikov wrote:
-This is a series fixing a bug in host-user-blk.
-More specifically, it's not just a bug but crasher.
-
-Valentine
-Is there any chance for it to be considered for the next rc?
-
-Thanks!
-
-Denis
-
-On 29.03.2021 16:44, Denis Plotnikov wrote:
-ping!
-
-On 25.03.2021 18:12, Denis Plotnikov wrote:
-v3:
-   * 0003: a new patch added fixing the problem on vm shutdown
-     I stumbled on this bug after v2 sending.
-   * 0001: gramma fixing (Raphael)
-   * 0002: commit message fixing (Raphael)
-
-v2:
-   * split the initial patch into two (Raphael)
-   * rename init to realized (Raphael)
-   * remove unrelated comment (Raphael)
-
-When the vhost-user-blk device lose the connection to the daemon during
-the initialization phase it kills qemu because of the assert in the code.
-The series fixes the bug.
-
-0001 is preparation for the fix
-0002 fixes the bug, patch description has the full motivation for the series
-0003 (added in v3) fix bug on vm shutdown
-
-Denis Plotnikov (3):
-   vhost-user-blk: use different event handlers on initialization
-   vhost-user-blk: perform immediate cleanup if disconnect on
-     initialization
-   vhost-user-blk: add immediate cleanup on shutdown
-
-  hw/block/vhost-user-blk.c | 79 ++++++++++++++++++++++++---------------
-  1 file changed, 48 insertions(+), 31 deletions(-)
-
diff --git a/classification_output/04/other/13442371 b/classification_output/04/other/13442371
deleted file mode 100644
index 189f6cbb8..000000000
--- a/classification_output/04/other/13442371
+++ /dev/null
@@ -1,377 +0,0 @@
-other: 0.886
-device: 0.883
-assembly: 0.877
-KVM: 0.872
-instruction: 0.861
-mistranslation: 0.859
-vnc: 0.858
-graphic: 0.850
-semantic: 0.850
-socket: 0.831
-boot: 0.815
-network: 0.811
-
-[Qemu-devel] [BUG] nanoMIPS support problem related to extract2 support for i386 TCG target
-
-Hello, Richard, Peter, and others.
-
-As a part of activities before 4.1 release, I tested nanoMIPS support
-in QEMU (which was officially fully integrated in 4.0, is currently
-limited to system mode only, and was tested in a similar fashion right
-prior to 4.0).
-
-This support appears to be broken now. Following command line works in
-4.0, but results in kernel panic for the current tip of the tree:
-
-~/Build/qemu-test-revert-c6fb8c0cf704/mipsel-softmmu/qemu-system-mipsel
--cpu I7200 -kernel generic_nano32r6el_page4k -M malta -serial stdio -m
-1G -hda nanomips32r6_le_sf_2017.05-03-59-gf5595d6.ext4 -append
-"mem=256m@0x0 rw console=ttyS0 vga=cirrus vesa=0x111 root=/dev/sda"
-
-(kernel and rootfs image files used in this commend line can be
-downloaded from the locations mentioned in our user guide)
-
-The quick bisect points to the commit:
-
-commit c6fb8c0cf704c4a1a48c3e99e995ad4c58150dab
-Author: Richard Henderson <address@hidden>
-Date:   Mon Feb 25 11:42:35 2019 -0800
-
-    tcg/i386: Support INDEX_op_extract2_{i32,i64}
-
-    Signed-off-by: Richard Henderson <address@hidden>
-
-Please advise on further actions.
-
-Yours,
-Aleksandar
-
-On Fri, Jul 12, 2019 at 8:09 PM Aleksandar Markovic
-<address@hidden> wrote:
->
->
-Hello, Richard, Peter, and others.
->
->
-As a part of activities before 4.1 release, I tested nanoMIPS support
->
-in QEMU (which was officially fully integrated in 4.0, is currently
->
-limited to system mode only, and was tested in a similar fashion right
->
-prior to 4.0).
->
->
-This support appears to be broken now. Following command line works in
->
-4.0, but results in kernel panic for the current tip of the tree:
->
->
-~/Build/qemu-test-revert-c6fb8c0cf704/mipsel-softmmu/qemu-system-mipsel
->
--cpu I7200 -kernel generic_nano32r6el_page4k -M malta -serial stdio -m
->
-1G -hda nanomips32r6_le_sf_2017.05-03-59-gf5595d6.ext4 -append
->
-"mem=256m@0x0 rw console=ttyS0 vga=cirrus vesa=0x111 root=/dev/sda"
->
->
-(kernel and rootfs image files used in this commend line can be
->
-downloaded from the locations mentioned in our user guide)
->
->
-The quick bisect points to the commit:
->
->
-commit c6fb8c0cf704c4a1a48c3e99e995ad4c58150dab
->
-Author: Richard Henderson <address@hidden>
->
-Date:   Mon Feb 25 11:42:35 2019 -0800
->
->
-tcg/i386: Support INDEX_op_extract2_{i32,i64}
->
->
-Signed-off-by: Richard Henderson <address@hidden>
->
->
-Please advise on further actions.
->
-Just to add a data point:
-
-If the following change is applied:
-
-diff --git a/tcg/i386/tcg-target.h b/tcg/i386/tcg-target.h
-index 928e8b8..b6a4cf2 100644
---- a/tcg/i386/tcg-target.h
-+++ b/tcg/i386/tcg-target.h
-@@ -124,7 +124,7 @@ extern bool have_avx2;
- #define TCG_TARGET_HAS_deposit_i32      1
- #define TCG_TARGET_HAS_extract_i32      1
- #define TCG_TARGET_HAS_sextract_i32     1
--#define TCG_TARGET_HAS_extract2_i32     1
-+#define TCG_TARGET_HAS_extract2_i32     0
- #define TCG_TARGET_HAS_movcond_i32      1
- #define TCG_TARGET_HAS_add2_i32         1
- #define TCG_TARGET_HAS_sub2_i32         1
-@@ -163,7 +163,7 @@ extern bool have_avx2;
- #define TCG_TARGET_HAS_deposit_i64      1
- #define TCG_TARGET_HAS_extract_i64      1
- #define TCG_TARGET_HAS_sextract_i64     0
--#define TCG_TARGET_HAS_extract2_i64     1
-+#define TCG_TARGET_HAS_extract2_i64     0
- #define TCG_TARGET_HAS_movcond_i64      1
- #define TCG_TARGET_HAS_add2_i64         1
- #define TCG_TARGET_HAS_sub2_i64         1
-
-... the problem disappears.
-
-
->
-Yours,
->
-Aleksandar
-
-On Fri, Jul 12, 2019 at 8:19 PM Aleksandar Markovic
-<address@hidden> wrote:
->
->
-On Fri, Jul 12, 2019 at 8:09 PM Aleksandar Markovic
->
-<address@hidden> wrote:
->
->
->
-> Hello, Richard, Peter, and others.
->
->
->
-> As a part of activities before 4.1 release, I tested nanoMIPS support
->
-> in QEMU (which was officially fully integrated in 4.0, is currently
->
-> limited to system mode only, and was tested in a similar fashion right
->
-> prior to 4.0).
->
->
->
-> This support appears to be broken now. Following command line works in
->
-> 4.0, but results in kernel panic for the current tip of the tree:
->
->
->
-> ~/Build/qemu-test-revert-c6fb8c0cf704/mipsel-softmmu/qemu-system-mipsel
->
-> -cpu I7200 -kernel generic_nano32r6el_page4k -M malta -serial stdio -m
->
-> 1G -hda nanomips32r6_le_sf_2017.05-03-59-gf5595d6.ext4 -append
->
-> "mem=256m@0x0 rw console=ttyS0 vga=cirrus vesa=0x111 root=/dev/sda"
->
->
->
-> (kernel and rootfs image files used in this commend line can be
->
-> downloaded from the locations mentioned in our user guide)
->
->
->
-> The quick bisect points to the commit:
->
->
->
-> commit c6fb8c0cf704c4a1a48c3e99e995ad4c58150dab
->
-> Author: Richard Henderson <address@hidden>
->
-> Date:   Mon Feb 25 11:42:35 2019 -0800
->
->
->
->     tcg/i386: Support INDEX_op_extract2_{i32,i64}
->
->
->
->     Signed-off-by: Richard Henderson <address@hidden>
->
->
->
-> Please advise on further actions.
->
->
->
->
-Just to add a data point:
->
->
-If the following change is applied:
->
->
-diff --git a/tcg/i386/tcg-target.h b/tcg/i386/tcg-target.h
->
-index 928e8b8..b6a4cf2 100644
->
---- a/tcg/i386/tcg-target.h
->
-+++ b/tcg/i386/tcg-target.h
->
-@@ -124,7 +124,7 @@ extern bool have_avx2;
->
-#define TCG_TARGET_HAS_deposit_i32      1
->
-#define TCG_TARGET_HAS_extract_i32      1
->
-#define TCG_TARGET_HAS_sextract_i32     1
->
--#define TCG_TARGET_HAS_extract2_i32     1
->
-+#define TCG_TARGET_HAS_extract2_i32     0
->
-#define TCG_TARGET_HAS_movcond_i32      1
->
-#define TCG_TARGET_HAS_add2_i32         1
->
-#define TCG_TARGET_HAS_sub2_i32         1
->
-@@ -163,7 +163,7 @@ extern bool have_avx2;
->
-#define TCG_TARGET_HAS_deposit_i64      1
->
-#define TCG_TARGET_HAS_extract_i64      1
->
-#define TCG_TARGET_HAS_sextract_i64     0
->
--#define TCG_TARGET_HAS_extract2_i64     1
->
-+#define TCG_TARGET_HAS_extract2_i64     0
->
-#define TCG_TARGET_HAS_movcond_i64      1
->
-#define TCG_TARGET_HAS_add2_i64         1
->
-#define TCG_TARGET_HAS_sub2_i64         1
->
->
-... the problem disappears.
->
-It looks the problem is in this code segment in of tcg_gen_deposit_i32():
-
-        if (ofs == 0) {
-            tcg_gen_extract2_i32(ret, arg1, arg2, len);
-            tcg_gen_rotli_i32(ret, ret, len);
-            goto done;
-        }
-
-)
-
-If that code segment is deleted altogether (which effectively forces
-usage of "fallback" part of tcg_gen_deposit_i32()), the problem also
-vanishes (without changes from my previous mail).
-
->
->
-> Yours,
->
-> Aleksandar
-
-Aleksandar Markovic <address@hidden> writes:
-
->
-Hello, Richard, Peter, and others.
->
->
-As a part of activities before 4.1 release, I tested nanoMIPS support
->
-in QEMU (which was officially fully integrated in 4.0, is currently
->
-limited to system mode only, and was tested in a similar fashion right
->
-prior to 4.0).
->
->
-This support appears to be broken now. Following command line works in
->
-4.0, but results in kernel panic for the current tip of the tree:
->
->
-~/Build/qemu-test-revert-c6fb8c0cf704/mipsel-softmmu/qemu-system-mipsel
->
--cpu I7200 -kernel generic_nano32r6el_page4k -M malta -serial stdio -m
->
-1G -hda nanomips32r6_le_sf_2017.05-03-59-gf5595d6.ext4 -append
->
-"mem=256m@0x0 rw console=ttyS0 vga=cirrus vesa=0x111 root=/dev/sda"
->
->
-(kernel and rootfs image files used in this commend line can be
->
-downloaded from the locations mentioned in our user guide)
->
->
-The quick bisect points to the commit:
->
->
-commit c6fb8c0cf704c4a1a48c3e99e995ad4c58150dab
->
-Author: Richard Henderson <address@hidden>
->
-Date:   Mon Feb 25 11:42:35 2019 -0800
->
->
-tcg/i386: Support INDEX_op_extract2_{i32,i64}
->
->
-Signed-off-by: Richard Henderson <address@hidden>
->
->
-Please advise on further actions.
-Please see the fix:
-
-  Subject: [PATCH for-4.1] tcg: Fix constant folding of INDEX_op_extract2_i32
-  Date: Tue,  9 Jul 2019 14:19:00 +0200
-  Message-Id: <address@hidden>
-
->
->
-Yours,
->
-Aleksandar
---
-Alex BennÃ©e
-
-On Sat, Jul 13, 2019 at 9:21 AM Alex BennÃ©e <address@hidden> wrote:
->
->
-Please see the fix:
->
->
-Subject: [PATCH for-4.1] tcg: Fix constant folding of INDEX_op_extract2_i32
->
-Date: Tue,  9 Jul 2019 14:19:00 +0200
->
-Message-Id: <address@hidden>
->
-Thanks, this fixed the behavior.
-
-Sincerely,
-Aleksandar
-
->
->
->
->
-> Yours,
->
-> Aleksandar
->
->
->
---
->
-Alex BennÃ©e
->
-
diff --git a/classification_output/04/other/16056596 b/classification_output/04/other/16056596
deleted file mode 100644
index c08ed7aa7..000000000
--- a/classification_output/04/other/16056596
+++ /dev/null
@@ -1,106 +0,0 @@
-other: 0.980
-semantic: 0.979
-assembly: 0.975
-instruction: 0.975
-device: 0.973
-boot: 0.971
-graphic: 0.970
-mistranslation: 0.961
-socket: 0.952
-vnc: 0.946
-network: 0.940
-KVM: 0.934
-
-[BUG][powerpc] KVM Guest Boot Failure and Hang at "Booting Linux via __start()"
-
-Bug Description:
-Encountering a boot failure when launching a KVM guest with
-'qemu-system-ppc64'. The guest hangs at boot, and the QEMU monitor
-crashes.
-Reproduction Steps:
-# qemu-system-ppc64 --version
-QEMU emulator version 9.2.50 (v9.2.0-2799-g0462a32b4f)
-Copyright (c) 2003-2025 Fabrice Bellard and the QEMU Project developers
-# /usr/bin/qemu-system-ppc64 -name avocado-vt-vm1 -machine
-pseries,accel=kvm \
--m 32768 -smp 32,sockets=1,cores=32,threads=1 -nographic \
-  -device virtio-scsi-pci,id=scsi \
--drive
-file=/home/kvmci/tests/data/avocado-vt/images/rhel8.0devel-ppc64le.qcow2,if=none,id=drive0,format=qcow2
-\
--device scsi-hd,drive=drive0,bus=scsi.0 \
-  -netdev bridge,id=net0,br=virbr0 \
-  -device virtio-net-pci,netdev=net0 \
-  -serial pty \
-  -device virtio-balloon-pci \
-  -cpu host
-QEMU 9.2.50 monitor - type 'help' for more information
-char device redirected to /dev/pts/2 (label serial0)
-(qemu)
-(qemu) qemu-system-ppc64: warning: kernel_irqchip allowed but
-unavailable: IRQ_XIVE capability must be present for KVM
-Falling back to kernel-irqchip=off
-** Qemu Hang
-
-(In another ssh session)
-# screen /dev/pts/2
-Preparing to boot Linux version 6.10.4-200.fc40.ppc64le
-(mockbuild@c23cc4e677614c34bb22d54eeea4dc1f) (gcc (GCC) 14.2.1 20240801
-(Red Hat 14.2.1-1), GNU ld version 2.41-37.fc40) #1 SMP Sun Aug 11
-15:20:17 UTC 2024
-Detected machine type: 0000000000000101
-command line:
-BOOT_IMAGE=(ieee1275/disk,msdos2)/vmlinuz-6.10.4-200.fc40.ppc64le
-root=/dev/mapper/fedora-root ro rd.lvm.lv=fedora/root crashkernel=1024M
-Max number of cores passed to firmware: 2048 (NR_CPUS = 2048)
-Calling ibm,client-architecture-support... done
-memory layout at init:
-  memory_limit : 0000000000000000 (16 MB aligned)
-  alloc_bottom : 0000000008200000
-  alloc_top    : 0000000030000000
-  alloc_top_hi : 0000000800000000
-  rmo_top      : 0000000030000000
-  ram_top      : 0000000800000000
-instantiating rtas at 0x000000002fff0000... done
-prom_hold_cpus: skipped
-copying OF device tree...
-Building dt strings...
-Building dt structure...
-Device tree strings 0x0000000008210000 -> 0x0000000008210bd0
-Device tree struct  0x0000000008220000 -> 0x0000000008230000
-Quiescing Open Firmware ...
-Booting Linux via __start() @ 0x0000000000440000 ...
-** Guest Console Hang
-
-
-Git Bisect:
-Performing git bisect points to the following patch:
-# git bisect bad
-e8291ec16da80566c121c68d9112be458954d90b is the first bad commit
-commit e8291ec16da80566c121c68d9112be458954d90b (HEAD)
-Author: Nicholas Piggin <npiggin@gmail.com>
-Date:   Thu Dec 19 13:40:31 2024 +1000
-
-    target/ppc: fix timebase register reset state
-(H)DEC and PURR get reset before icount does, which causes them to
-be
-skewed and not match the init state. This can cause replay to not
-match the recorded trace exactly. For DEC and HDEC this is usually
-not
-noticable since they tend to get programmed before affecting the
-    target machine. PURR has been observed to cause replay bugs when
-    running Linux.
-
-    Fix this by resetting using a time of 0.
-
-    Message-ID: <20241219034035.1826173-2-npiggin@gmail.com>
-    Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
-
- hw/ppc/ppc.c | 11 ++++++++---
- 1 file changed, 8 insertions(+), 3 deletions(-)
-
-
-Reverting the patch helps boot the guest.
-Thanks,
-Misbah Anjum N
-
diff --git a/classification_output/04/other/16201167 b/classification_output/04/other/16201167
deleted file mode 100644
index f2ac5def2..000000000
--- a/classification_output/04/other/16201167
+++ /dev/null
@@ -1,108 +0,0 @@
-other: 0.954
-mistranslation: 0.947
-vnc: 0.946
-graphic: 0.937
-semantic: 0.933
-KVM: 0.928
-instruction: 0.922
-device: 0.911
-assembly: 0.910
-socket: 0.892
-network: 0.864
-boot: 0.845
-
-[BUG] Qemu abort with error "kvm_mem_ioeventfd_add: error adding ioeventfd: File exists (17)"
-
-Hi list,
-
-When I did some tests in my virtual domain with live-attached virtio deivces, I 
-got a coredump file of Qemu.
-
-The error print from qemu is "kvm_mem_ioeventfd_add: error adding ioeventfd: 
-File exists (17)".
-And the call trace in the coredump file displays as below:
-#0  0x0000ffff89acecc8 in ?? () from /usr/lib64/libc.so.6
-#1  0x0000ffff89a8acbc in raise () from /usr/lib64/libc.so.6
-#2  0x0000ffff89a78d2c in abort () from /usr/lib64/libc.so.6
-#3  0x0000aaaabd7ccf1c in kvm_mem_ioeventfd_add (listener=<optimized out>, 
-section=<optimized out>, match_data=<optimized out>, data=<optimized out>, 
-e=<optimized out>) at ../accel/kvm/kvm-all.c:1607
-#4  0x0000aaaabd6e0304 in address_space_add_del_ioeventfds (fds_old_nb=164, 
-fds_old=0xffff5c80a1d0, fds_new_nb=160, fds_new=0xffff5c565080, 
-as=0xaaaabdfa8810 <address_space_memory>)
-    at ../softmmu/memory.c:795
-#5  address_space_update_ioeventfds (as=0xaaaabdfa8810 <address_space_memory>) 
-at ../softmmu/memory.c:856
-#6  0x0000aaaabd6e24d8 in memory_region_commit () at ../softmmu/memory.c:1113
-#7  0x0000aaaabd6e25c4 in memory_region_transaction_commit () at 
-../softmmu/memory.c:1144
-#8  0x0000aaaabd394eb4 in pci_bridge_update_mappings 
-(br=br@entry=0xaaaae755f7c0) at ../hw/pci/pci_bridge.c:248
-#9  0x0000aaaabd394f4c in pci_bridge_write_config (d=0xaaaae755f7c0, 
-address=44, val=<optimized out>, len=4) at ../hw/pci/pci_bridge.c:272
-#10 0x0000aaaabd39a928 in rp_write_config (d=0xaaaae755f7c0, address=44, 
-val=128, len=4) at ../hw/pci-bridge/pcie_root_port.c:39
-#11 0x0000aaaabd6df328 in memory_region_write_accessor (mr=0xaaaae63898d0, 
-addr=65580, value=<optimized out>, size=4, shift=<optimized out>, 
-mask=<optimized out>, attrs=...) at ../softmmu/memory.c:494
-#12 0x0000aaaabd6dcb6c in access_with_adjusted_size (addr=addr@entry=65580, 
-value=value@entry=0xffff817adc78, size=size@entry=4, access_size_min=<optimized 
-out>, access_size_max=<optimized out>,
-    access_fn=access_fn@entry=0xaaaabd6df284 <memory_region_write_accessor>, 
-mr=mr@entry=0xaaaae63898d0, attrs=attrs@entry=...) at ../softmmu/memory.c:556
-#13 0x0000aaaabd6e0dc8 in memory_region_dispatch_write 
-(mr=mr@entry=0xaaaae63898d0, addr=65580, data=<optimized out>, op=MO_32, 
-attrs=attrs@entry=...) at ../softmmu/memory.c:1534
-#14 0x0000aaaabd6d0574 in flatview_write_continue (fv=fv@entry=0xffff5c02da00, 
-addr=addr@entry=275146407980, attrs=attrs@entry=..., 
-ptr=ptr@entry=0xffff8aa8c028, len=len@entry=4,
-    addr1=<optimized out>, l=<optimized out>, mr=mr@entry=0xaaaae63898d0) at 
-/usr/src/debug/qemu-6.2.0-226.aarch64/include/qemu/host-utils.h:165
-#15 0x0000aaaabd6d4584 in flatview_write (len=4, buf=0xffff8aa8c028, attrs=..., 
-addr=275146407980, fv=0xffff5c02da00) at ../softmmu/physmem.c:3375
-#16 address_space_write (as=<optimized out>, addr=275146407980, attrs=..., 
-buf=buf@entry=0xffff8aa8c028, len=4) at ../softmmu/physmem.c:3467
-#17 0x0000aaaabd6d462c in address_space_rw (as=<optimized out>, addr=<optimized 
-out>, attrs=..., attrs@entry=..., buf=buf@entry=0xffff8aa8c028, len=<optimized 
-out>, is_write=<optimized out>)
-    at ../softmmu/physmem.c:3477
-#18 0x0000aaaabd7cf6e8 in kvm_cpu_exec (cpu=cpu@entry=0xaaaae625dfd0) at 
-../accel/kvm/kvm-all.c:2970
-#19 0x0000aaaabd7d09bc in kvm_vcpu_thread_fn (arg=arg@entry=0xaaaae625dfd0) at 
-../accel/kvm/kvm-accel-ops.c:49
-#20 0x0000aaaabd94ccd8 in qemu_thread_start (args=<optimized out>) at 
-../util/qemu-thread-posix.c:559
-
-
-By printing more info in the coredump file, I found that the addr of 
-fds_old[146] and fds_new[146] are same, but fds_old[146] belonged to a 
-live-attached virtio-scsi device while fds_new[146] was owned by another 
-live-attached virtio-net.
-The reason why addr conflicted was then been found from vm's console log. Just 
-before qemu aborted, the guest kernel crashed and kdump.service booted the 
-dump-capture kernel where re-alloced address for the devices.
-Because those virtio devices were live-attached after vm creating, different 
-addr may been assigned to them in the dump-capture kernel:
-
-the initial kernel booting log:
-[    1.663297] pci 0000:00:02.1: BAR 14: assigned [mem 0x11900000-0x11afffff]
-[    1.664560] pci 0000:00:02.1: BAR 15: assigned [mem 
-0x8001800000-0x80019fffff 64bit pref]
-
-the dump-capture kernel booting log:
-[    1.845211] pci 0000:00:02.0: BAR 14: assigned [mem 0x11900000-0x11bfffff]
-[    1.846542] pci 0000:00:02.0: BAR 15: assigned [mem 
-0x8001800000-0x8001afffff 64bit pref]
-
-
-I think directly aborting the qemu process may not be the best choice in this 
-case cuz it will interrupt the work of kdump.service so that failed to generate 
-memory dump of the crashed guest kernel.
-Perhaps, IMO, the error could be simply ignored in this case and just let kdump 
-to reboot the system after memory-dump finishing, but I failed to find a 
-suitable judgment in the codes.
-
-Any solution for this problem? Hope I can get some helps here.
-
-Hao
-
diff --git a/classification_output/04/other/16228234 b/classification_output/04/other/16228234
deleted file mode 100644
index dcb69d043..000000000
--- a/classification_output/04/other/16228234
+++ /dev/null
@@ -1,1852 +0,0 @@
-other: 0.535
-mistranslation: 0.518
-KVM: 0.445
-instruction: 0.442
-network: 0.440
-device: 0.439
-assembly: 0.435
-vnc: 0.420
-semantic: 0.411
-graphic: 0.408
-boot: 0.402
-socket: 0.401
-
-[Qemu-devel] [Bug?] BQL about live migration
-
-Hello Juan & Dave,
-
-We hit a bug in our test:
-Network error occurs when migrating a guest, libvirt then rollback the
-migration, causes qemu coredump
-qemu log:
-2017-03-01T12:54:33.904949+08:00|info|qemu[17672]|[33614]|monitor_qapi_event_emit[479]|:
- {"timestamp": {"seconds": 1488344073, "microseconds": 904914}, "event": "STOP"}
-2017-03-01T12:54:37.522500+08:00|info|qemu[17672]|[17672]|handle_qmp_command[3930]|:
- qmp_cmd_name: migrate_cancel
-2017-03-01T12:54:37.522607+08:00|info|qemu[17672]|[17672]|monitor_qapi_event_emit[479]|:
- {"timestamp": {"seconds": 1488344077, "microseconds": 522556}, "event": 
-"MIGRATION", "data": {"status": "cancelling"}}
-2017-03-01T12:54:37.524671+08:00|info|qemu[17672]|[17672]|handle_qmp_command[3930]|:
- qmp_cmd_name: cont
-2017-03-01T12:54:37.524733+08:00|info|qemu[17672]|[17672]|virtio_set_status[725]|:
- virtio-balloon device status is 7 that means DRIVER OK
-2017-03-01T12:54:37.525434+08:00|info|qemu[17672]|[17672]|virtio_set_status[725]|:
- virtio-net device status is 7 that means DRIVER OK
-2017-03-01T12:54:37.525484+08:00|info|qemu[17672]|[17672]|virtio_set_status[725]|:
- virtio-blk device status is 7 that means DRIVER OK
-2017-03-01T12:54:37.525562+08:00|info|qemu[17672]|[17672]|virtio_set_status[725]|:
- virtio-serial device status is 7 that means DRIVER OK
-2017-03-01T12:54:37.527653+08:00|info|qemu[17672]|[17672]|vm_start[981]|: 
-vm_state-notify:3ms
-2017-03-01T12:54:37.528523+08:00|info|qemu[17672]|[17672]|monitor_qapi_event_emit[479]|:
- {"timestamp": {"seconds": 1488344077, "microseconds": 527699}, "event": 
-"RESUME"}
-2017-03-01T12:54:37.530680+08:00|info|qemu[17672]|[33614]|migration_bitmap_sync[720]|:
- this iteration cycle takes 3s, new dirtied data:0MB
-2017-03-01T12:54:37.530909+08:00|info|qemu[17672]|[33614]|monitor_qapi_event_emit[479]|:
- {"timestamp": {"seconds": 1488344077, "microseconds": 530733}, "event": 
-"MIGRATION_PASS", "data": {"pass": 3}}
-2017-03-01T04:54:37.530997Z qemu-kvm: socket_writev_buffer: Got err=32 for 
-(131583/18446744073709551615)
-qemu-kvm: /home/abuild/rpmbuild/BUILD/qemu-kvm-2.6.0/hw/net/virtio_net.c:1519: 
-virtio_net_save: Assertion `!n->vhost_started' failed.
-2017-03-01 12:54:43.028: shutting down
-
->
-From qemu log, qemu received and processed migrate_cancel/cont qmp commands
-after guest been stopped and entered the last round of migration. Then
-migration thread try to save device state when guest is running(started by
-cont command), causes assert and coredump.
-This is because in last iter, we call cpu_synchronize_all_states() to
-synchronize vcpu states, this call will release qemu_global_mutex and wait
-for do_kvm_cpu_synchronize_state() to be executed on target vcpu:
-(gdb) bt
-#0  0x00007f763d1046d5 in pthread_cond_wait@@GLIBC_2.3.2 () from 
-/lib64/libpthread.so.0
-#1  0x00007f7643e51d7f in qemu_cond_wait (cond=0x7f764445eca0 <qemu_work_cond>, 
-mutex=0x7f764445eba0 <qemu_global_mutex>) at util/qemu-thread-posix.c:132
-#2  0x00007f7643a2e154 in run_on_cpu (cpu=0x7f7644e06d80, func=0x7f7643a46413 
-<do_kvm_cpu_synchronize_state>, data=0x7f7644e06d80) at 
-/mnt/public/yanghy/qemu-kvm/cpus.c:995
-#3  0x00007f7643a46487 in kvm_cpu_synchronize_state (cpu=0x7f7644e06d80) at 
-/mnt/public/yanghy/qemu-kvm/kvm-all.c:1805
-#4  0x00007f7643a2c700 in cpu_synchronize_state (cpu=0x7f7644e06d80) at 
-/mnt/public/yanghy/qemu-kvm/include/sysemu/kvm.h:457
-#5  0x00007f7643a2db0c in cpu_synchronize_all_states () at 
-/mnt/public/yanghy/qemu-kvm/cpus.c:766
-#6  0x00007f7643a67b5b in qemu_savevm_state_complete_precopy (f=0x7f76462f2d30, 
-iterable_only=false) at /mnt/public/yanghy/qemu-kvm/migration/savevm.c:1051
-#7  0x00007f7643d121e9 in migration_completion (s=0x7f76443e78c0 
-<current_migration.37571>, current_active_state=4, 
-old_vm_running=0x7f74343fda00, start_time=0x7f74343fda08) at 
-migration/migration.c:1753
-#8  0x00007f7643d126c5 in migration_thread (opaque=0x7f76443e78c0 
-<current_migration.37571>) at migration/migration.c:1922
-#9  0x00007f763d100dc5 in start_thread () from /lib64/libpthread.so.0
-#10 0x00007f763ce2e71d in clone () from /lib64/libc.so.6
-(gdb) p iothread_locked
-$1 = true
-
-and then, qemu main thread been executed, it won't block because migration
-thread released the qemu_global_mutex:
-(gdb) thr 1
-[Switching to thread 1 (Thread 0x7fe298e08bc0 (LWP 30767))]
-#0  os_host_main_loop_wait (timeout=931565) at main-loop.c:270
-270                 QEMU_LOG(LOG_INFO,"***** after qemu_pool_ns: timeout %d\n", 
-timeout);
-(gdb) p iothread_locked
-$2 = true
-(gdb) l 268
-263
-264         ret = qemu_poll_ns((GPollFD *)gpollfds->data, gpollfds->len, 
-timeout);
-265
-266
-267         if (timeout) {
-268             qemu_mutex_lock_iothread();
-269             if (runstate_check(RUN_STATE_FINISH_MIGRATE)) {
-270                 QEMU_LOG(LOG_INFO,"***** after qemu_pool_ns: timeout %d\n", 
-timeout);
-271             }
-272         }
-(gdb)
-
-So, although we've hold iothread_lock in stop&copy phase of migration, we
-can't guarantee the iothread been locked all through the stop & copy phase,
-any thoughts on how to solve this problem?
-
-
-Thanks,
--Gonglei
-
-On Fri, 03/03 09:29, Gonglei (Arei) wrote:
->
-Hello Juan & Dave,
->
->
-We hit a bug in our test:
->
-Network error occurs when migrating a guest, libvirt then rollback the
->
-migration, causes qemu coredump
->
-qemu log:
->
-2017-03-01T12:54:33.904949+08:00|info|qemu[17672]|[33614]|monitor_qapi_event_emit[479]|:
->
-{"timestamp": {"seconds": 1488344073, "microseconds": 904914}, "event":
->
-"STOP"}
->
-2017-03-01T12:54:37.522500+08:00|info|qemu[17672]|[17672]|handle_qmp_command[3930]|:
->
-qmp_cmd_name: migrate_cancel
->
-2017-03-01T12:54:37.522607+08:00|info|qemu[17672]|[17672]|monitor_qapi_event_emit[479]|:
->
-{"timestamp": {"seconds": 1488344077, "microseconds": 522556}, "event":
->
-"MIGRATION", "data": {"status": "cancelling"}}
->
-2017-03-01T12:54:37.524671+08:00|info|qemu[17672]|[17672]|handle_qmp_command[3930]|:
->
-qmp_cmd_name: cont
->
-2017-03-01T12:54:37.524733+08:00|info|qemu[17672]|[17672]|virtio_set_status[725]|:
->
-virtio-balloon device status is 7 that means DRIVER OK
->
-2017-03-01T12:54:37.525434+08:00|info|qemu[17672]|[17672]|virtio_set_status[725]|:
->
-virtio-net device status is 7 that means DRIVER OK
->
-2017-03-01T12:54:37.525484+08:00|info|qemu[17672]|[17672]|virtio_set_status[725]|:
->
-virtio-blk device status is 7 that means DRIVER OK
->
-2017-03-01T12:54:37.525562+08:00|info|qemu[17672]|[17672]|virtio_set_status[725]|:
->
-virtio-serial device status is 7 that means DRIVER OK
->
-2017-03-01T12:54:37.527653+08:00|info|qemu[17672]|[17672]|vm_start[981]|:
->
-vm_state-notify:3ms
->
-2017-03-01T12:54:37.528523+08:00|info|qemu[17672]|[17672]|monitor_qapi_event_emit[479]|:
->
-{"timestamp": {"seconds": 1488344077, "microseconds": 527699}, "event":
->
-"RESUME"}
->
-2017-03-01T12:54:37.530680+08:00|info|qemu[17672]|[33614]|migration_bitmap_sync[720]|:
->
-this iteration cycle takes 3s, new dirtied data:0MB
->
-2017-03-01T12:54:37.530909+08:00|info|qemu[17672]|[33614]|monitor_qapi_event_emit[479]|:
->
-{"timestamp": {"seconds": 1488344077, "microseconds": 530733}, "event":
->
-"MIGRATION_PASS", "data": {"pass": 3}}
->
-2017-03-01T04:54:37.530997Z qemu-kvm: socket_writev_buffer: Got err=32 for
->
-(131583/18446744073709551615)
->
-qemu-kvm:
->
-/home/abuild/rpmbuild/BUILD/qemu-kvm-2.6.0/hw/net/virtio_net.c:1519:
->
-virtio_net_save: Assertion `!n->vhost_started' failed.
->
-2017-03-01 12:54:43.028: shutting down
->
->
-From qemu log, qemu received and processed migrate_cancel/cont qmp commands
->
-after guest been stopped and entered the last round of migration. Then
->
-migration thread try to save device state when guest is running(started by
->
-cont command), causes assert and coredump.
->
-This is because in last iter, we call cpu_synchronize_all_states() to
->
-synchronize vcpu states, this call will release qemu_global_mutex and wait
->
-for do_kvm_cpu_synchronize_state() to be executed on target vcpu:
->
-(gdb) bt
->
-#0  0x00007f763d1046d5 in pthread_cond_wait@@GLIBC_2.3.2 () from
->
-/lib64/libpthread.so.0
->
-#1  0x00007f7643e51d7f in qemu_cond_wait (cond=0x7f764445eca0
->
-<qemu_work_cond>, mutex=0x7f764445eba0 <qemu_global_mutex>) at
->
-util/qemu-thread-posix.c:132
->
-#2  0x00007f7643a2e154 in run_on_cpu (cpu=0x7f7644e06d80, func=0x7f7643a46413
->
-<do_kvm_cpu_synchronize_state>, data=0x7f7644e06d80) at
->
-/mnt/public/yanghy/qemu-kvm/cpus.c:995
->
-#3  0x00007f7643a46487 in kvm_cpu_synchronize_state (cpu=0x7f7644e06d80) at
->
-/mnt/public/yanghy/qemu-kvm/kvm-all.c:1805
->
-#4  0x00007f7643a2c700 in cpu_synchronize_state (cpu=0x7f7644e06d80) at
->
-/mnt/public/yanghy/qemu-kvm/include/sysemu/kvm.h:457
->
-#5  0x00007f7643a2db0c in cpu_synchronize_all_states () at
->
-/mnt/public/yanghy/qemu-kvm/cpus.c:766
->
-#6  0x00007f7643a67b5b in qemu_savevm_state_complete_precopy
->
-(f=0x7f76462f2d30, iterable_only=false) at
->
-/mnt/public/yanghy/qemu-kvm/migration/savevm.c:1051
->
-#7  0x00007f7643d121e9 in migration_completion (s=0x7f76443e78c0
->
-<current_migration.37571>, current_active_state=4,
->
-old_vm_running=0x7f74343fda00, start_time=0x7f74343fda08) at
->
-migration/migration.c:1753
->
-#8  0x00007f7643d126c5 in migration_thread (opaque=0x7f76443e78c0
->
-<current_migration.37571>) at migration/migration.c:1922
->
-#9  0x00007f763d100dc5 in start_thread () from /lib64/libpthread.so.0
->
-#10 0x00007f763ce2e71d in clone () from /lib64/libc.so.6
->
-(gdb) p iothread_locked
->
-$1 = true
->
->
-and then, qemu main thread been executed, it won't block because migration
->
-thread released the qemu_global_mutex:
->
-(gdb) thr 1
->
-[Switching to thread 1 (Thread 0x7fe298e08bc0 (LWP 30767))]
->
-#0  os_host_main_loop_wait (timeout=931565) at main-loop.c:270
->
-270                 QEMU_LOG(LOG_INFO,"***** after qemu_pool_ns: timeout
->
-%d\n", timeout);
->
-(gdb) p iothread_locked
->
-$2 = true
->
-(gdb) l 268
->
-263
->
-264         ret = qemu_poll_ns((GPollFD *)gpollfds->data, gpollfds->len,
->
-timeout);
->
-265
->
-266
->
-267         if (timeout) {
->
-268             qemu_mutex_lock_iothread();
->
-269             if (runstate_check(RUN_STATE_FINISH_MIGRATE)) {
->
-270                 QEMU_LOG(LOG_INFO,"***** after qemu_pool_ns: timeout
->
-%d\n", timeout);
->
-271             }
->
-272         }
->
-(gdb)
->
->
-So, although we've hold iothread_lock in stop&copy phase of migration, we
->
-can't guarantee the iothread been locked all through the stop & copy phase,
->
-any thoughts on how to solve this problem?
-Could you post a backtrace of the assertion?
-
-Fam
-
-On 2017/3/3 18:42, Fam Zheng wrote:
->
-On Fri, 03/03 09:29, Gonglei (Arei) wrote:
->
-> Hello Juan & Dave,
->
->
->
-> We hit a bug in our test:
->
-> Network error occurs when migrating a guest, libvirt then rollback the
->
-> migration, causes qemu coredump
->
-> qemu log:
->
-> 2017-03-01T12:54:33.904949+08:00|info|qemu[17672]|[33614]|monitor_qapi_event_emit[479]|:
->
->  {"timestamp": {"seconds": 1488344073, "microseconds": 904914}, "event":
->
-> "STOP"}
->
-> 2017-03-01T12:54:37.522500+08:00|info|qemu[17672]|[17672]|handle_qmp_command[3930]|:
->
->  qmp_cmd_name: migrate_cancel
->
-> 2017-03-01T12:54:37.522607+08:00|info|qemu[17672]|[17672]|monitor_qapi_event_emit[479]|:
->
->  {"timestamp": {"seconds": 1488344077, "microseconds": 522556}, "event":
->
-> "MIGRATION", "data": {"status": "cancelling"}}
->
-> 2017-03-01T12:54:37.524671+08:00|info|qemu[17672]|[17672]|handle_qmp_command[3930]|:
->
->  qmp_cmd_name: cont
->
-> 2017-03-01T12:54:37.524733+08:00|info|qemu[17672]|[17672]|virtio_set_status[725]|:
->
->  virtio-balloon device status is 7 that means DRIVER OK
->
-> 2017-03-01T12:54:37.525434+08:00|info|qemu[17672]|[17672]|virtio_set_status[725]|:
->
->  virtio-net device status is 7 that means DRIVER OK
->
-> 2017-03-01T12:54:37.525484+08:00|info|qemu[17672]|[17672]|virtio_set_status[725]|:
->
->  virtio-blk device status is 7 that means DRIVER OK
->
-> 2017-03-01T12:54:37.525562+08:00|info|qemu[17672]|[17672]|virtio_set_status[725]|:
->
->  virtio-serial device status is 7 that means DRIVER OK
->
-> 2017-03-01T12:54:37.527653+08:00|info|qemu[17672]|[17672]|vm_start[981]|:
->
-> vm_state-notify:3ms
->
-> 2017-03-01T12:54:37.528523+08:00|info|qemu[17672]|[17672]|monitor_qapi_event_emit[479]|:
->
->  {"timestamp": {"seconds": 1488344077, "microseconds": 527699}, "event":
->
-> "RESUME"}
->
-> 2017-03-01T12:54:37.530680+08:00|info|qemu[17672]|[33614]|migration_bitmap_sync[720]|:
->
->  this iteration cycle takes 3s, new dirtied data:0MB
->
-> 2017-03-01T12:54:37.530909+08:00|info|qemu[17672]|[33614]|monitor_qapi_event_emit[479]|:
->
->  {"timestamp": {"seconds": 1488344077, "microseconds": 530733}, "event":
->
-> "MIGRATION_PASS", "data": {"pass": 3}}
->
-> 2017-03-01T04:54:37.530997Z qemu-kvm: socket_writev_buffer: Got err=32 for
->
-> (131583/18446744073709551615)
->
-> qemu-kvm:
->
-> /home/abuild/rpmbuild/BUILD/qemu-kvm-2.6.0/hw/net/virtio_net.c:1519:
->
-> virtio_net_save: Assertion `!n->vhost_started' failed.
->
-> 2017-03-01 12:54:43.028: shutting down
->
->
->
-> From qemu log, qemu received and processed migrate_cancel/cont qmp commands
->
-> after guest been stopped and entered the last round of migration. Then
->
-> migration thread try to save device state when guest is running(started by
->
-> cont command), causes assert and coredump.
->
-> This is because in last iter, we call cpu_synchronize_all_states() to
->
-> synchronize vcpu states, this call will release qemu_global_mutex and wait
->
-> for do_kvm_cpu_synchronize_state() to be executed on target vcpu:
->
-> (gdb) bt
->
-> #0  0x00007f763d1046d5 in pthread_cond_wait@@GLIBC_2.3.2 () from
->
-> /lib64/libpthread.so.0
->
-> #1  0x00007f7643e51d7f in qemu_cond_wait (cond=0x7f764445eca0
->
-> <qemu_work_cond>, mutex=0x7f764445eba0 <qemu_global_mutex>) at
->
-> util/qemu-thread-posix.c:132
->
-> #2  0x00007f7643a2e154 in run_on_cpu (cpu=0x7f7644e06d80,
->
-> func=0x7f7643a46413 <do_kvm_cpu_synchronize_state>, data=0x7f7644e06d80) at
->
-> /mnt/public/yanghy/qemu-kvm/cpus.c:995
->
-> #3  0x00007f7643a46487 in kvm_cpu_synchronize_state (cpu=0x7f7644e06d80) at
->
-> /mnt/public/yanghy/qemu-kvm/kvm-all.c:1805
->
-> #4  0x00007f7643a2c700 in cpu_synchronize_state (cpu=0x7f7644e06d80) at
->
-> /mnt/public/yanghy/qemu-kvm/include/sysemu/kvm.h:457
->
-> #5  0x00007f7643a2db0c in cpu_synchronize_all_states () at
->
-> /mnt/public/yanghy/qemu-kvm/cpus.c:766
->
-> #6  0x00007f7643a67b5b in qemu_savevm_state_complete_precopy
->
-> (f=0x7f76462f2d30, iterable_only=false) at
->
-> /mnt/public/yanghy/qemu-kvm/migration/savevm.c:1051
->
-> #7  0x00007f7643d121e9 in migration_completion (s=0x7f76443e78c0
->
-> <current_migration.37571>, current_active_state=4,
->
-> old_vm_running=0x7f74343fda00, start_time=0x7f74343fda08) at
->
-> migration/migration.c:1753
->
-> #8  0x00007f7643d126c5 in migration_thread (opaque=0x7f76443e78c0
->
-> <current_migration.37571>) at migration/migration.c:1922
->
-> #9  0x00007f763d100dc5 in start_thread () from /lib64/libpthread.so.0
->
-> #10 0x00007f763ce2e71d in clone () from /lib64/libc.so.6
->
-> (gdb) p iothread_locked
->
-> $1 = true
->
->
->
-> and then, qemu main thread been executed, it won't block because migration
->
-> thread released the qemu_global_mutex:
->
-> (gdb) thr 1
->
-> [Switching to thread 1 (Thread 0x7fe298e08bc0 (LWP 30767))]
->
-> #0  os_host_main_loop_wait (timeout=931565) at main-loop.c:270
->
-> 270                 QEMU_LOG(LOG_INFO,"***** after qemu_pool_ns: timeout
->
-> %d\n", timeout);
->
-> (gdb) p iothread_locked
->
-> $2 = true
->
-> (gdb) l 268
->
-> 263
->
-> 264         ret = qemu_poll_ns((GPollFD *)gpollfds->data, gpollfds->len,
->
-> timeout);
->
-> 265
->
-> 266
->
-> 267         if (timeout) {
->
-> 268             qemu_mutex_lock_iothread();
->
-> 269             if (runstate_check(RUN_STATE_FINISH_MIGRATE)) {
->
-> 270                 QEMU_LOG(LOG_INFO,"***** after qemu_pool_ns: timeout
->
-> %d\n", timeout);
->
-> 271             }
->
-> 272         }
->
-> (gdb)
->
->
->
-> So, although we've hold iothread_lock in stop&copy phase of migration, we
->
-> can't guarantee the iothread been locked all through the stop & copy phase,
->
-> any thoughts on how to solve this problem?
->
->
-Could you post a backtrace of the assertion?
-#0  0x00007f97b1fbe5d7 in raise () from /usr/lib64/libc.so.6
-#1  0x00007f97b1fbfcc8 in abort () from /usr/lib64/libc.so.6
-#2  0x00007f97b1fb7546 in __assert_fail_base () from /usr/lib64/libc.so.6
-#3  0x00007f97b1fb75f2 in __assert_fail () from /usr/lib64/libc.so.6
-#4  0x000000000049fd19 in virtio_net_save (f=0x7f97a8ca44d0, 
-opaque=0x7f97a86e9018) at /usr/src/debug/qemu-kvm-2.6.0/hw/
-#5  0x000000000047e380 in vmstate_save_old_style (address@hidden, 
-address@hidden, se=0x7f9
-#6  0x000000000047fb93 in vmstate_save (address@hidden, address@hidden, 
-address@hidden
-#7  0x0000000000481ad2 in qemu_savevm_state_complete_precopy (f=0x7f97a8ca44d0, 
-address@hidden)
-#8  0x00000000006c6b60 in migration_completion (address@hidden 
-<current_migration.38312>, current_active_state=curre
-    address@hidden) at migration/migration.c:1761
-#9  0x00000000006c71db in migration_thread (address@hidden 
-<current_migration.38312>) at migration/migrati
-
->
->
-Fam
->
---
-Thanks,
-Yang
-
-* Gonglei (Arei) (address@hidden) wrote:
->
-Hello Juan & Dave,
-cc'ing in pbonzini since it's magic involving cpu_synrhonize_all_states()
-
->
-We hit a bug in our test:
->
-Network error occurs when migrating a guest, libvirt then rollback the
->
-migration, causes qemu coredump
->
-qemu log:
->
-2017-03-01T12:54:33.904949+08:00|info|qemu[17672]|[33614]|monitor_qapi_event_emit[479]|:
->
-{"timestamp": {"seconds": 1488344073, "microseconds": 904914}, "event":
->
-"STOP"}
->
-2017-03-01T12:54:37.522500+08:00|info|qemu[17672]|[17672]|handle_qmp_command[3930]|:
->
-qmp_cmd_name: migrate_cancel
->
-2017-03-01T12:54:37.522607+08:00|info|qemu[17672]|[17672]|monitor_qapi_event_emit[479]|:
->
-{"timestamp": {"seconds": 1488344077, "microseconds": 522556}, "event":
->
-"MIGRATION", "data": {"status": "cancelling"}}
->
-2017-03-01T12:54:37.524671+08:00|info|qemu[17672]|[17672]|handle_qmp_command[3930]|:
->
-qmp_cmd_name: cont
->
-2017-03-01T12:54:37.524733+08:00|info|qemu[17672]|[17672]|virtio_set_status[725]|:
->
-virtio-balloon device status is 7 that means DRIVER OK
->
-2017-03-01T12:54:37.525434+08:00|info|qemu[17672]|[17672]|virtio_set_status[725]|:
->
-virtio-net device status is 7 that means DRIVER OK
->
-2017-03-01T12:54:37.525484+08:00|info|qemu[17672]|[17672]|virtio_set_status[725]|:
->
-virtio-blk device status is 7 that means DRIVER OK
->
-2017-03-01T12:54:37.525562+08:00|info|qemu[17672]|[17672]|virtio_set_status[725]|:
->
-virtio-serial device status is 7 that means DRIVER OK
->
-2017-03-01T12:54:37.527653+08:00|info|qemu[17672]|[17672]|vm_start[981]|:
->
-vm_state-notify:3ms
->
-2017-03-01T12:54:37.528523+08:00|info|qemu[17672]|[17672]|monitor_qapi_event_emit[479]|:
->
-{"timestamp": {"seconds": 1488344077, "microseconds": 527699}, "event":
->
-"RESUME"}
->
-2017-03-01T12:54:37.530680+08:00|info|qemu[17672]|[33614]|migration_bitmap_sync[720]|:
->
-this iteration cycle takes 3s, new dirtied data:0MB
->
-2017-03-01T12:54:37.530909+08:00|info|qemu[17672]|[33614]|monitor_qapi_event_emit[479]|:
->
-{"timestamp": {"seconds": 1488344077, "microseconds": 530733}, "event":
->
-"MIGRATION_PASS", "data": {"pass": 3}}
->
-2017-03-01T04:54:37.530997Z qemu-kvm: socket_writev_buffer: Got err=32 for
->
-(131583/18446744073709551615)
->
-qemu-kvm:
->
-/home/abuild/rpmbuild/BUILD/qemu-kvm-2.6.0/hw/net/virtio_net.c:1519:
->
-virtio_net_save: Assertion `!n->vhost_started' failed.
->
-2017-03-01 12:54:43.028: shutting down
->
->
-From qemu log, qemu received and processed migrate_cancel/cont qmp commands
->
-after guest been stopped and entered the last round of migration. Then
->
-migration thread try to save device state when guest is running(started by
->
-cont command), causes assert and coredump.
->
-This is because in last iter, we call cpu_synchronize_all_states() to
->
-synchronize vcpu states, this call will release qemu_global_mutex and wait
->
-for do_kvm_cpu_synchronize_state() to be executed on target vcpu:
->
-(gdb) bt
->
-#0  0x00007f763d1046d5 in pthread_cond_wait@@GLIBC_2.3.2 () from
->
-/lib64/libpthread.so.0
->
-#1  0x00007f7643e51d7f in qemu_cond_wait (cond=0x7f764445eca0
->
-<qemu_work_cond>, mutex=0x7f764445eba0 <qemu_global_mutex>) at
->
-util/qemu-thread-posix.c:132
->
-#2  0x00007f7643a2e154 in run_on_cpu (cpu=0x7f7644e06d80, func=0x7f7643a46413
->
-<do_kvm_cpu_synchronize_state>, data=0x7f7644e06d80) at
->
-/mnt/public/yanghy/qemu-kvm/cpus.c:995
->
-#3  0x00007f7643a46487 in kvm_cpu_synchronize_state (cpu=0x7f7644e06d80) at
->
-/mnt/public/yanghy/qemu-kvm/kvm-all.c:1805
->
-#4  0x00007f7643a2c700 in cpu_synchronize_state (cpu=0x7f7644e06d80) at
->
-/mnt/public/yanghy/qemu-kvm/include/sysemu/kvm.h:457
->
-#5  0x00007f7643a2db0c in cpu_synchronize_all_states () at
->
-/mnt/public/yanghy/qemu-kvm/cpus.c:766
->
-#6  0x00007f7643a67b5b in qemu_savevm_state_complete_precopy
->
-(f=0x7f76462f2d30, iterable_only=false) at
->
-/mnt/public/yanghy/qemu-kvm/migration/savevm.c:1051
->
-#7  0x00007f7643d121e9 in migration_completion (s=0x7f76443e78c0
->
-<current_migration.37571>, current_active_state=4,
->
-old_vm_running=0x7f74343fda00, start_time=0x7f74343fda08) at
->
-migration/migration.c:1753
->
-#8  0x00007f7643d126c5 in migration_thread (opaque=0x7f76443e78c0
->
-<current_migration.37571>) at migration/migration.c:1922
->
-#9  0x00007f763d100dc5 in start_thread () from /lib64/libpthread.so.0
->
-#10 0x00007f763ce2e71d in clone () from /lib64/libc.so.6
->
-(gdb) p iothread_locked
->
-$1 = true
->
->
-and then, qemu main thread been executed, it won't block because migration
->
-thread released the qemu_global_mutex:
->
-(gdb) thr 1
->
-[Switching to thread 1 (Thread 0x7fe298e08bc0 (LWP 30767))]
->
-#0  os_host_main_loop_wait (timeout=931565) at main-loop.c:270
->
-270                 QEMU_LOG(LOG_INFO,"***** after qemu_pool_ns: timeout
->
-%d\n", timeout);
->
-(gdb) p iothread_locked
->
-$2 = true
->
-(gdb) l 268
->
-263
->
-264         ret = qemu_poll_ns((GPollFD *)gpollfds->data, gpollfds->len,
->
-timeout);
->
-265
->
-266
->
-267         if (timeout) {
->
-268             qemu_mutex_lock_iothread();
->
-269             if (runstate_check(RUN_STATE_FINISH_MIGRATE)) {
->
-270                 QEMU_LOG(LOG_INFO,"***** after qemu_pool_ns: timeout
->
-%d\n", timeout);
->
-271             }
->
-272         }
->
-(gdb)
->
->
-So, although we've hold iothread_lock in stop&copy phase of migration, we
->
-can't guarantee the iothread been locked all through the stop & copy phase,
->
-any thoughts on how to solve this problem?
-Ouch that's pretty nasty; I remember Paolo explaining to me a while ago that
-their were times when run_on_cpu would have to drop the BQL and I worried about 
-it,
-but this is the 1st time I've seen an error due to it.
-
-Do you know what the migration state was at that point? Was it 
-MIGRATION_STATUS_CANCELLING?
-I'm thinking perhaps we should stop 'cont' from continuing while migration is in
-MIGRATION_STATUS_CANCELLING.  Do we send an event when we hit CANCELLED - so 
-that
-perhaps libvirt could avoid sending the 'cont' until then?
-
-Dave
-
-
->
->
-Thanks,
->
--Gonglei
->
---
-Dr. David Alan Gilbert / address@hidden / Manchester, UK
-
-On 03/03/2017 13:00, Dr. David Alan Gilbert wrote:
->
-Ouch that's pretty nasty; I remember Paolo explaining to me a while ago that
->
-their were times when run_on_cpu would have to drop the BQL and I worried
->
-about it,
->
-but this is the 1st time I've seen an error due to it.
->
->
-Do you know what the migration state was at that point? Was it
->
-MIGRATION_STATUS_CANCELLING?
->
-I'm thinking perhaps we should stop 'cont' from continuing while migration is
->
-in
->
-MIGRATION_STATUS_CANCELLING.  Do we send an event when we hit CANCELLED - so
->
-that
->
-perhaps libvirt could avoid sending the 'cont' until then?
-No, there's no event, though I thought libvirt would poll until
-"query-migrate" returns the cancelled state.  Of course that is a small
-consolation, because a segfault is unacceptable.
-
-One possibility is to suspend the monitor in qmp_migrate_cancel and
-resume it (with add_migration_state_change_notifier) when we hit the
-CANCELLED state.  I'm not sure what the latency would be between the end
-of migrate_fd_cancel and finally reaching CANCELLED.
-
-Paolo
-
-* Paolo Bonzini (address@hidden) wrote:
->
->
->
-On 03/03/2017 13:00, Dr. David Alan Gilbert wrote:
->
-> Ouch that's pretty nasty; I remember Paolo explaining to me a while ago that
->
-> their were times when run_on_cpu would have to drop the BQL and I worried
->
-> about it,
->
-> but this is the 1st time I've seen an error due to it.
->
->
->
-> Do you know what the migration state was at that point? Was it
->
-> MIGRATION_STATUS_CANCELLING?
->
-> I'm thinking perhaps we should stop 'cont' from continuing while migration
->
-> is in
->
-> MIGRATION_STATUS_CANCELLING.  Do we send an event when we hit CANCELLED -
->
-> so that
->
-> perhaps libvirt could avoid sending the 'cont' until then?
->
->
-No, there's no event, though I thought libvirt would poll until
->
-"query-migrate" returns the cancelled state.  Of course that is a small
->
-consolation, because a segfault is unacceptable.
-I think you might get an event if you set the new migrate capability called
-'events' on!
-
-void migrate_set_state(int *state, int old_state, int new_state)
-{
-    if (atomic_cmpxchg(state, old_state, new_state) == old_state) {
-        trace_migrate_set_state(new_state);
-        migrate_generate_event(new_state);
-    }
-}
-
-static void migrate_generate_event(int new_state)
-{
-    if (migrate_use_events()) {
-        qapi_event_send_migration(new_state, &error_abort); 
-    }
-}
-
-That event feature went in sometime after 2.3.0.
-
->
-One possibility is to suspend the monitor in qmp_migrate_cancel and
->
-resume it (with add_migration_state_change_notifier) when we hit the
->
-CANCELLED state.  I'm not sure what the latency would be between the end
->
-of migrate_fd_cancel and finally reaching CANCELLED.
-I don't like suspending monitors; it can potentially take quite a significant
-time to do a cancel.
-How about making 'cont' fail if we're in CANCELLING?
-
-I'd really love to see the 'run_on_cpu' being more careful about the BQL;
-we really need all of the rest of the devices to stay quiesced at times.
-
-Dave
-
->
-Paolo
---
-Dr. David Alan Gilbert / address@hidden / Manchester, UK
-
-On 03/03/2017 14:11, Dr. David Alan Gilbert wrote:
->
-* Paolo Bonzini (address@hidden) wrote:
->
->
->
->
->
-> On 03/03/2017 13:00, Dr. David Alan Gilbert wrote:
->
->> Ouch that's pretty nasty; I remember Paolo explaining to me a while ago that
->
->> their were times when run_on_cpu would have to drop the BQL and I worried
->
->> about it,
->
->> but this is the 1st time I've seen an error due to it.
->
->>
->
->> Do you know what the migration state was at that point? Was it
->
->> MIGRATION_STATUS_CANCELLING?
->
->> I'm thinking perhaps we should stop 'cont' from continuing while migration
->
->> is in
->
->> MIGRATION_STATUS_CANCELLING.  Do we send an event when we hit CANCELLED -
->
->> so that
->
->> perhaps libvirt could avoid sending the 'cont' until then?
->
->
->
-> No, there's no event, though I thought libvirt would poll until
->
-> "query-migrate" returns the cancelled state.  Of course that is a small
->
-> consolation, because a segfault is unacceptable.
->
->
-I think you might get an event if you set the new migrate capability called
->
-'events' on!
->
->
-void migrate_set_state(int *state, int old_state, int new_state)
->
-{
->
-if (atomic_cmpxchg(state, old_state, new_state) == old_state) {
->
-trace_migrate_set_state(new_state);
->
-migrate_generate_event(new_state);
->
-}
->
-}
->
->
-static void migrate_generate_event(int new_state)
->
-{
->
-if (migrate_use_events()) {
->
-qapi_event_send_migration(new_state, &error_abort);
->
-}
->
-}
->
->
-That event feature went in sometime after 2.3.0.
->
->
-> One possibility is to suspend the monitor in qmp_migrate_cancel and
->
-> resume it (with add_migration_state_change_notifier) when we hit the
->
-> CANCELLED state.  I'm not sure what the latency would be between the end
->
-> of migrate_fd_cancel and finally reaching CANCELLED.
->
->
-I don't like suspending monitors; it can potentially take quite a significant
->
-time to do a cancel.
->
-How about making 'cont' fail if we're in CANCELLING?
-Actually I thought that would be the case already (in fact CANCELLING is
-internal only; the outside world sees it as "active" in query-migrate).
-
-Lei, what is the runstate?  (That is, why did cont succeed at all)?
-
-Paolo
-
->
-I'd really love to see the 'run_on_cpu' being more careful about the BQL;
->
-we really need all of the rest of the devices to stay quiesced at times.
-That's not really possible, because of how condition variables work. :(
-
-* Paolo Bonzini (address@hidden) wrote:
->
->
->
-On 03/03/2017 14:11, Dr. David Alan Gilbert wrote:
->
-> * Paolo Bonzini (address@hidden) wrote:
->
->>
->
->>
->
->> On 03/03/2017 13:00, Dr. David Alan Gilbert wrote:
->
->>> Ouch that's pretty nasty; I remember Paolo explaining to me a while ago
->
->>> that
->
->>> their were times when run_on_cpu would have to drop the BQL and I worried
->
->>> about it,
->
->>> but this is the 1st time I've seen an error due to it.
->
->>>
->
->>> Do you know what the migration state was at that point? Was it
->
->>> MIGRATION_STATUS_CANCELLING?
->
->>> I'm thinking perhaps we should stop 'cont' from continuing while
->
->>> migration is in
->
->>> MIGRATION_STATUS_CANCELLING.  Do we send an event when we hit CANCELLED -
->
->>> so that
->
->>> perhaps libvirt could avoid sending the 'cont' until then?
->
->>
->
->> No, there's no event, though I thought libvirt would poll until
->
->> "query-migrate" returns the cancelled state.  Of course that is a small
->
->> consolation, because a segfault is unacceptable.
->
->
->
-> I think you might get an event if you set the new migrate capability called
->
-> 'events' on!
->
->
->
-> void migrate_set_state(int *state, int old_state, int new_state)
->
-> {
->
->     if (atomic_cmpxchg(state, old_state, new_state) == old_state) {
->
->         trace_migrate_set_state(new_state);
->
->         migrate_generate_event(new_state);
->
->     }
->
-> }
->
->
->
-> static void migrate_generate_event(int new_state)
->
-> {
->
->     if (migrate_use_events()) {
->
->         qapi_event_send_migration(new_state, &error_abort);
->
->     }
->
-> }
->
->
->
-> That event feature went in sometime after 2.3.0.
->
->
->
->> One possibility is to suspend the monitor in qmp_migrate_cancel and
->
->> resume it (with add_migration_state_change_notifier) when we hit the
->
->> CANCELLED state.  I'm not sure what the latency would be between the end
->
->> of migrate_fd_cancel and finally reaching CANCELLED.
->
->
->
-> I don't like suspending monitors; it can potentially take quite a
->
-> significant
->
-> time to do a cancel.
->
-> How about making 'cont' fail if we're in CANCELLING?
->
->
-Actually I thought that would be the case already (in fact CANCELLING is
->
-internal only; the outside world sees it as "active" in query-migrate).
->
->
-Lei, what is the runstate?  (That is, why did cont succeed at all)?
-I suspect it's RUN_STATE_FINISH_MIGRATE - we set that before we do the device
-save, and that's what we get at the end of a migrate and it's legal to restart
-from there.
-
->
-Paolo
->
->
-> I'd really love to see the 'run_on_cpu' being more careful about the BQL;
->
-> we really need all of the rest of the devices to stay quiesced at times.
->
->
-That's not really possible, because of how condition variables work. :(
-*Really* we need to find a solution to that - there's probably lots of 
-other things that can spring up in that small window other than the
-'cont'.
-
-Dave
-
---
-Dr. David Alan Gilbert / address@hidden / Manchester, UK
-
-On 03/03/2017 14:26, Dr. David Alan Gilbert wrote:
->
-* Paolo Bonzini (address@hidden) wrote:
->
->
->
->
->
-> On 03/03/2017 14:11, Dr. David Alan Gilbert wrote:
->
->> * Paolo Bonzini (address@hidden) wrote:
->
->>>
->
->>>
->
->>> On 03/03/2017 13:00, Dr. David Alan Gilbert wrote:
->
->>>> Ouch that's pretty nasty; I remember Paolo explaining to me a while ago
->
->>>> that
->
->>>> their were times when run_on_cpu would have to drop the BQL and I worried
->
->>>> about it,
->
->>>> but this is the 1st time I've seen an error due to it.
->
->>>>
->
->>>> Do you know what the migration state was at that point? Was it
->
->>>> MIGRATION_STATUS_CANCELLING?
->
->>>> I'm thinking perhaps we should stop 'cont' from continuing while
->
->>>> migration is in
->
->>>> MIGRATION_STATUS_CANCELLING.  Do we send an event when we hit CANCELLED -
->
->>>> so that
->
->>>> perhaps libvirt could avoid sending the 'cont' until then?
->
->>>
->
->>> No, there's no event, though I thought libvirt would poll until
->
->>> "query-migrate" returns the cancelled state.  Of course that is a small
->
->>> consolation, because a segfault is unacceptable.
->
->>
->
->> I think you might get an event if you set the new migrate capability called
->
->> 'events' on!
->
->>
->
->> void migrate_set_state(int *state, int old_state, int new_state)
->
->> {
->
->>     if (atomic_cmpxchg(state, old_state, new_state) == old_state) {
->
->>         trace_migrate_set_state(new_state);
->
->>         migrate_generate_event(new_state);
->
->>     }
->
->> }
->
->>
->
->> static void migrate_generate_event(int new_state)
->
->> {
->
->>     if (migrate_use_events()) {
->
->>         qapi_event_send_migration(new_state, &error_abort);
->
->>     }
->
->> }
->
->>
->
->> That event feature went in sometime after 2.3.0.
->
->>
->
->>> One possibility is to suspend the monitor in qmp_migrate_cancel and
->
->>> resume it (with add_migration_state_change_notifier) when we hit the
->
->>> CANCELLED state.  I'm not sure what the latency would be between the end
->
->>> of migrate_fd_cancel and finally reaching CANCELLED.
->
->>
->
->> I don't like suspending monitors; it can potentially take quite a
->
->> significant
->
->> time to do a cancel.
->
->> How about making 'cont' fail if we're in CANCELLING?
->
->
->
-> Actually I thought that would be the case already (in fact CANCELLING is
->
-> internal only; the outside world sees it as "active" in query-migrate).
->
->
->
-> Lei, what is the runstate?  (That is, why did cont succeed at all)?
->
->
-I suspect it's RUN_STATE_FINISH_MIGRATE - we set that before we do the device
->
-save, and that's what we get at the end of a migrate and it's legal to restart
->
-from there.
-Yeah, but I think we get there at the end of a failed migrate only.  So
-perhaps we can introduce a new state RUN_STATE_FAILED_MIGRATE and forbid
-"cont" from finish-migrate (only allow it from failed-migrate)?
-
-Paolo
-
->
-> Paolo
->
->
->
->> I'd really love to see the 'run_on_cpu' being more careful about the BQL;
->
->> we really need all of the rest of the devices to stay quiesced at times.
->
->
->
-> That's not really possible, because of how condition variables work. :(
->
->
-*Really* we need to find a solution to that - there's probably lots of
->
-other things that can spring up in that small window other than the
->
-'cont'.
->
->
-Dave
->
->
---
->
-Dr. David Alan Gilbert / address@hidden / Manchester, UK
->
-
-Hi Paolo,
-
-On Fri, Mar 3, 2017 at 9:33 PM, Paolo Bonzini <address@hidden> wrote:
-
->
->
->
-On 03/03/2017 14:26, Dr. David Alan Gilbert wrote:
->
-> * Paolo Bonzini (address@hidden) wrote:
->
->>
->
->>
->
->> On 03/03/2017 14:11, Dr. David Alan Gilbert wrote:
->
->>> * Paolo Bonzini (address@hidden) wrote:
->
->>>>
->
->>>>
->
->>>> On 03/03/2017 13:00, Dr. David Alan Gilbert wrote:
->
->>>>> Ouch that's pretty nasty; I remember Paolo explaining to me a while
->
-ago that
->
->>>>> their were times when run_on_cpu would have to drop the BQL and I
->
-worried about it,
->
->>>>> but this is the 1st time I've seen an error due to it.
->
->>>>>
->
->>>>> Do you know what the migration state was at that point? Was it
->
-MIGRATION_STATUS_CANCELLING?
->
->>>>> I'm thinking perhaps we should stop 'cont' from continuing while
->
-migration is in
->
->>>>> MIGRATION_STATUS_CANCELLING.  Do we send an event when we hit
->
-CANCELLED - so that
->
->>>>> perhaps libvirt could avoid sending the 'cont' until then?
->
->>>>
->
->>>> No, there's no event, though I thought libvirt would poll until
->
->>>> "query-migrate" returns the cancelled state.  Of course that is a
->
-small
->
->>>> consolation, because a segfault is unacceptable.
->
->>>
->
->>> I think you might get an event if you set the new migrate capability
->
-called
->
->>> 'events' on!
->
->>>
->
->>> void migrate_set_state(int *state, int old_state, int new_state)
->
->>> {
->
->>>     if (atomic_cmpxchg(state, old_state, new_state) == old_state) {
->
->>>         trace_migrate_set_state(new_state);
->
->>>         migrate_generate_event(new_state);
->
->>>     }
->
->>> }
->
->>>
->
->>> static void migrate_generate_event(int new_state)
->
->>> {
->
->>>     if (migrate_use_events()) {
->
->>>         qapi_event_send_migration(new_state, &error_abort);
->
->>>     }
->
->>> }
->
->>>
->
->>> That event feature went in sometime after 2.3.0.
->
->>>
->
->>>> One possibility is to suspend the monitor in qmp_migrate_cancel and
->
->>>> resume it (with add_migration_state_change_notifier) when we hit the
->
->>>> CANCELLED state.  I'm not sure what the latency would be between the
->
-end
->
->>>> of migrate_fd_cancel and finally reaching CANCELLED.
->
->>>
->
->>> I don't like suspending monitors; it can potentially take quite a
->
-significant
->
->>> time to do a cancel.
->
->>> How about making 'cont' fail if we're in CANCELLING?
->
->>
->
->> Actually I thought that would be the case already (in fact CANCELLING is
->
->> internal only; the outside world sees it as "active" in query-migrate).
->
->>
->
->> Lei, what is the runstate?  (That is, why did cont succeed at all)?
->
->
->
-> I suspect it's RUN_STATE_FINISH_MIGRATE - we set that before we do the
->
-device
->
-> save, and that's what we get at the end of a migrate and it's legal to
->
-restart
->
-> from there.
->
->
-Yeah, but I think we get there at the end of a failed migrate only.  So
->
-perhaps we can introduce a new state RUN_STATE_FAILED_MIGRATE
-I think we do not need to introduce a new state here. If we hit 'cont' and
-the run state is RUN_STATE_FINISH_MIGRATE, we could assume that
-migration failed because 'RUN_STATE_FINISH_MIGRATE' only exists on
-source side, means we are finishing migration, a 'cont' at the meantime
-indicates that we are rolling back, otherwise source side should be
-destroyed.
-
-
->
-and forbid
->
-"cont" from finish-migrate (only allow it from failed-migrate)?
->
-The problem of forbid 'cont' here is that it will result in a failed
-migration and the source
-side will remain paused. We actually expect a usable guest when rollback.
-Is there a way to kill migration thread when we're under main thread, if
-there is, we
-could do the following to solve this problem:
-1. 'cont' received during runstate RUN_STATE_FINISH_MIGRATE
-2. kill migration thread
-3. vm_start()
-
-But this only solves 'cont' problem. As Dave said before, other things could
-happen during the small windows while we are finishing migration, that's
-what I was worried about...
-
-
->
-Paolo
->
->
->> Paolo
->
->>
->
->>> I'd really love to see the 'run_on_cpu' being more careful about the
->
-BQL;
->
->>> we really need all of the rest of the devices to stay quiesced at
->
-times.
->
->>
->
->> That's not really possible, because of how condition variables work. :(
->
->
->
-> *Really* we need to find a solution to that - there's probably lots of
->
-> other things that can spring up in that small window other than the
->
-> 'cont'.
->
->
->
-> Dave
->
->
->
-> --
->
-> Dr. David Alan Gilbert / address@hidden / Manchester, UK
->
->
->
->
-
-* Paolo Bonzini (address@hidden) wrote:
->
->
->
-On 03/03/2017 14:26, Dr. David Alan Gilbert wrote:
->
-> * Paolo Bonzini (address@hidden) wrote:
->
->>
->
->>
->
->> On 03/03/2017 14:11, Dr. David Alan Gilbert wrote:
->
->>> * Paolo Bonzini (address@hidden) wrote:
->
->>>>
->
->>>>
->
->>>> On 03/03/2017 13:00, Dr. David Alan Gilbert wrote:
->
->>>>> Ouch that's pretty nasty; I remember Paolo explaining to me a while ago
->
->>>>> that
->
->>>>> their were times when run_on_cpu would have to drop the BQL and I
->
->>>>> worried about it,
->
->>>>> but this is the 1st time I've seen an error due to it.
->
->>>>>
->
->>>>> Do you know what the migration state was at that point? Was it
->
->>>>> MIGRATION_STATUS_CANCELLING?
->
->>>>> I'm thinking perhaps we should stop 'cont' from continuing while
->
->>>>> migration is in
->
->>>>> MIGRATION_STATUS_CANCELLING.  Do we send an event when we hit CANCELLED
->
->>>>> - so that
->
->>>>> perhaps libvirt could avoid sending the 'cont' until then?
->
->>>>
->
->>>> No, there's no event, though I thought libvirt would poll until
->
->>>> "query-migrate" returns the cancelled state.  Of course that is a small
->
->>>> consolation, because a segfault is unacceptable.
->
->>>
->
->>> I think you might get an event if you set the new migrate capability
->
->>> called
->
->>> 'events' on!
->
->>>
->
->>> void migrate_set_state(int *state, int old_state, int new_state)
->
->>> {
->
->>>     if (atomic_cmpxchg(state, old_state, new_state) == old_state) {
->
->>>         trace_migrate_set_state(new_state);
->
->>>         migrate_generate_event(new_state);
->
->>>     }
->
->>> }
->
->>>
->
->>> static void migrate_generate_event(int new_state)
->
->>> {
->
->>>     if (migrate_use_events()) {
->
->>>         qapi_event_send_migration(new_state, &error_abort);
->
->>>     }
->
->>> }
->
->>>
->
->>> That event feature went in sometime after 2.3.0.
->
->>>
->
->>>> One possibility is to suspend the monitor in qmp_migrate_cancel and
->
->>>> resume it (with add_migration_state_change_notifier) when we hit the
->
->>>> CANCELLED state.  I'm not sure what the latency would be between the end
->
->>>> of migrate_fd_cancel and finally reaching CANCELLED.
->
->>>
->
->>> I don't like suspending monitors; it can potentially take quite a
->
->>> significant
->
->>> time to do a cancel.
->
->>> How about making 'cont' fail if we're in CANCELLING?
->
->>
->
->> Actually I thought that would be the case already (in fact CANCELLING is
->
->> internal only; the outside world sees it as "active" in query-migrate).
->
->>
->
->> Lei, what is the runstate?  (That is, why did cont succeed at all)?
->
->
->
-> I suspect it's RUN_STATE_FINISH_MIGRATE - we set that before we do the
->
-> device
->
-> save, and that's what we get at the end of a migrate and it's legal to
->
-> restart
->
-> from there.
->
->
-Yeah, but I think we get there at the end of a failed migrate only.  So
->
-perhaps we can introduce a new state RUN_STATE_FAILED_MIGRATE and forbid
->
-"cont" from finish-migrate (only allow it from failed-migrate)?
-OK, I was wrong in my previous statement; we actually go 
-FINISH_MIGRATE->POSTMIGRATE
-so no new state is needed; you shouldn't be restarting the cpu in 
-FINISH_MIGRATE.
-
-My preference is to get libvirt to wait for the transition to POSTMIGRATE before
-it issues the 'cont'.  I'd rather not block the monitor with 'cont' but I'm
-not sure how we'd cleanly make cont fail without breaking existing libvirts
-that usually don't hit this race. (cc'ing in Jiri).
-
-Dave
-
->
-Paolo
->
->
->> Paolo
->
->>
->
->>> I'd really love to see the 'run_on_cpu' being more careful about the BQL;
->
->>> we really need all of the rest of the devices to stay quiesced at times.
->
->>
->
->> That's not really possible, because of how condition variables work. :(
->
->
->
-> *Really* we need to find a solution to that - there's probably lots of
->
-> other things that can spring up in that small window other than the
->
-> 'cont'.
->
->
->
-> Dave
->
->
->
-> --
->
-> Dr. David Alan Gilbert / address@hidden / Manchester, UK
->
->
---
-Dr. David Alan Gilbert / address@hidden / Manchester, UK
-
-Hi Dave,
-
-On Fri, Mar 3, 2017 at 9:26 PM, Dr. David Alan Gilbert <address@hidden>
-wrote:
-
->
-* Paolo Bonzini (address@hidden) wrote:
->
->
->
->
->
-> On 03/03/2017 14:11, Dr. David Alan Gilbert wrote:
->
-> > * Paolo Bonzini (address@hidden) wrote:
->
-> >>
->
-> >>
->
-> >> On 03/03/2017 13:00, Dr. David Alan Gilbert wrote:
->
-...
->
-> > That event feature went in sometime after 2.3.0.
->
-> >
->
-> >> One possibility is to suspend the monitor in qmp_migrate_cancel and
->
-> >> resume it (with add_migration_state_change_notifier) when we hit the
->
-> >> CANCELLED state.  I'm not sure what the latency would be between the
->
-end
->
-> >> of migrate_fd_cancel and finally reaching CANCELLED.
->
-> >
->
-> > I don't like suspending monitors; it can potentially take quite a
->
-significant
->
-> > time to do a cancel.
->
-> > How about making 'cont' fail if we're in CANCELLING?
->
->
->
-> Actually I thought that would be the case already (in fact CANCELLING is
->
-> internal only; the outside world sees it as "active" in query-migrate).
->
->
->
-> Lei, what is the runstate?  (That is, why did cont succeed at all)?
->
->
-I suspect it's RUN_STATE_FINISH_MIGRATE - we set that before we do the
->
-device
->
-It is RUN_STATE_FINISH_MIGRATE.
-
-
->
-save, and that's what we get at the end of a migrate and it's legal to
->
-restart
->
-from there.
->
->
-> Paolo
->
->
->
-> > I'd really love to see the 'run_on_cpu' being more careful about the
->
-BQL;
->
-> > we really need all of the rest of the devices to stay quiesced at
->
-times.
->
->
->
-> That's not really possible, because of how condition variables work. :(
->
->
-*Really* we need to find a solution to that - there's probably lots of
->
-other things that can spring up in that small window other than the
->
-'cont'.
->
-This is what I was worry about. Not only sync_cpu_state() will call
-run_on_cpu()
-but also vm_stop_force_state() will, both of them did hit the small windows
-in our
-test.
-
-
->
->
-Dave
->
->
---
->
-Dr. David Alan Gilbert / address@hidden / Manchester, UK
->
->
-
diff --git a/classification_output/04/other/17743720 b/classification_output/04/other/17743720
deleted file mode 100644
index a3f02429a..000000000
--- a/classification_output/04/other/17743720
+++ /dev/null
@@ -1,779 +0,0 @@
-other: 0.984
-graphic: 0.972
-device: 0.971
-instruction: 0.966
-assembly: 0.966
-semantic: 0.962
-mistranslation: 0.959
-socket: 0.954
-vnc: 0.945
-boot: 0.945
-network: 0.944
-KVM: 0.933
-
-[Qemu-devel] [BUG] living migrate vm pause forever
-
-Sometimes, living migrate vm pause forever, migrate job stop, but very small 
-probability, I canât reproduce.
-qemu wait semaphore from libvirt send migrate continue, however libvirt wait 
-semaphore from qemu send vm pause.
-
-follow stack:
-qemu:
-Thread 6 (Thread 0x7f50445f3700 (LWP 18120)):
-#0  0x00007f504b84d670 in sem_wait () from /lib/x86_64-linux-gnu/libpthread.so.0
-#1  0x00005574eda1e164 in qemu_sem_wait (sem=sem@entry=0x5574ef6930e0) at 
-qemu-2.12/util/qemu-thread-posix.c:322
-#2  0x00005574ed8dd72e in migration_maybe_pause (s=0x5574ef692f50, 
-current_active_state=0x7f50445f2ae4, new_state=10)
-    at qemu-2.12/migration/migration.c:2106
-#3  0x00005574ed8df51a in migration_completion (s=0x5574ef692f50) at 
-qemu-2.12/migration/migration.c:2137
-#4  migration_iteration_run (s=0x5574ef692f50) at 
-qemu-2.12/migration/migration.c:2311
-#5  migration_thread (opaque=0x5574ef692f50) 
-atqemu-2.12/migration/migration.c:2415
-#6  0x00007f504b847184 in start_thread () from 
-/lib/x86_64-linux-gnu/libpthread.so.0
-#7  0x00007f504b574bed in clone () from /lib/x86_64-linux-gnu/libc.so.6
-
-libvirt:
-Thread 95 (Thread 0x7fdb82ffd700 (LWP 28775)):
-#0  0x00007fdd177dc404 in pthread_cond_wait@@GLIBC_2.3.2 () from 
-/lib/x86_64-linux-gnu/libpthread.so.0
-#1  0x00007fdd198c3b07 in virCondWait (c=0x7fdbc4003000, m=0x7fdbc4002f30) at 
-../../../src/util/virthread.c:252
-#2  0x00007fdd198f36d2 in virDomainObjWait (vm=0x7fdbc4002f20) at 
-../../../src/conf/domain_conf.c:3303
-#3  0x00007fdd09ffaa44 in qemuMigrationRun (driver=0x7fdd000037b0, 
-vm=0x7fdbc4002f20, persist_xml=0x0,
-    cookiein=0x7fdb780084e0 "<qemu-migration>\n  <name>mss-pl_652</name>\n  
-<uuid>1f2b2334-451e-424b-822a-ea10452abb38</uuid>\n  <hostname>mss 
-</hostname>\n  
-<hostuuid>334e344a-4130-4336-5534-323544543642</hostuuid>\n</qemu-migra"..., 
-cookieinlen=207, cookieout=0x7fdb82ffcad0, cookieoutlen=0x7fdb82ffcac8, 
-flags=777,
-    resource=0, spec=0x7fdb82ffc670, dconn=0x0, graphicsuri=0x0, 
-nmigrate_disks=0, migrate_disks=0x0, compression=0x7fdb78007990, 
-migParams=0x7fdb82ffc900)
-    at ../../../src/qemu/qemu_migration.c:3937
-#4  0x00007fdd09ffb26a in doNativeMigrate (driver=0x7fdd000037b0, 
-vm=0x7fdbc4002f20, persist_xml=0x0, uri=0x7fdb780073a0 
-"tcp://172.16.202.17:49152",
-    cookiein=0x7fdb780084e0 "<qemu-migration>\n  <name>mss-pl_652</name>\n  
-<uuid>1f2b2334-451e-424b-822a-ea10452abb38</uuid>\n  <hostname>mss</hostname>\n 
- <hos---Type <return> to continue, or q <return> to quit---
-tuuid>334e344a-4130-4336-5534-323544543642</hostuuid>\n</qemu-migra"..., 
-cookieinlen=207, cookieout=0x7fdb82ffcad0, cookieoutlen=0x7fdb82ffcac8, 
-flags=777,
-    resource=0, dconn=0x0, graphicsuri=0x0, nmigrate_disks=0, 
-migrate_disks=0x0, compression=0x7fdb78007990, migParams=0x7fdb82ffc900)
-    at ../../../src/qemu/qemu_migration.c:4118
-#5  0x00007fdd09ffd808 in qemuMigrationPerformPhase (driver=0x7fdd000037b0, 
-conn=0x7fdb500205d0, vm=0x7fdbc4002f20, persist_xml=0x0,
-    uri=0x7fdb780073a0 "tcp://172.16.202.17:49152", graphicsuri=0x0, 
-nmigrate_disks=0, migrate_disks=0x0, compression=0x7fdb78007990, 
-migParams=0x7fdb82ffc900,
-    cookiein=0x7fdb780084e0 "<qemu-migration>\n  <name>mss-pl_652</name>\n  
-<uuid>1f2b2334-451e-424b-822a-ea10452abb38</uuid>\n  <hostname>mss</hostname>\n 
- <hostuuid>334e344a-4130-4336-5534-323544543642</hostuuid>\n</qemu-migra"..., 
-cookieinlen=207, cookieout=0x7fdb82ffcad0, cookieoutlen=0x7fdb82ffcac8, 
-flags=777,
-    resource=0) at ../../../src/qemu/qemu_migration.c:5030
-#6  0x00007fdd09ffdbb5 in qemuMigrationPerform (driver=0x7fdd000037b0, 
-conn=0x7fdb500205d0, vm=0x7fdbc4002f20, xmlin=0x0, persist_xml=0x0, 
-dconnuri=0x0,
-    uri=0x7fdb780073a0 "tcp://172.16.202.17:49152", graphicsuri=0x0, 
-listenAddress=0x0, nmigrate_disks=0, migrate_disks=0x0, nbdPort=0, 
-compression=0x7fdb78007990,
-    migParams=0x7fdb82ffc900,
-    cookiein=0x7fdb780084e0 "<qemu-migration>\n  <name>mss-pl_652</name>\n  
-<uuid>1f2b2334-451e-424b-822a-ea10452abb38</uuid>\n  <hostname>mss</hostname>\n 
- <hostuuid>334e344a-4130-4336-5534-323544543642</hostuuid>\n</qemu-migra"..., 
-cookieinlen=207, cookieout=0x7fdb82ffcad0, cookieoutlen=0x7fdb82ffcac8, 
-flags=777,
-    dname=0x0, resource=0, v3proto=true) at 
-../../../src/qemu/qemu_migration.c:5124
-#7  0x00007fdd0a054725 in qemuDomainMigratePerform3 (dom=0x7fdb78007b00, 
-xmlin=0x0,
-    cookiein=0x7fdb780084e0 "<qemu-migration>\n  <name>mss-pl_652</name>\n  
-<uuid>1f2b2334-451e-424b-822a-ea10452abb38</uuid>\n  <hostname>mss</hostname>\n 
- <hostuuid>334e344a-4130-4336-5534-323544543642</hostuuid>\n</qemu-migra"..., 
-cookieinlen=207, cookieout=0x7fdb82ffcad0, cookieoutlen=0x7fdb82ffcac8, 
-dconnuri=0x0,
-    uri=0x7fdb780073a0 "tcp://172.16.202.17:49152", flags=777, dname=0x0, 
-resource=0) at ../../../src/qemu/qemu_driver.c:12996
-#8  0x00007fdd199ad0f0 in virDomainMigratePerform3 (domain=0x7fdb78007b00, 
-xmlin=0x0,
-    cookiein=0x7fdb780084e0 "<qemu-migration>\n  <name>mss-pl_652</name>\n  
-<uuid>1f2b2334-451e-424b-822a-ea10452abb38</uuid>\n  <hostname>mss</hostname>\n 
- <hostuuid>334e344a-4130-4336-5534-323544543642</hostuuid>\n</qemu-migra"..., 
-cookieinlen=207, cookieout=0x7fdb82ffcad0, cookieoutlen=0x7fdb82ffcac8, 
-dconnuri=0x0,
-    uri=0x7fdb780073a0 "tcp://172.16.202.17:49152", flags=777, dname=0x0, 
-bandwidth=0) at ../../../src/libvirt-domain.c:4698
-#9  0x000055d13923a939 in remoteDispatchDomainMigratePerform3 
-(server=0x55d13af90e60, client=0x55d13b0156f0, msg=0x55d13afbf620, 
-rerr=0x7fdb82ffcbc0,
-    args=0x7fdb7800b220, ret=0x7fdb78021e90) at ../../../daemon/remote.c:4528
-#10 0x000055d13921a043 in remoteDispatchDomainMigratePerform3Helper 
-(server=0x55d13af90e60, client=0x55d13b0156f0, msg=0x55d13afbf620, 
-rerr=0x7fdb82ffcbc0,
-    args=0x7fdb7800b220, ret=0x7fdb78021e90) at 
-../../../daemon/remote_dispatch.h:7944
-#11 0x00007fdd19a260b4 in virNetServerProgramDispatchCall (prog=0x55d13af98b50, 
-server=0x55d13af90e60, client=0x55d13b0156f0, msg=0x55d13afbf620)
-    at ../../../src/rpc/virnetserverprogram.c:436
-#12 0x00007fdd19a25c17 in virNetServerProgramDispatch (prog=0x55d13af98b50, 
-server=0x55d13af90e60, client=0x55d13b0156f0, msg=0x55d13afbf620)
-    at ../../../src/rpc/virnetserverprogram.c:307
-#13 0x000055d13925933b in virNetServerProcessMsg (srv=0x55d13af90e60, 
-client=0x55d13b0156f0, prog=0x55d13af98b50, msg=0x55d13afbf620)
-    at ../../../src/rpc/virnetserver.c:148
--------------------------------------------------------------------------------------------------------------------------------------
-æ¬é®ä»¶åå¶éä»¶å«ææ°åä¸éå¢çä¿å¯ä¿¡æ¯ï¼ä»éäºåéç»ä¸é¢å°åä¸ååº
-çä¸ªäººæç¾¤ç»ãç¦æ¢ä»»ä½å¶ä»äººä»¥ä»»ä½å½¢å¼ä½¿ç¨ï¼åæ¬ä½ä¸éäºå¨é¨æé¨åå°æ³é²ãå¤å¶ã
-ææ£åï¼æ¬é®ä»¶ä¸çä¿¡æ¯ãå¦ææ¨éæ¶äºæ¬é®ä»¶ï¼è¯·æ¨ç«å³çµè¯æé®ä»¶éç¥åä»¶äººå¹¶å é¤æ¬
-é®ä»¶ï¼
-This e-mail and its attachments contain confidential information from New H3C, 
-which is
-intended only for the person or entity whose address is listed above. Any use 
-of the
-information contained herein in any way (including, but not limited to, total 
-or partial
-disclosure, reproduction, or dissemination) by persons other than the intended
-recipient(s) is prohibited. If you receive this e-mail in error, please notify 
-the sender
-by phone or email immediately and delete it!
-
-* Yuchen (address@hidden) wrote:
->
-Sometimes, living migrate vm pause forever, migrate job stop, but very small
->
-probability, I canât reproduce.
->
-qemu wait semaphore from libvirt send migrate continue, however libvirt wait
->
-semaphore from qemu send vm pause.
-Hi,
-  I've copied in Jiri Denemark from libvirt.
-Can you confirm exactly which qemu and libvirt versions you're using
-please.
-
->
-follow stack:
->
-qemu:
->
-Thread 6 (Thread 0x7f50445f3700 (LWP 18120)):
->
-#0  0x00007f504b84d670 in sem_wait () from
->
-/lib/x86_64-linux-gnu/libpthread.so.0
->
-#1  0x00005574eda1e164 in qemu_sem_wait (sem=sem@entry=0x5574ef6930e0) at
->
-qemu-2.12/util/qemu-thread-posix.c:322
->
-#2  0x00005574ed8dd72e in migration_maybe_pause (s=0x5574ef692f50,
->
-current_active_state=0x7f50445f2ae4, new_state=10)
->
-at qemu-2.12/migration/migration.c:2106
->
-#3  0x00005574ed8df51a in migration_completion (s=0x5574ef692f50) at
->
-qemu-2.12/migration/migration.c:2137
->
-#4  migration_iteration_run (s=0x5574ef692f50) at
->
-qemu-2.12/migration/migration.c:2311
->
-#5  migration_thread (opaque=0x5574ef692f50)
->
-atqemu-2.12/migration/migration.c:2415
->
-#6  0x00007f504b847184 in start_thread () from
->
-/lib/x86_64-linux-gnu/libpthread.so.0
->
-#7  0x00007f504b574bed in clone () from /lib/x86_64-linux-gnu/libc.so.6
-In migration_maybe_pause we have:
-
-    migrate_set_state(&s->state, *current_active_state,
-                      MIGRATION_STATUS_PRE_SWITCHOVER);
-    qemu_sem_wait(&s->pause_sem);
-    migrate_set_state(&s->state, MIGRATION_STATUS_PRE_SWITCHOVER,
-                      new_state);
-
-the line numbers don't match my 2.12.0 checkout; so I guess that it's
-that qemu_sem_wait it's stuck at.
-
-QEMU must have sent the switch to PRE_SWITCHOVER and that should have
-sent an event to libvirt, and libvirt should notice that - I'm
-not sure how to tell whether libvirt has seen that event yet or not?
-
-Dave
-
->
-libvirt:
->
-Thread 95 (Thread 0x7fdb82ffd700 (LWP 28775)):
->
-#0  0x00007fdd177dc404 in pthread_cond_wait@@GLIBC_2.3.2 () from
->
-/lib/x86_64-linux-gnu/libpthread.so.0
->
-#1  0x00007fdd198c3b07 in virCondWait (c=0x7fdbc4003000, m=0x7fdbc4002f30) at
->
-../../../src/util/virthread.c:252
->
-#2  0x00007fdd198f36d2 in virDomainObjWait (vm=0x7fdbc4002f20) at
->
-../../../src/conf/domain_conf.c:3303
->
-#3  0x00007fdd09ffaa44 in qemuMigrationRun (driver=0x7fdd000037b0,
->
-vm=0x7fdbc4002f20, persist_xml=0x0,
->
-cookiein=0x7fdb780084e0 "<qemu-migration>\n  <name>mss-pl_652</name>\n
->
-<uuid>1f2b2334-451e-424b-822a-ea10452abb38</uuid>\n  <hostname>mss
->
-</hostname>\n
->
-<hostuuid>334e344a-4130-4336-5534-323544543642</hostuuid>\n</qemu-migra"...,
->
-cookieinlen=207, cookieout=0x7fdb82ffcad0, cookieoutlen=0x7fdb82ffcac8,
->
-flags=777,
->
-resource=0, spec=0x7fdb82ffc670, dconn=0x0, graphicsuri=0x0,
->
-nmigrate_disks=0, migrate_disks=0x0, compression=0x7fdb78007990,
->
-migParams=0x7fdb82ffc900)
->
-at ../../../src/qemu/qemu_migration.c:3937
->
-#4  0x00007fdd09ffb26a in doNativeMigrate (driver=0x7fdd000037b0,
->
-vm=0x7fdbc4002f20, persist_xml=0x0, uri=0x7fdb780073a0
->
-"tcp://172.16.202.17:49152",
->
-cookiein=0x7fdb780084e0 "<qemu-migration>\n  <name>mss-pl_652</name>\n
->
-<uuid>1f2b2334-451e-424b-822a-ea10452abb38</uuid>\n
->
-<hostname>mss</hostname>\n  <hos---Type <return> to continue, or q <return>
->
-to quit---
->
-tuuid>334e344a-4130-4336-5534-323544543642</hostuuid>\n</qemu-migra"...,
->
-cookieinlen=207, cookieout=0x7fdb82ffcad0, cookieoutlen=0x7fdb82ffcac8,
->
-flags=777,
->
-resource=0, dconn=0x0, graphicsuri=0x0, nmigrate_disks=0,
->
-migrate_disks=0x0, compression=0x7fdb78007990, migParams=0x7fdb82ffc900)
->
-at ../../../src/qemu/qemu_migration.c:4118
->
-#5  0x00007fdd09ffd808 in qemuMigrationPerformPhase (driver=0x7fdd000037b0,
->
-conn=0x7fdb500205d0, vm=0x7fdbc4002f20, persist_xml=0x0,
->
-uri=0x7fdb780073a0 "tcp://172.16.202.17:49152", graphicsuri=0x0,
->
-nmigrate_disks=0, migrate_disks=0x0, compression=0x7fdb78007990,
->
-migParams=0x7fdb82ffc900,
->
-cookiein=0x7fdb780084e0 "<qemu-migration>\n  <name>mss-pl_652</name>\n
->
-<uuid>1f2b2334-451e-424b-822a-ea10452abb38</uuid>\n
->
-<hostname>mss</hostname>\n
->
-<hostuuid>334e344a-4130-4336-5534-323544543642</hostuuid>\n</qemu-migra"...,
->
-cookieinlen=207, cookieout=0x7fdb82ffcad0, cookieoutlen=0x7fdb82ffcac8,
->
-flags=777,
->
-resource=0) at ../../../src/qemu/qemu_migration.c:5030
->
-#6  0x00007fdd09ffdbb5 in qemuMigrationPerform (driver=0x7fdd000037b0,
->
-conn=0x7fdb500205d0, vm=0x7fdbc4002f20, xmlin=0x0, persist_xml=0x0,
->
-dconnuri=0x0,
->
-uri=0x7fdb780073a0 "tcp://172.16.202.17:49152", graphicsuri=0x0,
->
-listenAddress=0x0, nmigrate_disks=0, migrate_disks=0x0, nbdPort=0,
->
-compression=0x7fdb78007990,
->
-migParams=0x7fdb82ffc900,
->
-cookiein=0x7fdb780084e0 "<qemu-migration>\n  <name>mss-pl_652</name>\n
->
-<uuid>1f2b2334-451e-424b-822a-ea10452abb38</uuid>\n
->
-<hostname>mss</hostname>\n
->
-<hostuuid>334e344a-4130-4336-5534-323544543642</hostuuid>\n</qemu-migra"...,
->
-cookieinlen=207, cookieout=0x7fdb82ffcad0, cookieoutlen=0x7fdb82ffcac8,
->
-flags=777,
->
-dname=0x0, resource=0, v3proto=true) at
->
-../../../src/qemu/qemu_migration.c:5124
->
-#7  0x00007fdd0a054725 in qemuDomainMigratePerform3 (dom=0x7fdb78007b00,
->
-xmlin=0x0,
->
-cookiein=0x7fdb780084e0 "<qemu-migration>\n  <name>mss-pl_652</name>\n
->
-<uuid>1f2b2334-451e-424b-822a-ea10452abb38</uuid>\n
->
-<hostname>mss</hostname>\n
->
-<hostuuid>334e344a-4130-4336-5534-323544543642</hostuuid>\n</qemu-migra"...,
->
-cookieinlen=207, cookieout=0x7fdb82ffcad0, cookieoutlen=0x7fdb82ffcac8,
->
-dconnuri=0x0,
->
-uri=0x7fdb780073a0 "tcp://172.16.202.17:49152", flags=777, dname=0x0,
->
-resource=0) at ../../../src/qemu/qemu_driver.c:12996
->
-#8  0x00007fdd199ad0f0 in virDomainMigratePerform3 (domain=0x7fdb78007b00,
->
-xmlin=0x0,
->
-cookiein=0x7fdb780084e0 "<qemu-migration>\n  <name>mss-pl_652</name>\n
->
-<uuid>1f2b2334-451e-424b-822a-ea10452abb38</uuid>\n
->
-<hostname>mss</hostname>\n
->
-<hostuuid>334e344a-4130-4336-5534-323544543642</hostuuid>\n</qemu-migra"...,
->
-cookieinlen=207, cookieout=0x7fdb82ffcad0, cookieoutlen=0x7fdb82ffcac8,
->
-dconnuri=0x0,
->
-uri=0x7fdb780073a0 "tcp://172.16.202.17:49152", flags=777, dname=0x0,
->
-bandwidth=0) at ../../../src/libvirt-domain.c:4698
->
-#9  0x000055d13923a939 in remoteDispatchDomainMigratePerform3
->
-(server=0x55d13af90e60, client=0x55d13b0156f0, msg=0x55d13afbf620,
->
-rerr=0x7fdb82ffcbc0,
->
-args=0x7fdb7800b220, ret=0x7fdb78021e90) at ../../../daemon/remote.c:4528
->
-#10 0x000055d13921a043 in remoteDispatchDomainMigratePerform3Helper
->
-(server=0x55d13af90e60, client=0x55d13b0156f0, msg=0x55d13afbf620,
->
-rerr=0x7fdb82ffcbc0,
->
-args=0x7fdb7800b220, ret=0x7fdb78021e90) at
->
-../../../daemon/remote_dispatch.h:7944
->
-#11 0x00007fdd19a260b4 in virNetServerProgramDispatchCall
->
-(prog=0x55d13af98b50, server=0x55d13af90e60, client=0x55d13b0156f0,
->
-msg=0x55d13afbf620)
->
-at ../../../src/rpc/virnetserverprogram.c:436
->
-#12 0x00007fdd19a25c17 in virNetServerProgramDispatch (prog=0x55d13af98b50,
->
-server=0x55d13af90e60, client=0x55d13b0156f0, msg=0x55d13afbf620)
->
-at ../../../src/rpc/virnetserverprogram.c:307
->
-#13 0x000055d13925933b in virNetServerProcessMsg (srv=0x55d13af90e60,
->
-client=0x55d13b0156f0, prog=0x55d13af98b50, msg=0x55d13afbf620)
->
-at ../../../src/rpc/virnetserver.c:148
->
--------------------------------------------------------------------------------------------------------------------------------------
->
-æ¬é®ä»¶åå¶éä»¶å«ææ°åä¸éå¢çä¿å¯ä¿¡æ¯ï¼ä»éäºåéç»ä¸é¢å°åä¸ååº
->
-çä¸ªäººæç¾¤ç»ãç¦æ¢ä»»ä½å¶ä»äººä»¥ä»»ä½å½¢å¼ä½¿ç¨ï¼åæ¬ä½ä¸éäºå¨é¨æé¨åå°æ³é²ãå¤å¶ã
->
-ææ£åï¼æ¬é®ä»¶ä¸çä¿¡æ¯ãå¦ææ¨éæ¶äºæ¬é®ä»¶ï¼è¯·æ¨ç«å³çµè¯æé®ä»¶éç¥åä»¶äººå¹¶å é¤æ¬
->
-é®ä»¶ï¼
->
-This e-mail and its attachments contain confidential information from New
->
-H3C, which is
->
-intended only for the person or entity whose address is listed above. Any use
->
-of the
->
-information contained herein in any way (including, but not limited to, total
->
-or partial
->
-disclosure, reproduction, or dissemination) by persons other than the intended
->
-recipient(s) is prohibited. If you receive this e-mail in error, please
->
-notify the sender
->
-by phone or email immediately and delete it!
---
-Dr. David Alan Gilbert / address@hidden / Manchester, UK
-
-In migration_maybe_pause we have:
-
-    migrate_set_state(&s->state, *current_active_state,
-                      MIGRATION_STATUS_PRE_SWITCHOVER);
-    qemu_sem_wait(&s->pause_sem);
-    migrate_set_state(&s->state, MIGRATION_STATUS_PRE_SWITCHOVER,
-                      new_state);
-
-the line numbers don't match my 2.12.0 checkout; so I guess that it's that 
-qemu_sem_wait it's stuck at.
-
-QEMU must have sent the switch to PRE_SWITCHOVER and that should have sent an 
-event to libvirt, and libvirt should notice that - I'm not sure how to tell 
-whether libvirt has seen that event yet or not?
-
-
-Thank you for your attention. 
-Yes, you are right, QEMU wait semaphore in this place.
-I use qemu-2.12.1, libvirt-4.0.0.
-Because I added some debug code, so the line numbers doesn't match open qemu
-
------é®ä»¶åä»¶-----
-åä»¶äºº: Dr. David Alan Gilbert [
-mailto:address@hidden
-] 
-åéæ¶é´: 2019å¹´8æ21æ¥ 19:13
-æ¶ä»¶äºº: yuchen (Cloud) <address@hidden>; address@hidden
-æé: address@hidden
-ä¸»é¢: Re: [Qemu-devel] [BUG] living migrate vm pause forever
-
-* Yuchen (address@hidden) wrote:
->
-Sometimes, living migrate vm pause forever, migrate job stop, but very small
->
-probability, I canât reproduce.
->
-qemu wait semaphore from libvirt send migrate continue, however libvirt wait
->
-semaphore from qemu send vm pause.
-Hi,
-  I've copied in Jiri Denemark from libvirt.
-Can you confirm exactly which qemu and libvirt versions you're using please.
-
->
-follow stack:
->
-qemu:
->
-Thread 6 (Thread 0x7f50445f3700 (LWP 18120)):
->
-#0  0x00007f504b84d670 in sem_wait () from
->
-/lib/x86_64-linux-gnu/libpthread.so.0
->
-#1  0x00005574eda1e164 in qemu_sem_wait (sem=sem@entry=0x5574ef6930e0)
->
-at qemu-2.12/util/qemu-thread-posix.c:322
->
-#2  0x00005574ed8dd72e in migration_maybe_pause (s=0x5574ef692f50,
->
-current_active_state=0x7f50445f2ae4, new_state=10)
->
-at qemu-2.12/migration/migration.c:2106
->
-#3  0x00005574ed8df51a in migration_completion (s=0x5574ef692f50) at
->
-qemu-2.12/migration/migration.c:2137
->
-#4  migration_iteration_run (s=0x5574ef692f50) at
->
-qemu-2.12/migration/migration.c:2311
->
-#5  migration_thread (opaque=0x5574ef692f50)
->
-atqemu-2.12/migration/migration.c:2415
->
-#6  0x00007f504b847184 in start_thread () from
->
-/lib/x86_64-linux-gnu/libpthread.so.0
->
-#7  0x00007f504b574bed in clone () from
->
-/lib/x86_64-linux-gnu/libc.so.6
-In migration_maybe_pause we have:
-
-    migrate_set_state(&s->state, *current_active_state,
-                      MIGRATION_STATUS_PRE_SWITCHOVER);
-    qemu_sem_wait(&s->pause_sem);
-    migrate_set_state(&s->state, MIGRATION_STATUS_PRE_SWITCHOVER,
-                      new_state);
-
-the line numbers don't match my 2.12.0 checkout; so I guess that it's that 
-qemu_sem_wait it's stuck at.
-
-QEMU must have sent the switch to PRE_SWITCHOVER and that should have sent an 
-event to libvirt, and libvirt should notice that - I'm not sure how to tell 
-whether libvirt has seen that event yet or not?
-
-Dave
-
->
-libvirt:
->
-Thread 95 (Thread 0x7fdb82ffd700 (LWP 28775)):
->
-#0  0x00007fdd177dc404 in pthread_cond_wait@@GLIBC_2.3.2 () from
->
-/lib/x86_64-linux-gnu/libpthread.so.0
->
-#1  0x00007fdd198c3b07 in virCondWait (c=0x7fdbc4003000,
->
-m=0x7fdbc4002f30) at ../../../src/util/virthread.c:252
->
-#2  0x00007fdd198f36d2 in virDomainObjWait (vm=0x7fdbc4002f20) at
->
-../../../src/conf/domain_conf.c:3303
->
-#3  0x00007fdd09ffaa44 in qemuMigrationRun (driver=0x7fdd000037b0,
->
-vm=0x7fdbc4002f20, persist_xml=0x0,
->
-cookiein=0x7fdb780084e0 "<qemu-migration>\n  <name>mss-pl_652</name>\n
->
-<uuid>1f2b2334-451e-424b-822a-ea10452abb38</uuid>\n  <hostname>mss
->
-</hostname>\n
->
-<hostuuid>334e344a-4130-4336-5534-323544543642</hostuuid>\n</qemu-migra"...,
->
-cookieinlen=207, cookieout=0x7fdb82ffcad0, cookieoutlen=0x7fdb82ffcac8,
->
-flags=777,
->
-resource=0, spec=0x7fdb82ffc670, dconn=0x0, graphicsuri=0x0,
->
-nmigrate_disks=0, migrate_disks=0x0, compression=0x7fdb78007990,
->
-migParams=0x7fdb82ffc900)
->
-at ../../../src/qemu/qemu_migration.c:3937
->
-#4  0x00007fdd09ffb26a in doNativeMigrate (driver=0x7fdd000037b0,
->
-vm=0x7fdbc4002f20, persist_xml=0x0, uri=0x7fdb780073a0
->
-"tcp://172.16.202.17:49152",
->
-cookiein=0x7fdb780084e0 "<qemu-migration>\n
->
-<name>mss-pl_652</name>\n
->
-<uuid>1f2b2334-451e-424b-822a-ea10452abb38</uuid>\n
->
-<hostname>mss</hostname>\n  <hos---Type <return> to continue, or q
->
-<return> to quit---
->
-tuuid>334e344a-4130-4336-5534-323544543642</hostuuid>\n</qemu-migra"..
->
-tuuid>., cookieinlen=207, cookieout=0x7fdb82ffcad0,
->
-tuuid>cookieoutlen=0x7fdb82ffcac8, flags=777,
->
-resource=0, dconn=0x0, graphicsuri=0x0, nmigrate_disks=0,
->
-migrate_disks=0x0, compression=0x7fdb78007990, migParams=0x7fdb82ffc900)
->
-at ../../../src/qemu/qemu_migration.c:4118
->
-#5  0x00007fdd09ffd808 in qemuMigrationPerformPhase (driver=0x7fdd000037b0,
->
-conn=0x7fdb500205d0, vm=0x7fdbc4002f20, persist_xml=0x0,
->
-uri=0x7fdb780073a0 "tcp://172.16.202.17:49152", graphicsuri=0x0,
->
-nmigrate_disks=0, migrate_disks=0x0, compression=0x7fdb78007990,
->
-migParams=0x7fdb82ffc900,
->
-cookiein=0x7fdb780084e0 "<qemu-migration>\n  <name>mss-pl_652</name>\n
->
-<uuid>1f2b2334-451e-424b-822a-ea10452abb38</uuid>\n
->
-<hostname>mss</hostname>\n
->
-<hostuuid>334e344a-4130-4336-5534-323544543642</hostuuid>\n</qemu-migra"...,
->
-cookieinlen=207, cookieout=0x7fdb82ffcad0, cookieoutlen=0x7fdb82ffcac8,
->
-flags=777,
->
-resource=0) at ../../../src/qemu/qemu_migration.c:5030
->
-#6  0x00007fdd09ffdbb5 in qemuMigrationPerform (driver=0x7fdd000037b0,
->
-conn=0x7fdb500205d0, vm=0x7fdbc4002f20, xmlin=0x0, persist_xml=0x0,
->
-dconnuri=0x0,
->
-uri=0x7fdb780073a0 "tcp://172.16.202.17:49152", graphicsuri=0x0,
->
-listenAddress=0x0, nmigrate_disks=0, migrate_disks=0x0, nbdPort=0,
->
-compression=0x7fdb78007990,
->
-migParams=0x7fdb82ffc900,
->
-cookiein=0x7fdb780084e0 "<qemu-migration>\n  <name>mss-pl_652</name>\n
->
-<uuid>1f2b2334-451e-424b-822a-ea10452abb38</uuid>\n
->
-<hostname>mss</hostname>\n
->
-<hostuuid>334e344a-4130-4336-5534-323544543642</hostuuid>\n</qemu-migra"...,
->
-cookieinlen=207, cookieout=0x7fdb82ffcad0, cookieoutlen=0x7fdb82ffcac8,
->
-flags=777,
->
-dname=0x0, resource=0, v3proto=true) at
->
-../../../src/qemu/qemu_migration.c:5124
->
-#7  0x00007fdd0a054725 in qemuDomainMigratePerform3 (dom=0x7fdb78007b00,
->
-xmlin=0x0,
->
-cookiein=0x7fdb780084e0 "<qemu-migration>\n  <name>mss-pl_652</name>\n
->
-<uuid>1f2b2334-451e-424b-822a-ea10452abb38</uuid>\n
->
-<hostname>mss</hostname>\n
->
-<hostuuid>334e344a-4130-4336-5534-323544543642</hostuuid>\n</qemu-migra"...,
->
-cookieinlen=207, cookieout=0x7fdb82ffcad0, cookieoutlen=0x7fdb82ffcac8,
->
-dconnuri=0x0,
->
-uri=0x7fdb780073a0 "tcp://172.16.202.17:49152", flags=777,
->
-dname=0x0, resource=0) at ../../../src/qemu/qemu_driver.c:12996
->
-#8  0x00007fdd199ad0f0 in virDomainMigratePerform3 (domain=0x7fdb78007b00,
->
-xmlin=0x0,
->
-cookiein=0x7fdb780084e0 "<qemu-migration>\n  <name>mss-pl_652</name>\n
->
-<uuid>1f2b2334-451e-424b-822a-ea10452abb38</uuid>\n
->
-<hostname>mss</hostname>\n
->
-<hostuuid>334e344a-4130-4336-5534-323544543642</hostuuid>\n</qemu-migra"...,
->
-cookieinlen=207, cookieout=0x7fdb82ffcad0, cookieoutlen=0x7fdb82ffcac8,
->
-dconnuri=0x0,
->
-uri=0x7fdb780073a0 "tcp://172.16.202.17:49152", flags=777,
->
-dname=0x0, bandwidth=0) at ../../../src/libvirt-domain.c:4698
->
-#9  0x000055d13923a939 in remoteDispatchDomainMigratePerform3
->
-(server=0x55d13af90e60, client=0x55d13b0156f0, msg=0x55d13afbf620,
->
-rerr=0x7fdb82ffcbc0,
->
-args=0x7fdb7800b220, ret=0x7fdb78021e90) at
->
-../../../daemon/remote.c:4528
->
-#10 0x000055d13921a043 in remoteDispatchDomainMigratePerform3Helper
->
-(server=0x55d13af90e60, client=0x55d13b0156f0, msg=0x55d13afbf620,
->
-rerr=0x7fdb82ffcbc0,
->
-args=0x7fdb7800b220, ret=0x7fdb78021e90) at
->
-../../../daemon/remote_dispatch.h:7944
->
-#11 0x00007fdd19a260b4 in virNetServerProgramDispatchCall
->
-(prog=0x55d13af98b50, server=0x55d13af90e60, client=0x55d13b0156f0,
->
-msg=0x55d13afbf620)
->
-at ../../../src/rpc/virnetserverprogram.c:436
->
-#12 0x00007fdd19a25c17 in virNetServerProgramDispatch (prog=0x55d13af98b50,
->
-server=0x55d13af90e60, client=0x55d13b0156f0, msg=0x55d13afbf620)
->
-at ../../../src/rpc/virnetserverprogram.c:307
->
-#13 0x000055d13925933b in virNetServerProcessMsg (srv=0x55d13af90e60,
->
-client=0x55d13b0156f0, prog=0x55d13af98b50, msg=0x55d13afbf620)
->
-at ../../../src/rpc/virnetserver.c:148
->
-----------------------------------------------------------------------
->
----------------------------------------------------------------
->
-æ¬é®ä»¶åå¶éä»¶å«ææ°åä¸éå¢çä¿å¯ä¿¡æ¯ï¼ä»éäºåéç»ä¸é¢å°åä¸ååº
->
-çä¸ªäººæç¾¤ç»ãç¦æ¢ä»»ä½å¶ä»äººä»¥ä»»ä½å½¢å¼ä½¿ç¨ï¼åæ¬ä½ä¸éäºå¨é¨æé¨åå°æ³é²ãå¤å¶ã
->
-ææ£åï¼æ¬é®ä»¶ä¸çä¿¡æ¯ãå¦ææ¨éæ¶äºæ¬é®ä»¶ï¼è¯·æ¨ç«å³çµè¯æé®ä»¶éç¥åä»¶äººå¹¶å é¤æ¬
->
-é®ä»¶ï¼
->
-This e-mail and its attachments contain confidential information from
->
-New H3C, which is intended only for the person or entity whose address
->
-is listed above. Any use of the information contained herein in any
->
-way (including, but not limited to, total or partial disclosure,
->
-reproduction, or dissemination) by persons other than the intended
->
-recipient(s) is prohibited. If you receive this e-mail in error,
->
-please notify the sender by phone or email immediately and delete it!
---
-Dr. David Alan Gilbert / address@hidden / Manchester, UK
-
diff --git a/classification_output/04/other/21221931 b/classification_output/04/other/21221931
deleted file mode 100644
index ed9cdbd71..000000000
--- a/classification_output/04/other/21221931
+++ /dev/null
@@ -1,336 +0,0 @@
-other: 0.979
-network: 0.976
-instruction: 0.974
-device: 0.971
-semantic: 0.967
-assembly: 0.963
-socket: 0.957
-graphic: 0.948
-boot: 0.947
-vnc: 0.944
-mistranslation: 0.933
-KVM: 0.913
-
-[BUG] qemu git error with virgl
-
-Hello,
-
-i can't start any system if i use virgl. I get the following error:
-qemu-x86_64: ../ui/console.c:1791: dpy_gl_ctx_create: Assertion
-`con->gl' failed.
-./and.sh: line 27: 3337167 AbortedÂ Â Â Â Â Â Â Â Â Â Â Â Â Â Â Â  qemu-x86_64 -m 4096
--smp cores=4,sockets=1 -cpu host -machine pc-q35-4.0,accel=kvm -device
-virtio-vga,virgl=on,xres=1280,yres=800 -display sdl,gl=on -device
-intel-hda,id=sound0,msi=on -device
-hda-micro,id=sound0-codec0,bus=sound0.0,cad=0 -device qemu-xhci,id=xhci
--device usb-tablet,bus=xhci.0 -net
-nic,macaddr=52:54:00:12:34:62,model=e1000 -net
-tap,ifname=$INTERFACE,script=no,downscript=no -drive
-file=/media/daten2/image/lineageos.qcow2,if=virtio,index=1,media=disk,cache=none,aio=threads
-Set 'tap3' nonpersistent
-
-i have bicected the issue:
-
-towo:Defiant> git bisect good
-b4e1a342112e50e05b609e857f38c1f2b7aafdc4 is the first bad commit
-commit b4e1a342112e50e05b609e857f38c1f2b7aafdc4
-Author: Paolo Bonzini <pbonzini@redhat.com>
-Date:Â Â  Tue Oct 27 08:44:23 2020 -0400
-
-Â Â Â  vl: remove separate preconfig main_loop
-Â Â Â  Move post-preconfig initialization to the x-exit-preconfig. If
-preconfig
-Â Â Â  is not requested, just exit preconfig mode immediately with the QMP
-Â Â Â  command.
-
-Â Â Â  As a result, the preconfig loop will run with accel_setup_post
-Â Â Â  and os_setup_post restrictions (xen_restrict, chroot, etc.)
-Â Â Â  already done.
-
-Â Â Â  Reviewed-by: Igor Mammedov <imammedo@redhat.com>
-Â Â Â  Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
-
-Â include/sysemu/runstate.h |Â  1 -
-Â monitor/qmp-cmds.cÂ Â Â Â Â Â Â  |Â  9 -----
-Â softmmu/vl.cÂ Â Â Â Â Â Â Â Â Â Â Â Â  | 95
-++++++++++++++++++++---------------------------
-Â 3 files changed, 41 insertions(+), 64 deletions(-)
-
-Regards,
-
-Torsten Wohlfarth
-
-Cc'ing Gerd + patch author/reviewer.
-
-On 1/2/21 2:11 PM, Torsten Wohlfarth wrote:
->
-Hello,
->
->
-i can't start any system if i use virgl. I get the following error:
->
->
-qemu-x86_64: ../ui/console.c:1791: dpy_gl_ctx_create: Assertion
->
-`con->gl' failed.
->
-./and.sh: line 27: 3337167 AbortedÂ Â Â Â Â Â Â Â Â Â Â Â Â Â Â Â  qemu-x86_64 -m 4096
->
--smp cores=4,sockets=1 -cpu host -machine pc-q35-4.0,accel=kvm -device
->
-virtio-vga,virgl=on,xres=1280,yres=800 -display sdl,gl=on -device
->
-intel-hda,id=sound0,msi=on -device
->
-hda-micro,id=sound0-codec0,bus=sound0.0,cad=0 -device qemu-xhci,id=xhci
->
--device usb-tablet,bus=xhci.0 -net
->
-nic,macaddr=52:54:00:12:34:62,model=e1000 -net
->
-tap,ifname=$INTERFACE,script=no,downscript=no -drive
->
-file=/media/daten2/image/lineageos.qcow2,if=virtio,index=1,media=disk,cache=none,aio=threads
->
->
-Set 'tap3' nonpersistent
->
->
-i have bicected the issue:
->
->
-towo:Defiant> git bisect good
->
-b4e1a342112e50e05b609e857f38c1f2b7aafdc4 is the first bad commit
->
-commit b4e1a342112e50e05b609e857f38c1f2b7aafdc4
->
-Author: Paolo Bonzini <pbonzini@redhat.com>
->
-Date:Â Â  Tue Oct 27 08:44:23 2020 -0400
->
->
-Â Â Â  vl: remove separate preconfig main_loop
->
->
-Â Â Â  Move post-preconfig initialization to the x-exit-preconfig. If
->
-preconfig
->
-Â Â Â  is not requested, just exit preconfig mode immediately with the QMP
->
-Â Â Â  command.
->
->
-Â Â Â  As a result, the preconfig loop will run with accel_setup_post
->
-Â Â Â  and os_setup_post restrictions (xen_restrict, chroot, etc.)
->
-Â Â Â  already done.
->
->
-Â Â Â  Reviewed-by: Igor Mammedov <imammedo@redhat.com>
->
-Â Â Â  Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
->
->
-Â include/sysemu/runstate.h |Â  1 -
->
-Â monitor/qmp-cmds.cÂ Â Â Â Â Â Â  |Â  9 -----
->
-Â softmmu/vl.cÂ Â Â Â Â Â Â Â Â Â Â Â Â  | 95
->
-++++++++++++++++++++---------------------------
->
-Â 3 files changed, 41 insertions(+), 64 deletions(-)
->
->
-Regards,
->
->
-Torsten Wohlfarth
->
->
->
-
-On Sun, 3 Jan 2021 18:28:11 +0100
-Philippe Mathieu-DaudÃ© <philmd@redhat.com> wrote:
-
->
-Cc'ing Gerd + patch author/reviewer.
->
->
-On 1/2/21 2:11 PM, Torsten Wohlfarth wrote:
->
-> Hello,
->
->
->
-> i can't start any system if i use virgl. I get the following error:
->
->
->
-> qemu-x86_64: ../ui/console.c:1791: dpy_gl_ctx_create: Assertion
->
-> `con->gl' failed.
-Does following fix issue:
-  [PULL 12/55] vl: initialize displays _after_ exiting preconfiguration
-
->
-> ./and.sh: line 27: 3337167 AbortedÂ Â Â Â Â Â Â Â Â Â Â Â Â Â Â Â  qemu-x86_64 -m 4096
->
-> -smp cores=4,sockets=1 -cpu host -machine pc-q35-4.0,accel=kvm -device
->
-> virtio-vga,virgl=on,xres=1280,yres=800 -display sdl,gl=on -device
->
-> intel-hda,id=sound0,msi=on -device
->
-> hda-micro,id=sound0-codec0,bus=sound0.0,cad=0 -device qemu-xhci,id=xhci
->
-> -device usb-tablet,bus=xhci.0 -net
->
-> nic,macaddr=52:54:00:12:34:62,model=e1000 -net
->
-> tap,ifname=$INTERFACE,script=no,downscript=no -drive
->
-> file=/media/daten2/image/lineageos.qcow2,if=virtio,index=1,media=disk,cache=none,aio=threads
->
->
->
-> Set 'tap3' nonpersistent
->
->
->
-> i have bicected the issue:
->
->
->
-> towo:Defiant> git bisect good
->
-> b4e1a342112e50e05b609e857f38c1f2b7aafdc4 is the first bad commit
->
-> commit b4e1a342112e50e05b609e857f38c1f2b7aafdc4
->
-> Author: Paolo Bonzini <pbonzini@redhat.com>
->
-> Date:Â Â  Tue Oct 27 08:44:23 2020 -0400
->
->
->
-> Â Â Â  vl: remove separate preconfig main_loop
->
->
->
-> Â Â Â  Move post-preconfig initialization to the x-exit-preconfig. If
->
-> preconfig
->
-> Â Â Â  is not requested, just exit preconfig mode immediately with the QMP
->
-> Â Â Â  command.
->
->
->
-> Â Â Â  As a result, the preconfig loop will run with accel_setup_post
->
-> Â Â Â  and os_setup_post restrictions (xen_restrict, chroot, etc.)
->
-> Â Â Â  already done.
->
->
->
-> Â Â Â  Reviewed-by: Igor Mammedov <imammedo@redhat.com>
->
-> Â Â Â  Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
->
->
->
-> Â include/sysemu/runstate.h |Â  1 -
->
-> Â monitor/qmp-cmds.cÂ Â Â Â Â Â Â  |Â  9 -----
->
-> Â softmmu/vl.cÂ Â Â Â Â Â Â Â Â Â Â Â Â  | 95
->
-> ++++++++++++++++++++---------------------------
->
-> Â 3 files changed, 41 insertions(+), 64 deletions(-)
->
->
->
-> Regards,
->
->
->
-> Torsten Wohlfarth
->
->
->
->
->
->
->
->
-
-Hi Igor,
-
-yes, that fixes my issue.
-
-Regards, Torsten
-
-Am 04.01.21 um 19:50 schrieb Igor Mammedov:
-On Sun, 3 Jan 2021 18:28:11 +0100
-Philippe Mathieu-DaudÃ© <philmd@redhat.com> wrote:
-Cc'ing Gerd + patch author/reviewer.
-
-On 1/2/21 2:11 PM, Torsten Wohlfarth wrote:
-Hello,
-
-i can't start any system if i use virgl. I get the following error:
-
-qemu-x86_64: ../ui/console.c:1791: dpy_gl_ctx_create: Assertion
-`con->gl' failed.
-Does following fix issue:
-   [PULL 12/55] vl: initialize displays _after_ exiting preconfiguration
-./and.sh: line 27: 3337167 AbortedÂ Â Â Â Â Â Â Â Â Â Â Â Â Â Â Â  qemu-x86_64 -m 4096
--smp cores=4,sockets=1 -cpu host -machine pc-q35-4.0,accel=kvm -device
-virtio-vga,virgl=on,xres=1280,yres=800 -display sdl,gl=on -device
-intel-hda,id=sound0,msi=on -device
-hda-micro,id=sound0-codec0,bus=sound0.0,cad=0 -device qemu-xhci,id=xhci
--device usb-tablet,bus=xhci.0 -net
-nic,macaddr=52:54:00:12:34:62,model=e1000 -net
-tap,ifname=$INTERFACE,script=no,downscript=no -drive
-file=/media/daten2/image/lineageos.qcow2,if=virtio,index=1,media=disk,cache=none,aio=threads
-
-Set 'tap3' nonpersistent
-
-i have bicected the issue:
-towo:Defiant> git bisect good
-b4e1a342112e50e05b609e857f38c1f2b7aafdc4 is the first bad commit
-commit b4e1a342112e50e05b609e857f38c1f2b7aafdc4
-Author: Paolo Bonzini <pbonzini@redhat.com>
-Date:Â Â  Tue Oct 27 08:44:23 2020 -0400
-
- Â Â Â  vl: remove separate preconfig main_loop
-
- Â Â Â  Move post-preconfig initialization to the x-exit-preconfig. If
-preconfig
- Â Â Â  is not requested, just exit preconfig mode immediately with the QMP
- Â Â Â  command.
-
- Â Â Â  As a result, the preconfig loop will run with accel_setup_post
- Â Â Â  and os_setup_post restrictions (xen_restrict, chroot, etc.)
- Â Â Â  already done.
-
- Â Â Â  Reviewed-by: Igor Mammedov <imammedo@redhat.com>
- Â Â Â  Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
-
- Â include/sysemu/runstate.h |Â  1 -
- Â monitor/qmp-cmds.cÂ Â Â Â Â Â Â  |Â  9 -----
- Â softmmu/vl.cÂ Â Â Â Â Â Â Â Â Â Â Â Â  | 95
-++++++++++++++++++++---------------------------
- Â 3 files changed, 41 insertions(+), 64 deletions(-)
-
-Regards,
-
-Torsten Wohlfarth
-
diff --git a/classification_output/04/other/21247035 b/classification_output/04/other/21247035
deleted file mode 100644
index 080554a95..000000000
--- a/classification_output/04/other/21247035
+++ /dev/null
@@ -1,1329 +0,0 @@
-other: 0.640
-mistranslation: 0.584
-device: 0.525
-KVM: 0.514
-instruction: 0.508
-graphic: 0.426
-assembly: 0.391
-semantic: 0.374
-vnc: 0.367
-boot: 0.345
-network: 0.322
-socket: 0.322
-
-[Qemu-devel] [BUG] I/O thread segfault for QEMU on s390x
-
-Hi,
-I have been noticing some segfaults for QEMU on s390x, and I have been
-hitting this issue quite reliably (at least once in 10 runs of a test
-case). The qemu version is 2.11.50, and I have systemd created coredumps
-when this happens.
-
-Here is a back trace of the segfaulting thread:
-
-
-#0  0x000003ffafed202c in swapcontext () from /lib64/libc.so.6
-#1  0x000002aa355c02ee in qemu_coroutine_new () at
-util/coroutine-ucontext.c:164
-#2  0x000002aa355bec34 in qemu_coroutine_create
-(address@hidden <blk_aio_read_entry>,
-address@hidden) at util/qemu-coroutine.c:76
-#3  0x000002aa35510262 in blk_aio_prwv (blk=0x2aa65fbefa0,
-offset=<optimized out>, bytes=<optimized out>, qiov=0x3ffa002a9c0,
-address@hidden <blk_aio_read_entry>, flags=0,
-cb=0x2aa35340a50 <virtio_blk_rw_complete>, opaque=0x3ffa002a960) at
-block/block-backend.c:1299
-#4  0x000002aa35510376 in blk_aio_preadv (blk=<optimized out>,
-offset=<optimized out>, qiov=<optimized out>, flags=<optimized out>,
-cb=<optimized out>, opaque=0x3ffa002a960) at block/block-backend.c:1392
-#5  0x000002aa3534114e in submit_requests (niov=<optimized out>,
-num_reqs=<optimized out>, start=<optimized out>, mrb=<optimized out>,
-blk=<optimized out>) at
-/usr/src/debug/qemu-2.11.50/hw/block/virtio-blk.c:372
-#6  virtio_blk_submit_multireq (blk=<optimized out>,
-address@hidden) at
-/usr/src/debug/qemu-2.11.50/hw/block/virtio-blk.c:402
-#7  0x000002aa353422e0 in virtio_blk_handle_vq (s=0x2aa6611e7d8,
-vq=0x3ffb0f5f010) at /usr/src/debug/qemu-2.11.50/hw/block/virtio-blk.c:620
-#8  0x000002aa3536655a in virtio_queue_notify_aio_vq
-(address@hidden) at
-/usr/src/debug/qemu-2.11.50/hw/virtio/virtio.c:1515
-#9  0x000002aa35366cd6 in virtio_queue_notify_aio_vq (vq=0x3ffb0f5f010)
-at /usr/src/debug/qemu-2.11.50/hw/virtio/virtio.c:1511
-#10 virtio_queue_host_notifier_aio_poll (opaque=0x3ffb0f5f078) at
-/usr/src/debug/qemu-2.11.50/hw/virtio/virtio.c:2409
-#11 0x000002aa355a8ba4 in run_poll_handlers_once
-(address@hidden) at util/aio-posix.c:497
-#12 0x000002aa355a9b74 in run_poll_handlers (max_ns=<optimized out>,
-ctx=0x2aa65f99310) at util/aio-posix.c:534
-#13 try_poll_mode (blocking=true, ctx=0x2aa65f99310) at util/aio-posix.c:562
-#14 aio_poll (ctx=0x2aa65f99310, address@hidden) at
-util/aio-posix.c:602
-#15 0x000002aa353d2d0a in iothread_run (opaque=0x2aa65f990f0) at
-iothread.c:60
-#16 0x000003ffb0f07e82 in start_thread () from /lib64/libpthread.so.0
-#17 0x000003ffaff91596 in thread_start () from /lib64/libc.so.6
-I don't have much knowledge about i/o threads and the block layer code
-in QEMU, so I would like to report to the community about this issue.
-I believe this very similar to the bug that I reported upstream couple
-of days ago
-(
-https://lists.gnu.org/archive/html/qemu-devel/2018-02/msg04452.html
-).
-Any help would be greatly appreciated.
-
-Thanks
-Farhan
-
-On Thu, Mar 1, 2018 at 10:33 PM, Farhan Ali <address@hidden> wrote:
->
-Hi,
->
->
-I have been noticing some segfaults for QEMU on s390x, and I have been
->
-hitting this issue quite reliably (at least once in 10 runs of a test case).
->
-The qemu version is 2.11.50, and I have systemd created coredumps
->
-when this happens.
-Can you describe the test case or suggest how to reproduce it for us?
-
-Fam
-
-On 03/02/2018 01:13 AM, Fam Zheng wrote:
-On Thu, Mar 1, 2018 at 10:33 PM, Farhan Ali <address@hidden> wrote:
-Hi,
-
-I have been noticing some segfaults for QEMU on s390x, and I have been
-hitting this issue quite reliably (at least once in 10 runs of a test case).
-The qemu version is 2.11.50, and I have systemd created coredumps
-when this happens.
-Can you describe the test case or suggest how to reproduce it for us?
-
-Fam
-The test case is with a single guest, running a memory intensive
-workload. The guest has 8 vpcus and 4G of memory.
-Here is the qemu command line, if that helps:
-
-/usr/bin/qemu-kvm -name guest=sles,debug-threads=on \
--S -object
-secret,id=masterKey0,format=raw,file=/var/lib/libvirt/qemu/domain-2-sles/master-key.aes
-\
--machine s390-ccw-virtio-2.12,accel=kvm,usb=off,dump-guest-core=off \
--m 4096 -realtime mlock=off -smp 8,sockets=8,cores=1,threads=1 \
--object iothread,id=iothread1 -object iothread,id=iothread2 -uuid
-b83a596b-3a1a-4ac9-9f3e-d9a4032ee52c \
--display none -no-user-config -nodefaults -chardev
-socket,id=charmonitor,path=/var/lib/libvirt/qemu/domain-2-sles/monitor.sock,server,nowait
--mon chardev=charmonitor,id=monitor,mode=control -rtc base=utc
--no-shutdown \
--boot strict=on -drive
-file=/dev/mapper/360050763998b0883980000002400002b,format=raw,if=none,id=drive-virtio-disk0,cache=none,aio=native
--device
-virtio-blk-ccw,iothread=iothread1,scsi=off,devno=fe.0.0001,drive=drive-virtio-disk0,id=virtio-disk0,bootindex=1
--drive
-file=/dev/mapper/360050763998b0883980000002800002f,format=raw,if=none,id=drive-virtio-disk1,cache=none,aio=native
--device
-virtio-blk-ccw,iothread=iothread2,scsi=off,devno=fe.0.0002,drive=drive-virtio-disk1,id=virtio-disk1
--netdev tap,fd=24,id=hostnet0,vhost=on,vhostfd=26 -device
-virtio-net-ccw,netdev=hostnet0,id=net0,mac=02:38:a6:36:e8:1f,devno=fe.0.0000
--chardev pty,id=charconsole0 -device
-sclpconsole,chardev=charconsole0,id=console0 -device
-virtio-balloon-ccw,id=balloon0,devno=fe.3.ffba -msg timestamp=on
-Please let me know if I need to provide any other information.
-
-Thanks
-Farhan
-
-On Thu, Mar 01, 2018 at 09:33:35AM -0500, Farhan Ali wrote:
->
-Hi,
->
->
-I have been noticing some segfaults for QEMU on s390x, and I have been
->
-hitting this issue quite reliably (at least once in 10 runs of a test case).
->
-The qemu version is 2.11.50, and I have systemd created coredumps
->
-when this happens.
->
->
-Here is a back trace of the segfaulting thread:
-The backtrace looks normal.
-
-Please post the QEMU command-line and the details of the segfault (which
-memory access faulted?).
-
->
-#0  0x000003ffafed202c in swapcontext () from /lib64/libc.so.6
->
-#1  0x000002aa355c02ee in qemu_coroutine_new () at
->
-util/coroutine-ucontext.c:164
->
-#2  0x000002aa355bec34 in qemu_coroutine_create
->
-(address@hidden <blk_aio_read_entry>,
->
-address@hidden) at util/qemu-coroutine.c:76
->
-#3  0x000002aa35510262 in blk_aio_prwv (blk=0x2aa65fbefa0, offset=<optimized
->
-out>, bytes=<optimized out>, qiov=0x3ffa002a9c0,
->
-address@hidden <blk_aio_read_entry>, flags=0,
->
-cb=0x2aa35340a50 <virtio_blk_rw_complete>, opaque=0x3ffa002a960) at
->
-block/block-backend.c:1299
->
-#4  0x000002aa35510376 in blk_aio_preadv (blk=<optimized out>,
->
-offset=<optimized out>, qiov=<optimized out>, flags=<optimized out>,
->
-cb=<optimized out>, opaque=0x3ffa002a960) at block/block-backend.c:1392
->
-#5  0x000002aa3534114e in submit_requests (niov=<optimized out>,
->
-num_reqs=<optimized out>, start=<optimized out>, mrb=<optimized out>,
->
-blk=<optimized out>) at
->
-/usr/src/debug/qemu-2.11.50/hw/block/virtio-blk.c:372
->
-#6  virtio_blk_submit_multireq (blk=<optimized out>,
->
-address@hidden) at
->
-/usr/src/debug/qemu-2.11.50/hw/block/virtio-blk.c:402
->
-#7  0x000002aa353422e0 in virtio_blk_handle_vq (s=0x2aa6611e7d8,
->
-vq=0x3ffb0f5f010) at /usr/src/debug/qemu-2.11.50/hw/block/virtio-blk.c:620
->
-#8  0x000002aa3536655a in virtio_queue_notify_aio_vq
->
-(address@hidden) at
->
-/usr/src/debug/qemu-2.11.50/hw/virtio/virtio.c:1515
->
-#9  0x000002aa35366cd6 in virtio_queue_notify_aio_vq (vq=0x3ffb0f5f010) at
->
-/usr/src/debug/qemu-2.11.50/hw/virtio/virtio.c:1511
->
-#10 virtio_queue_host_notifier_aio_poll (opaque=0x3ffb0f5f078) at
->
-/usr/src/debug/qemu-2.11.50/hw/virtio/virtio.c:2409
->
-#11 0x000002aa355a8ba4 in run_poll_handlers_once
->
-(address@hidden) at util/aio-posix.c:497
->
-#12 0x000002aa355a9b74 in run_poll_handlers (max_ns=<optimized out>,
->
-ctx=0x2aa65f99310) at util/aio-posix.c:534
->
-#13 try_poll_mode (blocking=true, ctx=0x2aa65f99310) at util/aio-posix.c:562
->
-#14 aio_poll (ctx=0x2aa65f99310, address@hidden) at
->
-util/aio-posix.c:602
->
-#15 0x000002aa353d2d0a in iothread_run (opaque=0x2aa65f990f0) at
->
-iothread.c:60
->
-#16 0x000003ffb0f07e82 in start_thread () from /lib64/libpthread.so.0
->
-#17 0x000003ffaff91596 in thread_start () from /lib64/libc.so.6
->
->
->
-I don't have much knowledge about i/o threads and the block layer code in
->
-QEMU, so I would like to report to the community about this issue.
->
-I believe this very similar to the bug that I reported upstream couple of
->
-days ago
->
-(
-https://lists.gnu.org/archive/html/qemu-devel/2018-02/msg04452.html
-).
->
->
-Any help would be greatly appreciated.
->
->
-Thanks
->
-Farhan
->
-signature.asc
-Description:
-PGP signature
-
-On 03/02/2018 04:23 AM, Stefan Hajnoczi wrote:
-On Thu, Mar 01, 2018 at 09:33:35AM -0500, Farhan Ali wrote:
-Hi,
-
-I have been noticing some segfaults for QEMU on s390x, and I have been
-hitting this issue quite reliably (at least once in 10 runs of a test case).
-The qemu version is 2.11.50, and I have systemd created coredumps
-when this happens.
-
-Here is a back trace of the segfaulting thread:
-The backtrace looks normal.
-
-Please post the QEMU command-line and the details of the segfault (which
-memory access faulted?).
-I was able to create another crash today and here is the qemu comand line
-
-/usr/bin/qemu-kvm -name guest=sles,debug-threads=on \
--S -object
-secret,id=masterKey0,format=raw,file=/var/lib/libvirt/qemu/domain-2-sles/master-key.aes
-\
--machine s390-ccw-virtio-2.12,accel=kvm,usb=off,dump-guest-core=off \
--m 4096 -realtime mlock=off -smp 8,sockets=8,cores=1,threads=1 \
--object iothread,id=iothread1 -object iothread,id=iothread2 -uuid
-b83a596b-3a1a-4ac9-9f3e-d9a4032ee52c \
--display none -no-user-config -nodefaults -chardev
-socket,id=charmonitor,path=/var/lib/libvirt/qemu/domain-2-sles/monitor.sock,server,nowait
--mon chardev=charmonitor,id=monitor,mode=control -rtc base=utc
--no-shutdown \
--boot strict=on -drive
-file=/dev/mapper/360050763998b0883980000002400002b,format=raw,if=none,id=drive-virtio-disk0,cache=none,aio=native
--device
-virtio-blk-ccw,iothread=iothread1,scsi=off,devno=fe.0.0001,drive=drive-virtio-disk0,id=virtio-disk0,bootindex=1
--drive
-file=/dev/mapper/360050763998b0883980000002800002f,format=raw,if=none,id=drive-virtio-disk1,cache=none,aio=native
--device
-virtio-blk-ccw,iothread=iothread2,scsi=off,devno=fe.0.0002,drive=drive-virtio-disk1,id=virtio-disk1
--netdev tap,fd=24,id=hostnet0,vhost=on,vhostfd=26 -device
-virtio-net-ccw,netdev=hostnet0,id=net0,mac=02:38:a6:36:e8:1f,devno=fe.0.0000
--chardev pty,id=charconsole0 -device
-sclpconsole,chardev=charconsole0,id=console0 -device
-virtio-balloon-ccw,id=balloon0,devno=fe.3.ffba -msg timestamp=on
-This the latest back trace on the segfaulting thread, and it seems to
-segfault in swapcontext.
-Program terminated with signal SIGSEGV, Segmentation fault.
-#0  0x000003ff8595202c in swapcontext () from /lib64/libc.so.6
-
-
-This is the remaining back trace:
-
-#0  0x000003ff8595202c in swapcontext () from /lib64/libc.so.6
-#1  0x000002aa33b45566 in qemu_coroutine_new () at
-util/coroutine-ucontext.c:164
-#2  0x000002aa33b43eac in qemu_coroutine_create
-(address@hidden <blk_aio_write_entry>,
-address@hidden) at util/qemu-coroutine.c:76
-#3  0x000002aa33a954da in blk_aio_prwv (blk=0x2aa4f0efda0,
-offset=<optimized out>, bytes=<optimized out>, qiov=0x3ff74019080,
-address@hidden <blk_aio_write_entry>, flags=0,
-cb=0x2aa338c62e8 <virtio_blk_rw_complete>, opaque=0x3ff74019020) at
-block/block-backend.c:1299
-#4  0x000002aa33a9563e in blk_aio_pwritev (blk=<optimized out>,
-offset=<optimized out>, qiov=<optimized out>, flags=<optimized out>,
-cb=<optimized out>, opaque=0x3ff74019020) at block/block-backend.c:1400
-#5  0x000002aa338c6a38 in submit_requests (niov=<optimized out>,
-num_reqs=1, start=<optimized out>, mrb=0x3ff831fe6e0, blk=<optimized
-out>) at /usr/src/debug/qemu-2.11.50/hw/block/virtio-blk.c:369
-#6  virtio_blk_submit_multireq (blk=<optimized out>,
-address@hidden) at
-/usr/src/debug/qemu-2.11.50/hw/block/virtio-blk.c:426
-#7  0x000002aa338c7b78 in virtio_blk_handle_vq (s=0x2aa4f2507c8,
-vq=0x3ff869df010) at /usr/src/debug/qemu-2.11.50/hw/block/virtio-blk.c:620
-#8  0x000002aa338ebdf2 in virtio_queue_notify_aio_vq (vq=0x3ff869df010)
-at /usr/src/debug/qemu-2.11.50/hw/virtio/virtio.c:1515
-#9  0x000002aa33b2df46 in aio_dispatch_handlers
-(address@hidden) at util/aio-posix.c:406
-#10 0x000002aa33b2eb50 in aio_poll (ctx=0x2aa4f0ca050,
-address@hidden) at util/aio-posix.c:692
-#11 0x000002aa33957f6a in iothread_run (opaque=0x2aa4f0c9630) at
-iothread.c:60
-#12 0x000003ff86987e82 in start_thread () from /lib64/libpthread.so.0
-#13 0x000003ff85a11596 in thread_start () from /lib64/libc.so.6
-Backtrace stopped: previous frame identical to this frame (corrupt stack?)
-
-On Fri, Mar 02, 2018 at 10:30:57AM -0500, Farhan Ali wrote:
->
->
->
-On 03/02/2018 04:23 AM, Stefan Hajnoczi wrote:
->
-> On Thu, Mar 01, 2018 at 09:33:35AM -0500, Farhan Ali wrote:
->
-> > Hi,
->
-> >
->
-> > I have been noticing some segfaults for QEMU on s390x, and I have been
->
-> > hitting this issue quite reliably (at least once in 10 runs of a test
->
-> > case).
->
-> > The qemu version is 2.11.50, and I have systemd created coredumps
->
-> > when this happens.
->
-> >
->
-> > Here is a back trace of the segfaulting thread:
->
-> The backtrace looks normal.
->
->
->
-> Please post the QEMU command-line and the details of the segfault (which
->
-> memory access faulted?).
->
->
->
->
->
-I was able to create another crash today and here is the qemu comand line
->
->
-/usr/bin/qemu-kvm -name guest=sles,debug-threads=on \
->
--S -object
->
-secret,id=masterKey0,format=raw,file=/var/lib/libvirt/qemu/domain-2-sles/master-key.aes
->
-\
->
--machine s390-ccw-virtio-2.12,accel=kvm,usb=off,dump-guest-core=off \
->
--m 4096 -realtime mlock=off -smp 8,sockets=8,cores=1,threads=1 \
->
--object iothread,id=iothread1 -object iothread,id=iothread2 -uuid
->
-b83a596b-3a1a-4ac9-9f3e-d9a4032ee52c \
->
--display none -no-user-config -nodefaults -chardev
->
-socket,id=charmonitor,path=/var/lib/libvirt/qemu/domain-2-sles/monitor.sock,server,nowait
->
->
--mon chardev=charmonitor,id=monitor,mode=control -rtc base=utc -no-shutdown
->
-\
->
--boot strict=on -drive
->
-file=/dev/mapper/360050763998b0883980000002400002b,format=raw,if=none,id=drive-virtio-disk0,cache=none,aio=native
->
--device
->
-virtio-blk-ccw,iothread=iothread1,scsi=off,devno=fe.0.0001,drive=drive-virtio-disk0,id=virtio-disk0,bootindex=1
->
--drive
->
-file=/dev/mapper/360050763998b0883980000002800002f,format=raw,if=none,id=drive-virtio-disk1,cache=none,aio=native
->
--device
->
-virtio-blk-ccw,iothread=iothread2,scsi=off,devno=fe.0.0002,drive=drive-virtio-disk1,id=virtio-disk1
->
--netdev tap,fd=24,id=hostnet0,vhost=on,vhostfd=26 -device
->
-virtio-net-ccw,netdev=hostnet0,id=net0,mac=02:38:a6:36:e8:1f,devno=fe.0.0000
->
--chardev pty,id=charconsole0 -device
->
-sclpconsole,chardev=charconsole0,id=console0 -device
->
-virtio-balloon-ccw,id=balloon0,devno=fe.3.ffba -msg timestamp=on
->
->
->
-This the latest back trace on the segfaulting thread, and it seems to
->
-segfault in swapcontext.
->
->
-Program terminated with signal SIGSEGV, Segmentation fault.
->
-#0  0x000003ff8595202c in swapcontext () from /lib64/libc.so.6
-Please include the following gdb output:
-
-  (gdb) disas swapcontext
-  (gdb) i r
-
-That way it's possible to see which instruction faulted and which
-registers were being accessed.
-
->
-This is the remaining back trace:
->
->
-#0  0x000003ff8595202c in swapcontext () from /lib64/libc.so.6
->
-#1  0x000002aa33b45566 in qemu_coroutine_new () at
->
-util/coroutine-ucontext.c:164
->
-#2  0x000002aa33b43eac in qemu_coroutine_create
->
-(address@hidden <blk_aio_write_entry>,
->
-address@hidden) at util/qemu-coroutine.c:76
->
-#3  0x000002aa33a954da in blk_aio_prwv (blk=0x2aa4f0efda0, offset=<optimized
->
-out>, bytes=<optimized out>, qiov=0x3ff74019080,
->
-address@hidden <blk_aio_write_entry>, flags=0,
->
-cb=0x2aa338c62e8 <virtio_blk_rw_complete>, opaque=0x3ff74019020) at
->
-block/block-backend.c:1299
->
-#4  0x000002aa33a9563e in blk_aio_pwritev (blk=<optimized out>,
->
-offset=<optimized out>, qiov=<optimized out>, flags=<optimized out>,
->
-cb=<optimized out>, opaque=0x3ff74019020) at block/block-backend.c:1400
->
-#5  0x000002aa338c6a38 in submit_requests (niov=<optimized out>, num_reqs=1,
->
-start=<optimized out>, mrb=0x3ff831fe6e0, blk=<optimized out>) at
->
-/usr/src/debug/qemu-2.11.50/hw/block/virtio-blk.c:369
->
-#6  virtio_blk_submit_multireq (blk=<optimized out>,
->
-address@hidden) at
->
-/usr/src/debug/qemu-2.11.50/hw/block/virtio-blk.c:426
->
-#7  0x000002aa338c7b78 in virtio_blk_handle_vq (s=0x2aa4f2507c8,
->
-vq=0x3ff869df010) at /usr/src/debug/qemu-2.11.50/hw/block/virtio-blk.c:620
->
-#8  0x000002aa338ebdf2 in virtio_queue_notify_aio_vq (vq=0x3ff869df010) at
->
-/usr/src/debug/qemu-2.11.50/hw/virtio/virtio.c:1515
->
-#9  0x000002aa33b2df46 in aio_dispatch_handlers
->
-(address@hidden) at util/aio-posix.c:406
->
-#10 0x000002aa33b2eb50 in aio_poll (ctx=0x2aa4f0ca050,
->
-address@hidden) at util/aio-posix.c:692
->
-#11 0x000002aa33957f6a in iothread_run (opaque=0x2aa4f0c9630) at
->
-iothread.c:60
->
-#12 0x000003ff86987e82 in start_thread () from /lib64/libpthread.so.0
->
-#13 0x000003ff85a11596 in thread_start () from /lib64/libc.so.6
->
-Backtrace stopped: previous frame identical to this frame (corrupt stack?)
->
-signature.asc
-Description:
-PGP signature
-
-On 03/05/2018 06:03 AM, Stefan Hajnoczi wrote:
-Please include the following gdb output:
-
-   (gdb) disas swapcontext
-   (gdb) i r
-
-That way it's possible to see which instruction faulted and which
-registers were being accessed.
-here is the disas out for swapcontext, this is on a coredump with
-debugging symbols enabled for qemu. So the addresses from the previous
-dump is a little different.
-(gdb) disas swapcontext
-Dump of assembler code for function swapcontext:
-   0x000003ff90751fb8 <+0>:       lgr     %r1,%r2
-   0x000003ff90751fbc <+4>:       lgr     %r0,%r3
-   0x000003ff90751fc0 <+8>:       stfpc   248(%r1)
-   0x000003ff90751fc4 <+12>:      std     %f0,256(%r1)
-   0x000003ff90751fc8 <+16>:      std     %f1,264(%r1)
-   0x000003ff90751fcc <+20>:      std     %f2,272(%r1)
-   0x000003ff90751fd0 <+24>:      std     %f3,280(%r1)
-   0x000003ff90751fd4 <+28>:      std     %f4,288(%r1)
-   0x000003ff90751fd8 <+32>:      std     %f5,296(%r1)
-   0x000003ff90751fdc <+36>:      std     %f6,304(%r1)
-   0x000003ff90751fe0 <+40>:      std     %f7,312(%r1)
-   0x000003ff90751fe4 <+44>:      std     %f8,320(%r1)
-   0x000003ff90751fe8 <+48>:      std     %f9,328(%r1)
-   0x000003ff90751fec <+52>:      std     %f10,336(%r1)
-   0x000003ff90751ff0 <+56>:      std     %f11,344(%r1)
-   0x000003ff90751ff4 <+60>:      std     %f12,352(%r1)
-   0x000003ff90751ff8 <+64>:      std     %f13,360(%r1)
-   0x000003ff90751ffc <+68>:      std     %f14,368(%r1)
-   0x000003ff90752000 <+72>:      std     %f15,376(%r1)
-   0x000003ff90752004 <+76>:      slgr    %r2,%r2
-   0x000003ff90752008 <+80>:      stam    %a0,%a15,184(%r1)
-   0x000003ff9075200c <+84>:      stmg    %r0,%r15,56(%r1)
-   0x000003ff90752012 <+90>:      la      %r2,2
-   0x000003ff90752016 <+94>:      lgr     %r5,%r0
-   0x000003ff9075201a <+98>:      la      %r3,384(%r5)
-   0x000003ff9075201e <+102>:     la      %r4,384(%r1)
-   0x000003ff90752022 <+106>:     lghi    %r5,8
-   0x000003ff90752026 <+110>:     svc     175
-   0x000003ff90752028 <+112>:     lgr     %r5,%r0
-=> 0x000003ff9075202c <+116>:  lfpc    248(%r5)
-   0x000003ff90752030 <+120>:     ld      %f0,256(%r5)
-   0x000003ff90752034 <+124>:     ld      %f1,264(%r5)
-   0x000003ff90752038 <+128>:     ld      %f2,272(%r5)
-   0x000003ff9075203c <+132>:     ld      %f3,280(%r5)
-   0x000003ff90752040 <+136>:     ld      %f4,288(%r5)
-   0x000003ff90752044 <+140>:     ld      %f5,296(%r5)
-   0x000003ff90752048 <+144>:     ld      %f6,304(%r5)
-   0x000003ff9075204c <+148>:     ld      %f7,312(%r5)
-   0x000003ff90752050 <+152>:     ld      %f8,320(%r5)
-   0x000003ff90752054 <+156>:     ld      %f9,328(%r5)
-   0x000003ff90752058 <+160>:     ld      %f10,336(%r5)
-   0x000003ff9075205c <+164>:     ld      %f11,344(%r5)
-   0x000003ff90752060 <+168>:     ld      %f12,352(%r5)
-   0x000003ff90752064 <+172>:     ld      %f13,360(%r5)
-   0x000003ff90752068 <+176>:     ld      %f14,368(%r5)
-   0x000003ff9075206c <+180>:     ld      %f15,376(%r5)
-   0x000003ff90752070 <+184>:     lam     %a2,%a15,192(%r5)
-   0x000003ff90752074 <+188>:     lmg     %r0,%r15,56(%r5)
-   0x000003ff9075207a <+194>:     br      %r14
-End of assembler dump.
-
-(gdb) i r
-r0             0x0      0
-r1             0x3ff8fe7de40    4396165881408
-r2             0x0      0
-r3             0x3ff8fe7e1c0    4396165882304
-r4             0x3ff8fe7dfc0    4396165881792
-r5             0x0      0
-r6             0xffffffff88004880       18446744071696304256
-r7             0x3ff880009e0    4396033247712
-r8             0x27ff89000      10736930816
-r9             0x3ff88001460    4396033250400
-r10            0x1000   4096
-r11            0x1261be0        19274720
-r12            0x3ff88001e00    4396033252864
-r13            0x14d0bc0        21826496
-r14            0x1312ac8        19999432
-r15            0x3ff8fe7dc80    4396165880960
-pc             0x3ff9075202c    0x3ff9075202c <swapcontext+116>
-cc             0x2      2
-
-On 03/05/2018 07:45 PM, Farhan Ali wrote:
->
->
->
-On 03/05/2018 06:03 AM, Stefan Hajnoczi wrote:
->
-> Please include the following gdb output:
->
->
->
-> Â Â  (gdb) disas swapcontext
->
-> Â Â  (gdb) i r
->
->
->
-> That way it's possible to see which instruction faulted and which
->
-> registers were being accessed.
->
->
->
-here is the disas out for swapcontext, this is on a coredump with debugging
->
-symbols enabled for qemu. So the addresses from the previous dump is a little
->
-different.
->
->
->
-(gdb) disas swapcontext
->
-Dump of assembler code for function swapcontext:
->
-Â Â  0x000003ff90751fb8 <+0>:Â Â Â  lgrÂ Â Â  %r1,%r2
->
-Â Â  0x000003ff90751fbc <+4>:Â Â Â  lgrÂ Â Â  %r0,%r3
->
-Â Â  0x000003ff90751fc0 <+8>:Â Â Â  stfpcÂ Â Â  248(%r1)
->
-Â Â  0x000003ff90751fc4 <+12>:Â Â Â  stdÂ Â Â  %f0,256(%r1)
->
-Â Â  0x000003ff90751fc8 <+16>:Â Â Â  stdÂ Â Â  %f1,264(%r1)
->
-Â Â  0x000003ff90751fcc <+20>:Â Â Â  stdÂ Â Â  %f2,272(%r1)
->
-Â Â  0x000003ff90751fd0 <+24>:Â Â Â  stdÂ Â Â  %f3,280(%r1)
->
-Â Â  0x000003ff90751fd4 <+28>:Â Â Â  stdÂ Â Â  %f4,288(%r1)
->
-Â Â  0x000003ff90751fd8 <+32>:Â Â Â  stdÂ Â Â  %f5,296(%r1)
->
-Â Â  0x000003ff90751fdc <+36>:Â Â Â  stdÂ Â Â  %f6,304(%r1)
->
-Â Â  0x000003ff90751fe0 <+40>:Â Â Â  stdÂ Â Â  %f7,312(%r1)
->
-Â Â  0x000003ff90751fe4 <+44>:Â Â Â  stdÂ Â Â  %f8,320(%r1)
->
-Â Â  0x000003ff90751fe8 <+48>:Â Â Â  stdÂ Â Â  %f9,328(%r1)
->
-Â Â  0x000003ff90751fec <+52>:Â Â Â  stdÂ Â Â  %f10,336(%r1)
->
-Â Â  0x000003ff90751ff0 <+56>:Â Â Â  stdÂ Â Â  %f11,344(%r1)
->
-Â Â  0x000003ff90751ff4 <+60>:Â Â Â  stdÂ Â Â  %f12,352(%r1)
->
-Â Â  0x000003ff90751ff8 <+64>:Â Â Â  stdÂ Â Â  %f13,360(%r1)
->
-Â Â  0x000003ff90751ffc <+68>:Â Â Â  stdÂ Â Â  %f14,368(%r1)
->
-Â Â  0x000003ff90752000 <+72>:Â Â Â  stdÂ Â Â  %f15,376(%r1)
->
-Â Â  0x000003ff90752004 <+76>:Â Â Â  slgrÂ Â Â  %r2,%r2
->
-Â Â  0x000003ff90752008 <+80>:Â Â Â  stamÂ Â Â  %a0,%a15,184(%r1)
->
-Â Â  0x000003ff9075200c <+84>:Â Â Â  stmgÂ Â Â  %r0,%r15,56(%r1)
->
-Â Â  0x000003ff90752012 <+90>:Â Â Â  laÂ Â Â  %r2,2
->
-Â Â  0x000003ff90752016 <+94>:Â Â Â  lgrÂ Â Â  %r5,%r0
->
-Â Â  0x000003ff9075201a <+98>:Â Â Â  laÂ Â Â  %r3,384(%r5)
->
-Â Â  0x000003ff9075201e <+102>:Â Â Â  laÂ Â Â  %r4,384(%r1)
->
-Â Â  0x000003ff90752022 <+106>:Â Â Â  lghiÂ Â Â  %r5,8
->
-Â Â  0x000003ff90752026 <+110>:Â Â Â  svcÂ Â Â  175
-sys_rt_sigprocmask. r0 should not be changed by the system call.
-
->
-Â Â  0x000003ff90752028 <+112>:Â Â Â  lgrÂ Â Â  %r5,%r0
->
-=> 0x000003ff9075202c <+116>:Â Â Â  lfpcÂ Â Â  248(%r5)
-so r5 is zero and it was loaded from r0. r0 was loaded from r3 (which is the 
-2nd parameter to this
-function). Now this is odd.
-
->
-Â Â  0x000003ff90752030 <+120>:Â Â Â  ldÂ Â Â  %f0,256(%r5)
->
-Â Â  0x000003ff90752034 <+124>:Â Â Â  ldÂ Â Â  %f1,264(%r5)
->
-Â Â  0x000003ff90752038 <+128>:Â Â Â  ldÂ Â Â  %f2,272(%r5)
->
-Â Â  0x000003ff9075203c <+132>:Â Â Â  ldÂ Â Â  %f3,280(%r5)
->
-Â Â  0x000003ff90752040 <+136>:Â Â Â  ldÂ Â Â  %f4,288(%r5)
->
-Â Â  0x000003ff90752044 <+140>:Â Â Â  ldÂ Â Â  %f5,296(%r5)
->
-Â Â  0x000003ff90752048 <+144>:Â Â Â  ldÂ Â Â  %f6,304(%r5)
->
-Â Â  0x000003ff9075204c <+148>:Â Â Â  ldÂ Â Â  %f7,312(%r5)
->
-Â Â  0x000003ff90752050 <+152>:Â Â Â  ldÂ Â Â  %f8,320(%r5)
->
-Â Â  0x000003ff90752054 <+156>:Â Â Â  ldÂ Â Â  %f9,328(%r5)
->
-Â Â  0x000003ff90752058 <+160>:Â Â Â  ldÂ Â Â  %f10,336(%r5)
->
-Â Â  0x000003ff9075205c <+164>:Â Â Â  ldÂ Â Â  %f11,344(%r5)
->
-Â Â  0x000003ff90752060 <+168>:Â Â Â  ldÂ Â Â  %f12,352(%r5)
->
-Â Â  0x000003ff90752064 <+172>:Â Â Â  ldÂ Â Â  %f13,360(%r5)
->
-Â Â  0x000003ff90752068 <+176>:Â Â Â  ldÂ Â Â  %f14,368(%r5)
->
-Â Â  0x000003ff9075206c <+180>:Â Â Â  ldÂ Â Â  %f15,376(%r5)
->
-Â Â  0x000003ff90752070 <+184>:Â Â Â  lamÂ Â Â  %a2,%a15,192(%r5)
->
-Â Â  0x000003ff90752074 <+188>:Â Â Â  lmgÂ Â Â  %r0,%r15,56(%r5)
->
-Â Â  0x000003ff9075207a <+194>:Â Â Â  brÂ Â Â  %r14
->
-End of assembler dump.
->
->
-(gdb) i r
->
-r0Â Â Â Â Â Â Â Â Â Â Â Â  0x0Â Â Â  0
->
-r1Â Â Â Â Â Â Â Â Â Â Â Â  0x3ff8fe7de40Â Â Â  4396165881408
->
-r2Â Â Â Â Â Â Â Â Â Â Â Â  0x0Â Â Â  0
->
-r3Â Â Â Â Â Â Â Â Â Â Â Â  0x3ff8fe7e1c0Â Â Â  4396165882304
->
-r4Â Â Â Â Â Â Â Â Â Â Â Â  0x3ff8fe7dfc0Â Â Â  4396165881792
->
-r5Â Â Â Â Â Â Â Â Â Â Â Â  0x0Â Â Â  0
->
-r6Â Â Â Â Â Â Â Â Â Â Â Â  0xffffffff88004880Â Â Â  18446744071696304256
->
-r7Â Â Â Â Â Â Â Â Â Â Â Â  0x3ff880009e0Â Â Â  4396033247712
->
-r8Â Â Â Â Â Â Â Â Â Â Â Â  0x27ff89000Â Â Â  10736930816
->
-r9Â Â Â Â Â Â Â Â Â Â Â Â  0x3ff88001460Â Â Â  4396033250400
->
-r10Â Â Â Â Â Â Â Â Â Â Â  0x1000Â Â Â  4096
->
-r11Â Â Â Â Â Â Â Â Â Â Â  0x1261be0Â Â Â  19274720
->
-r12Â Â Â Â Â Â Â Â Â Â Â  0x3ff88001e00Â Â Â  4396033252864
->
-r13Â Â Â Â Â Â Â Â Â Â Â  0x14d0bc0Â Â Â  21826496
->
-r14Â Â Â Â Â Â Â Â Â Â Â  0x1312ac8Â Â Â  19999432
->
-r15Â Â Â Â Â Â Â Â Â Â Â  0x3ff8fe7dc80Â Â Â  4396165880960
->
-pcÂ Â Â Â Â Â Â Â Â Â Â Â  0x3ff9075202cÂ Â Â  0x3ff9075202c <swapcontext+116>
->
-ccÂ Â Â Â Â Â Â Â Â Â Â Â  0x2Â Â Â  2
-
-On 5 March 2018 at 18:54, Christian Borntraeger <address@hidden> wrote:
->
->
->
-On 03/05/2018 07:45 PM, Farhan Ali wrote:
->
->    0x000003ff90752026 <+110>:    svc    175
->
->
-sys_rt_sigprocmask. r0 should not be changed by the system call.
->
->
->    0x000003ff90752028 <+112>:    lgr    %r5,%r0
->
-> => 0x000003ff9075202c <+116>:    lfpc    248(%r5)
->
->
-so r5 is zero and it was loaded from r0. r0 was loaded from r3 (which is the
->
-2nd parameter to this
->
-function). Now this is odd.
-...particularly given that the only place we call swapcontext()
-the second parameter is always the address of a local variable
-and can't be 0...
-
-thanks
--- PMM
-
-Do you happen to run with a recent host kernel that has 
-
-commit 7041d28115e91f2144f811ffe8a195c696b1e1d0
-    s390: scrub registers on kernel entry and KVM exit
-
-
-
-
-
-Can you run with this on top
-diff --git a/arch/s390/kernel/entry.S b/arch/s390/kernel/entry.S
-index 13a133a6015c..d6dc0e5e8f74 100644
---- a/arch/s390/kernel/entry.S
-+++ b/arch/s390/kernel/entry.S
-@@ -426,13 +426,13 @@ ENTRY(system_call)
-        UPDATE_VTIME %r8,%r9,__LC_SYNC_ENTER_TIMER
-        BPENTER __TI_flags(%r12),_TIF_ISOLATE_BP
-        stmg    %r0,%r7,__PT_R0(%r11)
--       # clear user controlled register to prevent speculative use
--       xgr     %r0,%r0
-        mvc     __PT_R8(64,%r11),__LC_SAVE_AREA_SYNC
-        mvc     __PT_PSW(16,%r11),__LC_SVC_OLD_PSW
-        mvc     __PT_INT_CODE(4,%r11),__LC_SVC_ILC
-        stg     %r14,__PT_FLAGS(%r11)
- .Lsysc_do_svc:
-+       # clear user controlled register to prevent speculative use
-+       xgr     %r0,%r0
-        # load address of system call table
-        lg      %r10,__THREAD_sysc_table(%r13,%r12)
-        llgh    %r8,__PT_INT_CODE+2(%r11)
-
-
-To me it looks like that the critical section cleanup (interrupt during system 
-call entry) might
-save the registers again into ptregs but we have already zeroed out r0.
-This patch moves the clearing of r0 after sysc_do_svc, which should fix the 
-critical
-section cleanup.
-
-Adding Martin and Heiko. Will spin a patch.
-
-
-On 03/05/2018 07:54 PM, Christian Borntraeger wrote:
->
->
->
-On 03/05/2018 07:45 PM, Farhan Ali wrote:
->
->
->
->
->
-> On 03/05/2018 06:03 AM, Stefan Hajnoczi wrote:
->
->> Please include the following gdb output:
->
->>
->
->> Â Â  (gdb) disas swapcontext
->
->> Â Â  (gdb) i r
->
->>
->
->> That way it's possible to see which instruction faulted and which
->
->> registers were being accessed.
->
->
->
->
->
-> here is the disas out for swapcontext, this is on a coredump with debugging
->
-> symbols enabled for qemu. So the addresses from the previous dump is a
->
-> little different.
->
->
->
->
->
-> (gdb) disas swapcontext
->
-> Dump of assembler code for function swapcontext:
->
-> Â Â  0x000003ff90751fb8 <+0>:Â Â Â  lgrÂ Â Â  %r1,%r2
->
-> Â Â  0x000003ff90751fbc <+4>:Â Â Â  lgrÂ Â Â  %r0,%r3
->
-> Â Â  0x000003ff90751fc0 <+8>:Â Â Â  stfpcÂ Â Â  248(%r1)
->
-> Â Â  0x000003ff90751fc4 <+12>:Â Â Â  stdÂ Â Â  %f0,256(%r1)
->
-> Â Â  0x000003ff90751fc8 <+16>:Â Â Â  stdÂ Â Â  %f1,264(%r1)
->
-> Â Â  0x000003ff90751fcc <+20>:Â Â Â  stdÂ Â Â  %f2,272(%r1)
->
-> Â Â  0x000003ff90751fd0 <+24>:Â Â Â  stdÂ Â Â  %f3,280(%r1)
->
-> Â Â  0x000003ff90751fd4 <+28>:Â Â Â  stdÂ Â Â  %f4,288(%r1)
->
-> Â Â  0x000003ff90751fd8 <+32>:Â Â Â  stdÂ Â Â  %f5,296(%r1)
->
-> Â Â  0x000003ff90751fdc <+36>:Â Â Â  stdÂ Â Â  %f6,304(%r1)
->
-> Â Â  0x000003ff90751fe0 <+40>:Â Â Â  stdÂ Â Â  %f7,312(%r1)
->
-> Â Â  0x000003ff90751fe4 <+44>:Â Â Â  stdÂ Â Â  %f8,320(%r1)
->
-> Â Â  0x000003ff90751fe8 <+48>:Â Â Â  stdÂ Â Â  %f9,328(%r1)
->
-> Â Â  0x000003ff90751fec <+52>:Â Â Â  stdÂ Â Â  %f10,336(%r1)
->
-> Â Â  0x000003ff90751ff0 <+56>:Â Â Â  stdÂ Â Â  %f11,344(%r1)
->
-> Â Â  0x000003ff90751ff4 <+60>:Â Â Â  stdÂ Â Â  %f12,352(%r1)
->
-> Â Â  0x000003ff90751ff8 <+64>:Â Â Â  stdÂ Â Â  %f13,360(%r1)
->
-> Â Â  0x000003ff90751ffc <+68>:Â Â Â  stdÂ Â Â  %f14,368(%r1)
->
-> Â Â  0x000003ff90752000 <+72>:Â Â Â  stdÂ Â Â  %f15,376(%r1)
->
-> Â Â  0x000003ff90752004 <+76>:Â Â Â  slgrÂ Â Â  %r2,%r2
->
-> Â Â  0x000003ff90752008 <+80>:Â Â Â  stamÂ Â Â  %a0,%a15,184(%r1)
->
-> Â Â  0x000003ff9075200c <+84>:Â Â Â  stmgÂ Â Â  %r0,%r15,56(%r1)
->
-> Â Â  0x000003ff90752012 <+90>:Â Â Â  laÂ Â Â  %r2,2
->
-> Â Â  0x000003ff90752016 <+94>:Â Â Â  lgrÂ Â Â  %r5,%r0
->
-> Â Â  0x000003ff9075201a <+98>:Â Â Â  laÂ Â Â  %r3,384(%r5)
->
-> Â Â  0x000003ff9075201e <+102>:Â Â Â  laÂ Â Â  %r4,384(%r1)
->
-> Â Â  0x000003ff90752022 <+106>:Â Â Â  lghiÂ Â Â  %r5,8
->
-> Â Â  0x000003ff90752026 <+110>:Â Â Â  svcÂ Â Â  175
->
->
-sys_rt_sigprocmask. r0 should not be changed by the system call.
->
->
-> Â Â  0x000003ff90752028 <+112>:Â Â Â  lgrÂ Â Â  %r5,%r0
->
-> => 0x000003ff9075202c <+116>:Â Â Â  lfpcÂ Â Â  248(%r5)
->
->
-so r5 is zero and it was loaded from r0. r0 was loaded from r3 (which is the
->
-2nd parameter to this
->
-function). Now this is odd.
->
->
-> Â Â  0x000003ff90752030 <+120>:Â Â Â  ldÂ Â Â  %f0,256(%r5)
->
-> Â Â  0x000003ff90752034 <+124>:Â Â Â  ldÂ Â Â  %f1,264(%r5)
->
-> Â Â  0x000003ff90752038 <+128>:Â Â Â  ldÂ Â Â  %f2,272(%r5)
->
-> Â Â  0x000003ff9075203c <+132>:Â Â Â  ldÂ Â Â  %f3,280(%r5)
->
-> Â Â  0x000003ff90752040 <+136>:Â Â Â  ldÂ Â Â  %f4,288(%r5)
->
-> Â Â  0x000003ff90752044 <+140>:Â Â Â  ldÂ Â Â  %f5,296(%r5)
->
-> Â Â  0x000003ff90752048 <+144>:Â Â Â  ldÂ Â Â  %f6,304(%r5)
->
-> Â Â  0x000003ff9075204c <+148>:Â Â Â  ldÂ Â Â  %f7,312(%r5)
->
-> Â Â  0x000003ff90752050 <+152>:Â Â Â  ldÂ Â Â  %f8,320(%r5)
->
-> Â Â  0x000003ff90752054 <+156>:Â Â Â  ldÂ Â Â  %f9,328(%r5)
->
-> Â Â  0x000003ff90752058 <+160>:Â Â Â  ldÂ Â Â  %f10,336(%r5)
->
-> Â Â  0x000003ff9075205c <+164>:Â Â Â  ldÂ Â Â  %f11,344(%r5)
->
-> Â Â  0x000003ff90752060 <+168>:Â Â Â  ldÂ Â Â  %f12,352(%r5)
->
-> Â Â  0x000003ff90752064 <+172>:Â Â Â  ldÂ Â Â  %f13,360(%r5)
->
-> Â Â  0x000003ff90752068 <+176>:Â Â Â  ldÂ Â Â  %f14,368(%r5)
->
-> Â Â  0x000003ff9075206c <+180>:Â Â Â  ldÂ Â Â  %f15,376(%r5)
->
-> Â Â  0x000003ff90752070 <+184>:Â Â Â  lamÂ Â Â  %a2,%a15,192(%r5)
->
-> Â Â  0x000003ff90752074 <+188>:Â Â Â  lmgÂ Â Â  %r0,%r15,56(%r5)
->
-> Â Â  0x000003ff9075207a <+194>:Â Â Â  brÂ Â Â  %r14
->
-> End of assembler dump.
->
->
->
-> (gdb) i r
->
-> r0Â Â Â Â Â Â Â Â Â Â Â Â  0x0Â Â Â  0
->
-> r1Â Â Â Â Â Â Â Â Â Â Â Â  0x3ff8fe7de40Â Â Â  4396165881408
->
-> r2Â Â Â Â Â Â Â Â Â Â Â Â  0x0Â Â Â  0
->
-> r3Â Â Â Â Â Â Â Â Â Â Â Â  0x3ff8fe7e1c0Â Â Â  4396165882304
->
-> r4Â Â Â Â Â Â Â Â Â Â Â Â  0x3ff8fe7dfc0Â Â Â  4396165881792
->
-> r5Â Â Â Â Â Â Â Â Â Â Â Â  0x0Â Â Â  0
->
-> r6Â Â Â Â Â Â Â Â Â Â Â Â  0xffffffff88004880Â Â Â  18446744071696304256
->
-> r7Â Â Â Â Â Â Â Â Â Â Â Â  0x3ff880009e0Â Â Â  4396033247712
->
-> r8Â Â Â Â Â Â Â Â Â Â Â Â  0x27ff89000Â Â Â  10736930816
->
-> r9Â Â Â Â Â Â Â Â Â Â Â Â  0x3ff88001460Â Â Â  4396033250400
->
-> r10Â Â Â Â Â Â Â Â Â Â Â  0x1000Â Â Â  4096
->
-> r11Â Â Â Â Â Â Â Â Â Â Â  0x1261be0Â Â Â  19274720
->
-> r12Â Â Â Â Â Â Â Â Â Â Â  0x3ff88001e00Â Â Â  4396033252864
->
-> r13Â Â Â Â Â Â Â Â Â Â Â  0x14d0bc0Â Â Â  21826496
->
-> r14Â Â Â Â Â Â Â Â Â Â Â  0x1312ac8Â Â Â  19999432
->
-> r15Â Â Â Â Â Â Â Â Â Â Â  0x3ff8fe7dc80Â Â Â  4396165880960
->
-> pcÂ Â Â Â Â Â Â Â Â Â Â Â  0x3ff9075202cÂ Â Â  0x3ff9075202c <swapcontext+116>
->
-> ccÂ Â Â Â Â Â Â Â Â Â Â Â  0x2Â Â Â  2
-
-On 03/05/2018 02:08 PM, Christian Borntraeger wrote:
-Do you happen to run with a recent host kernel that has
-
-commit 7041d28115e91f2144f811ffe8a195c696b1e1d0
-     s390: scrub registers on kernel entry and KVM exit
-Yes.
-Can you run with this on top
-diff --git a/arch/s390/kernel/entry.S b/arch/s390/kernel/entry.S
-index 13a133a6015c..d6dc0e5e8f74 100644
---- a/arch/s390/kernel/entry.S
-+++ b/arch/s390/kernel/entry.S
-@@ -426,13 +426,13 @@ ENTRY(system_call)
-         UPDATE_VTIME %r8,%r9,__LC_SYNC_ENTER_TIMER
-         BPENTER __TI_flags(%r12),_TIF_ISOLATE_BP
-         stmg    %r0,%r7,__PT_R0(%r11)
--       # clear user controlled register to prevent speculative use
--       xgr     %r0,%r0
-         mvc     __PT_R8(64,%r11),__LC_SAVE_AREA_SYNC
-         mvc     __PT_PSW(16,%r11),__LC_SVC_OLD_PSW
-         mvc     __PT_INT_CODE(4,%r11),__LC_SVC_ILC
-         stg     %r14,__PT_FLAGS(%r11)
-  .Lsysc_do_svc:
-+       # clear user controlled register to prevent speculative use
-+       xgr     %r0,%r0
-         # load address of system call table
-         lg      %r10,__THREAD_sysc_table(%r13,%r12)
-         llgh    %r8,__PT_INT_CODE+2(%r11)
-
-
-To me it looks like that the critical section cleanup (interrupt during system 
-call entry) might
-save the registers again into ptregs but we have already zeroed out r0.
-This patch moves the clearing of r0 after sysc_do_svc, which should fix the 
-critical
-section cleanup.
-Okay I will run with this.
-Adding Martin and Heiko. Will spin a patch.
-
-
-On 03/05/2018 07:54 PM, Christian Borntraeger wrote:
-On 03/05/2018 07:45 PM, Farhan Ali wrote:
-On 03/05/2018 06:03 AM, Stefan Hajnoczi wrote:
-Please include the following gdb output:
-
- Â Â  (gdb) disas swapcontext
- Â Â  (gdb) i r
-
-That way it's possible to see which instruction faulted and which
-registers were being accessed.
-here is the disas out for swapcontext, this is on a coredump with debugging 
-symbols enabled for qemu. So the addresses from the previous dump is a little 
-different.
-
-
-(gdb) disas swapcontext
-Dump of assembler code for function swapcontext:
- Â Â  0x000003ff90751fb8 <+0>:Â Â Â  lgrÂ Â Â  %r1,%r2
- Â Â  0x000003ff90751fbc <+4>:Â Â Â  lgrÂ Â Â  %r0,%r3
- Â Â  0x000003ff90751fc0 <+8>:Â Â Â  stfpcÂ Â Â  248(%r1)
- Â Â  0x000003ff90751fc4 <+12>:Â Â Â  stdÂ Â Â  %f0,256(%r1)
- Â Â  0x000003ff90751fc8 <+16>:Â Â Â  stdÂ Â Â  %f1,264(%r1)
- Â Â  0x000003ff90751fcc <+20>:Â Â Â  stdÂ Â Â  %f2,272(%r1)
- Â Â  0x000003ff90751fd0 <+24>:Â Â Â  stdÂ Â Â  %f3,280(%r1)
- Â Â  0x000003ff90751fd4 <+28>:Â Â Â  stdÂ Â Â  %f4,288(%r1)
- Â Â  0x000003ff90751fd8 <+32>:Â Â Â  stdÂ Â Â  %f5,296(%r1)
- Â Â  0x000003ff90751fdc <+36>:Â Â Â  stdÂ Â Â  %f6,304(%r1)
- Â Â  0x000003ff90751fe0 <+40>:Â Â Â  stdÂ Â Â  %f7,312(%r1)
- Â Â  0x000003ff90751fe4 <+44>:Â Â Â  stdÂ Â Â  %f8,320(%r1)
- Â Â  0x000003ff90751fe8 <+48>:Â Â Â  stdÂ Â Â  %f9,328(%r1)
- Â Â  0x000003ff90751fec <+52>:Â Â Â  stdÂ Â Â  %f10,336(%r1)
- Â Â  0x000003ff90751ff0 <+56>:Â Â Â  stdÂ Â Â  %f11,344(%r1)
- Â Â  0x000003ff90751ff4 <+60>:Â Â Â  stdÂ Â Â  %f12,352(%r1)
- Â Â  0x000003ff90751ff8 <+64>:Â Â Â  stdÂ Â Â  %f13,360(%r1)
- Â Â  0x000003ff90751ffc <+68>:Â Â Â  stdÂ Â Â  %f14,368(%r1)
- Â Â  0x000003ff90752000 <+72>:Â Â Â  stdÂ Â Â  %f15,376(%r1)
- Â Â  0x000003ff90752004 <+76>:Â Â Â  slgrÂ Â Â  %r2,%r2
- Â Â  0x000003ff90752008 <+80>:Â Â Â  stamÂ Â Â  %a0,%a15,184(%r1)
- Â Â  0x000003ff9075200c <+84>:Â Â Â  stmgÂ Â Â  %r0,%r15,56(%r1)
- Â Â  0x000003ff90752012 <+90>:Â Â Â  laÂ Â Â  %r2,2
- Â Â  0x000003ff90752016 <+94>:Â Â Â  lgrÂ Â Â  %r5,%r0
- Â Â  0x000003ff9075201a <+98>:Â Â Â  laÂ Â Â  %r3,384(%r5)
- Â Â  0x000003ff9075201e <+102>:Â Â Â  laÂ Â Â  %r4,384(%r1)
- Â Â  0x000003ff90752022 <+106>:Â Â Â  lghiÂ Â Â  %r5,8
- Â Â  0x000003ff90752026 <+110>:Â Â Â  svcÂ Â Â  175
-sys_rt_sigprocmask. r0 should not be changed by the system call.
-Â Â  0x000003ff90752028 <+112>:Â Â Â  lgrÂ Â Â  %r5,%r0
-=> 0x000003ff9075202c <+116>:Â Â Â  lfpcÂ Â Â  248(%r5)
-so r5 is zero and it was loaded from r0. r0 was loaded from r3 (which is the 
-2nd parameter to this
-function). Now this is odd.
-Â Â  0x000003ff90752030 <+120>:Â Â Â  ldÂ Â Â  %f0,256(%r5)
- Â Â  0x000003ff90752034 <+124>:Â Â Â  ldÂ Â Â  %f1,264(%r5)
- Â Â  0x000003ff90752038 <+128>:Â Â Â  ldÂ Â Â  %f2,272(%r5)
- Â Â  0x000003ff9075203c <+132>:Â Â Â  ldÂ Â Â  %f3,280(%r5)
- Â Â  0x000003ff90752040 <+136>:Â Â Â  ldÂ Â Â  %f4,288(%r5)
- Â Â  0x000003ff90752044 <+140>:Â Â Â  ldÂ Â Â  %f5,296(%r5)
- Â Â  0x000003ff90752048 <+144>:Â Â Â  ldÂ Â Â  %f6,304(%r5)
- Â Â  0x000003ff9075204c <+148>:Â Â Â  ldÂ Â Â  %f7,312(%r5)
- Â Â  0x000003ff90752050 <+152>:Â Â Â  ldÂ Â Â  %f8,320(%r5)
- Â Â  0x000003ff90752054 <+156>:Â Â Â  ldÂ Â Â  %f9,328(%r5)
- Â Â  0x000003ff90752058 <+160>:Â Â Â  ldÂ Â Â  %f10,336(%r5)
- Â Â  0x000003ff9075205c <+164>:Â Â Â  ldÂ Â Â  %f11,344(%r5)
- Â Â  0x000003ff90752060 <+168>:Â Â Â  ldÂ Â Â  %f12,352(%r5)
- Â Â  0x000003ff90752064 <+172>:Â Â Â  ldÂ Â Â  %f13,360(%r5)
- Â Â  0x000003ff90752068 <+176>:Â Â Â  ldÂ Â Â  %f14,368(%r5)
- Â Â  0x000003ff9075206c <+180>:Â Â Â  ldÂ Â Â  %f15,376(%r5)
- Â Â  0x000003ff90752070 <+184>:Â Â Â  lamÂ Â Â  %a2,%a15,192(%r5)
- Â Â  0x000003ff90752074 <+188>:Â Â Â  lmgÂ Â Â  %r0,%r15,56(%r5)
- Â Â  0x000003ff9075207a <+194>:Â Â Â  brÂ Â Â  %r14
-End of assembler dump.
-
-(gdb) i r
-r0Â Â Â Â Â Â Â Â Â Â Â Â  0x0Â Â Â  0
-r1Â Â Â Â Â Â Â Â Â Â Â Â  0x3ff8fe7de40Â Â Â  4396165881408
-r2Â Â Â Â Â Â Â Â Â Â Â Â  0x0Â Â Â  0
-r3Â Â Â Â Â Â Â Â Â Â Â Â  0x3ff8fe7e1c0Â Â Â  4396165882304
-r4Â Â Â Â Â Â Â Â Â Â Â Â  0x3ff8fe7dfc0Â Â Â  4396165881792
-r5Â Â Â Â Â Â Â Â Â Â Â Â  0x0Â Â Â  0
-r6Â Â Â Â Â Â Â Â Â Â Â Â  0xffffffff88004880Â Â Â  18446744071696304256
-r7Â Â Â Â Â Â Â Â Â Â Â Â  0x3ff880009e0Â Â Â  4396033247712
-r8Â Â Â Â Â Â Â Â Â Â Â Â  0x27ff89000Â Â Â  10736930816
-r9Â Â Â Â Â Â Â Â Â Â Â Â  0x3ff88001460Â Â Â  4396033250400
-r10Â Â Â Â Â Â Â Â Â Â Â  0x1000Â Â Â  4096
-r11Â Â Â Â Â Â Â Â Â Â Â  0x1261be0Â Â Â  19274720
-r12Â Â Â Â Â Â Â Â Â Â Â  0x3ff88001e00Â Â Â  4396033252864
-r13Â Â Â Â Â Â Â Â Â Â Â  0x14d0bc0Â Â Â  21826496
-r14Â Â Â Â Â Â Â Â Â Â Â  0x1312ac8Â Â Â  19999432
-r15Â Â Â Â Â Â Â Â Â Â Â  0x3ff8fe7dc80Â Â Â  4396165880960
-pcÂ Â Â Â Â Â Â Â Â Â Â Â  0x3ff9075202cÂ Â Â  0x3ff9075202c <swapcontext+116>
-ccÂ Â Â Â Â Â Â Â Â Â Â Â  0x2Â Â Â  2
-
-On Mon, 5 Mar 2018 20:08:45 +0100
-Christian Borntraeger <address@hidden> wrote:
-
->
-Do you happen to run with a recent host kernel that has
->
->
-commit 7041d28115e91f2144f811ffe8a195c696b1e1d0
->
-s390: scrub registers on kernel entry and KVM exit
->
->
-Can you run with this on top
->
-diff --git a/arch/s390/kernel/entry.S b/arch/s390/kernel/entry.S
->
-index 13a133a6015c..d6dc0e5e8f74 100644
->
---- a/arch/s390/kernel/entry.S
->
-+++ b/arch/s390/kernel/entry.S
->
-@@ -426,13 +426,13 @@ ENTRY(system_call)
->
-UPDATE_VTIME %r8,%r9,__LC_SYNC_ENTER_TIMER
->
-BPENTER __TI_flags(%r12),_TIF_ISOLATE_BP
->
-stmg    %r0,%r7,__PT_R0(%r11)
->
--       # clear user controlled register to prevent speculative use
->
--       xgr     %r0,%r0
->
-mvc     __PT_R8(64,%r11),__LC_SAVE_AREA_SYNC
->
-mvc     __PT_PSW(16,%r11),__LC_SVC_OLD_PSW
->
-mvc     __PT_INT_CODE(4,%r11),__LC_SVC_ILC
->
-stg     %r14,__PT_FLAGS(%r11)
->
-.Lsysc_do_svc:
->
-+       # clear user controlled register to prevent speculative use
->
-+       xgr     %r0,%r0
->
-# load address of system call table
->
-lg      %r10,__THREAD_sysc_table(%r13,%r12)
->
-llgh    %r8,__PT_INT_CODE+2(%r11)
->
->
->
-To me it looks like that the critical section cleanup (interrupt during
->
-system call entry) might
->
-save the registers again into ptregs but we have already zeroed out r0.
->
-This patch moves the clearing of r0 after sysc_do_svc, which should fix the
->
-critical
->
-section cleanup.
->
->
-Adding Martin and Heiko. Will spin a patch.
-Argh, yes. Thanks Chrisitan, this is it. I have been searching for the bug
-for days now. The point is that if the system call handler is interrupted
-after the xgr but before .Lsysc_do_svc the code at .Lcleanup_system_call 
-repeats the stmg for %r0-%r7 but now %r0 is already zero.
-
-Please commit a patch for this and I'll will queue it up immediately.
-
--- 
-blue skies,
-   Martin.
-
-"Reality continues to ruin my life." - Calvin.
-
-On 03/06/2018 01:34 AM, Martin Schwidefsky wrote:
-On Mon, 5 Mar 2018 20:08:45 +0100
-Christian Borntraeger <address@hidden> wrote:
-Do you happen to run with a recent host kernel that has
-
-commit 7041d28115e91f2144f811ffe8a195c696b1e1d0
-     s390: scrub registers on kernel entry and KVM exit
-
-Can you run with this on top
-diff --git a/arch/s390/kernel/entry.S b/arch/s390/kernel/entry.S
-index 13a133a6015c..d6dc0e5e8f74 100644
---- a/arch/s390/kernel/entry.S
-+++ b/arch/s390/kernel/entry.S
-@@ -426,13 +426,13 @@ ENTRY(system_call)
-         UPDATE_VTIME %r8,%r9,__LC_SYNC_ENTER_TIMER
-         BPENTER __TI_flags(%r12),_TIF_ISOLATE_BP
-         stmg    %r0,%r7,__PT_R0(%r11)
--       # clear user controlled register to prevent speculative use
--       xgr     %r0,%r0
-         mvc     __PT_R8(64,%r11),__LC_SAVE_AREA_SYNC
-         mvc     __PT_PSW(16,%r11),__LC_SVC_OLD_PSW
-         mvc     __PT_INT_CODE(4,%r11),__LC_SVC_ILC
-         stg     %r14,__PT_FLAGS(%r11)
-  .Lsysc_do_svc:
-+       # clear user controlled register to prevent speculative use
-+       xgr     %r0,%r0
-         # load address of system call table
-         lg      %r10,__THREAD_sysc_table(%r13,%r12)
-         llgh    %r8,__PT_INT_CODE+2(%r11)
-
-
-To me it looks like that the critical section cleanup (interrupt during system 
-call entry) might
-save the registers again into ptregs but we have already zeroed out r0.
-This patch moves the clearing of r0 after sysc_do_svc, which should fix the 
-critical
-section cleanup.
-
-Adding Martin and Heiko. Will spin a patch.
-Argh, yes. Thanks Chrisitan, this is it. I have been searching for the bug
-for days now. The point is that if the system call handler is interrupted
-after the xgr but before .Lsysc_do_svc the code at .Lcleanup_system_call
-repeats the stmg for %r0-%r7 but now %r0 is already zero.
-
-Please commit a patch for this and I'll will queue it up immediately.
-This patch does fix the QEMU crash. I haven't seen the crash after
-running the test case for more than a day. Thanks to everyone for taking
-a look at this problem :)
-Thanks
-Farhan
-
diff --git a/classification_output/04/other/23300761 b/classification_output/04/other/23300761
deleted file mode 100644
index e546903ce..000000000
--- a/classification_output/04/other/23300761
+++ /dev/null
@@ -1,321 +0,0 @@
-other: 0.963
-assembly: 0.958
-semantic: 0.950
-device: 0.932
-boot: 0.929
-instruction: 0.929
-socket: 0.927
-vnc: 0.926
-graphic: 0.924
-KVM: 0.897
-network: 0.879
-mistranslation: 0.770
-
-[Qemu-devel] [BUG] 216 Alerts reported by LGTM for QEMU (some might be release critical)
-
-Hi,
-LGTM reports 16 errors, 81 warnings and 119 recommendations:
-https://lgtm.com/projects/g/qemu/qemu/alerts/?mode=list
-.
-Some of them are already know (wrong format strings), others look like
-real errors:
-- several multiplication results which don't work as they should in
-contrib/vhost-user-gpu, block/* (m->nb_clusters * s->cluster_size only
-32 bit!),Â  target/i386/translate.c and other files
-- potential buffer overflows in gdbstub.c and other files
-I am afraid that the overflows in the block code are release critical,
-maybe that in target/i386/translate.c and other errors, too.
-About half of the alerts are issues which can be fixed later.
-
-Regards
-
-Stefan
-
-On 13/07/19 19:46, Stefan Weil wrote:
->
->
-LGTM reports 16 errors, 81 warnings and 119 recommendations:
->
-https://lgtm.com/projects/g/qemu/qemu/alerts/?mode=list
-.
->
->
-Some of them are already know (wrong format strings), others look like
->
-real errors:
->
->
-- several multiplication results which don't work as they should in
->
-contrib/vhost-user-gpu, block/* (m->nb_clusters * s->cluster_size only
->
-32 bit!),Â  target/i386/translate.c and other files
-m->nb_clusters here is limited by s->l2_slice_size (see for example
-handle_alloc) so I wouldn't be surprised if this is a false positive.  I
-couldn't find this particular multiplication in Coverity, but it has
-about 250 issues marked as intentional or false positive so there's
-probably a lot of overlap with what LGTM found.
-
-Paolo
-
-Am 13.07.2019 um 21:42 schrieb Paolo Bonzini:
->
-On 13/07/19 19:46, Stefan Weil wrote:
->
-> LGTM reports 16 errors, 81 warnings and 119 recommendations:
->
->
-https://lgtm.com/projects/g/qemu/qemu/alerts/?mode=list
-.
->
->
->
-> Some of them are already known (wrong format strings), others look like
->
-> real errors:
->
->
->
-> - several multiplication results which don't work as they should in
->
-> contrib/vhost-user-gpu, block/* (m->nb_clusters * s->cluster_size only
->
-> 32 bit!),Â  target/i386/translate.c and other files
->
-m->nb_clusters here is limited by s->l2_slice_size (see for example
->
-handle_alloc) so I wouldn't be surprised if this is a false positive.  I
->
-couldn't find this particular multiplication in Coverity, but it has
->
-about 250 issues marked as intentional or false positive so there's
->
-probably a lot of overlap with what LGTM found.
->
->
-Paolo
->
-From other projects I know that there is a certain overlap between the
-results from Coverity Scan an LGTM, but it is good to have both
-analyzers, and the results from LGTM are typically quite reliable.
-
-Even if we know that there is no multiplication overflow, the code could
-be modified. Either the assigned value should use the same data type as
-the factors (possible when there is never an overflow, avoids a size
-extension), or the multiplication could use the larger data type by
-adding a type cast to one of the factors (then an overflow cannot
-happen, static code analysers and human reviewers have an easier job,
-but the multiplication costs more time).
-
-Stefan
-
-Am 14.07.2019 um 15:28 hat Stefan Weil geschrieben:
->
-Am 13.07.2019 um 21:42 schrieb Paolo Bonzini:
->
-> On 13/07/19 19:46, Stefan Weil wrote:
->
->> LGTM reports 16 errors, 81 warnings and 119 recommendations:
->
->>
-https://lgtm.com/projects/g/qemu/qemu/alerts/?mode=list
-.
->
->>
->
->> Some of them are already known (wrong format strings), others look like
->
->> real errors:
->
->>
->
->> - several multiplication results which don't work as they should in
->
->> contrib/vhost-user-gpu, block/* (m->nb_clusters * s->cluster_size only
->
->> 32 bit!),Â  target/i386/translate.c and other files
-Request sizes are limited to 32 bit in the generic block layer before
-they are even passed to the individual block drivers, so most if not all
-of these are going to be false positives.
-
->
-> m->nb_clusters here is limited by s->l2_slice_size (see for example
->
-> handle_alloc) so I wouldn't be surprised if this is a false positive.  I
->
-> couldn't find this particular multiplication in Coverity, but it has
->
-> about 250 issues marked as intentional or false positive so there's
->
-> probably a lot of overlap with what LGTM found.
->
->
->
-> Paolo
->
->
-From other projects I know that there is a certain overlap between the
->
-results from Coverity Scan an LGTM, but it is good to have both
->
-analyzers, and the results from LGTM are typically quite reliable.
->
->
-Even if we know that there is no multiplication overflow, the code could
->
-be modified. Either the assigned value should use the same data type as
->
-the factors (possible when there is never an overflow, avoids a size
->
-extension), or the multiplication could use the larger data type by
->
-adding a type cast to one of the factors (then an overflow cannot
->
-happen, static code analysers and human reviewers have an easier job,
->
-but the multiplication costs more time).
-But if you look at the code we're talking about, you see that it's
-complaining about things where being more explicit would make things
-less readable.
-
-For example, if complains about the multiplication in this line:
-
-    s->file_size += n * s->header.cluster_size;
-
-We know that n * s->header.cluster_size fits in 32 bits, but
-s->file_size is 64 bits (and has to be 64 bits). Do you really think we
-should introduce another uint32_t variable to store the intermediate
-result? And if we cast n to uint64_t, not only might the multiplication
-cost more time, but also human readers would wonder why the result could
-become larger than 32 bits. So a cast would be misleading.
-
-
-It also complains about this line:
-
-    ret = bdrv_truncate(bs->file, (3 + l1_clusters) * s->cluster_size,
-                        PREALLOC_MODE_OFF, &local_err);
-
-Here, we don't even assign the result to a 64 bit variable, but just
-pass it to a function which takes a 64 bit parameter. Again, I don't
-think introducing additional variables for the intermediate result or
-adding casts would be an improvement of the situation.
-
-
-So I don't think this is a good enough tool to base our code on what it
-does and doesn't understand. It would have too much of a negative impact
-on our code. We'd rather need a way to mark false positives as such and
-move on without changing the code in such cases.
-
-Kevin
-
-On Sat, 13 Jul 2019 at 18:46, Stefan Weil <address@hidden> wrote:
->
-LGTM reports 16 errors, 81 warnings and 119 recommendations:
->
-https://lgtm.com/projects/g/qemu/qemu/alerts/?mode=list
-.
-I had a look at some of these before, but mostly I came
-to the conclusion that it wasn't worth trying to put the
-effort into keeping up with the site because they didn't
-seem to provide any useful way to mark things as false
-positives. Coverity has its flaws but at least you can do
-that kind of thing in its UI (it runs at about a 33% fp
-rate, I think.) "Analyzer thinks this multiply can overflow
-but in fact it's not possible" is quite a common false
-positive cause...
-
-Anyway, if you want to fish out specific issues, analyse
-whether they're false positive or real, and report them
-to the mailing list as followups to the patches which
-introduced the issue, that's probably the best way for
-us to make use of this analyzer. (That is essentially
-what I do for coverity.)
-
-thanks
--- PMM
-
-Am 14.07.2019 um 19:30 schrieb Peter Maydell:
-[...]
->
-"Analyzer thinks this multiply can overflow
->
-but in fact it's not possible" is quite a common false
->
-positive cause...
-The analysers don't complain because a multiply can overflow.
-
-They complain because the code indicates that a larger result is
-expected, for example uint64_t = uint32_t * uint32_t. They would not
-complain for the same multiplication if it were assigned to a uint32_t.
-
-So there is a simple solution to write the code in a way which avoids
-false positives...
-
-Stefan
-
-Stefan Weil <address@hidden> writes:
-
->
-Am 14.07.2019 um 19:30 schrieb Peter Maydell:
->
-[...]
->
-> "Analyzer thinks this multiply can overflow
->
-> but in fact it's not possible" is quite a common false
->
-> positive cause...
->
->
->
-The analysers don't complain because a multiply can overflow.
->
->
-They complain because the code indicates that a larger result is
->
-expected, for example uint64_t = uint32_t * uint32_t. They would not
->
-complain for the same multiplication if it were assigned to a uint32_t.
-I agree this is an anti-pattern.
-
->
-So there is a simple solution to write the code in a way which avoids
->
-false positives...
-You wrote elsewhere in this thread:
-
-    Either the assigned value should use the same data type as the
-    factors (possible when there is never an overflow, avoids a size
-    extension), or the multiplication could use the larger data type by
-    adding a type cast to one of the factors (then an overflow cannot
-    happen, static code analysers and human reviewers have an easier
-    job, but the multiplication costs more time).
-
-Makes sense to me.
-
-On 7/14/19 5:30 PM, Peter Maydell wrote:
->
-I had a look at some of these before, but mostly I came
->
-to the conclusion that it wasn't worth trying to put the
->
-effort into keeping up with the site because they didn't
->
-seem to provide any useful way to mark things as false
->
-positives. Coverity has its flaws but at least you can do
->
-that kind of thing in its UI (it runs at about a 33% fp
->
-rate, I think.)
-Yes, LGTM wants you to modify the source code with
-
-  /* lgtm [cpp/some-warning-code] */
-
-and on the same line as the reported problem.  Which is mildly annoying in that
-you're definitely committing to LGTM in the long term.  Also for any
-non-trivial bit of code, it will almost certainly run over 80 columns.
-
-
-r~
-
diff --git a/classification_output/04/other/23448582 b/classification_output/04/other/23448582
deleted file mode 100644
index 1d086d269..000000000
--- a/classification_output/04/other/23448582
+++ /dev/null
@@ -1,273 +0,0 @@
-other: 0.990
-semantic: 0.987
-graphic: 0.987
-assembly: 0.986
-instruction: 0.983
-socket: 0.982
-mistranslation: 0.982
-device: 0.979
-network: 0.973
-vnc: 0.973
-boot: 0.967
-KVM: 0.958
-
-[BUG REPORT] cxl process in infinity loop
-
-Hi, all
-
-When I did the cxl memory hot-plug test on QEMU, I accidentally connected 
-two memdev to the same downstream port, the command like below:
-
->
--object memory-backend-ram,size=262144k,share=on,id=vmem0 \
->
--object memory-backend-ram,size=262144k,share=on,id=vmem1 \
->
--device pxb-cxl,bus_nr=12,bus=pcie.0,id=cxl.1 \
->
--device cxl-rp,port=0,bus=cxl.1,id=root_port0,chassis=0,slot=0 \
->
--device cxl-upstream,bus=root_port0,id=us0 \
->
--device cxl-downstream,port=0,bus=us0,id=swport00,chassis=0,slot=5 \
->
--device cxl-downstream,port=0,bus=us0,id=swport01,chassis=0,slot=7 \
-same downstream port but has different slot!
-
->
--device cxl-type3,bus=swport00,volatile-memdev=vmem0,id=cxl-vmem0 \
->
--device cxl-type3,bus=swport01,volatile-memdev=vmem1,id=cxl-vmem1 \
->
--M
->
-cxl-fmw.0.targets.0=cxl.1,cxl-fmw.0.size=64G,cxl-fmw.0.interleave-granularity=4k
->
-\
-There is no error occurred when vm start, but when I executed the âcxl listâ 
-command to view
-the CXL objects info, the process can not end properly.
-
-Then I used strace to trace the process, I found that the process is in 
-infinity loop:
-# strace cxl list
-......
-clock_nanosleep(CLOCK_REALTIME, 0, {tv_sec=0, tv_nsec=1000000}, NULL) = 0
-openat(AT_FDCWD, "/sys/bus/cxl/flush", O_WRONLY|O_CLOEXEC) = 3
-write(3, "1\n\0", 3)                    = 3
-close(3)                                = 0
-access("/run/udev/queue", F_OK)         = 0
-clock_nanosleep(CLOCK_REALTIME, 0, {tv_sec=0, tv_nsec=1000000}, NULL) = 0
-openat(AT_FDCWD, "/sys/bus/cxl/flush", O_WRONLY|O_CLOEXEC) = 3
-write(3, "1\n\0", 3)                    = 3
-close(3)                                = 0
-access("/run/udev/queue", F_OK)         = 0
-clock_nanosleep(CLOCK_REALTIME, 0, {tv_sec=0, tv_nsec=1000000}, NULL) = 0
-openat(AT_FDCWD, "/sys/bus/cxl/flush", O_WRONLY|O_CLOEXEC) = 3
-write(3, "1\n\0", 3)                    = 3
-close(3)                                = 0
-access("/run/udev/queue", F_OK)         = 0
-clock_nanosleep(CLOCK_REALTIME, 0, {tv_sec=0, tv_nsec=1000000}, NULL) = 0
-openat(AT_FDCWD, "/sys/bus/cxl/flush", O_WRONLY|O_CLOEXEC) = 3
-write(3, "1\n\0", 3)                    = 3
-close(3)                                = 0
-access("/run/udev/queue", F_OK)         = 0
-clock_nanosleep(CLOCK_REALTIME, 0, {tv_sec=0, tv_nsec=1000000}, NULL) = 0
-openat(AT_FDCWD, "/sys/bus/cxl/flush", O_WRONLY|O_CLOEXEC) = 3
-write(3, "1\n\0", 3)                    = 3
-close(3)                                = 0
-access("/run/udev/queue", F_OK)         = 0
-clock_nanosleep(CLOCK_REALTIME, 0, {tv_sec=0, tv_nsec=1000000}, NULL) = 0
-openat(AT_FDCWD, "/sys/bus/cxl/flush", O_WRONLY|O_CLOEXEC) = 3
-write(3, "1\n\0", 3)                    = 3
-close(3)                                = 0
-access("/run/udev/queue", F_OK)         = 0
-clock_nanosleep(CLOCK_REALTIME, 0, {tv_sec=0, tv_nsec=1000000}, NULL) = 0
-openat(AT_FDCWD, "/sys/bus/cxl/flush", O_WRONLY|O_CLOEXEC) = 3
-write(3, "1\n\0", 3)                    = 3
-close(3)                                = 0
-access("/run/udev/queue", F_OK)         = 0
-clock_nanosleep(CLOCK_REALTIME, 0, {tv_sec=0, tv_nsec=1000000}, NULL) = 0
-openat(AT_FDCWD, "/sys/bus/cxl/flush", O_WRONLY|O_CLOEXEC) = 3
-write(3, "1\n\0", 3)                    = 3
-close(3)                                = 0
-access("/run/udev/queue", F_OK)         = 0
-
-[Environment]:
-linux: V6.10-rc3
-QEMU: V9.0.0
-ndctl: v79
-
-I know this is because of the wrong use of the QEMU command, but I think we 
-should 
-be aware of this error in one of the QEMU, OS or ndctl side at least.
-
-Thanks
-Xingtao
-
-On Tue, 2 Jul 2024 00:30:06 +0000
-"Xingtao Yao (Fujitsu)" <yaoxt.fnst@fujitsu.com> wrote:
-
->
-Hi, all
->
->
-When I did the cxl memory hot-plug test on QEMU, I accidentally connected
->
-two memdev to the same downstream port, the command like below:
->
->
-> -object memory-backend-ram,size=262144k,share=on,id=vmem0 \
->
-> -object memory-backend-ram,size=262144k,share=on,id=vmem1 \
->
-> -device pxb-cxl,bus_nr=12,bus=pcie.0,id=cxl.1 \
->
-> -device cxl-rp,port=0,bus=cxl.1,id=root_port0,chassis=0,slot=0 \
->
-> -device cxl-upstream,bus=root_port0,id=us0 \
->
-> -device cxl-downstream,port=0,bus=us0,id=swport00,chassis=0,slot=5 \
->
-> -device cxl-downstream,port=0,bus=us0,id=swport01,chassis=0,slot=7 \
->
-same downstream port but has different slot!
->
->
-> -device cxl-type3,bus=swport00,volatile-memdev=vmem0,id=cxl-vmem0 \
->
-> -device cxl-type3,bus=swport01,volatile-memdev=vmem1,id=cxl-vmem1 \
->
-> -M
->
-> cxl-fmw.0.targets.0=cxl.1,cxl-fmw.0.size=64G,cxl-fmw.0.interleave-granularity=4k
->
->  \
->
->
-There is no error occurred when vm start, but when I executed the âcxl listâ
->
-command to view
->
-the CXL objects info, the process can not end properly.
-I'd be happy to look preventing this on QEMU side if you send one,
-but in general there are are lots of ways to shoot yourself in the
-foot with CXL and PCI device emulation in QEMU so I'm not going
-to rush to solve this specific one.
-
-Likewise, some hardening in kernel / userspace probably makes sense but
-this is a non compliant switch so priority of a fix is probably fairly low.
-
-Jonathan
-
->
->
-Then I used strace to trace the process, I found that the process is in
->
-infinity loop:
->
-# strace cxl list
->
-......
->
-clock_nanosleep(CLOCK_REALTIME, 0, {tv_sec=0, tv_nsec=1000000}, NULL) = 0
->
-openat(AT_FDCWD, "/sys/bus/cxl/flush", O_WRONLY|O_CLOEXEC) = 3
->
-write(3, "1\n\0", 3)                    = 3
->
-close(3)                                = 0
->
-access("/run/udev/queue", F_OK)         = 0
->
-clock_nanosleep(CLOCK_REALTIME, 0, {tv_sec=0, tv_nsec=1000000}, NULL) = 0
->
-openat(AT_FDCWD, "/sys/bus/cxl/flush", O_WRONLY|O_CLOEXEC) = 3
->
-write(3, "1\n\0", 3)                    = 3
->
-close(3)                                = 0
->
-access("/run/udev/queue", F_OK)         = 0
->
-clock_nanosleep(CLOCK_REALTIME, 0, {tv_sec=0, tv_nsec=1000000}, NULL) = 0
->
-openat(AT_FDCWD, "/sys/bus/cxl/flush", O_WRONLY|O_CLOEXEC) = 3
->
-write(3, "1\n\0", 3)                    = 3
->
-close(3)                                = 0
->
-access("/run/udev/queue", F_OK)         = 0
->
-clock_nanosleep(CLOCK_REALTIME, 0, {tv_sec=0, tv_nsec=1000000}, NULL) = 0
->
-openat(AT_FDCWD, "/sys/bus/cxl/flush", O_WRONLY|O_CLOEXEC) = 3
->
-write(3, "1\n\0", 3)                    = 3
->
-close(3)                                = 0
->
-access("/run/udev/queue", F_OK)         = 0
->
-clock_nanosleep(CLOCK_REALTIME, 0, {tv_sec=0, tv_nsec=1000000}, NULL) = 0
->
-openat(AT_FDCWD, "/sys/bus/cxl/flush", O_WRONLY|O_CLOEXEC) = 3
->
-write(3, "1\n\0", 3)                    = 3
->
-close(3)                                = 0
->
-access("/run/udev/queue", F_OK)         = 0
->
-clock_nanosleep(CLOCK_REALTIME, 0, {tv_sec=0, tv_nsec=1000000}, NULL) = 0
->
-openat(AT_FDCWD, "/sys/bus/cxl/flush", O_WRONLY|O_CLOEXEC) = 3
->
-write(3, "1\n\0", 3)                    = 3
->
-close(3)                                = 0
->
-access("/run/udev/queue", F_OK)         = 0
->
-clock_nanosleep(CLOCK_REALTIME, 0, {tv_sec=0, tv_nsec=1000000}, NULL) = 0
->
-openat(AT_FDCWD, "/sys/bus/cxl/flush", O_WRONLY|O_CLOEXEC) = 3
->
-write(3, "1\n\0", 3)                    = 3
->
-close(3)                                = 0
->
-access("/run/udev/queue", F_OK)         = 0
->
-clock_nanosleep(CLOCK_REALTIME, 0, {tv_sec=0, tv_nsec=1000000}, NULL) = 0
->
-openat(AT_FDCWD, "/sys/bus/cxl/flush", O_WRONLY|O_CLOEXEC) = 3
->
-write(3, "1\n\0", 3)                    = 3
->
-close(3)                                = 0
->
-access("/run/udev/queue", F_OK)         = 0
->
->
-[Environment]:
->
-linux: V6.10-rc3
->
-QEMU: V9.0.0
->
-ndctl: v79
->
->
-I know this is because of the wrong use of the QEMU command, but I think we
->
-should
->
-be aware of this error in one of the QEMU, OS or ndctl side at least.
->
->
-Thanks
->
-Xingtao
-
diff --git a/classification_output/04/other/25892827 b/classification_output/04/other/25892827
deleted file mode 100644
index 4477b5aee..000000000
--- a/classification_output/04/other/25892827
+++ /dev/null
@@ -1,1085 +0,0 @@
-other: 0.892
-KVM: 0.872
-vnc: 0.846
-instruction: 0.842
-mistranslation: 0.842
-boot: 0.839
-network: 0.839
-device: 0.839
-graphic: 0.832
-assembly: 0.829
-semantic: 0.825
-socket: 0.822
-
-[Qemu-devel] [BUG/RFC] Two cpus are not brought up normally in SLES11 sp3 VM after reboot
-
-Hi,
-
-Recently we encountered a problem in our project: 2 CPUs in VM are not brought 
-up normally after reboot.
-
-Our host is using KVM kmod 3.6 and QEMU 2.1.
-A SLES 11 sp3 VM configured with 8 vcpus,
-cpu model is configured with 'host-passthrough'.
-
-After VM's first time started up, everything seems to be OK.
-and then VM is paniced and rebooted.
-After reboot, only 6 cpus are brought up in VM, cpu1 and cpu7 are not online.
-
-This is the only message we can get from VM:
-VM dmesg shows:
-[    0.069867] Booting Node   0, Processors  #1
-[    5.060042] CPU1: Stuck ??
-[    5.060499]  #2
-[    5.088322] kvm-clock: cpu 2, msr 6:3fc90901, secondary cpu clock
-[    5.088335] KVM setup async PF for cpu 2
-[    5.092967] NMI watchdog enabled, takes one hw-pmu counter.
-[    5.094405]  #3
-[    5.108324] kvm-clock: cpu 3, msr 6:3fcd0901, secondary cpu clock
-[    5.108333] KVM setup async PF for cpu 3
-[    5.113553] NMI watchdog enabled, takes one hw-pmu counter.
-[    5.114970]  #4
-[    5.128325] kvm-clock: cpu 4, msr 6:3fd10901, secondary cpu clock
-[    5.128336] KVM setup async PF for cpu 4
-[    5.134576] NMI watchdog enabled, takes one hw-pmu counter.
-[    5.135998]  #5
-[    5.152324] kvm-clock: cpu 5, msr 6:3fd50901, secondary cpu clock
-[    5.152334] KVM setup async PF for cpu 5
-[    5.154764] NMI watchdog enabled, takes one hw-pmu counter.
-[    5.156467]  #6
-[    5.172327] kvm-clock: cpu 6, msr 6:3fd90901, secondary cpu clock
-[    5.172341] KVM setup async PF for cpu 6
-[    5.180738] NMI watchdog enabled, takes one hw-pmu counter.
-[    5.182173]  #7 Ok.
-[   10.170815] CPU7: Stuck ??
-[   10.171648] Brought up 6 CPUs
-[   10.172394] Total of 6 processors activated (28799.97 BogoMIPS).
-
-From host, we found that QEMU vcpu1 thread and vcpu7 thread were not consuming 
-any cpu (Should be in idle state),
-All of VCPUs' stacks in host is like bellow:
-
-[<ffffffffa07089b5>] kvm_vcpu_block+0x65/0xa0 [kvm]
-[<ffffffffa071c7c1>] __vcpu_run+0xd1/0x260 [kvm]
-[<ffffffffa071d508>] kvm_arch_vcpu_ioctl_run+0x68/0x1a0 [kvm]
-[<ffffffffa0709cee>] kvm_vcpu_ioctl+0x38e/0x580 [kvm]
-[<ffffffff8116be8b>] do_vfs_ioctl+0x8b/0x3b0
-[<ffffffff8116c251>] sys_ioctl+0xa1/0xb0
-[<ffffffff81468092>] system_call_fastpath+0x16/0x1b
-[<00002ab9fe1f99a7>] 0x2ab9fe1f99a7
-[<ffffffffffffffff>] 0xffffffffffffffff
-
-We looked into the kernel codes that could leading to the above 'Stuck' warning,
-and found that the only possible is the emulation of 'cpuid' instruct in 
-kvm/qemu has something wrong.
-But since we canât reproduce this problem, we are not quite sure.
-Is there any possible that the cupid emulation in kvm/qemu has some bug ?
-
-Has anyone come across these problem before? Or any idea?
-
-Thanks,
-zhanghailiang
-
-On 06/07/2015 09:54, zhanghailiang wrote:
->
->
-From host, we found that QEMU vcpu1 thread and vcpu7 thread were not
->
-consuming any cpu (Should be in idle state),
->
-All of VCPUs' stacks in host is like bellow:
->
->
-[<ffffffffa07089b5>] kvm_vcpu_block+0x65/0xa0 [kvm]
->
-[<ffffffffa071c7c1>] __vcpu_run+0xd1/0x260 [kvm]
->
-[<ffffffffa071d508>] kvm_arch_vcpu_ioctl_run+0x68/0x1a0 [kvm]
->
-[<ffffffffa0709cee>] kvm_vcpu_ioctl+0x38e/0x580 [kvm]
->
-[<ffffffff8116be8b>] do_vfs_ioctl+0x8b/0x3b0
->
-[<ffffffff8116c251>] sys_ioctl+0xa1/0xb0
->
-[<ffffffff81468092>] system_call_fastpath+0x16/0x1b
->
-[<00002ab9fe1f99a7>] 0x2ab9fe1f99a7
->
-[<ffffffffffffffff>] 0xffffffffffffffff
->
->
-We looked into the kernel codes that could leading to the above 'Stuck'
->
-warning,
->
-and found that the only possible is the emulation of 'cpuid' instruct in
->
-kvm/qemu has something wrong.
->
-But since we canât reproduce this problem, we are not quite sure.
->
-Is there any possible that the cupid emulation in kvm/qemu has some bug ?
-Can you explain the relationship to the cpuid emulation?  What do the
-traces say about vcpus 1 and 7?
-
-Paolo
-
-On 2015/7/6 16:45, Paolo Bonzini wrote:
-On 06/07/2015 09:54, zhanghailiang wrote:
-From host, we found that QEMU vcpu1 thread and vcpu7 thread were not
-consuming any cpu (Should be in idle state),
-All of VCPUs' stacks in host is like bellow:
-
-[<ffffffffa07089b5>] kvm_vcpu_block+0x65/0xa0 [kvm]
-[<ffffffffa071c7c1>] __vcpu_run+0xd1/0x260 [kvm]
-[<ffffffffa071d508>] kvm_arch_vcpu_ioctl_run+0x68/0x1a0 [kvm]
-[<ffffffffa0709cee>] kvm_vcpu_ioctl+0x38e/0x580 [kvm]
-[<ffffffff8116be8b>] do_vfs_ioctl+0x8b/0x3b0
-[<ffffffff8116c251>] sys_ioctl+0xa1/0xb0
-[<ffffffff81468092>] system_call_fastpath+0x16/0x1b
-[<00002ab9fe1f99a7>] 0x2ab9fe1f99a7
-[<ffffffffffffffff>] 0xffffffffffffffff
-
-We looked into the kernel codes that could leading to the above 'Stuck'
-warning,
-and found that the only possible is the emulation of 'cpuid' instruct in
-kvm/qemu has something wrong.
-But since we canât reproduce this problem, we are not quite sure.
-Is there any possible that the cupid emulation in kvm/qemu has some bug ?
-Can you explain the relationship to the cpuid emulation?  What do the
-traces say about vcpus 1 and 7?
-OK, we searched the VM's kernel codes with the 'Stuck' message, and  it is 
-located in
-do_boot_cpu(). It's in BSP context, the call process is:
-BSP executes start_kernel() -> smp_init() -> smp_boot_cpus() -> do_boot_cpu() 
--> wakeup_secondary_via_INIT() to trigger APs.
-It will wait 5s for APs to startup, if some AP not startup normally, it will 
-print 'CPU%d Stuck' or 'CPU%d: Not responding'.
-
-If it prints 'Stuck', it means the AP has received the SIPI interrupt and 
-begins to execute the code
-'ENTRY(trampoline_data)' (trampoline_64.S) , but be stuck in some places before 
-smp_callin()(smpboot.c).
-The follow is the starup process of BSP and AP.
-BSP:
-start_kernel()
-  ->smp_init()
-     ->smp_boot_cpus()
-       ->do_boot_cpu()
-           ->start_ip = trampoline_address(); //set the address that AP will go 
-to execute
-           ->wakeup_secondary_cpu_via_init(); // kick the secondary CPU
-           ->for (timeout = 0; timeout < 50000; timeout++)
-               if (cpumask_test_cpu(cpu, cpu_callin_mask)) break;// check if AP 
-startup or not
-
-APs:
-ENTRY(trampoline_data) (trampoline_64.S)
-      ->ENTRY(secondary_startup_64) (head_64.S)
-         ->start_secondary() (smpboot.c)
-            ->cpu_init();
-            ->smp_callin();
-                ->cpumask_set_cpu(cpuid, cpu_callin_mask); ->Note: if AP comes 
-here, the BSP will not prints the error message.
-
-From above call process, we can be sure that, the AP has been stuck between 
-trampoline_data and the cpumask_set_cpu() in
-smp_callin(), we look through these codes path carefully, and only found a 
-'hlt' instruct that could block the process.
-It is located in trampoline_data():
-
-ENTRY(trampoline_data)
-        ...
-
-        call    verify_cpu              # Verify the cpu supports long mode
-        testl   %eax, %eax              # Check for return code
-        jnz     no_longmode
-
-        ...
-
-no_longmode:
-        hlt
-        jmp no_longmode
-
-For the process verify_cpu(),
-we can only find the 'cpuid' sensitive instruct that could lead VM exit from 
-No-root mode.
-This is why we doubt if cpuid emulation is wrong in KVM/QEMU that leading to 
-the fail in verify_cpu.
-
-From the message in VM, we know vcpu1 and vcpu7 is something wrong.
-[    5.060042] CPU1: Stuck ??
-[   10.170815] CPU7: Stuck ??
-[   10.171648] Brought up 6 CPUs
-
-Besides, the follow is the cpus message got from host.
-80FF72F5-FF6D-E411-A8C8-000000821800:/home/fsp/hrg # virsh qemu-monitor-command 
-instance-0000000
-* CPU #0: pc=0x00007f64160c683d thread_id=68570
-  CPU #1: pc=0xffffffff810301f1 (halted) thread_id=68573
-  CPU #2: pc=0xffffffff810301e2 (halted) thread_id=68575
-  CPU #3: pc=0xffffffff810301e2 (halted) thread_id=68576
-  CPU #4: pc=0xffffffff810301e2 (halted) thread_id=68577
-  CPU #5: pc=0xffffffff810301e2 (halted) thread_id=68578
-  CPU #6: pc=0xffffffff810301e2 (halted) thread_id=68583
-  CPU #7: pc=0xffffffff810301f1 (halted) thread_id=68584
-
-Oh, i also forgot to mention in the above message that, we have bond each vCPU 
-to different physical CPU in
-host.
-
-Thanks,
-zhanghailiang
-
-On 06/07/2015 11:59, zhanghailiang wrote:
->
->
->
-Besides, the follow is the cpus message got from host.
->
-80FF72F5-FF6D-E411-A8C8-000000821800:/home/fsp/hrg # virsh
->
-qemu-monitor-command instance-0000000
->
-* CPU #0: pc=0x00007f64160c683d thread_id=68570
->
-CPU #1: pc=0xffffffff810301f1 (halted) thread_id=68573
->
-CPU #2: pc=0xffffffff810301e2 (halted) thread_id=68575
->
-CPU #3: pc=0xffffffff810301e2 (halted) thread_id=68576
->
-CPU #4: pc=0xffffffff810301e2 (halted) thread_id=68577
->
-CPU #5: pc=0xffffffff810301e2 (halted) thread_id=68578
->
-CPU #6: pc=0xffffffff810301e2 (halted) thread_id=68583
->
-CPU #7: pc=0xffffffff810301f1 (halted) thread_id=68584
->
->
-Oh, i also forgot to mention in the above message that, we have bond
->
-each vCPU to different physical CPU in
->
-host.
-Can you capture a trace on the host (trace-cmd record -e kvm) and send
-it privately?  Please note which CPUs get stuck, since I guess it's not
-always 1 and 7.
-
-Paolo
-
-On Mon, 6 Jul 2015 17:59:10 +0800
-zhanghailiang <address@hidden> wrote:
-
->
-On 2015/7/6 16:45, Paolo Bonzini wrote:
->
->
->
->
->
-> On 06/07/2015 09:54, zhanghailiang wrote:
->
->>
->
->>  From host, we found that QEMU vcpu1 thread and vcpu7 thread were not
->
->> consuming any cpu (Should be in idle state),
->
->> All of VCPUs' stacks in host is like bellow:
->
->>
->
->> [<ffffffffa07089b5>] kvm_vcpu_block+0x65/0xa0 [kvm]
->
->> [<ffffffffa071c7c1>] __vcpu_run+0xd1/0x260 [kvm]
->
->> [<ffffffffa071d508>] kvm_arch_vcpu_ioctl_run+0x68/0x1a0 [kvm]
->
->> [<ffffffffa0709cee>] kvm_vcpu_ioctl+0x38e/0x580 [kvm]
->
->> [<ffffffff8116be8b>] do_vfs_ioctl+0x8b/0x3b0
->
->> [<ffffffff8116c251>] sys_ioctl+0xa1/0xb0
->
->> [<ffffffff81468092>] system_call_fastpath+0x16/0x1b
->
->> [<00002ab9fe1f99a7>] 0x2ab9fe1f99a7
->
->> [<ffffffffffffffff>] 0xffffffffffffffff
->
->>
->
->> We looked into the kernel codes that could leading to the above 'Stuck'
->
->> warning,
-in current upstream there isn't any printk(...Stuck...) left since that code 
-path
-has been reworked.
-I've often seen this on over-committed host during guest CPUs up/down torture 
-test.
-Could you update guest kernel to upstream and see if issue reproduces?
-
->
->> and found that the only possible is the emulation of 'cpuid' instruct in
->
->> kvm/qemu has something wrong.
->
->> But since we canât reproduce this problem, we are not quite sure.
->
->> Is there any possible that the cupid emulation in kvm/qemu has some bug ?
->
->
->
-> Can you explain the relationship to the cpuid emulation?  What do the
->
-> traces say about vcpus 1 and 7?
->
->
-OK, we searched the VM's kernel codes with the 'Stuck' message, and  it is
->
-located in
->
-do_boot_cpu(). It's in BSP context, the call process is:
->
-BSP executes start_kernel() -> smp_init() -> smp_boot_cpus() -> do_boot_cpu()
->
--> wakeup_secondary_via_INIT() to trigger APs.
->
-It will wait 5s for APs to startup, if some AP not startup normally, it will
->
-print 'CPU%d Stuck' or 'CPU%d: Not responding'.
->
->
-If it prints 'Stuck', it means the AP has received the SIPI interrupt and
->
-begins to execute the code
->
-'ENTRY(trampoline_data)' (trampoline_64.S) , but be stuck in some places
->
-before smp_callin()(smpboot.c).
->
-The follow is the starup process of BSP and AP.
->
-BSP:
->
-start_kernel()
->
-->smp_init()
->
-->smp_boot_cpus()
->
-->do_boot_cpu()
->
-->start_ip = trampoline_address(); //set the address that AP will
->
-go to execute
->
-->wakeup_secondary_cpu_via_init(); // kick the secondary CPU
->
-->for (timeout = 0; timeout < 50000; timeout++)
->
-if (cpumask_test_cpu(cpu, cpu_callin_mask)) break;// check if
->
-AP startup or not
->
->
-APs:
->
-ENTRY(trampoline_data) (trampoline_64.S)
->
-->ENTRY(secondary_startup_64) (head_64.S)
->
-->start_secondary() (smpboot.c)
->
-->cpu_init();
->
-->smp_callin();
->
-->cpumask_set_cpu(cpuid, cpu_callin_mask); ->Note: if AP
->
-comes here, the BSP will not prints the error message.
->
->
-From above call process, we can be sure that, the AP has been stuck between
->
-trampoline_data and the cpumask_set_cpu() in
->
-smp_callin(), we look through these codes path carefully, and only found a
->
-'hlt' instruct that could block the process.
->
-It is located in trampoline_data():
->
->
-ENTRY(trampoline_data)
->
-...
->
->
-call    verify_cpu              # Verify the cpu supports long mode
->
-testl   %eax, %eax              # Check for return code
->
-jnz     no_longmode
->
->
-...
->
->
-no_longmode:
->
-hlt
->
-jmp no_longmode
->
->
-For the process verify_cpu(),
->
-we can only find the 'cpuid' sensitive instruct that could lead VM exit from
->
-No-root mode.
->
-This is why we doubt if cpuid emulation is wrong in KVM/QEMU that leading to
->
-the fail in verify_cpu.
->
->
-From the message in VM, we know vcpu1 and vcpu7 is something wrong.
->
-[    5.060042] CPU1: Stuck ??
->
-[   10.170815] CPU7: Stuck ??
->
-[   10.171648] Brought up 6 CPUs
->
->
-Besides, the follow is the cpus message got from host.
->
-80FF72F5-FF6D-E411-A8C8-000000821800:/home/fsp/hrg # virsh
->
-qemu-monitor-command instance-0000000
->
-* CPU #0: pc=0x00007f64160c683d thread_id=68570
->
-CPU #1: pc=0xffffffff810301f1 (halted) thread_id=68573
->
-CPU #2: pc=0xffffffff810301e2 (halted) thread_id=68575
->
-CPU #3: pc=0xffffffff810301e2 (halted) thread_id=68576
->
-CPU #4: pc=0xffffffff810301e2 (halted) thread_id=68577
->
-CPU #5: pc=0xffffffff810301e2 (halted) thread_id=68578
->
-CPU #6: pc=0xffffffff810301e2 (halted) thread_id=68583
->
-CPU #7: pc=0xffffffff810301f1 (halted) thread_id=68584
->
->
-Oh, i also forgot to mention in the above message that, we have bond each
->
-vCPU to different physical CPU in
->
-host.
->
->
-Thanks,
->
-zhanghailiang
->
->
->
->
->
---
->
-To unsubscribe from this list: send the line "unsubscribe kvm" in
->
-the body of a message to address@hidden
->
-More majordomo info at
-http://vger.kernel.org/majordomo-info.html
-
-On 2015/7/7 19:23, Igor Mammedov wrote:
-On Mon, 6 Jul 2015 17:59:10 +0800
-zhanghailiang <address@hidden> wrote:
-On 2015/7/6 16:45, Paolo Bonzini wrote:
-On 06/07/2015 09:54, zhanghailiang wrote:
-From host, we found that QEMU vcpu1 thread and vcpu7 thread were not
-consuming any cpu (Should be in idle state),
-All of VCPUs' stacks in host is like bellow:
-
-[<ffffffffa07089b5>] kvm_vcpu_block+0x65/0xa0 [kvm]
-[<ffffffffa071c7c1>] __vcpu_run+0xd1/0x260 [kvm]
-[<ffffffffa071d508>] kvm_arch_vcpu_ioctl_run+0x68/0x1a0 [kvm]
-[<ffffffffa0709cee>] kvm_vcpu_ioctl+0x38e/0x580 [kvm]
-[<ffffffff8116be8b>] do_vfs_ioctl+0x8b/0x3b0
-[<ffffffff8116c251>] sys_ioctl+0xa1/0xb0
-[<ffffffff81468092>] system_call_fastpath+0x16/0x1b
-[<00002ab9fe1f99a7>] 0x2ab9fe1f99a7
-[<ffffffffffffffff>] 0xffffffffffffffff
-
-We looked into the kernel codes that could leading to the above 'Stuck'
-warning,
-in current upstream there isn't any printk(...Stuck...) left since that code 
-path
-has been reworked.
-I've often seen this on over-committed host during guest CPUs up/down torture 
-test.
-Could you update guest kernel to upstream and see if issue reproduces?
-Hmm, Unfortunately, it is very hard to reproduce, and we are still trying to 
-reproduce it.
-
-For your test case, is it a kernel bug?
-Or is there any related patch could solve your test problem been merged into
-upstream ?
-
-Thanks,
-zhanghailiang
-and found that the only possible is the emulation of 'cpuid' instruct in
-kvm/qemu has something wrong.
-But since we canât reproduce this problem, we are not quite sure.
-Is there any possible that the cupid emulation in kvm/qemu has some bug ?
-Can you explain the relationship to the cpuid emulation?  What do the
-traces say about vcpus 1 and 7?
-OK, we searched the VM's kernel codes with the 'Stuck' message, and  it is 
-located in
-do_boot_cpu(). It's in BSP context, the call process is:
-BSP executes start_kernel() -> smp_init() -> smp_boot_cpus() -> do_boot_cpu() 
--> wakeup_secondary_via_INIT() to trigger APs.
-It will wait 5s for APs to startup, if some AP not startup normally, it will 
-print 'CPU%d Stuck' or 'CPU%d: Not responding'.
-
-If it prints 'Stuck', it means the AP has received the SIPI interrupt and 
-begins to execute the code
-'ENTRY(trampoline_data)' (trampoline_64.S) , but be stuck in some places before 
-smp_callin()(smpboot.c).
-The follow is the starup process of BSP and AP.
-BSP:
-start_kernel()
-    ->smp_init()
-       ->smp_boot_cpus()
-         ->do_boot_cpu()
-             ->start_ip = trampoline_address(); //set the address that AP will 
-go to execute
-             ->wakeup_secondary_cpu_via_init(); // kick the secondary CPU
-             ->for (timeout = 0; timeout < 50000; timeout++)
-                 if (cpumask_test_cpu(cpu, cpu_callin_mask)) break;// check if 
-AP startup or not
-
-APs:
-ENTRY(trampoline_data) (trampoline_64.S)
-        ->ENTRY(secondary_startup_64) (head_64.S)
-           ->start_secondary() (smpboot.c)
-              ->cpu_init();
-              ->smp_callin();
-                  ->cpumask_set_cpu(cpuid, cpu_callin_mask); ->Note: if AP 
-comes here, the BSP will not prints the error message.
-
-  From above call process, we can be sure that, the AP has been stuck between 
-trampoline_data and the cpumask_set_cpu() in
-smp_callin(), we look through these codes path carefully, and only found a 
-'hlt' instruct that could block the process.
-It is located in trampoline_data():
-
-ENTRY(trampoline_data)
-          ...
-
-        call    verify_cpu              # Verify the cpu supports long mode
-        testl   %eax, %eax              # Check for return code
-        jnz     no_longmode
-
-          ...
-
-no_longmode:
-        hlt
-        jmp no_longmode
-
-For the process verify_cpu(),
-we can only find the 'cpuid' sensitive instruct that could lead VM exit from 
-No-root mode.
-This is why we doubt if cpuid emulation is wrong in KVM/QEMU that leading to 
-the fail in verify_cpu.
-
-  From the message in VM, we know vcpu1 and vcpu7 is something wrong.
-[    5.060042] CPU1: Stuck ??
-[   10.170815] CPU7: Stuck ??
-[   10.171648] Brought up 6 CPUs
-
-Besides, the follow is the cpus message got from host.
-80FF72F5-FF6D-E411-A8C8-000000821800:/home/fsp/hrg # virsh qemu-monitor-command 
-instance-0000000
-* CPU #0: pc=0x00007f64160c683d thread_id=68570
-    CPU #1: pc=0xffffffff810301f1 (halted) thread_id=68573
-    CPU #2: pc=0xffffffff810301e2 (halted) thread_id=68575
-    CPU #3: pc=0xffffffff810301e2 (halted) thread_id=68576
-    CPU #4: pc=0xffffffff810301e2 (halted) thread_id=68577
-    CPU #5: pc=0xffffffff810301e2 (halted) thread_id=68578
-    CPU #6: pc=0xffffffff810301e2 (halted) thread_id=68583
-    CPU #7: pc=0xffffffff810301f1 (halted) thread_id=68584
-
-Oh, i also forgot to mention in the above message that, we have bond each vCPU 
-to different physical CPU in
-host.
-
-Thanks,
-zhanghailiang
-
-
-
-
---
-To unsubscribe from this list: send the line "unsubscribe kvm" in
-the body of a message to address@hidden
-More majordomo info at
-http://vger.kernel.org/majordomo-info.html
-.
-
-On Tue, 7 Jul 2015 19:43:35 +0800
-zhanghailiang <address@hidden> wrote:
-
->
-On 2015/7/7 19:23, Igor Mammedov wrote:
->
-> On Mon, 6 Jul 2015 17:59:10 +0800
->
-> zhanghailiang <address@hidden> wrote:
->
->
->
->> On 2015/7/6 16:45, Paolo Bonzini wrote:
->
->>>
->
->>>
->
->>> On 06/07/2015 09:54, zhanghailiang wrote:
->
->>>>
->
->>>>   From host, we found that QEMU vcpu1 thread and vcpu7 thread were not
->
->>>> consuming any cpu (Should be in idle state),
->
->>>> All of VCPUs' stacks in host is like bellow:
->
->>>>
->
->>>> [<ffffffffa07089b5>] kvm_vcpu_block+0x65/0xa0 [kvm]
->
->>>> [<ffffffffa071c7c1>] __vcpu_run+0xd1/0x260 [kvm]
->
->>>> [<ffffffffa071d508>] kvm_arch_vcpu_ioctl_run+0x68/0x1a0 [kvm]
->
->>>> [<ffffffffa0709cee>] kvm_vcpu_ioctl+0x38e/0x580 [kvm]
->
->>>> [<ffffffff8116be8b>] do_vfs_ioctl+0x8b/0x3b0
->
->>>> [<ffffffff8116c251>] sys_ioctl+0xa1/0xb0
->
->>>> [<ffffffff81468092>] system_call_fastpath+0x16/0x1b
->
->>>> [<00002ab9fe1f99a7>] 0x2ab9fe1f99a7
->
->>>> [<ffffffffffffffff>] 0xffffffffffffffff
->
->>>>
->
->>>> We looked into the kernel codes that could leading to the above 'Stuck'
->
->>>> warning,
->
-> in current upstream there isn't any printk(...Stuck...) left since that
->
-> code path
->
-> has been reworked.
->
-> I've often seen this on over-committed host during guest CPUs up/down
->
-> torture test.
->
-> Could you update guest kernel to upstream and see if issue reproduces?
->
->
->
->
-Hmm, Unfortunately, it is very hard to reproduce, and we are still trying to
->
-reproduce it.
->
->
-For your test case, is it a kernel bug?
->
-Or is there any related patch could solve your test problem been merged into
->
-upstream ?
-I don't remember all prerequisite patches but you should be able to find
-http://marc.info/?l=linux-kernel&m=140326703108009&w=2
-"x86/smpboot: Initialize secondary CPU only if master CPU will wait for it"
-and then look for dependencies.
-
-
->
->
-Thanks,
->
-zhanghailiang
->
->
->>>> and found that the only possible is the emulation of 'cpuid' instruct in
->
->>>> kvm/qemu has something wrong.
->
->>>> But since we canât reproduce this problem, we are not quite sure.
->
->>>> Is there any possible that the cupid emulation in kvm/qemu has some bug ?
->
->>>
->
->>> Can you explain the relationship to the cpuid emulation?  What do the
->
->>> traces say about vcpus 1 and 7?
->
->>
->
->> OK, we searched the VM's kernel codes with the 'Stuck' message, and  it is
->
->> located in
->
->> do_boot_cpu(). It's in BSP context, the call process is:
->
->> BSP executes start_kernel() -> smp_init() -> smp_boot_cpus() ->
->
->> do_boot_cpu() -> wakeup_secondary_via_INIT() to trigger APs.
->
->> It will wait 5s for APs to startup, if some AP not startup normally, it
->
->> will print 'CPU%d Stuck' or 'CPU%d: Not responding'.
->
->>
->
->> If it prints 'Stuck', it means the AP has received the SIPI interrupt and
->
->> begins to execute the code
->
->> 'ENTRY(trampoline_data)' (trampoline_64.S) , but be stuck in some places
->
->> before smp_callin()(smpboot.c).
->
->> The follow is the starup process of BSP and AP.
->
->> BSP:
->
->> start_kernel()
->
->>     ->smp_init()
->
->>        ->smp_boot_cpus()
->
->>          ->do_boot_cpu()
->
->>              ->start_ip = trampoline_address(); //set the address that AP
->
->> will go to execute
->
->>              ->wakeup_secondary_cpu_via_init(); // kick the secondary CPU
->
->>              ->for (timeout = 0; timeout < 50000; timeout++)
->
->>                  if (cpumask_test_cpu(cpu, cpu_callin_mask)) break;//
->
->> check if AP startup or not
->
->>
->
->> APs:
->
->> ENTRY(trampoline_data) (trampoline_64.S)
->
->>         ->ENTRY(secondary_startup_64) (head_64.S)
->
->>            ->start_secondary() (smpboot.c)
->
->>               ->cpu_init();
->
->>               ->smp_callin();
->
->>                   ->cpumask_set_cpu(cpuid, cpu_callin_mask); ->Note: if AP
->
->> comes here, the BSP will not prints the error message.
->
->>
->
->>   From above call process, we can be sure that, the AP has been stuck
->
->> between trampoline_data and the cpumask_set_cpu() in
->
->> smp_callin(), we look through these codes path carefully, and only found a
->
->> 'hlt' instruct that could block the process.
->
->> It is located in trampoline_data():
->
->>
->
->> ENTRY(trampoline_data)
->
->>           ...
->
->>
->
->>    call    verify_cpu              # Verify the cpu supports long mode
->
->>    testl   %eax, %eax              # Check for return code
->
->>    jnz     no_longmode
->
->>
->
->>           ...
->
->>
->
->> no_longmode:
->
->>    hlt
->
->>    jmp no_longmode
->
->>
->
->> For the process verify_cpu(),
->
->> we can only find the 'cpuid' sensitive instruct that could lead VM exit
->
->> from No-root mode.
->
->> This is why we doubt if cpuid emulation is wrong in KVM/QEMU that leading
->
->> to the fail in verify_cpu.
->
->>
->
->>   From the message in VM, we know vcpu1 and vcpu7 is something wrong.
->
->> [    5.060042] CPU1: Stuck ??
->
->> [   10.170815] CPU7: Stuck ??
->
->> [   10.171648] Brought up 6 CPUs
->
->>
->
->> Besides, the follow is the cpus message got from host.
->
->> 80FF72F5-FF6D-E411-A8C8-000000821800:/home/fsp/hrg # virsh
->
->> qemu-monitor-command instance-0000000
->
->> * CPU #0: pc=0x00007f64160c683d thread_id=68570
->
->>     CPU #1: pc=0xffffffff810301f1 (halted) thread_id=68573
->
->>     CPU #2: pc=0xffffffff810301e2 (halted) thread_id=68575
->
->>     CPU #3: pc=0xffffffff810301e2 (halted) thread_id=68576
->
->>     CPU #4: pc=0xffffffff810301e2 (halted) thread_id=68577
->
->>     CPU #5: pc=0xffffffff810301e2 (halted) thread_id=68578
->
->>     CPU #6: pc=0xffffffff810301e2 (halted) thread_id=68583
->
->>     CPU #7: pc=0xffffffff810301f1 (halted) thread_id=68584
->
->>
->
->> Oh, i also forgot to mention in the above message that, we have bond each
->
->> vCPU to different physical CPU in
->
->> host.
->
->>
->
->> Thanks,
->
->> zhanghailiang
->
->>
->
->>
->
->>
->
->>
->
->> --
->
->> To unsubscribe from this list: send the line "unsubscribe kvm" in
->
->> the body of a message to address@hidden
->
->> More majordomo info at
-http://vger.kernel.org/majordomo-info.html
->
->
->
->
->
-> .
->
->
->
->
->
-
-On 2015/7/7 20:21, Igor Mammedov wrote:
-On Tue, 7 Jul 2015 19:43:35 +0800
-zhanghailiang <address@hidden> wrote:
-On 2015/7/7 19:23, Igor Mammedov wrote:
-On Mon, 6 Jul 2015 17:59:10 +0800
-zhanghailiang <address@hidden> wrote:
-On 2015/7/6 16:45, Paolo Bonzini wrote:
-On 06/07/2015 09:54, zhanghailiang wrote:
-From host, we found that QEMU vcpu1 thread and vcpu7 thread were not
-consuming any cpu (Should be in idle state),
-All of VCPUs' stacks in host is like bellow:
-
-[<ffffffffa07089b5>] kvm_vcpu_block+0x65/0xa0 [kvm]
-[<ffffffffa071c7c1>] __vcpu_run+0xd1/0x260 [kvm]
-[<ffffffffa071d508>] kvm_arch_vcpu_ioctl_run+0x68/0x1a0 [kvm]
-[<ffffffffa0709cee>] kvm_vcpu_ioctl+0x38e/0x580 [kvm]
-[<ffffffff8116be8b>] do_vfs_ioctl+0x8b/0x3b0
-[<ffffffff8116c251>] sys_ioctl+0xa1/0xb0
-[<ffffffff81468092>] system_call_fastpath+0x16/0x1b
-[<00002ab9fe1f99a7>] 0x2ab9fe1f99a7
-[<ffffffffffffffff>] 0xffffffffffffffff
-
-We looked into the kernel codes that could leading to the above 'Stuck'
-warning,
-in current upstream there isn't any printk(...Stuck...) left since that code 
-path
-has been reworked.
-I've often seen this on over-committed host during guest CPUs up/down torture 
-test.
-Could you update guest kernel to upstream and see if issue reproduces?
-Hmm, Unfortunately, it is very hard to reproduce, and we are still trying to 
-reproduce it.
-
-For your test case, is it a kernel bug?
-Or is there any related patch could solve your test problem been merged into
-upstream ?
-I don't remember all prerequisite patches but you should be able to find
-http://marc.info/?l=linux-kernel&m=140326703108009&w=2
-"x86/smpboot: Initialize secondary CPU only if master CPU will wait for it"
-and then look for dependencies.
-Er, we have investigated this patch, and it is not related to our problem, :)
-
-Thanks.
-Thanks,
-zhanghailiang
-and found that the only possible is the emulation of 'cpuid' instruct in
-kvm/qemu has something wrong.
-But since we canât reproduce this problem, we are not quite sure.
-Is there any possible that the cupid emulation in kvm/qemu has some bug ?
-Can you explain the relationship to the cpuid emulation?  What do the
-traces say about vcpus 1 and 7?
-OK, we searched the VM's kernel codes with the 'Stuck' message, and  it is 
-located in
-do_boot_cpu(). It's in BSP context, the call process is:
-BSP executes start_kernel() -> smp_init() -> smp_boot_cpus() -> do_boot_cpu() 
--> wakeup_secondary_via_INIT() to trigger APs.
-It will wait 5s for APs to startup, if some AP not startup normally, it will 
-print 'CPU%d Stuck' or 'CPU%d: Not responding'.
-
-If it prints 'Stuck', it means the AP has received the SIPI interrupt and 
-begins to execute the code
-'ENTRY(trampoline_data)' (trampoline_64.S) , but be stuck in some places before 
-smp_callin()(smpboot.c).
-The follow is the starup process of BSP and AP.
-BSP:
-start_kernel()
-     ->smp_init()
-        ->smp_boot_cpus()
-          ->do_boot_cpu()
-              ->start_ip = trampoline_address(); //set the address that AP will 
-go to execute
-              ->wakeup_secondary_cpu_via_init(); // kick the secondary CPU
-              ->for (timeout = 0; timeout < 50000; timeout++)
-                  if (cpumask_test_cpu(cpu, cpu_callin_mask)) break;// check if 
-AP startup or not
-
-APs:
-ENTRY(trampoline_data) (trampoline_64.S)
-         ->ENTRY(secondary_startup_64) (head_64.S)
-            ->start_secondary() (smpboot.c)
-               ->cpu_init();
-               ->smp_callin();
-                   ->cpumask_set_cpu(cpuid, cpu_callin_mask); ->Note: if AP 
-comes here, the BSP will not prints the error message.
-
-   From above call process, we can be sure that, the AP has been stuck between 
-trampoline_data and the cpumask_set_cpu() in
-smp_callin(), we look through these codes path carefully, and only found a 
-'hlt' instruct that could block the process.
-It is located in trampoline_data():
-
-ENTRY(trampoline_data)
-           ...
-
-        call    verify_cpu              # Verify the cpu supports long mode
-        testl   %eax, %eax              # Check for return code
-        jnz     no_longmode
-
-           ...
-
-no_longmode:
-        hlt
-        jmp no_longmode
-
-For the process verify_cpu(),
-we can only find the 'cpuid' sensitive instruct that could lead VM exit from 
-No-root mode.
-This is why we doubt if cpuid emulation is wrong in KVM/QEMU that leading to 
-the fail in verify_cpu.
-
-   From the message in VM, we know vcpu1 and vcpu7 is something wrong.
-[    5.060042] CPU1: Stuck ??
-[   10.170815] CPU7: Stuck ??
-[   10.171648] Brought up 6 CPUs
-
-Besides, the follow is the cpus message got from host.
-80FF72F5-FF6D-E411-A8C8-000000821800:/home/fsp/hrg # virsh qemu-monitor-command 
-instance-0000000
-* CPU #0: pc=0x00007f64160c683d thread_id=68570
-     CPU #1: pc=0xffffffff810301f1 (halted) thread_id=68573
-     CPU #2: pc=0xffffffff810301e2 (halted) thread_id=68575
-     CPU #3: pc=0xffffffff810301e2 (halted) thread_id=68576
-     CPU #4: pc=0xffffffff810301e2 (halted) thread_id=68577
-     CPU #5: pc=0xffffffff810301e2 (halted) thread_id=68578
-     CPU #6: pc=0xffffffff810301e2 (halted) thread_id=68583
-     CPU #7: pc=0xffffffff810301f1 (halted) thread_id=68584
-
-Oh, i also forgot to mention in the above message that, we have bond each vCPU 
-to different physical CPU in
-host.
-
-Thanks,
-zhanghailiang
-
-
-
-
---
-To unsubscribe from this list: send the line "unsubscribe kvm" in
-the body of a message to address@hidden
-More majordomo info at
-http://vger.kernel.org/majordomo-info.html
-.
-.
-
diff --git a/classification_output/04/other/31349848 b/classification_output/04/other/31349848
deleted file mode 100644
index 2177a1bc6..000000000
--- a/classification_output/04/other/31349848
+++ /dev/null
@@ -1,162 +0,0 @@
-other: 0.901
-device: 0.881
-graphic: 0.876
-assembly: 0.855
-vnc: 0.854
-semantic: 0.846
-socket: 0.846
-instruction: 0.845
-KVM: 0.827
-boot: 0.815
-mistranslation: 0.781
-network: 0.769
-
-[Qemu-devel]  [BUG] qemu stuck when detach host-usb device
-
-Description of problem:
-The guest has a host-usb device(Kingston Technology DataTraveler 100 G3/G4/SE9 
-G2), which is attached
-to xhci controller(on host). Qemu will stuck if I detach it from guest.
-
-How reproducible:
-100%
-
-Steps to Reproduce:
-1.            Use usb stick to copy files in guest , make it busy working.
-2.            virsh detach-device vm_name usb.xml
-
-Then qemu will stuck for 20s, I found this is because libusb_release_interface 
-block for 20s.
-Dmesg prints:
-
-[35442.034861] usb 4-2.1: Disable of device-initiated U1 failed.
-[35447.034993] usb 4-2.1: Disable of device-initiated U2 failed.
-[35452.035131] usb 4-2.1: Set SEL for device-initiated U1 failed.
-[35457.035259] usb 4-2.1: Set SEL for device-initiated U2 failed.
-
-Is this a hardware error or software's bug?
-
-On Tue, Nov 27, 2018 at 01:26:24AM +0000, linzhecheng wrote:
->
-Description of problem:
->
-The guest has a host-usb device(Kingston Technology DataTraveler 100
->
-G3/G4/SE9 G2), which is attached
->
-to xhci controller(on host). Qemu will stuck if I detach it from guest.
->
->
-How reproducible:
->
-100%
->
->
-Steps to Reproduce:
->
-1.            Use usb stick to copy files in guest , make it busy working.
->
-2.            virsh detach-device vm_name usb.xml
->
->
-Then qemu will stuck for 20s, I found this is because
->
-libusb_release_interface block for 20s.
->
-Dmesg prints:
->
->
-[35442.034861] usb 4-2.1: Disable of device-initiated U1 failed.
->
-[35447.034993] usb 4-2.1: Disable of device-initiated U2 failed.
->
-[35452.035131] usb 4-2.1: Set SEL for device-initiated U1 failed.
->
-[35457.035259] usb 4-2.1: Set SEL for device-initiated U2 failed.
->
->
-Is this a hardware error or software's bug?
-I'd guess software error, could be is libusb or (host) linux kernel.
-Cc'ing libusb-devel.
-
-cheers,
-  Gerd
-
->
------Original Message-----
->
-From: Gerd Hoffmann [
-mailto:address@hidden
->
-Sent: Tuesday, November 27, 2018 2:09 PM
->
-To: linzhecheng <address@hidden>
->
-Cc: address@hidden; wangxin (U) <address@hidden>;
->
-Zhoujian (jay) <address@hidden>; address@hidden
->
-Subject: Re: [Qemu-devel] [BUG] qemu stuck when detach host-usb device
->
->
-On Tue, Nov 27, 2018 at 01:26:24AM +0000, linzhecheng wrote:
->
-> Description of problem:
->
-> The guest has a host-usb device(Kingston Technology DataTraveler 100
->
-> G3/G4/SE9 G2), which is attached to xhci controller(on host). Qemu will
->
-> stuck
->
-if I detach it from guest.
->
->
->
-> How reproducible:
->
-> 100%
->
->
->
-> Steps to Reproduce:
->
-> 1.            Use usb stick to copy files in guest , make it busy working.
->
-> 2.            virsh detach-device vm_name usb.xml
->
->
->
-> Then qemu will stuck for 20s, I found this is because
->
-> libusb_release_interface
->
-block for 20s.
->
-> Dmesg prints:
->
->
->
-> [35442.034861] usb 4-2.1: Disable of device-initiated U1 failed.
->
-> [35447.034993] usb 4-2.1: Disable of device-initiated U2 failed.
->
-> [35452.035131] usb 4-2.1: Set SEL for device-initiated U1 failed.
->
-> [35457.035259] usb 4-2.1: Set SEL for device-initiated U2 failed.
->
->
->
-> Is this a hardware error or software's bug?
->
->
-I'd guess software error, could be is libusb or (host) linux kernel.
->
-Cc'ing libusb-devel.
-Perhaps it's usb driver's bug. Could you also reproduce it?
->
->
-cheers,
->
-Gerd
-
diff --git a/classification_output/04/other/32484936 b/classification_output/04/other/32484936
deleted file mode 100644
index 8cada8dae..000000000
--- a/classification_output/04/other/32484936
+++ /dev/null
@@ -1,231 +0,0 @@
-other: 0.856
-assembly: 0.835
-semantic: 0.832
-vnc: 0.830
-device: 0.830
-instruction: 0.829
-socket: 0.829
-graphic: 0.813
-network: 0.811
-boot: 0.810
-mistranslation: 0.794
-KVM: 0.793
-
-[Qemu-devel] [Snapshot Bug?]Qcow2 meta data corruption
-
-Hi all,
-There was a problem about qcow2 image file happened in my serval vms and I could not figure it out,
-so have to ask for some help.
-Here is the thing:
-At first, I found there were some data corruption in a vm, so I did qemu-img check to all my vms.
-parts of check report:
-3-Leaked cluster 2926229 refcount=1 reference=0
-4-Leaked cluster 3021181 refcount=1 reference=0
-5-Leaked cluster 3021182 refcount=1 reference=0
-6-Leaked cluster 3021183 refcount=1 reference=0
-7-Leaked cluster 3021184 refcount=1 reference=0
-8-ERROR cluster 3102547 refcount=3 reference=4
-9-ERROR cluster 3111536 refcount=3 reference=4
-10-ERROR cluster 3113369 refcount=3 reference=4
-11-ERROR cluster 3235590 refcount=10 reference=11
-12-ERROR cluster 3235591 refcount=10 reference=11
-423-Warning: cluster offset=0xc000c00020000 is after the end of the image file, can't properly check refcounts.
-424-Warning: cluster offset=0xc000c000c0000 is after the end of the image file, can't properly check refcounts.
-425-Warning: cluster offset=0xc0001000c0000 is after the end of the image file, can't properly check refcounts.
-426-Warning: cluster offset=0xc000c000c0000 is after the end of the image file, can't properly check refcounts.
-427-Warning: cluster offset=0xc000c000c0000 is after the end of the image file, can't properly check refcounts.
-428-Warning: cluster offset=0xc000c000c0000 is after the end of the image file, can't properly check refcounts.
-429-Warning: cluster offset=0xc000c000c0000 is after the end of the image file, can't properly check refcounts.
-430-Warning: cluster offset=0xc000c00010000 is after the end of the image file, can't properly check refcounts.
-After a futher look in, I found two l2 entries point to the same cluster, and that was found in serval qcow2 image files of different vms.
-Like this:
-table entry conflict (with our qcow2 check 
-tool):
-a table offset : 0x00000093f7080000 level : 2, l1 table entry 100, l2 table entry 7
-b table offset : 0x00000093f7080000 level : 2, l1 table entry 5, l2 table entry 7
-table entry conflict :
-a table offset : 0x00000000a01e0000 level : 2, l1 table entry 100, l2 table entry 19
-b table offset : 0x00000000a01e0000 level : 2, l1 table entry 5, l2 table entry 19
-table entry conflict :
-a table offset : 0x00000000a01d0000 level : 2, l1 table entry 100, l2 table entry 18
-b table offset : 0x00000000a01d0000 level : 2, l1 table entry 5, l2 table entry 18
-table entry conflict :
-a table offset : 0x00000000a01c0000 level : 2, l1 table entry 100, l2 table entry 17
-b table offset : 0x00000000a01c0000 level : 2, l1 table entry 5, l2 table entry 17
-table entry conflict :
-a table offset : 0x00000000a01b0000 level : 2, l1 table entry 100, l2 table entry 16
-b table offset : 0x00000000a01b0000 level : 2, l1 table entry 5, l2 table entry 16
-I think the problem is relate to the snapshot create, delete. But I cant reproduce it .
-Can Anyone give a hint about how this happen?
-Qemu version 2.0.1, I download the source code and make install it.
-Qemu parameters:
-/usr/bin/kvm -chardev socket,id=qmp,path=/var/run/qemu-server/5855899639838.qmp,server,nowait -mon chardev=qmp,mode=control -vnc :0,websocket,to=200 -enable-kvm -pidfile /var/run/qemu-server/5855899639838.pid -daemonize -name yfMailSvr-200.200.0.14 -smp sockets=1,cores=4 -cpu core2duo,hv_spinlocks=0xffff,hv_relaxed,hv_time,hv_vapic,+sse4.1,+sse4.2,+x2apic,+erms,+smep,+fsgsbase,+f16c,+dca,+pcid,+pdcm,+xtpr,+ht,+ss,+acpi,+ds -nodefaults -vga cirrus -k en-us -boot menu=on,splash-time=8000 -m 8192 -usb -drive if=none,id=drive-ide0,media=cdrom,aio=native -device ide-cd,bus=ide.0,unit=0,drive=drive-ide0,id=ide0 -drive file=/sf/data/36b82a720d3a278001ba904e80c20c13e_ecf4bbbf3e94/images/host-ecf4bbbf3e94/784f3f08532a/yfMailSvr-200.200.0.14.vm/vm-disk-1.qcow2,if=none,id=drive-virtio1,cache=none,aio=native -device virtio-blk-pci,drive=drive-virtio1,id=virtio1,bus=pci.0,addr=0xb -drive file=/sf/data/36b82a720d3a278001ba904e80c20c13e_ecf4bbbf3e94/images/host-ecf4bbbf3e94/784f3f08532a/yfMailSvr-200.200.0.14.vm/vm-disk-2.qcow2,if=none,id=drive-virtio2,cache=none,aio=native -device virtio-blk-pci,drive=drive-virtio2,id=virtio2,bus=pci.0,addr=0xc,bootindex=101 -netdev type=tap,id=net0,ifname=585589963983800,script=/sf/etc/kvm/vtp-bridge,vhost=on,vhostforce=on -device virtio-net-pci,romfile=,mac=FE:FC:FE:F0:AB:BA,netdev=net0,bus=pci.0,addr=0x12,id=net0 -rtc driftfix=slew,clock=rt,base=localtime -global kvm-pit.lost_tick_policy=discard -global PIIX4_PM.disable_s3=1 -global PIIX4_PM.disable_s4=1
-Thanks
-Sangfor VT.
-leijian
-
-Hi all,
-There was a problem about qcow2 image file happened in my serval vms and I could not figure it out,
-so have to ask for some help.
-Here is the thing:
-At first, I found there were some data corruption in a vm, so I did qemu-img check to all my vms.
-parts of check report:
-3-Leaked cluster 2926229 refcount=1 reference=0
-4-Leaked cluster 3021181 refcount=1 reference=0
-5-Leaked cluster 3021182 refcount=1 reference=0
-6-Leaked cluster 3021183 refcount=1 reference=0
-7-Leaked cluster 3021184 refcount=1 reference=0
-8-ERROR cluster 3102547 refcount=3 reference=4
-9-ERROR cluster 3111536 refcount=3 reference=4
-10-ERROR cluster 3113369 refcount=3 reference=4
-11-ERROR cluster 3235590 refcount=10 reference=11
-12-ERROR cluster 3235591 refcount=10 reference=11
-423-Warning: cluster offset=0xc000c00020000 is after the end of the image file, can't properly check refcounts.
-424-Warning: cluster offset=0xc000c000c0000 is after the end of the image file, can't properly check refcounts.
-425-Warning: cluster offset=0xc0001000c0000 is after the end of the image file, can't properly check refcounts.
-426-Warning: cluster offset=0xc000c000c0000 is after the end of the image file, can't properly check refcounts.
-427-Warning: cluster offset=0xc000c000c0000 is after the end of the image file, can't properly check refcounts.
-428-Warning: cluster offset=0xc000c000c0000 is after the end of the image file, can't properly check refcounts.
-429-Warning: cluster offset=0xc000c000c0000 is after the end of the image file, can't properly check refcounts.
-430-Warning: cluster offset=0xc000c00010000 is after the end of the image file, can't properly check refcounts.
-After a futher look in, I found two l2 entries point to the same cluster, and that was found in serval qcow2 image files of different vms.
-Like this:
-table entry conflict (with our qcow2 check 
-tool):
-a table offset : 0x00000093f7080000 level : 2, l1 table entry 100, l2 table entry 7
-b table offset : 0x00000093f7080000 level : 2, l1 table entry 5, l2 table entry 7
-table entry conflict :
-a table offset : 0x00000000a01e0000 level : 2, l1 table entry 100, l2 table entry 19
-b table offset : 0x00000000a01e0000 level : 2, l1 table entry 5, l2 table entry 19
-table entry conflict :
-a table offset : 0x00000000a01d0000 level : 2, l1 table entry 100, l2 table entry 18
-b table offset : 0x00000000a01d0000 level : 2, l1 table entry 5, l2 table entry 18
-table entry conflict :
-a table offset : 0x00000000a01c0000 level : 2, l1 table entry 100, l2 table entry 17
-b table offset : 0x00000000a01c0000 level : 2, l1 table entry 5, l2 table entry 17
-table entry conflict :
-a table offset : 0x00000000a01b0000 level : 2, l1 table entry 100, l2 table entry 16
-b table offset : 0x00000000a01b0000 level : 2, l1 table entry 5, l2 table entry 16
-I think the problem is relate to the snapshot create, delete. But I cant reproduce it .
-Can Anyone give a hint about how this happen?
-Qemu version 2.0.1, I download the source code and make install it.
-Qemu parameters:
-/usr/bin/kvm -chardev socket,id=qmp,path=/var/run/qemu-server/5855899639838.qmp,server,nowait -mon chardev=qmp,mode=control -vnc :0,websocket,to=200 -enable-kvm -pidfile /var/run/qemu-server/5855899639838.pid -daemonize -name yfMailSvr-200.200.0.14 -smp sockets=1,cores=4 -cpu core2duo,hv_spinlocks=0xffff,hv_relaxed,hv_time,hv_vapic,+sse4.1,+sse4.2,+x2apic,+erms,+smep,+fsgsbase,+f16c,+dca,+pcid,+pdcm,+xtpr,+ht,+ss,+acpi,+ds -nodefaults -vga cirrus -k en-us -boot menu=on,splash-time=8000 -m 8192 -usb -drive if=none,id=drive-ide0,media=cdrom,aio=native -device ide-cd,bus=ide.0,unit=0,drive=drive-ide0,id=ide0 -drive file=/sf/data/36b82a720d3a278001ba904e80c20c13e_ecf4bbbf3e94/images/host-ecf4bbbf3e94/784f3f08532a/yfMailSvr-200.200.0.14.vm/vm-disk-1.qcow2,if=none,id=drive-virtio1,cache=none,aio=native -device virtio-blk-pci,drive=drive-virtio1,id=virtio1,bus=pci.0,addr=0xb -drive file=/sf/data/36b82a720d3a278001ba904e80c20c13e_ecf4bbbf3e94/images/host-ecf4bbbf3e94/784f3f08532a/yfMailSvr-200.200.0.14.vm/vm-disk-2.qcow2,if=none,id=drive-virtio2,cache=none,aio=native -device virtio-blk-pci,drive=drive-virtio2,id=virtio2,bus=pci.0,addr=0xc,bootindex=101 -netdev type=tap,id=net0,ifname=585589963983800,script=/sf/etc/kvm/vtp-bridge,vhost=on,vhostforce=on -device virtio-net-pci,romfile=,mac=FE:FC:FE:F0:AB:BA,netdev=net0,bus=pci.0,addr=0x12,id=net0 -rtc driftfix=slew,clock=rt,base=localtime -global kvm-pit.lost_tick_policy=discard -global PIIX4_PM.disable_s3=1 -global PIIX4_PM.disable_s4=1
-Thanks
-Sangfor VT.
-leijian
-
-Am 03.04.2015 um 12:04 hat leijian geschrieben:
->
-Hi all,
->
->
-There was a problem about qcow2 image file happened in my serval vms and I
->
-could not figure it out,
->
-so have to ask for some help.
->
-[...]
->
-I think the problem is relate to the snapshot create, delete. But I cant
->
-reproduce it .
->
-Can Anyone give a hint about how this happen?
-How did you create/delete your snapshots?
-
-More specifically, did you take care to never access your image from
-more than one process (except if both are read-only)? It happens
-occasionally that people use 'qemu-img snapshot' while the VM is
-running. This is wrong and can corrupt the image.
-
-Kevin
-
-On 04/07/2015 03:33 AM, Kevin Wolf wrote:
->
-More specifically, did you take care to never access your image from
->
-more than one process (except if both are read-only)? It happens
->
-occasionally that people use 'qemu-img snapshot' while the VM is
->
-running. This is wrong and can corrupt the image.
-Since this has been done by more than one person, I'm wondering if there
-is something we can do in the qcow2 format itself to make it harder for
-the casual user to cause corruption.  Maybe if we declare some bit or
-extension header for an image open for writing, which other readers can
-use as a warning ("this image is being actively modified; reading it may
-fail"), and other writers can use to deny access ("another process is
-already modifying this image"), where a writer should set that bit
-before writing anything else in the file, then clear it on exit.  Of
-course, you'd need a way to override the bit to actively clear it to
-recover from the case of a writer dying unexpectedly without resetting
-it normally.  And it won't help the case of a reader opening the file
-first, followed by a writer, where the reader could still get thrown off
-track.
-
-Or maybe we could document in the qcow2 format that all readers and
-writers should attempt to obtain the appropriate flock() permissions [or
-other appropriate advisory locking scheme] over the file header, so that
-cooperating processes that both use advisory locking will know when the
-file is in use by another process.
-
--- 
-Eric Blake   eblake redhat com    +1-919-301-3266
-Libvirt virtualization library
-http://libvirt.org
-signature.asc
-Description:
-OpenPGP digital signature
-
-ï»¿
-I created/deleted the snapshot by using qmp command "snapshot_blkdev_internal"/"snapshot_delete_blkdev_internal", and for avoiding the case you mentioned above, I have added the flock() permission in the qemu_open().
-Here is the test of doing qemu-img snapshot to a running vm:
-Diskfile:/sf/data/36c81f660e38b3b001b183da50b477d89_f8bc123b3e74/images/host-f8bc123b3e74/4a8d8728fcdc/Devried30030.vm/vm-disk-1.qcow2 is used! errno=Resource temporarily unavailable
-Does the two cluster entry happen to be the same because of the refcount of using cluster decrease to 0 unexpectedly and  is allocated again?
-If it was not accessing the image from more than one process, any other exceptions I can test for?
-Thanks
-leijian
-From:
-Eric Blake
-Date:
-2015-04-07 23:27
-To:
-Kevin Wolf
-;
-leijian
-CC:
-qemu-devel
-;
-stefanha
-Subject:
-Re: [Qemu-devel] [Snapshot Bug?]Qcow2 meta data 
-corruption
-On 04/07/2015 03:33 AM, Kevin Wolf wrote:
-> More specifically, did you take care to never access your image from
-> more than one process (except if both are read-only)? It happens
-> occasionally that people use 'qemu-img snapshot' while the VM is
-> running. This is wrong and can corrupt the image.
-Since this has been done by more than one person, I'm wondering if there
-is something we can do in the qcow2 format itself to make it harder for
-the casual user to cause corruption.  Maybe if we declare some bit or
-extension header for an image open for writing, which other readers can
-use as a warning ("this image is being actively modified; reading it may
-fail"), and other writers can use to deny access ("another process is
-already modifying this image"), where a writer should set that bit
-before writing anything else in the file, then clear it on exit.  Of
-course, you'd need a way to override the bit to actively clear it to
-recover from the case of a writer dying unexpectedly without resetting
-it normally.  And it won't help the case of a reader opening the file
-first, followed by a writer, where the reader could still get thrown off
-track.
-Or maybe we could document in the qcow2 format that all readers and
-writers should attempt to obtain the appropriate flock() permissions [or
-other appropriate advisory locking scheme] over the file header, so that
-cooperating processes that both use advisory locking will know when the
-file is in use by another process.
---
-Eric Blake   eblake redhat com    +1-919-301-3266
-Libvirt virtualization library http://libvirt.org
-
diff --git a/classification_output/04/other/35170175 b/classification_output/04/other/35170175
deleted file mode 100644
index 9d4185af4..000000000
--- a/classification_output/04/other/35170175
+++ /dev/null
@@ -1,529 +0,0 @@
-other: 0.933
-graphic: 0.844
-instruction: 0.812
-semantic: 0.798
-device: 0.787
-assembly: 0.767
-boot: 0.719
-KVM: 0.709
-socket: 0.681
-network: 0.666
-mistranslation: 0.641
-vnc: 0.633
-
-[Qemu-devel] [BUG] QEMU crashes with dpdk virtio pmd
-
-Qemu crashes, with pre-condition:
-vm xml config with multiqueue, and the vm's driver virtio-net support 
-multi-queue
-
-reproduce steps:
-i. start dpdk testpmd in VM with the virtio nic
-ii. stop testpmd
-iii. reboot the VM
-
-This commit "f9d6dbf0  remove virtio queues if the guest doesn't support 
-multiqueue" is introduced.
-
-Qemu version: QEMU emulator version 2.9.50 (v2.9.0-137-g32c7e0a)
-VM DPDK version:  DPDK-1.6.1
-
-Call Trace:
-#0  0x00007f60881fe5d7 in raise () from /usr/lib64/libc.so.6
-#1  0x00007f60881ffcc8 in abort () from /usr/lib64/libc.so.6
-#2  0x00007f608823e2f7 in __libc_message () from /usr/lib64/libc.so.6
-#3  0x00007f60882456d3 in _int_free () from /usr/lib64/libc.so.6
-#4  0x00007f608900158f in g_free () from /usr/lib64/libglib-2.0.so.0
-#5  0x00007f6088fea32c in iter_remove_or_steal () from 
-/usr/lib64/libglib-2.0.so.0
-#6  0x00007f608edc0986 in object_property_del_all (obj=0x7f6091e74800) at 
-qom/object.c:410
-#7  object_finalize (data=0x7f6091e74800) at qom/object.c:467
-#8  object_unref (address@hidden) at qom/object.c:903
-#9  0x00007f608eaf1fd3 in phys_section_destroy (mr=0x7f6091e74800) at 
-git/qemu/exec.c:1154
-#10 phys_sections_free (map=0x7f6090b72bb0) at git/qemu/exec.c:1163
-#11 address_space_dispatch_free (d=0x7f6090b72b90) at git/qemu/exec.c:2514
-#12 0x00007f608ee91ace in call_rcu_thread (opaque=<optimized out>) at 
-util/rcu.c:272
-#13 0x00007f6089b0ddc5 in start_thread () from /usr/lib64/libpthread.so.0
-#14 0x00007f60882bf71d in clone () from /usr/lib64/libc.so.6
-
-Call Trace:
-#0  0x00007fdccaeb9790 in ?? ()
-#1  0x00007fdcd82d09fc in object_property_del_all (obj=0x7fdcdb8acf60) at 
-qom/object.c:405
-#2  object_finalize (data=0x7fdcdb8acf60) at qom/object.c:467
-#3  object_unref (address@hidden) at qom/object.c:903
-#4  0x00007fdcd8001fd3 in phys_section_destroy (mr=0x7fdcdb8acf60) at 
-git/qemu/exec.c:1154
-#5  phys_sections_free (map=0x7fdcdc86aa00) at git/qemu/exec.c:1163
-#6  address_space_dispatch_free (d=0x7fdcdc86a9e0) at git/qemu/exec.c:2514
-#7  0x00007fdcd83a1ace in call_rcu_thread (opaque=<optimized out>) at 
-util/rcu.c:272
-#8  0x00007fdcd301ddc5 in start_thread () from /usr/lib64/libpthread.so.0
-#9  0x00007fdcd17cf71d in clone () from /usr/lib64/libc.so.6
-
-The q->tx_bh will free in virtio_net_del_queue() function, when remove virtio 
-queues 
-if the guest doesn't support multiqueue. But it might be still referenced by 
-others (eg . virtio_net_set_status()),
-which need so set NULL.
-
-diff --git a/hw/net/virtio-net.c b/hw/net/virtio-net.c
-index 7d091c9..98bd683 100644
---- a/hw/net/virtio-net.c
-+++ b/hw/net/virtio-net.c
-@@ -1522,9 +1522,12 @@ static void virtio_net_del_queue(VirtIONet *n, int index)
-     if (q->tx_timer) {
-         timer_del(q->tx_timer);
-         timer_free(q->tx_timer);
-+        q->tx_timer = NULL;
-     } else {
-         qemu_bh_delete(q->tx_bh);
-+        q->tx_bh = NULL;
-     }
-+    q->tx_waiting = 0;
-     virtio_del_queue(vdev, index * 2 + 1);
- }
-
-From: wangyunjian 
-Sent: Monday, April 24, 2017 6:10 PM
-To: address@hidden; Michael S. Tsirkin <address@hidden>; 'Jason Wang' 
-<address@hidden>
-Cc: wangyunjian <address@hidden>; caihe <address@hidden>
-Subject: [Qemu-devel][BUG] QEMU crashes with dpdk virtio pmd 
-
-Qemu crashes, with pre-condition:
-vm xml config with multiqueue, and the vm's driver virtio-net support 
-multi-queue
-
-reproduce steps:
-i. start dpdk testpmd in VM with the virtio nic
-ii. stop testpmd
-iii. reboot the VM
-
-This commit "f9d6dbf0Â  remove virtio queues if the guest doesn't support 
-multiqueue" is introduced.
-
-Qemu version: QEMU emulator version 2.9.50 (v2.9.0-137-g32c7e0a)
-VM DPDK version: Â DPDK-1.6.1
-
-Call Trace:
-#0Â  0x00007f60881fe5d7 in raise () from /usr/lib64/libc.so.6
-#1Â  0x00007f60881ffcc8 in abort () from /usr/lib64/libc.so.6
-#2Â  0x00007f608823e2f7 in __libc_message () from /usr/lib64/libc.so.6
-#3Â  0x00007f60882456d3 in _int_free () from /usr/lib64/libc.so.6
-#4Â  0x00007f608900158f in g_free () from /usr/lib64/libglib-2.0.so.0
-#5Â  0x00007f6088fea32c in iter_remove_or_steal () from 
-/usr/lib64/libglib-2.0.so.0
-#6Â  0x00007f608edc0986 in object_property_del_all (obj=0x7f6091e74800) at 
-qom/object.c:410
-#7Â  object_finalize (data=0x7f6091e74800) at qom/object.c:467
-#8Â  object_unref (address@hidden) at qom/object.c:903
-#9Â  0x00007f608eaf1fd3 in phys_section_destroy (mr=0x7f6091e74800) at 
-git/qemu/exec.c:1154
-#10 phys_sections_free (map=0x7f6090b72bb0) at git/qemu/exec.c:1163
-#11 address_space_dispatch_free (d=0x7f6090b72b90) at git/qemu/exec.c:2514
-#12 0x00007f608ee91ace in call_rcu_thread (opaque=<optimized out>) at 
-util/rcu.c:272
-#13 0x00007f6089b0ddc5 in start_thread () from /usr/lib64/libpthread.so.0
-#14 0x00007f60882bf71d in clone () from /usr/lib64/libc.so.6
-
-Call Trace:
-#0Â  0x00007fdccaeb9790 in ?? ()
-#1Â  0x00007fdcd82d09fc in object_property_del_all (obj=0x7fdcdb8acf60) at 
-qom/object.c:405
-#2Â  object_finalize (data=0x7fdcdb8acf60) at qom/object.c:467
-#3Â  object_unref (address@hidden) at qom/object.c:903
-#4Â  0x00007fdcd8001fd3 in phys_section_destroy (mr=0x7fdcdb8acf60) at 
-git/qemu/exec.c:1154
-#5Â  phys_sections_free (map=0x7fdcdc86aa00) at git/qemu/exec.c:1163
-#6Â  address_space_dispatch_free (d=0x7fdcdc86a9e0) at git/qemu/exec.c:2514
-#7Â  0x00007fdcd83a1ace in call_rcu_thread (opaque=<optimized out>) at 
-util/rcu.c:272
-#8Â  0x00007fdcd301ddc5 in start_thread () from /usr/lib64/libpthread.so.0
-#9Â  0x00007fdcd17cf71d in clone () from /usr/lib64/libc.so.6
-
-On 2017å¹´04æ25æ¥ 19:37, wangyunjian wrote:
-The q->tx_bh will free in virtio_net_del_queue() function, when remove virtio 
-queues
-if the guest doesn't support multiqueue. But it might be still referenced by 
-others (eg . virtio_net_set_status()),
-which need so set NULL.
-
-diff --git a/hw/net/virtio-net.c b/hw/net/virtio-net.c
-index 7d091c9..98bd683 100644
---- a/hw/net/virtio-net.c
-+++ b/hw/net/virtio-net.c
-@@ -1522,9 +1522,12 @@ static void virtio_net_del_queue(VirtIONet *n, int index)
-      if (q->tx_timer) {
-          timer_del(q->tx_timer);
-          timer_free(q->tx_timer);
-+        q->tx_timer = NULL;
-      } else {
-          qemu_bh_delete(q->tx_bh);
-+        q->tx_bh = NULL;
-      }
-+    q->tx_waiting = 0;
-      virtio_del_queue(vdev, index * 2 + 1);
-  }
-Thanks a lot for the fix.
-
-Two questions:
-- If virtio_net_set_status() is the only function that may access tx_bh,
-it looks like setting tx_waiting to zero is sufficient?
-- Can you post a formal patch for this?
-
-Thanks
-From: wangyunjian
-Sent: Monday, April 24, 2017 6:10 PM
-To: address@hidden; Michael S. Tsirkin <address@hidden>; 'Jason Wang' 
-<address@hidden>
-Cc: wangyunjian <address@hidden>; caihe <address@hidden>
-Subject: [Qemu-devel][BUG] QEMU crashes with dpdk virtio pmd
-
-Qemu crashes, with pre-condition:
-vm xml config with multiqueue, and the vm's driver virtio-net support 
-multi-queue
-
-reproduce steps:
-i. start dpdk testpmd in VM with the virtio nic
-ii. stop testpmd
-iii. reboot the VM
-
-This commit "f9d6dbf0  remove virtio queues if the guest doesn't support 
-multiqueue" is introduced.
-
-Qemu version: QEMU emulator version 2.9.50 (v2.9.0-137-g32c7e0a)
-VM DPDK version:  DPDK-1.6.1
-
-Call Trace:
-#0  0x00007f60881fe5d7 in raise () from /usr/lib64/libc.so.6
-#1  0x00007f60881ffcc8 in abort () from /usr/lib64/libc.so.6
-#2  0x00007f608823e2f7 in __libc_message () from /usr/lib64/libc.so.6
-#3  0x00007f60882456d3 in _int_free () from /usr/lib64/libc.so.6
-#4  0x00007f608900158f in g_free () from /usr/lib64/libglib-2.0.so.0
-#5  0x00007f6088fea32c in iter_remove_or_steal () from 
-/usr/lib64/libglib-2.0.so.0
-#6  0x00007f608edc0986 in object_property_del_all (obj=0x7f6091e74800) at 
-qom/object.c:410
-#7  object_finalize (data=0x7f6091e74800) at qom/object.c:467
-#8  object_unref (address@hidden) at qom/object.c:903
-#9  0x00007f608eaf1fd3 in phys_section_destroy (mr=0x7f6091e74800) at 
-git/qemu/exec.c:1154
-#10 phys_sections_free (map=0x7f6090b72bb0) at git/qemu/exec.c:1163
-#11 address_space_dispatch_free (d=0x7f6090b72b90) at git/qemu/exec.c:2514
-#12 0x00007f608ee91ace in call_rcu_thread (opaque=<optimized out>) at 
-util/rcu.c:272
-#13 0x00007f6089b0ddc5 in start_thread () from /usr/lib64/libpthread.so.0
-#14 0x00007f60882bf71d in clone () from /usr/lib64/libc.so.6
-
-Call Trace:
-#0  0x00007fdccaeb9790 in ?? ()
-#1  0x00007fdcd82d09fc in object_property_del_all (obj=0x7fdcdb8acf60) at 
-qom/object.c:405
-#2  object_finalize (data=0x7fdcdb8acf60) at qom/object.c:467
-#3  object_unref (address@hidden) at qom/object.c:903
-#4  0x00007fdcd8001fd3 in phys_section_destroy (mr=0x7fdcdb8acf60) at 
-git/qemu/exec.c:1154
-#5  phys_sections_free (map=0x7fdcdc86aa00) at git/qemu/exec.c:1163
-#6  address_space_dispatch_free (d=0x7fdcdc86a9e0) at git/qemu/exec.c:2514
-#7  0x00007fdcd83a1ace in call_rcu_thread (opaque=<optimized out>) at 
-util/rcu.c:272
-#8  0x00007fdcd301ddc5 in start_thread () from /usr/lib64/libpthread.so.0
-#9  0x00007fdcd17cf71d in clone () from /usr/lib64/libc.so.6
-
-CCing Paolo and Stefan, since it has a relationship with bh in Qemu.
-
->
------Original Message-----
->
-From: Jason Wang [
-mailto:address@hidden
->
->
->
-On 2017å¹´04æ25æ¥ 19:37, wangyunjian wrote:
->
-> The q->tx_bh will free in virtio_net_del_queue() function, when remove
->
-> virtio
->
-queues
->
-> if the guest doesn't support multiqueue. But it might be still referenced by
->
-others (eg . virtio_net_set_status()),
->
-> which need so set NULL.
->
->
->
-> diff --git a/hw/net/virtio-net.c b/hw/net/virtio-net.c
->
-> index 7d091c9..98bd683 100644
->
-> --- a/hw/net/virtio-net.c
->
-> +++ b/hw/net/virtio-net.c
->
-> @@ -1522,9 +1522,12 @@ static void virtio_net_del_queue(VirtIONet *n,
->
-int index)
->
->       if (q->tx_timer) {
->
->           timer_del(q->tx_timer);
->
->           timer_free(q->tx_timer);
->
-> +        q->tx_timer = NULL;
->
->       } else {
->
->           qemu_bh_delete(q->tx_bh);
->
-> +        q->tx_bh = NULL;
->
->       }
->
-> +    q->tx_waiting = 0;
->
->       virtio_del_queue(vdev, index * 2 + 1);
->
->   }
->
->
-Thanks a lot for the fix.
->
->
-Two questions:
->
->
-- If virtio_net_set_status() is the only function that may access tx_bh,
->
-it looks like setting tx_waiting to zero is sufficient?
-Currently yes, but we don't assure that it works for all scenarios, so
-we set the tx_bh and tx_timer to NULL to avoid to possibly access wild pointer,
-which is the common method for usage of bh in Qemu.
-
-I have another question about the root cause of this issure.
-
-This below trace is the path of setting tx_waiting to one in 
-virtio_net_handle_tx_bh() :
-
-Breakpoint 1, virtio_net_handle_tx_bh (vdev=0x0, vq=0x7f335ad13900) at 
-/data/wyj/git/qemu/hw/net/virtio-net.c:1398
-1398    {
-(gdb) bt
-#0  virtio_net_handle_tx_bh (vdev=0x0, vq=0x7f335ad13900) at 
-/data/wyj/git/qemu/hw/net/virtio-net.c:1398
-#1  0x00007f3357bddf9c in virtio_bus_set_host_notifier (bus=<optimized out>, 
-address@hidden, address@hidden) at hw/virtio/virtio-bus.c:297
-#2  0x00007f3357a0055d in vhost_dev_disable_notifiers (address@hidden, 
-address@hidden) at /data/wyj/git/qemu/hw/virtio/vhost.c:1422
-#3  0x00007f33579e3373 in vhost_net_stop_one (net=0x7f335ad84dc0, 
-dev=0x7f335c6f5f90) at /data/wyj/git/qemu/hw/net/vhost_net.c:289
-#4  0x00007f33579e385b in vhost_net_stop (address@hidden, ncs=<optimized out>, 
-address@hidden) at /data/wyj/git/qemu/hw/net/vhost_net.c:367
-#5  0x00007f33579e15de in virtio_net_vhost_status (status=<optimized out>, 
-n=0x7f335c6f5f90) at /data/wyj/git/qemu/hw/net/virtio-net.c:176
-#6  virtio_net_set_status (vdev=0x7f335c6f5f90, status=0 '\000') at 
-/data/wyj/git/qemu/hw/net/virtio-net.c:250
-#7  0x00007f33579f8dc6 in virtio_set_status (address@hidden, address@hidden 
-'\000') at /data/wyj/git/qemu/hw/virtio/virtio.c:1146
-#8  0x00007f3357bdd3cc in virtio_ioport_write (val=0, addr=18, 
-opaque=0x7f335c6eda80) at hw/virtio/virtio-pci.c:387
-#9  virtio_pci_config_write (opaque=0x7f335c6eda80, addr=18, val=0, 
-size=<optimized out>) at hw/virtio/virtio-pci.c:511
-#10 0x00007f33579b2155 in memory_region_write_accessor (mr=0x7f335c6ee470, 
-addr=18, value=<optimized out>, size=1, shift=<optimized out>, mask=<optimized 
-out>, attrs=...) at /data/wyj/git/qemu/memory.c:526
-#11 0x00007f33579af2e9 in access_with_adjusted_size (address@hidden, 
-address@hidden, address@hidden, access_size_min=<optimized out>, 
-access_size_max=<optimized out>, address@hidden
-    0x7f33579b20f0 <memory_region_write_accessor>, address@hidden, 
-address@hidden) at /data/wyj/git/qemu/memory.c:592
-#12 0x00007f33579b2e15 in memory_region_dispatch_write (address@hidden, 
-address@hidden, data=0, address@hidden, address@hidden) at 
-/data/wyj/git/qemu/memory.c:1319
-#13 0x00007f335796cd93 in address_space_write_continue (mr=0x7f335c6ee470, l=1, 
-addr1=18, len=1, buf=0x7f335773d000 "", attrs=..., addr=49170, 
-as=0x7f3358317060 <address_space_io>) at /data/wyj/git/qemu/exec.c:2834
-#14 address_space_write (as=<optimized out>, addr=<optimized out>, attrs=..., 
-buf=<optimized out>, len=<optimized out>) at /data/wyj/git/qemu/exec.c:2879
-#15 0x00007f335796d3ad in address_space_rw (as=<optimized out>, address@hidden, 
-attrs=..., address@hidden, buf=<optimized out>, address@hidden, address@hidden) 
-at /data/wyj/git/qemu/exec.c:2981
-#16 0x00007f33579ae226 in kvm_handle_io (count=1, size=1, direction=<optimized 
-out>, data=<optimized out>, attrs=..., port=49170) at 
-/data/wyj/git/qemu/kvm-all.c:1803
-#17 kvm_cpu_exec (address@hidden) at /data/wyj/git/qemu/kvm-all.c:2032
-#18 0x00007f335799b632 in qemu_kvm_cpu_thread_fn (arg=0x7f335ae82070) at 
-/data/wyj/git/qemu/cpus.c:1118
-#19 0x00007f3352983dc5 in start_thread () from /usr/lib64/libpthread.so.0
-#20 0x00007f335113571d in clone () from /usr/lib64/libc.so.6
-
-It calls qemu_bh_schedule(q->tx_bh) at the bottom of virtio_net_handle_tx_bh(),
-I don't know why virtio_net_tx_bh() doesn't be invoked, so that the 
-q->tx_waiting is not zero.
-[ps: we added logs in virtio_net_tx_bh() to verify that]
-
-Some other information: 
-
-It won't crash if we don't use vhost-net.
-
-
-Thanks,
--Gonglei
-
->
-- Can you post a formal patch for this?
->
->
-Thanks
->
->
-> From: wangyunjian
->
-> Sent: Monday, April 24, 2017 6:10 PM
->
-> To: address@hidden; Michael S. Tsirkin <address@hidden>; 'Jason
->
-Wang' <address@hidden>
->
-> Cc: wangyunjian <address@hidden>; caihe <address@hidden>
->
-> Subject: [Qemu-devel][BUG] QEMU crashes with dpdk virtio pmd
->
->
->
-> Qemu crashes, with pre-condition:
->
-> vm xml config with multiqueue, and the vm's driver virtio-net support
->
-multi-queue
->
->
->
-> reproduce steps:
->
-> i. start dpdk testpmd in VM with the virtio nic
->
-> ii. stop testpmd
->
-> iii. reboot the VM
->
->
->
-> This commit "f9d6dbf0  remove virtio queues if the guest doesn't support
->
-multiqueue" is introduced.
->
->
->
-> Qemu version: QEMU emulator version 2.9.50 (v2.9.0-137-g32c7e0a)
->
-> VM DPDK version:  DPDK-1.6.1
->
->
->
-> Call Trace:
->
-> #0  0x00007f60881fe5d7 in raise () from /usr/lib64/libc.so.6
->
-> #1  0x00007f60881ffcc8 in abort () from /usr/lib64/libc.so.6
->
-> #2  0x00007f608823e2f7 in __libc_message () from /usr/lib64/libc.so.6
->
-> #3  0x00007f60882456d3 in _int_free () from /usr/lib64/libc.so.6
->
-> #4  0x00007f608900158f in g_free () from /usr/lib64/libglib-2.0.so.0
->
-> #5  0x00007f6088fea32c in iter_remove_or_steal () from
->
-/usr/lib64/libglib-2.0.so.0
->
-> #6  0x00007f608edc0986 in object_property_del_all (obj=0x7f6091e74800)
->
-at qom/object.c:410
->
-> #7  object_finalize (data=0x7f6091e74800) at qom/object.c:467
->
-> #8  object_unref (address@hidden) at qom/object.c:903
->
-> #9  0x00007f608eaf1fd3 in phys_section_destroy (mr=0x7f6091e74800) at
->
-git/qemu/exec.c:1154
->
-> #10 phys_sections_free (map=0x7f6090b72bb0) at git/qemu/exec.c:1163
->
-> #11 address_space_dispatch_free (d=0x7f6090b72b90) at
->
-git/qemu/exec.c:2514
->
-> #12 0x00007f608ee91ace in call_rcu_thread (opaque=<optimized out>) at
->
-util/rcu.c:272
->
-> #13 0x00007f6089b0ddc5 in start_thread () from /usr/lib64/libpthread.so.0
->
-> #14 0x00007f60882bf71d in clone () from /usr/lib64/libc.so.6
->
->
->
-> Call Trace:
->
-> #0  0x00007fdccaeb9790 in ?? ()
->
-> #1  0x00007fdcd82d09fc in object_property_del_all (obj=0x7fdcdb8acf60) at
->
-qom/object.c:405
->
-> #2  object_finalize (data=0x7fdcdb8acf60) at qom/object.c:467
->
-> #3  object_unref (address@hidden) at qom/object.c:903
->
-> #4  0x00007fdcd8001fd3 in phys_section_destroy (mr=0x7fdcdb8acf60) at
->
-git/qemu/exec.c:1154
->
-> #5  phys_sections_free (map=0x7fdcdc86aa00) at git/qemu/exec.c:1163
->
-> #6  address_space_dispatch_free (d=0x7fdcdc86a9e0) at
->
-git/qemu/exec.c:2514
->
-> #7  0x00007fdcd83a1ace in call_rcu_thread (opaque=<optimized out>) at
->
-util/rcu.c:272
->
-> #8  0x00007fdcd301ddc5 in start_thread () from /usr/lib64/libpthread.so.0
->
-> #9  0x00007fdcd17cf71d in clone () from /usr/lib64/libc.so.6
->
->
->
->
-
-On 25/04/2017 14:02, Jason Wang wrote:
->
->
-Thanks a lot for the fix.
->
->
-Two questions:
->
->
-- If virtio_net_set_status() is the only function that may access tx_bh,
->
-it looks like setting tx_waiting to zero is sufficient?
-I think clearing tx_bh is better anyway, as leaving a dangling pointer
-is not very hygienic.
-
-Paolo
-
->
-- Can you post a formal patch for this?
-
diff --git a/classification_output/04/other/42974450 b/classification_output/04/other/42974450
deleted file mode 100644
index 05f9accaa..000000000
--- a/classification_output/04/other/42974450
+++ /dev/null
@@ -1,437 +0,0 @@
-other: 0.940
-device: 0.921
-instruction: 0.920
-semantic: 0.917
-assembly: 0.917
-boot: 0.909
-network: 0.907
-graphic: 0.906
-KVM: 0.901
-socket: 0.899
-mistranslation: 0.882
-vnc: 0.869
-
-[Bug Report] Possible Missing Endianness Conversion
-
-The virtio packed virtqueue support patch[1] suggests converting
-endianness by lines:
-
-virtio_tswap16s(vdev, &e->off_wrap);
-virtio_tswap16s(vdev, &e->flags);
-
-Though both of these conversion statements aren't present in the
-latest qemu code here[2]
-
-Is this intentional?
-
-[1]:
-https://mail.gnu.org/archive/html/qemu-block/2019-10/msg01492.html
-[2]:
-https://elixir.bootlin.com/qemu/latest/source/hw/virtio/virtio.c#L314
-
-CCing Jason.
-
-On Mon, Jun 24, 2024 at 4:30â¯PM Xoykie <xoykie@gmail.com> wrote:
->
->
-The virtio packed virtqueue support patch[1] suggests converting
->
-endianness by lines:
->
->
-virtio_tswap16s(vdev, &e->off_wrap);
->
-virtio_tswap16s(vdev, &e->flags);
->
->
-Though both of these conversion statements aren't present in the
->
-latest qemu code here[2]
->
->
-Is this intentional?
-Good catch!
-
-It looks like it was removed (maybe by mistake) by commit
-d152cdd6f6 ("virtio: use virtio accessor to access packed event")
-
-Jason can you confirm that?
-
-Thanks,
-Stefano
-
->
->
-[1]:
-https://mail.gnu.org/archive/html/qemu-block/2019-10/msg01492.html
->
-[2]:
-https://elixir.bootlin.com/qemu/latest/source/hw/virtio/virtio.c#L314
->
-
-On Mon, 24 Jun 2024 at 16:11, Stefano Garzarella <sgarzare@redhat.com> wrote:
->
->
-CCing Jason.
->
->
-On Mon, Jun 24, 2024 at 4:30â¯PM Xoykie <xoykie@gmail.com> wrote:
->
->
->
-> The virtio packed virtqueue support patch[1] suggests converting
->
-> endianness by lines:
->
->
->
-> virtio_tswap16s(vdev, &e->off_wrap);
->
-> virtio_tswap16s(vdev, &e->flags);
->
->
->
-> Though both of these conversion statements aren't present in the
->
-> latest qemu code here[2]
->
->
->
-> Is this intentional?
->
->
-Good catch!
->
->
-It looks like it was removed (maybe by mistake) by commit
->
-d152cdd6f6 ("virtio: use virtio accessor to access packed event")
-That commit changes from:
-
--    address_space_read_cached(cache, off_off, &e->off_wrap,
--                              sizeof(e->off_wrap));
--    virtio_tswap16s(vdev, &e->off_wrap);
-
-which does a byte read of 2 bytes and then swaps the bytes
-depending on the host endianness and the value of
-virtio_access_is_big_endian()
-
-to this:
-
-+    e->off_wrap = virtio_lduw_phys_cached(vdev, cache, off_off);
-
-virtio_lduw_phys_cached() is a small function which calls
-either lduw_be_phys_cached() or lduw_le_phys_cached()
-depending on the value of virtio_access_is_big_endian().
-(And lduw_be_phys_cached() and lduw_le_phys_cached() do
-the right thing for the host-endianness to do a "load
-a specifically big or little endian 16-bit value".)
-
-Which is to say that because we use a load/store function that's
-explicit about the size of the data type it is accessing, the
-function itself can handle doing the load as big or little
-endian, rather than the calling code having to do a manual swap after
-it has done a load-as-bag-of-bytes. This is generally preferable
-as it's less error-prone.
-
-(Explicit swap-after-loading still has a place where the
-code is doing a load of a whole structure out of the
-guest and then swapping each struct field after the fact,
-because it means we can do a single load-from-guest-memory
-rather than a whole sequence of calls all the way down
-through the memory subsystem.)
-
-thanks
--- PMM
-
-On Mon, Jun 24, 2024 at 04:19:52PM GMT, Peter Maydell wrote:
-On Mon, 24 Jun 2024 at 16:11, Stefano Garzarella <sgarzare@redhat.com> wrote:
-CCing Jason.
-
-On Mon, Jun 24, 2024 at 4:30â¯PM Xoykie <xoykie@gmail.com> wrote:
->
-> The virtio packed virtqueue support patch[1] suggests converting
-> endianness by lines:
->
-> virtio_tswap16s(vdev, &e->off_wrap);
-> virtio_tswap16s(vdev, &e->flags);
->
-> Though both of these conversion statements aren't present in the
-> latest qemu code here[2]
->
-> Is this intentional?
-
-Good catch!
-
-It looks like it was removed (maybe by mistake) by commit
-d152cdd6f6 ("virtio: use virtio accessor to access packed event")
-That commit changes from:
-
--    address_space_read_cached(cache, off_off, &e->off_wrap,
--                              sizeof(e->off_wrap));
--    virtio_tswap16s(vdev, &e->off_wrap);
-
-which does a byte read of 2 bytes and then swaps the bytes
-depending on the host endianness and the value of
-virtio_access_is_big_endian()
-
-to this:
-
-+    e->off_wrap = virtio_lduw_phys_cached(vdev, cache, off_off);
-
-virtio_lduw_phys_cached() is a small function which calls
-either lduw_be_phys_cached() or lduw_le_phys_cached()
-depending on the value of virtio_access_is_big_endian().
-(And lduw_be_phys_cached() and lduw_le_phys_cached() do
-the right thing for the host-endianness to do a "load
-a specifically big or little endian 16-bit value".)
-
-Which is to say that because we use a load/store function that's
-explicit about the size of the data type it is accessing, the
-function itself can handle doing the load as big or little
-endian, rather than the calling code having to do a manual swap after
-it has done a load-as-bag-of-bytes. This is generally preferable
-as it's less error-prone.
-Thanks for the details!
-
-So, should we also remove `virtio_tswap16s(vdev, &e->flags);` ?
-
-I mean:
-diff --git a/hw/virtio/virtio.c b/hw/virtio/virtio.c
-index 893a072c9d..2e5e67bdb9 100644
---- a/hw/virtio/virtio.c
-+++ b/hw/virtio/virtio.c
-@@ -323,7 +323,6 @@ static void vring_packed_event_read(VirtIODevice *vdev,
-     /* Make sure flags is seen before off_wrap */
-     smp_rmb();
-     e->off_wrap = virtio_lduw_phys_cached(vdev, cache, off_off);
--    virtio_tswap16s(vdev, &e->flags);
- }
-
- static void vring_packed_off_wrap_write(VirtIODevice *vdev,
-
-Thanks,
-Stefano
-(Explicit swap-after-loading still has a place where the
-code is doing a load of a whole structure out of the
-guest and then swapping each struct field after the fact,
-because it means we can do a single load-from-guest-memory
-rather than a whole sequence of calls all the way down
-through the memory subsystem.)
-
-thanks
--- PMM
-
-On Tue, 25 Jun 2024 at 08:18, Stefano Garzarella <sgarzare@redhat.com> wrote:
->
->
-On Mon, Jun 24, 2024 at 04:19:52PM GMT, Peter Maydell wrote:
->
->On Mon, 24 Jun 2024 at 16:11, Stefano Garzarella <sgarzare@redhat.com> wrote:
->
->>
->
->> CCing Jason.
->
->>
->
->> On Mon, Jun 24, 2024 at 4:30â¯PM Xoykie <xoykie@gmail.com> wrote:
->
->> >
->
->> > The virtio packed virtqueue support patch[1] suggests converting
->
->> > endianness by lines:
->
->> >
->
->> > virtio_tswap16s(vdev, &e->off_wrap);
->
->> > virtio_tswap16s(vdev, &e->flags);
->
->> >
->
->> > Though both of these conversion statements aren't present in the
->
->> > latest qemu code here[2]
->
->> >
->
->> > Is this intentional?
->
->>
->
->> Good catch!
->
->>
->
->> It looks like it was removed (maybe by mistake) by commit
->
->> d152cdd6f6 ("virtio: use virtio accessor to access packed event")
->
->
->
->That commit changes from:
->
->
->
->-    address_space_read_cached(cache, off_off, &e->off_wrap,
->
->-                              sizeof(e->off_wrap));
->
->-    virtio_tswap16s(vdev, &e->off_wrap);
->
->
->
->which does a byte read of 2 bytes and then swaps the bytes
->
->depending on the host endianness and the value of
->
->virtio_access_is_big_endian()
->
->
->
->to this:
->
->
->
->+    e->off_wrap = virtio_lduw_phys_cached(vdev, cache, off_off);
->
->
->
->virtio_lduw_phys_cached() is a small function which calls
->
->either lduw_be_phys_cached() or lduw_le_phys_cached()
->
->depending on the value of virtio_access_is_big_endian().
->
->(And lduw_be_phys_cached() and lduw_le_phys_cached() do
->
->the right thing for the host-endianness to do a "load
->
->a specifically big or little endian 16-bit value".)
->
->
->
->Which is to say that because we use a load/store function that's
->
->explicit about the size of the data type it is accessing, the
->
->function itself can handle doing the load as big or little
->
->endian, rather than the calling code having to do a manual swap after
->
->it has done a load-as-bag-of-bytes. This is generally preferable
->
->as it's less error-prone.
->
->
-Thanks for the details!
->
->
-So, should we also remove `virtio_tswap16s(vdev, &e->flags);` ?
->
->
-I mean:
->
-diff --git a/hw/virtio/virtio.c b/hw/virtio/virtio.c
->
-index 893a072c9d..2e5e67bdb9 100644
->
---- a/hw/virtio/virtio.c
->
-+++ b/hw/virtio/virtio.c
->
-@@ -323,7 +323,6 @@ static void vring_packed_event_read(VirtIODevice *vdev,
->
-/* Make sure flags is seen before off_wrap */
->
-smp_rmb();
->
-e->off_wrap = virtio_lduw_phys_cached(vdev, cache, off_off);
->
--    virtio_tswap16s(vdev, &e->flags);
->
-}
-That definitely looks like it's probably not correct...
-
--- PMM
-
-On Fri, Jun 28, 2024 at 03:53:09PM GMT, Peter Maydell wrote:
-On Tue, 25 Jun 2024 at 08:18, Stefano Garzarella <sgarzare@redhat.com> wrote:
-On Mon, Jun 24, 2024 at 04:19:52PM GMT, Peter Maydell wrote:
->On Mon, 24 Jun 2024 at 16:11, Stefano Garzarella <sgarzare@redhat.com> wrote:
->>
->> CCing Jason.
->>
->> On Mon, Jun 24, 2024 at 4:30â¯PM Xoykie <xoykie@gmail.com> wrote:
->> >
->> > The virtio packed virtqueue support patch[1] suggests converting
->> > endianness by lines:
->> >
->> > virtio_tswap16s(vdev, &e->off_wrap);
->> > virtio_tswap16s(vdev, &e->flags);
->> >
->> > Though both of these conversion statements aren't present in the
->> > latest qemu code here[2]
->> >
->> > Is this intentional?
->>
->> Good catch!
->>
->> It looks like it was removed (maybe by mistake) by commit
->> d152cdd6f6 ("virtio: use virtio accessor to access packed event")
->
->That commit changes from:
->
->-    address_space_read_cached(cache, off_off, &e->off_wrap,
->-                              sizeof(e->off_wrap));
->-    virtio_tswap16s(vdev, &e->off_wrap);
->
->which does a byte read of 2 bytes and then swaps the bytes
->depending on the host endianness and the value of
->virtio_access_is_big_endian()
->
->to this:
->
->+    e->off_wrap = virtio_lduw_phys_cached(vdev, cache, off_off);
->
->virtio_lduw_phys_cached() is a small function which calls
->either lduw_be_phys_cached() or lduw_le_phys_cached()
->depending on the value of virtio_access_is_big_endian().
->(And lduw_be_phys_cached() and lduw_le_phys_cached() do
->the right thing for the host-endianness to do a "load
->a specifically big or little endian 16-bit value".)
->
->Which is to say that because we use a load/store function that's
->explicit about the size of the data type it is accessing, the
->function itself can handle doing the load as big or little
->endian, rather than the calling code having to do a manual swap after
->it has done a load-as-bag-of-bytes. This is generally preferable
->as it's less error-prone.
-
-Thanks for the details!
-
-So, should we also remove `virtio_tswap16s(vdev, &e->flags);` ?
-
-I mean:
-diff --git a/hw/virtio/virtio.c b/hw/virtio/virtio.c
-index 893a072c9d..2e5e67bdb9 100644
---- a/hw/virtio/virtio.c
-+++ b/hw/virtio/virtio.c
-@@ -323,7 +323,6 @@ static void vring_packed_event_read(VirtIODevice *vdev,
-      /* Make sure flags is seen before off_wrap */
-      smp_rmb();
-      e->off_wrap = virtio_lduw_phys_cached(vdev, cache, off_off);
--    virtio_tswap16s(vdev, &e->flags);
-  }
-That definitely looks like it's probably not correct...
-Yeah, I just sent that patch:
-20240701075208.19634-1-sgarzare@redhat.com
-">https://lore.kernel.org/qemu-devel/
-20240701075208.19634-1-sgarzare@redhat.com
-We can continue the discussion there.
-
-Thanks,
-Stefano
-
diff --git a/classification_output/04/other/55247116 b/classification_output/04/other/55247116
deleted file mode 100644
index a42111a5e..000000000
--- a/classification_output/04/other/55247116
+++ /dev/null
@@ -1,1318 +0,0 @@
-other: 0.945
-assembly: 0.938
-graphic: 0.933
-socket: 0.929
-semantic: 0.928
-instruction: 0.928
-device: 0.919
-boot: 0.918
-network: 0.916
-vnc: 0.916
-KVM: 0.894
-mistranslation: 0.841
-
-[Qemu-devel]  [RFC/BUG] xen-mapcache: buggy invalidate map cache?
-
-Hi,
-
-In xen_map_cache_unlocked(), map to guest memory maybe in entry->next
-instead of first level entry (if map to rom other than guest memory
-comes first), while in xen_invalidate_map_cache(), when VM ballooned
-out memory, qemu did not invalidate cache entries in linked
-list(entry->next), so when VM balloon back in memory, gfns probably
-mapped to different mfns, thus if guest asks device to DMA to these
-GPA, qemu may DMA to stale MFNs.
-
-So I think in xen_invalidate_map_cache() linked lists should also be
-checked and invalidated.
-
-Whatâs your opinion? Is this a bug? Is my analyze correct?
-
-On Sun, Apr 9, 2017 at 11:52 PM, hrg <address@hidden> wrote:
->
-Hi,
->
->
-In xen_map_cache_unlocked(), map to guest memory maybe in entry->next
->
-instead of first level entry (if map to rom other than guest memory
->
-comes first), while in xen_invalidate_map_cache(), when VM ballooned
->
-out memory, qemu did not invalidate cache entries in linked
->
-list(entry->next), so when VM balloon back in memory, gfns probably
->
-mapped to different mfns, thus if guest asks device to DMA to these
->
-GPA, qemu may DMA to stale MFNs.
->
->
-So I think in xen_invalidate_map_cache() linked lists should also be
->
-checked and invalidated.
->
->
-Whatâs your opinion? Is this a bug? Is my analyze correct?
-Added Jun Nakajima and Alexander Graf
-
-On Sun, Apr 9, 2017 at 11:55 PM, hrg <address@hidden> wrote:
->
-On Sun, Apr 9, 2017 at 11:52 PM, hrg <address@hidden> wrote:
->
-> Hi,
->
->
->
-> In xen_map_cache_unlocked(), map to guest memory maybe in entry->next
->
-> instead of first level entry (if map to rom other than guest memory
->
-> comes first), while in xen_invalidate_map_cache(), when VM ballooned
->
-> out memory, qemu did not invalidate cache entries in linked
->
-> list(entry->next), so when VM balloon back in memory, gfns probably
->
-> mapped to different mfns, thus if guest asks device to DMA to these
->
-> GPA, qemu may DMA to stale MFNs.
->
->
->
-> So I think in xen_invalidate_map_cache() linked lists should also be
->
-> checked and invalidated.
->
->
->
-> Whatâs your opinion? Is this a bug? Is my analyze correct?
->
->
-Added Jun Nakajima and Alexander Graf
-And correct Stefano Stabellini's email address.
-
-On Mon, 10 Apr 2017 00:36:02 +0800
-hrg <address@hidden> wrote:
-
-Hi,
-
->
-On Sun, Apr 9, 2017 at 11:55 PM, hrg <address@hidden> wrote:
->
-> On Sun, Apr 9, 2017 at 11:52 PM, hrg <address@hidden> wrote:
->
->> Hi,
->
->>
->
->> In xen_map_cache_unlocked(), map to guest memory maybe in entry->next
->
->> instead of first level entry (if map to rom other than guest memory
->
->> comes first), while in xen_invalidate_map_cache(), when VM ballooned
->
->> out memory, qemu did not invalidate cache entries in linked
->
->> list(entry->next), so when VM balloon back in memory, gfns probably
->
->> mapped to different mfns, thus if guest asks device to DMA to these
->
->> GPA, qemu may DMA to stale MFNs.
->
->>
->
->> So I think in xen_invalidate_map_cache() linked lists should also be
->
->> checked and invalidated.
->
->>
->
->> Whatâs your opinion? Is this a bug? Is my analyze correct?
->
->
->
-> Added Jun Nakajima and Alexander Graf
->
-And correct Stefano Stabellini's email address.
-There is a real issue with the xen-mapcache corruption in fact. I encountered
-it a few months ago while experimenting with Q35 support on Xen. Q35 emulation
-uses an AHCI controller by default, along with NCQ mode enabled. The issue can
-be (somewhat) easily reproduced there, though using a normal i440 emulation
-might possibly allow to reproduce the issue as well, using a dedicated test
-code from a guest side. In case of Q35+NCQ the issue can be reproduced "as is".
-
-The issue occurs when a guest domain performs an intensive disk I/O, ex. while
-guest OS booting. QEMU crashes with "Bad ram offset 980aa000"
-message logged, where the address is different each time. The hard thing with
-this issue is that it has a very low reproducibility rate.
-
-The corruption happens when there are multiple I/O commands in the NCQ queue.
-So there are overlapping emulated DMA operations in flight and QEMU uses a
-sequence of mapcache actions which can be executed in the "wrong" order thus
-leading to an inconsistent xen-mapcache - so a bad address from the wrong
-entry is returned.
-
-The bad thing with this issue is that QEMU crash due to "Bad ram offset"
-appearance is a relatively good situation in the sense that this is a caught
-error. But there might be a much worse (artificial) situation where the returned
-address looks valid but points to a different mapped memory.
-
-The fix itself is not hard (ex. an additional checked field in MapCacheEntry),
-but there is a need of some reliable way to test it considering the low
-reproducibility rate.
-
-Regards,
-Alex
-
-On Mon, 10 Apr 2017, hrg wrote:
->
-On Sun, Apr 9, 2017 at 11:55 PM, hrg <address@hidden> wrote:
->
-> On Sun, Apr 9, 2017 at 11:52 PM, hrg <address@hidden> wrote:
->
->> Hi,
->
->>
->
->> In xen_map_cache_unlocked(), map to guest memory maybe in entry->next
->
->> instead of first level entry (if map to rom other than guest memory
->
->> comes first), while in xen_invalidate_map_cache(), when VM ballooned
->
->> out memory, qemu did not invalidate cache entries in linked
->
->> list(entry->next), so when VM balloon back in memory, gfns probably
->
->> mapped to different mfns, thus if guest asks device to DMA to these
->
->> GPA, qemu may DMA to stale MFNs.
->
->>
->
->> So I think in xen_invalidate_map_cache() linked lists should also be
->
->> checked and invalidated.
->
->>
->
->> Whatâs your opinion? Is this a bug? Is my analyze correct?
-Yes, you are right. We need to go through the list for each element of
-the array in xen_invalidate_map_cache. Can you come up with a patch?
-
-On Mon, 10 Apr 2017, Stefano Stabellini wrote:
->
-On Mon, 10 Apr 2017, hrg wrote:
->
-> On Sun, Apr 9, 2017 at 11:55 PM, hrg <address@hidden> wrote:
->
-> > On Sun, Apr 9, 2017 at 11:52 PM, hrg <address@hidden> wrote:
->
-> >> Hi,
->
-> >>
->
-> >> In xen_map_cache_unlocked(), map to guest memory maybe in entry->next
->
-> >> instead of first level entry (if map to rom other than guest memory
->
-> >> comes first), while in xen_invalidate_map_cache(), when VM ballooned
->
-> >> out memory, qemu did not invalidate cache entries in linked
->
-> >> list(entry->next), so when VM balloon back in memory, gfns probably
->
-> >> mapped to different mfns, thus if guest asks device to DMA to these
->
-> >> GPA, qemu may DMA to stale MFNs.
->
-> >>
->
-> >> So I think in xen_invalidate_map_cache() linked lists should also be
->
-> >> checked and invalidated.
->
-> >>
->
-> >> Whatâs your opinion? Is this a bug? Is my analyze correct?
->
->
-Yes, you are right. We need to go through the list for each element of
->
-the array in xen_invalidate_map_cache. Can you come up with a patch?
-I spoke too soon. In the regular case there should be no locked mappings
-when xen_invalidate_map_cache is called (see the DPRINTF warning at the
-beginning of the functions). Without locked mappings, there should never
-be more than one element in each list (see xen_map_cache_unlocked:
-entry->lock == true is a necessary condition to append a new entry to
-the list, otherwise it is just remapped).
-
-Can you confirm that what you are seeing are locked mappings
-when xen_invalidate_map_cache is called? To find out, enable the DPRINTK
-by turning it into a printf or by defininig MAPCACHE_DEBUG.
-
-On Tue, Apr 11, 2017 at 3:50 AM, Stefano Stabellini
-<address@hidden> wrote:
->
-On Mon, 10 Apr 2017, Stefano Stabellini wrote:
->
-> On Mon, 10 Apr 2017, hrg wrote:
->
-> > On Sun, Apr 9, 2017 at 11:55 PM, hrg <address@hidden> wrote:
->
-> > > On Sun, Apr 9, 2017 at 11:52 PM, hrg <address@hidden> wrote:
->
-> > >> Hi,
->
-> > >>
->
-> > >> In xen_map_cache_unlocked(), map to guest memory maybe in entry->next
->
-> > >> instead of first level entry (if map to rom other than guest memory
->
-> > >> comes first), while in xen_invalidate_map_cache(), when VM ballooned
->
-> > >> out memory, qemu did not invalidate cache entries in linked
->
-> > >> list(entry->next), so when VM balloon back in memory, gfns probably
->
-> > >> mapped to different mfns, thus if guest asks device to DMA to these
->
-> > >> GPA, qemu may DMA to stale MFNs.
->
-> > >>
->
-> > >> So I think in xen_invalidate_map_cache() linked lists should also be
->
-> > >> checked and invalidated.
->
-> > >>
->
-> > >> Whatâs your opinion? Is this a bug? Is my analyze correct?
->
->
->
-> Yes, you are right. We need to go through the list for each element of
->
-> the array in xen_invalidate_map_cache. Can you come up with a patch?
->
->
-I spoke too soon. In the regular case there should be no locked mappings
->
-when xen_invalidate_map_cache is called (see the DPRINTF warning at the
->
-beginning of the functions). Without locked mappings, there should never
->
-be more than one element in each list (see xen_map_cache_unlocked:
->
-entry->lock == true is a necessary condition to append a new entry to
->
-the list, otherwise it is just remapped).
->
->
-Can you confirm that what you are seeing are locked mappings
->
-when xen_invalidate_map_cache is called? To find out, enable the DPRINTK
->
-by turning it into a printf or by defininig MAPCACHE_DEBUG.
-In fact, I think the DPRINTF above is incorrect too. In
-pci_add_option_rom(), rtl8139 rom is locked mapped in
-pci_add_option_rom->memory_region_get_ram_ptr (after
-memory_region_init_ram). So actually I think we should remove the
-DPRINTF warning as it is normal.
-
-On Tue, 11 Apr 2017, hrg wrote:
->
-On Tue, Apr 11, 2017 at 3:50 AM, Stefano Stabellini
->
-<address@hidden> wrote:
->
-> On Mon, 10 Apr 2017, Stefano Stabellini wrote:
->
->> On Mon, 10 Apr 2017, hrg wrote:
->
->> > On Sun, Apr 9, 2017 at 11:55 PM, hrg <address@hidden> wrote:
->
->> > > On Sun, Apr 9, 2017 at 11:52 PM, hrg <address@hidden> wrote:
->
->> > >> Hi,
->
->> > >>
->
->> > >> In xen_map_cache_unlocked(), map to guest memory maybe in entry->next
->
->> > >> instead of first level entry (if map to rom other than guest memory
->
->> > >> comes first), while in xen_invalidate_map_cache(), when VM ballooned
->
->> > >> out memory, qemu did not invalidate cache entries in linked
->
->> > >> list(entry->next), so when VM balloon back in memory, gfns probably
->
->> > >> mapped to different mfns, thus if guest asks device to DMA to these
->
->> > >> GPA, qemu may DMA to stale MFNs.
->
->> > >>
->
->> > >> So I think in xen_invalidate_map_cache() linked lists should also be
->
->> > >> checked and invalidated.
->
->> > >>
->
->> > >> Whatâs your opinion? Is this a bug? Is my analyze correct?
->
->>
->
->> Yes, you are right. We need to go through the list for each element of
->
->> the array in xen_invalidate_map_cache. Can you come up with a patch?
->
->
->
-> I spoke too soon. In the regular case there should be no locked mappings
->
-> when xen_invalidate_map_cache is called (see the DPRINTF warning at the
->
-> beginning of the functions). Without locked mappings, there should never
->
-> be more than one element in each list (see xen_map_cache_unlocked:
->
-> entry->lock == true is a necessary condition to append a new entry to
->
-> the list, otherwise it is just remapped).
->
->
->
-> Can you confirm that what you are seeing are locked mappings
->
-> when xen_invalidate_map_cache is called? To find out, enable the DPRINTK
->
-> by turning it into a printf or by defininig MAPCACHE_DEBUG.
->
->
-In fact, I think the DPRINTF above is incorrect too. In
->
-pci_add_option_rom(), rtl8139 rom is locked mapped in
->
-pci_add_option_rom->memory_region_get_ram_ptr (after
->
-memory_region_init_ram). So actually I think we should remove the
->
-DPRINTF warning as it is normal.
-Let me explain why the DPRINTF warning is there: emulated dma operations
-can involve locked mappings. Once a dma operation completes, the related
-mapping is unlocked and can be safely destroyed. But if we destroy a
-locked mapping in xen_invalidate_map_cache, while a dma is still
-ongoing, QEMU will crash. We cannot handle that case.
-
-However, the scenario you described is different. It has nothing to do
-with DMA. It looks like pci_add_option_rom calls
-memory_region_get_ram_ptr to map the rtl8139 rom. The mapping is a
-locked mapping and it is never unlocked or destroyed.
-
-It looks like "ptr" is not used after pci_add_option_rom returns. Does
-the append patch fix the problem you are seeing? For the proper fix, I
-think we probably need some sort of memory_region_unmap wrapper or maybe
-a call to address_space_unmap.
-
-
-diff --git a/hw/pci/pci.c b/hw/pci/pci.c
-index e6b08e1..04f98b7 100644
---- a/hw/pci/pci.c
-+++ b/hw/pci/pci.c
-@@ -2242,6 +2242,7 @@ static void pci_add_option_rom(PCIDevice *pdev, bool 
-is_default_rom,
-     }
- 
-     pci_register_bar(pdev, PCI_ROM_SLOT, 0, &pdev->rom);
-+    xen_invalidate_map_cache_entry(ptr);
- }
- 
- static void pci_del_option_rom(PCIDevice *pdev)
-
-On Tue, 11 Apr 2017 15:32:09 -0700 (PDT)
-Stefano Stabellini <address@hidden> wrote:
-
->
-On Tue, 11 Apr 2017, hrg wrote:
->
-> On Tue, Apr 11, 2017 at 3:50 AM, Stefano Stabellini
->
-> <address@hidden> wrote:
->
-> > On Mon, 10 Apr 2017, Stefano Stabellini wrote:
->
-> >> On Mon, 10 Apr 2017, hrg wrote:
->
-> >> > On Sun, Apr 9, 2017 at 11:55 PM, hrg <address@hidden> wrote:
->
-> >> > > On Sun, Apr 9, 2017 at 11:52 PM, hrg <address@hidden> wrote:
->
-> >> > >> Hi,
->
-> >> > >>
->
-> >> > >> In xen_map_cache_unlocked(), map to guest memory maybe in
->
-> >> > >> entry->next instead of first level entry (if map to rom other than
->
-> >> > >> guest memory comes first), while in xen_invalidate_map_cache(),
->
-> >> > >> when VM ballooned out memory, qemu did not invalidate cache entries
->
-> >> > >> in linked list(entry->next), so when VM balloon back in memory,
->
-> >> > >> gfns probably mapped to different mfns, thus if guest asks device
->
-> >> > >> to DMA to these GPA, qemu may DMA to stale MFNs.
->
-> >> > >>
->
-> >> > >> So I think in xen_invalidate_map_cache() linked lists should also be
->
-> >> > >> checked and invalidated.
->
-> >> > >>
->
-> >> > >> Whatâs your opinion? Is this a bug? Is my analyze correct?
->
-> >>
->
-> >> Yes, you are right. We need to go through the list for each element of
->
-> >> the array in xen_invalidate_map_cache. Can you come up with a patch?
->
-> >
->
-> > I spoke too soon. In the regular case there should be no locked mappings
->
-> > when xen_invalidate_map_cache is called (see the DPRINTF warning at the
->
-> > beginning of the functions). Without locked mappings, there should never
->
-> > be more than one element in each list (see xen_map_cache_unlocked:
->
-> > entry->lock == true is a necessary condition to append a new entry to
->
-> > the list, otherwise it is just remapped).
->
-> >
->
-> > Can you confirm that what you are seeing are locked mappings
->
-> > when xen_invalidate_map_cache is called? To find out, enable the DPRINTK
->
-> > by turning it into a printf or by defininig MAPCACHE_DEBUG.
->
->
->
-> In fact, I think the DPRINTF above is incorrect too. In
->
-> pci_add_option_rom(), rtl8139 rom is locked mapped in
->
-> pci_add_option_rom->memory_region_get_ram_ptr (after
->
-> memory_region_init_ram). So actually I think we should remove the
->
-> DPRINTF warning as it is normal.
->
->
-Let me explain why the DPRINTF warning is there: emulated dma operations
->
-can involve locked mappings. Once a dma operation completes, the related
->
-mapping is unlocked and can be safely destroyed. But if we destroy a
->
-locked mapping in xen_invalidate_map_cache, while a dma is still
->
-ongoing, QEMU will crash. We cannot handle that case.
->
->
-However, the scenario you described is different. It has nothing to do
->
-with DMA. It looks like pci_add_option_rom calls
->
-memory_region_get_ram_ptr to map the rtl8139 rom. The mapping is a
->
-locked mapping and it is never unlocked or destroyed.
->
->
-It looks like "ptr" is not used after pci_add_option_rom returns. Does
->
-the append patch fix the problem you are seeing? For the proper fix, I
->
-think we probably need some sort of memory_region_unmap wrapper or maybe
->
-a call to address_space_unmap.
-Hmm, for some reason my message to the Xen-devel list got rejected but was sent
-to Qemu-devel instead, without any notice. Sorry if I'm missing something
-obvious as a list newbie.
-
-Stefano, hrg,
-
-There is an issue with inconsistency between the list of normal MapCacheEntry's
-and their 'reverse' counterparts - MapCacheRev's in locked_entries.
-When bad situation happens, there are multiple (locked) MapCacheEntry
-entries in the bucket's linked list along with a number of MapCacheRev's. And
-when it comes to a reverse lookup, xen-mapcache picks the wrong entry from the
-first list and calculates a wrong pointer from it which may then be caught with
-the "Bad RAM offset" check (or not). Mapcache invalidation might be related to
-this issue as well I think.
-
-I'll try to provide a test code which can reproduce the issue from the
-guest side using an emulated IDE controller, though it's much simpler to achieve
-this result with an AHCI controller using multiple NCQ I/O commands. So far I've
-seen this issue only with Windows 7 (and above) guest on AHCI, but any block I/O
-DMA should be enough I think.
-
-On 2017/4/12 14:17, Alexey G wrote:
-On Tue, 11 Apr 2017 15:32:09 -0700 (PDT)
-Stefano Stabellini <address@hidden> wrote:
-On Tue, 11 Apr 2017, hrg wrote:
-On Tue, Apr 11, 2017 at 3:50 AM, Stefano Stabellini
-<address@hidden> wrote:
-On Mon, 10 Apr 2017, Stefano Stabellini wrote:
-On Mon, 10 Apr 2017, hrg wrote:
-On Sun, Apr 9, 2017 at 11:55 PM, hrg <address@hidden> wrote:
-On Sun, Apr 9, 2017 at 11:52 PM, hrg <address@hidden> wrote:
-Hi,
-
-In xen_map_cache_unlocked(), map to guest memory maybe in
-entry->next instead of first level entry (if map to rom other than
-guest memory comes first), while in xen_invalidate_map_cache(),
-when VM ballooned out memory, qemu did not invalidate cache entries
-in linked list(entry->next), so when VM balloon back in memory,
-gfns probably mapped to different mfns, thus if guest asks device
-to DMA to these GPA, qemu may DMA to stale MFNs.
-
-So I think in xen_invalidate_map_cache() linked lists should also be
-checked and invalidated.
-
-Whatâs your opinion? Is this a bug? Is my analyze correct?
-Yes, you are right. We need to go through the list for each element of
-the array in xen_invalidate_map_cache. Can you come up with a patch?
-I spoke too soon. In the regular case there should be no locked mappings
-when xen_invalidate_map_cache is called (see the DPRINTF warning at the
-beginning of the functions). Without locked mappings, there should never
-be more than one element in each list (see xen_map_cache_unlocked:
-entry->lock == true is a necessary condition to append a new entry to
-the list, otherwise it is just remapped).
-
-Can you confirm that what you are seeing are locked mappings
-when xen_invalidate_map_cache is called? To find out, enable the DPRINTK
-by turning it into a printf or by defininig MAPCACHE_DEBUG.
-In fact, I think the DPRINTF above is incorrect too. In
-pci_add_option_rom(), rtl8139 rom is locked mapped in
-pci_add_option_rom->memory_region_get_ram_ptr (after
-memory_region_init_ram). So actually I think we should remove the
-DPRINTF warning as it is normal.
-Let me explain why the DPRINTF warning is there: emulated dma operations
-can involve locked mappings. Once a dma operation completes, the related
-mapping is unlocked and can be safely destroyed. But if we destroy a
-locked mapping in xen_invalidate_map_cache, while a dma is still
-ongoing, QEMU will crash. We cannot handle that case.
-
-However, the scenario you described is different. It has nothing to do
-with DMA. It looks like pci_add_option_rom calls
-memory_region_get_ram_ptr to map the rtl8139 rom. The mapping is a
-locked mapping and it is never unlocked or destroyed.
-
-It looks like "ptr" is not used after pci_add_option_rom returns. Does
-the append patch fix the problem you are seeing? For the proper fix, I
-think we probably need some sort of memory_region_unmap wrapper or maybe
-a call to address_space_unmap.
-Hmm, for some reason my message to the Xen-devel list got rejected but was sent
-to Qemu-devel instead, without any notice. Sorry if I'm missing something
-obvious as a list newbie.
-
-Stefano, hrg,
-
-There is an issue with inconsistency between the list of normal MapCacheEntry's
-and their 'reverse' counterparts - MapCacheRev's in locked_entries.
-When bad situation happens, there are multiple (locked) MapCacheEntry
-entries in the bucket's linked list along with a number of MapCacheRev's. And
-when it comes to a reverse lookup, xen-mapcache picks the wrong entry from the
-first list and calculates a wrong pointer from it which may then be caught with
-the "Bad RAM offset" check (or not). Mapcache invalidation might be related to
-this issue as well I think.
-
-I'll try to provide a test code which can reproduce the issue from the
-guest side using an emulated IDE controller, though it's much simpler to achieve
-this result with an AHCI controller using multiple NCQ I/O commands. So far I've
-seen this issue only with Windows 7 (and above) guest on AHCI, but any block I/O
-DMA should be enough I think.
-Yes, I think there may be other bugs lurking, considering the complexity, 
-though we need to reproduce it if we want to delve into it.
-
-On Wed, 12 Apr 2017, Alexey G wrote:
->
-On Tue, 11 Apr 2017 15:32:09 -0700 (PDT)
->
-Stefano Stabellini <address@hidden> wrote:
->
->
-> On Tue, 11 Apr 2017, hrg wrote:
->
-> > On Tue, Apr 11, 2017 at 3:50 AM, Stefano Stabellini
->
-> > <address@hidden> wrote:
->
-> > > On Mon, 10 Apr 2017, Stefano Stabellini wrote:
->
-> > >> On Mon, 10 Apr 2017, hrg wrote:
->
-> > >> > On Sun, Apr 9, 2017 at 11:55 PM, hrg <address@hidden> wrote:
->
-> > >> > > On Sun, Apr 9, 2017 at 11:52 PM, hrg <address@hidden> wrote:
->
-> > >> > >> Hi,
->
-> > >> > >>
->
-> > >> > >> In xen_map_cache_unlocked(), map to guest memory maybe in
->
-> > >> > >> entry->next instead of first level entry (if map to rom other than
->
-> > >> > >> guest memory comes first), while in xen_invalidate_map_cache(),
->
-> > >> > >> when VM ballooned out memory, qemu did not invalidate cache
->
-> > >> > >> entries
->
-> > >> > >> in linked list(entry->next), so when VM balloon back in memory,
->
-> > >> > >> gfns probably mapped to different mfns, thus if guest asks device
->
-> > >> > >> to DMA to these GPA, qemu may DMA to stale MFNs.
->
-> > >> > >>
->
-> > >> > >> So I think in xen_invalidate_map_cache() linked lists should also
->
-> > >> > >> be
->
-> > >> > >> checked and invalidated.
->
-> > >> > >>
->
-> > >> > >> Whatâs your opinion? Is this a bug? Is my analyze correct?
->
-> > >>
->
-> > >> Yes, you are right. We need to go through the list for each element of
->
-> > >> the array in xen_invalidate_map_cache. Can you come up with a patch?
->
-> > >
->
-> > > I spoke too soon. In the regular case there should be no locked mappings
->
-> > > when xen_invalidate_map_cache is called (see the DPRINTF warning at the
->
-> > > beginning of the functions). Without locked mappings, there should never
->
-> > > be more than one element in each list (see xen_map_cache_unlocked:
->
-> > > entry->lock == true is a necessary condition to append a new entry to
->
-> > > the list, otherwise it is just remapped).
->
-> > >
->
-> > > Can you confirm that what you are seeing are locked mappings
->
-> > > when xen_invalidate_map_cache is called? To find out, enable the DPRINTK
->
-> > > by turning it into a printf or by defininig MAPCACHE_DEBUG.
->
-> >
->
-> > In fact, I think the DPRINTF above is incorrect too. In
->
-> > pci_add_option_rom(), rtl8139 rom is locked mapped in
->
-> > pci_add_option_rom->memory_region_get_ram_ptr (after
->
-> > memory_region_init_ram). So actually I think we should remove the
->
-> > DPRINTF warning as it is normal.
->
->
->
-> Let me explain why the DPRINTF warning is there: emulated dma operations
->
-> can involve locked mappings. Once a dma operation completes, the related
->
-> mapping is unlocked and can be safely destroyed. But if we destroy a
->
-> locked mapping in xen_invalidate_map_cache, while a dma is still
->
-> ongoing, QEMU will crash. We cannot handle that case.
->
->
->
-> However, the scenario you described is different. It has nothing to do
->
-> with DMA. It looks like pci_add_option_rom calls
->
-> memory_region_get_ram_ptr to map the rtl8139 rom. The mapping is a
->
-> locked mapping and it is never unlocked or destroyed.
->
->
->
-> It looks like "ptr" is not used after pci_add_option_rom returns. Does
->
-> the append patch fix the problem you are seeing? For the proper fix, I
->
-> think we probably need some sort of memory_region_unmap wrapper or maybe
->
-> a call to address_space_unmap.
->
->
-Hmm, for some reason my message to the Xen-devel list got rejected but was
->
-sent
->
-to Qemu-devel instead, without any notice. Sorry if I'm missing something
->
-obvious as a list newbie.
->
->
-Stefano, hrg,
->
->
-There is an issue with inconsistency between the list of normal
->
-MapCacheEntry's
->
-and their 'reverse' counterparts - MapCacheRev's in locked_entries.
->
-When bad situation happens, there are multiple (locked) MapCacheEntry
->
-entries in the bucket's linked list along with a number of MapCacheRev's. And
->
-when it comes to a reverse lookup, xen-mapcache picks the wrong entry from the
->
-first list and calculates a wrong pointer from it which may then be caught
->
-with
->
-the "Bad RAM offset" check (or not). Mapcache invalidation might be related to
->
-this issue as well I think.
->
->
-I'll try to provide a test code which can reproduce the issue from the
->
-guest side using an emulated IDE controller, though it's much simpler to
->
-achieve
->
-this result with an AHCI controller using multiple NCQ I/O commands. So far
->
-I've
->
-seen this issue only with Windows 7 (and above) guest on AHCI, but any block
->
-I/O
->
-DMA should be enough I think.
-That would be helpful. Please see if you can reproduce it after fixing
-the other issue (
-http://marc.info/?l=qemu-devel&m=149195042500707&w=2
-).
-
-On 2017/4/12 6:32, Stefano Stabellini wrote:
-On Tue, 11 Apr 2017, hrg wrote:
-On Tue, Apr 11, 2017 at 3:50 AM, Stefano Stabellini
-<address@hidden> wrote:
-On Mon, 10 Apr 2017, Stefano Stabellini wrote:
-On Mon, 10 Apr 2017, hrg wrote:
-On Sun, Apr 9, 2017 at 11:55 PM, hrg <address@hidden> wrote:
-On Sun, Apr 9, 2017 at 11:52 PM, hrg <address@hidden> wrote:
-Hi,
-
-In xen_map_cache_unlocked(), map to guest memory maybe in entry->next
-instead of first level entry (if map to rom other than guest memory
-comes first), while in xen_invalidate_map_cache(), when VM ballooned
-out memory, qemu did not invalidate cache entries in linked
-list(entry->next), so when VM balloon back in memory, gfns probably
-mapped to different mfns, thus if guest asks device to DMA to these
-GPA, qemu may DMA to stale MFNs.
-
-So I think in xen_invalidate_map_cache() linked lists should also be
-checked and invalidated.
-
-Whatâs your opinion? Is this a bug? Is my analyze correct?
-Yes, you are right. We need to go through the list for each element of
-the array in xen_invalidate_map_cache. Can you come up with a patch?
-I spoke too soon. In the regular case there should be no locked mappings
-when xen_invalidate_map_cache is called (see the DPRINTF warning at the
-beginning of the functions). Without locked mappings, there should never
-be more than one element in each list (see xen_map_cache_unlocked:
-entry->lock == true is a necessary condition to append a new entry to
-the list, otherwise it is just remapped).
-
-Can you confirm that what you are seeing are locked mappings
-when xen_invalidate_map_cache is called? To find out, enable the DPRINTK
-by turning it into a printf or by defininig MAPCACHE_DEBUG.
-In fact, I think the DPRINTF above is incorrect too. In
-pci_add_option_rom(), rtl8139 rom is locked mapped in
-pci_add_option_rom->memory_region_get_ram_ptr (after
-memory_region_init_ram). So actually I think we should remove the
-DPRINTF warning as it is normal.
-Let me explain why the DPRINTF warning is there: emulated dma operations
-can involve locked mappings. Once a dma operation completes, the related
-mapping is unlocked and can be safely destroyed. But if we destroy a
-locked mapping in xen_invalidate_map_cache, while a dma is still
-ongoing, QEMU will crash. We cannot handle that case.
-
-However, the scenario you described is different. It has nothing to do
-with DMA. It looks like pci_add_option_rom calls
-memory_region_get_ram_ptr to map the rtl8139 rom. The mapping is a
-locked mapping and it is never unlocked or destroyed.
-
-It looks like "ptr" is not used after pci_add_option_rom returns. Does
-the append patch fix the problem you are seeing? For the proper fix, I
-think we probably need some sort of memory_region_unmap wrapper or maybe
-a call to address_space_unmap.
-Yes, I think so, maybe this is the proper way to fix this.
-diff --git a/hw/pci/pci.c b/hw/pci/pci.c
-index e6b08e1..04f98b7 100644
---- a/hw/pci/pci.c
-+++ b/hw/pci/pci.c
-@@ -2242,6 +2242,7 @@ static void pci_add_option_rom(PCIDevice *pdev, bool 
-is_default_rom,
-      }
-pci_register_bar(pdev, PCI_ROM_SLOT, 0, &pdev->rom);
-+    xen_invalidate_map_cache_entry(ptr);
-  }
-static void pci_del_option_rom(PCIDevice *pdev)
-
-On Wed, 12 Apr 2017, Herongguang (Stephen) wrote:
->
-On 2017/4/12 6:32, Stefano Stabellini wrote:
->
-> On Tue, 11 Apr 2017, hrg wrote:
->
-> > On Tue, Apr 11, 2017 at 3:50 AM, Stefano Stabellini
->
-> > <address@hidden> wrote:
->
-> > > On Mon, 10 Apr 2017, Stefano Stabellini wrote:
->
-> > > > On Mon, 10 Apr 2017, hrg wrote:
->
-> > > > > On Sun, Apr 9, 2017 at 11:55 PM, hrg <address@hidden> wrote:
->
-> > > > > > On Sun, Apr 9, 2017 at 11:52 PM, hrg <address@hidden> wrote:
->
-> > > > > > > Hi,
->
-> > > > > > >
->
-> > > > > > > In xen_map_cache_unlocked(), map to guest memory maybe in
->
-> > > > > > > entry->next
->
-> > > > > > > instead of first level entry (if map to rom other than guest
->
-> > > > > > > memory
->
-> > > > > > > comes first), while in xen_invalidate_map_cache(), when VM
->
-> > > > > > > ballooned
->
-> > > > > > > out memory, qemu did not invalidate cache entries in linked
->
-> > > > > > > list(entry->next), so when VM balloon back in memory, gfns
->
-> > > > > > > probably
->
-> > > > > > > mapped to different mfns, thus if guest asks device to DMA to
->
-> > > > > > > these
->
-> > > > > > > GPA, qemu may DMA to stale MFNs.
->
-> > > > > > >
->
-> > > > > > > So I think in xen_invalidate_map_cache() linked lists should
->
-> > > > > > > also be
->
-> > > > > > > checked and invalidated.
->
-> > > > > > >
->
-> > > > > > > Whatâs your opinion? Is this a bug? Is my analyze correct?
->
-> > > > Yes, you are right. We need to go through the list for each element of
->
-> > > > the array in xen_invalidate_map_cache. Can you come up with a patch?
->
-> > > I spoke too soon. In the regular case there should be no locked mappings
->
-> > > when xen_invalidate_map_cache is called (see the DPRINTF warning at the
->
-> > > beginning of the functions). Without locked mappings, there should never
->
-> > > be more than one element in each list (see xen_map_cache_unlocked:
->
-> > > entry->lock == true is a necessary condition to append a new entry to
->
-> > > the list, otherwise it is just remapped).
->
-> > >
->
-> > > Can you confirm that what you are seeing are locked mappings
->
-> > > when xen_invalidate_map_cache is called? To find out, enable the DPRINTK
->
-> > > by turning it into a printf or by defininig MAPCACHE_DEBUG.
->
-> > In fact, I think the DPRINTF above is incorrect too. In
->
-> > pci_add_option_rom(), rtl8139 rom is locked mapped in
->
-> > pci_add_option_rom->memory_region_get_ram_ptr (after
->
-> > memory_region_init_ram). So actually I think we should remove the
->
-> > DPRINTF warning as it is normal.
->
-> Let me explain why the DPRINTF warning is there: emulated dma operations
->
-> can involve locked mappings. Once a dma operation completes, the related
->
-> mapping is unlocked and can be safely destroyed. But if we destroy a
->
-> locked mapping in xen_invalidate_map_cache, while a dma is still
->
-> ongoing, QEMU will crash. We cannot handle that case.
->
->
->
-> However, the scenario you described is different. It has nothing to do
->
-> with DMA. It looks like pci_add_option_rom calls
->
-> memory_region_get_ram_ptr to map the rtl8139 rom. The mapping is a
->
-> locked mapping and it is never unlocked or destroyed.
->
->
->
-> It looks like "ptr" is not used after pci_add_option_rom returns. Does
->
-> the append patch fix the problem you are seeing? For the proper fix, I
->
-> think we probably need some sort of memory_region_unmap wrapper or maybe
->
-> a call to address_space_unmap.
->
->
-Yes, I think so, maybe this is the proper way to fix this.
-Would you be up for sending a proper patch and testing it? We cannot call
-xen_invalidate_map_cache_entry directly from pci.c though, it would need
-to be one of the other functions like address_space_unmap for example.
-
-
->
-> diff --git a/hw/pci/pci.c b/hw/pci/pci.c
->
-> index e6b08e1..04f98b7 100644
->
-> --- a/hw/pci/pci.c
->
-> +++ b/hw/pci/pci.c
->
-> @@ -2242,6 +2242,7 @@ static void pci_add_option_rom(PCIDevice *pdev, bool
->
-> is_default_rom,
->
->       }
->
->         pci_register_bar(pdev, PCI_ROM_SLOT, 0, &pdev->rom);
->
-> +    xen_invalidate_map_cache_entry(ptr);
->
->   }
->
->     static void pci_del_option_rom(PCIDevice *pdev)
-
-On 2017/4/13 7:51, Stefano Stabellini wrote:
-On Wed, 12 Apr 2017, Herongguang (Stephen) wrote:
-On 2017/4/12 6:32, Stefano Stabellini wrote:
-On Tue, 11 Apr 2017, hrg wrote:
-On Tue, Apr 11, 2017 at 3:50 AM, Stefano Stabellini
-<address@hidden> wrote:
-On Mon, 10 Apr 2017, Stefano Stabellini wrote:
-On Mon, 10 Apr 2017, hrg wrote:
-On Sun, Apr 9, 2017 at 11:55 PM, hrg <address@hidden> wrote:
-On Sun, Apr 9, 2017 at 11:52 PM, hrg <address@hidden> wrote:
-Hi,
-
-In xen_map_cache_unlocked(), map to guest memory maybe in
-entry->next
-instead of first level entry (if map to rom other than guest
-memory
-comes first), while in xen_invalidate_map_cache(), when VM
-ballooned
-out memory, qemu did not invalidate cache entries in linked
-list(entry->next), so when VM balloon back in memory, gfns
-probably
-mapped to different mfns, thus if guest asks device to DMA to
-these
-GPA, qemu may DMA to stale MFNs.
-
-So I think in xen_invalidate_map_cache() linked lists should
-also be
-checked and invalidated.
-
-Whatâs your opinion? Is this a bug? Is my analyze correct?
-Yes, you are right. We need to go through the list for each element of
-the array in xen_invalidate_map_cache. Can you come up with a patch?
-I spoke too soon. In the regular case there should be no locked mappings
-when xen_invalidate_map_cache is called (see the DPRINTF warning at the
-beginning of the functions). Without locked mappings, there should never
-be more than one element in each list (see xen_map_cache_unlocked:
-entry->lock == true is a necessary condition to append a new entry to
-the list, otherwise it is just remapped).
-
-Can you confirm that what you are seeing are locked mappings
-when xen_invalidate_map_cache is called? To find out, enable the DPRINTK
-by turning it into a printf or by defininig MAPCACHE_DEBUG.
-In fact, I think the DPRINTF above is incorrect too. In
-pci_add_option_rom(), rtl8139 rom is locked mapped in
-pci_add_option_rom->memory_region_get_ram_ptr (after
-memory_region_init_ram). So actually I think we should remove the
-DPRINTF warning as it is normal.
-Let me explain why the DPRINTF warning is there: emulated dma operations
-can involve locked mappings. Once a dma operation completes, the related
-mapping is unlocked and can be safely destroyed. But if we destroy a
-locked mapping in xen_invalidate_map_cache, while a dma is still
-ongoing, QEMU will crash. We cannot handle that case.
-
-However, the scenario you described is different. It has nothing to do
-with DMA. It looks like pci_add_option_rom calls
-memory_region_get_ram_ptr to map the rtl8139 rom. The mapping is a
-locked mapping and it is never unlocked or destroyed.
-
-It looks like "ptr" is not used after pci_add_option_rom returns. Does
-the append patch fix the problem you are seeing? For the proper fix, I
-think we probably need some sort of memory_region_unmap wrapper or maybe
-a call to address_space_unmap.
-Yes, I think so, maybe this is the proper way to fix this.
-Would you be up for sending a proper patch and testing it? We cannot call
-xen_invalidate_map_cache_entry directly from pci.c though, it would need
-to be one of the other functions like address_space_unmap for example.
-Yes, I will look into this.
-diff --git a/hw/pci/pci.c b/hw/pci/pci.c
-index e6b08e1..04f98b7 100644
---- a/hw/pci/pci.c
-+++ b/hw/pci/pci.c
-@@ -2242,6 +2242,7 @@ static void pci_add_option_rom(PCIDevice *pdev, bool
-is_default_rom,
-       }
-         pci_register_bar(pdev, PCI_ROM_SLOT, 0, &pdev->rom);
-+    xen_invalidate_map_cache_entry(ptr);
-   }
-     static void pci_del_option_rom(PCIDevice *pdev)
-
-On Thu, 13 Apr 2017, Herongguang (Stephen) wrote:
->
-On 2017/4/13 7:51, Stefano Stabellini wrote:
->
-> On Wed, 12 Apr 2017, Herongguang (Stephen) wrote:
->
-> > On 2017/4/12 6:32, Stefano Stabellini wrote:
->
-> > > On Tue, 11 Apr 2017, hrg wrote:
->
-> > > > On Tue, Apr 11, 2017 at 3:50 AM, Stefano Stabellini
->
-> > > > <address@hidden> wrote:
->
-> > > > > On Mon, 10 Apr 2017, Stefano Stabellini wrote:
->
-> > > > > > On Mon, 10 Apr 2017, hrg wrote:
->
-> > > > > > > On Sun, Apr 9, 2017 at 11:55 PM, hrg <address@hidden>
->
-> > > > > > > wrote:
->
-> > > > > > > > On Sun, Apr 9, 2017 at 11:52 PM, hrg <address@hidden>
->
-> > > > > > > > wrote:
->
-> > > > > > > > > Hi,
->
-> > > > > > > > >
->
-> > > > > > > > > In xen_map_cache_unlocked(), map to guest memory maybe in
->
-> > > > > > > > > entry->next
->
-> > > > > > > > > instead of first level entry (if map to rom other than guest
->
-> > > > > > > > > memory
->
-> > > > > > > > > comes first), while in xen_invalidate_map_cache(), when VM
->
-> > > > > > > > > ballooned
->
-> > > > > > > > > out memory, qemu did not invalidate cache entries in linked
->
-> > > > > > > > > list(entry->next), so when VM balloon back in memory, gfns
->
-> > > > > > > > > probably
->
-> > > > > > > > > mapped to different mfns, thus if guest asks device to DMA
->
-> > > > > > > > > to
->
-> > > > > > > > > these
->
-> > > > > > > > > GPA, qemu may DMA to stale MFNs.
->
-> > > > > > > > >
->
-> > > > > > > > > So I think in xen_invalidate_map_cache() linked lists should
->
-> > > > > > > > > also be
->
-> > > > > > > > > checked and invalidated.
->
-> > > > > > > > >
->
-> > > > > > > > > Whatâs your opinion? Is this a bug? Is my analyze correct?
->
-> > > > > > Yes, you are right. We need to go through the list for each
->
-> > > > > > element of
->
-> > > > > > the array in xen_invalidate_map_cache. Can you come up with a
->
-> > > > > > patch?
->
-> > > > > I spoke too soon. In the regular case there should be no locked
->
-> > > > > mappings
->
-> > > > > when xen_invalidate_map_cache is called (see the DPRINTF warning at
->
-> > > > > the
->
-> > > > > beginning of the functions). Without locked mappings, there should
->
-> > > > > never
->
-> > > > > be more than one element in each list (see xen_map_cache_unlocked:
->
-> > > > > entry->lock == true is a necessary condition to append a new entry
->
-> > > > > to
->
-> > > > > the list, otherwise it is just remapped).
->
-> > > > >
->
-> > > > > Can you confirm that what you are seeing are locked mappings
->
-> > > > > when xen_invalidate_map_cache is called? To find out, enable the
->
-> > > > > DPRINTK
->
-> > > > > by turning it into a printf or by defininig MAPCACHE_DEBUG.
->
-> > > > In fact, I think the DPRINTF above is incorrect too. In
->
-> > > > pci_add_option_rom(), rtl8139 rom is locked mapped in
->
-> > > > pci_add_option_rom->memory_region_get_ram_ptr (after
->
-> > > > memory_region_init_ram). So actually I think we should remove the
->
-> > > > DPRINTF warning as it is normal.
->
-> > > Let me explain why the DPRINTF warning is there: emulated dma operations
->
-> > > can involve locked mappings. Once a dma operation completes, the related
->
-> > > mapping is unlocked and can be safely destroyed. But if we destroy a
->
-> > > locked mapping in xen_invalidate_map_cache, while a dma is still
->
-> > > ongoing, QEMU will crash. We cannot handle that case.
->
-> > >
->
-> > > However, the scenario you described is different. It has nothing to do
->
-> > > with DMA. It looks like pci_add_option_rom calls
->
-> > > memory_region_get_ram_ptr to map the rtl8139 rom. The mapping is a
->
-> > > locked mapping and it is never unlocked or destroyed.
->
-> > >
->
-> > > It looks like "ptr" is not used after pci_add_option_rom returns. Does
->
-> > > the append patch fix the problem you are seeing? For the proper fix, I
->
-> > > think we probably need some sort of memory_region_unmap wrapper or maybe
->
-> > > a call to address_space_unmap.
->
-> >
->
-> > Yes, I think so, maybe this is the proper way to fix this.
->
->
->
-> Would you be up for sending a proper patch and testing it? We cannot call
->
-> xen_invalidate_map_cache_entry directly from pci.c though, it would need
->
-> to be one of the other functions like address_space_unmap for example.
->
->
->
->
->
-Yes, I will look into this.
-Any updates?
-
-
->
-> > > diff --git a/hw/pci/pci.c b/hw/pci/pci.c
->
-> > > index e6b08e1..04f98b7 100644
->
-> > > --- a/hw/pci/pci.c
->
-> > > +++ b/hw/pci/pci.c
->
-> > > @@ -2242,6 +2242,7 @@ static void pci_add_option_rom(PCIDevice *pdev,
->
-> > > bool
->
-> > > is_default_rom,
->
-> > >        }
->
-> > >          pci_register_bar(pdev, PCI_ROM_SLOT, 0, &pdev->rom);
->
-> > > +    xen_invalidate_map_cache_entry(ptr);
->
-> > >    }
->
-> > >      static void pci_del_option_rom(PCIDevice *pdev)
->
-
diff --git a/classification_output/04/other/55367348 b/classification_output/04/other/55367348
deleted file mode 100644
index 39f042abd..000000000
--- a/classification_output/04/other/55367348
+++ /dev/null
@@ -1,540 +0,0 @@
-other: 0.626
-mistranslation: 0.615
-device: 0.586
-instruction: 0.572
-semantic: 0.555
-graphic: 0.532
-assembly: 0.531
-network: 0.518
-socket: 0.501
-boot: 0.486
-KVM: 0.470
-vnc: 0.465
-
-[Qemu-devel] [Bug] Docs build fails at interop.rst
-
-https://paste.fedoraproject.org/paste/kOPx4jhtUli---TmxSLrlw
-running python3-sphinx-2.0.1-1.fc31.noarch on Fedora release 31
-(Rawhide)
-
-uname - a
-Linux iouring 5.1.0-0.rc6.git3.1.fc31.x86_64 #1 SMP Thu Apr 25 14:25:32
-UTC 2019 x86_64 x86_64 x86_64 GNU/Linux
-
-Reverting commmit 90edef80a0852cf8a3d2668898ee40e8970e431
-allows for the build to occur
-
-Regards
-Aarushi Mehta
-
-On 5/20/19 7:30 AM, Aarushi Mehta wrote:
->
-https://paste.fedoraproject.org/paste/kOPx4jhtUli---TmxSLrlw
->
-running python3-sphinx-2.0.1-1.fc31.noarch on Fedora release 31
->
-(Rawhide)
->
->
-uname - a
->
-Linux iouring 5.1.0-0.rc6.git3.1.fc31.x86_64 #1 SMP Thu Apr 25 14:25:32
->
-UTC 2019 x86_64 x86_64 x86_64 GNU/Linux
->
->
-Reverting commmit 90edef80a0852cf8a3d2668898ee40e8970e431
->
-allows for the build to occur
->
->
-Regards
->
-Aarushi Mehta
->
->
-Ah, dang. The blocks aren't strictly conforming json, but the version I
-tested this under didn't seem to care. Your version is much newer. (I
-was using 1.7 as provided by Fedora 29.)
-
-For now, try reverting 9e5b6cb87db66dfb606604fe6cf40e5ddf1ef0e7 instead,
-which should at least turn off the "warnings as errors" option, but I
-don't think that reverting -n will turn off this warning.
-
-I'll try to get ahold of this newer version and see if I can't fix it
-more appropriately.
-
---js
-
-On 5/20/19 12:37 PM, John Snow wrote:
->
->
->
-On 5/20/19 7:30 AM, Aarushi Mehta wrote:
->
->
-https://paste.fedoraproject.org/paste/kOPx4jhtUli---TmxSLrlw
->
-> running python3-sphinx-2.0.1-1.fc31.noarch on Fedora release 31
->
-> (Rawhide)
->
->
->
-> uname - a
->
-> Linux iouring 5.1.0-0.rc6.git3.1.fc31.x86_64 #1 SMP Thu Apr 25 14:25:32
->
-> UTC 2019 x86_64 x86_64 x86_64 GNU/Linux
->
->
->
-> Reverting commmit 90edef80a0852cf8a3d2668898ee40e8970e431
->
-> allows for the build to occur
->
->
->
-> Regards
->
-> Aarushi Mehta
->
->
->
->
->
->
-Ah, dang. The blocks aren't strictly conforming json, but the version I
->
-tested this under didn't seem to care. Your version is much newer. (I
->
-was using 1.7 as provided by Fedora 29.)
->
->
-For now, try reverting 9e5b6cb87db66dfb606604fe6cf40e5ddf1ef0e7 instead,
->
-which should at least turn off the "warnings as errors" option, but I
->
-don't think that reverting -n will turn off this warning.
->
->
-I'll try to get ahold of this newer version and see if I can't fix it
->
-more appropriately.
->
->
---js
->
-...Sigh, okay.
-
-So, I am still not actually sure what changed from pygments 2.2 and
-sphinx 1.7 to pygments 2.4 and sphinx 2.0.1, but it appears as if Sphinx
-by default always tries to do add a filter to the pygments lexer that
-raises an error on highlighting failure, instead of the default behavior
-which is to just highlight those errors in the output. There is no
-option to Sphinx that I am aware of to retain this lexing behavior.
-(Effectively, it's strict or nothing.)
-
-This approach, apparently, is broken in Sphinx 1.7/Pygments 2.2, so the
-build works with our malformed json.
-
-There are a few options:
-
-1. Update conf.py to ignore these warnings (and all future lexing
-errors), and settle for the fact that there will be no QMP highlighting
-wherever we use the directionality indicators ('->', '<-').
-
-2. Update bitmaps.rst to remove the directionality indicators.
-
-3. Update bitmaps.rst to format the QMP blocks as raw text instead of JSON.
-
-4. Update bitmaps.rst to remove the "json" specification from the code
-block. This will cause sphinx to "guess" the formatting, and the
-pygments guesser will decide it's Python3.
-
-This will parse well enough, but will mis-highlight 'true' and 'false'
-which are not python keywords. This approach may break in the future if
-the Python3 lexer is upgraded to be stricter (because '->' and '<-' are
-still invalid), and leaves us at the mercy of both the guesser and the
-lexer.
-
-I'm not actually sure what I dislike the least; I think I dislike #1 the
-most. #4 gets us most of what we want but is perhaps porcelain.
-
-I suspect if we attempt to move more of our documentation to ReST and
-Sphinx that we will need to answer for ourselves how we intend to
-document QMP code flow examples.
-
---js
-
-On Mon, May 20, 2019 at 05:25:28PM -0400, John Snow wrote:
->
->
->
-On 5/20/19 12:37 PM, John Snow wrote:
->
->
->
->
->
-> On 5/20/19 7:30 AM, Aarushi Mehta wrote:
->
->>
-https://paste.fedoraproject.org/paste/kOPx4jhtUli---TmxSLrlw
->
->> running python3-sphinx-2.0.1-1.fc31.noarch on Fedora release 31
->
->> (Rawhide)
->
->>
->
->> uname - a
->
->> Linux iouring 5.1.0-0.rc6.git3.1.fc31.x86_64 #1 SMP Thu Apr 25 14:25:32
->
->> UTC 2019 x86_64 x86_64 x86_64 GNU/Linux
->
->>
->
->> Reverting commmit 90edef80a0852cf8a3d2668898ee40e8970e431
->
->> allows for the build to occur
->
->>
->
->> Regards
->
->> Aarushi Mehta
->
->>
->
->>
->
->
->
-> Ah, dang. The blocks aren't strictly conforming json, but the version I
->
-> tested this under didn't seem to care. Your version is much newer. (I
->
-> was using 1.7 as provided by Fedora 29.)
->
->
->
-> For now, try reverting 9e5b6cb87db66dfb606604fe6cf40e5ddf1ef0e7 instead,
->
-> which should at least turn off the "warnings as errors" option, but I
->
-> don't think that reverting -n will turn off this warning.
->
->
->
-> I'll try to get ahold of this newer version and see if I can't fix it
->
-> more appropriately.
->
->
->
-> --js
->
->
->
->
-...Sigh, okay.
->
->
-So, I am still not actually sure what changed from pygments 2.2 and
->
-sphinx 1.7 to pygments 2.4 and sphinx 2.0.1, but it appears as if Sphinx
->
-by default always tries to do add a filter to the pygments lexer that
->
-raises an error on highlighting failure, instead of the default behavior
->
-which is to just highlight those errors in the output. There is no
->
-option to Sphinx that I am aware of to retain this lexing behavior.
->
-(Effectively, it's strict or nothing.)
->
->
-This approach, apparently, is broken in Sphinx 1.7/Pygments 2.2, so the
->
-build works with our malformed json.
->
->
-There are a few options:
->
->
-1. Update conf.py to ignore these warnings (and all future lexing
->
-errors), and settle for the fact that there will be no QMP highlighting
->
-wherever we use the directionality indicators ('->', '<-').
->
->
-2. Update bitmaps.rst to remove the directionality indicators.
->
->
-3. Update bitmaps.rst to format the QMP blocks as raw text instead of JSON.
->
->
-4. Update bitmaps.rst to remove the "json" specification from the code
->
-block. This will cause sphinx to "guess" the formatting, and the
->
-pygments guesser will decide it's Python3.
->
->
-This will parse well enough, but will mis-highlight 'true' and 'false'
->
-which are not python keywords. This approach may break in the future if
->
-the Python3 lexer is upgraded to be stricter (because '->' and '<-' are
->
-still invalid), and leaves us at the mercy of both the guesser and the
->
-lexer.
->
->
-I'm not actually sure what I dislike the least; I think I dislike #1 the
->
-most. #4 gets us most of what we want but is perhaps porcelain.
->
->
-I suspect if we attempt to move more of our documentation to ReST and
->
-Sphinx that we will need to answer for ourselves how we intend to
->
-document QMP code flow examples.
-Writing a custom lexer that handles "<-" and "->" was simple (see below).
-
-Now, is it possible to convince Sphinx to register and use a custom lexer?
-
-$ cat > /tmp/lexer.py <<EOF
-from pygments.lexer import RegexLexer, DelegatingLexer
-from pygments.lexers.data import JsonLexer
-import re
-from pygments.token import *
-
-class QMPExampleMarkersLexer(RegexLexer):
-    tokens = {
-        'root': [
-            (r' *-> *', Generic.Prompt),
-            (r' *<- *', Generic.Output),
-        ]
-    }
-
-class QMPExampleLexer(DelegatingLexer):
-    def __init__(self, **options):
-        super(QMPExampleLexer, self).__init__(JsonLexer, 
-QMPExampleMarkersLexer, Error, **options)
-EOF
-$ pygmentize -l /tmp/lexer.py:QMPExampleLexer -x -f html <<EOF
-    -> {
-         "execute": "drive-backup",
-         "arguments": {
-           "device": "drive0",
-           "bitmap": "bitmap0",
-           "target": "drive0.inc0.qcow2",
-           "format": "qcow2",
-           "sync": "incremental",
-           "mode": "existing"
-         }
-       }
-
-    <- { "return": {} }
-EOF
-<div class="highlight"><pre><span></span><span class="gp">    -&gt; 
-</span><span class="p">{</span>
-         <span class="nt">&quot;execute&quot;</span><span class="p">:</span> 
-<span class="s2">&quot;drive-backup&quot;</span><span class="p">,</span>
-         <span class="nt">&quot;arguments&quot;</span><span class="p">:</span> 
-<span class="p">{</span>
-           <span class="nt">&quot;device&quot;</span><span class="p">:</span> 
-<span class="s2">&quot;drive0&quot;</span><span class="p">,</span>
-           <span class="nt">&quot;bitmap&quot;</span><span class="p">:</span> 
-<span class="s2">&quot;bitmap0&quot;</span><span class="p">,</span>
-           <span class="nt">&quot;target&quot;</span><span class="p">:</span> 
-<span class="s2">&quot;drive0.inc0.qcow2&quot;</span><span class="p">,</span>
-           <span class="nt">&quot;format&quot;</span><span class="p">:</span> 
-<span class="s2">&quot;qcow2&quot;</span><span class="p">,</span>
-           <span class="nt">&quot;sync&quot;</span><span class="p">:</span> 
-<span class="s2">&quot;incremental&quot;</span><span class="p">,</span>
-           <span class="nt">&quot;mode&quot;</span><span class="p">:</span> 
-<span class="s2">&quot;existing&quot;</span>
-         <span class="p">}</span>
-       <span class="p">}</span>
-
-<span class="go">    &lt;- </span><span class="p">{</span> <span 
-class="nt">&quot;return&quot;</span><span class="p">:</span> <span 
-class="p">{}</span> <span class="p">}</span>
-</pre></div>
-$ 
-
-
--- 
-Eduardo
-
-On 5/20/19 7:04 PM, Eduardo Habkost wrote:
->
-On Mon, May 20, 2019 at 05:25:28PM -0400, John Snow wrote:
->
->
->
->
->
-> On 5/20/19 12:37 PM, John Snow wrote:
->
->>
->
->>
->
->> On 5/20/19 7:30 AM, Aarushi Mehta wrote:
->
->>>
-https://paste.fedoraproject.org/paste/kOPx4jhtUli---TmxSLrlw
->
->>> running python3-sphinx-2.0.1-1.fc31.noarch on Fedora release 31
->
->>> (Rawhide)
->
->>>
->
->>> uname - a
->
->>> Linux iouring 5.1.0-0.rc6.git3.1.fc31.x86_64 #1 SMP Thu Apr 25 14:25:32
->
->>> UTC 2019 x86_64 x86_64 x86_64 GNU/Linux
->
->>>
->
->>> Reverting commmit 90edef80a0852cf8a3d2668898ee40e8970e431
->
->>> allows for the build to occur
->
->>>
->
->>> Regards
->
->>> Aarushi Mehta
->
->>>
->
->>>
->
->>
->
->> Ah, dang. The blocks aren't strictly conforming json, but the version I
->
->> tested this under didn't seem to care. Your version is much newer. (I
->
->> was using 1.7 as provided by Fedora 29.)
->
->>
->
->> For now, try reverting 9e5b6cb87db66dfb606604fe6cf40e5ddf1ef0e7 instead,
->
->> which should at least turn off the "warnings as errors" option, but I
->
->> don't think that reverting -n will turn off this warning.
->
->>
->
->> I'll try to get ahold of this newer version and see if I can't fix it
->
->> more appropriately.
->
->>
->
->> --js
->
->>
->
->
->
-> ...Sigh, okay.
->
->
->
-> So, I am still not actually sure what changed from pygments 2.2 and
->
-> sphinx 1.7 to pygments 2.4 and sphinx 2.0.1, but it appears as if Sphinx
->
-> by default always tries to do add a filter to the pygments lexer that
->
-> raises an error on highlighting failure, instead of the default behavior
->
-> which is to just highlight those errors in the output. There is no
->
-> option to Sphinx that I am aware of to retain this lexing behavior.
->
-> (Effectively, it's strict or nothing.)
->
->
->
-> This approach, apparently, is broken in Sphinx 1.7/Pygments 2.2, so the
->
-> build works with our malformed json.
->
->
->
-> There are a few options:
->
->
->
-> 1. Update conf.py to ignore these warnings (and all future lexing
->
-> errors), and settle for the fact that there will be no QMP highlighting
->
-> wherever we use the directionality indicators ('->', '<-').
->
->
->
-> 2. Update bitmaps.rst to remove the directionality indicators.
->
->
->
-> 3. Update bitmaps.rst to format the QMP blocks as raw text instead of JSON.
->
->
->
-> 4. Update bitmaps.rst to remove the "json" specification from the code
->
-> block. This will cause sphinx to "guess" the formatting, and the
->
-> pygments guesser will decide it's Python3.
->
->
->
-> This will parse well enough, but will mis-highlight 'true' and 'false'
->
-> which are not python keywords. This approach may break in the future if
->
-> the Python3 lexer is upgraded to be stricter (because '->' and '<-' are
->
-> still invalid), and leaves us at the mercy of both the guesser and the
->
-> lexer.
->
->
->
-> I'm not actually sure what I dislike the least; I think I dislike #1 the
->
-> most. #4 gets us most of what we want but is perhaps porcelain.
->
->
->
-> I suspect if we attempt to move more of our documentation to ReST and
->
-> Sphinx that we will need to answer for ourselves how we intend to
->
-> document QMP code flow examples.
->
->
-Writing a custom lexer that handles "<-" and "->" was simple (see below).
->
->
-Now, is it possible to convince Sphinx to register and use a custom lexer?
->
-Spoilers, yes, and I've sent a patch to list. Thanks for your help!
-
diff --git a/classification_output/04/other/55753058 b/classification_output/04/other/55753058
deleted file mode 100644
index ff699cdc1..000000000
--- a/classification_output/04/other/55753058
+++ /dev/null
@@ -1,301 +0,0 @@
-other: 0.734
-KVM: 0.713
-vnc: 0.682
-mistranslation: 0.649
-graphic: 0.630
-device: 0.623
-instruction: 0.581
-semantic: 0.577
-assembly: 0.529
-network: 0.525
-boot: 0.478
-socket: 0.462
-
-[RESEND][BUG FIX HELP] QEMU main thread endlessly hangs in __ppoll()
-
-Hi Genius,
-I am a user of QEMU v4.2.0 and stuck in an interesting bug, which may still
-exist in the mainline.
-Thanks in advance to heroes who can take a look and share understanding.
-
-The qemu main thread endlessly hangs in the handle of the qmp statement:
-{'execute': 'human-monitor-command', 'arguments':{ 'command-line':
-'drive_del replication0' } }
-and we have the call trace looks like:
-#0 0x00007f3c22045bf6 in __ppoll (fds=0x555611328410, nfds=1,
-timeout=<optimized out>, timeout@entry=0x7ffc56c66db0,
-sigmask=sigmask@entry=0x0) at ../sysdeps/unix/sysv/linux/ppoll.c:44
-#1 0x000055561021f415 in ppoll (__ss=0x0, __timeout=0x7ffc56c66db0,
-__nfds=<optimized out>, __fds=<optimized out>)
-at /usr/include/x86_64-linux-gnu/bits/poll2.h:77
-#2 qemu_poll_ns (fds=<optimized out>, nfds=<optimized out>,
-timeout=<optimized out>) at util/qemu-timer.c:348
-#3 0x0000555610221430 in aio_poll (ctx=ctx@entry=0x5556113010f0,
-blocking=blocking@entry=true) at util/aio-posix.c:669
-#4 0x000055561019268d in bdrv_do_drained_begin (poll=true,
-ignore_bds_parents=false, parent=0x0, recursive=false,
-bs=0x55561138b0a0) at block/io.c:430
-#5 bdrv_do_drained_begin (bs=0x55561138b0a0, recursive=<optimized out>,
-parent=0x0, ignore_bds_parents=<optimized out>,
-poll=<optimized out>) at block/io.c:396
-#6 0x000055561017b60b in quorum_del_child (bs=0x55561138b0a0,
-child=0x7f36dc0ce380, errp=<optimized out>)
-at block/quorum.c:1063
-#7 0x000055560ff5836b in qmp_x_blockdev_change (parent=0x555612373120
-"colo-disk0", has_child=<optimized out>,
-child=0x5556112df3e0 "children.1", has_node=<optimized out>, node=0x0,
-errp=0x7ffc56c66f98) at blockdev.c:4494
-#8 0x00005556100f8f57 in qmp_marshal_x_blockdev_change (args=<optimized
-out>, ret=<optimized out>, errp=0x7ffc56c67018)
-at qapi/qapi-commands-block-core.c:1538
-#9 0x00005556101d8290 in do_qmp_dispatch (errp=0x7ffc56c67010,
-allow_oob=<optimized out>, request=<optimized out>,
-cmds=0x5556109c69a0 <qmp_commands>) at qapi/qmp-dispatch.c:132
-#10 qmp_dispatch (cmds=0x5556109c69a0 <qmp_commands>, request=<optimized
-out>, allow_oob=<optimized out>)
-at qapi/qmp-dispatch.c:175
-#11 0x00005556100d4c4d in monitor_qmp_dispatch (mon=0x5556113a6f40,
-req=<optimized out>) at monitor/qmp.c:145
-#12 0x00005556100d5437 in monitor_qmp_bh_dispatcher (data=<optimized out>)
-at monitor/qmp.c:234
-#13 0x000055561021dbec in aio_bh_call (bh=0x5556112164bGrateful0) at
-util/async.c:117
-#14 aio_bh_poll (ctx=ctx@entry=0x5556112151b0) at util/async.c:117
-#15 0x00005556102212c4 in aio_dispatch (ctx=0x5556112151b0) at
-util/aio-posix.c:459
-#16 0x000055561021dab2 in aio_ctx_dispatch (source=<optimized out>,
-callback=<optimized out>, user_data=<optimized out>)
-at util/async.c:260
-#17 0x00007f3c22302fbd in g_main_context_dispatch () from
-/lib/x86_64-linux-gnu/libglib-2.0.so.0
-#18 0x0000555610220358 in glib_pollfds_poll () at util/main-loop.c:219
-#19 os_host_main_loop_wait (timeout=<optimized out>) at util/main-loop.c:242
-#20 main_loop_wait (nonblocking=<optimized out>) at util/main-loop.c:518
-#21 0x000055560ff600fe in main_loop () at vl.c:1814
-#22 0x000055560fddbce9 in main (argc=<optimized out>, argv=<optimized out>,
-envp=<optimized out>) at vl.c:4503
-We found that we're doing endless check in the line of
-block/io.c:bdrv_do_drained_begin():
-BDRV_POLL_WHILE(bs, bdrv_drain_poll_top_level(bs, recursive, parent));
-and it turns out that the bdrv_drain_poll() always get true from:
-- bdrv_parent_drained_poll(bs, ignore_parent, ignore_bds_parents)
-- AND atomic_read(&bs->in_flight)
-
-I personally think this is a deadlock issue in the a QEMU block layer
-(as we know, we have some #FIXME comments in related codes, such as block
-permisson update).
-Any comments are welcome and appreciated.
-
----
-thx,likexu
-
-On 2/28/21 9:39 PM, Like Xu wrote:
-Hi Genius,
-I am a user of QEMU v4.2.0 and stuck in an interesting bug, which may
-still exist in the mainline.
-Thanks in advance to heroes who can take a look and share understanding.
-Do you have a test case that reproduces on 5.2? It'd be nice to know if
-it was still a problem in the latest source tree or not.
---js
-The qemu main thread endlessly hangs in the handle of the qmp statement:
-{'execute': 'human-monitor-command', 'arguments':{ 'command-line':
-'drive_del replication0' } }
-and we have the call trace looks like:
-#0 0x00007f3c22045bf6 in __ppoll (fds=0x555611328410, nfds=1,
-timeout=<optimized out>, timeout@entry=0x7ffc56c66db0,
-sigmask=sigmask@entry=0x0) at ../sysdeps/unix/sysv/linux/ppoll.c:44
-#1 0x000055561021f415 in ppoll (__ss=0x0, __timeout=0x7ffc56c66db0,
-__nfds=<optimized out>, __fds=<optimized out>)
-at /usr/include/x86_64-linux-gnu/bits/poll2.h:77
-#2 qemu_poll_ns (fds=<optimized out>, nfds=<optimized out>,
-timeout=<optimized out>) at util/qemu-timer.c:348
-#3 0x0000555610221430 in aio_poll (ctx=ctx@entry=0x5556113010f0,
-blocking=blocking@entry=true) at util/aio-posix.c:669
-#4 0x000055561019268d in bdrv_do_drained_begin (poll=true,
-ignore_bds_parents=false, parent=0x0, recursive=false,
-bs=0x55561138b0a0) at block/io.c:430
-#5 bdrv_do_drained_begin (bs=0x55561138b0a0, recursive=<optimized out>,
-parent=0x0, ignore_bds_parents=<optimized out>,
-poll=<optimized out>) at block/io.c:396
-#6 0x000055561017b60b in quorum_del_child (bs=0x55561138b0a0,
-child=0x7f36dc0ce380, errp=<optimized out>)
-at block/quorum.c:1063
-#7 0x000055560ff5836b in qmp_x_blockdev_change (parent=0x555612373120
-"colo-disk0", has_child=<optimized out>,
-child=0x5556112df3e0 "children.1", has_node=<optimized out>, node=0x0,
-errp=0x7ffc56c66f98) at blockdev.c:4494
-#8 0x00005556100f8f57 in qmp_marshal_x_blockdev_change (args=<optimized
-out>, ret=<optimized out>, errp=0x7ffc56c67018)
-at qapi/qapi-commands-block-core.c:1538
-#9 0x00005556101d8290 in do_qmp_dispatch (errp=0x7ffc56c67010,
-allow_oob=<optimized out>, request=<optimized out>,
-cmds=0x5556109c69a0 <qmp_commands>) at qapi/qmp-dispatch.c:132
-#10 qmp_dispatch (cmds=0x5556109c69a0 <qmp_commands>, request=<optimized
-out>, allow_oob=<optimized out>)
-at qapi/qmp-dispatch.c:175
-#11 0x00005556100d4c4d in monitor_qmp_dispatch (mon=0x5556113a6f40,
-req=<optimized out>) at monitor/qmp.c:145
-#12 0x00005556100d5437 in monitor_qmp_bh_dispatcher (data=<optimized
-out>) at monitor/qmp.c:234
-#13 0x000055561021dbec in aio_bh_call (bh=0x5556112164bGrateful0) at
-util/async.c:117
-#14 aio_bh_poll (ctx=ctx@entry=0x5556112151b0) at util/async.c:117
-#15 0x00005556102212c4 in aio_dispatch (ctx=0x5556112151b0) at
-util/aio-posix.c:459
-#16 0x000055561021dab2 in aio_ctx_dispatch (source=<optimized out>,
-callback=<optimized out>, user_data=<optimized out>)
-at util/async.c:260
-#17 0x00007f3c22302fbd in g_main_context_dispatch () from
-/lib/x86_64-linux-gnu/libglib-2.0.so.0
-#18 0x0000555610220358 in glib_pollfds_poll () at util/main-loop.c:219
-#19 os_host_main_loop_wait (timeout=<optimized out>) at
-util/main-loop.c:242
-#20 main_loop_wait (nonblocking=<optimized out>) at util/main-loop.c:518
-#21 0x000055560ff600fe in main_loop () at vl.c:1814
-#22 0x000055560fddbce9 in main (argc=<optimized out>, argv=<optimized
-out>, envp=<optimized out>) at vl.c:4503
-We found that we're doing endless check in the line of
-block/io.c:bdrv_do_drained_begin():
-Â Â Â Â BDRV_POLL_WHILE(bs, bdrv_drain_poll_top_level(bs, recursive, parent));
-and it turns out that the bdrv_drain_poll() always get true from:
-- bdrv_parent_drained_poll(bs, ignore_parent, ignore_bds_parents)
-- AND atomic_read(&bs->in_flight)
-
-I personally think this is a deadlock issue in the a QEMU block layer
-(as we know, we have some #FIXME comments in related codes, such as
-block permisson update).
-Any comments are welcome and appreciated.
-
----
-thx,likexu
-
-Hi John,
-
-Thanks for your comment.
-
-On 2021/3/5 7:53, John Snow wrote:
-On 2/28/21 9:39 PM, Like Xu wrote:
-Hi Genius,
-I am a user of QEMU v4.2.0 and stuck in an interesting bug, which may
-still exist in the mainline.
-Thanks in advance to heroes who can take a look and share understanding.
-Do you have a test case that reproduces on 5.2? It'd be nice to know if it
-was still a problem in the latest source tree or not.
-We narrowed down the source of the bug, which basically came from
-the following qmp usage:
-{'execute': 'human-monitor-command', 'arguments':{ 'command-line':
-'drive_del replication0' } }
-One of the test cases is the COLO usage (docs/colo-proxy.txt).
-
-This issue is sporadic,the probability may be 1/15 for a io-heavy guest.
-
-I believe it's reproducible on 5.2 and the latest tree.
---js
-The qemu main thread endlessly hangs in the handle of the qmp statement:
-{'execute': 'human-monitor-command', 'arguments':{ 'command-line':
-'drive_del replication0' } }
-and we have the call trace looks like:
-#0 0x00007f3c22045bf6 in __ppoll (fds=0x555611328410, nfds=1,
-timeout=<optimized out>, timeout@entry=0x7ffc56c66db0,
-sigmask=sigmask@entry=0x0) at ../sysdeps/unix/sysv/linux/ppoll.c:44
-#1 0x000055561021f415 in ppoll (__ss=0x0, __timeout=0x7ffc56c66db0,
-__nfds=<optimized out>, __fds=<optimized out>)
-at /usr/include/x86_64-linux-gnu/bits/poll2.h:77
-#2 qemu_poll_ns (fds=<optimized out>, nfds=<optimized out>,
-timeout=<optimized out>) at util/qemu-timer.c:348
-#3 0x0000555610221430 in aio_poll (ctx=ctx@entry=0x5556113010f0,
-blocking=blocking@entry=true) at util/aio-posix.c:669
-#4 0x000055561019268d in bdrv_do_drained_begin (poll=true,
-ignore_bds_parents=false, parent=0x0, recursive=false,
-bs=0x55561138b0a0) at block/io.c:430
-#5 bdrv_do_drained_begin (bs=0x55561138b0a0, recursive=<optimized out>,
-parent=0x0, ignore_bds_parents=<optimized out>,
-poll=<optimized out>) at block/io.c:396
-#6 0x000055561017b60b in quorum_del_child (bs=0x55561138b0a0,
-child=0x7f36dc0ce380, errp=<optimized out>)
-at block/quorum.c:1063
-#7 0x000055560ff5836b in qmp_x_blockdev_change (parent=0x555612373120
-"colo-disk0", has_child=<optimized out>,
-child=0x5556112df3e0 "children.1", has_node=<optimized out>, node=0x0,
-errp=0x7ffc56c66f98) at blockdev.c:4494
-#8 0x00005556100f8f57 in qmp_marshal_x_blockdev_change (args=<optimized
-out>, ret=<optimized out>, errp=0x7ffc56c67018)
-at qapi/qapi-commands-block-core.c:1538
-#9 0x00005556101d8290 in do_qmp_dispatch (errp=0x7ffc56c67010,
-allow_oob=<optimized out>, request=<optimized out>,
-cmds=0x5556109c69a0 <qmp_commands>) at qapi/qmp-dispatch.c:132
-#10 qmp_dispatch (cmds=0x5556109c69a0 <qmp_commands>, request=<optimized
-out>, allow_oob=<optimized out>)
-at qapi/qmp-dispatch.c:175
-#11 0x00005556100d4c4d in monitor_qmp_dispatch (mon=0x5556113a6f40,
-req=<optimized out>) at monitor/qmp.c:145
-#12 0x00005556100d5437 in monitor_qmp_bh_dispatcher (data=<optimized
-out>) at monitor/qmp.c:234
-#13 0x000055561021dbec in aio_bh_call (bh=0x5556112164bGrateful0) at
-util/async.c:117
-#14 aio_bh_poll (ctx=ctx@entry=0x5556112151b0) at util/async.c:117
-#15 0x00005556102212c4 in aio_dispatch (ctx=0x5556112151b0) at
-util/aio-posix.c:459
-#16 0x000055561021dab2 in aio_ctx_dispatch (source=<optimized out>,
-callback=<optimized out>, user_data=<optimized out>)
-at util/async.c:260
-#17 0x00007f3c22302fbd in g_main_context_dispatch () from
-/lib/x86_64-linux-gnu/libglib-2.0.so.0
-#18 0x0000555610220358 in glib_pollfds_poll () at util/main-loop.c:219
-#19 os_host_main_loop_wait (timeout=<optimized out>) at util/main-loop.c:242
-#20 main_loop_wait (nonblocking=<optimized out>) at util/main-loop.c:518
-#21 0x000055560ff600fe in main_loop () at vl.c:1814
-#22 0x000055560fddbce9 in main (argc=<optimized out>, argv=<optimized
-out>, envp=<optimized out>) at vl.c:4503
-We found that we're doing endless check in the line of
-block/io.c:bdrv_do_drained_begin():
-Â Â Â Â Â BDRV_POLL_WHILE(bs, bdrv_drain_poll_top_level(bs, recursive, parent));
-and it turns out that the bdrv_drain_poll() always get true from:
-- bdrv_parent_drained_poll(bs, ignore_parent, ignore_bds_parents)
-- AND atomic_read(&bs->in_flight)
-
-I personally think this is a deadlock issue in the a QEMU block layer
-(as we know, we have some #FIXME comments in related codes, such as block
-permisson update).
-Any comments are welcome and appreciated.
-
----
-thx,likexu
-
-On 3/4/21 10:08 PM, Like Xu wrote:
-Hi John,
-
-Thanks for your comment.
-
-On 2021/3/5 7:53, John Snow wrote:
-On 2/28/21 9:39 PM, Like Xu wrote:
-Hi Genius,
-I am a user of QEMU v4.2.0 and stuck in an interesting bug, which may
-still exist in the mainline.
-Thanks in advance to heroes who can take a look and share understanding.
-Do you have a test case that reproduces on 5.2? It'd be nice to know
-if it was still a problem in the latest source tree or not.
-We narrowed down the source of the bug, which basically came from
-the following qmp usage:
-{'execute': 'human-monitor-command', 'arguments':{ 'command-line':
-'drive_del replication0' } }
-One of the test cases is the COLO usage (docs/colo-proxy.txt).
-
-This issue is sporadic,the probability may be 1/15 for a io-heavy guest.
-
-I believe it's reproducible on 5.2 and the latest tree.
-Can you please test and confirm that this is the case, and then file a
-bug report on the LP:
-https://launchpad.net/qemu
-and include:
-- The exact commit you used (current origin/master debug build would be
-the most ideal.)
-- Which QEMU binary you are using (qemu-system-x86_64?)
-- The shortest command line you are aware of that reproduces the problem
-- The host OS and kernel version
-- An updated call trace
-- Any relevant commands issued prior to the one that caused the hang; or
-detailed reproduction steps if possible.
-Thanks,
---js
-
diff --git a/classification_output/04/other/56309929 b/classification_output/04/other/56309929
deleted file mode 100644
index 2efc040b0..000000000
--- a/classification_output/04/other/56309929
+++ /dev/null
@@ -1,188 +0,0 @@
-other: 0.690
-device: 0.646
-KVM: 0.636
-assembly: 0.618
-vnc: 0.600
-network: 0.589
-instruction: 0.581
-boot: 0.578
-graphic: 0.570
-mistranslation: 0.554
-semantic: 0.521
-socket: 0.516
-
-[Qemu-devel] [BUG 2.6] Broken CONFIG_TPM?
-
-A compilation test with clang -Weverything reported this problem:
-
-config-host.h:112:20: warning: '$' in identifier
-[-Wdollar-in-identifier-extension]
-
-The line of code looks like this:
-
-#define CONFIG_TPM $(CONFIG_SOFTMMU)
-
-This is fine for Makefile code, but won't work as expected in C code.
-
-Am 28.04.2016 um 22:33 schrieb Stefan Weil:
->
-A compilation test with clang -Weverything reported this problem:
->
->
-config-host.h:112:20: warning: '$' in identifier
->
-[-Wdollar-in-identifier-extension]
->
->
-The line of code looks like this:
->
->
-#define CONFIG_TPM $(CONFIG_SOFTMMU)
->
->
-This is fine for Makefile code, but won't work as expected in C code.
->
-A complete 64 bit build with clang -Weverything creates a log file of
-1.7 GB.
-Here are the uniq warnings sorted by their frequency:
-
-      1 -Wflexible-array-extensions
-      1 -Wgnu-folding-constant
-      1 -Wunknown-pragmas
-      1 -Wunknown-warning-option
-      1 -Wunreachable-code-loop-increment
-      2 -Warray-bounds-pointer-arithmetic
-      2 -Wdollar-in-identifier-extension
-      3 -Woverlength-strings
-      3 -Wweak-vtables
-      4 -Wgnu-empty-struct
-      4 -Wstring-conversion
-      6 -Wclass-varargs
-      7 -Wc99-extensions
-      7 -Wc++-compat
-      8 -Wfloat-equal
-     11 -Wformat-nonliteral
-     16 -Wshift-negative-value
-     19 -Wglobal-constructors
-     28 -Wc++11-long-long
-     29 -Wembedded-directive
-     38 -Wvla
-     40 -Wcovered-switch-default
-     40 -Wmissing-variable-declarations
-     49 -Wold-style-cast
-     53 -Wgnu-conditional-omitted-operand
-     56 -Wformat-pedantic
-     61 -Wvariadic-macros
-     77 -Wc++11-extensions
-     83 -Wgnu-flexible-array-initializer
-     83 -Wzero-length-array
-     96 -Wgnu-designator
-    102 -Wmissing-noreturn
-    103 -Wconditional-uninitialized
-    107 -Wdisabled-macro-expansion
-    115 -Wunreachable-code-return
-    134 -Wunreachable-code
-    243 -Wunreachable-code-break
-    257 -Wfloat-conversion
-    280 -Wswitch-enum
-    291 -Wpointer-arith
-    298 -Wshadow
-    378 -Wassign-enum
-    395 -Wused-but-marked-unused
-    420 -Wreserved-id-macro
-    493 -Wdocumentation
-    510 -Wshift-sign-overflow
-    565 -Wgnu-case-range
-    566 -Wgnu-zero-variadic-macro-arguments
-    650 -Wbad-function-cast
-    705 -Wmissing-field-initializers
-    817 -Wgnu-statement-expression
-    968 -Wdocumentation-unknown-command
-   1021 -Wextra-semi
-   1112 -Wgnu-empty-initializer
-   1138 -Wcast-qual
-   1509 -Wcast-align
-   1766 -Wextended-offsetof
-   1937 -Wsign-compare
-   2130 -Wpacked
-   2404 -Wunused-macros
-   3081 -Wpadded
-   4182 -Wconversion
-   5430 -Wlanguage-extension-token
-   6655 -Wshorten-64-to-32
-   6995 -Wpedantic
-   7354 -Wunused-parameter
-  27659 -Wsign-conversion
-
-Stefan Weil <address@hidden> writes:
-
->
-A compilation test with clang -Weverything reported this problem:
->
->
-config-host.h:112:20: warning: '$' in identifier
->
-[-Wdollar-in-identifier-extension]
->
->
-The line of code looks like this:
->
->
-#define CONFIG_TPM $(CONFIG_SOFTMMU)
->
->
-This is fine for Makefile code, but won't work as expected in C code.
-Broken in commit 3b8acc1 "configure: fix TPM logic".  Cc'ing Paolo.
-
-Impact: #ifdef CONFIG_TPM never disables code.  There are no other uses
-of CONFIG_TPM in C code.
-
-I had a quick peek at configure and create_config, but refrained from
-attempting to fix this, since I don't understand when exactly CONFIG_TPM
-should be defined.
-
-On 29 April 2016 at 08:42, Markus Armbruster <address@hidden> wrote:
->
-Stefan Weil <address@hidden> writes:
->
->
-> A compilation test with clang -Weverything reported this problem:
->
->
->
-> config-host.h:112:20: warning: '$' in identifier
->
-> [-Wdollar-in-identifier-extension]
->
->
->
-> The line of code looks like this:
->
->
->
-> #define CONFIG_TPM $(CONFIG_SOFTMMU)
->
->
->
-> This is fine for Makefile code, but won't work as expected in C code.
->
->
-Broken in commit 3b8acc1 "configure: fix TPM logic".  Cc'ing Paolo.
->
->
-Impact: #ifdef CONFIG_TPM never disables code.  There are no other uses
->
-of CONFIG_TPM in C code.
->
->
-I had a quick peek at configure and create_config, but refrained from
->
-attempting to fix this, since I don't understand when exactly CONFIG_TPM
->
-should be defined.
-Looking at 'git blame' suggests this has been wrong like this for
-some years, so we don't need to scramble to fix it for 2.6.
-
-thanks
--- PMM
-
diff --git a/classification_output/04/other/56937788 b/classification_output/04/other/56937788
deleted file mode 100644
index e4043d1fc..000000000
--- a/classification_output/04/other/56937788
+++ /dev/null
@@ -1,352 +0,0 @@
-other: 0.791
-KVM: 0.755
-vnc: 0.743
-mistranslation: 0.735
-graphic: 0.720
-semantic: 0.705
-device: 0.697
-instruction: 0.653
-assembly: 0.638
-boot: 0.636
-network: 0.633
-socket: 0.613
-
-[Qemu-devel] [Bug] virtio-blk: qemu will crash if hotplug virtio-blk device failed
-
-I found that hotplug virtio-blk device will lead to qemu crash.
-
-Re-production steps:
-
-1.       Run VM named vm001
-
-2.       Create a virtio-blk.xml which contains wrong configurations:
-<disk device="lun" rawio="yes" type="block">
-  <driver cache="none" io="native" name="qemu" type="raw" />
-  <source dev="/dev/mapper/11-dm" />
-  <target bus="virtio" dev="vdx" />
-</disk>
-
-3.       Run command : virsh attach-device vm001 vm001
-
-Libvirt will return err msg:
-
-error: Failed to attach device from blk-scsi.xml
-
-error: internal error: unable to execute QEMU command 'device_add': Please set 
-scsi=off for virtio-blk devices in order to use virtio 1.0
-
-it means hotplug virtio-blk device failed.
-
-4.       Suspend or shutdown VM will leads to qemu crash
-
-
-
-from gdb:
-
-
-(gdb) bt
-#0  object_get_class (address@hidden) at qom/object.c:750
-#1  0x00007f9a72582e01 in virtio_vmstate_change (opaque=0x7f9a73d10960, 
-running=0, state=<optimized out>) at 
-/mnt/sdb/lzc/code/open/qemu/hw/virtio/virtio.c:2203
-#2  0x00007f9a7261ef52 in vm_state_notify (address@hidden, address@hidden) at 
-vl.c:1685
-#3  0x00007f9a7252603a in do_vm_stop (state=RUN_STATE_PAUSED) at 
-/mnt/sdb/lzc/code/open/qemu/cpus.c:941
-#4  vm_stop (address@hidden) at /mnt/sdb/lzc/code/open/qemu/cpus.c:1807
-#5  0x00007f9a7262eb1b in qmp_stop (address@hidden) at qmp.c:102
-#6  0x00007f9a7262c70a in qmp_marshal_stop (args=<optimized out>, 
-ret=<optimized out>, errp=0x7ffe63e255d8) at qmp-marshal.c:5854
-#7  0x00007f9a72897e79 in do_qmp_dispatch (errp=0x7ffe63e255d0, 
-request=0x7f9a76510120, cmds=0x7f9a72ee7980 <qmp_commands>) at 
-qapi/qmp-dispatch.c:104
-#8  qmp_dispatch (cmds=0x7f9a72ee7980 <qmp_commands>, address@hidden) at 
-qapi/qmp-dispatch.c:131
-#9  0x00007f9a725288d5 in handle_qmp_command (parser=<optimized out>, 
-tokens=<optimized out>) at /mnt/sdb/lzc/code/open/qemu/monitor.c:3852
-#10 0x00007f9a7289d514 in json_message_process_token (lexer=0x7f9a73ce4498, 
-input=0x7f9a73cc6880, type=JSON_RCURLY, x=36, y=17) at 
-qobject/json-streamer.c:105
-#11 0x00007f9a728bb69b in json_lexer_feed_char (address@hidden, ch=125 '}', 
-address@hidden) at qobject/json-lexer.c:323
-#12 0x00007f9a728bb75e in json_lexer_feed (lexer=0x7f9a73ce4498, 
-buffer=<optimized out>, size=<optimized out>) at qobject/json-lexer.c:373
-#13 0x00007f9a7289d5d9 in json_message_parser_feed (parser=<optimized out>, 
-buffer=<optimized out>, size=<optimized out>) at qobject/json-streamer.c:124
-#14 0x00007f9a7252722e in monitor_qmp_read (opaque=<optimized out>, 
-buf=<optimized out>, size=<optimized out>) at 
-/mnt/sdb/lzc/code/open/qemu/monitor.c:3894
-#15 0x00007f9a7284ee1b in tcp_chr_read (chan=<optimized out>, cond=<optimized 
-out>, opaque=<optimized out>) at chardev/char-socket.c:441
-#16 0x00007f9a6e03e99a in g_main_context_dispatch () from 
-/usr/lib64/libglib-2.0.so.0
-#17 0x00007f9a728a342c in glib_pollfds_poll () at util/main-loop.c:214
-#18 os_host_main_loop_wait (timeout=<optimized out>) at util/main-loop.c:261
-#19 main_loop_wait (address@hidden) at util/main-loop.c:515
-#20 0x00007f9a724e7547 in main_loop () at vl.c:1999
-#21 main (argc=<optimized out>, argv=<optimized out>, envp=<optimized out>) at 
-vl.c:4877
-
-Problem happens in virtio_vmstate_change which is called by vm_state_notify,
-static void virtio_vmstate_change(void *opaque, int running, RunState state)
-{
-    VirtIODevice *vdev = opaque;
-    BusState *qbus = qdev_get_parent_bus(DEVICE(vdev));
-    VirtioBusClass *k = VIRTIO_BUS_GET_CLASS(qbus);
-    bool backend_run = running && (vdev->status & VIRTIO_CONFIG_S_DRIVER_OK);
-    vdev->vm_running = running;
-
-    if (backend_run) {
-        virtio_set_status(vdev, vdev->status);
-    }
-
-    if (k->vmstate_change) {
-        k->vmstate_change(qbus->parent, backend_run);
-    }
-
-    if (!backend_run) {
-        virtio_set_status(vdev, vdev->status);
-    }
-}
-
-Vdev's parent_bus is NULL, so qdev_get_parent_bus(DEVICE(vdev)) will crash.
-virtio_vmstate_change is added to the list vm_change_state_head at 
-virtio_blk_device_realize(virtio_init),
-but after hotplug virtio-blk failed, virtio_vmstate_change will not be removed 
-from vm_change_state_head.
-
-
-I apply a patch as follews:
-
-diff --git a/hw/virtio/virtio.c b/hw/virtio/virtio.c
-index 5884ce3..ea532dc 100644
---- a/hw/virtio/virtio.c
-+++ b/hw/virtio/virtio.c
-@@ -2491,6 +2491,7 @@ static void virtio_device_realize(DeviceState *dev, Error 
-**errp)
-     virtio_bus_device_plugged(vdev, &err);
-     if (err != NULL) {
-         error_propagate(errp, err);
-+        vdc->unrealize(dev, NULL);
-         return;
-     }
-
-On Tue, Oct 31, 2017 at 05:19:08AM +0000, linzhecheng wrote:
->
-I found that hotplug virtio-blk device will lead to qemu crash.
-The author posted a patch in a separate email thread.  Please see
-"[PATCH] fix: unrealize virtio device if we fail to hotplug it".
-
->
-Re-production steps:
->
->
-1.       Run VM named vm001
->
->
-2.       Create a virtio-blk.xml which contains wrong configurations:
->
-<disk device="lun" rawio="yes" type="block">
->
-<driver cache="none" io="native" name="qemu" type="raw" />
->
-<source dev="/dev/mapper/11-dm" />
->
-<target bus="virtio" dev="vdx" />
->
-</disk>
->
->
-3.       Run command : virsh attach-device vm001 vm001
->
->
-Libvirt will return err msg:
->
->
-error: Failed to attach device from blk-scsi.xml
->
->
-error: internal error: unable to execute QEMU command 'device_add': Please
->
-set scsi=off for virtio-blk devices in order to use virtio 1.0
->
->
-it means hotplug virtio-blk device failed.
->
->
-4.       Suspend or shutdown VM will leads to qemu crash
->
->
->
->
-from gdb:
->
->
->
-(gdb) bt
->
-#0  object_get_class (address@hidden) at qom/object.c:750
->
-#1  0x00007f9a72582e01 in virtio_vmstate_change (opaque=0x7f9a73d10960,
->
-running=0, state=<optimized out>) at
->
-/mnt/sdb/lzc/code/open/qemu/hw/virtio/virtio.c:2203
->
-#2  0x00007f9a7261ef52 in vm_state_notify (address@hidden, address@hidden) at
->
-vl.c:1685
->
-#3  0x00007f9a7252603a in do_vm_stop (state=RUN_STATE_PAUSED) at
->
-/mnt/sdb/lzc/code/open/qemu/cpus.c:941
->
-#4  vm_stop (address@hidden) at /mnt/sdb/lzc/code/open/qemu/cpus.c:1807
->
-#5  0x00007f9a7262eb1b in qmp_stop (address@hidden) at qmp.c:102
->
-#6  0x00007f9a7262c70a in qmp_marshal_stop (args=<optimized out>,
->
-ret=<optimized out>, errp=0x7ffe63e255d8) at qmp-marshal.c:5854
->
-#7  0x00007f9a72897e79 in do_qmp_dispatch (errp=0x7ffe63e255d0,
->
-request=0x7f9a76510120, cmds=0x7f9a72ee7980 <qmp_commands>) at
->
-qapi/qmp-dispatch.c:104
->
-#8  qmp_dispatch (cmds=0x7f9a72ee7980 <qmp_commands>, address@hidden) at
->
-qapi/qmp-dispatch.c:131
->
-#9  0x00007f9a725288d5 in handle_qmp_command (parser=<optimized out>,
->
-tokens=<optimized out>) at /mnt/sdb/lzc/code/open/qemu/monitor.c:3852
->
-#10 0x00007f9a7289d514 in json_message_process_token (lexer=0x7f9a73ce4498,
->
-input=0x7f9a73cc6880, type=JSON_RCURLY, x=36, y=17) at
->
-qobject/json-streamer.c:105
->
-#11 0x00007f9a728bb69b in json_lexer_feed_char (address@hidden, ch=125 '}',
->
-address@hidden) at qobject/json-lexer.c:323
->
-#12 0x00007f9a728bb75e in json_lexer_feed (lexer=0x7f9a73ce4498,
->
-buffer=<optimized out>, size=<optimized out>) at qobject/json-lexer.c:373
->
-#13 0x00007f9a7289d5d9 in json_message_parser_feed (parser=<optimized out>,
->
-buffer=<optimized out>, size=<optimized out>) at qobject/json-streamer.c:124
->
-#14 0x00007f9a7252722e in monitor_qmp_read (opaque=<optimized out>,
->
-buf=<optimized out>, size=<optimized out>) at
->
-/mnt/sdb/lzc/code/open/qemu/monitor.c:3894
->
-#15 0x00007f9a7284ee1b in tcp_chr_read (chan=<optimized out>, cond=<optimized
->
-out>, opaque=<optimized out>) at chardev/char-socket.c:441
->
-#16 0x00007f9a6e03e99a in g_main_context_dispatch () from
->
-/usr/lib64/libglib-2.0.so.0
->
-#17 0x00007f9a728a342c in glib_pollfds_poll () at util/main-loop.c:214
->
-#18 os_host_main_loop_wait (timeout=<optimized out>) at util/main-loop.c:261
->
-#19 main_loop_wait (address@hidden) at util/main-loop.c:515
->
-#20 0x00007f9a724e7547 in main_loop () at vl.c:1999
->
-#21 main (argc=<optimized out>, argv=<optimized out>, envp=<optimized out>)
->
-at vl.c:4877
->
->
-Problem happens in virtio_vmstate_change which is called by vm_state_notify,
->
-static void virtio_vmstate_change(void *opaque, int running, RunState state)
->
-{
->
-VirtIODevice *vdev = opaque;
->
-BusState *qbus = qdev_get_parent_bus(DEVICE(vdev));
->
-VirtioBusClass *k = VIRTIO_BUS_GET_CLASS(qbus);
->
-bool backend_run = running && (vdev->status & VIRTIO_CONFIG_S_DRIVER_OK);
->
-vdev->vm_running = running;
->
->
-if (backend_run) {
->
-virtio_set_status(vdev, vdev->status);
->
-}
->
->
-if (k->vmstate_change) {
->
-k->vmstate_change(qbus->parent, backend_run);
->
-}
->
->
-if (!backend_run) {
->
-virtio_set_status(vdev, vdev->status);
->
-}
->
-}
->
->
-Vdev's parent_bus is NULL, so qdev_get_parent_bus(DEVICE(vdev)) will crash.
->
-virtio_vmstate_change is added to the list vm_change_state_head at
->
-virtio_blk_device_realize(virtio_init),
->
-but after hotplug virtio-blk failed, virtio_vmstate_change will not be
->
-removed from vm_change_state_head.
->
->
->
-I apply a patch as follews:
->
->
-diff --git a/hw/virtio/virtio.c b/hw/virtio/virtio.c
->
-index 5884ce3..ea532dc 100644
->
---- a/hw/virtio/virtio.c
->
-+++ b/hw/virtio/virtio.c
->
-@@ -2491,6 +2491,7 @@ static void virtio_device_realize(DeviceState *dev,
->
-Error **errp)
->
-virtio_bus_device_plugged(vdev, &err);
->
-if (err != NULL) {
->
-error_propagate(errp, err);
->
-+        vdc->unrealize(dev, NULL);
->
-return;
->
-}
-signature.asc
-Description:
-PGP signature
-
diff --git a/classification_output/04/other/57756589 b/classification_output/04/other/57756589
deleted file mode 100644
index 7a168faac..000000000
--- a/classification_output/04/other/57756589
+++ /dev/null
@@ -1,1429 +0,0 @@
-other: 0.899
-mistranslation: 0.861
-instruction: 0.854
-device: 0.853
-vnc: 0.851
-assembly: 0.841
-semantic: 0.835
-boot: 0.827
-graphic: 0.824
-network: 0.822
-socket: 0.820
-KVM: 0.817
-
-[Qemu-devel] 答复: Re:   答复: Re:  答复: Re: [BUG]COLO failover hang
-
-amost like wikiï¼but panic in Primary Node.
-
-
-
-
-setp:
-
-1 
-
-Primary Node.
-
-x86_64-softmmu/qemu-system-x86_64 -enable-kvm -boot c -m 2048 -smp 2 -qmp stdio 
--vnc :7 -name primary -cpu qemu64,+kvmclock -device piix3-usb-uhci -usb 
--usbdevice tablet\
-
-  -drive 
-if=virtio,id=colo-disk0,driver=quorum,read-pattern=fifo,vote-threshold=1,
-
-   
-children.0.file.filename=/mnt/sdd/pure_IMG/linux/redhat/rhel_6.5_64_2U_ide,children.0.driver=qcow2
- -S \
-
-  -netdev 
-tap,id=hn1,vhost=off,script=/etc/qemu-ifup2,downscript=/etc/qemu-ifdown2 \
-
-  -device e1000,id=e1,netdev=hn1,mac=52:a4:00:12:78:67 \
-
-  -netdev 
-tap,id=hn0,vhost=off,script=/etc/qemu-ifup,downscript=/etc/qemu-ifdown \
-
-  -device e1000,id=e0,netdev=hn0,mac=52:a4:00:12:78:66 \
-
-  -chardev socket,id=mirror0,host=9.61.1.8,port=9003,server,nowait -chardev 
-socket,id=compare1,host=9.61.1.8,port=9004,server,nowait \
-
-  -chardev socket,id=compare0,host=9.61.1.8,port=9001,server,nowait -chardev 
-socket,id=compare0-0,host=9.61.1.8,port=9001 \
-
-  -chardev socket,id=compare_out,host=9.61.1.8,port=9005,server,nowait \
-
-  -chardev socket,id=compare_out0,host=9.61.1.8,port=9005 \
-
-  -object filter-mirror,id=m0,netdev=hn0,queue=tx,outdev=mirror0 \
-
-  -object filter-redirector,netdev=hn0,id=redire0,queue=rx,indev=compare_out 
--object filter-redirector,netdev=hn0,id=redire1,queue=rx,outdev=compare0 \
-
-  -object 
-colo-compare,id=comp0,primary_in=compare0-0,secondary_in=compare1,outdev=compare_out0
-
-2 Second node:
-
-x86_64-softmmu/qemu-system-x86_64 -boot c -m 2048 -smp 2 -qmp stdio -vnc :7 
--name secondary -enable-kvm -cpu qemu64,+kvmclock -device piix3-usb-uhci -usb 
--usbdevice tablet\
-
-  -drive 
-if=none,id=colo-disk0,file.filename=/mnt/sdd/pure_IMG/linux/redhat/rhel_6.5_64_2U_ide,driver=qcow2,node-name=node0
- \
-
-  -drive 
-if=virtio,id=active-disk0,driver=replication,mode=secondary,file.driver=qcow2,top-id=active-disk0,file.file.filename=/mnt/ramfstest/active_disk.img,file.backing.driver=qcow2,file.backing.file.filename=/mnt/ramfstest/hidden_disk.img,file.backing.backing=colo-disk0
-  \
-
-   -netdev 
-tap,id=hn1,vhost=off,script=/etc/qemu-ifup2,downscript=/etc/qemu-ifdown2 \
-
-  -device e1000,id=e1,netdev=hn1,mac=52:a4:00:12:78:67 \
-
-  -netdev 
-tap,id=hn0,vhost=off,script=/etc/qemu-ifup,downscript=/etc/qemu-ifdown \
-
-  -device e1000,netdev=hn0,mac=52:a4:00:12:78:66 -chardev 
-socket,id=red0,host=9.61.1.8,port=9003 \
-
-  -chardev socket,id=red1,host=9.61.1.8,port=9004 \
-
-  -object filter-redirector,id=f1,netdev=hn0,queue=tx,indev=red0 \
-
-  -object filter-redirector,id=f2,netdev=hn0,queue=rx,outdev=red1 \
-
-  -object filter-rewriter,id=rew0,netdev=hn0,queue=all -incoming tcp:0:8888
-
-3  Secondary node:
-
-{'execute':'qmp_capabilities'}
-
-{ 'execute': 'nbd-server-start',
-
-  'arguments': {'addr': {'type': 'inet', 'data': {'host': '9.61.1.7', 'port': 
-'8889'} } }
-
-}
-
-{'execute': 'nbd-server-add', 'arguments': {'device': 'colo-disk0', 'writable': 
-true } }
-
-4:Primary Nodeï¼
-
-{'execute':'qmp_capabilities'}
-
-
-{ 'execute': 'human-monitor-command',
-
-  'arguments': {'command-line': 'drive_add -n buddy 
-driver=replication,mode=primary,file.driver=nbd,file.host=9.61.1.7,file.port=8889,file.export=colo-disk0,node-name=node0'}}
-
-{ 'execute':'x-blockdev-change', 'arguments':{'parent': 'colo-disk0', 'node': 
-'node0' } }
-
-{ 'execute': 'migrate-set-capabilities',
-
-      'arguments': {'capabilities': [ {'capability': 'x-colo', 'state': true } 
-] } }
-
-{ 'execute': 'migrate', 'arguments': {'uri': 'tcp:9.61.1.7:8888' } }
-
-
-
-
-then can see two runing VMs, whenever you make changes to PVM, SVM will be 
-synced.  
-
-
-
-
-5ï¼Primary Nodeï¼
-
-echo c ï¼ /proc/sysrq-trigger
-
-
-
-
-ï¼ï¼Secondary node:
-
-{ 'execute': 'nbd-server-stop' }
-
-{ "execute": "x-colo-lost-heartbeat" }
-
-
-
-
-then can see the Secondary node hang at recvmsg recvmsg .
-
-
-
-
-
-
-
-
-
-
-
-
-åå§é®ä»¶
-
-
-
-åä»¶äººï¼ address@hidden
-æ¶ä»¶äººï¼çå¹¿10165992 address@hidden
-æéäººï¼ address@hidden address@hidden
-æ¥ æ ï¼2017å¹´03æ21æ¥ 16:27
-ä¸» é¢ ï¼Re: [Qemu-devel]  çå¤: Re:  çå¤: Re: [BUG]COLO failover hang
-
-
-
-
-
-Hi,
-
-On 2017/3/21 16:10, address@hidden wrote:
-ï¼ Thank youã
-ï¼
-ï¼ I have test areadyã
-ï¼
-ï¼ When the Primary Node panic,the Secondary Node qemu hang at the same placeã
-ï¼
-ï¼ Incorrding
-http://wiki.qemu-project.org/Features/COLO
-ï¼kill Primary Node qemu 
-will not produce the problem,but Primary Node panic canã
-ï¼
-ï¼ I think due to the feature of channel does not support 
-QIO_CHANNEL_FEATURE_SHUTDOWN.
-ï¼
-ï¼
-
-Yes, you are right, when we do failover for primary/secondary VM, we will 
-shutdown the related
-fd in case it is stuck in the read/write fd.
-
-It seems that you didn't follow the above introduction exactly to do the test. 
-Could you
-share your test procedures ? Especially the commands used in the test.
-
-Thanks,
-Hailiang
-
-ï¼ when failover,channel_shutdown could not shut down the channel.
-ï¼
-ï¼
-ï¼ so the colo_process_incoming_thread will hang at recvmsg.
-ï¼
-ï¼
-ï¼ I test a patch:
-ï¼
-ï¼
-ï¼ diff --git a/migration/socket.c b/migration/socket.c
-ï¼
-ï¼
-ï¼ index 13966f1..d65a0ea 100644
-ï¼
-ï¼
-ï¼ --- a/migration/socket.c
-ï¼
-ï¼
-ï¼ +++ b/migration/socket.c
-ï¼
-ï¼
-ï¼ @@ -147,8 +147,9 @@ static gboolean 
-socket_accept_incoming_migration(QIOChannel *ioc,
-ï¼
-ï¼
-ï¼       }
-ï¼
-ï¼
-ï¼
-ï¼
-ï¼
-ï¼       trace_migration_socket_incoming_accepted()
-ï¼
-ï¼
-ï¼
-ï¼
-ï¼
-ï¼       qio_channel_set_name(QIO_CHANNEL(sioc), "migration-socket-incoming")
-ï¼
-ï¼
-ï¼ +    qio_channel_set_feature(QIO_CHANNEL(sioc), QIO_CHANNEL_FEATURE_SHUTDOWN)
-ï¼
-ï¼
-ï¼       migration_channel_process_incoming(migrate_get_current(),
-ï¼
-ï¼
-ï¼                                          QIO_CHANNEL(sioc))
-ï¼
-ï¼
-ï¼       object_unref(OBJECT(sioc))
-ï¼
-ï¼
-ï¼
-ï¼
-ï¼ My test will not hang any more.
-ï¼
-ï¼
-ï¼
-ï¼
-ï¼
-ï¼
-ï¼
-ï¼
-ï¼
-ï¼
-ï¼
-ï¼
-ï¼
-ï¼
-ï¼
-ï¼
-ï¼
-ï¼ åå§é®ä»¶
-ï¼
-ï¼
-ï¼
-ï¼ åä»¶äººï¼ address@hidden
-ï¼ æ¶ä»¶äººï¼çå¹¿10165992 address@hidden
-ï¼ æéäººï¼ address@hidden address@hidden
-ï¼ æ¥ æ ï¼2017å¹´03æ21æ¥ 15:58
-ï¼ ä¸» é¢ ï¼Re: [Qemu-devel]  çå¤: Re:  [BUG]COLO failover hang
-ï¼
-ï¼
-ï¼
-ï¼
-ï¼
-ï¼ Hi,Wang.
-ï¼
-ï¼ You can test this branch:
-ï¼
-ï¼
-https://github.com/coloft/qemu/tree/colo-v5.1-developing-COLO-frame-v21-with-shared-disk
-ï¼
-ï¼ and please follow wiki ensure your own configuration correctly.
-ï¼
-ï¼
-http://wiki.qemu-project.org/Features/COLO
-ï¼
-ï¼
-ï¼ Thanks
-ï¼
-ï¼ Zhang Chen
-ï¼
-ï¼
-ï¼ On 03/21/2017 03:27 PM, address@hidden wrote:
-ï¼ ï¼
-ï¼ ï¼ hi.
-ï¼ ï¼
-ï¼ ï¼ I test the git qemu master have the same problem.
-ï¼ ï¼
-ï¼ ï¼ (gdb) bt
-ï¼ ï¼
-ï¼ ï¼ #0  qio_channel_socket_readv (ioc=0x7f65911b4e50, iov=0x7f64ef3fd880,
-ï¼ ï¼ niov=1, fds=0x0, nfds=0x0, errp=0x0) at io/channel-socket.c:461
-ï¼ ï¼
-ï¼ ï¼ #1  0x00007f658e4aa0c2 in qio_channel_read
-ï¼ ï¼ (address@hidden, address@hidden "",
-ï¼ ï¼ address@hidden, address@hidden) at io/channel.c:114
-ï¼ ï¼
-ï¼ ï¼ #2  0x00007f658e3ea990 in channel_get_buffer (opaque=ï¼optimized outï¼,
-ï¼ ï¼ buf=0x7f65907cb838 "", pos=ï¼optimized outï¼, size=32768) at
-ï¼ ï¼ migration/qemu-file-channel.c:78
-ï¼ ï¼
-ï¼ ï¼ #3  0x00007f658e3e97fc in qemu_fill_buffer (f=0x7f65907cb800) at
-ï¼ ï¼ migration/qemu-file.c:295
-ï¼ ï¼
-ï¼ ï¼ #4  0x00007f658e3ea2e1 in qemu_peek_byte (address@hidden,
-ï¼ ï¼ address@hidden) at migration/qemu-file.c:555
-ï¼ ï¼
-ï¼ ï¼ #5  0x00007f658e3ea34b in qemu_get_byte (address@hidden) at
-ï¼ ï¼ migration/qemu-file.c:568
-ï¼ ï¼
-ï¼ ï¼ #6  0x00007f658e3ea552 in qemu_get_be32 (address@hidden) at
-ï¼ ï¼ migration/qemu-file.c:648
-ï¼ ï¼
-ï¼ ï¼ #7  0x00007f658e3e66e5 in colo_receive_message (f=0x7f65907cb800,
-ï¼ ï¼ address@hidden) at migration/colo.c:244
-ï¼ ï¼
-ï¼ ï¼ #8  0x00007f658e3e681e in colo_receive_check_message (f=ï¼optimized
-ï¼ ï¼ outï¼, address@hidden,
-ï¼ ï¼ address@hidden)
-ï¼ ï¼
-ï¼ ï¼     at migration/colo.c:264
-ï¼ ï¼
-ï¼ ï¼ #9  0x00007f658e3e740e in colo_process_incoming_thread
-ï¼ ï¼ (opaque=0x7f658eb30360 ï¼mis_current.31286ï¼) at migration/colo.c:577
-ï¼ ï¼
-ï¼ ï¼ #10 0x00007f658be09df3 in start_thread () from /lib64/libpthread.so.0
-ï¼ ï¼
-ï¼ ï¼ #11 0x00007f65881983ed in clone () from /lib64/libc.so.6
-ï¼ ï¼
-ï¼ ï¼ (gdb) p ioc-ï¼name
-ï¼ ï¼
-ï¼ ï¼ $2 = 0x7f658ff7d5c0 "migration-socket-incoming"
-ï¼ ï¼
-ï¼ ï¼ (gdb) p ioc-ï¼features        Do not support QIO_CHANNEL_FEATURE_SHUTDOWN
-ï¼ ï¼
-ï¼ ï¼ $3 = 0
-ï¼ ï¼
-ï¼ ï¼
-ï¼ ï¼ (gdb) bt
-ï¼ ï¼
-ï¼ ï¼ #0  socket_accept_incoming_migration (ioc=0x7fdcceeafa90,
-ï¼ ï¼ condition=G_IO_IN, opaque=0x7fdcceeafa90) at migration/socket.c:137
-ï¼ ï¼
-ï¼ ï¼ #1  0x00007fdcc6966350 in g_main_dispatch (context=ï¼optimized outï¼) at
-ï¼ ï¼ gmain.c:3054
-ï¼ ï¼
-ï¼ ï¼ #2  g_main_context_dispatch (context=ï¼optimized outï¼,
-ï¼ ï¼ address@hidden) at gmain.c:3630
-ï¼ ï¼
-ï¼ ï¼ #3  0x00007fdccb8a6dcc in glib_pollfds_poll () at util/main-loop.c:213
-ï¼ ï¼
-ï¼ ï¼ #4  os_host_main_loop_wait (timeout=ï¼optimized outï¼) at
-ï¼ ï¼ util/main-loop.c:258
-ï¼ ï¼
-ï¼ ï¼ #5  main_loop_wait (address@hidden) at
-ï¼ ï¼ util/main-loop.c:506
-ï¼ ï¼
-ï¼ ï¼ #6  0x00007fdccb526187 in main_loop () at vl.c:1898
-ï¼ ï¼
-ï¼ ï¼ #7  main (argc=ï¼optimized outï¼, argv=ï¼optimized outï¼, envp=ï¼optimized
-ï¼ ï¼ outï¼) at vl.c:4709
-ï¼ ï¼
-ï¼ ï¼ (gdb) p ioc-ï¼features
-ï¼ ï¼
-ï¼ ï¼ $1 = 6
-ï¼ ï¼
-ï¼ ï¼ (gdb) p ioc-ï¼name
-ï¼ ï¼
-ï¼ ï¼ $2 = 0x7fdcce1b1ab0 "migration-socket-listener"
-ï¼ ï¼
-ï¼ ï¼
-ï¼ ï¼ May be socket_accept_incoming_migration should
-ï¼ ï¼ call qio_channel_set_feature(ioc, QIO_CHANNEL_FEATURE_SHUTDOWN)??
-ï¼ ï¼
-ï¼ ï¼
-ï¼ ï¼ thank you.
-ï¼ ï¼
-ï¼ ï¼
-ï¼ ï¼
-ï¼ ï¼
-ï¼ ï¼
-ï¼ ï¼ åå§é®ä»¶
-ï¼ ï¼ address@hidden
-ï¼ ï¼ address@hidden
-ï¼ ï¼ address@hidden@huawei.comï¼
-ï¼ ï¼ *æ¥ æ ï¼*2017å¹´03æ16æ¥ 14:46
-ï¼ ï¼ *ä¸» é¢ ï¼**Re: [Qemu-devel] COLO failover hang*
-ï¼ ï¼
-ï¼ ï¼
-ï¼ ï¼
-ï¼ ï¼
-ï¼ ï¼ On 03/15/2017 05:06 PM, wangguang wrote:
-ï¼ ï¼ ï¼   am testing QEMU COLO feature described here [QEMU
-ï¼ ï¼ ï¼ Wiki](
-http://wiki.qemu-project.org/Features/COLO
-).
-ï¼ ï¼ ï¼
-ï¼ ï¼ ï¼ When the Primary Node panic,the Secondary Node qemu hang.
-ï¼ ï¼ ï¼ hang at recvmsg in qio_channel_socket_readv.
-ï¼ ï¼ ï¼ And  I run  { 'execute': 'nbd-server-stop' } and { "execute":
-ï¼ ï¼ ï¼ "x-colo-lost-heartbeat" } in Secondary VM's
-ï¼ ï¼ ï¼ monitor,the  Secondary Node qemu still hang at recvmsg .
-ï¼ ï¼ ï¼
-ï¼ ï¼ ï¼ I found that the colo in qemu is not complete yet.
-ï¼ ï¼ ï¼ Do the colo have any plan for development?
-ï¼ ï¼
-ï¼ ï¼ Yes, We are developing. You can see some of patch we pushing.
-ï¼ ï¼
-ï¼ ï¼ ï¼ Has anyone ever run it successfully? Any help is appreciated!
-ï¼ ï¼
-ï¼ ï¼ In our internal version can run it successfully,
-ï¼ ï¼ The failover detail you can ask Zhanghailiang for help.
-ï¼ ï¼ Next time if you have some question about COLO,
-ï¼ ï¼ please cc me and zhanghailiang address@hidden
-ï¼ ï¼
-ï¼ ï¼
-ï¼ ï¼ Thanks
-ï¼ ï¼ Zhang Chen
-ï¼ ï¼
-ï¼ ï¼
-ï¼ ï¼ ï¼
-ï¼ ï¼ ï¼
-ï¼ ï¼ ï¼
-ï¼ ï¼ ï¼ centos7.2+qemu2.7.50
-ï¼ ï¼ ï¼ (gdb) bt
-ï¼ ï¼ ï¼ #0  0x00007f3e00cc86ad in recvmsg () from /lib64/libpthread.so.0
-ï¼ ï¼ ï¼ #1  0x00007f3e0332b738 in qio_channel_socket_readv (ioc=ï¼optimized outï¼,
-ï¼ ï¼ ï¼ iov=ï¼optimized outï¼, niov=ï¼optimized outï¼, fds=0x0, nfds=0x0, errp=0x0) at
-ï¼ ï¼ ï¼ io/channel-socket.c:497
-ï¼ ï¼ ï¼ #2  0x00007f3e03329472 in qio_channel_read (address@hidden,
-ï¼ ï¼ ï¼ address@hidden "", address@hidden,
-ï¼ ï¼ ï¼ address@hidden) at io/channel.c:97
-ï¼ ï¼ ï¼ #3  0x00007f3e032750e0 in channel_get_buffer (opaque=ï¼optimized outï¼,
-ï¼ ï¼ ï¼ buf=0x7f3e05910f38 "", pos=ï¼optimized outï¼, size=32768) at
-ï¼ ï¼ ï¼ migration/qemu-file-channel.c:78
-ï¼ ï¼ ï¼ #4  0x00007f3e0327412c in qemu_fill_buffer (f=0x7f3e05910f00) at
-ï¼ ï¼ ï¼ migration/qemu-file.c:257
-ï¼ ï¼ ï¼ #5  0x00007f3e03274a41 in qemu_peek_byte (address@hidden,
-ï¼ ï¼ ï¼ address@hidden) at migration/qemu-file.c:510
-ï¼ ï¼ ï¼ #6  0x00007f3e03274aab in qemu_get_byte (address@hidden) at
-ï¼ ï¼ ï¼ migration/qemu-file.c:523
-ï¼ ï¼ ï¼ #7  0x00007f3e03274cb2 in qemu_get_be32 (address@hidden) at
-ï¼ ï¼ ï¼ migration/qemu-file.c:603
-ï¼ ï¼ ï¼ #8  0x00007f3e03271735 in colo_receive_message (f=0x7f3e05910f00,
-ï¼ ï¼ ï¼ address@hidden) at migration/colo..c:215
-ï¼ ï¼ ï¼ #9  0x00007f3e0327250d in colo_wait_handle_message (errp=0x7f3d62bfaa48,
-ï¼ ï¼ ï¼ checkpoint_request=ï¼synthetic pointerï¼, f=ï¼optimized outï¼) at
-ï¼ ï¼ ï¼ migration/colo.c:546
-ï¼ ï¼ ï¼ #10 colo_process_incoming_thread (opaque=0x7f3e067245e0) at
-ï¼ ï¼ ï¼ migration/colo.c:649
-ï¼ ï¼ ï¼ #11 0x00007f3e00cc1df3 in start_thread () from /lib64/libpthread.so.0
-ï¼ ï¼ ï¼ #12 0x00007f3dfc9c03ed in clone () from /lib64/libc.so.6
-ï¼ ï¼ ï¼
-ï¼ ï¼ ï¼
-ï¼ ï¼ ï¼
-ï¼ ï¼ ï¼
-ï¼ ï¼ ï¼
-ï¼ ï¼ ï¼ --
-ï¼ ï¼ ï¼ View this message in context:
-http://qemu.11.n7.nabble.com/COLO-failover-hang-tp473250.html
-ï¼ ï¼ ï¼ Sent from the Developer mailing list archive at Nabble.com.
-ï¼ ï¼ ï¼
-ï¼ ï¼ ï¼
-ï¼ ï¼ ï¼
-ï¼ ï¼ ï¼
-ï¼ ï¼
-ï¼ ï¼ --
-ï¼ ï¼ Thanks
-ï¼ ï¼ Zhang Chen
-ï¼ ï¼
-ï¼ ï¼
-ï¼ ï¼
-ï¼ ï¼
-ï¼ ï¼
-ï¼
-
-diff --git a/migration/socket.c b/migration/socket.c
-
-
-index 13966f1..d65a0ea 100644
-
-
---- a/migration/socket.c
-
-
-+++ b/migration/socket.c
-
-
-@@ -147,8 +147,9 @@ static gboolean socket_accept_incoming_migration(QIOChannel 
-*ioc,
-
-
-     }
-
-
- 
-
-
-     trace_migration_socket_incoming_accepted()
-
-
-    
-
-
-     qio_channel_set_name(QIO_CHANNEL(sioc), "migration-socket-incoming")
-
-
-+    qio_channel_set_feature(QIO_CHANNEL(sioc), QIO_CHANNEL_FEATURE_SHUTDOWN)
-
-
-     migration_channel_process_incoming(migrate_get_current(),
-
-
-                                        QIO_CHANNEL(sioc))
-
-
-     object_unref(OBJECT(sioc))
-
-
-
-
-Is this patch ok? 
-
-I have test it . The test could not hang any more.
-
-
-
-
-
-
-
-
-
-
-
-
-åå§é®ä»¶
-
-
-
-åä»¶äººï¼ address@hidden
-æ¶ä»¶äººï¼ address@hidden address@hidden
-æéäººï¼ address@hidden address@hidden address@hidden
-æ¥ æ ï¼2017å¹´03æ22æ¥ 09:11
-ä¸» é¢ ï¼Re: [Qemu-devel]  çå¤: Re:  çå¤: Re: [BUG]COLO failover hang
-
-
-
-
-
-On 2017/3/21 19:56, Dr. David Alan Gilbert wrote:
-ï¼ * Hailiang Zhang (address@hidden) wrote:
-ï¼ï¼ Hi,
-ï¼ï¼
-ï¼ï¼ Thanks for reporting this, and i confirmed it in my test, and it is a bug.
-ï¼ï¼
-ï¼ï¼ Though we tried to call qemu_file_shutdown() to shutdown the related fd, in
-ï¼ï¼ case COLO thread/incoming thread is stuck in read/write() while do failover,
-ï¼ï¼ but it didn't take effect, because all the fd used by COLO (also migration)
-ï¼ï¼ has been wrapped by qio channel, and it will not call the shutdown API if
-ï¼ï¼ we didn't qio_channel_set_feature(QIO_CHANNEL(sioc), 
-QIO_CHANNEL_FEATURE_SHUTDOWN).
-ï¼ï¼
-ï¼ï¼ Cc: Dr. David Alan Gilbert address@hidden
-ï¼ï¼
-ï¼ï¼ I doubted migration cancel has the same problem, it may be stuck in write()
-ï¼ï¼ if we tried to cancel migration.
-ï¼ï¼
-ï¼ï¼ void fd_start_outgoing_migration(MigrationState *s, const char *fdname, 
-Error **errp)
-ï¼ï¼ {
-ï¼ï¼      qio_channel_set_name(QIO_CHANNEL(ioc), "migration-fd-outgoing")
-ï¼ï¼      migration_channel_connect(s, ioc, NULL)
-ï¼ï¼      ... ...
-ï¼ï¼ We didn't call qio_channel_set_feature(QIO_CHANNEL(sioc), 
-QIO_CHANNEL_FEATURE_SHUTDOWN) above,
-ï¼ï¼ and the
-ï¼ï¼ migrate_fd_cancel()
-ï¼ï¼ {
-ï¼ï¼   ... ...
-ï¼ï¼      if (s-ï¼state == MIGRATION_STATUS_CANCELLING && f) {
-ï¼ï¼          qemu_file_shutdown(f)  --ï¼ This will not take effect. No ?
-ï¼ï¼      }
-ï¼ï¼ }
-ï¼
-ï¼ (cc'd in Daniel Berrange).
-ï¼ I see that we call qio_channel_set_feature(ioc, QIO_CHANNEL_FEATURE_SHUTDOWN) 
-at the
-ï¼ top of qio_channel_socket_new  so I think that's safe isn't it?
-ï¼
-
-Hmm, you are right, this problem is only exist for the migration incoming fd, 
-thanks.
-
-ï¼ Dave
-ï¼
-ï¼ï¼ Thanks,
-ï¼ï¼ Hailiang
-ï¼ï¼
-ï¼ï¼ On 2017/3/21 16:10, address@hidden wrote:
-ï¼ï¼ï¼ Thank youã
-ï¼ï¼ï¼
-ï¼ï¼ï¼ I have test areadyã
-ï¼ï¼ï¼
-ï¼ï¼ï¼ When the Primary Node panic,the Secondary Node qemu hang at the same placeã
-ï¼ï¼ï¼
-ï¼ï¼ï¼ Incorrding
-http://wiki.qemu-project.org/Features/COLO
-ï¼kill Primary Node 
-qemu will not produce the problem,but Primary Node panic canã
-ï¼ï¼ï¼
-ï¼ï¼ï¼ I think due to the feature of channel does not support 
-QIO_CHANNEL_FEATURE_SHUTDOWN.
-ï¼ï¼ï¼
-ï¼ï¼ï¼
-ï¼ï¼ï¼ when failover,channel_shutdown could not shut down the channel.
-ï¼ï¼ï¼
-ï¼ï¼ï¼
-ï¼ï¼ï¼ so the colo_process_incoming_thread will hang at recvmsg.
-ï¼ï¼ï¼
-ï¼ï¼ï¼
-ï¼ï¼ï¼ I test a patch:
-ï¼ï¼ï¼
-ï¼ï¼ï¼
-ï¼ï¼ï¼ diff --git a/migration/socket.c b/migration/socket.c
-ï¼ï¼ï¼
-ï¼ï¼ï¼
-ï¼ï¼ï¼ index 13966f1..d65a0ea 100644
-ï¼ï¼ï¼
-ï¼ï¼ï¼
-ï¼ï¼ï¼ --- a/migration/socket.c
-ï¼ï¼ï¼
-ï¼ï¼ï¼
-ï¼ï¼ï¼ +++ b/migration/socket.c
-ï¼ï¼ï¼
-ï¼ï¼ï¼
-ï¼ï¼ï¼ @@ -147,8 +147,9 @@ static gboolean 
-socket_accept_incoming_migration(QIOChannel *ioc,
-ï¼ï¼ï¼
-ï¼ï¼ï¼
-ï¼ï¼ï¼        }
-ï¼ï¼ï¼
-ï¼ï¼ï¼
-ï¼ï¼ï¼
-ï¼ï¼ï¼
-ï¼ï¼ï¼
-ï¼ï¼ï¼        trace_migration_socket_incoming_accepted()
-ï¼ï¼ï¼
-ï¼ï¼ï¼
-ï¼ï¼ï¼
-ï¼ï¼ï¼
-ï¼ï¼ï¼
-ï¼ï¼ï¼        qio_channel_set_name(QIO_CHANNEL(sioc), "migration-socket-incoming")
-ï¼ï¼ï¼
-ï¼ï¼ï¼
-ï¼ï¼ï¼ +    qio_channel_set_feature(QIO_CHANNEL(sioc), 
-QIO_CHANNEL_FEATURE_SHUTDOWN)
-ï¼ï¼ï¼
-ï¼ï¼ï¼
-ï¼ï¼ï¼        migration_channel_process_incoming(migrate_get_current(),
-ï¼ï¼ï¼
-ï¼ï¼ï¼
-ï¼ï¼ï¼                                           QIO_CHANNEL(sioc))
-ï¼ï¼ï¼
-ï¼ï¼ï¼
-ï¼ï¼ï¼        object_unref(OBJECT(sioc))
-ï¼ï¼ï¼
-ï¼ï¼ï¼
-ï¼ï¼ï¼
-ï¼ï¼ï¼
-ï¼ï¼ï¼ My test will not hang any more.
-ï¼ï¼ï¼
-ï¼ï¼ï¼
-ï¼ï¼ï¼
-ï¼ï¼ï¼
-ï¼ï¼ï¼
-ï¼ï¼ï¼
-ï¼ï¼ï¼
-ï¼ï¼ï¼
-ï¼ï¼ï¼
-ï¼ï¼ï¼
-ï¼ï¼ï¼
-ï¼ï¼ï¼
-ï¼ï¼ï¼
-ï¼ï¼ï¼
-ï¼ï¼ï¼
-ï¼ï¼ï¼
-ï¼ï¼ï¼
-ï¼ï¼ï¼ åå§é®ä»¶
-ï¼ï¼ï¼
-ï¼ï¼ï¼
-ï¼ï¼ï¼
-ï¼ï¼ï¼ åä»¶äººï¼ address@hidden
-ï¼ï¼ï¼ æ¶ä»¶äººï¼çå¹¿10165992 address@hidden
-ï¼ï¼ï¼ æéäººï¼ address@hidden address@hidden
-ï¼ï¼ï¼ æ¥ æ ï¼2017å¹´03æ21æ¥ 15:58
-ï¼ï¼ï¼ ä¸» é¢ ï¼Re: [Qemu-devel]  çå¤: Re:  [BUG]COLO failover hang
-ï¼ï¼ï¼
-ï¼ï¼ï¼
-ï¼ï¼ï¼
-ï¼ï¼ï¼
-ï¼ï¼ï¼
-ï¼ï¼ï¼ Hi,Wang.
-ï¼ï¼ï¼
-ï¼ï¼ï¼ You can test this branch:
-ï¼ï¼ï¼
-ï¼ï¼ï¼
-https://github.com/coloft/qemu/tree/colo-v5.1-developing-COLO-frame-v21-with-shared-disk
-ï¼ï¼ï¼
-ï¼ï¼ï¼ and please follow wiki ensure your own configuration correctly.
-ï¼ï¼ï¼
-ï¼ï¼ï¼
-http://wiki.qemu-project.org/Features/COLO
-ï¼ï¼ï¼
-ï¼ï¼ï¼
-ï¼ï¼ï¼ Thanks
-ï¼ï¼ï¼
-ï¼ï¼ï¼ Zhang Chen
-ï¼ï¼ï¼
-ï¼ï¼ï¼
-ï¼ï¼ï¼ On 03/21/2017 03:27 PM, address@hidden wrote:
-ï¼ï¼ï¼ ï¼
-ï¼ï¼ï¼ ï¼ hi.
-ï¼ï¼ï¼ ï¼
-ï¼ï¼ï¼ ï¼ I test the git qemu master have the same problem.
-ï¼ï¼ï¼ ï¼
-ï¼ï¼ï¼ ï¼ (gdb) bt
-ï¼ï¼ï¼ ï¼
-ï¼ï¼ï¼ ï¼ #0  qio_channel_socket_readv (ioc=0x7f65911b4e50, iov=0x7f64ef3fd880,
-ï¼ï¼ï¼ ï¼ niov=1, fds=0x0, nfds=0x0, errp=0x0) at io/channel-socket.c:461
-ï¼ï¼ï¼ ï¼
-ï¼ï¼ï¼ ï¼ #1  0x00007f658e4aa0c2 in qio_channel_read
-ï¼ï¼ï¼ ï¼ (address@hidden, address@hidden "",
-ï¼ï¼ï¼ ï¼ address@hidden, address@hidden) at io/channel.c:114
-ï¼ï¼ï¼ ï¼
-ï¼ï¼ï¼ ï¼ #2  0x00007f658e3ea990 in channel_get_buffer (opaque=ï¼optimized outï¼,
-ï¼ï¼ï¼ ï¼ buf=0x7f65907cb838 "", pos=ï¼optimized outï¼, size=32768) at
-ï¼ï¼ï¼ ï¼ migration/qemu-file-channel.c:78
-ï¼ï¼ï¼ ï¼
-ï¼ï¼ï¼ ï¼ #3  0x00007f658e3e97fc in qemu_fill_buffer (f=0x7f65907cb800) at
-ï¼ï¼ï¼ ï¼ migration/qemu-file.c:295
-ï¼ï¼ï¼ ï¼
-ï¼ï¼ï¼ ï¼ #4  0x00007f658e3ea2e1 in qemu_peek_byte (address@hidden,
-ï¼ï¼ï¼ ï¼ address@hidden) at migration/qemu-file.c:555
-ï¼ï¼ï¼ ï¼
-ï¼ï¼ï¼ ï¼ #5  0x00007f658e3ea34b in qemu_get_byte (address@hidden) at
-ï¼ï¼ï¼ ï¼ migration/qemu-file.c:568
-ï¼ï¼ï¼ ï¼
-ï¼ï¼ï¼ ï¼ #6  0x00007f658e3ea552 in qemu_get_be32 (address@hidden) at
-ï¼ï¼ï¼ ï¼ migration/qemu-file.c:648
-ï¼ï¼ï¼ ï¼
-ï¼ï¼ï¼ ï¼ #7  0x00007f658e3e66e5 in colo_receive_message (f=0x7f65907cb800,
-ï¼ï¼ï¼ ï¼ address@hidden) at migration/colo.c:244
-ï¼ï¼ï¼ ï¼
-ï¼ï¼ï¼ ï¼ #8  0x00007f658e3e681e in colo_receive_check_message (f=ï¼optimized
-ï¼ï¼ï¼ ï¼ outï¼, address@hidden,
-ï¼ï¼ï¼ ï¼ address@hidden)
-ï¼ï¼ï¼ ï¼
-ï¼ï¼ï¼ ï¼     at migration/colo.c:264
-ï¼ï¼ï¼ ï¼
-ï¼ï¼ï¼ ï¼ #9  0x00007f658e3e740e in colo_process_incoming_thread
-ï¼ï¼ï¼ ï¼ (opaque=0x7f658eb30360 ï¼mis_current.31286ï¼) at migration/colo.c:577
-ï¼ï¼ï¼ ï¼
-ï¼ï¼ï¼ ï¼ #10 0x00007f658be09df3 in start_thread () from /lib64/libpthread.so.0
-ï¼ï¼ï¼ ï¼
-ï¼ï¼ï¼ ï¼ #11 0x00007f65881983ed in clone () from /lib64/libc.so.6
-ï¼ï¼ï¼ ï¼
-ï¼ï¼ï¼ ï¼ (gdb) p ioc-ï¼name
-ï¼ï¼ï¼ ï¼
-ï¼ï¼ï¼ ï¼ $2 = 0x7f658ff7d5c0 "migration-socket-incoming"
-ï¼ï¼ï¼ ï¼
-ï¼ï¼ï¼ ï¼ (gdb) p ioc-ï¼features        Do not support QIO_CHANNEL_FEATURE_SHUTDOWN
-ï¼ï¼ï¼ ï¼
-ï¼ï¼ï¼ ï¼ $3 = 0
-ï¼ï¼ï¼ ï¼
-ï¼ï¼ï¼ ï¼
-ï¼ï¼ï¼ ï¼ (gdb) bt
-ï¼ï¼ï¼ ï¼
-ï¼ï¼ï¼ ï¼ #0  socket_accept_incoming_migration (ioc=0x7fdcceeafa90,
-ï¼ï¼ï¼ ï¼ condition=G_IO_IN, opaque=0x7fdcceeafa90) at migration/socket.c:137
-ï¼ï¼ï¼ ï¼
-ï¼ï¼ï¼ ï¼ #1  0x00007fdcc6966350 in g_main_dispatch (context=ï¼optimized outï¼) at
-ï¼ï¼ï¼ ï¼ gmain.c:3054
-ï¼ï¼ï¼ ï¼
-ï¼ï¼ï¼ ï¼ #2  g_main_context_dispatch (context=ï¼optimized outï¼,
-ï¼ï¼ï¼ ï¼ address@hidden) at gmain.c:3630
-ï¼ï¼ï¼ ï¼
-ï¼ï¼ï¼ ï¼ #3  0x00007fdccb8a6dcc in glib_pollfds_poll () at util/main-loop.c:213
-ï¼ï¼ï¼ ï¼
-ï¼ï¼ï¼ ï¼ #4  os_host_main_loop_wait (timeout=ï¼optimized outï¼) at
-ï¼ï¼ï¼ ï¼ util/main-loop.c:258
-ï¼ï¼ï¼ ï¼
-ï¼ï¼ï¼ ï¼ #5  main_loop_wait (address@hidden) at
-ï¼ï¼ï¼ ï¼ util/main-loop.c:506
-ï¼ï¼ï¼ ï¼
-ï¼ï¼ï¼ ï¼ #6  0x00007fdccb526187 in main_loop () at vl.c:1898
-ï¼ï¼ï¼ ï¼
-ï¼ï¼ï¼ ï¼ #7  main (argc=ï¼optimized outï¼, argv=ï¼optimized outï¼, envp=ï¼optimized
-ï¼ï¼ï¼ ï¼ outï¼) at vl.c:4709
-ï¼ï¼ï¼ ï¼
-ï¼ï¼ï¼ ï¼ (gdb) p ioc-ï¼features
-ï¼ï¼ï¼ ï¼
-ï¼ï¼ï¼ ï¼ $1 = 6
-ï¼ï¼ï¼ ï¼
-ï¼ï¼ï¼ ï¼ (gdb) p ioc-ï¼name
-ï¼ï¼ï¼ ï¼
-ï¼ï¼ï¼ ï¼ $2 = 0x7fdcce1b1ab0 "migration-socket-listener"
-ï¼ï¼ï¼ ï¼
-ï¼ï¼ï¼ ï¼
-ï¼ï¼ï¼ ï¼ May be socket_accept_incoming_migration should
-ï¼ï¼ï¼ ï¼ call qio_channel_set_feature(ioc, QIO_CHANNEL_FEATURE_SHUTDOWN)??
-ï¼ï¼ï¼ ï¼
-ï¼ï¼ï¼ ï¼
-ï¼ï¼ï¼ ï¼ thank you.
-ï¼ï¼ï¼ ï¼
-ï¼ï¼ï¼ ï¼
-ï¼ï¼ï¼ ï¼
-ï¼ï¼ï¼ ï¼
-ï¼ï¼ï¼ ï¼
-ï¼ï¼ï¼ ï¼ åå§é®ä»¶
-ï¼ï¼ï¼ ï¼ address@hidden
-ï¼ï¼ï¼ ï¼ address@hidden
-ï¼ï¼ï¼ ï¼ address@hidden@huawei.comï¼
-ï¼ï¼ï¼ ï¼ *æ¥ æ ï¼*2017å¹´03æ16æ¥ 14:46
-ï¼ï¼ï¼ ï¼ *ä¸» é¢ ï¼**Re: [Qemu-devel] COLO failover hang*
-ï¼ï¼ï¼ ï¼
-ï¼ï¼ï¼ ï¼
-ï¼ï¼ï¼ ï¼
-ï¼ï¼ï¼ ï¼
-ï¼ï¼ï¼ ï¼ On 03/15/2017 05:06 PM, wangguang wrote:
-ï¼ï¼ï¼ ï¼ ï¼   am testing QEMU COLO feature described here [QEMU
-ï¼ï¼ï¼ ï¼ ï¼ Wiki](
-http://wiki.qemu-project.org/Features/COLO
-).
-ï¼ï¼ï¼ ï¼ ï¼
-ï¼ï¼ï¼ ï¼ ï¼ When the Primary Node panic,the Secondary Node qemu hang.
-ï¼ï¼ï¼ ï¼ ï¼ hang at recvmsg in qio_channel_socket_readv.
-ï¼ï¼ï¼ ï¼ ï¼ And  I run  { 'execute': 'nbd-server-stop' } and { "execute":
-ï¼ï¼ï¼ ï¼ ï¼ "x-colo-lost-heartbeat" } in Secondary VM's
-ï¼ï¼ï¼ ï¼ ï¼ monitor,the  Secondary Node qemu still hang at recvmsg .
-ï¼ï¼ï¼ ï¼ ï¼
-ï¼ï¼ï¼ ï¼ ï¼ I found that the colo in qemu is not complete yet.
-ï¼ï¼ï¼ ï¼ ï¼ Do the colo have any plan for development?
-ï¼ï¼ï¼ ï¼
-ï¼ï¼ï¼ ï¼ Yes, We are developing. You can see some of patch we pushing.
-ï¼ï¼ï¼ ï¼
-ï¼ï¼ï¼ ï¼ ï¼ Has anyone ever run it successfully? Any help is appreciated!
-ï¼ï¼ï¼ ï¼
-ï¼ï¼ï¼ ï¼ In our internal version can run it successfully,
-ï¼ï¼ï¼ ï¼ The failover detail you can ask Zhanghailiang for help.
-ï¼ï¼ï¼ ï¼ Next time if you have some question about COLO,
-ï¼ï¼ï¼ ï¼ please cc me and zhanghailiang address@hidden
-ï¼ï¼ï¼ ï¼
-ï¼ï¼ï¼ ï¼
-ï¼ï¼ï¼ ï¼ Thanks
-ï¼ï¼ï¼ ï¼ Zhang Chen
-ï¼ï¼ï¼ ï¼
-ï¼ï¼ï¼ ï¼
-ï¼ï¼ï¼ ï¼ ï¼
-ï¼ï¼ï¼ ï¼ ï¼
-ï¼ï¼ï¼ ï¼ ï¼
-ï¼ï¼ï¼ ï¼ ï¼ centos7.2+qemu2.7.50
-ï¼ï¼ï¼ ï¼ ï¼ (gdb) bt
-ï¼ï¼ï¼ ï¼ ï¼ #0  0x00007f3e00cc86ad in recvmsg () from /lib64/libpthread.so.0
-ï¼ï¼ï¼ ï¼ ï¼ #1  0x00007f3e0332b738 in qio_channel_socket_readv (ioc=ï¼optimized outï¼,
-ï¼ï¼ï¼ ï¼ ï¼ iov=ï¼optimized outï¼, niov=ï¼optimized outï¼, fds=0x0, nfds=0x0, errp=0x0) 
-at
-ï¼ï¼ï¼ ï¼ ï¼ io/channel-socket.c:497
-ï¼ï¼ï¼ ï¼ ï¼ #2  0x00007f3e03329472 in qio_channel_read (address@hidden,
-ï¼ï¼ï¼ ï¼ ï¼ address@hidden "", address@hidden,
-ï¼ï¼ï¼ ï¼ ï¼ address@hidden) at io/channel.c:97
-ï¼ï¼ï¼ ï¼ ï¼ #3  0x00007f3e032750e0 in channel_get_buffer (opaque=ï¼optimized outï¼,
-ï¼ï¼ï¼ ï¼ ï¼ buf=0x7f3e05910f38 "", pos=ï¼optimized outï¼, size=32768) at
-ï¼ï¼ï¼ ï¼ ï¼ migration/qemu-file-channel.c:78
-ï¼ï¼ï¼ ï¼ ï¼ #4  0x00007f3e0327412c in qemu_fill_buffer (f=0x7f3e05910f00) at
-ï¼ï¼ï¼ ï¼ ï¼ migration/qemu-file.c:257
-ï¼ï¼ï¼ ï¼ ï¼ #5  0x00007f3e03274a41 in qemu_peek_byte (address@hidden,
-ï¼ï¼ï¼ ï¼ ï¼ address@hidden) at migration/qemu-file.c:510
-ï¼ï¼ï¼ ï¼ ï¼ #6  0x00007f3e03274aab in qemu_get_byte (address@hidden) at
-ï¼ï¼ï¼ ï¼ ï¼ migration/qemu-file.c:523
-ï¼ï¼ï¼ ï¼ ï¼ #7  0x00007f3e03274cb2 in qemu_get_be32 (address@hidden) at
-ï¼ï¼ï¼ ï¼ ï¼ migration/qemu-file.c:603
-ï¼ï¼ï¼ ï¼ ï¼ #8  0x00007f3e03271735 in colo_receive_message (f=0x7f3e05910f00,
-ï¼ï¼ï¼ ï¼ ï¼ address@hidden) at migration/colo.c:215
-ï¼ï¼ï¼ ï¼ ï¼ #9  0x00007f3e0327250d in colo_wait_handle_message (errp=0x7f3d62bfaa48,
-ï¼ï¼ï¼ ï¼ ï¼ checkpoint_request=ï¼synthetic pointerï¼, f=ï¼optimized outï¼) at
-ï¼ï¼ï¼ ï¼ ï¼ migration/colo.c:546
-ï¼ï¼ï¼ ï¼ ï¼ #10 colo_process_incoming_thread (opaque=0x7f3e067245e0) at
-ï¼ï¼ï¼ ï¼ ï¼ migration/colo.c:649
-ï¼ï¼ï¼ ï¼ ï¼ #11 0x00007f3e00cc1df3 in start_thread () from /lib64/libpthread.so.0
-ï¼ï¼ï¼ ï¼ ï¼ #12 0x00007f3dfc9c03ed in clone () from /lib64/libc..so.6
-ï¼ï¼ï¼ ï¼ ï¼
-ï¼ï¼ï¼ ï¼ ï¼
-ï¼ï¼ï¼ ï¼ ï¼
-ï¼ï¼ï¼ ï¼ ï¼
-ï¼ï¼ï¼ ï¼ ï¼
-ï¼ï¼ï¼ ï¼ ï¼ --
-ï¼ï¼ï¼ ï¼ ï¼ View this message in context:
-http://qemu.11.n7.nabble.com/COLO-failover-hang-tp473250.html
-ï¼ï¼ï¼ ï¼ ï¼ Sent from the Developer mailing list archive at Nabble.com.
-ï¼ï¼ï¼ ï¼ ï¼
-ï¼ï¼ï¼ ï¼ ï¼
-ï¼ï¼ï¼ ï¼ ï¼
-ï¼ï¼ï¼ ï¼ ï¼
-ï¼ï¼ï¼ ï¼
-ï¼ï¼ï¼ ï¼ --
-ï¼ï¼ï¼ ï¼ Thanks
-ï¼ï¼ï¼ ï¼ Zhang Chen
-ï¼ï¼ï¼ ï¼
-ï¼ï¼ï¼ ï¼
-ï¼ï¼ï¼ ï¼
-ï¼ï¼ï¼ ï¼
-ï¼ï¼ï¼ ï¼
-ï¼ï¼ï¼
-ï¼ï¼
-ï¼ --
-ï¼ Dr. David Alan Gilbert / address@hidden / Manchester, UK
-ï¼
-ï¼ .
-ï¼
-
-Hi,
-
-On 2017/3/22 9:42, address@hidden wrote:
-diff --git a/migration/socket.c b/migration/socket.c
-
-
-index 13966f1..d65a0ea 100644
-
-
---- a/migration/socket.c
-
-
-+++ b/migration/socket.c
-
-
-@@ -147,8 +147,9 @@ static gboolean socket_accept_incoming_migration(QIOChannel 
-*ioc,
-
-
-      }
-
-
-
-
-
-      trace_migration_socket_incoming_accepted()
-
-
-
-
-
-      qio_channel_set_name(QIO_CHANNEL(sioc), "migration-socket-incoming")
-
-
-+    qio_channel_set_feature(QIO_CHANNEL(sioc), QIO_CHANNEL_FEATURE_SHUTDOWN)
-
-
-      migration_channel_process_incoming(migrate_get_current(),
-
-
-                                         QIO_CHANNEL(sioc))
-
-
-      object_unref(OBJECT(sioc))
-
-
-
-
-Is this patch ok?
-Yes, i think this works, but a better way maybe to call 
-qio_channel_set_feature()
-in qio_channel_socket_accept(), we didn't set the SHUTDOWN feature for the 
-socket accept fd,
-Or fix it by this:
-
-diff --git a/io/channel-socket.c b/io/channel-socket.c
-index f546c68..ce6894c 100644
---- a/io/channel-socket.c
-+++ b/io/channel-socket.c
-@@ -330,9 +330,8 @@ qio_channel_socket_accept(QIOChannelSocket *ioc,
-                           Error **errp)
- {
-     QIOChannelSocket *cioc;
--
--    cioc = QIO_CHANNEL_SOCKET(object_new(TYPE_QIO_CHANNEL_SOCKET));
--    cioc->fd = -1;
-+
-+    cioc = qio_channel_socket_new();
-     cioc->remoteAddrLen = sizeof(ioc->remoteAddr);
-     cioc->localAddrLen = sizeof(ioc->localAddr);
-
-
-Thanks,
-Hailiang
-I have test it . The test could not hang any more.
-
-
-
-
-
-
-
-
-
-
-
-
-åå§é®ä»¶
-
-
-
-åä»¶äººï¼ address@hidden
-æ¶ä»¶äººï¼ address@hidden address@hidden
-æéäººï¼ address@hidden address@hidden address@hidden
-æ¥ æ ï¼2017å¹´03æ22æ¥ 09:11
-ä¸» é¢ ï¼Re: [Qemu-devel]  çå¤: Re:  çå¤: Re: [BUG]COLO failover hang
-
-
-
-
-
-On 2017/3/21 19:56, Dr. David Alan Gilbert wrote:
-ï¼ * Hailiang Zhang (address@hidden) wrote:
-ï¼ï¼ Hi,
-ï¼ï¼
-ï¼ï¼ Thanks for reporting this, and i confirmed it in my test, and it is a bug.
-ï¼ï¼
-ï¼ï¼ Though we tried to call qemu_file_shutdown() to shutdown the related fd, in
-ï¼ï¼ case COLO thread/incoming thread is stuck in read/write() while do failover,
-ï¼ï¼ but it didn't take effect, because all the fd used by COLO (also migration)
-ï¼ï¼ has been wrapped by qio channel, and it will not call the shutdown API if
-ï¼ï¼ we didn't qio_channel_set_feature(QIO_CHANNEL(sioc), 
-QIO_CHANNEL_FEATURE_SHUTDOWN).
-ï¼ï¼
-ï¼ï¼ Cc: Dr. David Alan Gilbert address@hidden
-ï¼ï¼
-ï¼ï¼ I doubted migration cancel has the same problem, it may be stuck in write()
-ï¼ï¼ if we tried to cancel migration.
-ï¼ï¼
-ï¼ï¼ void fd_start_outgoing_migration(MigrationState *s, const char *fdname, 
-Error **errp)
-ï¼ï¼ {
-ï¼ï¼      qio_channel_set_name(QIO_CHANNEL(ioc), "migration-fd-outgoing")
-ï¼ï¼      migration_channel_connect(s, ioc, NULL)
-ï¼ï¼      ... ...
-ï¼ï¼ We didn't call qio_channel_set_feature(QIO_CHANNEL(sioc), 
-QIO_CHANNEL_FEATURE_SHUTDOWN) above,
-ï¼ï¼ and the
-ï¼ï¼ migrate_fd_cancel()
-ï¼ï¼ {
-ï¼ï¼   ... ...
-ï¼ï¼      if (s-ï¼state == MIGRATION_STATUS_CANCELLING && f) {
-ï¼ï¼          qemu_file_shutdown(f)  --ï¼ This will not take effect. No ?
-ï¼ï¼      }
-ï¼ï¼ }
-ï¼
-ï¼ (cc'd in Daniel Berrange).
-ï¼ I see that we call qio_channel_set_feature(ioc, QIO_CHANNEL_FEATURE_SHUTDOWN) 
-at the
-ï¼ top of qio_channel_socket_new  so I think that's safe isn't it?
-ï¼
-
-Hmm, you are right, this problem is only exist for the migration incoming fd, 
-thanks.
-
-ï¼ Dave
-ï¼
-ï¼ï¼ Thanks,
-ï¼ï¼ Hailiang
-ï¼ï¼
-ï¼ï¼ On 2017/3/21 16:10, address@hidden wrote:
-ï¼ï¼ï¼ Thank youã
-ï¼ï¼ï¼
-ï¼ï¼ï¼ I have test areadyã
-ï¼ï¼ï¼
-ï¼ï¼ï¼ When the Primary Node panic,the Secondary Node qemu hang at the same placeã
-ï¼ï¼ï¼
-ï¼ï¼ï¼ Incorrding
-http://wiki.qemu-project.org/Features/COLO
-ï¼kill Primary Node 
-qemu will not produce the problem,but Primary Node panic canã
-ï¼ï¼ï¼
-ï¼ï¼ï¼ I think due to the feature of channel does not support 
-QIO_CHANNEL_FEATURE_SHUTDOWN.
-ï¼ï¼ï¼
-ï¼ï¼ï¼
-ï¼ï¼ï¼ when failover,channel_shutdown could not shut down the channel.
-ï¼ï¼ï¼
-ï¼ï¼ï¼
-ï¼ï¼ï¼ so the colo_process_incoming_thread will hang at recvmsg.
-ï¼ï¼ï¼
-ï¼ï¼ï¼
-ï¼ï¼ï¼ I test a patch:
-ï¼ï¼ï¼
-ï¼ï¼ï¼
-ï¼ï¼ï¼ diff --git a/migration/socket.c b/migration/socket.c
-ï¼ï¼ï¼
-ï¼ï¼ï¼
-ï¼ï¼ï¼ index 13966f1..d65a0ea 100644
-ï¼ï¼ï¼
-ï¼ï¼ï¼
-ï¼ï¼ï¼ --- a/migration/socket.c
-ï¼ï¼ï¼
-ï¼ï¼ï¼
-ï¼ï¼ï¼ +++ b/migration/socket.c
-ï¼ï¼ï¼
-ï¼ï¼ï¼
-ï¼ï¼ï¼ @@ -147,8 +147,9 @@ static gboolean 
-socket_accept_incoming_migration(QIOChannel *ioc,
-ï¼ï¼ï¼
-ï¼ï¼ï¼
-ï¼ï¼ï¼        }
-ï¼ï¼ï¼
-ï¼ï¼ï¼
-ï¼ï¼ï¼
-ï¼ï¼ï¼
-ï¼ï¼ï¼
-ï¼ï¼ï¼        trace_migration_socket_incoming_accepted()
-ï¼ï¼ï¼
-ï¼ï¼ï¼
-ï¼ï¼ï¼
-ï¼ï¼ï¼
-ï¼ï¼ï¼
-ï¼ï¼ï¼        qio_channel_set_name(QIO_CHANNEL(sioc), "migration-socket-incoming")
-ï¼ï¼ï¼
-ï¼ï¼ï¼
-ï¼ï¼ï¼ +    qio_channel_set_feature(QIO_CHANNEL(sioc), 
-QIO_CHANNEL_FEATURE_SHUTDOWN)
-ï¼ï¼ï¼
-ï¼ï¼ï¼
-ï¼ï¼ï¼        migration_channel_process_incoming(migrate_get_current(),
-ï¼ï¼ï¼
-ï¼ï¼ï¼
-ï¼ï¼ï¼                                           QIO_CHANNEL(sioc))
-ï¼ï¼ï¼
-ï¼ï¼ï¼
-ï¼ï¼ï¼        object_unref(OBJECT(sioc))
-ï¼ï¼ï¼
-ï¼ï¼ï¼
-ï¼ï¼ï¼
-ï¼ï¼ï¼
-ï¼ï¼ï¼ My test will not hang any more.
-ï¼ï¼ï¼
-ï¼ï¼ï¼
-ï¼ï¼ï¼
-ï¼ï¼ï¼
-ï¼ï¼ï¼
-ï¼ï¼ï¼
-ï¼ï¼ï¼
-ï¼ï¼ï¼
-ï¼ï¼ï¼
-ï¼ï¼ï¼
-ï¼ï¼ï¼
-ï¼ï¼ï¼
-ï¼ï¼ï¼
-ï¼ï¼ï¼
-ï¼ï¼ï¼
-ï¼ï¼ï¼
-ï¼ï¼ï¼
-ï¼ï¼ï¼ åå§é®ä»¶
-ï¼ï¼ï¼
-ï¼ï¼ï¼
-ï¼ï¼ï¼
-ï¼ï¼ï¼ åä»¶äººï¼ address@hidden
-ï¼ï¼ï¼ æ¶ä»¶äººï¼çå¹¿10165992 address@hidden
-ï¼ï¼ï¼ æéäººï¼ address@hidden address@hidden
-ï¼ï¼ï¼ æ¥ æ ï¼2017å¹´03æ21æ¥ 15:58
-ï¼ï¼ï¼ ä¸» é¢ ï¼Re: [Qemu-devel]  çå¤: Re:  [BUG]COLO failover hang
-ï¼ï¼ï¼
-ï¼ï¼ï¼
-ï¼ï¼ï¼
-ï¼ï¼ï¼
-ï¼ï¼ï¼
-ï¼ï¼ï¼ Hi,Wang.
-ï¼ï¼ï¼
-ï¼ï¼ï¼ You can test this branch:
-ï¼ï¼ï¼
-ï¼ï¼ï¼
-https://github.com/coloft/qemu/tree/colo-v5.1-developing-COLO-frame-v21-with-shared-disk
-ï¼ï¼ï¼
-ï¼ï¼ï¼ and please follow wiki ensure your own configuration correctly.
-ï¼ï¼ï¼
-ï¼ï¼ï¼
-http://wiki.qemu-project.org/Features/COLO
-ï¼ï¼ï¼
-ï¼ï¼ï¼
-ï¼ï¼ï¼ Thanks
-ï¼ï¼ï¼
-ï¼ï¼ï¼ Zhang Chen
-ï¼ï¼ï¼
-ï¼ï¼ï¼
-ï¼ï¼ï¼ On 03/21/2017 03:27 PM, address@hidden wrote:
-ï¼ï¼ï¼ ï¼
-ï¼ï¼ï¼ ï¼ hi.
-ï¼ï¼ï¼ ï¼
-ï¼ï¼ï¼ ï¼ I test the git qemu master have the same problem.
-ï¼ï¼ï¼ ï¼
-ï¼ï¼ï¼ ï¼ (gdb) bt
-ï¼ï¼ï¼ ï¼
-ï¼ï¼ï¼ ï¼ #0  qio_channel_socket_readv (ioc=0x7f65911b4e50, iov=0x7f64ef3fd880,
-ï¼ï¼ï¼ ï¼ niov=1, fds=0x0, nfds=0x0, errp=0x0) at io/channel-socket.c:461
-ï¼ï¼ï¼ ï¼
-ï¼ï¼ï¼ ï¼ #1  0x00007f658e4aa0c2 in qio_channel_read
-ï¼ï¼ï¼ ï¼ (address@hidden, address@hidden "",
-ï¼ï¼ï¼ ï¼ address@hidden, address@hidden) at io/channel.c:114
-ï¼ï¼ï¼ ï¼
-ï¼ï¼ï¼ ï¼ #2  0x00007f658e3ea990 in channel_get_buffer (opaque=ï¼optimized outï¼,
-ï¼ï¼ï¼ ï¼ buf=0x7f65907cb838 "", pos=ï¼optimized outï¼, size=32768) at
-ï¼ï¼ï¼ ï¼ migration/qemu-file-channel.c:78
-ï¼ï¼ï¼ ï¼
-ï¼ï¼ï¼ ï¼ #3  0x00007f658e3e97fc in qemu_fill_buffer (f=0x7f65907cb800) at
-ï¼ï¼ï¼ ï¼ migration/qemu-file.c:295
-ï¼ï¼ï¼ ï¼
-ï¼ï¼ï¼ ï¼ #4  0x00007f658e3ea2e1 in qemu_peek_byte (address@hidden,
-ï¼ï¼ï¼ ï¼ address@hidden) at migration/qemu-file.c:555
-ï¼ï¼ï¼ ï¼
-ï¼ï¼ï¼ ï¼ #5  0x00007f658e3ea34b in qemu_get_byte (address@hidden) at
-ï¼ï¼ï¼ ï¼ migration/qemu-file.c:568
-ï¼ï¼ï¼ ï¼
-ï¼ï¼ï¼ ï¼ #6  0x00007f658e3ea552 in qemu_get_be32 (address@hidden) at
-ï¼ï¼ï¼ ï¼ migration/qemu-file.c:648
-ï¼ï¼ï¼ ï¼
-ï¼ï¼ï¼ ï¼ #7  0x00007f658e3e66e5 in colo_receive_message (f=0x7f65907cb800,
-ï¼ï¼ï¼ ï¼ address@hidden) at migration/colo.c:244
-ï¼ï¼ï¼ ï¼
-ï¼ï¼ï¼ ï¼ #8  0x00007f658e3e681e in colo_receive_check_message (f=ï¼optimized
-ï¼ï¼ï¼ ï¼ outï¼, address@hidden,
-ï¼ï¼ï¼ ï¼ address@hidden)
-ï¼ï¼ï¼ ï¼
-ï¼ï¼ï¼ ï¼     at migration/colo.c:264
-ï¼ï¼ï¼ ï¼
-ï¼ï¼ï¼ ï¼ #9  0x00007f658e3e740e in colo_process_incoming_thread
-ï¼ï¼ï¼ ï¼ (opaque=0x7f658eb30360 ï¼mis_current.31286ï¼) at migration/colo.c:577
-ï¼ï¼ï¼ ï¼
-ï¼ï¼ï¼ ï¼ #10 0x00007f658be09df3 in start_thread () from /lib64/libpthread.so.0
-ï¼ï¼ï¼ ï¼
-ï¼ï¼ï¼ ï¼ #11 0x00007f65881983ed in clone () from /lib64/libc.so.6
-ï¼ï¼ï¼ ï¼
-ï¼ï¼ï¼ ï¼ (gdb) p ioc-ï¼name
-ï¼ï¼ï¼ ï¼
-ï¼ï¼ï¼ ï¼ $2 = 0x7f658ff7d5c0 "migration-socket-incoming"
-ï¼ï¼ï¼ ï¼
-ï¼ï¼ï¼ ï¼ (gdb) p ioc-ï¼features        Do not support QIO_CHANNEL_FEATURE_SHUTDOWN
-ï¼ï¼ï¼ ï¼
-ï¼ï¼ï¼ ï¼ $3 = 0
-ï¼ï¼ï¼ ï¼
-ï¼ï¼ï¼ ï¼
-ï¼ï¼ï¼ ï¼ (gdb) bt
-ï¼ï¼ï¼ ï¼
-ï¼ï¼ï¼ ï¼ #0  socket_accept_incoming_migration (ioc=0x7fdcceeafa90,
-ï¼ï¼ï¼ ï¼ condition=G_IO_IN, opaque=0x7fdcceeafa90) at migration/socket.c:137
-ï¼ï¼ï¼ ï¼
-ï¼ï¼ï¼ ï¼ #1  0x00007fdcc6966350 in g_main_dispatch (context=ï¼optimized outï¼) at
-ï¼ï¼ï¼ ï¼ gmain.c:3054
-ï¼ï¼ï¼ ï¼
-ï¼ï¼ï¼ ï¼ #2  g_main_context_dispatch (context=ï¼optimized outï¼,
-ï¼ï¼ï¼ ï¼ address@hidden) at gmain.c:3630
-ï¼ï¼ï¼ ï¼
-ï¼ï¼ï¼ ï¼ #3  0x00007fdccb8a6dcc in glib_pollfds_poll () at util/main-loop.c:213
-ï¼ï¼ï¼ ï¼
-ï¼ï¼ï¼ ï¼ #4  os_host_main_loop_wait (timeout=ï¼optimized outï¼) at
-ï¼ï¼ï¼ ï¼ util/main-loop.c:258
-ï¼ï¼ï¼ ï¼
-ï¼ï¼ï¼ ï¼ #5  main_loop_wait (address@hidden) at
-ï¼ï¼ï¼ ï¼ util/main-loop.c:506
-ï¼ï¼ï¼ ï¼
-ï¼ï¼ï¼ ï¼ #6  0x00007fdccb526187 in main_loop () at vl.c:1898
-ï¼ï¼ï¼ ï¼
-ï¼ï¼ï¼ ï¼ #7  main (argc=ï¼optimized outï¼, argv=ï¼optimized outï¼, envp=ï¼optimized
-ï¼ï¼ï¼ ï¼ outï¼) at vl.c:4709
-ï¼ï¼ï¼ ï¼
-ï¼ï¼ï¼ ï¼ (gdb) p ioc-ï¼features
-ï¼ï¼ï¼ ï¼
-ï¼ï¼ï¼ ï¼ $1 = 6
-ï¼ï¼ï¼ ï¼
-ï¼ï¼ï¼ ï¼ (gdb) p ioc-ï¼name
-ï¼ï¼ï¼ ï¼
-ï¼ï¼ï¼ ï¼ $2 = 0x7fdcce1b1ab0 "migration-socket-listener"
-ï¼ï¼ï¼ ï¼
-ï¼ï¼ï¼ ï¼
-ï¼ï¼ï¼ ï¼ May be socket_accept_incoming_migration should
-ï¼ï¼ï¼ ï¼ call qio_channel_set_feature(ioc, QIO_CHANNEL_FEATURE_SHUTDOWN)??
-ï¼ï¼ï¼ ï¼
-ï¼ï¼ï¼ ï¼
-ï¼ï¼ï¼ ï¼ thank you.
-ï¼ï¼ï¼ ï¼
-ï¼ï¼ï¼ ï¼
-ï¼ï¼ï¼ ï¼
-ï¼ï¼ï¼ ï¼
-ï¼ï¼ï¼ ï¼
-ï¼ï¼ï¼ ï¼ åå§é®ä»¶
-ï¼ï¼ï¼ ï¼ address@hidden
-ï¼ï¼ï¼ ï¼ address@hidden
-ï¼ï¼ï¼ ï¼ address@hidden@huawei.comï¼
-ï¼ï¼ï¼ ï¼ *æ¥ æ ï¼*2017å¹´03æ16æ¥ 14:46
-ï¼ï¼ï¼ ï¼ *ä¸» é¢ ï¼**Re: [Qemu-devel] COLO failover hang*
-ï¼ï¼ï¼ ï¼
-ï¼ï¼ï¼ ï¼
-ï¼ï¼ï¼ ï¼
-ï¼ï¼ï¼ ï¼
-ï¼ï¼ï¼ ï¼ On 03/15/2017 05:06 PM, wangguang wrote:
-ï¼ï¼ï¼ ï¼ ï¼   am testing QEMU COLO feature described here [QEMU
-ï¼ï¼ï¼ ï¼ ï¼ Wiki](
-http://wiki.qemu-project.org/Features/COLO
-).
-ï¼ï¼ï¼ ï¼ ï¼
-ï¼ï¼ï¼ ï¼ ï¼ When the Primary Node panic,the Secondary Node qemu hang.
-ï¼ï¼ï¼ ï¼ ï¼ hang at recvmsg in qio_channel_socket_readv.
-ï¼ï¼ï¼ ï¼ ï¼ And  I run  { 'execute': 'nbd-server-stop' } and { "execute":
-ï¼ï¼ï¼ ï¼ ï¼ "x-colo-lost-heartbeat" } in Secondary VM's
-ï¼ï¼ï¼ ï¼ ï¼ monitor,the  Secondary Node qemu still hang at recvmsg .
-ï¼ï¼ï¼ ï¼ ï¼
-ï¼ï¼ï¼ ï¼ ï¼ I found that the colo in qemu is not complete yet.
-ï¼ï¼ï¼ ï¼ ï¼ Do the colo have any plan for development?
-ï¼ï¼ï¼ ï¼
-ï¼ï¼ï¼ ï¼ Yes, We are developing. You can see some of patch we pushing.
-ï¼ï¼ï¼ ï¼
-ï¼ï¼ï¼ ï¼ ï¼ Has anyone ever run it successfully? Any help is appreciated!
-ï¼ï¼ï¼ ï¼
-ï¼ï¼ï¼ ï¼ In our internal version can run it successfully,
-ï¼ï¼ï¼ ï¼ The failover detail you can ask Zhanghailiang for help.
-ï¼ï¼ï¼ ï¼ Next time if you have some question about COLO,
-ï¼ï¼ï¼ ï¼ please cc me and zhanghailiang address@hidden
-ï¼ï¼ï¼ ï¼
-ï¼ï¼ï¼ ï¼
-ï¼ï¼ï¼ ï¼ Thanks
-ï¼ï¼ï¼ ï¼ Zhang Chen
-ï¼ï¼ï¼ ï¼
-ï¼ï¼ï¼ ï¼
-ï¼ï¼ï¼ ï¼ ï¼
-ï¼ï¼ï¼ ï¼ ï¼
-ï¼ï¼ï¼ ï¼ ï¼
-ï¼ï¼ï¼ ï¼ ï¼ centos7.2+qemu2.7.50
-ï¼ï¼ï¼ ï¼ ï¼ (gdb) bt
-ï¼ï¼ï¼ ï¼ ï¼ #0  0x00007f3e00cc86ad in recvmsg () from /lib64/libpthread.so.0
-ï¼ï¼ï¼ ï¼ ï¼ #1  0x00007f3e0332b738 in qio_channel_socket_readv (ioc=ï¼optimized outï¼,
-ï¼ï¼ï¼ ï¼ ï¼ iov=ï¼optimized outï¼, niov=ï¼optimized outï¼, fds=0x0, nfds=0x0, errp=0x0) 
-at
-ï¼ï¼ï¼ ï¼ ï¼ io/channel-socket.c:497
-ï¼ï¼ï¼ ï¼ ï¼ #2  0x00007f3e03329472 in qio_channel_read (address@hidden,
-ï¼ï¼ï¼ ï¼ ï¼ address@hidden "", address@hidden,
-ï¼ï¼ï¼ ï¼ ï¼ address@hidden) at io/channel.c:97
-ï¼ï¼ï¼ ï¼ ï¼ #3  0x00007f3e032750e0 in channel_get_buffer (opaque=ï¼optimized outï¼,
-ï¼ï¼ï¼ ï¼ ï¼ buf=0x7f3e05910f38 "", pos=ï¼optimized outï¼, size=32768) at
-ï¼ï¼ï¼ ï¼ ï¼ migration/qemu-file-channel.c:78
-ï¼ï¼ï¼ ï¼ ï¼ #4  0x00007f3e0327412c in qemu_fill_buffer (f=0x7f3e05910f00) at
-ï¼ï¼ï¼ ï¼ ï¼ migration/qemu-file.c:257
-ï¼ï¼ï¼ ï¼ ï¼ #5  0x00007f3e03274a41 in qemu_peek_byte (address@hidden,
-ï¼ï¼ï¼ ï¼ ï¼ address@hidden) at migration/qemu-file.c:510
-ï¼ï¼ï¼ ï¼ ï¼ #6  0x00007f3e03274aab in qemu_get_byte (address@hidden) at
-ï¼ï¼ï¼ ï¼ ï¼ migration/qemu-file.c:523
-ï¼ï¼ï¼ ï¼ ï¼ #7  0x00007f3e03274cb2 in qemu_get_be32 (address@hidden) at
-ï¼ï¼ï¼ ï¼ ï¼ migration/qemu-file.c:603
-ï¼ï¼ï¼ ï¼ ï¼ #8  0x00007f3e03271735 in colo_receive_message (f=0x7f3e05910f00,
-ï¼ï¼ï¼ ï¼ ï¼ address@hidden) at migration/colo.c:215
-ï¼ï¼ï¼ ï¼ ï¼ #9  0x00007f3e0327250d in colo_wait_handle_message (errp=0x7f3d62bfaa48,
-ï¼ï¼ï¼ ï¼ ï¼ checkpoint_request=ï¼synthetic pointerï¼, f=ï¼optimized outï¼) at
-ï¼ï¼ï¼ ï¼ ï¼ migration/colo.c:546
-ï¼ï¼ï¼ ï¼ ï¼ #10 colo_process_incoming_thread (opaque=0x7f3e067245e0) at
-ï¼ï¼ï¼ ï¼ ï¼ migration/colo.c:649
-ï¼ï¼ï¼ ï¼ ï¼ #11 0x00007f3e00cc1df3 in start_thread () from /lib64/libpthread.so.0
-ï¼ï¼ï¼ ï¼ ï¼ #12 0x00007f3dfc9c03ed in clone () from /lib64/libc..so.6
-ï¼ï¼ï¼ ï¼ ï¼
-ï¼ï¼ï¼ ï¼ ï¼
-ï¼ï¼ï¼ ï¼ ï¼
-ï¼ï¼ï¼ ï¼ ï¼
-ï¼ï¼ï¼ ï¼ ï¼
-ï¼ï¼ï¼ ï¼ ï¼ --
-ï¼ï¼ï¼ ï¼ ï¼ View this message in context:
-http://qemu.11.n7.nabble.com/COLO-failover-hang-tp473250.html
-ï¼ï¼ï¼ ï¼ ï¼ Sent from the Developer mailing list archive at Nabble.com.
-ï¼ï¼ï¼ ï¼ ï¼
-ï¼ï¼ï¼ ï¼ ï¼
-ï¼ï¼ï¼ ï¼ ï¼
-ï¼ï¼ï¼ ï¼ ï¼
-ï¼ï¼ï¼ ï¼
-ï¼ï¼ï¼ ï¼ --
-ï¼ï¼ï¼ ï¼ Thanks
-ï¼ï¼ï¼ ï¼ Zhang Chen
-ï¼ï¼ï¼ ï¼
-ï¼ï¼ï¼ ï¼
-ï¼ï¼ï¼ ï¼
-ï¼ï¼ï¼ ï¼
-ï¼ï¼ï¼ ï¼
-ï¼ï¼ï¼
-ï¼ï¼
-ï¼ --
-ï¼ Dr. David Alan Gilbert / address@hidden / Manchester, UK
-ï¼
-ï¼ .
-ï¼
-
diff --git a/classification_output/04/other/59540920 b/classification_output/04/other/59540920
deleted file mode 100644
index ae43e07cf..000000000
--- a/classification_output/04/other/59540920
+++ /dev/null
@@ -1,384 +0,0 @@
-other: 0.989
-instruction: 0.986
-graphic: 0.985
-device: 0.985
-semantic: 0.985
-assembly: 0.984
-socket: 0.983
-network: 0.981
-boot: 0.980
-mistranslation: 0.978
-vnc: 0.977
-KVM: 0.970
-
-[BUG] No irqchip created after commit 11bc4a13d1f4 ("kvm: convert "-machine kernel_irqchip" to an accelerator property")
-
-I apologize if this was already reported,
-
-I just noticed that with the latest updates QEMU doesn't start with the
-following configuration:
-
-qemu-system-x86_64 -name guest=win10 -machine pc,accel=kvm -cpu 
-host,hv_vpindex,hv_synic ...
-
-qemu-system-x86_64: failed to turn on HyperV SynIC in KVM: Invalid argument
-qemu-system-x86_64: kvm_init_vcpu failed: Invalid argument
-
-If I add 'kernel-irqchip=split' or ',kernel-irqchip=on' it starts as
-usual. I bisected this to the following commit:
-
-commit 11bc4a13d1f4b07dafbd1dda4d4bf0fdd7ad65f2 (HEAD, refs/bisect/bad)
-Author: Paolo Bonzini <address@hidden>
-Date:   Wed Nov 13 10:56:53 2019 +0100
-
-    kvm: convert "-machine kernel_irqchip" to an accelerator property
-    
-so aparently we now default to 'kernel_irqchip=off'. Is this the desired
-behavior?
-
--- 
-Vitaly
-
-No, absolutely not. I was sure I had tested it, but I will take a look.
-Paolo
-Il ven 20 dic 2019, 15:11 Vitaly Kuznetsov <
-address@hidden
-> ha scritto:
-I apologize if this was already reported,
-I just noticed that with the latest updates QEMU doesn't start with the
-following configuration:
-qemu-system-x86_64 -name guest=win10 -machine pc,accel=kvm -cpu host,hv_vpindex,hv_synic ...
-qemu-system-x86_64: failed to turn on HyperV SynIC in KVM: Invalid argument
-qemu-system-x86_64: kvm_init_vcpu failed: Invalid argument
-If I add 'kernel-irqchip=split' or ',kernel-irqchip=on' it starts as
-usual. I bisected this to the following commit:
-commit 11bc4a13d1f4b07dafbd1dda4d4bf0fdd7ad65f2 (HEAD, refs/bisect/bad)
-Author: Paolo Bonzini <
-address@hidden
->
-Date:Â  Â Wed Nov 13 10:56:53 2019 +0100
-Â  Â  kvm: convert "-machine kernel_irqchip" to an accelerator property
-so aparently we now default to 'kernel_irqchip=off'. Is this the desired
-behavior?
---
-Vitaly
-
-Commit 11bc4a13d1f4 ("kvm: convert "-machine kernel_irqchip" to an
-accelerator property") moves kernel_irqchip property from "-machine" to
-"-accel kvm", but it forgets to set the default value of
-kernel_irqchip_allowed and kernel_irqchip_split.
-
-Also cleaning up the three useless members (kernel_irqchip_allowed,
-kernel_irqchip_required, kernel_irqchip_split) in struct MachineState.
-
-Fixes: 11bc4a13d1f4 ("kvm: convert "-machine kernel_irqchip" to an accelerator 
-property")
-Signed-off-by: Xiaoyao Li <address@hidden>
----
- accel/kvm/kvm-all.c | 3 +++
- include/hw/boards.h | 3 ---
- 2 files changed, 3 insertions(+), 3 deletions(-)
-
-diff --git a/accel/kvm/kvm-all.c b/accel/kvm/kvm-all.c
-index b2f1a5bcb5ef..40f74094f8d3 100644
---- a/accel/kvm/kvm-all.c
-+++ b/accel/kvm/kvm-all.c
-@@ -3044,8 +3044,11 @@ bool kvm_kernel_irqchip_split(void)
- static void kvm_accel_instance_init(Object *obj)
- {
-     KVMState *s = KVM_STATE(obj);
-+    MachineClass *mc = MACHINE_GET_CLASS(current_machine);
- 
-     s->kvm_shadow_mem = -1;
-+    s->kernel_irqchip_allowed = true;
-+    s->kernel_irqchip_split = mc->default_kernel_irqchip_split;
- }
- 
- static void kvm_accel_class_init(ObjectClass *oc, void *data)
-diff --git a/include/hw/boards.h b/include/hw/boards.h
-index 61f8bb8e5a42..fb1b43d5b972 100644
---- a/include/hw/boards.h
-+++ b/include/hw/boards.h
-@@ -271,9 +271,6 @@ struct MachineState {
- 
-     /*< public >*/
- 
--    bool kernel_irqchip_allowed;
--    bool kernel_irqchip_required;
--    bool kernel_irqchip_split;
-     char *dtb;
-     char *dumpdtb;
-     int phandle_start;
--- 
-2.19.1
-
-Il sab 28 dic 2019, 09:48 Xiaoyao Li <
-address@hidden
-> ha scritto:
-Commit 11bc4a13d1f4 ("kvm: convert "-machine kernel_irqchip" to an
-accelerator property") moves kernel_irqchip property from "-machine" to
-"-accel kvm", but it forgets to set the default value of
-kernel_irqchip_allowed and kernel_irqchip_split.
-Also cleaning up the three useless members (kernel_irqchip_allowed,
-kernel_irqchip_required, kernel_irqchip_split) in struct MachineState.
-Fixes: 11bc4a13d1f4 ("kvm: convert "-machine kernel_irqchip" to an accelerator property")
-Signed-off-by: Xiaoyao Li <
-address@hidden
->
-Please also add a Reported-by line for Vitaly Kuznetsov.
----
-Â accel/kvm/kvm-all.c | 3 +++
-Â include/hw/boards.h | 3 ---
-Â 2 files changed, 3 insertions(+), 3 deletions(-)
-diff --git a/accel/kvm/kvm-all.c b/accel/kvm/kvm-all.c
-index b2f1a5bcb5ef..40f74094f8d3 100644
---- a/accel/kvm/kvm-all.c
-+++ b/accel/kvm/kvm-all.c
-@@ -3044,8 +3044,11 @@ bool kvm_kernel_irqchip_split(void)
-Â static void kvm_accel_instance_init(Object *obj)
-Â {
-Â  Â  Â KVMState *s = KVM_STATE(obj);
-+Â  Â  MachineClass *mc = MACHINE_GET_CLASS(current_machine);
-Â  Â  Â s->kvm_shadow_mem = -1;
-+Â  Â  s->kernel_irqchip_allowed = true;
-+Â  Â  s->kernel_irqchip_split = mc->default_kernel_irqchip_split;
-Can you initialize this from the init_machine method instead of assuming that current_machine has been initialized earlier?
-Thanks for the quick fix!
-Paolo
-Â }
-Â static void kvm_accel_class_init(ObjectClass *oc, void *data)
-diff --git a/include/hw/boards.h b/include/hw/boards.h
-index 61f8bb8e5a42..fb1b43d5b972 100644
---- a/include/hw/boards.h
-+++ b/include/hw/boards.h
-@@ -271,9 +271,6 @@ struct MachineState {
-Â  Â  Â /*< public >*/
--Â  Â  bool kernel_irqchip_allowed;
--Â  Â  bool kernel_irqchip_required;
--Â  Â  bool kernel_irqchip_split;
-Â  Â  Â char *dtb;
-Â  Â  Â char *dumpdtb;
-Â  Â  Â int phandle_start;
---
-2.19.1
-
-On Sat, 2019-12-28 at 10:02 +0000, Paolo Bonzini wrote:
->
->
->
-Il sab 28 dic 2019, 09:48 Xiaoyao Li <address@hidden> ha scritto:
->
-> Commit 11bc4a13d1f4 ("kvm: convert "-machine kernel_irqchip" to an
->
-> accelerator property") moves kernel_irqchip property from "-machine" to
->
-> "-accel kvm", but it forgets to set the default value of
->
-> kernel_irqchip_allowed and kernel_irqchip_split.
->
->
->
-> Also cleaning up the three useless members (kernel_irqchip_allowed,
->
-> kernel_irqchip_required, kernel_irqchip_split) in struct MachineState.
->
->
->
-> Fixes: 11bc4a13d1f4 ("kvm: convert "-machine kernel_irqchip" to an
->
-> accelerator property")
->
-> Signed-off-by: Xiaoyao Li <address@hidden>
->
->
-Please also add a Reported-by line for Vitaly Kuznetsov.
-Sure.
-
->
-> ---
->
->  accel/kvm/kvm-all.c | 3 +++
->
->  include/hw/boards.h | 3 ---
->
->  2 files changed, 3 insertions(+), 3 deletions(-)
->
->
->
-> diff --git a/accel/kvm/kvm-all.c b/accel/kvm/kvm-all.c
->
-> index b2f1a5bcb5ef..40f74094f8d3 100644
->
-> --- a/accel/kvm/kvm-all.c
->
-> +++ b/accel/kvm/kvm-all.c
->
-> @@ -3044,8 +3044,11 @@ bool kvm_kernel_irqchip_split(void)
->
->  static void kvm_accel_instance_init(Object *obj)
->
->  {
->
->      KVMState *s = KVM_STATE(obj);
->
-> +    MachineClass *mc = MACHINE_GET_CLASS(current_machine);
->
->
->
->      s->kvm_shadow_mem = -1;
->
-> +    s->kernel_irqchip_allowed = true;
->
-> +    s->kernel_irqchip_split = mc->default_kernel_irqchip_split;
->
->
-Can you initialize this from the init_machine method instead of assuming that
->
-current_machine has been initialized earlier?
-OK, will do it in v2.
-
->
-Thanks for the quick fix!
-BTW, it seems that this patch makes kernel_irqchip default on to workaround the
-bug. 
-However, when explicitly configuring kernel_irqchip=off, guest still fails
-booting due to "KVM: failed to send PV IPI: -95" with a latest upstream kernel
-ubuntu guest. Any idea about this? 
-
->
-Paolo
->
->  }
->
->
->
->  static void kvm_accel_class_init(ObjectClass *oc, void *data)
->
-> diff --git a/include/hw/boards.h b/include/hw/boards.h
->
-> index 61f8bb8e5a42..fb1b43d5b972 100644
->
-> --- a/include/hw/boards.h
->
-> +++ b/include/hw/boards.h
->
-> @@ -271,9 +271,6 @@ struct MachineState {
->
->
->
->      /*< public >*/
->
->
->
-> -    bool kernel_irqchip_allowed;
->
-> -    bool kernel_irqchip_required;
->
-> -    bool kernel_irqchip_split;
->
->      char *dtb;
->
->      char *dumpdtb;
->
->      int phandle_start;
-
-Il sab 28 dic 2019, 10:24 Xiaoyao Li <
-address@hidden
-> ha scritto:
-BTW, it seems that this patch makes kernel_irqchip default on to workaround the
-bug.
-However, when explicitly configuring kernel_irqchip=off, guest still fails
-booting due to "KVM: failed to send PV IPI: -95" with a latest upstream kernel
-ubuntu guest. Any idea about this?
-We need to clear the PV IPI feature for userspace irqchip. Are you using -cpu host by chance?
-Paolo
-> Paolo
-> >Â  }
-> >
-> >Â  static void kvm_accel_class_init(ObjectClass *oc, void *data)
-> > diff --git a/include/hw/boards.h b/include/hw/boards.h
-> > index 61f8bb8e5a42..fb1b43d5b972 100644
-> > --- a/include/hw/boards.h
-> > +++ b/include/hw/boards.h
-> > @@ -271,9 +271,6 @@ struct MachineState {
-> >
-> >Â  Â  Â  /*< public >*/
-> >
-> > -Â  Â  bool kernel_irqchip_allowed;
-> > -Â  Â  bool kernel_irqchip_required;
-> > -Â  Â  bool kernel_irqchip_split;
-> >Â  Â  Â  char *dtb;
-> >Â  Â  Â  char *dumpdtb;
-> >Â  Â  Â  int phandle_start;
-
-On Sat, 2019-12-28 at 10:57 +0000, Paolo Bonzini wrote:
->
->
->
-Il sab 28 dic 2019, 10:24 Xiaoyao Li <address@hidden> ha scritto:
->
-> BTW, it seems that this patch makes kernel_irqchip default on to workaround
->
-> the
->
-> bug.
->
-> However, when explicitly configuring kernel_irqchip=off, guest still fails
->
-> booting due to "KVM: failed to send PV IPI: -95" with a latest upstream
->
-> kernel
->
-> ubuntu guest. Any idea about this?
->
->
-We need to clear the PV IPI feature for userspace irqchip. Are you using -cpu
->
-host by chance?
-Yes, I used -cpu host.
-
-After using "-cpu host,-kvm-pv-ipi" with kernel_irqchip=off, it can boot
-successfully.
-
->
-Paolo
->
->
-> > Paolo
->
-> > >  }
->
-> > >
->
-> > >  static void kvm_accel_class_init(ObjectClass *oc, void *data)
->
-> > > diff --git a/include/hw/boards.h b/include/hw/boards.h
->
-> > > index 61f8bb8e5a42..fb1b43d5b972 100644
->
-> > > --- a/include/hw/boards.h
->
-> > > +++ b/include/hw/boards.h
->
-> > > @@ -271,9 +271,6 @@ struct MachineState {
->
-> > >
->
-> > >      /*< public >*/
->
-> > >
->
-> > > -    bool kernel_irqchip_allowed;
->
-> > > -    bool kernel_irqchip_required;
->
-> > > -    bool kernel_irqchip_split;
->
-> > >      char *dtb;
->
-> > >      char *dumpdtb;
->
-> > >      int phandle_start;
->
->
-
diff --git a/classification_output/04/other/64571620 b/classification_output/04/other/64571620
deleted file mode 100644
index e6da5ef48..000000000
--- a/classification_output/04/other/64571620
+++ /dev/null
@@ -1,793 +0,0 @@
-other: 0.922
-mistranslation: 0.917
-assembly: 0.913
-semantic: 0.903
-device: 0.899
-graphic: 0.897
-instruction: 0.894
-boot: 0.879
-KVM: 0.867
-socket: 0.855
-network: 0.853
-vnc: 0.819
-
-[BUG] Migration hv_time rollback
-
-Hi,
-
-We are experiencing timestamp rollbacks during live-migration of
-Windows 10 guests with the following qemu configuration (linux 5.4.46
-and qemu master):
-```
-$ qemu-system-x86_64 -enable-kvm -cpu host,kvm=off,hv_time [...]
-```
-
-I have tracked the bug to the fact that `kvmclock` is not exposed and
-disabled from qemu PoV but is in fact used by `hv-time` (in KVM).
-
-I think we should enable the `kvmclock` (qemu device) if `hv-time` is
-present and add Hyper-V support for the `kvmclock_current_nsec`
-function.
-
-I'm asking for advice because I am unsure this is the _right_ approach
-and how to keep migration compatibility between qemu versions.
-
-Thank you all,
-
--- 
-Antoine 'xdbob' Damhet
-signature.asc
-Description:
-PGP signature
-
-cc'ing in Vitaly who knows about the hv stuff.
-
-* Antoine Damhet (antoine.damhet@blade-group.com) wrote:
->
-Hi,
->
->
-We are experiencing timestamp rollbacks during live-migration of
->
-Windows 10 guests with the following qemu configuration (linux 5.4.46
->
-and qemu master):
->
-```
->
-$ qemu-system-x86_64 -enable-kvm -cpu host,kvm=off,hv_time [...]
->
-```
-How big a jump are you seeing, and how did you notice it in the guest?
-
-Dave
-
->
-I have tracked the bug to the fact that `kvmclock` is not exposed and
->
-disabled from qemu PoV but is in fact used by `hv-time` (in KVM).
->
->
-I think we should enable the `kvmclock` (qemu device) if `hv-time` is
->
-present and add Hyper-V support for the `kvmclock_current_nsec`
->
-function.
->
->
-I'm asking for advice because I am unsure this is the _right_ approach
->
-and how to keep migration compatibility between qemu versions.
->
->
-Thank you all,
->
->
---
->
-Antoine 'xdbob' Damhet
--- 
-Dr. David Alan Gilbert / dgilbert@redhat.com / Manchester, UK
-
-"Dr. David Alan Gilbert" <dgilbert@redhat.com> writes:
-
->
-cc'ing in Vitaly who knows about the hv stuff.
->
-cc'ing Marcelo who knows about clocksources :-)
-
->
-* Antoine Damhet (antoine.damhet@blade-group.com) wrote:
->
-> Hi,
->
->
->
-> We are experiencing timestamp rollbacks during live-migration of
->
-> Windows 10 guests
-Are you migrating to the same hardware (with the same TSC frequency)? Is
-TSC used as the clocksource on the host?
-
->
->  with the following qemu configuration (linux 5.4.46
->
-> and qemu master):
->
-> ```
->
-> $ qemu-system-x86_64 -enable-kvm -cpu host,kvm=off,hv_time [...]
->
-> ```
-Out of pure curiosity, what's the purpose of doing 'kvm=off'? Windows is
-not going to check for KVM identification anyway so we pretend we're
-Hyper-V. 
-
-Also, have you tried adding more Hyper-V enlightenments? 
-
->
->
-How big a jump are you seeing, and how did you notice it in the guest?
->
->
-Dave
->
->
-> I have tracked the bug to the fact that `kvmclock` is not exposed and
->
-> disabled from qemu PoV but is in fact used by `hv-time` (in KVM).
->
->
->
-> I think we should enable the `kvmclock` (qemu device) if `hv-time` is
->
-> present and add Hyper-V support for the `kvmclock_current_nsec`
->
-> function.
-AFAICT kvmclock_current_nsec() checks whether kvmclock was enabled by
-the guest:
-
-   if (!(env->system_time_msr & 1ULL)) {
-        /* KVM clock not active */
-        return 0;
-    }
-
-and this is (and way) always false for Windows guests.
-
->
->
->
-> I'm asking for advice because I am unsure this is the _right_ approach
->
-> and how to keep migration compatibility between qemu versions.
->
->
->
-> Thank you all,
->
->
->
-> --
->
-> Antoine 'xdbob' Damhet
--- 
-Vitaly
-
-On Wed, Sep 16, 2020 at 01:59:43PM +0200, Vitaly Kuznetsov wrote:
->
-"Dr. David Alan Gilbert" <dgilbert@redhat.com> writes:
->
->
-> cc'ing in Vitaly who knows about the hv stuff.
->
->
->
->
-cc'ing Marcelo who knows about clocksources :-)
->
->
-> * Antoine Damhet (antoine.damhet@blade-group.com) wrote:
->
->> Hi,
->
->>
->
->> We are experiencing timestamp rollbacks during live-migration of
->
->> Windows 10 guests
->
->
-Are you migrating to the same hardware (with the same TSC frequency)? Is
->
-TSC used as the clocksource on the host?
-Yes we are migrating to the exact same hardware. And yes TSC is used as
-a clocksource in the host (but the bug is still happening with `hpet` as
-a clocksource).
-
->
->
->>  with the following qemu configuration (linux 5.4.46
->
->> and qemu master):
->
->> ```
->
->> $ qemu-system-x86_64 -enable-kvm -cpu host,kvm=off,hv_time [...]
->
->> ```
->
->
-Out of pure curiosity, what's the purpose of doing 'kvm=off'? Windows is
->
-not going to check for KVM identification anyway so we pretend we're
->
-Hyper-V.
-Some softwares explicitly checks for the presence of KVM and then crash
-if they find it in CPUID :/
-
->
->
-Also, have you tried adding more Hyper-V enlightenments?
-Yes, I published a stripped-down command-line for a minimal reproducer
-but even `hv-frequencies` and `hv-reenlightenment` don't help.
-
->
->
->
->
-> How big a jump are you seeing, and how did you notice it in the guest?
->
->
->
-> Dave
->
->
->
->> I have tracked the bug to the fact that `kvmclock` is not exposed and
->
->> disabled from qemu PoV but is in fact used by `hv-time` (in KVM).
->
->>
->
->> I think we should enable the `kvmclock` (qemu device) if `hv-time` is
->
->> present and add Hyper-V support for the `kvmclock_current_nsec`
->
->> function.
->
->
-AFAICT kvmclock_current_nsec() checks whether kvmclock was enabled by
->
-the guest:
->
->
-if (!(env->system_time_msr & 1ULL)) {
->
-/* KVM clock not active */
->
-return 0;
->
-}
->
->
-and this is (and way) always false for Windows guests.
-Hooo, I missed this piece. When is `clock_is_reliable` expected to be
-false ? Because if it is I still think we should be able to query at
-least `HV_X64_MSR_REFERENCE_TSC`
-
->
->
->>
->
->> I'm asking for advice because I am unsure this is the _right_ approach
->
->> and how to keep migration compatibility between qemu versions.
->
->>
->
->> Thank you all,
->
->>
->
->> --
->
->> Antoine 'xdbob' Damhet
->
->
---
->
-Vitaly
->
--- 
-Antoine 'xdbob' Damhet
-signature.asc
-Description:
-PGP signature
-
-On Wed, Sep 16, 2020 at 12:29:56PM +0100, Dr. David Alan Gilbert wrote:
->
-cc'ing in Vitaly who knows about the hv stuff.
-Thanks
-
->
->
-* Antoine Damhet (antoine.damhet@blade-group.com) wrote:
->
-> Hi,
->
->
->
-> We are experiencing timestamp rollbacks during live-migration of
->
-> Windows 10 guests with the following qemu configuration (linux 5.4.46
->
-> and qemu master):
->
-> ```
->
-> $ qemu-system-x86_64 -enable-kvm -cpu host,kvm=off,hv_time [...]
->
-> ```
->
->
-How big a jump are you seeing, and how did you notice it in the guest?
-I'm seeing jumps of about the guest uptime (indicating a reset of the
-counter). It's expected because we won't call `KVM_SET_CLOCK` to
-restore any value.
-
-We first noticed it because after some migrations `dwm.exe` crashes with
-the "(NTSTATUS) 0x8898009b - QueryPerformanceCounter returned a time in
-the past." error code.
-
-I can also confirm the following hack makes the behavior disappear:
-
-```
-diff --git a/hw/i386/kvm/clock.c b/hw/i386/kvm/clock.c
-index 64283358f9..f334bdf35f 100644
---- a/hw/i386/kvm/clock.c
-+++ b/hw/i386/kvm/clock.c
-@@ -332,11 +332,7 @@ void kvmclock_create(void)
- {
-     X86CPU *cpu = X86_CPU(first_cpu);
-
--    if (kvm_enabled() &&
--        cpu->env.features[FEAT_KVM] & ((1ULL << KVM_FEATURE_CLOCKSOURCE) |
--                                       (1ULL << KVM_FEATURE_CLOCKSOURCE2))) {
--        sysbus_create_simple(TYPE_KVM_CLOCK, -1, NULL);
--    }
-+    sysbus_create_simple(TYPE_KVM_CLOCK, -1, NULL);
- }
-
- static void kvmclock_register_types(void)
-diff --git a/hw/i386/pc_piix.c b/hw/i386/pc_piix.c
-index 32b1453e6a..11d980ba85 100644
---- a/hw/i386/pc_piix.c
-+++ b/hw/i386/pc_piix.c
-@@ -158,9 +158,7 @@ static void pc_init1(MachineState *machine,
-
-     x86_cpus_init(x86ms, pcmc->default_cpu_version);
-
--    if (kvm_enabled() && pcmc->kvmclock_enabled) {
--        kvmclock_create();
--    }
-+    kvmclock_create();
-
-     if (pcmc->pci_enabled) {
-         pci_memory = g_new(MemoryRegion, 1);
-```
-
->
->
-Dave
->
->
-> I have tracked the bug to the fact that `kvmclock` is not exposed and
->
-> disabled from qemu PoV but is in fact used by `hv-time` (in KVM).
->
->
->
-> I think we should enable the `kvmclock` (qemu device) if `hv-time` is
->
-> present and add Hyper-V support for the `kvmclock_current_nsec`
->
-> function.
->
->
->
-> I'm asking for advice because I am unsure this is the _right_ approach
->
-> and how to keep migration compatibility between qemu versions.
->
->
->
-> Thank you all,
->
->
->
-> --
->
-> Antoine 'xdbob' Damhet
->
->
->
---
->
-Dr. David Alan Gilbert / dgilbert@redhat.com / Manchester, UK
->
--- 
-Antoine 'xdbob' Damhet
-signature.asc
-Description:
-PGP signature
-
-Antoine Damhet <antoine.damhet@blade-group.com> writes:
-
->
-On Wed, Sep 16, 2020 at 12:29:56PM +0100, Dr. David Alan Gilbert wrote:
->
-> cc'ing in Vitaly who knows about the hv stuff.
->
->
-Thanks
->
->
->
->
-> * Antoine Damhet (antoine.damhet@blade-group.com) wrote:
->
-> > Hi,
->
-> >
->
-> > We are experiencing timestamp rollbacks during live-migration of
->
-> > Windows 10 guests with the following qemu configuration (linux 5.4.46
->
-> > and qemu master):
->
-> > ```
->
-> > $ qemu-system-x86_64 -enable-kvm -cpu host,kvm=off,hv_time [...]
->
-> > ```
->
->
->
-> How big a jump are you seeing, and how did you notice it in the guest?
->
->
-I'm seeing jumps of about the guest uptime (indicating a reset of the
->
-counter). It's expected because we won't call `KVM_SET_CLOCK` to
->
-restore any value.
->
->
-We first noticed it because after some migrations `dwm.exe` crashes with
->
-the "(NTSTATUS) 0x8898009b - QueryPerformanceCounter returned a time in
->
-the past." error code.
->
->
-I can also confirm the following hack makes the behavior disappear:
->
->
-```
->
-diff --git a/hw/i386/kvm/clock.c b/hw/i386/kvm/clock.c
->
-index 64283358f9..f334bdf35f 100644
->
---- a/hw/i386/kvm/clock.c
->
-+++ b/hw/i386/kvm/clock.c
->
-@@ -332,11 +332,7 @@ void kvmclock_create(void)
->
-{
->
-X86CPU *cpu = X86_CPU(first_cpu);
->
->
--    if (kvm_enabled() &&
->
--        cpu->env.features[FEAT_KVM] & ((1ULL << KVM_FEATURE_CLOCKSOURCE) |
->
--                                       (1ULL << KVM_FEATURE_CLOCKSOURCE2))) {
->
--        sysbus_create_simple(TYPE_KVM_CLOCK, -1, NULL);
->
--    }
->
-+    sysbus_create_simple(TYPE_KVM_CLOCK, -1, NULL);
->
-}
->
-Oh, I think I see what's going on. When you add 'kvm=off'
-cpu->env.features[FEAT_KVM] is reset (see x86_cpu_expand_features()) so
-kvmclock QEMU device is not created and nobody calls KVM_SET_CLOCK on
-migration.
-
-In case we really want to support 'kvm=off' I think we can add Hyper-V
-features check here along with KVM, this should do the job.
-
--- 
-Vitaly
-
-Vitaly Kuznetsov <vkuznets@redhat.com> writes:
-
->
-Antoine Damhet <antoine.damhet@blade-group.com> writes:
->
->
-> On Wed, Sep 16, 2020 at 12:29:56PM +0100, Dr. David Alan Gilbert wrote:
->
->> cc'ing in Vitaly who knows about the hv stuff.
->
->
->
-> Thanks
->
->
->
->>
->
->> * Antoine Damhet (antoine.damhet@blade-group.com) wrote:
->
->> > Hi,
->
->> >
->
->> > We are experiencing timestamp rollbacks during live-migration of
->
->> > Windows 10 guests with the following qemu configuration (linux 5.4.46
->
->> > and qemu master):
->
->> > ```
->
->> > $ qemu-system-x86_64 -enable-kvm -cpu host,kvm=off,hv_time [...]
->
->> > ```
->
->>
->
->> How big a jump are you seeing, and how did you notice it in the guest?
->
->
->
-> I'm seeing jumps of about the guest uptime (indicating a reset of the
->
-> counter). It's expected because we won't call `KVM_SET_CLOCK` to
->
-> restore any value.
->
->
->
-> We first noticed it because after some migrations `dwm.exe` crashes with
->
-> the "(NTSTATUS) 0x8898009b - QueryPerformanceCounter returned a time in
->
-> the past." error code.
->
->
->
-> I can also confirm the following hack makes the behavior disappear:
->
->
->
-> ```
->
-> diff --git a/hw/i386/kvm/clock.c b/hw/i386/kvm/clock.c
->
-> index 64283358f9..f334bdf35f 100644
->
-> --- a/hw/i386/kvm/clock.c
->
-> +++ b/hw/i386/kvm/clock.c
->
-> @@ -332,11 +332,7 @@ void kvmclock_create(void)
->
->  {
->
->      X86CPU *cpu = X86_CPU(first_cpu);
->
->
->
-> -    if (kvm_enabled() &&
->
-> -        cpu->env.features[FEAT_KVM] & ((1ULL << KVM_FEATURE_CLOCKSOURCE) |
->
-> -                                       (1ULL << KVM_FEATURE_CLOCKSOURCE2)))
->
-> {
->
-> -        sysbus_create_simple(TYPE_KVM_CLOCK, -1, NULL);
->
-> -    }
->
-> +    sysbus_create_simple(TYPE_KVM_CLOCK, -1, NULL);
->
->  }
->
->
->
->
->
-Oh, I think I see what's going on. When you add 'kvm=off'
->
-cpu->env.features[FEAT_KVM] is reset (see x86_cpu_expand_features()) so
->
-kvmclock QEMU device is not created and nobody calls KVM_SET_CLOCK on
->
-migration.
->
->
-In case we really want to support 'kvm=off' I think we can add Hyper-V
->
-features check here along with KVM, this should do the job.
-Does the untested
-
-diff --git a/hw/i386/kvm/clock.c b/hw/i386/kvm/clock.c
-index 64283358f91d..e03b2ca6d8f6 100644
---- a/hw/i386/kvm/clock.c
-+++ b/hw/i386/kvm/clock.c
-@@ -333,8 +333,9 @@ void kvmclock_create(void)
-     X86CPU *cpu = X86_CPU(first_cpu);
- 
-     if (kvm_enabled() &&
--        cpu->env.features[FEAT_KVM] & ((1ULL << KVM_FEATURE_CLOCKSOURCE) |
--                                       (1ULL << KVM_FEATURE_CLOCKSOURCE2))) {
-+        ((cpu->env.features[FEAT_KVM] & ((1ULL << KVM_FEATURE_CLOCKSOURCE) |
-+                                         (1ULL << KVM_FEATURE_CLOCKSOURCE2))) 
-||
-+         (cpu->env.features[FEAT_HYPERV_EAX] & HV_TIME_REF_COUNT_AVAILABLE))) {
-         sysbus_create_simple(TYPE_KVM_CLOCK, -1, NULL);
-     }
- }
-
-help?
-
-(I don't think we need to remove all 'if (kvm_enabled())' checks from
-machine types as 'kvm=off' should not be related).
-
--- 
-Vitaly
-
-On Wed, Sep 16, 2020 at 02:50:56PM +0200, Vitaly Kuznetsov wrote:
-[...]
-
->
->>
->
->
->
->
->
-> Oh, I think I see what's going on. When you add 'kvm=off'
->
-> cpu->env.features[FEAT_KVM] is reset (see x86_cpu_expand_features()) so
->
-> kvmclock QEMU device is not created and nobody calls KVM_SET_CLOCK on
->
-> migration.
->
->
->
-> In case we really want to support 'kvm=off' I think we can add Hyper-V
->
-> features check here along with KVM, this should do the job.
->
->
-Does the untested
->
->
-diff --git a/hw/i386/kvm/clock.c b/hw/i386/kvm/clock.c
->
-index 64283358f91d..e03b2ca6d8f6 100644
->
---- a/hw/i386/kvm/clock.c
->
-+++ b/hw/i386/kvm/clock.c
->
-@@ -333,8 +333,9 @@ void kvmclock_create(void)
->
-X86CPU *cpu = X86_CPU(first_cpu);
->
->
-if (kvm_enabled() &&
->
--        cpu->env.features[FEAT_KVM] & ((1ULL << KVM_FEATURE_CLOCKSOURCE) |
->
--                                       (1ULL << KVM_FEATURE_CLOCKSOURCE2))) {
->
-+        ((cpu->env.features[FEAT_KVM] & ((1ULL << KVM_FEATURE_CLOCKSOURCE) |
->
-+                                         (1ULL <<
->
-KVM_FEATURE_CLOCKSOURCE2))) ||
->
-+         (cpu->env.features[FEAT_HYPERV_EAX] &
->
-HV_TIME_REF_COUNT_AVAILABLE))) {
->
-sysbus_create_simple(TYPE_KVM_CLOCK, -1, NULL);
->
-}
->
-}
->
->
-help?
-It appears to work :)
-
->
->
-(I don't think we need to remove all 'if (kvm_enabled())' checks from
->
-machine types as 'kvm=off' should not be related).
-Indeed (I didn't look at the macro, it was just quick & dirty).
-
->
->
---
->
-Vitaly
->
->
--- 
-Antoine 'xdbob' Damhet
-signature.asc
-Description:
-PGP signature
-
-On 16/09/20 13:29, Dr. David Alan Gilbert wrote:
->
-> I have tracked the bug to the fact that `kvmclock` is not exposed and
->
-> disabled from qemu PoV but is in fact used by `hv-time` (in KVM).
->
->
->
-> I think we should enable the `kvmclock` (qemu device) if `hv-time` is
->
-> present and add Hyper-V support for the `kvmclock_current_nsec`
->
-> function.
-Yes, this seems correct.  I would have to check but it may even be
-better to _always_ send kvmclock data in the live migration stream.
-
-Paolo
-
-Paolo Bonzini <pbonzini@redhat.com> writes:
-
->
-On 16/09/20 13:29, Dr. David Alan Gilbert wrote:
->
->> I have tracked the bug to the fact that `kvmclock` is not exposed and
->
->> disabled from qemu PoV but is in fact used by `hv-time` (in KVM).
->
->>
->
->> I think we should enable the `kvmclock` (qemu device) if `hv-time` is
->
->> present and add Hyper-V support for the `kvmclock_current_nsec`
->
->> function.
->
->
-Yes, this seems correct.  I would have to check but it may even be
->
-better to _always_ send kvmclock data in the live migration stream.
->
-The question I have is: with 'kvm=off', do we actually restore TSC
-reading on migration? (and I guess the answer is 'no' or Hyper-V TSC
-page would 'just work' I guess). So yea, maybe dropping the
-'cpu->env.features[FEAT_KVM]' check is the right fix.
-
--- 
-Vitaly
-
diff --git a/classification_output/04/other/65781993 b/classification_output/04/other/65781993
deleted file mode 100644
index a7d133b69..000000000
--- a/classification_output/04/other/65781993
+++ /dev/null
@@ -1,2801 +0,0 @@
-other: 0.727
-instruction: 0.670
-assembly: 0.666
-semantic: 0.665
-graphic: 0.664
-socket: 0.660
-network: 0.657
-mistranslation: 0.650
-device: 0.647
-boot: 0.635
-KVM: 0.627
-vnc: 0.590
-
-[Qemu-devel] 答复: Re:   答复: Re:  [BUG]COLO failover hang
-
-Thank youã
-
-I have test areadyã
-
-When the Primary Node panic,the Secondary Node qemu hang at the same placeã
-
-Incorrding
-http://wiki.qemu-project.org/Features/COLO
-ï¼kill Primary Node qemu 
-will not produce the problem,but Primary Node panic canã
-
-I think due to the feature of channel does not support 
-QIO_CHANNEL_FEATURE_SHUTDOWN.
-
-
-when failover,channel_shutdown could not shut down the channel.
-
-
-so the colo_process_incoming_thread will hang at recvmsg.
-
-
-I test a patch:
-
-
-diff --git a/migration/socket.c b/migration/socket.c
-
-
-index 13966f1..d65a0ea 100644
-
-
---- a/migration/socket.c
-
-
-+++ b/migration/socket.c
-
-
-@@ -147,8 +147,9 @@ static gboolean socket_accept_incoming_migration(QIOChannel 
-*ioc,
-
-
-     }
-
-
- 
-
-
-     trace_migration_socket_incoming_accepted()
-
-
-    
-
-
-     qio_channel_set_name(QIO_CHANNEL(sioc), "migration-socket-incoming")
-
-
-+    qio_channel_set_feature(QIO_CHANNEL(sioc), QIO_CHANNEL_FEATURE_SHUTDOWN)
-
-
-     migration_channel_process_incoming(migrate_get_current(),
-
-
-                                        QIO_CHANNEL(sioc))
-
-
-     object_unref(OBJECT(sioc))
-
-
-
-
-My test will not hang any more.
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-åå§é®ä»¶
-
-
-
-åä»¶äººï¼ address@hidden
-æ¶ä»¶äººï¼çå¹¿10165992 address@hidden
-æéäººï¼ address@hidden address@hidden
-æ¥ æ ï¼2017å¹´03æ21æ¥ 15:58
-ä¸» é¢ ï¼Re: [Qemu-devel]  çå¤: Re:  [BUG]COLO failover hang
-
-
-
-
-
-Hi,Wang.
-
-You can test this branch:
-https://github.com/coloft/qemu/tree/colo-v5.1-developing-COLO-frame-v21-with-shared-disk
-and please follow wiki ensure your own configuration correctly.
-http://wiki.qemu-project.org/Features/COLO
-Thanks
-
-Zhang Chen
-
-
-On 03/21/2017 03:27 PM, address@hidden wrote:
-ï¼
-ï¼ hi.
-ï¼
-ï¼ I test the git qemu master have the same problem.
-ï¼
-ï¼ (gdb) bt
-ï¼
-ï¼ #0  qio_channel_socket_readv (ioc=0x7f65911b4e50, iov=0x7f64ef3fd880, 
-ï¼ niov=1, fds=0x0, nfds=0x0, errp=0x0) at io/channel-socket.c:461
-ï¼
-ï¼ #1  0x00007f658e4aa0c2 in qio_channel_read 
-ï¼ (address@hidden, address@hidden "", 
-ï¼ address@hidden, address@hidden) at io/channel.c:114
-ï¼
-ï¼ #2  0x00007f658e3ea990 in channel_get_buffer (opaque=ï¼optimized outï¼, 
-ï¼ buf=0x7f65907cb838 "", pos=ï¼optimized outï¼, size=32768) at 
-ï¼ migration/qemu-file-channel.c:78
-ï¼
-ï¼ #3  0x00007f658e3e97fc in qemu_fill_buffer (f=0x7f65907cb800) at 
-ï¼ migration/qemu-file.c:295
-ï¼
-ï¼ #4  0x00007f658e3ea2e1 in qemu_peek_byte (address@hidden, 
-ï¼ address@hidden) at migration/qemu-file.c:555
-ï¼
-ï¼ #5  0x00007f658e3ea34b in qemu_get_byte (address@hidden) at 
-ï¼ migration/qemu-file.c:568
-ï¼
-ï¼ #6  0x00007f658e3ea552 in qemu_get_be32 (address@hidden) at 
-ï¼ migration/qemu-file.c:648
-ï¼
-ï¼ #7  0x00007f658e3e66e5 in colo_receive_message (f=0x7f65907cb800, 
-ï¼ address@hidden) at migration/colo.c:244
-ï¼
-ï¼ #8  0x00007f658e3e681e in colo_receive_check_message (f=ï¼optimized 
-ï¼ outï¼, address@hidden, 
-ï¼ address@hidden)
-ï¼
-ï¼     at migration/colo.c:264
-ï¼
-ï¼ #9  0x00007f658e3e740e in colo_process_incoming_thread 
-ï¼ (opaque=0x7f658eb30360 ï¼mis_current.31286ï¼) at migration/colo.c:577
-ï¼
-ï¼ #10 0x00007f658be09df3 in start_thread () from /lib64/libpthread.so.0
-ï¼
-ï¼ #11 0x00007f65881983ed in clone () from /lib64/libc.so.6
-ï¼
-ï¼ (gdb) p ioc-ï¼name
-ï¼
-ï¼ $2 = 0x7f658ff7d5c0 "migration-socket-incoming"
-ï¼
-ï¼ (gdb) p ioc-ï¼features        Do not support QIO_CHANNEL_FEATURE_SHUTDOWN
-ï¼
-ï¼ $3 = 0
-ï¼
-ï¼
-ï¼ (gdb) bt
-ï¼
-ï¼ #0  socket_accept_incoming_migration (ioc=0x7fdcceeafa90, 
-ï¼ condition=G_IO_IN, opaque=0x7fdcceeafa90) at migration/socket.c:137
-ï¼
-ï¼ #1  0x00007fdcc6966350 in g_main_dispatch (context=ï¼optimized outï¼) at 
-ï¼ gmain.c:3054
-ï¼
-ï¼ #2  g_main_context_dispatch (context=ï¼optimized outï¼, 
-ï¼ address@hidden) at gmain.c:3630
-ï¼
-ï¼ #3  0x00007fdccb8a6dcc in glib_pollfds_poll () at util/main-loop.c:213
-ï¼
-ï¼ #4  os_host_main_loop_wait (timeout=ï¼optimized outï¼) at 
-ï¼ util/main-loop.c:258
-ï¼
-ï¼ #5  main_loop_wait (address@hidden) at 
-ï¼ util/main-loop.c:506
-ï¼
-ï¼ #6  0x00007fdccb526187 in main_loop () at vl.c:1898
-ï¼
-ï¼ #7  main (argc=ï¼optimized outï¼, argv=ï¼optimized outï¼, envp=ï¼optimized 
-ï¼ outï¼) at vl.c:4709
-ï¼
-ï¼ (gdb) p ioc-ï¼features
-ï¼
-ï¼ $1 = 6
-ï¼
-ï¼ (gdb) p ioc-ï¼name
-ï¼
-ï¼ $2 = 0x7fdcce1b1ab0 "migration-socket-listener"
-ï¼
-ï¼
-ï¼ May be socket_accept_incoming_migration should 
-ï¼ call qio_channel_set_feature(ioc, QIO_CHANNEL_FEATURE_SHUTDOWN)??
-ï¼
-ï¼
-ï¼ thank you.
-ï¼
-ï¼
-ï¼
-ï¼
-ï¼
-ï¼ åå§é®ä»¶
-ï¼ address@hidden
-ï¼ address@hidden
-ï¼ address@hidden@huawei.comï¼
-ï¼ *æ¥ æ ï¼*2017å¹´03æ16æ¥ 14:46
-ï¼ *ä¸» é¢ ï¼**Re: [Qemu-devel] COLO failover hang*
-ï¼
-ï¼
-ï¼
-ï¼
-ï¼ On 03/15/2017 05:06 PM, wangguang wrote:
-ï¼ ï¼   am testing QEMU COLO feature described here [QEMU
-ï¼ ï¼ Wiki](
-http://wiki.qemu-project.org/Features/COLO
-).
-ï¼ ï¼
-ï¼ ï¼ When the Primary Node panic,the Secondary Node qemu hang.
-ï¼ ï¼ hang at recvmsg in qio_channel_socket_readv.
-ï¼ ï¼ And  I run  { 'execute': 'nbd-server-stop' } and { "execute":
-ï¼ ï¼ "x-colo-lost-heartbeat" } in Secondary VM's
-ï¼ ï¼ monitor,the  Secondary Node qemu still hang at recvmsg .
-ï¼ ï¼
-ï¼ ï¼ I found that the colo in qemu is not complete yet.
-ï¼ ï¼ Do the colo have any plan for development?
-ï¼
-ï¼ Yes, We are developing. You can see some of patch we pushing.
-ï¼
-ï¼ ï¼ Has anyone ever run it successfully? Any help is appreciated!
-ï¼
-ï¼ In our internal version can run it successfully,
-ï¼ The failover detail you can ask Zhanghailiang for help.
-ï¼ Next time if you have some question about COLO,
-ï¼ please cc me and zhanghailiang address@hidden
-ï¼
-ï¼
-ï¼ Thanks
-ï¼ Zhang Chen
-ï¼
-ï¼
-ï¼ ï¼
-ï¼ ï¼
-ï¼ ï¼
-ï¼ ï¼ centos7.2+qemu2.7.50
-ï¼ ï¼ (gdb) bt
-ï¼ ï¼ #0  0x00007f3e00cc86ad in recvmsg () from /lib64/libpthread.so.0
-ï¼ ï¼ #1  0x00007f3e0332b738 in qio_channel_socket_readv (ioc=ï¼optimized outï¼,
-ï¼ ï¼ iov=ï¼optimized outï¼, niov=ï¼optimized outï¼, fds=0x0, nfds=0x0, errp=0x0) at
-ï¼ ï¼ io/channel-socket.c:497
-ï¼ ï¼ #2  0x00007f3e03329472 in qio_channel_read (address@hidden,
-ï¼ ï¼ address@hidden "", address@hidden,
-ï¼ ï¼ address@hidden) at io/channel.c:97
-ï¼ ï¼ #3  0x00007f3e032750e0 in channel_get_buffer (opaque=ï¼optimized outï¼,
-ï¼ ï¼ buf=0x7f3e05910f38 "", pos=ï¼optimized outï¼, size=32768) at
-ï¼ ï¼ migration/qemu-file-channel.c:78
-ï¼ ï¼ #4  0x00007f3e0327412c in qemu_fill_buffer (f=0x7f3e05910f00) at
-ï¼ ï¼ migration/qemu-file.c:257
-ï¼ ï¼ #5  0x00007f3e03274a41 in qemu_peek_byte (address@hidden,
-ï¼ ï¼ address@hidden) at migration/qemu-file.c:510
-ï¼ ï¼ #6  0x00007f3e03274aab in qemu_get_byte (address@hidden) at
-ï¼ ï¼ migration/qemu-file.c:523
-ï¼ ï¼ #7  0x00007f3e03274cb2 in qemu_get_be32 (address@hidden) at
-ï¼ ï¼ migration/qemu-file.c:603
-ï¼ ï¼ #8  0x00007f3e03271735 in colo_receive_message (f=0x7f3e05910f00,
-ï¼ ï¼ address@hidden) at migration/colo.c:215
-ï¼ ï¼ #9  0x00007f3e0327250d in colo_wait_handle_message (errp=0x7f3d62bfaa48,
-ï¼ ï¼ checkpoint_request=ï¼synthetic pointerï¼, f=ï¼optimized outï¼) at
-ï¼ ï¼ migration/colo.c:546
-ï¼ ï¼ #10 colo_process_incoming_thread (opaque=0x7f3e067245e0) at
-ï¼ ï¼ migration/colo.c:649
-ï¼ ï¼ #11 0x00007f3e00cc1df3 in start_thread () from /lib64/libpthread.so.0
-ï¼ ï¼ #12 0x00007f3dfc9c03ed in clone () from /lib64/libc.so.6
-ï¼ ï¼
-ï¼ ï¼
-ï¼ ï¼
-ï¼ ï¼
-ï¼ ï¼
-ï¼ ï¼ --
-ï¼ ï¼ View this message in context:
-http://qemu.11.n7.nabble.com/COLO-failover-hang-tp473250.html
-ï¼ ï¼ Sent from the Developer mailing list archive at Nabble.com.
-ï¼ ï¼
-ï¼ ï¼
-ï¼ ï¼
-ï¼ ï¼
-ï¼
-ï¼ -- 
-ï¼ Thanks
-ï¼ Zhang Chen
-ï¼
-ï¼
-ï¼
-ï¼
-ï¼
-
--- 
-Thanks
-Zhang Chen
-
-Hi,
-
-On 2017/3/21 16:10, address@hidden wrote:
-Thank youã
-
-I have test areadyã
-
-When the Primary Node panic,the Secondary Node qemu hang at the same placeã
-
-Incorrding
-http://wiki.qemu-project.org/Features/COLO
-ï¼kill Primary Node qemu 
-will not produce the problem,but Primary Node panic canã
-
-I think due to the feature of channel does not support 
-QIO_CHANNEL_FEATURE_SHUTDOWN.
-Yes, you are right, when we do failover for primary/secondary VM, we will 
-shutdown the related
-fd in case it is stuck in the read/write fd.
-
-It seems that you didn't follow the above introduction exactly to do the test. 
-Could you
-share your test procedures ? Especially the commands used in the test.
-
-Thanks,
-Hailiang
-when failover,channel_shutdown could not shut down the channel.
-
-
-so the colo_process_incoming_thread will hang at recvmsg.
-
-
-I test a patch:
-
-
-diff --git a/migration/socket.c b/migration/socket.c
-
-
-index 13966f1..d65a0ea 100644
-
-
---- a/migration/socket.c
-
-
-+++ b/migration/socket.c
-
-
-@@ -147,8 +147,9 @@ static gboolean socket_accept_incoming_migration(QIOChannel 
-*ioc,
-
-
-      }
-
-
-
-
-
-      trace_migration_socket_incoming_accepted()
-
-
-
-
-
-      qio_channel_set_name(QIO_CHANNEL(sioc), "migration-socket-incoming")
-
-
-+    qio_channel_set_feature(QIO_CHANNEL(sioc), QIO_CHANNEL_FEATURE_SHUTDOWN)
-
-
-      migration_channel_process_incoming(migrate_get_current(),
-
-
-                                         QIO_CHANNEL(sioc))
-
-
-      object_unref(OBJECT(sioc))
-
-
-
-
-My test will not hang any more.
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-åå§é®ä»¶
-
-
-
-åä»¶äººï¼ address@hidden
-æ¶ä»¶äººï¼çå¹¿10165992 address@hidden
-æéäººï¼ address@hidden address@hidden
-æ¥ æ ï¼2017å¹´03æ21æ¥ 15:58
-ä¸» é¢ ï¼Re: [Qemu-devel]  çå¤: Re:  [BUG]COLO failover hang
-
-
-
-
-
-Hi,Wang.
-
-You can test this branch:
-https://github.com/coloft/qemu/tree/colo-v5.1-developing-COLO-frame-v21-with-shared-disk
-and please follow wiki ensure your own configuration correctly.
-http://wiki.qemu-project.org/Features/COLO
-Thanks
-
-Zhang Chen
-
-
-On 03/21/2017 03:27 PM, address@hidden wrote:
-ï¼
-ï¼ hi.
-ï¼
-ï¼ I test the git qemu master have the same problem.
-ï¼
-ï¼ (gdb) bt
-ï¼
-ï¼ #0  qio_channel_socket_readv (ioc=0x7f65911b4e50, iov=0x7f64ef3fd880,
-ï¼ niov=1, fds=0x0, nfds=0x0, errp=0x0) at io/channel-socket.c:461
-ï¼
-ï¼ #1  0x00007f658e4aa0c2 in qio_channel_read
-ï¼ (address@hidden, address@hidden "",
-ï¼ address@hidden, address@hidden) at io/channel.c:114
-ï¼
-ï¼ #2  0x00007f658e3ea990 in channel_get_buffer (opaque=ï¼optimized outï¼,
-ï¼ buf=0x7f65907cb838 "", pos=ï¼optimized outï¼, size=32768) at
-ï¼ migration/qemu-file-channel.c:78
-ï¼
-ï¼ #3  0x00007f658e3e97fc in qemu_fill_buffer (f=0x7f65907cb800) at
-ï¼ migration/qemu-file.c:295
-ï¼
-ï¼ #4  0x00007f658e3ea2e1 in qemu_peek_byte (address@hidden,
-ï¼ address@hidden) at migration/qemu-file.c:555
-ï¼
-ï¼ #5  0x00007f658e3ea34b in qemu_get_byte (address@hidden) at
-ï¼ migration/qemu-file.c:568
-ï¼
-ï¼ #6  0x00007f658e3ea552 in qemu_get_be32 (address@hidden) at
-ï¼ migration/qemu-file.c:648
-ï¼
-ï¼ #7  0x00007f658e3e66e5 in colo_receive_message (f=0x7f65907cb800,
-ï¼ address@hidden) at migration/colo.c:244
-ï¼
-ï¼ #8  0x00007f658e3e681e in colo_receive_check_message (f=ï¼optimized
-ï¼ outï¼, address@hidden,
-ï¼ address@hidden)
-ï¼
-ï¼     at migration/colo.c:264
-ï¼
-ï¼ #9  0x00007f658e3e740e in colo_process_incoming_thread
-ï¼ (opaque=0x7f658eb30360 ï¼mis_current.31286ï¼) at migration/colo.c:577
-ï¼
-ï¼ #10 0x00007f658be09df3 in start_thread () from /lib64/libpthread.so.0
-ï¼
-ï¼ #11 0x00007f65881983ed in clone () from /lib64/libc.so.6
-ï¼
-ï¼ (gdb) p ioc-ï¼name
-ï¼
-ï¼ $2 = 0x7f658ff7d5c0 "migration-socket-incoming"
-ï¼
-ï¼ (gdb) p ioc-ï¼features        Do not support QIO_CHANNEL_FEATURE_SHUTDOWN
-ï¼
-ï¼ $3 = 0
-ï¼
-ï¼
-ï¼ (gdb) bt
-ï¼
-ï¼ #0  socket_accept_incoming_migration (ioc=0x7fdcceeafa90,
-ï¼ condition=G_IO_IN, opaque=0x7fdcceeafa90) at migration/socket.c:137
-ï¼
-ï¼ #1  0x00007fdcc6966350 in g_main_dispatch (context=ï¼optimized outï¼) at
-ï¼ gmain.c:3054
-ï¼
-ï¼ #2  g_main_context_dispatch (context=ï¼optimized outï¼,
-ï¼ address@hidden) at gmain.c:3630
-ï¼
-ï¼ #3  0x00007fdccb8a6dcc in glib_pollfds_poll () at util/main-loop.c:213
-ï¼
-ï¼ #4  os_host_main_loop_wait (timeout=ï¼optimized outï¼) at
-ï¼ util/main-loop.c:258
-ï¼
-ï¼ #5  main_loop_wait (address@hidden) at
-ï¼ util/main-loop.c:506
-ï¼
-ï¼ #6  0x00007fdccb526187 in main_loop () at vl.c:1898
-ï¼
-ï¼ #7  main (argc=ï¼optimized outï¼, argv=ï¼optimized outï¼, envp=ï¼optimized
-ï¼ outï¼) at vl.c:4709
-ï¼
-ï¼ (gdb) p ioc-ï¼features
-ï¼
-ï¼ $1 = 6
-ï¼
-ï¼ (gdb) p ioc-ï¼name
-ï¼
-ï¼ $2 = 0x7fdcce1b1ab0 "migration-socket-listener"
-ï¼
-ï¼
-ï¼ May be socket_accept_incoming_migration should
-ï¼ call qio_channel_set_feature(ioc, QIO_CHANNEL_FEATURE_SHUTDOWN)??
-ï¼
-ï¼
-ï¼ thank you.
-ï¼
-ï¼
-ï¼
-ï¼
-ï¼
-ï¼ åå§é®ä»¶
-ï¼ address@hidden
-ï¼ address@hidden
-ï¼ address@hidden@huawei.comï¼
-ï¼ *æ¥ æ ï¼*2017å¹´03æ16æ¥ 14:46
-ï¼ *ä¸» é¢ ï¼**Re: [Qemu-devel] COLO failover hang*
-ï¼
-ï¼
-ï¼
-ï¼
-ï¼ On 03/15/2017 05:06 PM, wangguang wrote:
-ï¼ ï¼   am testing QEMU COLO feature described here [QEMU
-ï¼ ï¼ Wiki](
-http://wiki.qemu-project.org/Features/COLO
-).
-ï¼ ï¼
-ï¼ ï¼ When the Primary Node panic,the Secondary Node qemu hang.
-ï¼ ï¼ hang at recvmsg in qio_channel_socket_readv.
-ï¼ ï¼ And  I run  { 'execute': 'nbd-server-stop' } and { "execute":
-ï¼ ï¼ "x-colo-lost-heartbeat" } in Secondary VM's
-ï¼ ï¼ monitor,the  Secondary Node qemu still hang at recvmsg .
-ï¼ ï¼
-ï¼ ï¼ I found that the colo in qemu is not complete yet.
-ï¼ ï¼ Do the colo have any plan for development?
-ï¼
-ï¼ Yes, We are developing. You can see some of patch we pushing.
-ï¼
-ï¼ ï¼ Has anyone ever run it successfully? Any help is appreciated!
-ï¼
-ï¼ In our internal version can run it successfully,
-ï¼ The failover detail you can ask Zhanghailiang for help.
-ï¼ Next time if you have some question about COLO,
-ï¼ please cc me and zhanghailiang address@hidden
-ï¼
-ï¼
-ï¼ Thanks
-ï¼ Zhang Chen
-ï¼
-ï¼
-ï¼ ï¼
-ï¼ ï¼
-ï¼ ï¼
-ï¼ ï¼ centos7.2+qemu2.7.50
-ï¼ ï¼ (gdb) bt
-ï¼ ï¼ #0  0x00007f3e00cc86ad in recvmsg () from /lib64/libpthread.so.0
-ï¼ ï¼ #1  0x00007f3e0332b738 in qio_channel_socket_readv (ioc=ï¼optimized outï¼,
-ï¼ ï¼ iov=ï¼optimized outï¼, niov=ï¼optimized outï¼, fds=0x0, nfds=0x0, errp=0x0) at
-ï¼ ï¼ io/channel-socket.c:497
-ï¼ ï¼ #2  0x00007f3e03329472 in qio_channel_read (address@hidden,
-ï¼ ï¼ address@hidden "", address@hidden,
-ï¼ ï¼ address@hidden) at io/channel.c:97
-ï¼ ï¼ #3  0x00007f3e032750e0 in channel_get_buffer (opaque=ï¼optimized outï¼,
-ï¼ ï¼ buf=0x7f3e05910f38 "", pos=ï¼optimized outï¼, size=32768) at
-ï¼ ï¼ migration/qemu-file-channel.c:78
-ï¼ ï¼ #4  0x00007f3e0327412c in qemu_fill_buffer (f=0x7f3e05910f00) at
-ï¼ ï¼ migration/qemu-file.c:257
-ï¼ ï¼ #5  0x00007f3e03274a41 in qemu_peek_byte (address@hidden,
-ï¼ ï¼ address@hidden) at migration/qemu-file.c:510
-ï¼ ï¼ #6  0x00007f3e03274aab in qemu_get_byte (address@hidden) at
-ï¼ ï¼ migration/qemu-file.c:523
-ï¼ ï¼ #7  0x00007f3e03274cb2 in qemu_get_be32 (address@hidden) at
-ï¼ ï¼ migration/qemu-file.c:603
-ï¼ ï¼ #8  0x00007f3e03271735 in colo_receive_message (f=0x7f3e05910f00,
-ï¼ ï¼ address@hidden) at migration/colo.c:215
-ï¼ ï¼ #9  0x00007f3e0327250d in colo_wait_handle_message (errp=0x7f3d62bfaa48,
-ï¼ ï¼ checkpoint_request=ï¼synthetic pointerï¼, f=ï¼optimized outï¼) at
-ï¼ ï¼ migration/colo.c:546
-ï¼ ï¼ #10 colo_process_incoming_thread (opaque=0x7f3e067245e0) at
-ï¼ ï¼ migration/colo.c:649
-ï¼ ï¼ #11 0x00007f3e00cc1df3 in start_thread () from /lib64/libpthread.so.0
-ï¼ ï¼ #12 0x00007f3dfc9c03ed in clone () from /lib64/libc.so.6
-ï¼ ï¼
-ï¼ ï¼
-ï¼ ï¼
-ï¼ ï¼
-ï¼ ï¼
-ï¼ ï¼ --
-ï¼ ï¼ View this message in context:
-http://qemu.11.n7.nabble.com/COLO-failover-hang-tp473250.html
-ï¼ ï¼ Sent from the Developer mailing list archive at Nabble.com.
-ï¼ ï¼
-ï¼ ï¼
-ï¼ ï¼
-ï¼ ï¼
-ï¼
-ï¼ --
-ï¼ Thanks
-ï¼ Zhang Chen
-ï¼
-ï¼
-ï¼
-ï¼
-ï¼
-
-Hi,
-
-Thanks for reporting this, and i confirmed it in my test, and it is a bug.
-
-Though we tried to call qemu_file_shutdown() to shutdown the related fd, in
-case COLO thread/incoming thread is stuck in read/write() while do failover,
-but it didn't take effect, because all the fd used by COLO (also migration)
-has been wrapped by qio channel, and it will not call the shutdown API if
-we didn't qio_channel_set_feature(QIO_CHANNEL(sioc), 
-QIO_CHANNEL_FEATURE_SHUTDOWN).
-
-Cc: Dr. David Alan Gilbert <address@hidden>
-
-I doubted migration cancel has the same problem, it may be stuck in write()
-if we tried to cancel migration.
-
-void fd_start_outgoing_migration(MigrationState *s, const char *fdname, Error 
-**errp)
-{
-    qio_channel_set_name(QIO_CHANNEL(ioc), "migration-fd-outgoing");
-    migration_channel_connect(s, ioc, NULL);
-    ... ...
-We didn't call qio_channel_set_feature(QIO_CHANNEL(sioc), 
-QIO_CHANNEL_FEATURE_SHUTDOWN) above,
-and the
-migrate_fd_cancel()
-{
- ... ...
-    if (s->state == MIGRATION_STATUS_CANCELLING && f) {
-        qemu_file_shutdown(f);  --> This will not take effect. No ?
-    }
-}
-
-Thanks,
-Hailiang
-
-On 2017/3/21 16:10, address@hidden wrote:
-Thank youã
-
-I have test areadyã
-
-When the Primary Node panic,the Secondary Node qemu hang at the same placeã
-
-Incorrding
-http://wiki.qemu-project.org/Features/COLO
-ï¼kill Primary Node qemu 
-will not produce the problem,but Primary Node panic canã
-
-I think due to the feature of channel does not support 
-QIO_CHANNEL_FEATURE_SHUTDOWN.
-
-
-when failover,channel_shutdown could not shut down the channel.
-
-
-so the colo_process_incoming_thread will hang at recvmsg.
-
-
-I test a patch:
-
-
-diff --git a/migration/socket.c b/migration/socket.c
-
-
-index 13966f1..d65a0ea 100644
-
-
---- a/migration/socket.c
-
-
-+++ b/migration/socket.c
-
-
-@@ -147,8 +147,9 @@ static gboolean socket_accept_incoming_migration(QIOChannel 
-*ioc,
-
-
-      }
-
-
-
-
-
-      trace_migration_socket_incoming_accepted()
-
-
-
-
-
-      qio_channel_set_name(QIO_CHANNEL(sioc), "migration-socket-incoming")
-
-
-+    qio_channel_set_feature(QIO_CHANNEL(sioc), QIO_CHANNEL_FEATURE_SHUTDOWN)
-
-
-      migration_channel_process_incoming(migrate_get_current(),
-
-
-                                         QIO_CHANNEL(sioc))
-
-
-      object_unref(OBJECT(sioc))
-
-
-
-
-My test will not hang any more.
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-åå§é®ä»¶
-
-
-
-åä»¶äººï¼ address@hidden
-æ¶ä»¶äººï¼çå¹¿10165992 address@hidden
-æéäººï¼ address@hidden address@hidden
-æ¥ æ ï¼2017å¹´03æ21æ¥ 15:58
-ä¸» é¢ ï¼Re: [Qemu-devel]  çå¤: Re:  [BUG]COLO failover hang
-
-
-
-
-
-Hi,Wang.
-
-You can test this branch:
-https://github.com/coloft/qemu/tree/colo-v5.1-developing-COLO-frame-v21-with-shared-disk
-and please follow wiki ensure your own configuration correctly.
-http://wiki.qemu-project.org/Features/COLO
-Thanks
-
-Zhang Chen
-
-
-On 03/21/2017 03:27 PM, address@hidden wrote:
-ï¼
-ï¼ hi.
-ï¼
-ï¼ I test the git qemu master have the same problem.
-ï¼
-ï¼ (gdb) bt
-ï¼
-ï¼ #0  qio_channel_socket_readv (ioc=0x7f65911b4e50, iov=0x7f64ef3fd880,
-ï¼ niov=1, fds=0x0, nfds=0x0, errp=0x0) at io/channel-socket.c:461
-ï¼
-ï¼ #1  0x00007f658e4aa0c2 in qio_channel_read
-ï¼ (address@hidden, address@hidden "",
-ï¼ address@hidden, address@hidden) at io/channel.c:114
-ï¼
-ï¼ #2  0x00007f658e3ea990 in channel_get_buffer (opaque=ï¼optimized outï¼,
-ï¼ buf=0x7f65907cb838 "", pos=ï¼optimized outï¼, size=32768) at
-ï¼ migration/qemu-file-channel.c:78
-ï¼
-ï¼ #3  0x00007f658e3e97fc in qemu_fill_buffer (f=0x7f65907cb800) at
-ï¼ migration/qemu-file.c:295
-ï¼
-ï¼ #4  0x00007f658e3ea2e1 in qemu_peek_byte (address@hidden,
-ï¼ address@hidden) at migration/qemu-file.c:555
-ï¼
-ï¼ #5  0x00007f658e3ea34b in qemu_get_byte (address@hidden) at
-ï¼ migration/qemu-file.c:568
-ï¼
-ï¼ #6  0x00007f658e3ea552 in qemu_get_be32 (address@hidden) at
-ï¼ migration/qemu-file.c:648
-ï¼
-ï¼ #7  0x00007f658e3e66e5 in colo_receive_message (f=0x7f65907cb800,
-ï¼ address@hidden) at migration/colo.c:244
-ï¼
-ï¼ #8  0x00007f658e3e681e in colo_receive_check_message (f=ï¼optimized
-ï¼ outï¼, address@hidden,
-ï¼ address@hidden)
-ï¼
-ï¼     at migration/colo.c:264
-ï¼
-ï¼ #9  0x00007f658e3e740e in colo_process_incoming_thread
-ï¼ (opaque=0x7f658eb30360 ï¼mis_current.31286ï¼) at migration/colo.c:577
-ï¼
-ï¼ #10 0x00007f658be09df3 in start_thread () from /lib64/libpthread.so.0
-ï¼
-ï¼ #11 0x00007f65881983ed in clone () from /lib64/libc.so.6
-ï¼
-ï¼ (gdb) p ioc-ï¼name
-ï¼
-ï¼ $2 = 0x7f658ff7d5c0 "migration-socket-incoming"
-ï¼
-ï¼ (gdb) p ioc-ï¼features        Do not support QIO_CHANNEL_FEATURE_SHUTDOWN
-ï¼
-ï¼ $3 = 0
-ï¼
-ï¼
-ï¼ (gdb) bt
-ï¼
-ï¼ #0  socket_accept_incoming_migration (ioc=0x7fdcceeafa90,
-ï¼ condition=G_IO_IN, opaque=0x7fdcceeafa90) at migration/socket.c:137
-ï¼
-ï¼ #1  0x00007fdcc6966350 in g_main_dispatch (context=ï¼optimized outï¼) at
-ï¼ gmain.c:3054
-ï¼
-ï¼ #2  g_main_context_dispatch (context=ï¼optimized outï¼,
-ï¼ address@hidden) at gmain.c:3630
-ï¼
-ï¼ #3  0x00007fdccb8a6dcc in glib_pollfds_poll () at util/main-loop.c:213
-ï¼
-ï¼ #4  os_host_main_loop_wait (timeout=ï¼optimized outï¼) at
-ï¼ util/main-loop.c:258
-ï¼
-ï¼ #5  main_loop_wait (address@hidden) at
-ï¼ util/main-loop.c:506
-ï¼
-ï¼ #6  0x00007fdccb526187 in main_loop () at vl.c:1898
-ï¼
-ï¼ #7  main (argc=ï¼optimized outï¼, argv=ï¼optimized outï¼, envp=ï¼optimized
-ï¼ outï¼) at vl.c:4709
-ï¼
-ï¼ (gdb) p ioc-ï¼features
-ï¼
-ï¼ $1 = 6
-ï¼
-ï¼ (gdb) p ioc-ï¼name
-ï¼
-ï¼ $2 = 0x7fdcce1b1ab0 "migration-socket-listener"
-ï¼
-ï¼
-ï¼ May be socket_accept_incoming_migration should
-ï¼ call qio_channel_set_feature(ioc, QIO_CHANNEL_FEATURE_SHUTDOWN)??
-ï¼
-ï¼
-ï¼ thank you.
-ï¼
-ï¼
-ï¼
-ï¼
-ï¼
-ï¼ åå§é®ä»¶
-ï¼ address@hidden
-ï¼ address@hidden
-ï¼ address@hidden@huawei.comï¼
-ï¼ *æ¥ æ ï¼*2017å¹´03æ16æ¥ 14:46
-ï¼ *ä¸» é¢ ï¼**Re: [Qemu-devel] COLO failover hang*
-ï¼
-ï¼
-ï¼
-ï¼
-ï¼ On 03/15/2017 05:06 PM, wangguang wrote:
-ï¼ ï¼   am testing QEMU COLO feature described here [QEMU
-ï¼ ï¼ Wiki](
-http://wiki.qemu-project.org/Features/COLO
-).
-ï¼ ï¼
-ï¼ ï¼ When the Primary Node panic,the Secondary Node qemu hang.
-ï¼ ï¼ hang at recvmsg in qio_channel_socket_readv.
-ï¼ ï¼ And  I run  { 'execute': 'nbd-server-stop' } and { "execute":
-ï¼ ï¼ "x-colo-lost-heartbeat" } in Secondary VM's
-ï¼ ï¼ monitor,the  Secondary Node qemu still hang at recvmsg .
-ï¼ ï¼
-ï¼ ï¼ I found that the colo in qemu is not complete yet.
-ï¼ ï¼ Do the colo have any plan for development?
-ï¼
-ï¼ Yes, We are developing. You can see some of patch we pushing.
-ï¼
-ï¼ ï¼ Has anyone ever run it successfully? Any help is appreciated!
-ï¼
-ï¼ In our internal version can run it successfully,
-ï¼ The failover detail you can ask Zhanghailiang for help.
-ï¼ Next time if you have some question about COLO,
-ï¼ please cc me and zhanghailiang address@hidden
-ï¼
-ï¼
-ï¼ Thanks
-ï¼ Zhang Chen
-ï¼
-ï¼
-ï¼ ï¼
-ï¼ ï¼
-ï¼ ï¼
-ï¼ ï¼ centos7.2+qemu2.7.50
-ï¼ ï¼ (gdb) bt
-ï¼ ï¼ #0  0x00007f3e00cc86ad in recvmsg () from /lib64/libpthread.so.0
-ï¼ ï¼ #1  0x00007f3e0332b738 in qio_channel_socket_readv (ioc=ï¼optimized outï¼,
-ï¼ ï¼ iov=ï¼optimized outï¼, niov=ï¼optimized outï¼, fds=0x0, nfds=0x0, errp=0x0) at
-ï¼ ï¼ io/channel-socket.c:497
-ï¼ ï¼ #2  0x00007f3e03329472 in qio_channel_read (address@hidden,
-ï¼ ï¼ address@hidden "", address@hidden,
-ï¼ ï¼ address@hidden) at io/channel.c:97
-ï¼ ï¼ #3  0x00007f3e032750e0 in channel_get_buffer (opaque=ï¼optimized outï¼,
-ï¼ ï¼ buf=0x7f3e05910f38 "", pos=ï¼optimized outï¼, size=32768) at
-ï¼ ï¼ migration/qemu-file-channel.c:78
-ï¼ ï¼ #4  0x00007f3e0327412c in qemu_fill_buffer (f=0x7f3e05910f00) at
-ï¼ ï¼ migration/qemu-file.c:257
-ï¼ ï¼ #5  0x00007f3e03274a41 in qemu_peek_byte (address@hidden,
-ï¼ ï¼ address@hidden) at migration/qemu-file.c:510
-ï¼ ï¼ #6  0x00007f3e03274aab in qemu_get_byte (address@hidden) at
-ï¼ ï¼ migration/qemu-file.c:523
-ï¼ ï¼ #7  0x00007f3e03274cb2 in qemu_get_be32 (address@hidden) at
-ï¼ ï¼ migration/qemu-file.c:603
-ï¼ ï¼ #8  0x00007f3e03271735 in colo_receive_message (f=0x7f3e05910f00,
-ï¼ ï¼ address@hidden) at migration/colo.c:215
-ï¼ ï¼ #9  0x00007f3e0327250d in colo_wait_handle_message (errp=0x7f3d62bfaa48,
-ï¼ ï¼ checkpoint_request=ï¼synthetic pointerï¼, f=ï¼optimized outï¼) at
-ï¼ ï¼ migration/colo.c:546
-ï¼ ï¼ #10 colo_process_incoming_thread (opaque=0x7f3e067245e0) at
-ï¼ ï¼ migration/colo.c:649
-ï¼ ï¼ #11 0x00007f3e00cc1df3 in start_thread () from /lib64/libpthread.so.0
-ï¼ ï¼ #12 0x00007f3dfc9c03ed in clone () from /lib64/libc.so.6
-ï¼ ï¼
-ï¼ ï¼
-ï¼ ï¼
-ï¼ ï¼
-ï¼ ï¼
-ï¼ ï¼ --
-ï¼ ï¼ View this message in context:
-http://qemu.11.n7.nabble.com/COLO-failover-hang-tp473250.html
-ï¼ ï¼ Sent from the Developer mailing list archive at Nabble.com.
-ï¼ ï¼
-ï¼ ï¼
-ï¼ ï¼
-ï¼ ï¼
-ï¼
-ï¼ --
-ï¼ Thanks
-ï¼ Zhang Chen
-ï¼
-ï¼
-ï¼
-ï¼
-ï¼
-
-* Hailiang Zhang (address@hidden) wrote:
->
-Hi,
->
->
-Thanks for reporting this, and i confirmed it in my test, and it is a bug.
->
->
-Though we tried to call qemu_file_shutdown() to shutdown the related fd, in
->
-case COLO thread/incoming thread is stuck in read/write() while do failover,
->
-but it didn't take effect, because all the fd used by COLO (also migration)
->
-has been wrapped by qio channel, and it will not call the shutdown API if
->
-we didn't qio_channel_set_feature(QIO_CHANNEL(sioc),
->
-QIO_CHANNEL_FEATURE_SHUTDOWN).
->
->
-Cc: Dr. David Alan Gilbert <address@hidden>
->
->
-I doubted migration cancel has the same problem, it may be stuck in write()
->
-if we tried to cancel migration.
->
->
-void fd_start_outgoing_migration(MigrationState *s, const char *fdname, Error
->
-**errp)
->
-{
->
-qio_channel_set_name(QIO_CHANNEL(ioc), "migration-fd-outgoing");
->
-migration_channel_connect(s, ioc, NULL);
->
-... ...
->
-We didn't call qio_channel_set_feature(QIO_CHANNEL(sioc),
->
-QIO_CHANNEL_FEATURE_SHUTDOWN) above,
->
-and the
->
-migrate_fd_cancel()
->
-{
->
-... ...
->
-if (s->state == MIGRATION_STATUS_CANCELLING && f) {
->
-qemu_file_shutdown(f);  --> This will not take effect. No ?
->
-}
->
-}
-(cc'd in Daniel Berrange).
-I see that we call qio_channel_set_feature(ioc, QIO_CHANNEL_FEATURE_SHUTDOWN); 
-at the
-top of qio_channel_socket_new;  so I think that's safe isn't it?
-
-Dave
-
->
-Thanks,
->
-Hailiang
->
->
-On 2017/3/21 16:10, address@hidden wrote:
->
-> Thank youã
->
->
->
-> I have test areadyã
->
->
->
-> When the Primary Node panic,the Secondary Node qemu hang at the same placeã
->
->
->
-> Incorrding
-http://wiki.qemu-project.org/Features/COLO
-ï¼kill Primary Node
->
-> qemu will not produce the problem,but Primary Node panic canã
->
->
->
-> I think due to the feature of channel does not support
->
-> QIO_CHANNEL_FEATURE_SHUTDOWN.
->
->
->
->
->
-> when failover,channel_shutdown could not shut down the channel.
->
->
->
->
->
-> so the colo_process_incoming_thread will hang at recvmsg.
->
->
->
->
->
-> I test a patch:
->
->
->
->
->
-> diff --git a/migration/socket.c b/migration/socket.c
->
->
->
->
->
-> index 13966f1..d65a0ea 100644
->
->
->
->
->
-> --- a/migration/socket.c
->
->
->
->
->
-> +++ b/migration/socket.c
->
->
->
->
->
-> @@ -147,8 +147,9 @@ static gboolean
->
-> socket_accept_incoming_migration(QIOChannel *ioc,
->
->
->
->
->
->       }
->
->
->
->
->
->
->
->
->
->
->
->       trace_migration_socket_incoming_accepted()
->
->
->
->
->
->
->
->
->
->
->
->       qio_channel_set_name(QIO_CHANNEL(sioc), "migration-socket-incoming")
->
->
->
->
->
-> +    qio_channel_set_feature(QIO_CHANNEL(sioc),
->
-> QIO_CHANNEL_FEATURE_SHUTDOWN)
->
->
->
->
->
->       migration_channel_process_incoming(migrate_get_current(),
->
->
->
->
->
->                                          QIO_CHANNEL(sioc))
->
->
->
->
->
->       object_unref(OBJECT(sioc))
->
->
->
->
->
->
->
->
->
-> My test will not hang any more.
->
->
->
->
->
->
->
->
->
->
->
->
->
->
->
->
->
->
->
->
->
->
->
->
->
->
->
->
->
->
->
->
->
->
->
-> åå§é®ä»¶
->
->
->
->
->
->
->
-> åä»¶äººï¼ address@hidden
->
-> æ¶ä»¶äººï¼çå¹¿10165992 address@hidden
->
-> æéäººï¼ address@hidden address@hidden
->
-> æ¥ æ ï¼2017å¹´03æ21æ¥ 15:58
->
-> ä¸» é¢ ï¼Re: [Qemu-devel]  çå¤: Re:  [BUG]COLO failover hang
->
->
->
->
->
->
->
->
->
->
->
-> Hi,Wang.
->
->
->
-> You can test this branch:
->
->
->
->
-https://github.com/coloft/qemu/tree/colo-v5.1-developing-COLO-frame-v21-with-shared-disk
->
->
->
-> and please follow wiki ensure your own configuration correctly.
->
->
->
->
-http://wiki.qemu-project.org/Features/COLO
->
->
->
->
->
-> Thanks
->
->
->
-> Zhang Chen
->
->
->
->
->
-> On 03/21/2017 03:27 PM, address@hidden wrote:
->
-> ï¼
->
-> ï¼ hi.
->
-> ï¼
->
-> ï¼ I test the git qemu master have the same problem.
->
-> ï¼
->
-> ï¼ (gdb) bt
->
-> ï¼
->
-> ï¼ #0  qio_channel_socket_readv (ioc=0x7f65911b4e50, iov=0x7f64ef3fd880,
->
-> ï¼ niov=1, fds=0x0, nfds=0x0, errp=0x0) at io/channel-socket.c:461
->
-> ï¼
->
-> ï¼ #1  0x00007f658e4aa0c2 in qio_channel_read
->
-> ï¼ (address@hidden, address@hidden "",
->
-> ï¼ address@hidden, address@hidden) at io/channel.c:114
->
-> ï¼
->
-> ï¼ #2  0x00007f658e3ea990 in channel_get_buffer (opaque=ï¼optimized outï¼,
->
-> ï¼ buf=0x7f65907cb838 "", pos=ï¼optimized outï¼, size=32768) at
->
-> ï¼ migration/qemu-file-channel.c:78
->
-> ï¼
->
-> ï¼ #3  0x00007f658e3e97fc in qemu_fill_buffer (f=0x7f65907cb800) at
->
-> ï¼ migration/qemu-file.c:295
->
-> ï¼
->
-> ï¼ #4  0x00007f658e3ea2e1 in qemu_peek_byte (address@hidden,
->
-> ï¼ address@hidden) at migration/qemu-file.c:555
->
-> ï¼
->
-> ï¼ #5  0x00007f658e3ea34b in qemu_get_byte (address@hidden) at
->
-> ï¼ migration/qemu-file.c:568
->
-> ï¼
->
-> ï¼ #6  0x00007f658e3ea552 in qemu_get_be32 (address@hidden) at
->
-> ï¼ migration/qemu-file.c:648
->
-> ï¼
->
-> ï¼ #7  0x00007f658e3e66e5 in colo_receive_message (f=0x7f65907cb800,
->
-> ï¼ address@hidden) at migration/colo.c:244
->
-> ï¼
->
-> ï¼ #8  0x00007f658e3e681e in colo_receive_check_message (f=ï¼optimized
->
-> ï¼ outï¼, address@hidden,
->
-> ï¼ address@hidden)
->
-> ï¼
->
-> ï¼     at migration/colo.c:264
->
-> ï¼
->
-> ï¼ #9  0x00007f658e3e740e in colo_process_incoming_thread
->
-> ï¼ (opaque=0x7f658eb30360 ï¼mis_current.31286ï¼) at migration/colo.c:577
->
-> ï¼
->
-> ï¼ #10 0x00007f658be09df3 in start_thread () from /lib64/libpthread.so.0
->
-> ï¼
->
-> ï¼ #11 0x00007f65881983ed in clone () from /lib64/libc.so.6
->
-> ï¼
->
-> ï¼ (gdb) p ioc-ï¼name
->
-> ï¼
->
-> ï¼ $2 = 0x7f658ff7d5c0 "migration-socket-incoming"
->
-> ï¼
->
-> ï¼ (gdb) p ioc-ï¼features        Do not support QIO_CHANNEL_FEATURE_SHUTDOWN
->
-> ï¼
->
-> ï¼ $3 = 0
->
-> ï¼
->
-> ï¼
->
-> ï¼ (gdb) bt
->
-> ï¼
->
-> ï¼ #0  socket_accept_incoming_migration (ioc=0x7fdcceeafa90,
->
-> ï¼ condition=G_IO_IN, opaque=0x7fdcceeafa90) at migration/socket.c:137
->
-> ï¼
->
-> ï¼ #1  0x00007fdcc6966350 in g_main_dispatch (context=ï¼optimized outï¼) at
->
-> ï¼ gmain.c:3054
->
-> ï¼
->
-> ï¼ #2  g_main_context_dispatch (context=ï¼optimized outï¼,
->
-> ï¼ address@hidden) at gmain.c:3630
->
-> ï¼
->
-> ï¼ #3  0x00007fdccb8a6dcc in glib_pollfds_poll () at util/main-loop.c:213
->
-> ï¼
->
-> ï¼ #4  os_host_main_loop_wait (timeout=ï¼optimized outï¼) at
->
-> ï¼ util/main-loop.c:258
->
-> ï¼
->
-> ï¼ #5  main_loop_wait (address@hidden) at
->
-> ï¼ util/main-loop.c:506
->
-> ï¼
->
-> ï¼ #6  0x00007fdccb526187 in main_loop () at vl.c:1898
->
-> ï¼
->
-> ï¼ #7  main (argc=ï¼optimized outï¼, argv=ï¼optimized outï¼, envp=ï¼optimized
->
-> ï¼ outï¼) at vl.c:4709
->
-> ï¼
->
-> ï¼ (gdb) p ioc-ï¼features
->
-> ï¼
->
-> ï¼ $1 = 6
->
-> ï¼
->
-> ï¼ (gdb) p ioc-ï¼name
->
-> ï¼
->
-> ï¼ $2 = 0x7fdcce1b1ab0 "migration-socket-listener"
->
-> ï¼
->
-> ï¼
->
-> ï¼ May be socket_accept_incoming_migration should
->
-> ï¼ call qio_channel_set_feature(ioc, QIO_CHANNEL_FEATURE_SHUTDOWN)??
->
-> ï¼
->
-> ï¼
->
-> ï¼ thank you.
->
-> ï¼
->
-> ï¼
->
-> ï¼
->
-> ï¼
->
-> ï¼
->
-> ï¼ åå§é®ä»¶
->
-> ï¼ address@hidden
->
-> ï¼ address@hidden
->
-> ï¼ address@hidden@huawei.comï¼
->
-> ï¼ *æ¥ æ ï¼*2017å¹´03æ16æ¥ 14:46
->
-> ï¼ *ä¸» é¢ ï¼**Re: [Qemu-devel] COLO failover hang*
->
-> ï¼
->
-> ï¼
->
-> ï¼
->
-> ï¼
->
-> ï¼ On 03/15/2017 05:06 PM, wangguang wrote:
->
-> ï¼ ï¼   am testing QEMU COLO feature described here [QEMU
->
-> ï¼ ï¼ Wiki](
-http://wiki.qemu-project.org/Features/COLO
-).
->
-> ï¼ ï¼
->
-> ï¼ ï¼ When the Primary Node panic,the Secondary Node qemu hang.
->
-> ï¼ ï¼ hang at recvmsg in qio_channel_socket_readv.
->
-> ï¼ ï¼ And  I run  { 'execute': 'nbd-server-stop' } and { "execute":
->
-> ï¼ ï¼ "x-colo-lost-heartbeat" } in Secondary VM's
->
-> ï¼ ï¼ monitor,the  Secondary Node qemu still hang at recvmsg .
->
-> ï¼ ï¼
->
-> ï¼ ï¼ I found that the colo in qemu is not complete yet.
->
-> ï¼ ï¼ Do the colo have any plan for development?
->
-> ï¼
->
-> ï¼ Yes, We are developing. You can see some of patch we pushing.
->
-> ï¼
->
-> ï¼ ï¼ Has anyone ever run it successfully? Any help is appreciated!
->
-> ï¼
->
-> ï¼ In our internal version can run it successfully,
->
-> ï¼ The failover detail you can ask Zhanghailiang for help.
->
-> ï¼ Next time if you have some question about COLO,
->
-> ï¼ please cc me and zhanghailiang address@hidden
->
-> ï¼
->
-> ï¼
->
-> ï¼ Thanks
->
-> ï¼ Zhang Chen
->
-> ï¼
->
-> ï¼
->
-> ï¼ ï¼
->
-> ï¼ ï¼
->
-> ï¼ ï¼
->
-> ï¼ ï¼ centos7.2+qemu2.7.50
->
-> ï¼ ï¼ (gdb) bt
->
-> ï¼ ï¼ #0  0x00007f3e00cc86ad in recvmsg () from /lib64/libpthread.so.0
->
-> ï¼ ï¼ #1  0x00007f3e0332b738 in qio_channel_socket_readv (ioc=ï¼optimized outï¼,
->
-> ï¼ ï¼ iov=ï¼optimized outï¼, niov=ï¼optimized outï¼, fds=0x0, nfds=0x0, errp=0x0)
->
-> at
->
-> ï¼ ï¼ io/channel-socket.c:497
->
-> ï¼ ï¼ #2  0x00007f3e03329472 in qio_channel_read (address@hidden,
->
-> ï¼ ï¼ address@hidden "", address@hidden,
->
-> ï¼ ï¼ address@hidden) at io/channel.c:97
->
-> ï¼ ï¼ #3  0x00007f3e032750e0 in channel_get_buffer (opaque=ï¼optimized outï¼,
->
-> ï¼ ï¼ buf=0x7f3e05910f38 "", pos=ï¼optimized outï¼, size=32768) at
->
-> ï¼ ï¼ migration/qemu-file-channel.c:78
->
-> ï¼ ï¼ #4  0x00007f3e0327412c in qemu_fill_buffer (f=0x7f3e05910f00) at
->
-> ï¼ ï¼ migration/qemu-file.c:257
->
-> ï¼ ï¼ #5  0x00007f3e03274a41 in qemu_peek_byte (address@hidden,
->
-> ï¼ ï¼ address@hidden) at migration/qemu-file.c:510
->
-> ï¼ ï¼ #6  0x00007f3e03274aab in qemu_get_byte (address@hidden) at
->
-> ï¼ ï¼ migration/qemu-file.c:523
->
-> ï¼ ï¼ #7  0x00007f3e03274cb2 in qemu_get_be32 (address@hidden) at
->
-> ï¼ ï¼ migration/qemu-file.c:603
->
-> ï¼ ï¼ #8  0x00007f3e03271735 in colo_receive_message (f=0x7f3e05910f00,
->
-> ï¼ ï¼ address@hidden) at migration/colo.c:215
->
-> ï¼ ï¼ #9  0x00007f3e0327250d in colo_wait_handle_message (errp=0x7f3d62bfaa48,
->
-> ï¼ ï¼ checkpoint_request=ï¼synthetic pointerï¼, f=ï¼optimized outï¼) at
->
-> ï¼ ï¼ migration/colo.c:546
->
-> ï¼ ï¼ #10 colo_process_incoming_thread (opaque=0x7f3e067245e0) at
->
-> ï¼ ï¼ migration/colo.c:649
->
-> ï¼ ï¼ #11 0x00007f3e00cc1df3 in start_thread () from /lib64/libpthread.so.0
->
-> ï¼ ï¼ #12 0x00007f3dfc9c03ed in clone () from /lib64/libc.so.6
->
-> ï¼ ï¼
->
-> ï¼ ï¼
->
-> ï¼ ï¼
->
-> ï¼ ï¼
->
-> ï¼ ï¼
->
-> ï¼ ï¼ --
->
-> ï¼ ï¼ View this message in context:
->
->
-http://qemu.11.n7.nabble.com/COLO-failover-hang-tp473250.html
->
-> ï¼ ï¼ Sent from the Developer mailing list archive at Nabble.com.
->
-> ï¼ ï¼
->
-> ï¼ ï¼
->
-> ï¼ ï¼
->
-> ï¼ ï¼
->
-> ï¼
->
-> ï¼ --
->
-> ï¼ Thanks
->
-> ï¼ Zhang Chen
->
-> ï¼
->
-> ï¼
->
-> ï¼
->
-> ï¼
->
-> ï¼
->
->
->
---
-Dr. David Alan Gilbert / address@hidden / Manchester, UK
-
-On 2017/3/21 19:56, Dr. David Alan Gilbert wrote:
-* Hailiang Zhang (address@hidden) wrote:
-Hi,
-
-Thanks for reporting this, and i confirmed it in my test, and it is a bug.
-
-Though we tried to call qemu_file_shutdown() to shutdown the related fd, in
-case COLO thread/incoming thread is stuck in read/write() while do failover,
-but it didn't take effect, because all the fd used by COLO (also migration)
-has been wrapped by qio channel, and it will not call the shutdown API if
-we didn't qio_channel_set_feature(QIO_CHANNEL(sioc), 
-QIO_CHANNEL_FEATURE_SHUTDOWN).
-
-Cc: Dr. David Alan Gilbert <address@hidden>
-
-I doubted migration cancel has the same problem, it may be stuck in write()
-if we tried to cancel migration.
-
-void fd_start_outgoing_migration(MigrationState *s, const char *fdname, Error 
-**errp)
-{
-     qio_channel_set_name(QIO_CHANNEL(ioc), "migration-fd-outgoing");
-     migration_channel_connect(s, ioc, NULL);
-     ... ...
-We didn't call qio_channel_set_feature(QIO_CHANNEL(sioc), 
-QIO_CHANNEL_FEATURE_SHUTDOWN) above,
-and the
-migrate_fd_cancel()
-{
-  ... ...
-     if (s->state == MIGRATION_STATUS_CANCELLING && f) {
-         qemu_file_shutdown(f);  --> This will not take effect. No ?
-     }
-}
-(cc'd in Daniel Berrange).
-I see that we call qio_channel_set_feature(ioc, QIO_CHANNEL_FEATURE_SHUTDOWN); 
-at the
-top of qio_channel_socket_new;  so I think that's safe isn't it?
-Hmm, you are right, this problem is only exist for the migration incoming fd, 
-thanks.
-Dave
-Thanks,
-Hailiang
-
-On 2017/3/21 16:10, address@hidden wrote:
-Thank youã
-
-I have test areadyã
-
-When the Primary Node panic,the Secondary Node qemu hang at the same placeã
-
-Incorrding
-http://wiki.qemu-project.org/Features/COLO
-ï¼kill Primary Node qemu 
-will not produce the problem,but Primary Node panic canã
-
-I think due to the feature of channel does not support 
-QIO_CHANNEL_FEATURE_SHUTDOWN.
-
-
-when failover,channel_shutdown could not shut down the channel.
-
-
-so the colo_process_incoming_thread will hang at recvmsg.
-
-
-I test a patch:
-
-
-diff --git a/migration/socket.c b/migration/socket.c
-
-
-index 13966f1..d65a0ea 100644
-
-
---- a/migration/socket.c
-
-
-+++ b/migration/socket.c
-
-
-@@ -147,8 +147,9 @@ static gboolean socket_accept_incoming_migration(QIOChannel 
-*ioc,
-
-
-       }
-
-
-
-
-
-       trace_migration_socket_incoming_accepted()
-
-
-
-
-
-       qio_channel_set_name(QIO_CHANNEL(sioc), "migration-socket-incoming")
-
-
-+    qio_channel_set_feature(QIO_CHANNEL(sioc), QIO_CHANNEL_FEATURE_SHUTDOWN)
-
-
-       migration_channel_process_incoming(migrate_get_current(),
-
-
-                                          QIO_CHANNEL(sioc))
-
-
-       object_unref(OBJECT(sioc))
-
-
-
-
-My test will not hang any more.
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-åå§é®ä»¶
-
-
-
-åä»¶äººï¼ address@hidden
-æ¶ä»¶äººï¼çå¹¿10165992 address@hidden
-æéäººï¼ address@hidden address@hidden
-æ¥ æ ï¼2017å¹´03æ21æ¥ 15:58
-ä¸» é¢ ï¼Re: [Qemu-devel]  çå¤: Re:  [BUG]COLO failover hang
-
-
-
-
-
-Hi,Wang.
-
-You can test this branch:
-https://github.com/coloft/qemu/tree/colo-v5.1-developing-COLO-frame-v21-with-shared-disk
-and please follow wiki ensure your own configuration correctly.
-http://wiki.qemu-project.org/Features/COLO
-Thanks
-
-Zhang Chen
-
-
-On 03/21/2017 03:27 PM, address@hidden wrote:
-ï¼
-ï¼ hi.
-ï¼
-ï¼ I test the git qemu master have the same problem.
-ï¼
-ï¼ (gdb) bt
-ï¼
-ï¼ #0  qio_channel_socket_readv (ioc=0x7f65911b4e50, iov=0x7f64ef3fd880,
-ï¼ niov=1, fds=0x0, nfds=0x0, errp=0x0) at io/channel-socket.c:461
-ï¼
-ï¼ #1  0x00007f658e4aa0c2 in qio_channel_read
-ï¼ (address@hidden, address@hidden "",
-ï¼ address@hidden, address@hidden) at io/channel.c:114
-ï¼
-ï¼ #2  0x00007f658e3ea990 in channel_get_buffer (opaque=ï¼optimized outï¼,
-ï¼ buf=0x7f65907cb838 "", pos=ï¼optimized outï¼, size=32768) at
-ï¼ migration/qemu-file-channel.c:78
-ï¼
-ï¼ #3  0x00007f658e3e97fc in qemu_fill_buffer (f=0x7f65907cb800) at
-ï¼ migration/qemu-file.c:295
-ï¼
-ï¼ #4  0x00007f658e3ea2e1 in qemu_peek_byte (address@hidden,
-ï¼ address@hidden) at migration/qemu-file.c:555
-ï¼
-ï¼ #5  0x00007f658e3ea34b in qemu_get_byte (address@hidden) at
-ï¼ migration/qemu-file.c:568
-ï¼
-ï¼ #6  0x00007f658e3ea552 in qemu_get_be32 (address@hidden) at
-ï¼ migration/qemu-file.c:648
-ï¼
-ï¼ #7  0x00007f658e3e66e5 in colo_receive_message (f=0x7f65907cb800,
-ï¼ address@hidden) at migration/colo.c:244
-ï¼
-ï¼ #8  0x00007f658e3e681e in colo_receive_check_message (f=ï¼optimized
-ï¼ outï¼, address@hidden,
-ï¼ address@hidden)
-ï¼
-ï¼     at migration/colo.c:264
-ï¼
-ï¼ #9  0x00007f658e3e740e in colo_process_incoming_thread
-ï¼ (opaque=0x7f658eb30360 ï¼mis_current.31286ï¼) at migration/colo.c:577
-ï¼
-ï¼ #10 0x00007f658be09df3 in start_thread () from /lib64/libpthread.so.0
-ï¼
-ï¼ #11 0x00007f65881983ed in clone () from /lib64/libc.so.6
-ï¼
-ï¼ (gdb) p ioc-ï¼name
-ï¼
-ï¼ $2 = 0x7f658ff7d5c0 "migration-socket-incoming"
-ï¼
-ï¼ (gdb) p ioc-ï¼features        Do not support QIO_CHANNEL_FEATURE_SHUTDOWN
-ï¼
-ï¼ $3 = 0
-ï¼
-ï¼
-ï¼ (gdb) bt
-ï¼
-ï¼ #0  socket_accept_incoming_migration (ioc=0x7fdcceeafa90,
-ï¼ condition=G_IO_IN, opaque=0x7fdcceeafa90) at migration/socket.c:137
-ï¼
-ï¼ #1  0x00007fdcc6966350 in g_main_dispatch (context=ï¼optimized outï¼) at
-ï¼ gmain.c:3054
-ï¼
-ï¼ #2  g_main_context_dispatch (context=ï¼optimized outï¼,
-ï¼ address@hidden) at gmain.c:3630
-ï¼
-ï¼ #3  0x00007fdccb8a6dcc in glib_pollfds_poll () at util/main-loop.c:213
-ï¼
-ï¼ #4  os_host_main_loop_wait (timeout=ï¼optimized outï¼) at
-ï¼ util/main-loop.c:258
-ï¼
-ï¼ #5  main_loop_wait (address@hidden) at
-ï¼ util/main-loop.c:506
-ï¼
-ï¼ #6  0x00007fdccb526187 in main_loop () at vl.c:1898
-ï¼
-ï¼ #7  main (argc=ï¼optimized outï¼, argv=ï¼optimized outï¼, envp=ï¼optimized
-ï¼ outï¼) at vl.c:4709
-ï¼
-ï¼ (gdb) p ioc-ï¼features
-ï¼
-ï¼ $1 = 6
-ï¼
-ï¼ (gdb) p ioc-ï¼name
-ï¼
-ï¼ $2 = 0x7fdcce1b1ab0 "migration-socket-listener"
-ï¼
-ï¼
-ï¼ May be socket_accept_incoming_migration should
-ï¼ call qio_channel_set_feature(ioc, QIO_CHANNEL_FEATURE_SHUTDOWN)??
-ï¼
-ï¼
-ï¼ thank you.
-ï¼
-ï¼
-ï¼
-ï¼
-ï¼
-ï¼ åå§é®ä»¶
-ï¼ address@hidden
-ï¼ address@hidden
-ï¼ address@hidden@huawei.comï¼
-ï¼ *æ¥ æ ï¼*2017å¹´03æ16æ¥ 14:46
-ï¼ *ä¸» é¢ ï¼**Re: [Qemu-devel] COLO failover hang*
-ï¼
-ï¼
-ï¼
-ï¼
-ï¼ On 03/15/2017 05:06 PM, wangguang wrote:
-ï¼ ï¼   am testing QEMU COLO feature described here [QEMU
-ï¼ ï¼ Wiki](
-http://wiki.qemu-project.org/Features/COLO
-).
-ï¼ ï¼
-ï¼ ï¼ When the Primary Node panic,the Secondary Node qemu hang.
-ï¼ ï¼ hang at recvmsg in qio_channel_socket_readv.
-ï¼ ï¼ And  I run  { 'execute': 'nbd-server-stop' } and { "execute":
-ï¼ ï¼ "x-colo-lost-heartbeat" } in Secondary VM's
-ï¼ ï¼ monitor,the  Secondary Node qemu still hang at recvmsg .
-ï¼ ï¼
-ï¼ ï¼ I found that the colo in qemu is not complete yet.
-ï¼ ï¼ Do the colo have any plan for development?
-ï¼
-ï¼ Yes, We are developing. You can see some of patch we pushing.
-ï¼
-ï¼ ï¼ Has anyone ever run it successfully? Any help is appreciated!
-ï¼
-ï¼ In our internal version can run it successfully,
-ï¼ The failover detail you can ask Zhanghailiang for help.
-ï¼ Next time if you have some question about COLO,
-ï¼ please cc me and zhanghailiang address@hidden
-ï¼
-ï¼
-ï¼ Thanks
-ï¼ Zhang Chen
-ï¼
-ï¼
-ï¼ ï¼
-ï¼ ï¼
-ï¼ ï¼
-ï¼ ï¼ centos7.2+qemu2.7.50
-ï¼ ï¼ (gdb) bt
-ï¼ ï¼ #0  0x00007f3e00cc86ad in recvmsg () from /lib64/libpthread.so.0
-ï¼ ï¼ #1  0x00007f3e0332b738 in qio_channel_socket_readv (ioc=ï¼optimized outï¼,
-ï¼ ï¼ iov=ï¼optimized outï¼, niov=ï¼optimized outï¼, fds=0x0, nfds=0x0, errp=0x0) at
-ï¼ ï¼ io/channel-socket.c:497
-ï¼ ï¼ #2  0x00007f3e03329472 in qio_channel_read (address@hidden,
-ï¼ ï¼ address@hidden "", address@hidden,
-ï¼ ï¼ address@hidden) at io/channel.c:97
-ï¼ ï¼ #3  0x00007f3e032750e0 in channel_get_buffer (opaque=ï¼optimized outï¼,
-ï¼ ï¼ buf=0x7f3e05910f38 "", pos=ï¼optimized outï¼, size=32768) at
-ï¼ ï¼ migration/qemu-file-channel.c:78
-ï¼ ï¼ #4  0x00007f3e0327412c in qemu_fill_buffer (f=0x7f3e05910f00) at
-ï¼ ï¼ migration/qemu-file.c:257
-ï¼ ï¼ #5  0x00007f3e03274a41 in qemu_peek_byte (address@hidden,
-ï¼ ï¼ address@hidden) at migration/qemu-file.c:510
-ï¼ ï¼ #6  0x00007f3e03274aab in qemu_get_byte (address@hidden) at
-ï¼ ï¼ migration/qemu-file.c:523
-ï¼ ï¼ #7  0x00007f3e03274cb2 in qemu_get_be32 (address@hidden) at
-ï¼ ï¼ migration/qemu-file.c:603
-ï¼ ï¼ #8  0x00007f3e03271735 in colo_receive_message (f=0x7f3e05910f00,
-ï¼ ï¼ address@hidden) at migration/colo.c:215
-ï¼ ï¼ #9  0x00007f3e0327250d in colo_wait_handle_message (errp=0x7f3d62bfaa48,
-ï¼ ï¼ checkpoint_request=ï¼synthetic pointerï¼, f=ï¼optimized outï¼) at
-ï¼ ï¼ migration/colo.c:546
-ï¼ ï¼ #10 colo_process_incoming_thread (opaque=0x7f3e067245e0) at
-ï¼ ï¼ migration/colo.c:649
-ï¼ ï¼ #11 0x00007f3e00cc1df3 in start_thread () from /lib64/libpthread.so.0
-ï¼ ï¼ #12 0x00007f3dfc9c03ed in clone () from /lib64/libc.so.6
-ï¼ ï¼
-ï¼ ï¼
-ï¼ ï¼
-ï¼ ï¼
-ï¼ ï¼
-ï¼ ï¼ --
-ï¼ ï¼ View this message in context:
-http://qemu.11.n7.nabble.com/COLO-failover-hang-tp473250.html
-ï¼ ï¼ Sent from the Developer mailing list archive at Nabble.com.
-ï¼ ï¼
-ï¼ ï¼
-ï¼ ï¼
-ï¼ ï¼
-ï¼
-ï¼ --
-ï¼ Thanks
-ï¼ Zhang Chen
-ï¼
-ï¼
-ï¼
-ï¼
-ï¼
---
-Dr. David Alan Gilbert / address@hidden / Manchester, UK
-
-.
-
-* Hailiang Zhang (address@hidden) wrote:
->
-On 2017/3/21 19:56, Dr. David Alan Gilbert wrote:
->
-> * Hailiang Zhang (address@hidden) wrote:
->
-> > Hi,
->
-> >
->
-> > Thanks for reporting this, and i confirmed it in my test, and it is a bug.
->
-> >
->
-> > Though we tried to call qemu_file_shutdown() to shutdown the related fd,
->
-> > in
->
-> > case COLO thread/incoming thread is stuck in read/write() while do
->
-> > failover,
->
-> > but it didn't take effect, because all the fd used by COLO (also
->
-> > migration)
->
-> > has been wrapped by qio channel, and it will not call the shutdown API if
->
-> > we didn't qio_channel_set_feature(QIO_CHANNEL(sioc),
->
-> > QIO_CHANNEL_FEATURE_SHUTDOWN).
->
-> >
->
-> > Cc: Dr. David Alan Gilbert <address@hidden>
->
-> >
->
-> > I doubted migration cancel has the same problem, it may be stuck in
->
-> > write()
->
-> > if we tried to cancel migration.
->
-> >
->
-> > void fd_start_outgoing_migration(MigrationState *s, const char *fdname,
->
-> > Error **errp)
->
-> > {
->
-> >      qio_channel_set_name(QIO_CHANNEL(ioc), "migration-fd-outgoing");
->
-> >      migration_channel_connect(s, ioc, NULL);
->
-> >      ... ...
->
-> > We didn't call qio_channel_set_feature(QIO_CHANNEL(sioc),
->
-> > QIO_CHANNEL_FEATURE_SHUTDOWN) above,
->
-> > and the
->
-> > migrate_fd_cancel()
->
-> > {
->
-> >   ... ...
->
-> >      if (s->state == MIGRATION_STATUS_CANCELLING && f) {
->
-> >          qemu_file_shutdown(f);  --> This will not take effect. No ?
->
-> >      }
->
-> > }
->
->
->
-> (cc'd in Daniel Berrange).
->
-> I see that we call qio_channel_set_feature(ioc,
->
-> QIO_CHANNEL_FEATURE_SHUTDOWN); at the
->
-> top of qio_channel_socket_new;  so I think that's safe isn't it?
->
->
->
->
-Hmm, you are right, this problem is only exist for the migration incoming fd,
->
-thanks.
-Yes, and I don't think we normally do a cancel on the incoming side of a 
-migration.
-
-Dave
-
->
-> Dave
->
->
->
-> > Thanks,
->
-> > Hailiang
->
-> >
->
-> > On 2017/3/21 16:10, address@hidden wrote:
->
-> > > Thank youã
->
-> > >
->
-> > > I have test areadyã
->
-> > >
->
-> > > When the Primary Node panic,the Secondary Node qemu hang at the same
->
-> > > placeã
->
-> > >
->
-> > > Incorrding
-http://wiki.qemu-project.org/Features/COLO
-ï¼kill Primary
->
-> > > Node qemu will not produce the problem,but Primary Node panic canã
->
-> > >
->
-> > > I think due to the feature of channel does not support
->
-> > > QIO_CHANNEL_FEATURE_SHUTDOWN.
->
-> > >
->
-> > >
->
-> > > when failover,channel_shutdown could not shut down the channel.
->
-> > >
->
-> > >
->
-> > > so the colo_process_incoming_thread will hang at recvmsg.
->
-> > >
->
-> > >
->
-> > > I test a patch:
->
-> > >
->
-> > >
->
-> > > diff --git a/migration/socket.c b/migration/socket.c
->
-> > >
->
-> > >
->
-> > > index 13966f1..d65a0ea 100644
->
-> > >
->
-> > >
->
-> > > --- a/migration/socket.c
->
-> > >
->
-> > >
->
-> > > +++ b/migration/socket.c
->
-> > >
->
-> > >
->
-> > > @@ -147,8 +147,9 @@ static gboolean
->
-> > > socket_accept_incoming_migration(QIOChannel *ioc,
->
-> > >
->
-> > >
->
-> > >        }
->
-> > >
->
-> > >
->
-> > >
->
-> > >
->
-> > >
->
-> > >        trace_migration_socket_incoming_accepted()
->
-> > >
->
-> > >
->
-> > >
->
-> > >
->
-> > >
->
-> > >        qio_channel_set_name(QIO_CHANNEL(sioc),
->
-> > > "migration-socket-incoming")
->
-> > >
->
-> > >
->
-> > > +    qio_channel_set_feature(QIO_CHANNEL(sioc),
->
-> > > QIO_CHANNEL_FEATURE_SHUTDOWN)
->
-> > >
->
-> > >
->
-> > >        migration_channel_process_incoming(migrate_get_current(),
->
-> > >
->
-> > >
->
-> > >                                           QIO_CHANNEL(sioc))
->
-> > >
->
-> > >
->
-> > >        object_unref(OBJECT(sioc))
->
-> > >
->
-> > >
->
-> > >
->
-> > >
->
-> > > My test will not hang any more.
->
-> > >
->
-> > >
->
-> > >
->
-> > >
->
-> > >
->
-> > >
->
-> > >
->
-> > >
->
-> > >
->
-> > >
->
-> > >
->
-> > >
->
-> > >
->
-> > >
->
-> > >
->
-> > >
->
-> > >
->
-> > > åå§é®ä»¶
->
-> > >
->
-> > >
->
-> > >
->
-> > > åä»¶äººï¼ address@hidden
->
-> > > æ¶ä»¶äººï¼çå¹¿10165992 address@hidden
->
-> > > æéäººï¼ address@hidden address@hidden
->
-> > > æ¥ æ ï¼2017å¹´03æ21æ¥ 15:58
->
-> > > ä¸» é¢ ï¼Re: [Qemu-devel]  çå¤: Re:  [BUG]COLO failover hang
->
-> > >
->
-> > >
->
-> > >
->
-> > >
->
-> > >
->
-> > > Hi,Wang.
->
-> > >
->
-> > > You can test this branch:
->
-> > >
->
-> > >
-https://github.com/coloft/qemu/tree/colo-v5.1-developing-COLO-frame-v21-with-shared-disk
->
-> > >
->
-> > > and please follow wiki ensure your own configuration correctly.
->
-> > >
->
-> > >
-http://wiki.qemu-project.org/Features/COLO
->
-> > >
->
-> > >
->
-> > > Thanks
->
-> > >
->
-> > > Zhang Chen
->
-> > >
->
-> > >
->
-> > > On 03/21/2017 03:27 PM, address@hidden wrote:
->
-> > > ï¼
->
-> > > ï¼ hi.
->
-> > > ï¼
->
-> > > ï¼ I test the git qemu master have the same problem.
->
-> > > ï¼
->
-> > > ï¼ (gdb) bt
->
-> > > ï¼
->
-> > > ï¼ #0  qio_channel_socket_readv (ioc=0x7f65911b4e50, iov=0x7f64ef3fd880,
->
-> > > ï¼ niov=1, fds=0x0, nfds=0x0, errp=0x0) at io/channel-socket.c:461
->
-> > > ï¼
->
-> > > ï¼ #1  0x00007f658e4aa0c2 in qio_channel_read
->
-> > > ï¼ (address@hidden, address@hidden "",
->
-> > > ï¼ address@hidden, address@hidden) at io/channel.c:114
->
-> > > ï¼
->
-> > > ï¼ #2  0x00007f658e3ea990 in channel_get_buffer (opaque=ï¼optimized outï¼,
->
-> > > ï¼ buf=0x7f65907cb838 "", pos=ï¼optimized outï¼, size=32768) at
->
-> > > ï¼ migration/qemu-file-channel.c:78
->
-> > > ï¼
->
-> > > ï¼ #3  0x00007f658e3e97fc in qemu_fill_buffer (f=0x7f65907cb800) at
->
-> > > ï¼ migration/qemu-file.c:295
->
-> > > ï¼
->
-> > > ï¼ #4  0x00007f658e3ea2e1 in qemu_peek_byte (address@hidden,
->
-> > > ï¼ address@hidden) at migration/qemu-file.c:555
->
-> > > ï¼
->
-> > > ï¼ #5  0x00007f658e3ea34b in qemu_get_byte (address@hidden) at
->
-> > > ï¼ migration/qemu-file.c:568
->
-> > > ï¼
->
-> > > ï¼ #6  0x00007f658e3ea552 in qemu_get_be32 (address@hidden) at
->
-> > > ï¼ migration/qemu-file.c:648
->
-> > > ï¼
->
-> > > ï¼ #7  0x00007f658e3e66e5 in colo_receive_message (f=0x7f65907cb800,
->
-> > > ï¼ address@hidden) at migration/colo.c:244
->
-> > > ï¼
->
-> > > ï¼ #8  0x00007f658e3e681e in colo_receive_check_message (f=ï¼optimized
->
-> > > ï¼ outï¼, address@hidden,
->
-> > > ï¼ address@hidden)
->
-> > > ï¼
->
-> > > ï¼     at migration/colo.c:264
->
-> > > ï¼
->
-> > > ï¼ #9  0x00007f658e3e740e in colo_process_incoming_thread
->
-> > > ï¼ (opaque=0x7f658eb30360 ï¼mis_current.31286ï¼) at migration/colo.c:577
->
-> > > ï¼
->
-> > > ï¼ #10 0x00007f658be09df3 in start_thread () from /lib64/libpthread.so.0
->
-> > > ï¼
->
-> > > ï¼ #11 0x00007f65881983ed in clone () from /lib64/libc.so.6
->
-> > > ï¼
->
-> > > ï¼ (gdb) p ioc-ï¼name
->
-> > > ï¼
->
-> > > ï¼ $2 = 0x7f658ff7d5c0 "migration-socket-incoming"
->
-> > > ï¼
->
-> > > ï¼ (gdb) p ioc-ï¼features        Do not support
->
-> > > QIO_CHANNEL_FEATURE_SHUTDOWN
->
-> > > ï¼
->
-> > > ï¼ $3 = 0
->
-> > > ï¼
->
-> > > ï¼
->
-> > > ï¼ (gdb) bt
->
-> > > ï¼
->
-> > > ï¼ #0  socket_accept_incoming_migration (ioc=0x7fdcceeafa90,
->
-> > > ï¼ condition=G_IO_IN, opaque=0x7fdcceeafa90) at migration/socket.c:137
->
-> > > ï¼
->
-> > > ï¼ #1  0x00007fdcc6966350 in g_main_dispatch (context=ï¼optimized outï¼) at
->
-> > > ï¼ gmain.c:3054
->
-> > > ï¼
->
-> > > ï¼ #2  g_main_context_dispatch (context=ï¼optimized outï¼,
->
-> > > ï¼ address@hidden) at gmain.c:3630
->
-> > > ï¼
->
-> > > ï¼ #3  0x00007fdccb8a6dcc in glib_pollfds_poll () at util/main-loop.c:213
->
-> > > ï¼
->
-> > > ï¼ #4  os_host_main_loop_wait (timeout=ï¼optimized outï¼) at
->
-> > > ï¼ util/main-loop.c:258
->
-> > > ï¼
->
-> > > ï¼ #5  main_loop_wait (address@hidden) at
->
-> > > ï¼ util/main-loop.c:506
->
-> > > ï¼
->
-> > > ï¼ #6  0x00007fdccb526187 in main_loop () at vl.c:1898
->
-> > > ï¼
->
-> > > ï¼ #7  main (argc=ï¼optimized outï¼, argv=ï¼optimized outï¼, envp=ï¼optimized
->
-> > > ï¼ outï¼) at vl.c:4709
->
-> > > ï¼
->
-> > > ï¼ (gdb) p ioc-ï¼features
->
-> > > ï¼
->
-> > > ï¼ $1 = 6
->
-> > > ï¼
->
-> > > ï¼ (gdb) p ioc-ï¼name
->
-> > > ï¼
->
-> > > ï¼ $2 = 0x7fdcce1b1ab0 "migration-socket-listener"
->
-> > > ï¼
->
-> > > ï¼
->
-> > > ï¼ May be socket_accept_incoming_migration should
->
-> > > ï¼ call qio_channel_set_feature(ioc, QIO_CHANNEL_FEATURE_SHUTDOWN)??
->
-> > > ï¼
->
-> > > ï¼
->
-> > > ï¼ thank you.
->
-> > > ï¼
->
-> > > ï¼
->
-> > > ï¼
->
-> > > ï¼
->
-> > > ï¼
->
-> > > ï¼ åå§é®ä»¶
->
-> > > ï¼ address@hidden
->
-> > > ï¼ address@hidden
->
-> > > ï¼ address@hidden@huawei.comï¼
->
-> > > ï¼ *æ¥ æ ï¼*2017å¹´03æ16æ¥ 14:46
->
-> > > ï¼ *ä¸» é¢ ï¼**Re: [Qemu-devel] COLO failover hang*
->
-> > > ï¼
->
-> > > ï¼
->
-> > > ï¼
->
-> > > ï¼
->
-> > > ï¼ On 03/15/2017 05:06 PM, wangguang wrote:
->
-> > > ï¼ ï¼   am testing QEMU COLO feature described here [QEMU
->
-> > > ï¼ ï¼ Wiki](
-http://wiki.qemu-project.org/Features/COLO
-).
->
-> > > ï¼ ï¼
->
-> > > ï¼ ï¼ When the Primary Node panic,the Secondary Node qemu hang.
->
-> > > ï¼ ï¼ hang at recvmsg in qio_channel_socket_readv.
->
-> > > ï¼ ï¼ And  I run  { 'execute': 'nbd-server-stop' } and { "execute":
->
-> > > ï¼ ï¼ "x-colo-lost-heartbeat" } in Secondary VM's
->
-> > > ï¼ ï¼ monitor,the  Secondary Node qemu still hang at recvmsg .
->
-> > > ï¼ ï¼
->
-> > > ï¼ ï¼ I found that the colo in qemu is not complete yet.
->
-> > > ï¼ ï¼ Do the colo have any plan for development?
->
-> > > ï¼
->
-> > > ï¼ Yes, We are developing. You can see some of patch we pushing.
->
-> > > ï¼
->
-> > > ï¼ ï¼ Has anyone ever run it successfully? Any help is appreciated!
->
-> > > ï¼
->
-> > > ï¼ In our internal version can run it successfully,
->
-> > > ï¼ The failover detail you can ask Zhanghailiang for help.
->
-> > > ï¼ Next time if you have some question about COLO,
->
-> > > ï¼ please cc me and zhanghailiang address@hidden
->
-> > > ï¼
->
-> > > ï¼
->
-> > > ï¼ Thanks
->
-> > > ï¼ Zhang Chen
->
-> > > ï¼
->
-> > > ï¼
->
-> > > ï¼ ï¼
->
-> > > ï¼ ï¼
->
-> > > ï¼ ï¼
->
-> > > ï¼ ï¼ centos7.2+qemu2.7.50
->
-> > > ï¼ ï¼ (gdb) bt
->
-> > > ï¼ ï¼ #0  0x00007f3e00cc86ad in recvmsg () from /lib64/libpthread.so.0
->
-> > > ï¼ ï¼ #1  0x00007f3e0332b738 in qio_channel_socket_readv (ioc=ï¼optimized
->
-> > > outï¼,
->
-> > > ï¼ ï¼ iov=ï¼optimized outï¼, niov=ï¼optimized outï¼, fds=0x0, nfds=0x0,
->
-> > > errp=0x0) at
->
-> > > ï¼ ï¼ io/channel-socket.c:497
->
-> > > ï¼ ï¼ #2  0x00007f3e03329472 in qio_channel_read (address@hidden,
->
-> > > ï¼ ï¼ address@hidden "", address@hidden,
->
-> > > ï¼ ï¼ address@hidden) at io/channel.c:97
->
-> > > ï¼ ï¼ #3  0x00007f3e032750e0 in channel_get_buffer (opaque=ï¼optimized
->
-> > > outï¼,
->
-> > > ï¼ ï¼ buf=0x7f3e05910f38 "", pos=ï¼optimized outï¼, size=32768) at
->
-> > > ï¼ ï¼ migration/qemu-file-channel.c:78
->
-> > > ï¼ ï¼ #4  0x00007f3e0327412c in qemu_fill_buffer (f=0x7f3e05910f00) at
->
-> > > ï¼ ï¼ migration/qemu-file.c:257
->
-> > > ï¼ ï¼ #5  0x00007f3e03274a41 in qemu_peek_byte (address@hidden,
->
-> > > ï¼ ï¼ address@hidden) at migration/qemu-file.c:510
->
-> > > ï¼ ï¼ #6  0x00007f3e03274aab in qemu_get_byte (address@hidden) at
->
-> > > ï¼ ï¼ migration/qemu-file.c:523
->
-> > > ï¼ ï¼ #7  0x00007f3e03274cb2 in qemu_get_be32 (address@hidden) at
->
-> > > ï¼ ï¼ migration/qemu-file.c:603
->
-> > > ï¼ ï¼ #8  0x00007f3e03271735 in colo_receive_message (f=0x7f3e05910f00,
->
-> > > ï¼ ï¼ address@hidden) at migration/colo.c:215
->
-> > > ï¼ ï¼ #9  0x00007f3e0327250d in colo_wait_handle_message
->
-> > > (errp=0x7f3d62bfaa48,
->
-> > > ï¼ ï¼ checkpoint_request=ï¼synthetic pointerï¼, f=ï¼optimized outï¼) at
->
-> > > ï¼ ï¼ migration/colo.c:546
->
-> > > ï¼ ï¼ #10 colo_process_incoming_thread (opaque=0x7f3e067245e0) at
->
-> > > ï¼ ï¼ migration/colo.c:649
->
-> > > ï¼ ï¼ #11 0x00007f3e00cc1df3 in start_thread () from
->
-> > > /lib64/libpthread.so.0
->
-> > > ï¼ ï¼ #12 0x00007f3dfc9c03ed in clone () from /lib64/libc.so.6
->
-> > > ï¼ ï¼
->
-> > > ï¼ ï¼
->
-> > > ï¼ ï¼
->
-> > > ï¼ ï¼
->
-> > > ï¼ ï¼
->
-> > > ï¼ ï¼ --
->
-> > > ï¼ ï¼ View this message in context:
->
-> > >
-http://qemu.11.n7.nabble.com/COLO-failover-hang-tp473250.html
->
-> > > ï¼ ï¼ Sent from the Developer mailing list archive at Nabble.com.
->
-> > > ï¼ ï¼
->
-> > > ï¼ ï¼
->
-> > > ï¼ ï¼
->
-> > > ï¼ ï¼
->
-> > > ï¼
->
-> > > ï¼ --
->
-> > > ï¼ Thanks
->
-> > > ï¼ Zhang Chen
->
-> > > ï¼
->
-> > > ï¼
->
-> > > ï¼
->
-> > > ï¼
->
-> > > ï¼
->
-> > >
->
-> >
->
-> --
->
-> Dr. David Alan Gilbert / address@hidden / Manchester, UK
->
->
->
-> .
->
->
->
---
-Dr. David Alan Gilbert / address@hidden / Manchester, UK
-
diff --git a/classification_output/04/other/66743673 b/classification_output/04/other/66743673
deleted file mode 100644
index 5562ad60b..000000000
--- a/classification_output/04/other/66743673
+++ /dev/null
@@ -1,372 +0,0 @@
-other: 0.967
-assembly: 0.954
-semantic: 0.951
-boot: 0.938
-instruction: 0.930
-network: 0.930
-graphic: 0.927
-socket: 0.926
-vnc: 0.926
-device: 0.926
-KVM: 0.891
-mistranslation: 0.855
-
-[Bug] QEMU TCG warnings after commit c6bd2dd63420 - HTT / CMP_LEG bits
-
-Hi Community,
-
-This email contains 3 bugs appear to share the same root cause.
-
-[1] We ran into the following warnings when running QEMU v10.0.0 in TCG mode:
-
-qemu-system-x86_64 \
-  -machine q35 \
-  -m 4G -smp 4 \
-  -kernel ./arch/x86/boot/bzImage \
-  -bios /usr/share/ovmf/OVMF.fd \
-  -drive file=~/kernel/rootfs.ext4,index=0,format=raw,media=disk \
-  -drive file=~/kernel/swap.img,index=1,format=raw,media=disk \
-  -nographic \
-  -append 'root=/dev/sda rw resume=/dev/sdb console=ttyS0 nokaslr'
-qemu-system-x86_64: warning: TCG doesn't support requested feature:
-CPUID.01H:EDX.ht [bit 28]
-qemu-system-x86_64: warning: TCG doesn't support requested feature:
-CPUID.80000001H:ECX.cmp-legacy [bit 1]
-(repeats 4 times, once per vCPU)
-Tracing the history shows that commit c6bd2dd63420 "i386/cpu: Set up CPUID_HT in
-x86_cpu_expand_features() instead of cpu_x86_cpuid()" is what introduced the
-warnings.
-Since that commit, TCG unconditionally advertises HTT (CPUID 1 EDX[28]) and
-CMP_LEG (CPUID 8000_0001 ECX[1]). Because TCG itself has no SMT support, these
-bits trigger the warnings above.
-[2] Also, Zhao pointed me to a similar report on GitLab:
-https://gitlab.com/qemu-project/qemu/-/issues/2894
-The symptoms there look identical to what we're seeing.
-By convention we file one issue per email, but these two appear to share the
-same root cause, so I'm describing them together here.
-[3] My colleague Alan noticed what appears to be a related problem: if we launch
-a guest with '-cpu <model>,-ht --enable-kvm', which means explicitly removing
-the ht flag, but the guest still reports HT(cat /proc/cpuinfo in linux guest)
-enabled. In other words, under KVM the ht bit seems to be forced on even when
-the user tries to disable it.
-Best regards,
-Ewan
-
-On 4/29/25 11:02 AM, Ewan Hai wrote:
-Hi Community,
-
-This email contains 3 bugs appear to share the same root cause.
-
-[1] We ran into the following warnings when running QEMU v10.0.0 in TCG mode:
-
-qemu-system-x86_64 \
- Â  -machine q35 \
- Â  -m 4G -smp 4 \
- Â  -kernel ./arch/x86/boot/bzImage \
- Â  -bios /usr/share/ovmf/OVMF.fd \
- Â  -drive file=~/kernel/rootfs.ext4,index=0,format=raw,media=disk \
- Â  -drive file=~/kernel/swap.img,index=1,format=raw,media=disk \
- Â  -nographic \
- Â  -append 'root=/dev/sda rw resume=/dev/sdb console=ttyS0 nokaslr'
-qemu-system-x86_64: warning: TCG doesn't support requested feature:
-CPUID.01H:EDX.ht [bit 28]
-qemu-system-x86_64: warning: TCG doesn't support requested feature:
-CPUID.80000001H:ECX.cmp-legacy [bit 1]
-(repeats 4 times, once per vCPU)
-Tracing the history shows that commit c6bd2dd63420 "i386/cpu: Set up CPUID_HT in
-x86_cpu_expand_features() instead of cpu_x86_cpuid()" is what introduced the
-warnings.
-Since that commit, TCG unconditionally advertises HTT (CPUID 1 EDX[28]) and
-CMP_LEG (CPUID 8000_0001 ECX[1]). Because TCG itself has no SMT support, these
-bits trigger the warnings above.
-[2] Also, Zhao pointed me to a similar report on GitLab:
-https://gitlab.com/qemu-project/qemu/-/issues/2894
-The symptoms there look identical to what we're seeing.
-By convention we file one issue per email, but these two appear to share the
-same root cause, so I'm describing them together here.
-[3] My colleague Alan noticed what appears to be a related problem: if we launch
-a guest with '-cpu <model>,-ht --enable-kvm', which means explicitly removing
-the ht flag, but the guest still reports HT(cat /proc/cpuinfo in linux guest)
-enabled. In other words, under KVM the ht bit seems to be forced on even when
-the user tries to disable it.
-XiaoYao reminded me that issue [3] stems from a different patch. Please ignore
-it for nowâI'll start a separate thread to discuss that one independently.
-Best regards,
-Ewan
-
-On 4/29/2025 11:02 AM, Ewan Hai wrote:
-Hi Community,
-
-This email contains 3 bugs appear to share the same root cause.
-[1] We ran into the following warnings when running QEMU v10.0.0 in TCG
-mode:
-qemu-system-x86_64 \
- Â  -machine q35 \
- Â  -m 4G -smp 4 \
- Â  -kernel ./arch/x86/boot/bzImage \
- Â  -bios /usr/share/ovmf/OVMF.fd \
- Â  -drive file=~/kernel/rootfs.ext4,index=0,format=raw,media=disk \
- Â  -drive file=~/kernel/swap.img,index=1,format=raw,media=disk \
- Â  -nographic \
- Â  -append 'root=/dev/sda rw resume=/dev/sdb console=ttyS0 nokaslr'
-qemu-system-x86_64: warning: TCG doesn't support requested feature:
-CPUID.01H:EDX.ht [bit 28]
-qemu-system-x86_64: warning: TCG doesn't support requested feature:
-CPUID.80000001H:ECX.cmp-legacy [bit 1]
-(repeats 4 times, once per vCPU)
-Tracing the history shows that commit c6bd2dd63420 "i386/cpu: Set up
-CPUID_HT in x86_cpu_expand_features() instead of cpu_x86_cpuid()" is
-what introduced the warnings.
-Since that commit, TCG unconditionally advertises HTT (CPUID 1 EDX[28])
-and CMP_LEG (CPUID 8000_0001 ECX[1]). Because TCG itself has no SMT
-support, these bits trigger the warnings above.
-[2] Also, Zhao pointed me to a similar report on GitLab:
-https://gitlab.com/qemu-project/qemu/-/issues/2894
-The symptoms there look identical to what we're seeing.
-By convention we file one issue per email, but these two appear to share
-the same root cause, so I'm describing them together here.
-It was caused by my two patches. I think the fix can be as follow.
-If no objection from the community, I can submit the formal patch.
-
-diff --git a/target/i386/cpu.c b/target/i386/cpu.c
-index 1f970aa4daa6..fb95aadd6161 100644
---- a/target/i386/cpu.c
-+++ b/target/i386/cpu.c
-@@ -776,11 +776,12 @@ void x86_cpu_vendor_words2str(char *dst, uint32_t
-vendor1,
-CPUID_PAE | CPUID_MCE | CPUID_CX8 | CPUID_APIC | CPUID_SEP | \
-           CPUID_MTRR | CPUID_PGE | CPUID_MCA | CPUID_CMOV | CPUID_PAT | \
-           CPUID_PSE36 | CPUID_CLFLUSH | CPUID_ACPI | CPUID_MMX | \
--          CPUID_FXSR | CPUID_SSE | CPUID_SSE2 | CPUID_SS | CPUID_DE)
-+          CPUID_FXSR | CPUID_SSE | CPUID_SSE2 | CPUID_SS | CPUID_DE | \
-+          CPUID_HT)
-           /* partly implemented:
-           CPUID_MTRR, CPUID_MCA, CPUID_CLFLUSH (needed for Win64) */
-           /* missing:
--          CPUID_VME, CPUID_DTS, CPUID_SS, CPUID_HT, CPUID_TM, CPUID_PBE */
-+          CPUID_VME, CPUID_DTS, CPUID_SS, CPUID_TM, CPUID_PBE */
-
- /*
-  * Kernel-only features that can be shown to usermode programs even if
-@@ -848,7 +849,8 @@ void x86_cpu_vendor_words2str(char *dst, uint32_t
-vendor1,
-#define TCG_EXT3_FEATURES (CPUID_EXT3_LAHF_LM | CPUID_EXT3_SVM | \
-           CPUID_EXT3_CR8LEG | CPUID_EXT3_ABM | CPUID_EXT3_SSE4A | \
--          CPUID_EXT3_3DNOWPREFETCH | CPUID_EXT3_KERNEL_FEATURES)
-+          CPUID_EXT3_3DNOWPREFETCH | CPUID_EXT3_KERNEL_FEATURES | \
-+          CPUID_EXT3_CMP_LEG)
-
- #define TCG_EXT4_FEATURES 0
-[3] My colleague Alan noticed what appears to be a related problem: if
-we launch a guest with '-cpu <model>,-ht --enable-kvm', which means
-explicitly removing the ht flag, but the guest still reports HT(cat /
-proc/cpuinfo in linux guest) enabled. In other words, under KVM the ht
-bit seems to be forced on even when the user tries to disable it.
-This has been the behavior of QEMU for many years, not some regression
-introduced by my patches. We can discuss how to address it separately.
-Best regards,
-Ewan
-
-On Tue, Apr 29, 2025 at 01:55:59PM +0800, Xiaoyao Li wrote:
->
-Date: Tue, 29 Apr 2025 13:55:59 +0800
->
-From: Xiaoyao Li <xiaoyao.li@intel.com>
->
-Subject: Re: [Bug] QEMU TCG warnings after commit c6bd2dd63420 - HTT /
->
-CMP_LEG bits
->
->
-On 4/29/2025 11:02 AM, Ewan Hai wrote:
->
-> Hi Community,
->
->
->
-> This email contains 3 bugs appear to share the same root cause.
->
->
->
-> [1] We ran into the following warnings when running QEMU v10.0.0 in TCG
->
-> mode:
->
->
->
-> qemu-system-x86_64 \
->
->  Â  -machine q35 \
->
->  Â  -m 4G -smp 4 \
->
->  Â  -kernel ./arch/x86/boot/bzImage \
->
->  Â  -bios /usr/share/ovmf/OVMF.fd \
->
->  Â  -drive file=~/kernel/rootfs.ext4,index=0,format=raw,media=disk \
->
->  Â  -drive file=~/kernel/swap.img,index=1,format=raw,media=disk \
->
->  Â  -nographic \
->
->  Â  -append 'root=/dev/sda rw resume=/dev/sdb console=ttyS0 nokaslr'
->
->
->
-> qemu-system-x86_64: warning: TCG doesn't support requested feature:
->
-> CPUID.01H:EDX.ht [bit 28]
->
-> qemu-system-x86_64: warning: TCG doesn't support requested feature:
->
-> CPUID.80000001H:ECX.cmp-legacy [bit 1]
->
-> (repeats 4 times, once per vCPU)
->
->
->
-> Tracing the history shows that commit c6bd2dd63420 "i386/cpu: Set up
->
-> CPUID_HT in x86_cpu_expand_features() instead of cpu_x86_cpuid()" is
->
-> what introduced the warnings.
->
->
->
-> Since that commit, TCG unconditionally advertises HTT (CPUID 1 EDX[28])
->
-> and CMP_LEG (CPUID 8000_0001 ECX[1]). Because TCG itself has no SMT
->
-> support, these bits trigger the warnings above.
->
->
->
-> [2] Also, Zhao pointed me to a similar report on GitLab:
->
->
-https://gitlab.com/qemu-project/qemu/-/issues/2894
->
-> The symptoms there look identical to what we're seeing.
->
->
->
-> By convention we file one issue per email, but these two appear to share
->
-> the same root cause, so I'm describing them together here.
->
->
-It was caused by my two patches. I think the fix can be as follow.
->
-If no objection from the community, I can submit the formal patch.
->
->
-diff --git a/target/i386/cpu.c b/target/i386/cpu.c
->
-index 1f970aa4daa6..fb95aadd6161 100644
->
---- a/target/i386/cpu.c
->
-+++ b/target/i386/cpu.c
->
-@@ -776,11 +776,12 @@ void x86_cpu_vendor_words2str(char *dst, uint32_t
->
-vendor1,
->
-CPUID_PAE | CPUID_MCE | CPUID_CX8 | CPUID_APIC | CPUID_SEP | \
->
-CPUID_MTRR | CPUID_PGE | CPUID_MCA | CPUID_CMOV | CPUID_PAT | \
->
-CPUID_PSE36 | CPUID_CLFLUSH | CPUID_ACPI | CPUID_MMX | \
->
--          CPUID_FXSR | CPUID_SSE | CPUID_SSE2 | CPUID_SS | CPUID_DE)
->
-+          CPUID_FXSR | CPUID_SSE | CPUID_SSE2 | CPUID_SS | CPUID_DE | \
->
-+          CPUID_HT)
->
-/* partly implemented:
->
-CPUID_MTRR, CPUID_MCA, CPUID_CLFLUSH (needed for Win64) */
->
-/* missing:
->
--          CPUID_VME, CPUID_DTS, CPUID_SS, CPUID_HT, CPUID_TM, CPUID_PBE */
->
-+          CPUID_VME, CPUID_DTS, CPUID_SS, CPUID_TM, CPUID_PBE */
->
->
-/*
->
-* Kernel-only features that can be shown to usermode programs even if
->
-@@ -848,7 +849,8 @@ void x86_cpu_vendor_words2str(char *dst, uint32_t
->
-vendor1,
->
->
-#define TCG_EXT3_FEATURES (CPUID_EXT3_LAHF_LM | CPUID_EXT3_SVM | \
->
-CPUID_EXT3_CR8LEG | CPUID_EXT3_ABM | CPUID_EXT3_SSE4A | \
->
--          CPUID_EXT3_3DNOWPREFETCH | CPUID_EXT3_KERNEL_FEATURES)
->
-+          CPUID_EXT3_3DNOWPREFETCH | CPUID_EXT3_KERNEL_FEATURES | \
->
-+          CPUID_EXT3_CMP_LEG)
->
->
-#define TCG_EXT4_FEATURES 0
-This fix is fine for me...at least from SDM, HTT depends on topology and
-it should exist when user sets "-smp 4".
-
->
-> [3] My colleague Alan noticed what appears to be a related problem: if
->
-> we launch a guest with '-cpu <model>,-ht --enable-kvm', which means
->
-> explicitly removing the ht flag, but the guest still reports HT(cat
->
-> /proc/cpuinfo in linux guest) enabled. In other words, under KVM the ht
->
-> bit seems to be forced on even when the user tries to disable it.
->
->
-XiaoYao reminded me that issue [3] stems from a different patch. Please
->
-ignore it for nowâI'll start a separate thread to discuss that one
->
-independently.
-I haven't found any other thread :-).
-
-By the way, just curious, in what cases do you need to disbale the HT
-flag? "-smp 4" means 4 cores with 1 thread per core, and is it not
-enough?
-
-As for the â-htâ behavior, I'm also unsure whether this should be fixed
-or not - one possible consideration is whether â-htâ would be useful.
-
-On 5/8/25 5:04 PM, Zhao Liu wrote:
-[3] My colleague Alan noticed what appears to be a related problem: if
-we launch a guest with '-cpu <model>,-ht --enable-kvm', which means
-explicitly removing the ht flag, but the guest still reports HT(cat
-/proc/cpuinfo in linux guest) enabled. In other words, under KVM the ht
-bit seems to be forced on even when the user tries to disable it.
-XiaoYao reminded me that issue [3] stems from a different patch. Please
-ignore it for nowâI'll start a separate thread to discuss that one
-independently.
-I haven't found any other thread :-).
-Please refer to
-https://lore.kernel.org/all/db6ae3bb-f4e5-4719-9beb-623fcff56af2@zhaoxin.com/
-.
-By the way, just curious, in what cases do you need to disbale the HT
-flag? "-smp 4" means 4 cores with 1 thread per core, and is it not
-enough?
-
-As for the â-htâ behavior, I'm also unsure whether this should be fixed
-or not - one possible consideration is whether â-htâ would be useful.
-I wasn't trying to target any specific use case, using "-ht" was simply a way to
-check how the ht feature behaves under both KVM and TCG. There's no special
-workload behind it; I just wanted to confirm that the flag is respected (or not)
-in each mode.
-
diff --git a/classification_output/04/other/68897003 b/classification_output/04/other/68897003
deleted file mode 100644
index 5076ebda5..000000000
--- a/classification_output/04/other/68897003
+++ /dev/null
@@ -1,724 +0,0 @@
-other: 0.714
-assembly: 0.697
-graphic: 0.694
-semantic: 0.671
-device: 0.647
-instruction: 0.641
-network: 0.614
-KVM: 0.598
-socket: 0.585
-boot: 0.569
-mistranslation: 0.535
-vnc: 0.525
-
-[Qemu-devel] [BUG] VM abort after migration
-
-Hi guys,
-
-We found a qemu core in our testing environment, the assertion
-'assert(bus->irq_count[i] == 0)' in pcibus_reset() was triggered and
-the bus->irq_count[i] is '-1'.
-
-Through analysis, it was happened after VM migration and we think
-it was caused by the following sequence:
-
-*Migration Source*
-1. save bus pci.0 state, including irq_count[x] ( =0 , old )
-2. save E1000:
-   e1000_pre_save
-    e1000_mit_timer
-     set_interrupt_cause
-      pci_set_irq --> update pci_dev->irq_state to 1 and
-                  update bus->irq_count[x] to 1 ( new )
-    the irq_state sent to dest.
-
-*Migration Dest*
-1. Receive the irq_count[x] of pci.0 is 0 , but the irq_state of e1000 is 1.
-2. If the e1000 need change irqline , it would call to pci_irq_handler(),
-  the irq_state maybe change to 0 and bus->irq_count[x] will become
-  -1 in this situation.
-3. do VM reboot then the assertion will be triggered.
-
-We also found some guys faced the similar problem:
-[1]
-https://lists.gnu.org/archive/html/qemu-devel/2016-11/msg02525.html
-[2]
-https://bugs.launchpad.net/qemu/+bug/1702621
-Is there some patches to fix this problem ?
-Can we save pcibus state after all the pci devs are saved ?
-
-Thanks,
-Longpeng(Mike)
-
-* longpeng (address@hidden) wrote:
->
-Hi guys,
->
->
-We found a qemu core in our testing environment, the assertion
->
-'assert(bus->irq_count[i] == 0)' in pcibus_reset() was triggered and
->
-the bus->irq_count[i] is '-1'.
->
->
-Through analysis, it was happened after VM migration and we think
->
-it was caused by the following sequence:
->
->
-*Migration Source*
->
-1. save bus pci.0 state, including irq_count[x] ( =0 , old )
->
-2. save E1000:
->
-e1000_pre_save
->
-e1000_mit_timer
->
-set_interrupt_cause
->
-pci_set_irq --> update pci_dev->irq_state to 1 and
->
-update bus->irq_count[x] to 1 ( new )
->
-the irq_state sent to dest.
->
->
-*Migration Dest*
->
-1. Receive the irq_count[x] of pci.0 is 0 , but the irq_state of e1000 is 1.
->
-2. If the e1000 need change irqline , it would call to pci_irq_handler(),
->
-the irq_state maybe change to 0 and bus->irq_count[x] will become
->
--1 in this situation.
->
-3. do VM reboot then the assertion will be triggered.
->
->
-We also found some guys faced the similar problem:
->
-[1]
-https://lists.gnu.org/archive/html/qemu-devel/2016-11/msg02525.html
->
-[2]
-https://bugs.launchpad.net/qemu/+bug/1702621
->
->
-Is there some patches to fix this problem ?
-I don't remember any.
-
->
-Can we save pcibus state after all the pci devs are saved ?
-Does this problem only happen with e1000? I think so.
-If it's only e1000 I think we should fix it - I think once the VM is
-stopped for doing the device migration it shouldn't be raising
-interrupts.
-
-Dave
-
->
-Thanks,
->
-Longpeng(Mike)
---
-Dr. David Alan Gilbert / address@hidden / Manchester, UK
-
-On 2019/7/8 ä¸å5:47, Dr. David Alan Gilbert wrote:
-* longpeng (address@hidden) wrote:
-Hi guys,
-
-We found a qemu core in our testing environment, the assertion
-'assert(bus->irq_count[i] == 0)' in pcibus_reset() was triggered and
-the bus->irq_count[i] is '-1'.
-
-Through analysis, it was happened after VM migration and we think
-it was caused by the following sequence:
-
-*Migration Source*
-1. save bus pci.0 state, including irq_count[x] ( =0 , old )
-2. save E1000:
-    e1000_pre_save
-     e1000_mit_timer
-      set_interrupt_cause
-       pci_set_irq --> update pci_dev->irq_state to 1 and
-                   update bus->irq_count[x] to 1 ( new )
-     the irq_state sent to dest.
-
-*Migration Dest*
-1. Receive the irq_count[x] of pci.0 is 0 , but the irq_state of e1000 is 1.
-2. If the e1000 need change irqline , it would call to pci_irq_handler(),
-   the irq_state maybe change to 0 and bus->irq_count[x] will become
-   -1 in this situation.
-3. do VM reboot then the assertion will be triggered.
-
-We also found some guys faced the similar problem:
-[1]
-https://lists.gnu.org/archive/html/qemu-devel/2016-11/msg02525.html
-[2]
-https://bugs.launchpad.net/qemu/+bug/1702621
-Is there some patches to fix this problem ?
-I don't remember any.
-Can we save pcibus state after all the pci devs are saved ?
-Does this problem only happen with e1000? I think so.
-If it's only e1000 I think we should fix it - I think once the VM is
-stopped for doing the device migration it shouldn't be raising
-interrupts.
-I wonder maybe we can simply fix this by no setting ICS on pre_save()
-but scheduling mit timer unconditionally in post_load().
-Thanks
-Dave
-Thanks,
-Longpeng(Mike)
---
-Dr. David Alan Gilbert / address@hidden / Manchester, UK
-
-å¨ 2019/7/10 11:25, Jason Wang åé:
->
->
-On 2019/7/8 ä¸å5:47, Dr. David Alan Gilbert wrote:
->
-> * longpeng (address@hidden) wrote:
->
->> Hi guys,
->
->>
->
->> We found a qemu core in our testing environment, the assertion
->
->> 'assert(bus->irq_count[i] == 0)' in pcibus_reset() was triggered and
->
->> the bus->irq_count[i] is '-1'.
->
->>
->
->> Through analysis, it was happened after VM migration and we think
->
->> it was caused by the following sequence:
->
->>
->
->> *Migration Source*
->
->> 1. save bus pci.0 state, including irq_count[x] ( =0 , old )
->
->> 2. save E1000:
->
->> Â Â Â  e1000_pre_save
->
->> Â Â Â Â  e1000_mit_timer
->
->> Â Â Â Â Â  set_interrupt_cause
->
->> Â Â Â Â Â Â  pci_set_irq --> update pci_dev->irq_state to 1 and
->
->> Â Â Â Â Â Â Â Â Â Â Â Â Â Â Â Â Â Â  update bus->irq_count[x] to 1 ( new )
->
->> Â Â Â Â  the irq_state sent to dest.
->
->>
->
->> *Migration Dest*
->
->> 1. Receive the irq_count[x] of pci.0 is 0 , but the irq_state of e1000 is 1.
->
->> 2. If the e1000 need change irqline , it would call to pci_irq_handler(),
->
->> Â Â  the irq_state maybe change to 0 and bus->irq_count[x] will become
->
->> Â Â  -1 in this situation.
->
->> 3. do VM reboot then the assertion will be triggered.
->
->>
->
->> We also found some guys faced the similar problem:
->
->> [1]
-https://lists.gnu.org/archive/html/qemu-devel/2016-11/msg02525.html
->
->> [2]
-https://bugs.launchpad.net/qemu/+bug/1702621
->
->>
->
->> Is there some patches to fix this problem ?
->
-> I don't remember any.
->
->
->
->> Can we save pcibus state after all the pci devs are saved ?
->
-> Does this problem only happen with e1000? I think so.
->
-> If it's only e1000 I think we should fix it - I think once the VM is
->
-> stopped for doing the device migration it shouldn't be raising
->
-> interrupts.
->
->
->
-I wonder maybe we can simply fix this by no setting ICS on pre_save() but
->
-scheduling mit timer unconditionally in post_load().
->
-I also think this is a bug of e1000 because we find more cores with the same
-frame thease days.
-
-I'm not familiar with e1000 so hope someone could fix it, thanks. :)
-
->
-Thanks
->
->
->
->
->
-> Dave
->
->
->
->> Thanks,
->
->> Longpeng(Mike)
->
-> --
->
-> Dr. David Alan Gilbert / address@hidden / Manchester, UK
->
->
-.
->
--- 
-Regards,
-Longpeng(Mike)
-
-On 2019/7/10 ä¸å11:36, Longpeng (Mike) wrote:
-å¨ 2019/7/10 11:25, Jason Wang åé:
-On 2019/7/8 ä¸å5:47, Dr. David Alan Gilbert wrote:
-* longpeng (address@hidden) wrote:
-Hi guys,
-
-We found a qemu core in our testing environment, the assertion
-'assert(bus->irq_count[i] == 0)' in pcibus_reset() was triggered and
-the bus->irq_count[i] is '-1'.
-
-Through analysis, it was happened after VM migration and we think
-it was caused by the following sequence:
-
-*Migration Source*
-1. save bus pci.0 state, including irq_count[x] ( =0 , old )
-2. save E1000:
- Â Â Â  e1000_pre_save
- Â Â Â Â  e1000_mit_timer
- Â Â Â Â Â  set_interrupt_cause
- Â Â Â Â Â Â  pci_set_irq --> update pci_dev->irq_state to 1 and
- Â Â Â Â Â Â Â Â Â Â Â Â Â Â Â Â Â Â  update bus->irq_count[x] to 1 ( new )
- Â Â Â Â  the irq_state sent to dest.
-
-*Migration Dest*
-1. Receive the irq_count[x] of pci.0 is 0 , but the irq_state of e1000 is 1.
-2. If the e1000 need change irqline , it would call to pci_irq_handler(),
- Â Â  the irq_state maybe change to 0 and bus->irq_count[x] will become
- Â Â  -1 in this situation.
-3. do VM reboot then the assertion will be triggered.
-
-We also found some guys faced the similar problem:
-[1]
-https://lists.gnu.org/archive/html/qemu-devel/2016-11/msg02525.html
-[2]
-https://bugs.launchpad.net/qemu/+bug/1702621
-Is there some patches to fix this problem ?
-I don't remember any.
-Can we save pcibus state after all the pci devs are saved ?
-Does this problem only happen with e1000? I think so.
-If it's only e1000 I think we should fix it - I think once the VM is
-stopped for doing the device migration it shouldn't be raising
-interrupts.
-I wonder maybe we can simply fix this by no setting ICS on pre_save() but
-scheduling mit timer unconditionally in post_load().
-I also think this is a bug of e1000 because we find more cores with the same
-frame thease days.
-
-I'm not familiar with e1000 so hope someone could fix it, thanks. :)
-Draft a path in attachment, please test.
-
-Thanks
-Thanks
-Dave
-Thanks,
-Longpeng(Mike)
---
-Dr. David Alan Gilbert / address@hidden / Manchester, UK
-.
-0001-e1000-don-t-raise-interrupt-in-pre_save.patch
-Description:
-Text Data
-
-å¨ 2019/7/10 11:57, Jason Wang åé:
->
->
-On 2019/7/10 ä¸å11:36, Longpeng (Mike) wrote:
->
-> å¨ 2019/7/10 11:25, Jason Wang åé:
->
->> On 2019/7/8 ä¸å5:47, Dr. David Alan Gilbert wrote:
->
->>> * longpeng (address@hidden) wrote:
->
->>>> Hi guys,
->
->>>>
->
->>>> We found a qemu core in our testing environment, the assertion
->
->>>> 'assert(bus->irq_count[i] == 0)' in pcibus_reset() was triggered and
->
->>>> the bus->irq_count[i] is '-1'.
->
->>>>
->
->>>> Through analysis, it was happened after VM migration and we think
->
->>>> it was caused by the following sequence:
->
->>>>
->
->>>> *Migration Source*
->
->>>> 1. save bus pci.0 state, including irq_count[x] ( =0 , old )
->
->>>> 2. save E1000:
->
->>>> Â Â Â Â  e1000_pre_save
->
->>>> Â Â Â Â Â  e1000_mit_timer
->
->>>> Â Â Â Â Â Â  set_interrupt_cause
->
->>>> Â Â Â Â Â Â Â  pci_set_irq --> update pci_dev->irq_state to 1 and
->
->>>> Â Â Â Â Â Â Â Â Â Â Â Â Â Â Â Â Â Â Â  update bus->irq_count[x] to 1 ( new )
->
->>>> Â Â Â Â Â  the irq_state sent to dest.
->
->>>>
->
->>>> *Migration Dest*
->
->>>> 1. Receive the irq_count[x] of pci.0 is 0 , but the irq_state of e1000 is
->
->>>> 1.
->
->>>> 2. If the e1000 need change irqline , it would call to pci_irq_handler(),
->
->>>> Â Â Â  the irq_state maybe change to 0 and bus->irq_count[x] will become
->
->>>> Â Â Â  -1 in this situation.
->
->>>> 3. do VM reboot then the assertion will be triggered.
->
->>>>
->
->>>> We also found some guys faced the similar problem:
->
->>>> [1]
-https://lists.gnu.org/archive/html/qemu-devel/2016-11/msg02525.html
->
->>>> [2]
-https://bugs.launchpad.net/qemu/+bug/1702621
->
->>>>
->
->>>> Is there some patches to fix this problem ?
->
->>> I don't remember any.
->
->>>
->
->>>> Can we save pcibus state after all the pci devs are saved ?
->
->>> Does this problem only happen with e1000? I think so.
->
->>> If it's only e1000 I think we should fix it - I think once the VM is
->
->>> stopped for doing the device migration it shouldn't be raising
->
->>> interrupts.
->
->>
->
->> I wonder maybe we can simply fix this by no setting ICS on pre_save() but
->
->> scheduling mit timer unconditionally in post_load().
->
->>
->
-> I also think this is a bug of e1000 because we find more cores with the same
->
-> frame thease days.
->
->
->
-> I'm not familiar with e1000 so hope someone could fix it, thanks. :)
->
->
->
->
-Draft a path in attachment, please test.
->
-Thanks. We'll test it for a few weeks and then give you the feedback. :)
-
->
-Thanks
->
->
->
->> Thanks
->
->>
->
->>
->
->>> Dave
->
->>>
->
->>>> Thanks,
->
->>>> Longpeng(Mike)
->
->>> --Â
->
->>> Dr. David Alan Gilbert / address@hidden / Manchester, UK
->
->> .
->
->>
--- 
-Regards,
-Longpeng(Mike)
-
-å¨ 2019/7/10 11:57, Jason Wang åé:
->
->
-On 2019/7/10 ä¸å11:36, Longpeng (Mike) wrote:
->
-> å¨ 2019/7/10 11:25, Jason Wang åé:
->
->> On 2019/7/8 ä¸å5:47, Dr. David Alan Gilbert wrote:
->
->>> * longpeng (address@hidden) wrote:
->
->>>> Hi guys,
->
->>>>
->
->>>> We found a qemu core in our testing environment, the assertion
->
->>>> 'assert(bus->irq_count[i] == 0)' in pcibus_reset() was triggered and
->
->>>> the bus->irq_count[i] is '-1'.
->
->>>>
->
->>>> Through analysis, it was happened after VM migration and we think
->
->>>> it was caused by the following sequence:
->
->>>>
->
->>>> *Migration Source*
->
->>>> 1. save bus pci.0 state, including irq_count[x] ( =0 , old )
->
->>>> 2. save E1000:
->
->>>> Â Â Â Â  e1000_pre_save
->
->>>> Â Â Â Â Â  e1000_mit_timer
->
->>>> Â Â Â Â Â Â  set_interrupt_cause
->
->>>> Â Â Â Â Â Â Â  pci_set_irq --> update pci_dev->irq_state to 1 and
->
->>>> Â Â Â Â Â Â Â Â Â Â Â Â Â Â Â Â Â Â Â  update bus->irq_count[x] to 1 ( new )
->
->>>> Â Â Â Â Â  the irq_state sent to dest.
->
->>>>
->
->>>> *Migration Dest*
->
->>>> 1. Receive the irq_count[x] of pci.0 is 0 , but the irq_state of e1000 is
->
->>>> 1.
->
->>>> 2. If the e1000 need change irqline , it would call to pci_irq_handler(),
->
->>>> Â Â Â  the irq_state maybe change to 0 and bus->irq_count[x] will become
->
->>>> Â Â Â  -1 in this situation.
->
->>>> 3. do VM reboot then the assertion will be triggered.
->
->>>>
->
->>>> We also found some guys faced the similar problem:
->
->>>> [1]
-https://lists.gnu.org/archive/html/qemu-devel/2016-11/msg02525.html
->
->>>> [2]
-https://bugs.launchpad.net/qemu/+bug/1702621
->
->>>>
->
->>>> Is there some patches to fix this problem ?
->
->>> I don't remember any.
->
->>>
->
->>>> Can we save pcibus state after all the pci devs are saved ?
->
->>> Does this problem only happen with e1000? I think so.
->
->>> If it's only e1000 I think we should fix it - I think once the VM is
->
->>> stopped for doing the device migration it shouldn't be raising
->
->>> interrupts.
->
->>
->
->> I wonder maybe we can simply fix this by no setting ICS on pre_save() but
->
->> scheduling mit timer unconditionally in post_load().
->
->>
->
-> I also think this is a bug of e1000 because we find more cores with the same
->
-> frame thease days.
->
->
->
-> I'm not familiar with e1000 so hope someone could fix it, thanks. :)
->
->
->
->
-Draft a path in attachment, please test.
->
-Hi Jason,
-
-We've tested the patch for about two weeks, everything went well, thanks!
-
-Feel free to add my:
-Reported-and-tested-by: Longpeng <address@hidden>
-
->
-Thanks
->
->
->
->> Thanks
->
->>
->
->>
->
->>> Dave
->
->>>
->
->>>> Thanks,
->
->>>> Longpeng(Mike)
->
->>> --Â
->
->>> Dr. David Alan Gilbert / address@hidden / Manchester, UK
->
->> .
->
->>
--- 
-Regards,
-Longpeng(Mike)
-
-On 2019/7/27 ä¸å2:10, Longpeng (Mike) wrote:
-å¨ 2019/7/10 11:57, Jason Wang åé:
-On 2019/7/10 ä¸å11:36, Longpeng (Mike) wrote:
-å¨ 2019/7/10 11:25, Jason Wang åé:
-On 2019/7/8 ä¸å5:47, Dr. David Alan Gilbert wrote:
-* longpeng (address@hidden) wrote:
-Hi guys,
-
-We found a qemu core in our testing environment, the assertion
-'assert(bus->irq_count[i] == 0)' in pcibus_reset() was triggered and
-the bus->irq_count[i] is '-1'.
-
-Through analysis, it was happened after VM migration and we think
-it was caused by the following sequence:
-
-*Migration Source*
-1. save bus pci.0 state, including irq_count[x] ( =0 , old )
-2. save E1000:
- Â Â Â Â  e1000_pre_save
- Â Â Â Â Â  e1000_mit_timer
- Â Â Â Â Â Â  set_interrupt_cause
- Â Â Â Â Â Â Â  pci_set_irq --> update pci_dev->irq_state to 1 and
- Â Â Â Â Â Â Â Â Â Â Â Â Â Â Â Â Â Â Â  update bus->irq_count[x] to 1 ( new )
- Â Â Â Â Â  the irq_state sent to dest.
-
-*Migration Dest*
-1. Receive the irq_count[x] of pci.0 is 0 , but the irq_state of e1000 is 1.
-2. If the e1000 need change irqline , it would call to pci_irq_handler(),
- Â Â Â  the irq_state maybe change to 0 and bus->irq_count[x] will become
- Â Â Â  -1 in this situation.
-3. do VM reboot then the assertion will be triggered.
-
-We also found some guys faced the similar problem:
-[1]
-https://lists.gnu.org/archive/html/qemu-devel/2016-11/msg02525.html
-[2]
-https://bugs.launchpad.net/qemu/+bug/1702621
-Is there some patches to fix this problem ?
-I don't remember any.
-Can we save pcibus state after all the pci devs are saved ?
-Does this problem only happen with e1000? I think so.
-If it's only e1000 I think we should fix it - I think once the VM is
-stopped for doing the device migration it shouldn't be raising
-interrupts.
-I wonder maybe we can simply fix this by no setting ICS on pre_save() but
-scheduling mit timer unconditionally in post_load().
-I also think this is a bug of e1000 because we find more cores with the same
-frame thease days.
-
-I'm not familiar with e1000 so hope someone could fix it, thanks. :)
-Draft a path in attachment, please test.
-Hi Jason,
-
-We've tested the patch for about two weeks, everything went well, thanks!
-
-Feel free to add my:
-Reported-and-tested-by: Longpeng <address@hidden>
-Applied.
-
-Thanks
-Thanks
-Thanks
-Dave
-Thanks,
-Longpeng(Mike)
---
-Dr. David Alan Gilbert / address@hidden / Manchester, UK
-.
-
diff --git a/classification_output/04/other/70021271 b/classification_output/04/other/70021271
deleted file mode 100644
index b8e3e5353..000000000
--- a/classification_output/04/other/70021271
+++ /dev/null
@@ -1,7456 +0,0 @@
-other: 0.963
-graphic: 0.958
-KVM: 0.949
-semantic: 0.946
-mistranslation: 0.929
-assembly: 0.910
-vnc: 0.900
-device: 0.887
-instruction: 0.880
-socket: 0.873
-network: 0.873
-boot: 0.872
-
-[Qemu-devel] [BUG]Unassigned mem write during pci device hot-plug
-
-Hi all,
-
-In our test, we configured VM with several pci-bridges and a virtio-net nic 
-been attached with bus 4,
-After VM is startup, We ping this nic from host to judge if it is working 
-normally. Then, we hot add pci devices to this VM with bus 0.
-We  found the virtio-net NIC in bus 4 is not working (can not connect) 
-occasionally, as it kick virtio backend failure with error below:
-    Unassigned mem write 00000000fc803004 = 0x1
-
-memory-region: pci_bridge_pci
-  0000000000000000-ffffffffffffffff (prio 0, RW): pci_bridge_pci
-    00000000fc800000-00000000fc803fff (prio 1, RW): virtio-pci
-      00000000fc800000-00000000fc800fff (prio 0, RW): virtio-pci-common
-      00000000fc801000-00000000fc801fff (prio 0, RW): virtio-pci-isr
-      00000000fc802000-00000000fc802fff (prio 0, RW): virtio-pci-device
-      00000000fc803000-00000000fc803fff (prio 0, RW): virtio-pci-notify  <- io 
-mem unassigned
-      â¦
-
-We caught an exceptional address changing while this problem happened, show as 
-follow:
-Before pci_bridge_update_mappingsï¼
-      00000000fc000000-00000000fc1fffff (prio 1, RW): alias pci_bridge_pref_mem 
-@pci_bridge_pci 00000000fc000000-00000000fc1fffff
-      00000000fc200000-00000000fc3fffff (prio 1, RW): alias pci_bridge_pref_mem 
-@pci_bridge_pci 00000000fc200000-00000000fc3fffff
-      00000000fc400000-00000000fc5fffff (prio 1, RW): alias pci_bridge_pref_mem 
-@pci_bridge_pci 00000000fc400000-00000000fc5fffff
-      00000000fc600000-00000000fc7fffff (prio 1, RW): alias pci_bridge_pref_mem 
-@pci_bridge_pci 00000000fc600000-00000000fc7fffff
-      00000000fc800000-00000000fc9fffff (prio 1, RW): alias pci_bridge_pref_mem 
-@pci_bridge_pci 00000000fc800000-00000000fc9fffff <- correct Adress Spce
-      00000000fca00000-00000000fcbfffff (prio 1, RW): alias pci_bridge_pref_mem 
-@pci_bridge_pci 00000000fca00000-00000000fcbfffff
-      00000000fcc00000-00000000fcdfffff (prio 1, RW): alias pci_bridge_pref_mem 
-@pci_bridge_pci 00000000fcc00000-00000000fcdfffff
-      00000000fce00000-00000000fcffffff (prio 1, RW): alias pci_bridge_pref_mem 
-@pci_bridge_pci 00000000fce00000-00000000fcffffff
-
-After pci_bridge_update_mappingsï¼
-      00000000fda00000-00000000fdbfffff (prio 1, RW): alias pci_bridge_mem 
-@pci_bridge_pci 00000000fda00000-00000000fdbfffff
-      00000000fdc00000-00000000fddfffff (prio 1, RW): alias pci_bridge_mem 
-@pci_bridge_pci 00000000fdc00000-00000000fddfffff
-      00000000fde00000-00000000fdffffff (prio 1, RW): alias pci_bridge_mem 
-@pci_bridge_pci 00000000fde00000-00000000fdffffff
-      00000000fe000000-00000000fe1fffff (prio 1, RW): alias pci_bridge_mem 
-@pci_bridge_pci 00000000fe000000-00000000fe1fffff
-      00000000fe200000-00000000fe3fffff (prio 1, RW): alias pci_bridge_mem 
-@pci_bridge_pci 00000000fe200000-00000000fe3fffff
-      00000000fe400000-00000000fe5fffff (prio 1, RW): alias pci_bridge_mem 
-@pci_bridge_pci 00000000fe400000-00000000fe5fffff
-      00000000fe600000-00000000fe7fffff (prio 1, RW): alias pci_bridge_mem 
-@pci_bridge_pci 00000000fe600000-00000000fe7fffff
-      00000000fe800000-00000000fe9fffff (prio 1, RW): alias pci_bridge_mem 
-@pci_bridge_pci 00000000fe800000-00000000fe9fffff
-      fffffffffc800000-fffffffffc800000 (prio 1, RW): alias pci_bridge_pref_mem 
-@pci_bridge_pci fffffffffc800000-fffffffffc800000   <- Exceptional Adress Space
-
-We have figured out why this address becomes this value,  according to pci 
-spec,  pci driver can get BAR address size by writing 0xffffffff to
-the pci register firstly, and then read back the value from this register.
-We didn't handle this value  specially while process pci write in qemu, the 
-function call stack is:
-Pci_bridge_dev_write_config
--> pci_bridge_write_config
--> pci_default_write_config (we update the config[address] value here to 
-fffffffffc800000, which should be 0xfc800000 )
--> pci_bridge_update_mappings
-                ->pci_bridge_region_del(br, br->windows);
--> pci_bridge_region_init
-                                                                
-->pci_bridge_init_alias (here pci_bridge_get_base, we use the wrong value 
-fffffffffc800000)
-                                                -> 
-memory_region_transaction_commit
-
-So, as we can see, we use the wrong base address in qemu to update the memory 
-regions, though, we update the base address to
-The correct value after pci driver in VM write the original value back, the 
-virtio NIC in bus 4 may still sends net packets concurrently with
-The wrong memory region address.
-
-We have tried to skip the memory region update action in qemu while detect pci 
-write with 0xffffffff value, and it does work, but
-This seems to be not gently.
-
-diff --git a/hw/pci/pci_bridge.c b/hw/pci/pci_bridge.c
-index b2e50c3..84b405d 100644
---- a/hw/pci/pci_bridge.c
-+++ b/hw/pci/pci_bridge.c
-@@ -256,7 +256,8 @@ void pci_bridge_write_config(PCIDevice *d,
-     pci_default_write_config(d, address, val, len);
--    if (ranges_overlap(address, len, PCI_COMMAND, 2) ||
-+    if ( (val != 0xffffffff) &&
-+        (ranges_overlap(address, len, PCI_COMMAND, 2) ||
-         /* io base/limit */
-         ranges_overlap(address, len, PCI_IO_BASE, 2) ||
-@@ -266,7 +267,7 @@ void pci_bridge_write_config(PCIDevice *d,
-         ranges_overlap(address, len, PCI_MEMORY_BASE, 20) ||
-         /* vga enable */
--        ranges_overlap(address, len, PCI_BRIDGE_CONTROL, 2)) {
-+        ranges_overlap(address, len, PCI_BRIDGE_CONTROL, 2))) {
-         pci_bridge_update_mappings(s);
-     }
-
-Thinks,
-Xu
-
-On Sat, Dec 08, 2018 at 11:58:59AM +0000, xuyandong wrote:
->
-Hi all,
->
->
->
->
-In our test, we configured VM with several pci-bridges and a virtio-net nic
->
-been attached with bus 4,
->
->
-After VM is startup, We ping this nic from host to judge if it is working
->
-normally. Then, we hot add pci devices to this VM with bus 0.
->
->
-We  found the virtio-net NIC in bus 4 is not working (can not connect)
->
-occasionally, as it kick virtio backend failure with error below:
->
->
-Unassigned mem write 00000000fc803004 = 0x1
-Thanks for the report. Which guest was used to produce this problem?
-
--- 
-MST
-
-n Sat, Dec 08, 2018 at 11:58:59AM +0000, xuyandong wrote:
->
-> Hi all,
->
->
->
->
->
->
->
-> In our test, we configured VM with several pci-bridges and a
->
-> virtio-net nic been attached with bus 4,
->
->
->
-> After VM is startup, We ping this nic from host to judge if it is
->
-> working normally. Then, we hot add pci devices to this VM with bus 0.
->
->
->
-> We  found the virtio-net NIC in bus 4 is not working (can not connect)
->
-> occasionally, as it kick virtio backend failure with error below:
->
->
->
->     Unassigned mem write 00000000fc803004 = 0x1
->
->
-Thanks for the report. Which guest was used to produce this problem?
->
->
---
->
-MST
-I was seeing this problem when I hotplug a VFIO device to guest CentOS 7.4,
-after that I compiled the latest Linux kernel and it also contains this problem.
-
-Thinks,
-Xu
-
-On Sat, Dec 08, 2018 at 11:58:59AM +0000, xuyandong wrote:
->
-Hi all,
->
->
->
->
-In our test, we configured VM with several pci-bridges and a virtio-net nic
->
-been attached with bus 4,
->
->
-After VM is startup, We ping this nic from host to judge if it is working
->
-normally. Then, we hot add pci devices to this VM with bus 0.
->
->
-We  found the virtio-net NIC in bus 4 is not working (can not connect)
->
-occasionally, as it kick virtio backend failure with error below:
->
->
-Unassigned mem write 00000000fc803004 = 0x1
->
->
->
->
-memory-region: pci_bridge_pci
->
->
-0000000000000000-ffffffffffffffff (prio 0, RW): pci_bridge_pci
->
->
-00000000fc800000-00000000fc803fff (prio 1, RW): virtio-pci
->
->
-00000000fc800000-00000000fc800fff (prio 0, RW): virtio-pci-common
->
->
-00000000fc801000-00000000fc801fff (prio 0, RW): virtio-pci-isr
->
->
-00000000fc802000-00000000fc802fff (prio 0, RW): virtio-pci-device
->
->
-00000000fc803000-00000000fc803fff (prio 0, RW): virtio-pci-notify  <- io
->
-mem unassigned
->
->
-â¦
->
->
->
->
-We caught an exceptional address changing while this problem happened, show as
->
-follow:
->
->
-Before pci_bridge_update_mappingsï¼
->
->
-00000000fc000000-00000000fc1fffff (prio 1, RW): alias
->
-pci_bridge_pref_mem
->
-@pci_bridge_pci 00000000fc000000-00000000fc1fffff
->
->
-00000000fc200000-00000000fc3fffff (prio 1, RW): alias
->
-pci_bridge_pref_mem
->
-@pci_bridge_pci 00000000fc200000-00000000fc3fffff
->
->
-00000000fc400000-00000000fc5fffff (prio 1, RW): alias
->
-pci_bridge_pref_mem
->
-@pci_bridge_pci 00000000fc400000-00000000fc5fffff
->
->
-00000000fc600000-00000000fc7fffff (prio 1, RW): alias
->
-pci_bridge_pref_mem
->
-@pci_bridge_pci 00000000fc600000-00000000fc7fffff
->
->
-00000000fc800000-00000000fc9fffff (prio 1, RW): alias
->
-pci_bridge_pref_mem
->
-@pci_bridge_pci 00000000fc800000-00000000fc9fffff <- correct Adress Spce
->
->
-00000000fca00000-00000000fcbfffff (prio 1, RW): alias
->
-pci_bridge_pref_mem
->
-@pci_bridge_pci 00000000fca00000-00000000fcbfffff
->
->
-00000000fcc00000-00000000fcdfffff (prio 1, RW): alias
->
-pci_bridge_pref_mem
->
-@pci_bridge_pci 00000000fcc00000-00000000fcdfffff
->
->
-00000000fce00000-00000000fcffffff (prio 1, RW): alias
->
-pci_bridge_pref_mem
->
-@pci_bridge_pci 00000000fce00000-00000000fcffffff
->
->
->
->
-After pci_bridge_update_mappingsï¼
->
->
-00000000fda00000-00000000fdbfffff (prio 1, RW): alias pci_bridge_mem
->
-@pci_bridge_pci 00000000fda00000-00000000fdbfffff
->
->
-00000000fdc00000-00000000fddfffff (prio 1, RW): alias pci_bridge_mem
->
-@pci_bridge_pci 00000000fdc00000-00000000fddfffff
->
->
-00000000fde00000-00000000fdffffff (prio 1, RW): alias pci_bridge_mem
->
-@pci_bridge_pci 00000000fde00000-00000000fdffffff
->
->
-00000000fe000000-00000000fe1fffff (prio 1, RW): alias pci_bridge_mem
->
-@pci_bridge_pci 00000000fe000000-00000000fe1fffff
->
->
-00000000fe200000-00000000fe3fffff (prio 1, RW): alias pci_bridge_mem
->
-@pci_bridge_pci 00000000fe200000-00000000fe3fffff
->
->
-00000000fe400000-00000000fe5fffff (prio 1, RW): alias pci_bridge_mem
->
-@pci_bridge_pci 00000000fe400000-00000000fe5fffff
->
->
-00000000fe600000-00000000fe7fffff (prio 1, RW): alias pci_bridge_mem
->
-@pci_bridge_pci 00000000fe600000-00000000fe7fffff
->
->
-00000000fe800000-00000000fe9fffff (prio 1, RW): alias pci_bridge_mem
->
-@pci_bridge_pci 00000000fe800000-00000000fe9fffff
->
->
-fffffffffc800000-fffffffffc800000 (prio 1, RW): alias
->
-pci_bridge_pref_mem
->
-@pci_bridge_pci fffffffffc800000-fffffffffc800000   <- Exceptional Adress
->
-Space
-This one is empty though right?
-
->
->
->
-We have figured out why this address becomes this value,  according to pci
->
-spec,  pci driver can get BAR address size by writing 0xffffffff to
->
->
-the pci register firstly, and then read back the value from this register.
-OK however as you show below the BAR being sized is the BAR
-if a bridge. Are you then adding a bridge device by hotplug?
-
-
-
->
-We didn't handle this value  specially while process pci write in qemu, the
->
-function call stack is:
->
->
-Pci_bridge_dev_write_config
->
->
--> pci_bridge_write_config
->
->
--> pci_default_write_config (we update the config[address] value here to
->
-fffffffffc800000, which should be 0xfc800000 )
->
->
--> pci_bridge_update_mappings
->
->
-->pci_bridge_region_del(br, br->windows);
->
->
--> pci_bridge_region_init
->
->
-->
->
-pci_bridge_init_alias (here pci_bridge_get_base, we use the wrong value
->
-fffffffffc800000)
->
->
-->
->
-memory_region_transaction_commit
->
->
->
->
-So, as we can see, we use the wrong base address in qemu to update the memory
->
-regions, though, we update the base address to
->
->
-The correct value after pci driver in VM write the original value back, the
->
-virtio NIC in bus 4 may still sends net packets concurrently with
->
->
-The wrong memory region address.
->
->
->
->
-We have tried to skip the memory region update action in qemu while detect pci
->
-write with 0xffffffff value, and it does work, but
->
->
-This seems to be not gently.
-For sure. But I'm still puzzled as to why does Linux try to
-size the BAR of the bridge while a device behind it is
-used.
-
-Can you pls post your QEMU command line?
-
-
-
->
->
->
-diff --git a/hw/pci/pci_bridge.c b/hw/pci/pci_bridge.c
->
->
-index b2e50c3..84b405d 100644
->
->
---- a/hw/pci/pci_bridge.c
->
->
-+++ b/hw/pci/pci_bridge.c
->
->
-@@ -256,7 +256,8 @@ void pci_bridge_write_config(PCIDevice *d,
->
->
-pci_default_write_config(d, address, val, len);
->
->
--    if (ranges_overlap(address, len, PCI_COMMAND, 2) ||
->
->
-+    if ( (val != 0xffffffff) &&
->
->
-+        (ranges_overlap(address, len, PCI_COMMAND, 2) ||
->
->
-/* io base/limit */
->
->
-ranges_overlap(address, len, PCI_IO_BASE, 2) ||
->
->
-@@ -266,7 +267,7 @@ void pci_bridge_write_config(PCIDevice *d,
->
->
-ranges_overlap(address, len, PCI_MEMORY_BASE, 20) ||
->
->
-/* vga enable */
->
->
--        ranges_overlap(address, len, PCI_BRIDGE_CONTROL, 2)) {
->
->
-+        ranges_overlap(address, len, PCI_BRIDGE_CONTROL, 2))) {
->
->
-pci_bridge_update_mappings(s);
->
->
-}
->
->
->
->
-Thinks,
->
->
-Xu
->
-
-On Sat, Dec 08, 2018 at 11:58:59AM +0000, xuyandong wrote:
->
-> Hi all,
->
->
->
->
->
->
->
-> In our test, we configured VM with several pci-bridges and a
->
-> virtio-net nic been attached with bus 4,
->
->
->
-> After VM is startup, We ping this nic from host to judge if it is
->
-> working normally. Then, we hot add pci devices to this VM with bus 0.
->
->
->
-> We  found the virtio-net NIC in bus 4 is not working (can not connect)
->
-> occasionally, as it kick virtio backend failure with error below:
->
->
->
->     Unassigned mem write 00000000fc803004 = 0x1
->
->
->
->
->
->
->
-> memory-region: pci_bridge_pci
->
->
->
->   0000000000000000-ffffffffffffffff (prio 0, RW): pci_bridge_pci
->
->
->
->     00000000fc800000-00000000fc803fff (prio 1, RW): virtio-pci
->
->
->
->       00000000fc800000-00000000fc800fff (prio 0, RW):
->
-> virtio-pci-common
->
->
->
->       00000000fc801000-00000000fc801fff (prio 0, RW): virtio-pci-isr
->
->
->
->       00000000fc802000-00000000fc802fff (prio 0, RW):
->
-> virtio-pci-device
->
->
->
->       00000000fc803000-00000000fc803fff (prio 0, RW):
->
-> virtio-pci-notify  <- io mem unassigned
->
->
->
->       â¦
->
->
->
->
->
->
->
-> We caught an exceptional address changing while this problem happened,
->
-> show as
->
-> follow:
->
->
->
-> Before pci_bridge_update_mappingsï¼
->
->
->
->       00000000fc000000-00000000fc1fffff (prio 1, RW): alias
->
-> pci_bridge_pref_mem @pci_bridge_pci 00000000fc000000-00000000fc1fffff
->
->
->
->       00000000fc200000-00000000fc3fffff (prio 1, RW): alias
->
-> pci_bridge_pref_mem @pci_bridge_pci 00000000fc200000-00000000fc3fffff
->
->
->
->       00000000fc400000-00000000fc5fffff (prio 1, RW): alias
->
-> pci_bridge_pref_mem @pci_bridge_pci 00000000fc400000-00000000fc5fffff
->
->
->
->       00000000fc600000-00000000fc7fffff (prio 1, RW): alias
->
-> pci_bridge_pref_mem @pci_bridge_pci 00000000fc600000-00000000fc7fffff
->
->
->
->       00000000fc800000-00000000fc9fffff (prio 1, RW): alias
->
-> pci_bridge_pref_mem @pci_bridge_pci 00000000fc800000-00000000fc9fffff
->
-> <- correct Adress Spce
->
->
->
->       00000000fca00000-00000000fcbfffff (prio 1, RW): alias
->
-> pci_bridge_pref_mem @pci_bridge_pci 00000000fca00000-00000000fcbfffff
->
->
->
->       00000000fcc00000-00000000fcdfffff (prio 1, RW): alias
->
-> pci_bridge_pref_mem @pci_bridge_pci 00000000fcc00000-00000000fcdfffff
->
->
->
->       00000000fce00000-00000000fcffffff (prio 1, RW): alias
->
-> pci_bridge_pref_mem @pci_bridge_pci 00000000fce00000-00000000fcffffff
->
->
->
->
->
->
->
-> After pci_bridge_update_mappingsï¼
->
->
->
->       00000000fda00000-00000000fdbfffff (prio 1, RW): alias
->
-> pci_bridge_mem @pci_bridge_pci 00000000fda00000-00000000fdbfffff
->
->
->
->       00000000fdc00000-00000000fddfffff (prio 1, RW): alias
->
-> pci_bridge_mem @pci_bridge_pci 00000000fdc00000-00000000fddfffff
->
->
->
->       00000000fde00000-00000000fdffffff (prio 1, RW): alias
->
-> pci_bridge_mem @pci_bridge_pci 00000000fde00000-00000000fdffffff
->
->
->
->       00000000fe000000-00000000fe1fffff (prio 1, RW): alias
->
-> pci_bridge_mem @pci_bridge_pci 00000000fe000000-00000000fe1fffff
->
->
->
->       00000000fe200000-00000000fe3fffff (prio 1, RW): alias
->
-> pci_bridge_mem @pci_bridge_pci 00000000fe200000-00000000fe3fffff
->
->
->
->       00000000fe400000-00000000fe5fffff (prio 1, RW): alias
->
-> pci_bridge_mem @pci_bridge_pci 00000000fe400000-00000000fe5fffff
->
->
->
->       00000000fe600000-00000000fe7fffff (prio 1, RW): alias
->
-> pci_bridge_mem @pci_bridge_pci 00000000fe600000-00000000fe7fffff
->
->
->
->       00000000fe800000-00000000fe9fffff (prio 1, RW): alias
->
-> pci_bridge_mem @pci_bridge_pci 00000000fe800000-00000000fe9fffff
->
->
->
->       fffffffffc800000-fffffffffc800000 (prio 1, RW): alias
->
-> pci_bridge_pref_mem
->
-> @pci_bridge_pci fffffffffc800000-fffffffffc800000   <- Exceptional Adress
->
-Space
->
->
-This one is empty though right?
->
->
->
->
->
->
-> We have figured out why this address becomes this value,  according to
->
-> pci spec,  pci driver can get BAR address size by writing 0xffffffff
->
-> to
->
->
->
-> the pci register firstly, and then read back the value from this register.
->
->
->
-OK however as you show below the BAR being sized is the BAR if a bridge. Are
->
-you then adding a bridge device by hotplug?
-No, I just simply hot plugged a VFIO device to Bus 0, another interesting 
-phenomenon is
-If I hot plug the device to other bus, this doesn't happened.
- 
->
->
->
-> We didn't handle this value  specially while process pci write in
->
-> qemu, the function call stack is:
->
->
->
-> Pci_bridge_dev_write_config
->
->
->
-> -> pci_bridge_write_config
->
->
->
-> -> pci_default_write_config (we update the config[address] value here
->
-> -> to
->
-> fffffffffc800000, which should be 0xfc800000 )
->
->
->
-> -> pci_bridge_update_mappings
->
->
->
->                 ->pci_bridge_region_del(br, br->windows);
->
->
->
-> -> pci_bridge_region_init
->
->
->
->                                                                 ->
->
-> pci_bridge_init_alias (here pci_bridge_get_base, we use the wrong
->
-> value
->
-> fffffffffc800000)
->
->
->
->                                                 ->
->
-> memory_region_transaction_commit
->
->
->
->
->
->
->
-> So, as we can see, we use the wrong base address in qemu to update the
->
-> memory regions, though, we update the base address to
->
->
->
-> The correct value after pci driver in VM write the original value
->
-> back, the virtio NIC in bus 4 may still sends net packets concurrently
->
-> with
->
->
->
-> The wrong memory region address.
->
->
->
->
->
->
->
-> We have tried to skip the memory region update action in qemu while
->
-> detect pci write with 0xffffffff value, and it does work, but
->
->
->
-> This seems to be not gently.
->
->
-For sure. But I'm still puzzled as to why does Linux try to size the BAR of
->
-the
->
-bridge while a device behind it is used.
->
->
-Can you pls post your QEMU command line?
-My QEMU command line:
-/root/xyd/qemu-system-x86_64 -name guest=Linux,debug-threads=on -S -object 
-secret,id=masterKey0,format=raw,file=/var/run/libvirt/qemu/domain-194-Linux/master-key.aes
- -machine pc-i440fx-2.8,accel=kvm,usb=off,dump-guest-core=off -cpu 
-host,+kvm_pv_eoi -bios /usr/share/OVMF/OVMF.fd -m 
-size=4194304k,slots=256,maxmem=33554432k -realtime mlock=off -smp 
-20,sockets=20,cores=1,threads=1 -numa node,nodeid=0,cpus=0-4,mem=1024 -numa 
-node,nodeid=1,cpus=5-9,mem=1024 -numa node,nodeid=2,cpus=10-14,mem=1024 -numa 
-node,nodeid=3,cpus=15-19,mem=1024 -uuid 34a588c7-b0f2-4952-b39c-47fae3411439 
--no-user-config -nodefaults -chardev 
-socket,id=charmonitor,path=/var/run/libvirt/qemu/domain-194-Linux/monitor.sock,server,nowait
- -mon chardev=charmonitor,id=monitor,mode=control -rtc base=utc -no-hpet 
--global kvm-pit.lost_tick_policy=delay -no-shutdown -boot strict=on -device 
-pci-bridge,chassis_nr=1,id=pci.1,bus=pci.0,addr=0x8 -device 
-pci-bridge,chassis_nr=2,id=pci.2,bus=pci.0,addr=0x9 -device 
-pci-bridge,chassis_nr=3,id=pci.3,bus=pci.0,addr=0xa -device 
-pci-bridge,chassis_nr=4,id=pci.4,bus=pci.0,addr=0xb -device 
-pci-bridge,chassis_nr=5,id=pci.5,bus=pci.0,addr=0xc -device 
-piix3-usb-uhci,id=usb,bus=pci.0,addr=0x1.0x2 -device 
-usb-ehci,id=usb1,bus=pci.0,addr=0x10 -device 
-nec-usb-xhci,id=usb2,bus=pci.0,addr=0x11 -device 
-virtio-scsi-pci,id=scsi0,bus=pci.0,addr=0x3 -device 
-virtio-scsi-pci,id=scsi1,bus=pci.0,addr=0x4 -device 
-virtio-scsi-pci,id=scsi2,bus=pci.0,addr=0x5 -device 
-virtio-scsi-pci,id=scsi3,bus=pci.0,addr=0x6 -device 
-virtio-serial-pci,id=virtio-serial0,bus=pci.0,addr=0x7 -drive 
-file=/mnt/sdb/xml/centos_74_x64_uefi.raw,format=raw,if=none,id=drive-virtio-disk0,cache=none
- -device 
-virtio-blk-pci,scsi=off,bus=pci.0,addr=0x2,drive=drive-virtio-disk0,id=virtio-disk0,bootindex=1
- -drive if=none,id=drive-ide0-1-1,readonly=on,cache=none -device 
-ide-cd,bus=ide.1,unit=1,drive=drive-ide0-1-1,id=ide0-1-1 -netdev 
-tap,fd=35,id=hostnet0 -device 
-virtio-net-pci,netdev=hostnet0,id=net0,mac=52:54:00:89:5d:8b,bus=pci.4,addr=0x1 
--chardev pty,id=charserial0 -device isa-serial,chardev=charserial0,id=serial0 
--device usb-tablet,id=input0,bus=usb.0,port=1 -vnc 0.0.0.0:0 -device 
-cirrus-vga,id=video0,vgamem_mb=8,bus=pci.0,addr=0x12 -device 
-virtio-balloon-pci,id=balloon0,bus=pci.0,addr=0xd -msg timestamp=on
-
-I am also very curious about this issue, in the linux kernel code, maybe double 
-check in function pci_bridge_check_ranges triggered this problem.
-
-
->
->
->
->
->
->
->
->
-> diff --git a/hw/pci/pci_bridge.c b/hw/pci/pci_bridge.c
->
->
->
-> index b2e50c3..84b405d 100644
->
->
->
-> --- a/hw/pci/pci_bridge.c
->
->
->
-> +++ b/hw/pci/pci_bridge.c
->
->
->
-> @@ -256,7 +256,8 @@ void pci_bridge_write_config(PCIDevice *d,
->
->
->
->      pci_default_write_config(d, address, val, len);
->
->
->
-> -    if (ranges_overlap(address, len, PCI_COMMAND, 2) ||
->
->
->
-> +    if ( (val != 0xffffffff) &&
->
->
->
-> +        (ranges_overlap(address, len, PCI_COMMAND, 2) ||
->
->
->
->          /* io base/limit */
->
->
->
->          ranges_overlap(address, len, PCI_IO_BASE, 2) ||
->
->
->
-> @@ -266,7 +267,7 @@ void pci_bridge_write_config(PCIDevice *d,
->
->
->
->          ranges_overlap(address, len, PCI_MEMORY_BASE, 20) ||
->
->
->
->          /* vga enable */
->
->
->
-> -        ranges_overlap(address, len, PCI_BRIDGE_CONTROL, 2)) {
->
->
->
-> +        ranges_overlap(address, len, PCI_BRIDGE_CONTROL, 2))) {
->
->
->
->          pci_bridge_update_mappings(s);
->
->
->
->      }
->
->
->
->
->
->
->
-> Thinks,
->
->
->
-> Xu
->
->
-
-On Mon, Dec 10, 2018 at 03:12:53AM +0000, xuyandong wrote:
->
-On Sat, Dec 08, 2018 at 11:58:59AM +0000, xuyandong wrote:
->
-> > Hi all,
->
-> >
->
-> >
->
-> >
->
-> > In our test, we configured VM with several pci-bridges and a
->
-> > virtio-net nic been attached with bus 4,
->
-> >
->
-> > After VM is startup, We ping this nic from host to judge if it is
->
-> > working normally. Then, we hot add pci devices to this VM with bus 0.
->
-> >
->
-> > We  found the virtio-net NIC in bus 4 is not working (can not connect)
->
-> > occasionally, as it kick virtio backend failure with error below:
->
-> >
->
-> >     Unassigned mem write 00000000fc803004 = 0x1
->
-> >
->
-> >
->
-> >
->
-> > memory-region: pci_bridge_pci
->
-> >
->
-> >   0000000000000000-ffffffffffffffff (prio 0, RW): pci_bridge_pci
->
-> >
->
-> >     00000000fc800000-00000000fc803fff (prio 1, RW): virtio-pci
->
-> >
->
-> >       00000000fc800000-00000000fc800fff (prio 0, RW):
->
-> > virtio-pci-common
->
-> >
->
-> >       00000000fc801000-00000000fc801fff (prio 0, RW): virtio-pci-isr
->
-> >
->
-> >       00000000fc802000-00000000fc802fff (prio 0, RW):
->
-> > virtio-pci-device
->
-> >
->
-> >       00000000fc803000-00000000fc803fff (prio 0, RW):
->
-> > virtio-pci-notify  <- io mem unassigned
->
-> >
->
-> >       â¦
->
-> >
->
-> >
->
-> >
->
-> > We caught an exceptional address changing while this problem happened,
->
-> > show as
->
-> > follow:
->
-> >
->
-> > Before pci_bridge_update_mappingsï¼
->
-> >
->
-> >       00000000fc000000-00000000fc1fffff (prio 1, RW): alias
->
-> > pci_bridge_pref_mem @pci_bridge_pci 00000000fc000000-00000000fc1fffff
->
-> >
->
-> >       00000000fc200000-00000000fc3fffff (prio 1, RW): alias
->
-> > pci_bridge_pref_mem @pci_bridge_pci 00000000fc200000-00000000fc3fffff
->
-> >
->
-> >       00000000fc400000-00000000fc5fffff (prio 1, RW): alias
->
-> > pci_bridge_pref_mem @pci_bridge_pci 00000000fc400000-00000000fc5fffff
->
-> >
->
-> >       00000000fc600000-00000000fc7fffff (prio 1, RW): alias
->
-> > pci_bridge_pref_mem @pci_bridge_pci 00000000fc600000-00000000fc7fffff
->
-> >
->
-> >       00000000fc800000-00000000fc9fffff (prio 1, RW): alias
->
-> > pci_bridge_pref_mem @pci_bridge_pci 00000000fc800000-00000000fc9fffff
->
-> > <- correct Adress Spce
->
-> >
->
-> >       00000000fca00000-00000000fcbfffff (prio 1, RW): alias
->
-> > pci_bridge_pref_mem @pci_bridge_pci 00000000fca00000-00000000fcbfffff
->
-> >
->
-> >       00000000fcc00000-00000000fcdfffff (prio 1, RW): alias
->
-> > pci_bridge_pref_mem @pci_bridge_pci 00000000fcc00000-00000000fcdfffff
->
-> >
->
-> >       00000000fce00000-00000000fcffffff (prio 1, RW): alias
->
-> > pci_bridge_pref_mem @pci_bridge_pci 00000000fce00000-00000000fcffffff
->
-> >
->
-> >
->
-> >
->
-> > After pci_bridge_update_mappingsï¼
->
-> >
->
-> >       00000000fda00000-00000000fdbfffff (prio 1, RW): alias
->
-> > pci_bridge_mem @pci_bridge_pci 00000000fda00000-00000000fdbfffff
->
-> >
->
-> >       00000000fdc00000-00000000fddfffff (prio 1, RW): alias
->
-> > pci_bridge_mem @pci_bridge_pci 00000000fdc00000-00000000fddfffff
->
-> >
->
-> >       00000000fde00000-00000000fdffffff (prio 1, RW): alias
->
-> > pci_bridge_mem @pci_bridge_pci 00000000fde00000-00000000fdffffff
->
-> >
->
-> >       00000000fe000000-00000000fe1fffff (prio 1, RW): alias
->
-> > pci_bridge_mem @pci_bridge_pci 00000000fe000000-00000000fe1fffff
->
-> >
->
-> >       00000000fe200000-00000000fe3fffff (prio 1, RW): alias
->
-> > pci_bridge_mem @pci_bridge_pci 00000000fe200000-00000000fe3fffff
->
-> >
->
-> >       00000000fe400000-00000000fe5fffff (prio 1, RW): alias
->
-> > pci_bridge_mem @pci_bridge_pci 00000000fe400000-00000000fe5fffff
->
-> >
->
-> >       00000000fe600000-00000000fe7fffff (prio 1, RW): alias
->
-> > pci_bridge_mem @pci_bridge_pci 00000000fe600000-00000000fe7fffff
->
-> >
->
-> >       00000000fe800000-00000000fe9fffff (prio 1, RW): alias
->
-> > pci_bridge_mem @pci_bridge_pci 00000000fe800000-00000000fe9fffff
->
-> >
->
-> >       fffffffffc800000-fffffffffc800000 (prio 1, RW): alias
->
-> > pci_bridge_pref_mem
->
-> > @pci_bridge_pci fffffffffc800000-fffffffffc800000   <- Exceptional Adress
->
-> Space
->
->
->
-> This one is empty though right?
->
->
->
-> >
->
-> >
->
-> > We have figured out why this address becomes this value,  according to
->
-> > pci spec,  pci driver can get BAR address size by writing 0xffffffff
->
-> > to
->
-> >
->
-> > the pci register firstly, and then read back the value from this register.
->
->
->
->
->
-> OK however as you show below the BAR being sized is the BAR if a bridge. Are
->
-> you then adding a bridge device by hotplug?
->
->
-No, I just simply hot plugged a VFIO device to Bus 0, another interesting
->
-phenomenon is
->
-If I hot plug the device to other bus, this doesn't happened.
->
->
->
->
->
->
-> > We didn't handle this value  specially while process pci write in
->
-> > qemu, the function call stack is:
->
-> >
->
-> > Pci_bridge_dev_write_config
->
-> >
->
-> > -> pci_bridge_write_config
->
-> >
->
-> > -> pci_default_write_config (we update the config[address] value here
->
-> > -> to
->
-> > fffffffffc800000, which should be 0xfc800000 )
->
-> >
->
-> > -> pci_bridge_update_mappings
->
-> >
->
-> >                 ->pci_bridge_region_del(br, br->windows);
->
-> >
->
-> > -> pci_bridge_region_init
->
-> >
->
-> >                                                                 ->
->
-> > pci_bridge_init_alias (here pci_bridge_get_base, we use the wrong
->
-> > value
->
-> > fffffffffc800000)
->
-> >
->
-> >                                                 ->
->
-> > memory_region_transaction_commit
->
-> >
->
-> >
->
-> >
->
-> > So, as we can see, we use the wrong base address in qemu to update the
->
-> > memory regions, though, we update the base address to
->
-> >
->
-> > The correct value after pci driver in VM write the original value
->
-> > back, the virtio NIC in bus 4 may still sends net packets concurrently
->
-> > with
->
-> >
->
-> > The wrong memory region address.
->
-> >
->
-> >
->
-> >
->
-> > We have tried to skip the memory region update action in qemu while
->
-> > detect pci write with 0xffffffff value, and it does work, but
->
-> >
->
-> > This seems to be not gently.
->
->
->
-> For sure. But I'm still puzzled as to why does Linux try to size the BAR of
->
-> the
->
-> bridge while a device behind it is used.
->
->
->
-> Can you pls post your QEMU command line?
->
->
-My QEMU command line:
->
-/root/xyd/qemu-system-x86_64 -name guest=Linux,debug-threads=on -S -object
->
-secret,id=masterKey0,format=raw,file=/var/run/libvirt/qemu/domain-194-Linux/master-key.aes
->
--machine pc-i440fx-2.8,accel=kvm,usb=off,dump-guest-core=off -cpu
->
-host,+kvm_pv_eoi -bios /usr/share/OVMF/OVMF.fd -m
->
-size=4194304k,slots=256,maxmem=33554432k -realtime mlock=off -smp
->
-20,sockets=20,cores=1,threads=1 -numa node,nodeid=0,cpus=0-4,mem=1024 -numa
->
-node,nodeid=1,cpus=5-9,mem=1024 -numa node,nodeid=2,cpus=10-14,mem=1024 -numa
->
-node,nodeid=3,cpus=15-19,mem=1024 -uuid 34a588c7-b0f2-4952-b39c-47fae3411439
->
--no-user-config -nodefaults -chardev
->
-socket,id=charmonitor,path=/var/run/libvirt/qemu/domain-194-Linux/monitor.sock,server,nowait
->
--mon chardev=charmonitor,id=monitor,mode=control -rtc base=utc -no-hpet
->
--global kvm-pit.lost_tick_policy=delay -no-shutdown -boot strict=on -device
->
-pci-bridge,chassis_nr=1,id=pci.1,bus=pci.0,addr=0x8 -device
->
-pci-bridge,chassis_nr=2,id=pci.2,bus=pci.0,addr=0x9 -device
->
-pci-bridge,chassis_nr=3,id=pci.3,bus=pci.0,addr=0xa -device
->
-pci-bridge,chassis_nr=4,id=pci.4,bus=pci.0,addr=0xb -device
->
-pci-bridge,chassis_nr=5,id=pci.5,bus=pci.0,addr=0xc -device
->
-piix3-usb-uhci,id=usb,bus=pci.0,addr=0x1.0x2 -device
->
-usb-ehci,id=usb1,bus=pci.0,addr=0x10 -device
->
-nec-usb-xhci,id=usb2,bus=pci.0,addr=0x11 -device
->
-virtio-scsi-pci,id=scsi0,bus=pci.0,addr=0x3 -device
->
-virtio-scsi-pci,id=scsi1,bus=pci.0,addr=0x4 -device
->
-virtio-scsi-pci,id=scsi2,bus=pci.0,addr=0x5 -device
->
-virtio-scsi-pci,id=scsi3,bus=pci.0,addr=0x6 -device
->
-virtio-serial-pci,id=virtio-serial0,bus=pci.0,addr=0x7 -drive
->
-file=/mnt/sdb/xml/centos_74_x64_uefi.raw,format=raw,if=none,id=drive-virtio-disk0,cache=none
->
--device
->
-virtio-blk-pci,scsi=off,bus=pci.0,addr=0x2,drive=drive-virtio-disk0,id=virtio-disk0,bootindex=1
->
--drive if=none,id=drive-ide0-1-1,readonly=on,cache=none -device
->
-ide-cd,bus=ide.1,unit=1,drive=drive-ide0-1-1,id=ide0-1-1 -netdev
->
-tap,fd=35,id=hostnet0 -device
->
-virtio-net-pci,netdev=hostnet0,id=net0,mac=52:54:00:89:5d:8b,bus=pci.4,addr=0x1
->
--chardev pty,id=charserial0 -device
->
-isa-serial,chardev=charserial0,id=serial0 -device
->
-usb-tablet,id=input0,bus=usb.0,port=1 -vnc 0.0.0.0:0 -device
->
-cirrus-vga,id=video0,vgamem_mb=8,bus=pci.0,addr=0x12 -device
->
-virtio-balloon-pci,id=balloon0,bus=pci.0,addr=0xd -msg timestamp=on
->
->
-I am also very curious about this issue, in the linux kernel code, maybe
->
-double check in function pci_bridge_check_ranges triggered this problem.
-If you can get the stacktrace in Linux when it tries to write this
-fffff value, that would be quite helpful.
-
-
->
->
->
->
->
->
->
->
-> >
->
-> >
->
-> > diff --git a/hw/pci/pci_bridge.c b/hw/pci/pci_bridge.c
->
-> >
->
-> > index b2e50c3..84b405d 100644
->
-> >
->
-> > --- a/hw/pci/pci_bridge.c
->
-> >
->
-> > +++ b/hw/pci/pci_bridge.c
->
-> >
->
-> > @@ -256,7 +256,8 @@ void pci_bridge_write_config(PCIDevice *d,
->
-> >
->
-> >      pci_default_write_config(d, address, val, len);
->
-> >
->
-> > -    if (ranges_overlap(address, len, PCI_COMMAND, 2) ||
->
-> >
->
-> > +    if ( (val != 0xffffffff) &&
->
-> >
->
-> > +        (ranges_overlap(address, len, PCI_COMMAND, 2) ||
->
-> >
->
-> >          /* io base/limit */
->
-> >
->
-> >          ranges_overlap(address, len, PCI_IO_BASE, 2) ||
->
-> >
->
-> > @@ -266,7 +267,7 @@ void pci_bridge_write_config(PCIDevice *d,
->
-> >
->
-> >          ranges_overlap(address, len, PCI_MEMORY_BASE, 20) ||
->
-> >
->
-> >          /* vga enable */
->
-> >
->
-> > -        ranges_overlap(address, len, PCI_BRIDGE_CONTROL, 2)) {
->
-> >
->
-> > +        ranges_overlap(address, len, PCI_BRIDGE_CONTROL, 2))) {
->
-> >
->
-> >          pci_bridge_update_mappings(s);
->
-> >
->
-> >      }
->
-> >
->
-> >
->
-> >
->
-> > Thinks,
->
-> >
->
-> > Xu
->
-> >
-
-On Sat, Dec 08, 2018 at 11:58:59AM +0000, xuyandong wrote:
->
-> > > Hi all,
->
-> > >
->
-> > >
->
-> > >
->
-> > > In our test, we configured VM with several pci-bridges and a
->
-> > > virtio-net nic been attached with bus 4,
->
-> > >
->
-> > > After VM is startup, We ping this nic from host to judge if it is
->
-> > > working normally. Then, we hot add pci devices to this VM with bus 0.
->
-> > >
->
-> > > We  found the virtio-net NIC in bus 4 is not working (can not
->
-> > > connect) occasionally, as it kick virtio backend failure with error
->
-> > > below:
->
-> > >
->
-> > >     Unassigned mem write 00000000fc803004 = 0x1
->
-> > >
->
-> > >
->
-> > >
->
-> > > memory-region: pci_bridge_pci
->
-> > >
->
-> > >   0000000000000000-ffffffffffffffff (prio 0, RW): pci_bridge_pci
->
-> > >
->
-> > >     00000000fc800000-00000000fc803fff (prio 1, RW): virtio-pci
->
-> > >
->
-> > >       00000000fc800000-00000000fc800fff (prio 0, RW):
->
-> > > virtio-pci-common
->
-> > >
->
-> > >       00000000fc801000-00000000fc801fff (prio 0, RW):
->
-> > > virtio-pci-isr
->
-> > >
->
-> > >       00000000fc802000-00000000fc802fff (prio 0, RW):
->
-> > > virtio-pci-device
->
-> > >
->
-> > >       00000000fc803000-00000000fc803fff (prio 0, RW):
->
-> > > virtio-pci-notify  <- io mem unassigned
->
-> > >
->
-> > >       â¦
->
-> > >
->
-> > >
->
-> > >
->
-> > > We caught an exceptional address changing while this problem
->
-> > > happened, show as
->
-> > > follow:
->
-> > >
->
-> > > Before pci_bridge_update_mappingsï¼
->
-> > >
->
-> > >       00000000fc000000-00000000fc1fffff (prio 1, RW): alias
->
-> > > pci_bridge_pref_mem @pci_bridge_pci
->
-> > > 00000000fc000000-00000000fc1fffff
->
-> > >
->
-> > >       00000000fc200000-00000000fc3fffff (prio 1, RW): alias
->
-> > > pci_bridge_pref_mem @pci_bridge_pci
->
-> > > 00000000fc200000-00000000fc3fffff
->
-> > >
->
-> > >       00000000fc400000-00000000fc5fffff (prio 1, RW): alias
->
-> > > pci_bridge_pref_mem @pci_bridge_pci
->
-> > > 00000000fc400000-00000000fc5fffff
->
-> > >
->
-> > >       00000000fc600000-00000000fc7fffff (prio 1, RW): alias
->
-> > > pci_bridge_pref_mem @pci_bridge_pci
->
-> > > 00000000fc600000-00000000fc7fffff
->
-> > >
->
-> > >       00000000fc800000-00000000fc9fffff (prio 1, RW): alias
->
-> > > pci_bridge_pref_mem @pci_bridge_pci
->
-> > > 00000000fc800000-00000000fc9fffff
->
-> > > <- correct Adress Spce
->
-> > >
->
-> > >       00000000fca00000-00000000fcbfffff (prio 1, RW): alias
->
-> > > pci_bridge_pref_mem @pci_bridge_pci
->
-> > > 00000000fca00000-00000000fcbfffff
->
-> > >
->
-> > >       00000000fcc00000-00000000fcdfffff (prio 1, RW): alias
->
-> > > pci_bridge_pref_mem @pci_bridge_pci
->
-> > > 00000000fcc00000-00000000fcdfffff
->
-> > >
->
-> > >       00000000fce00000-00000000fcffffff (prio 1, RW): alias
->
-> > > pci_bridge_pref_mem @pci_bridge_pci
->
-> > > 00000000fce00000-00000000fcffffff
->
-> > >
->
-> > >
->
-> > >
->
-> > > After pci_bridge_update_mappingsï¼
->
-> > >
->
-> > >       00000000fda00000-00000000fdbfffff (prio 1, RW): alias
->
-> > > pci_bridge_mem @pci_bridge_pci 00000000fda00000-00000000fdbfffff
->
-> > >
->
-> > >       00000000fdc00000-00000000fddfffff (prio 1, RW): alias
->
-> > > pci_bridge_mem @pci_bridge_pci 00000000fdc00000-00000000fddfffff
->
-> > >
->
-> > >       00000000fde00000-00000000fdffffff (prio 1, RW): alias
->
-> > > pci_bridge_mem @pci_bridge_pci 00000000fde00000-00000000fdffffff
->
-> > >
->
-> > >       00000000fe000000-00000000fe1fffff (prio 1, RW): alias
->
-> > > pci_bridge_mem @pci_bridge_pci 00000000fe000000-00000000fe1fffff
->
-> > >
->
-> > >       00000000fe200000-00000000fe3fffff (prio 1, RW): alias
->
-> > > pci_bridge_mem @pci_bridge_pci 00000000fe200000-00000000fe3fffff
->
-> > >
->
-> > >       00000000fe400000-00000000fe5fffff (prio 1, RW): alias
->
-> > > pci_bridge_mem @pci_bridge_pci 00000000fe400000-00000000fe5fffff
->
-> > >
->
-> > >       00000000fe600000-00000000fe7fffff (prio 1, RW): alias
->
-> > > pci_bridge_mem @pci_bridge_pci 00000000fe600000-00000000fe7fffff
->
-> > >
->
-> > >       00000000fe800000-00000000fe9fffff (prio 1, RW): alias
->
-> > > pci_bridge_mem @pci_bridge_pci 00000000fe800000-00000000fe9fffff
->
-> > >
->
-> > >       fffffffffc800000-fffffffffc800000 (prio 1, RW): alias
->
-pci_bridge_pref_mem
->
-> > > @pci_bridge_pci fffffffffc800000-fffffffffc800000   <- Exceptional
->
-> > > Adress
->
-> > Space
->
-> >
->
-> > This one is empty though right?
->
-> >
->
-> > >
->
-> > >
->
-> > > We have figured out why this address becomes this value,
->
-> > > according to pci spec,  pci driver can get BAR address size by
->
-> > > writing 0xffffffff to
->
-> > >
->
-> > > the pci register firstly, and then read back the value from this
->
-> > > register.
->
-> >
->
-> >
->
-> > OK however as you show below the BAR being sized is the BAR if a
->
-> > bridge. Are you then adding a bridge device by hotplug?
->
->
->
-> No, I just simply hot plugged a VFIO device to Bus 0, another
->
-> interesting phenomenon is If I hot plug the device to other bus, this
->
-> doesn't
->
-happened.
->
->
->
-> >
->
-> >
->
-> > > We didn't handle this value  specially while process pci write in
->
-> > > qemu, the function call stack is:
->
-> > >
->
-> > > Pci_bridge_dev_write_config
->
-> > >
->
-> > > -> pci_bridge_write_config
->
-> > >
->
-> > > -> pci_default_write_config (we update the config[address] value
->
-> > > -> here to
->
-> > > fffffffffc800000, which should be 0xfc800000 )
->
-> > >
->
-> > > -> pci_bridge_update_mappings
->
-> > >
->
-> > >                 ->pci_bridge_region_del(br, br->windows);
->
-> > >
->
-> > > -> pci_bridge_region_init
->
-> > >
->
-> > >                                                                 ->
->
-> > > pci_bridge_init_alias (here pci_bridge_get_base, we use the wrong
->
-> > > value
->
-> > > fffffffffc800000)
->
-> > >
->
-> > >                                                 ->
->
-> > > memory_region_transaction_commit
->
-> > >
->
-> > >
->
-> > >
->
-> > > So, as we can see, we use the wrong base address in qemu to update
->
-> > > the memory regions, though, we update the base address to
->
-> > >
->
-> > > The correct value after pci driver in VM write the original value
->
-> > > back, the virtio NIC in bus 4 may still sends net packets
->
-> > > concurrently with
->
-> > >
->
-> > > The wrong memory region address.
->
-> > >
->
-> > >
->
-> > >
->
-> > > We have tried to skip the memory region update action in qemu
->
-> > > while detect pci write with 0xffffffff value, and it does work,
->
-> > > but
->
-> > >
->
-> > > This seems to be not gently.
->
-> >
->
-> > For sure. But I'm still puzzled as to why does Linux try to size the
->
-> > BAR of the bridge while a device behind it is used.
->
-> >
->
-> > Can you pls post your QEMU command line?
->
->
->
-> My QEMU command line:
->
-> /root/xyd/qemu-system-x86_64 -name guest=Linux,debug-threads=on -S
->
-> -object
->
-> secret,id=masterKey0,format=raw,file=/var/run/libvirt/qemu/domain-194-
->
-> Linux/master-key.aes -machine
->
-> pc-i440fx-2.8,accel=kvm,usb=off,dump-guest-core=off -cpu
->
-> host,+kvm_pv_eoi -bios /usr/share/OVMF/OVMF.fd -m
->
-> size=4194304k,slots=256,maxmem=33554432k -realtime mlock=off -smp
->
-> 20,sockets=20,cores=1,threads=1 -numa node,nodeid=0,cpus=0-4,mem=1024
->
-> -numa node,nodeid=1,cpus=5-9,mem=1024 -numa
->
-> node,nodeid=2,cpus=10-14,mem=1024 -numa
->
-> node,nodeid=3,cpus=15-19,mem=1024 -uuid
->
-> 34a588c7-b0f2-4952-b39c-47fae3411439 -no-user-config -nodefaults
->
-> -chardev
->
-> socket,id=charmonitor,path=/var/run/libvirt/qemu/domain-194-Linux/moni
->
-> tor.sock,server,nowait -mon
->
-> chardev=charmonitor,id=monitor,mode=control -rtc base=utc -no-hpet
->
-> -global kvm-pit.lost_tick_policy=delay -no-shutdown -boot strict=on
->
-> -device pci-bridge,chassis_nr=1,id=pci.1,bus=pci.0,addr=0x8 -device
->
-> pci-bridge,chassis_nr=2,id=pci.2,bus=pci.0,addr=0x9 -device
->
-> pci-bridge,chassis_nr=3,id=pci.3,bus=pci.0,addr=0xa -device
->
-> pci-bridge,chassis_nr=4,id=pci.4,bus=pci.0,addr=0xb -device
->
-> pci-bridge,chassis_nr=5,id=pci.5,bus=pci.0,addr=0xc -device
->
-> piix3-usb-uhci,id=usb,bus=pci.0,addr=0x1.0x2 -device
->
-> usb-ehci,id=usb1,bus=pci.0,addr=0x10 -device
->
-> nec-usb-xhci,id=usb2,bus=pci.0,addr=0x11 -device
->
-> virtio-scsi-pci,id=scsi0,bus=pci.0,addr=0x3 -device
->
-> virtio-scsi-pci,id=scsi1,bus=pci.0,addr=0x4 -device
->
-> virtio-scsi-pci,id=scsi2,bus=pci.0,addr=0x5 -device
->
-> virtio-scsi-pci,id=scsi3,bus=pci.0,addr=0x6 -device
->
-> virtio-serial-pci,id=virtio-serial0,bus=pci.0,addr=0x7 -drive
->
-> file=/mnt/sdb/xml/centos_74_x64_uefi.raw,format=raw,if=none,id=drive-v
->
-> irtio-disk0,cache=none -device
->
-> virtio-blk-pci,scsi=off,bus=pci.0,addr=0x2,drive=drive-virtio-disk0,id
->
-> =virtio-disk0,bootindex=1 -drive
->
-> if=none,id=drive-ide0-1-1,readonly=on,cache=none -device
->
-> ide-cd,bus=ide.1,unit=1,drive=drive-ide0-1-1,id=ide0-1-1 -netdev
->
-> tap,fd=35,id=hostnet0 -device
->
-> virtio-net-pci,netdev=hostnet0,id=net0,mac=52:54:00:89:5d:8b,bus=pci.4
->
-> ,addr=0x1 -chardev pty,id=charserial0 -device
->
-> isa-serial,chardev=charserial0,id=serial0 -device
->
-> usb-tablet,id=input0,bus=usb.0,port=1 -vnc 0.0.0.0:0 -device
->
-> cirrus-vga,id=video0,vgamem_mb=8,bus=pci.0,addr=0x12 -device
->
-> virtio-balloon-pci,id=balloon0,bus=pci.0,addr=0xd -msg timestamp=on
->
->
->
-> I am also very curious about this issue, in the linux kernel code, maybe
->
-> double
->
-check in function pci_bridge_check_ranges triggered this problem.
->
->
-If you can get the stacktrace in Linux when it tries to write this fffff
->
-value, that
->
-would be quite helpful.
->
-After I add mdelay(100) in function pci_bridge_check_ranges, this phenomenon is
-easier to reproduce, below is my modify in kernel:
-diff --git a/drivers/pci/setup-bus.c b/drivers/pci/setup-bus.c
-index cb389277..86e232d 100644
---- a/drivers/pci/setup-bus.c
-+++ b/drivers/pci/setup-bus.c
-@@ -27,7 +27,7 @@
- #include <linux/slab.h>
- #include <linux/acpi.h>
- #include "pci.h"
--
-+#include <linux/delay.h>
- unsigned int pci_flags;
- 
- struct pci_dev_resource {
-@@ -787,6 +787,9 @@ static void pci_bridge_check_ranges(struct pci_bus *bus)
-                pci_write_config_dword(bridge, PCI_PREF_BASE_UPPER32,
-                                               0xffffffff);
-                pci_read_config_dword(bridge, PCI_PREF_BASE_UPPER32, &tmp);
-+               mdelay(100);
-+               printk(KERN_ERR "sleep\n");
-+                dump_stack();
-                if (!tmp)
-                        b_res[2].flags &= ~IORESOURCE_MEM_64;
-                pci_write_config_dword(bridge, PCI_PREF_BASE_UPPER32,
-
-After hot plugging, we get the following log:
-
-Dec 11 09:28:16 uefi-linux kernel: pci 0000:00:14.0: BAR 0: assigned [mem 
-0xc2360000-0xc237ffff 64bit pref]
-Dec 11 09:28:16 uefi-linux kernel: pci 0000:00:14.0: BAR 3: assigned [mem 
-0xc2328000-0xc232bfff 64bit pref]
-Dec 11 09:28:16 uefi-linux kernel: pci 0000:00:08.0: PCI bridge to [bus 01]
-Dec 11 09:28:16 uefi-linux kernel: pci 0000:00:08.0:   bridge window [io  
-0xf000-0xffff]
-Dec 11 09:28:16 uefi-linux kernel: pci 0000:00:08.0:   bridge window [mem 
-0xc2800000-0xc29fffff]
-Dec 11 09:28:16 uefi-linux kernel: pci 0000:00:08.0:   bridge window [mem 
-0xc2b00000-0xc2cfffff 64bit pref]
-Dec 11 09:28:16 uefi-linux kernel: pci 0000:00:09.0: PCI bridge to [bus 02]
-Dec 11 09:28:16 uefi-linux kernel: pci 0000:00:09.0:   bridge window [io  
-0xe000-0xefff]
-Dec 11 09:28:16 uefi-linux kernel: pci 0000:00:09.0:   bridge window [mem 
-0xc2600000-0xc27fffff]
-Dec 11 09:28:16 uefi-linux kernel: pci 0000:00:09.0:   bridge window [mem 
-0xc2d00000-0xc2efffff 64bit pref]
-Dec 11 09:28:16 uefi-linux kernel: pci 0000:00:0a.0: PCI bridge to [bus 03]
-Dec 11 09:28:16 uefi-linux kernel: pci 0000:00:0a.0:   bridge window [io  
-0xd000-0xdfff]
-Dec 11 09:28:16 uefi-linux kernel: pci 0000:00:0a.0:   bridge window [mem 
-0xc2400000-0xc25fffff]
-Dec 11 09:28:16 uefi-linux kernel: pci 0000:00:0a.0:   bridge window [mem 
-0xc2f00000-0xc30fffff 64bit pref]
-Dec 11 09:28:16 uefi-linux kernel: pci 0000:04:0c.0: PCI bridge to [bus 05]
-Dec 11 09:28:16 uefi-linux kernel: pci 0000:04:0c.0:   bridge window [io  
-0xc000-0xcfff]
-Dec 11 09:28:16 uefi-linux kernel: pci 0000:04:0c.0:   bridge window [mem 
-0xc2000000-0xc21fffff]
-Dec 11 09:28:16 uefi-linux kernel: pci 0000:00:08.0: PCI bridge to [bus 01]
-Dec 11 09:28:16 uefi-linux kernel: pci 0000:00:08.0:   bridge window [io  
-0xf000-0xffff]
-Dec 11 09:28:16 uefi-linux kernel: pci 0000:00:08.0:   bridge window [mem 
-0xc2800000-0xc29fffff]
-Dec 11 09:28:16 uefi-linux kernel: pci 0000:00:08.0:   bridge window [mem 
-0xc2b00000-0xc2cfffff 64bit pref]
-Dec 11 09:28:16 uefi-linux kernel: pci 0000:00:09.0: PCI bridge to [bus 02]
-Dec 11 09:28:16 uefi-linux kernel: pci 0000:00:09.0:   bridge window [io  
-0xe000-0xefff]
-Dec 11 09:28:16 uefi-linux kernel: pci 0000:00:09.0:   bridge window [mem 
-0xc2600000-0xc27fffff]
-Dec 11 09:28:16 uefi-linux kernel: pci 0000:00:09.0:   bridge window [mem 
-0xc2d00000-0xc2efffff 64bit pref]
-Dec 11 09:28:16 uefi-linux kernel: pci 0000:00:0a.0: PCI bridge to [bus 03]
-Dec 11 09:28:16 uefi-linux kernel: pci 0000:00:0a.0:   bridge window [io  
-0xd000-0xdfff]
-Dec 11 09:28:16 uefi-linux kernel: pci 0000:00:0a.0:   bridge window [mem 
-0xc2400000-0xc25fffff]
-Dec 11 09:28:16 uefi-linux kernel: pci 0000:00:0a.0:   bridge window [mem 
-0xc2f00000-0xc30fffff 64bit pref]
-Dec 11 09:28:16 uefi-linux kernel: pci 0000:04:0c.0: PCI bridge to [bus 05]
-Dec 11 09:28:16 uefi-linux kernel: pci 0000:04:0c.0:   bridge window [io  
-0xc000-0xcfff]
-Dec 11 09:28:16 uefi-linux kernel: pci 0000:04:0c.0:   bridge window [mem 
-0xc2000000-0xc21fffff]
-Dec 11 09:28:16 uefi-linux kernel: pci 0000:00:08.0: PCI bridge to [bus 01]
-Dec 11 09:28:16 uefi-linux kernel: pci 0000:00:08.0:   bridge window [io  
-0xf000-0xffff]
-Dec 11 09:28:16 uefi-linux kernel: pci 0000:00:08.0:   bridge window [mem 
-0xc2800000-0xc29fffff]
-Dec 11 09:28:16 uefi-linux kernel: pci 0000:00:08.0:   bridge window [mem 
-0xc2b00000-0xc2cfffff 64bit pref]
-Dec 11 09:28:16 uefi-linux kernel: pci 0000:00:09.0: PCI bridge to [bus 02]
-Dec 11 09:28:16 uefi-linux kernel: pci 0000:00:09.0:   bridge window [io  
-0xe000-0xefff]
-Dec 11 09:28:16 uefi-linux kernel: pci 0000:00:09.0:   bridge window [mem 
-0xc2600000-0xc27fffff]
-Dec 11 09:28:16 uefi-linux kernel: pci 0000:00:09.0:   bridge window [mem 
-0xc2d00000-0xc2efffff 64bit pref]
-Dec 11 09:28:16 uefi-linux kernel: pci 0000:00:0a.0: PCI bridge to [bus 03]
-Dec 11 09:28:16 uefi-linux kernel: pci 0000:00:0a.0:   bridge window [io  
-0xd000-0xdfff]
-Dec 11 09:28:16 uefi-linux kernel: pci 0000:00:0a.0:   bridge window [mem 
-0xc2400000-0xc25fffff]
-Dec 11 09:28:16 uefi-linux kernel: pci 0000:00:0a.0:   bridge window [mem 
-0xc2f00000-0xc30fffff 64bit pref]
-Dec 11 09:28:16 uefi-linux kernel: pci 0000:04:0c.0: PCI bridge to [bus 05]
-Dec 11 09:28:16 uefi-linux kernel: pci 0000:04:0c.0:   bridge window [io  
-0xc000-0xcfff]
-Dec 11 09:28:16 uefi-linux kernel: pci 0000:04:0c.0:   bridge window [mem 
-0xc2000000-0xc21fffff]
-Dec 11 09:28:16 uefi-linux kernel: pci 0000:00:08.0: PCI bridge to [bus 01]
-Dec 11 09:28:16 uefi-linux kernel: pci 0000:00:08.0:   bridge window [io  
-0xf000-0xffff]
-Dec 11 09:28:16 uefi-linux kernel: pci 0000:00:08.0:   bridge window [mem 
-0xc2800000-0xc29fffff]
-Dec 11 09:28:16 uefi-linux kernel: pci 0000:00:08.0:   bridge window [mem 
-0xc2b00000-0xc2cfffff 64bit pref]
-Dec 11 09:28:16 uefi-linux kernel: pci 0000:00:09.0: PCI bridge to [bus 02]
-Dec 11 09:28:16 uefi-linux kernel: pci 0000:00:09.0:   bridge window [io  
-0xe000-0xefff]
-Dec 11 09:28:16 uefi-linux kernel: pci 0000:00:09.0:   bridge window [mem 
-0xc2600000-0xc27fffff]
-Dec 11 09:28:17 uefi-linux kernel: pci 0000:00:09.0:   bridge window [mem 
-0xc2d00000-0xc2efffff 64bit pref]
-Dec 11 09:28:17 uefi-linux kernel: pci 0000:00:0a.0: PCI bridge to [bus 03]
-Dec 11 09:28:17 uefi-linux kernel: pci 0000:00:0a.0:   bridge window [io  
-0xd000-0xdfff]
-Dec 11 09:28:17 uefi-linux kernel: pci 0000:00:0a.0:   bridge window [mem 
-0xc2400000-0xc25fffff]
-Dec 11 09:28:17 uefi-linux kernel: pci 0000:00:0a.0:   bridge window [mem 
-0xc2f00000-0xc30fffff 64bit pref]
-Dec 11 09:28:17 uefi-linux kernel: pci 0000:04:0c.0: PCI bridge to [bus 05]
-Dec 11 09:28:17 uefi-linux kernel: pci 0000:04:0c.0:   bridge window [io  
-0xc000-0xcfff]
-Dec 11 09:28:17 uefi-linux kernel: pci 0000:04:0c.0:   bridge window [mem 
-0xc2000000-0xc21fffff]
-Dec 11 09:28:17 uefi-linux kernel: pci 0000:00:08.0: PCI bridge to [bus 01]
-Dec 11 09:28:17 uefi-linux kernel: pci 0000:00:08.0:   bridge window [io  
-0xf000-0xffff]
-Dec 11 09:28:17 uefi-linux kernel: pci 0000:00:08.0:   bridge window [mem 
-0xc2800000-0xc29fffff]
-Dec 11 09:28:17 uefi-linux kernel: pci 0000:00:08.0:   bridge window [mem 
-0xc2b00000-0xc2cfffff 64bit pref]
-Dec 11 09:28:17 uefi-linux kernel: pci 0000:00:09.0: PCI bridge to [bus 02]
-Dec 11 09:28:17 uefi-linux kernel: pci 0000:00:09.0:   bridge window [io  
-0xe000-0xefff]
-Dec 11 09:28:17 uefi-linux kernel: pci 0000:00:09.0:   bridge window [mem 
-0xc2600000-0xc27fffff]
-Dec 11 09:28:17 uefi-linux kernel: pci 0000:00:09.0:   bridge window [mem 
-0xc2d00000-0xc2efffff 64bit pref]
-Dec 11 09:28:17 uefi-linux kernel: pci 0000:00:0a.0: PCI bridge to [bus 03]
-Dec 11 09:28:17 uefi-linux kernel: pci 0000:00:0a.0:   bridge window [io  
-0xd000-0xdfff]
-Dec 11 09:28:17 uefi-linux kernel: pci 0000:00:0a.0:   bridge window [mem 
-0xc2400000-0xc25fffff]
-Dec 11 09:28:17 uefi-linux kernel: pci 0000:00:0a.0:   bridge window [mem 
-0xc2f00000-0xc30fffff 64bit pref]
-Dec 11 09:28:17 uefi-linux kernel: pci 0000:04:0c.0: PCI bridge to [bus 05]
-Dec 11 09:28:17 uefi-linux kernel: pci 0000:04:0c.0:   bridge window [io  
-0xc000-0xcfff]
-Dec 11 09:28:17 uefi-linux kernel: pci 0000:04:0c.0:   bridge window [mem 
-0xc2000000-0xc21fffff]
-Dec 11 09:28:17 uefi-linux kernel: pci 0000:00:08.0: PCI bridge to [bus 01]
-Dec 11 09:28:17 uefi-linux kernel: pci 0000:00:08.0:   bridge window [io  
-0xf000-0xffff]
-Dec 11 09:28:17 uefi-linux kernel: pci 0000:00:08.0:   bridge window [mem 
-0xc2800000-0xc29fffff]
-Dec 11 09:28:17 uefi-linux kernel: pci 0000:00:08.0:   bridge window [mem 
-0xc2b00000-0xc2cfffff 64bit pref]
-Dec 11 09:28:17 uefi-linux kernel: pci 0000:00:09.0: PCI bridge to [bus 02]
-Dec 11 09:28:17 uefi-linux kernel: pci 0000:00:09.0:   bridge window [io  
-0xe000-0xefff]
-Dec 11 09:28:17 uefi-linux kernel: pci 0000:00:09.0:   bridge window [mem 
-0xc2600000-0xc27fffff]
-Dec 11 09:28:17 uefi-linux kernel: pci 0000:00:09.0:   bridge window [mem 
-0xc2d00000-0xc2efffff 64bit pref]
-Dec 11 09:28:17 uefi-linux kernel: pci 0000:00:0a.0: PCI bridge to [bus 03]
-Dec 11 09:28:17 uefi-linux kernel: pci 0000:00:0a.0:   bridge window [io  
-0xd000-0xdfff]
-Dec 11 09:28:17 uefi-linux kernel: pci 0000:00:0a.0:   bridge window [mem 
-0xc2400000-0xc25fffff]
-Dec 11 09:28:17 uefi-linux kernel: pci 0000:00:0a.0:   bridge window [mem 
-0xc2f00000-0xc30fffff 64bit pref]
-Dec 11 09:28:17 uefi-linux kernel: pci 0000:04:0c.0: PCI bridge to [bus 05]
-Dec 11 09:28:17 uefi-linux kernel: pci 0000:04:0c.0:   bridge window [io  
-0xc000-0xcfff]
-Dec 11 09:28:17 uefi-linux kernel: pci 0000:04:0c.0:   bridge window [mem 
-0xc2000000-0xc21fffff]
-Dec 11 09:28:17 uefi-linux kernel: pci 0000:00:08.0: PCI bridge to [bus 01]
-Dec 11 09:28:17 uefi-linux kernel: pci 0000:00:08.0:   bridge window [io  
-0xf000-0xffff]
-Dec 11 09:28:17 uefi-linux kernel: pci 0000:00:08.0:   bridge window [mem 
-0xc2800000-0xc29fffff]
-Dec 11 09:28:17 uefi-linux kernel: pci 0000:00:08.0:   bridge window [mem 
-0xc2b00000-0xc2cfffff 64bit pref]
-Dec 11 09:28:17 uefi-linux kernel: pci 0000:00:09.0: PCI bridge to [bus 02]
-Dec 11 09:28:17 uefi-linux kernel: pci 0000:00:09.0:   bridge window [io  
-0xe000-0xefff]
-Dec 11 09:28:17 uefi-linux kernel: pci 0000:00:09.0:   bridge window [mem 
-0xc2600000-0xc27fffff]
-Dec 11 09:28:17 uefi-linux kernel: pci 0000:00:09.0:   bridge window [mem 
-0xc2d00000-0xc2efffff 64bit pref]
-Dec 11 09:28:17 uefi-linux kernel: pci 0000:00:0a.0: PCI bridge to [bus 03]
-Dec 11 09:28:17 uefi-linux kernel: pci 0000:00:0a.0:   bridge window [io  
-0xd000-0xdfff]
-Dec 11 09:28:17 uefi-linux kernel: pci 0000:00:0a.0:   bridge window [mem 
-0xc2400000-0xc25fffff]
-Dec 11 09:28:17 uefi-linux kernel: pci 0000:00:0a.0:   bridge window [mem 
-0xc2f00000-0xc30fffff 64bit pref]
-Dec 11 09:28:17 uefi-linux kernel: pci 0000:04:0c.0: PCI bridge to [bus 05]
-Dec 11 09:28:17 uefi-linux kernel: pci 0000:04:0c.0:   bridge window [io  
-0xc000-0xcfff]
-Dec 11 09:28:17 uefi-linux kernel: pci 0000:04:0c.0:   bridge window [mem 
-0xc2000000-0xc21fffff]
-Dec 11 09:28:17 uefi-linux kernel: pci 0000:00:08.0: PCI bridge to [bus 01]
-Dec 11 09:28:17 uefi-linux kernel: pci 0000:00:08.0:   bridge window [io  
-0xf000-0xffff]
-Dec 11 09:28:17 uefi-linux kernel: pci 0000:00:08.0:   bridge window [mem 
-0xc2800000-0xc29fffff]
-Dec 11 09:28:17 uefi-linux kernel: pci 0000:00:08.0:   bridge window [mem 
-0xc2b00000-0xc2cfffff 64bit pref]
-Dec 11 09:28:17 uefi-linux kernel: pci 0000:00:09.0: PCI bridge to [bus 02]
-Dec 11 09:28:17 uefi-linux kernel: pci 0000:00:09.0:   bridge window [io  
-0xe000-0xefff]
-Dec 11 09:28:18 uefi-linux kernel: pci 0000:00:09.0:   bridge window [mem 
-0xc2600000-0xc27fffff]
-Dec 11 09:28:18 uefi-linux kernel: pci 0000:00:09.0:   bridge window [mem 
-0xc2d00000-0xc2efffff 64bit pref]
-Dec 11 09:28:18 uefi-linux kernel: pci 0000:00:0a.0: PCI bridge to [bus 03]
-Dec 11 09:28:18 uefi-linux kernel: pci 0000:00:0a.0:   bridge window [io  
-0xd000-0xdfff]
-Dec 11 09:28:18 uefi-linux kernel: pci 0000:00:0a.0:   bridge window [mem 
-0xc2400000-0xc25fffff]
-Dec 11 09:28:18 uefi-linux kernel: pci 0000:00:0a.0:   bridge window [mem 
-0xc2f00000-0xc30fffff 64bit pref]
-Dec 11 09:28:18 uefi-linux kernel: pci 0000:04:0c.0: PCI bridge to [bus 05]
-Dec 11 09:28:18 uefi-linux kernel: pci 0000:04:0c.0:   bridge window [io  
-0xc000-0xcfff]
-Dec 11 09:28:18 uefi-linux kernel: pci 0000:04:0c.0:   bridge window [mem 
-0xc2000000-0xc21fffff]
-Dec 11 09:28:18 uefi-linux kernel: sleep
-Dec 11 09:28:18 uefi-linux kernel: CPU: 16 PID: 502 Comm: kworker/u40:1 Not 
-tainted 4.11.0-rc3+ #11
-Dec 11 09:28:18 uefi-linux kernel: Hardware name: QEMU Standard PC (i440FX + 
-PIIX, 1996), BIOS 0.0.0 02/06/2015
-Dec 11 09:28:18 uefi-linux kernel: Workqueue: kacpi_hotplug acpi_hotplug_work_fn
-Dec 11 09:28:18 uefi-linux kernel: Call Trace:
-Dec 11 09:28:18 uefi-linux kernel: dump_stack+0x63/0x87
-Dec 11 09:28:18 uefi-linux kernel: __pci_bus_size_bridges+0x931/0x960
-Dec 11 09:28:18 uefi-linux kernel: ? dev_printk+0x4d/0x50
-Dec 11 09:28:18 uefi-linux kernel: enable_slot+0x140/0x2f0
-Dec 11 09:28:18 uefi-linux kernel: ? __pm_runtime_resume+0x5c/0x80
-Dec 11 09:28:18 uefi-linux kernel: ? trim_stale_devices+0x9a/0x120
-Dec 11 09:28:18 uefi-linux kernel: acpiphp_check_bridge.part.6+0xf5/0x120
-Dec 11 09:28:18 uefi-linux kernel: acpiphp_hotplug_notify+0x145/0x1c0
-Dec 11 09:28:18 uefi-linux kernel: ? acpiphp_post_dock_fixup+0xc0/0xc0
-Dec 11 09:28:18 uefi-linux kernel: acpi_device_hotplug+0x3a6/0x3f3
-Dec 11 09:28:18 uefi-linux kernel: acpi_hotplug_work_fn+0x1e/0x29
-Dec 11 09:28:18 uefi-linux kernel: process_one_work+0x165/0x410
-Dec 11 09:28:18 uefi-linux kernel: worker_thread+0x137/0x4c0
-Dec 11 09:28:18 uefi-linux kernel: kthread+0x101/0x140
-Dec 11 09:28:18 uefi-linux kernel: ? rescuer_thread+0x380/0x380
-Dec 11 09:28:18 uefi-linux kernel: ? kthread_park+0x90/0x90
-Dec 11 09:28:18 uefi-linux kernel: ret_from_fork+0x2c/0x40
-Dec 11 09:28:18 uefi-linux kernel: pci 0000:00:08.0: PCI bridge to [bus 01]
-Dec 11 09:28:18 uefi-linux kernel: pci 0000:00:08.0:   bridge window [io  
-0xf000-0xffff]
-Dec 11 09:28:18 uefi-linux kernel: pci 0000:00:08.0:   bridge window [mem 
-0xc2800000-0xc29fffff]
-Dec 11 09:28:18 uefi-linux kernel: pci 0000:00:08.0:   bridge window [mem 
-0xc2b00000-0xc2cfffff 64bit pref]
-Dec 11 09:28:18 uefi-linux kernel: pci 0000:00:09.0: PCI bridge to [bus 02]
-Dec 11 09:28:18 uefi-linux kernel: pci 0000:00:09.0:   bridge window [io  
-0xe000-0xefff]
-Dec 11 09:28:18 uefi-linux kernel: pci 0000:00:09.0:   bridge window [mem 
-0xc2600000-0xc27fffff]
-Dec 11 09:28:18 uefi-linux kernel: pci 0000:00:09.0:   bridge window [mem 
-0xc2d00000-0xc2efffff 64bit pref]
-Dec 11 09:28:18 uefi-linux kernel: pci 0000:00:0a.0: PCI bridge to [bus 03]
-Dec 11 09:28:18 uefi-linux kernel: pci 0000:00:0a.0:   bridge window [io  
-0xd000-0xdfff]
-Dec 11 09:28:18 uefi-linux kernel: pci 0000:00:0a.0:   bridge window [mem 
-0xc2400000-0xc25fffff]
-Dec 11 09:28:18 uefi-linux kernel: pci 0000:00:0a.0:   bridge window [mem 
-0xc2f00000-0xc30fffff 64bit pref]
-Dec 11 09:28:18 uefi-linux kernel: pci 0000:04:0c.0: PCI bridge to [bus 05]
-Dec 11 09:28:18 uefi-linux kernel: pci 0000:04:0c.0:   bridge window [io  
-0xc000-0xcfff]
-Dec 11 09:28:18 uefi-linux kernel: pci 0000:04:0c.0:   bridge window [mem 
-0xc2000000-0xc21fffff]
-Dec 11 09:28:18 uefi-linux kernel: sleep
-Dec 11 09:28:18 uefi-linux kernel: CPU: 16 PID: 502 Comm: kworker/u40:1 Not 
-tainted 4.11.0-rc3+ #11
-Dec 11 09:28:18 uefi-linux kernel: Hardware name: QEMU Standard PC (i440FX + 
-PIIX, 1996), BIOS 0.0.0 02/06/2015
-Dec 11 09:28:18 uefi-linux kernel: Workqueue: kacpi_hotplug acpi_hotplug_work_fn
-Dec 11 09:28:18 uefi-linux kernel: Call Trace:
-Dec 11 09:28:18 uefi-linux kernel: dump_stack+0x63/0x87
-Dec 11 09:28:18 uefi-linux kernel: __pci_bus_size_bridges+0x931/0x960
-Dec 11 09:28:18 uefi-linux kernel: ? dev_printk+0x4d/0x50
-Dec 11 09:28:18 uefi-linux kernel: enable_slot+0x140/0x2f0
-Dec 11 09:28:18 uefi-linux kernel: ? __pm_runtime_resume+0x5c/0x80
-Dec 11 09:28:18 uefi-linux kernel: ? trim_stale_devices+0x9a/0x120
-Dec 11 09:28:18 uefi-linux kernel: acpiphp_check_bridge.part.6+0xf5/0x120
-Dec 11 09:28:18 uefi-linux kernel: acpiphp_hotplug_notify+0x145/0x1c0
-Dec 11 09:28:18 uefi-linux kernel: ? acpiphp_post_dock_fixup+0xc0/0xc0
-Dec 11 09:28:18 uefi-linux kernel: acpi_device_hotplug+0x3a6/0x3f3
-Dec 11 09:28:18 uefi-linux kernel: acpi_hotplug_work_fn+0x1e/0x29
-Dec 11 09:28:18 uefi-linux kernel: process_one_work+0x165/0x410
-Dec 11 09:28:18 uefi-linux kernel: worker_thread+0x137/0x4c0
-Dec 11 09:28:18 uefi-linux kernel: kthread+0x101/0x140
-Dec 11 09:28:18 uefi-linux kernel: ? rescuer_thread+0x380/0x380
-Dec 11 09:28:18 uefi-linux kernel: ? kthread_park+0x90/0x90
-Dec 11 09:28:18 uefi-linux kernel: ret_from_fork+0x2c/0x40
-Dec 11 09:28:18 uefi-linux kernel: pci 0000:00:08.0: PCI bridge to [bus 01]
-Dec 11 09:28:18 uefi-linux kernel: pci 0000:00:08.0:   bridge window [io  
-0xf000-0xffff]
-Dec 11 09:28:18 uefi-linux kernel: pci 0000:00:08.0:   bridge window [mem 
-0xc2800000-0xc29fffff]
-Dec 11 09:28:18 uefi-linux kernel: pci 0000:00:08.0:   bridge window [mem 
-0xc2b00000-0xc2cfffff 64bit pref]
-Dec 11 09:28:18 uefi-linux kernel: pci 0000:00:09.0: PCI bridge to [bus 02]
-Dec 11 09:28:18 uefi-linux kernel: pci 0000:00:09.0:   bridge window [io  
-0xe000-0xefff]
-Dec 11 09:28:18 uefi-linux kernel: pci 0000:00:09.0:   bridge window [mem 
-0xc2600000-0xc27fffff]
-Dec 11 09:28:18 uefi-linux kernel: pci 0000:00:09.0:   bridge window [mem 
-0xc2d00000-0xc2efffff 64bit pref]
-Dec 11 09:28:18 uefi-linux kernel: pci 0000:00:0a.0: PCI bridge to [bus 03]
-Dec 11 09:28:18 uefi-linux kernel: pci 0000:00:0a.0:   bridge window [io  
-0xd000-0xdfff]
-Dec 11 09:28:18 uefi-linux kernel: pci 0000:00:0a.0:   bridge window [mem 
-0xc2400000-0xc25fffff]
-Dec 11 09:28:18 uefi-linux kernel: pci 0000:00:0a.0:   bridge window [mem 
-0xc2f00000-0xc30fffff 64bit pref]
-Dec 11 09:28:18 uefi-linux kernel: pci 0000:04:0c.0: PCI bridge to [bus 05]
-Dec 11 09:28:18 uefi-linux kernel: pci 0000:04:0c.0:   bridge window [io  
-0xc000-0xcfff]
-Dec 11 09:28:18 uefi-linux kernel: pci 0000:04:0c.0:   bridge window [mem 
-0xc2000000-0xc21fffff]
-Dec 11 09:28:19 uefi-linux kernel: sleep
-Dec 11 09:28:19 uefi-linux kernel: CPU: 17 PID: 502 Comm: kworker/u40:1 Not 
-tainted 4.11.0-rc3+ #11
-Dec 11 09:28:19 uefi-linux kernel: Hardware name: QEMU Standard PC (i440FX + 
-PIIX, 1996), BIOS 0.0.0 02/06/2015
-Dec 11 09:28:19 uefi-linux kernel: Workqueue: kacpi_hotplug acpi_hotplug_work_fn
-Dec 11 09:28:19 uefi-linux kernel: Call Trace:
-Dec 11 09:28:19 uefi-linux kernel: dump_stack+0x63/0x87
-Dec 11 09:28:19 uefi-linux kernel: __pci_bus_size_bridges+0x931/0x960
-Dec 11 09:28:19 uefi-linux kernel: ? dev_printk+0x4d/0x50
-Dec 11 09:28:19 uefi-linux kernel: enable_slot+0x140/0x2f0
-Dec 11 09:28:19 uefi-linux kernel: ? __pm_runtime_resume+0x5c/0x80
-Dec 11 09:28:19 uefi-linux kernel: ? trim_stale_devices+0x9a/0x120
-Dec 11 09:28:19 uefi-linux kernel: acpiphp_check_bridge.part.6+0xf5/0x120
-Dec 11 09:28:19 uefi-linux kernel: acpiphp_hotplug_notify+0x145/0x1c0
-Dec 11 09:28:19 uefi-linux kernel: ? acpiphp_post_dock_fixup+0xc0/0xc0
-Dec 11 09:28:19 uefi-linux kernel: acpi_device_hotplug+0x3a6/0x3f3
-Dec 11 09:28:19 uefi-linux kernel: acpi_hotplug_work_fn+0x1e/0x29
-Dec 11 09:28:19 uefi-linux kernel: process_one_work+0x165/0x410
-Dec 11 09:28:19 uefi-linux kernel: worker_thread+0x137/0x4c0
-Dec 11 09:28:19 uefi-linux kernel: kthread+0x101/0x140
-Dec 11 09:28:19 uefi-linux kernel: ? rescuer_thread+0x380/0x380
-Dec 11 09:28:19 uefi-linux kernel: ? kthread_park+0x90/0x90
-Dec 11 09:28:19 uefi-linux kernel: ret_from_fork+0x2c/0x40
-Dec 11 09:28:19 uefi-linux kernel: pci 0000:00:08.0: PCI bridge to [bus 01]
-Dec 11 09:28:19 uefi-linux kernel: pci 0000:00:08.0:   bridge window [io  
-0xf000-0xffff]
-Dec 11 09:28:19 uefi-linux kernel: pci 0000:00:08.0:   bridge window [mem 
-0xc2800000-0xc29fffff]
-Dec 11 09:28:19 uefi-linux kernel: pci 0000:00:08.0:   bridge window [mem 
-0xc2b00000-0xc2cfffff 64bit pref]
-Dec 11 09:28:19 uefi-linux kernel: pci 0000:00:09.0: PCI bridge to [bus 02]
-Dec 11 09:28:19 uefi-linux kernel: pci 0000:00:09.0:   bridge window [io  
-0xe000-0xefff]
-Dec 11 09:28:19 uefi-linux kernel: pci 0000:00:09.0:   bridge window [mem 
-0xc2600000-0xc27fffff]
-Dec 11 09:28:19 uefi-linux kernel: pci 0000:00:09.0:   bridge window [mem 
-0xc2d00000-0xc2efffff 64bit pref]
-Dec 11 09:28:19 uefi-linux kernel: pci 0000:00:0a.0: PCI bridge to [bus 03]
-Dec 11 09:28:19 uefi-linux kernel: pci 0000:00:0a.0:   bridge window [io  
-0xd000-0xdfff]
-Dec 11 09:28:19 uefi-linux kernel: pci 0000:00:0a.0:   bridge window [mem 
-0xc2400000-0xc25fffff]
-Dec 11 09:28:19 uefi-linux kernel: pci 0000:00:0a.0:   bridge window [mem 
-0xc2f00000-0xc30fffff 64bit pref]
-Dec 11 09:28:19 uefi-linux kernel: pci 0000:04:0c.0: PCI bridge to [bus 05]
-Dec 11 09:28:19 uefi-linux kernel: pci 0000:04:0c.0:   bridge window [io  
-0xc000-0xcfff]
-Dec 11 09:28:19 uefi-linux kernel: pci 0000:04:0c.0:   bridge window [mem 
-0xc2000000-0xc21fffff]
-Dec 11 09:28:19 uefi-linux kernel: sleep
-Dec 11 09:28:19 uefi-linux kernel: CPU: 17 PID: 502 Comm: kworker/u40:1 Not 
-tainted 4.11.0-rc3+ #11
-Dec 11 09:28:19 uefi-linux kernel: Hardware name: QEMU Standard PC (i440FX + 
-PIIX, 1996), BIOS 0.0.0 02/06/2015
-Dec 11 09:28:19 uefi-linux kernel: Workqueue: kacpi_hotplug acpi_hotplug_work_fn
-Dec 11 09:28:19 uefi-linux kernel: Call Trace:
-Dec 11 09:28:19 uefi-linux kernel: dump_stack+0x63/0x87
-Dec 11 09:28:19 uefi-linux kernel: __pci_bus_size_bridges+0x931/0x960
-Dec 11 09:28:19 uefi-linux kernel: ? pci_conf1_read+0xba/0x100
-Dec 11 09:28:19 uefi-linux kernel: __pci_bus_size_bridges+0xe9/0x960
-Dec 11 09:28:19 uefi-linux kernel: ? dev_printk+0x4d/0x50
-Dec 11 09:28:19 uefi-linux kernel: ? pcibios_allocate_rom_resources+0x45/0x80
-Dec 11 09:28:19 uefi-linux kernel: enable_slot+0x140/0x2f0
-Dec 11 09:28:19 uefi-linux kernel: ? trim_stale_devices+0x9a/0x120
-Dec 11 09:28:19 uefi-linux kernel: ? __pm_runtime_resume+0x5c/0x80
-Dec 11 09:28:19 uefi-linux kernel: ? trim_stale_devices+0x9a/0x120
-Dec 11 09:28:19 uefi-linux kernel: acpiphp_check_bridge.part.6+0xf5/0x120
-Dec 11 09:28:19 uefi-linux kernel: acpiphp_hotplug_notify+0x145/0x1c0
-Dec 11 09:28:19 uefi-linux kernel: ? acpiphp_post_dock_fixup+0xc0/0xc0
-Dec 11 09:28:19 uefi-linux kernel: acpi_device_hotplug+0x3a6/0x3f3
-Dec 11 09:28:19 uefi-linux kernel: acpi_hotplug_work_fn+0x1e/0x29
-Dec 11 09:28:19 uefi-linux kernel: process_one_work+0x165/0x410
-Dec 11 09:28:19 uefi-linux kernel: worker_thread+0x137/0x4c0
-Dec 11 09:28:19 uefi-linux kernel: kthread+0x101/0x140
-Dec 11 09:28:19 uefi-linux kernel: ? rescuer_thread+0x380/0x380
-Dec 11 09:28:19 uefi-linux kernel: ? kthread_park+0x90/0x90
-Dec 11 09:28:19 uefi-linux kernel: ret_from_fork+0x2c/0x40
-Dec 11 09:28:19 uefi-linux kernel: sleep
-Dec 11 09:28:19 uefi-linux kernel: CPU: 17 PID: 502 Comm: kworker/u40:1 Not 
-tainted 4.11.0-rc3+ #11
-Dec 11 09:28:19 uefi-linux kernel: Hardware name: QEMU Standard PC (i440FX + 
-PIIX, 1996), BIOS 0.0.0 02/06/2015
-Dec 11 09:28:19 uefi-linux kernel: Workqueue: kacpi_hotplug acpi_hotplug_work_fn
-Dec 11 09:28:19 uefi-linux kernel: Call Trace:
-Dec 11 09:28:19 uefi-linux kernel: dump_stack+0x63/0x87
-Dec 11 09:28:19 uefi-linux kernel: __pci_bus_size_bridges+0x931/0x960
-Dec 11 09:28:19 uefi-linux kernel: ? dev_printk+0x4d/0x50
-Dec 11 09:28:19 uefi-linux kernel: enable_slot+0x140/0x2f0
-Dec 11 09:28:19 uefi-linux kernel: ? trim_stale_devices+0x9a/0x120
-Dec 11 09:28:19 uefi-linux kernel: ? __pm_runtime_resume+0x5c/0x80
-Dec 11 09:28:19 uefi-linux kernel: ? trim_stale_devices+0x9a/0x120
-Dec 11 09:28:19 uefi-linux kernel: acpiphp_check_bridge.part.6+0xf5/0x120
-Dec 11 09:28:19 uefi-linux kernel: acpiphp_hotplug_notify+0x145/0x1c0
-Dec 11 09:28:19 uefi-linux kernel: ? acpiphp_post_dock_fixup+0xc0/0xc0
-Dec 11 09:28:19 uefi-linux kernel: acpi_device_hotplug+0x3a6/0x3f3
-Dec 11 09:28:19 uefi-linux kernel: acpi_hotplug_work_fn+0x1e/0x29
-Dec 11 09:28:19 uefi-linux kernel: process_one_work+0x165/0x410
-Dec 11 09:28:19 uefi-linux kernel: worker_thread+0x137/0x4c0
-Dec 11 09:28:19 uefi-linux kernel: kthread+0x101/0x140
-Dec 11 09:28:19 uefi-linux kernel: ? rescuer_thread+0x380/0x380
-Dec 11 09:28:19 uefi-linux kernel: ? kthread_park+0x90/0x90
-Dec 11 09:28:19 uefi-linux kernel: ret_from_fork+0x2c/0x40
-Dec 11 09:28:19 uefi-linux kernel: pci 0000:00:08.0: PCI bridge to [bus 01]
-Dec 11 09:28:19 uefi-linux kernel: pci 0000:00:08.0:   bridge window [io  
-0xf000-0xffff]
-Dec 11 09:28:19 uefi-linux kernel: pci 0000:00:08.0:   bridge window [mem 
-0xc2800000-0xc29fffff]
-Dec 11 09:28:19 uefi-linux kernel: pci 0000:00:08.0:   bridge window [mem 
-0xc2b00000-0xc2cfffff 64bit pref]
-Dec 11 09:28:19 uefi-linux kernel: pci 0000:00:09.0: PCI bridge to [bus 02]
-Dec 11 09:28:19 uefi-linux kernel: pci 0000:00:09.0:   bridge window [io  
-0xe000-0xefff]
-Dec 11 09:28:19 uefi-linux kernel: pci 0000:00:09.0:   bridge window [mem 
-0xc2600000-0xc27fffff]
-Dec 11 09:28:19 uefi-linux kernel: pci 0000:00:09.0:   bridge window [mem 
-0xc2d00000-0xc2efffff 64bit pref]
-Dec 11 09:28:19 uefi-linux kernel: pci 0000:00:0a.0: PCI bridge to [bus 03]
-Dec 11 09:28:19 uefi-linux kernel: pci 0000:00:0a.0:   bridge window [io  
-0xd000-0xdfff]
-Dec 11 09:28:19 uefi-linux kernel: pci 0000:00:0a.0:   bridge window [mem 
-0xc2400000-0xc25fffff]
-Dec 11 09:28:19 uefi-linux kernel: pci 0000:00:0a.0:   bridge window [mem 
-0xc2f00000-0xc30fffff 64bit pref]
-Dec 11 09:28:19 uefi-linux kernel: pci 0000:04:0c.0: PCI bridge to [bus 05]
-Dec 11 09:28:19 uefi-linux kernel: pci 0000:04:0c.0:   bridge window [io  
-0xc000-0xcfff]
-Dec 11 09:28:19 uefi-linux kernel: pci 0000:04:0c.0:   bridge window [mem 
-0xc2000000-0xc21fffff]
-Dec 11 09:28:19 uefi-linux kernel: pci 0000:00:08.0: PCI bridge to [bus 01]
-Dec 11 09:28:19 uefi-linux kernel: pci 0000:00:08.0:   bridge window [io  
-0xf000-0xffff]
-Dec 11 09:28:19 uefi-linux kernel: pci 0000:00:08.0:   bridge window [mem 
-0xc2800000-0xc29fffff]
-Dec 11 09:28:19 uefi-linux kernel: pci 0000:00:08.0:   bridge window [mem 
-0xc2b00000-0xc2cfffff 64bit pref]
-Dec 11 09:28:20 uefi-linux kernel: pci 0000:00:09.0: PCI bridge to [bus 02]
-Dec 11 09:28:20 uefi-linux kernel: pci 0000:00:09.0:   bridge window [io  
-0xe000-0xefff]
-Dec 11 09:28:20 uefi-linux kernel: pci 0000:00:09.0:   bridge window [mem 
-0xc2600000-0xc27fffff]
-Dec 11 09:28:20 uefi-linux kernel: pci 0000:00:09.0:   bridge window [mem 
-0xc2d00000-0xc2efffff 64bit pref]
-Dec 11 09:28:20 uefi-linux kernel: pci 0000:00:0a.0: PCI bridge to [bus 03]
-Dec 11 09:28:20 uefi-linux kernel: pci 0000:00:0a.0:   bridge window [io  
-0xd000-0xdfff]
-Dec 11 09:28:20 uefi-linux kernel: pci 0000:00:0a.0:   bridge window [mem 
-0xc2400000-0xc25fffff]
-Dec 11 09:28:20 uefi-linux kernel: pci 0000:00:0a.0:   bridge window [mem 
-0xc2f00000-0xc30fffff 64bit pref]
-Dec 11 09:28:20 uefi-linux kernel: pci 0000:04:0c.0: PCI bridge to [bus 05]
-Dec 11 09:28:20 uefi-linux kernel: pci 0000:04:0c.0:   bridge window [io  
-0xc000-0xcfff]
-Dec 11 09:28:20 uefi-linux kernel: pci 0000:04:0c.0:   bridge window [mem 
-0xc2000000-0xc21fffff]
-Dec 11 09:28:20 uefi-linux kernel: pci 0000:00:08.0: PCI bridge to [bus 01]
-Dec 11 09:28:20 uefi-linux kernel: pci 0000:00:08.0:   bridge window [io  
-0xf000-0xffff]
-Dec 11 09:28:20 uefi-linux kernel: pci 0000:00:08.0:   bridge window [mem 
-0xc2800000-0xc29fffff]
-Dec 11 09:28:20 uefi-linux kernel: pci 0000:00:08.0:   bridge window [mem 
-0xc2b00000-0xc2cfffff 64bit pref]
-Dec 11 09:28:20 uefi-linux kernel: pci 0000:00:09.0: PCI bridge to [bus 02]
-Dec 11 09:28:20 uefi-linux kernel: pci 0000:00:09.0:   bridge window [io  
-0xe000-0xefff]
-Dec 11 09:28:20 uefi-linux kernel: pci 0000:00:09.0:   bridge window [mem 
-0xc2600000-0xc27fffff]
-Dec 11 09:28:20 uefi-linux kernel: pci 0000:00:09.0:   bridge window [mem 
-0xc2d00000-0xc2efffff 64bit pref]
-Dec 11 09:28:20 uefi-linux kernel: pci 0000:00:0a.0: PCI bridge to [bus 03]
-Dec 11 09:28:20 uefi-linux kernel: pci 0000:00:0a.0:   bridge window [io  
-0xd000-0xdfff]
-Dec 11 09:28:20 uefi-linux kernel: pci 0000:00:0a.0:   bridge window [mem 
-0xc2400000-0xc25fffff]
-Dec 11 09:28:20 uefi-linux kernel: pci 0000:00:0a.0:   bridge window [mem 
-0xc2f00000-0xc30fffff 64bit pref]
-Dec 11 09:28:20 uefi-linux kernel: pci 0000:04:0c.0: PCI bridge to [bus 05]
-Dec 11 09:28:20 uefi-linux kernel: pci 0000:04:0c.0:   bridge window [io  
-0xc000-0xcfff]
-Dec 11 09:28:20 uefi-linux kernel: pci 0000:04:0c.0:   bridge window [mem 
-0xc2000000-0xc21fffff]
-Dec 11 09:28:20 uefi-linux kernel: pci 0000:00:08.0: PCI bridge to [bus 01]
-Dec 11 09:28:20 uefi-linux kernel: pci 0000:00:08.0:   bridge window [io  
-0xf000-0xffff]
-Dec 11 09:28:20 uefi-linux kernel: pci 0000:00:08.0:   bridge window [mem 
-0xc2800000-0xc29fffff]
-Dec 11 09:28:20 uefi-linux kernel: pci 0000:00:08.0:   bridge window [mem 
-0xc2b00000-0xc2cfffff 64bit pref]
-Dec 11 09:28:20 uefi-linux kernel: pci 0000:00:09.0: PCI bridge to [bus 02]
-Dec 11 09:28:20 uefi-linux kernel: pci 0000:00:09.0:   bridge window [io  
-0xe000-0xefff]
-Dec 11 09:28:20 uefi-linux kernel: psmouse serio1: VMMouse at 
-isa0060/serio1/input0 lost sync at byte 1
-Dec 11 09:28:20 uefi-linux kernel: pci 0000:00:09.0:   bridge window [mem 
-0xc2600000-0xc27fffff]
-Dec 11 09:28:20 uefi-linux kernel: pci 0000:00:09.0:   bridge window [mem 
-0xc2d00000-0xc2efffff 64bit pref]
-Dec 11 09:28:20 uefi-linux kernel: psmouse serio1: VMMouse at 
-isa0060/serio1/input0 - driver resynced.
-Dec 11 09:28:20 uefi-linux kernel: psmouse serio1: VMMouse at 
-isa0060/serio1/input0 lost sync at byte 1
-Dec 11 09:28:20 uefi-linux kernel: pci 0000:00:0a.0: PCI bridge to [bus 03]
-Dec 11 09:28:20 uefi-linux kernel: pci 0000:00:0a.0:   bridge window [io  
-0xd000-0xdfff]
-Dec 11 09:28:20 uefi-linux kernel: psmouse serio1: VMMouse at 
-isa0060/serio1/input0 - driver resynced.
-Dec 11 09:28:20 uefi-linux kernel: psmouse serio1: VMMouse at 
-isa0060/serio1/input0 lost sync at byte 1
-Dec 11 09:28:20 uefi-linux kernel: pci 0000:00:0a.0:   bridge window [mem 
-0xc2400000-0xc25fffff]
-Dec 11 09:28:20 uefi-linux kernel: psmouse serio1: VMMouse at 
-isa0060/serio1/input0 - driver resynced.
-Dec 11 09:28:20 uefi-linux kernel: psmouse serio1: VMMouse at 
-isa0060/serio1/input0 lost sync at byte 1
-Dec 11 09:28:20 uefi-linux kernel: pci 0000:00:0a.0:   bridge window [mem 
-0xc2f00000-0xc30fffff 64bit pref]
-Dec 11 09:28:20 uefi-linux kernel: pci 0000:04:0c.0: PCI bridge to [bus 05]
-Dec 11 09:28:20 uefi-linux kernel: pci 0000:04:0c.0:   bridge window [io  
-0xc000-0xcfff]
-Dec 11 09:28:20 uefi-linux kernel: psmouse serio1: VMMouse at 
-isa0060/serio1/input0 - driver resynced.
-Dec 11 09:28:20 uefi-linux kernel: pci 0000:04:0c.0:   bridge window [mem 
-0xc2000000-0xc21fffff]
-Dec 11 09:28:20 uefi-linux kernel: psmouse serio1: VMMouse at 
-isa0060/serio1/input0 lost sync at byte 1
-Dec 11 09:28:20 uefi-linux kernel: pci 0000:00:08.0: PCI bridge to [bus 01]
-Dec 11 09:28:20 uefi-linux kernel: pci 0000:00:08.0:   bridge window [io  
-0xf000-0xffff]
-Dec 11 09:28:20 uefi-linux kernel: psmouse serio1: VMMouse at 
-isa0060/serio1/input0 - driver resynced.
-Dec 11 09:28:20 uefi-linux kernel: pci 0000:00:08.0:   bridge window [mem 
-0xc2800000-0xc29fffff]
-Dec 11 09:28:20 uefi-linux kernel: pci 0000:00:08.0:   bridge window [mem 
-0xc2b00000-0xc2cfffff 64bit pref]
-Dec 11 09:28:20 uefi-linux kernel: pci 0000:00:09.0: PCI bridge to [bus 02]
-Dec 11 09:28:20 uefi-linux kernel: pci 0000:00:09.0:   bridge window [io  
-0xe000-0xefff]
-Dec 11 09:28:20 uefi-linux kernel: pci 0000:00:09.0:   bridge window [mem 
-0xc2600000-0xc27fffff]
-Dec 11 09:28:20 uefi-linux kernel: pci 0000:00:09.0:   bridge window [mem 
-0xc2d00000-0xc2efffff 64bit pref]
-Dec 11 09:28:20 uefi-linux kernel: pci 0000:00:0a.0: PCI bridge to [bus 03]
-Dec 11 09:28:20 uefi-linux kernel: pci 0000:00:0a.0:   bridge window [io  
-0xd000-0xdfff]
-Dec 11 09:28:20 uefi-linux kernel: pci 0000:00:0a.0:   bridge window [mem 
-0xc2400000-0xc25fffff]
-Dec 11 09:28:20 uefi-linux kernel: psmouse serio1: VMMouse at 
-isa0060/serio1/input0 lost sync at byte 1
-Dec 11 09:28:20 uefi-linux kernel: pci 0000:00:0a.0:   bridge window [mem 
-0xc2f00000-0xc30fffff 64bit pref]
-Dec 11 09:28:20 uefi-linux kernel: psmouse serio1: VMMouse at 
-isa0060/serio1/input0 - driver resynced.
-Dec 11 09:28:20 uefi-linux kernel: pci 0000:04:0c.0: PCI bridge to [bus 05]
-Dec 11 09:28:20 uefi-linux kernel: pci 0000:04:0c.0:   bridge window [io  
-0xc000-0xcfff]
-Dec 11 09:28:20 uefi-linux kernel: pci 0000:04:0c.0:   bridge window [mem 
-0xc2000000-0xc21fffff]
-Dec 11 09:28:21 uefi-linux kernel: pci 0000:00:08.0: PCI bridge to [bus 01]
-Dec 11 09:28:21 uefi-linux kernel: pci 0000:00:08.0:   bridge window [io  
-0xf000-0xffff]
-Dec 11 09:28:21 uefi-linux kernel: pci 0000:00:08.0:   bridge window [mem 
-0xc2800000-0xc29fffff]
-Dec 11 09:28:21 uefi-linux kernel: pci 0000:00:08.0:   bridge window [mem 
-0xc2b00000-0xc2cfffff 64bit pref]
-Dec 11 09:28:21 uefi-linux kernel: pci 0000:00:09.0: PCI bridge to [bus 02]
-Dec 11 09:28:21 uefi-linux kernel: pci 0000:00:09.0:   bridge window [io  
-0xe000-0xefff]
-Dec 11 09:28:21 uefi-linux kernel: pci 0000:00:09.0:   bridge window [mem 
-0xc2600000-0xc27fffff]
-Dec 11 09:28:21 uefi-linux kernel: pci 0000:00:09.0:   bridge window [mem 
-0xc2d00000-0xc2efffff 64bit pref]
-Dec 11 09:28:21 uefi-linux kernel: pci 0000:00:0a.0: PCI bridge to [bus 03]
-Dec 11 09:28:21 uefi-linux kernel: pci 0000:00:0a.0:   bridge window [io  
-0xd000-0xdfff]
-Dec 11 09:28:21 uefi-linux kernel: pci 0000:00:0a.0:   bridge window [mem 
-0xc2400000-0xc25fffff]
-Dec 11 09:28:21 uefi-linux kernel: pci 0000:00:0a.0:   bridge window [mem 
-0xc2f00000-0xc30fffff 64bit pref]
-Dec 11 09:28:21 uefi-linux kernel: pci 0000:04:0c.0: PCI bridge to [bus 05]
-Dec 11 09:28:21 uefi-linux kernel: pci 0000:04:0c.0:   bridge window [io  
-0xc000-0xcfff]
-Dec 11 09:28:21 uefi-linux kernel: pci 0000:04:0c.0:   bridge window [mem 
-0xc2000000-0xc21fffff]
-Dec 11 09:28:21 uefi-linux kernel: psmouse serio1: VMMouse at 
-isa0060/serio1/input0 lost sync at byte 1
-Dec 11 09:28:21 uefi-linux kernel: psmouse serio1: VMMouse at 
-isa0060/serio1/input0 - driver resynced.
-Dec 11 09:28:21 uefi-linux kernel: psmouse serio1: VMMouse at 
-isa0060/serio1/input0 lost sync at byte 1
-Dec 11 09:28:21 uefi-linux kernel: pci 0000:00:08.0: PCI bridge to [bus 01]
-Dec 11 09:28:21 uefi-linux kernel: pci 0000:00:08.0:   bridge window [io  
-0xf000-0xffff]
-Dec 11 09:28:21 uefi-linux kernel: psmouse serio1: VMMouse at 
-isa0060/serio1/input0 - driver resynced.
-Dec 11 09:28:21 uefi-linux kernel: pci 0000:00:08.0:   bridge window [mem 
-0xc2800000-0xc29fffff]
-Dec 11 09:28:21 uefi-linux kernel: pci 0000:00:08.0:   bridge window [mem 
-0xc2b00000-0xc2cfffff 64bit pref]
-Dec 11 09:28:21 uefi-linux kernel: pci 0000:00:09.0: PCI bridge to [bus 02]
-Dec 11 09:28:21 uefi-linux kernel: pci 0000:00:09.0:   bridge window [io  
-0xe000-0xefff]
-Dec 11 09:28:21 uefi-linux kernel: pci 0000:00:09.0:   bridge window [mem 
-0xc2600000-0xc27fffff]
-Dec 11 09:28:21 uefi-linux kernel: psmouse serio1: VMMouse at 
-isa0060/serio1/input0 lost sync at byte 1
-Dec 11 09:28:21 uefi-linux kernel: psmouse serio1: VMMouse at 
-isa0060/serio1/input0 lost sync at byte 1
-Dec 11 09:28:21 uefi-linux kernel: pci 0000:00:09.0:   bridge window [mem 
-0xc2d00000-0xc2efffff 64bit pref]
-Dec 11 09:28:21 uefi-linux kernel: psmouse serio1: VMMouse at 
-isa0060/serio1/input0 - driver resynced.
-Dec 11 09:28:21 uefi-linux kernel: pci 0000:00:0a.0: PCI bridge to [bus 03]
-Dec 11 09:28:21 uefi-linux kernel: pci 0000:00:0a.0:   bridge window [io  
-0xd000-0xdfff]
-Dec 11 09:28:21 uefi-linux kernel: pci 0000:00:0a.0:   bridge window [mem 
-0xc2400000-0xc25fffff]
-Dec 11 09:28:21 uefi-linux kernel: pci 0000:00:0a.0:   bridge window [mem 
-0xc2f00000-0xc30fffff 64bit pref]
-Dec 11 09:28:21 uefi-linux kernel: pci 0000:04:0c.0: PCI bridge to [bus 05]
-Dec 11 09:28:21 uefi-linux kernel: pci 0000:04:0c.0:   bridge window [io  
-0xc000-0xcfff]
-Dec 11 09:28:21 uefi-linux kernel: pci 0000:04:0c.0:   bridge window [mem 
-0xc2000000-0xc21fffff]
-Dec 11 09:28:21 uefi-linux kernel: pci 0000:00:08.0: PCI bridge to [bus 01]
-Dec 11 09:28:21 uefi-linux kernel: pci 0000:00:08.0:   bridge window [io  
-0xf000-0xffff]
-Dec 11 09:28:21 uefi-linux kernel: pci 0000:00:08.0:   bridge window [mem 
-0xc2800000-0xc29fffff]
-Dec 11 09:28:21 uefi-linux kernel: pci 0000:00:08.0:   bridge window [mem 
-0xc2b00000-0xc2cfffff 64bit pref]
-Dec 11 09:28:21 uefi-linux kernel: pci 0000:00:09.0: PCI bridge to [bus 02]
-Dec 11 09:28:21 uefi-linux kernel: pci 0000:00:09.0:   bridge window [io  
-0xe000-0xefff]
-Dec 11 09:28:21 uefi-linux kernel: pci 0000:00:09.0:   bridge window [mem 
-0xc2600000-0xc27fffff]
-Dec 11 09:28:21 uefi-linux kernel: pci 0000:00:09.0:   bridge window [mem 
-0xc2d00000-0xc2efffff 64bit pref]
-Dec 11 09:28:21 uefi-linux kernel: pci 0000:00:0a.0: PCI bridge to [bus 03]
-Dec 11 09:28:21 uefi-linux kernel: pci 0000:00:0a.0:   bridge window [io  
-0xd000-0xdfff]
-Dec 11 09:28:21 uefi-linux kernel: pci 0000:00:0a.0:   bridge window [mem 
-0xc2400000-0xc25fffff]
-Dec 11 09:28:21 uefi-linux kernel: pci 0000:00:0a.0:   bridge window [mem 
-0xc2f00000-0xc30fffff 64bit pref]
-Dec 11 09:28:21 uefi-linux kernel: pci 0000:04:0c.0: PCI bridge to [bus 05]
-Dec 11 09:28:21 uefi-linux kernel: pci 0000:04:0c.0:   bridge window [io  
-0xc000-0xcfff]
-Dec 11 09:28:21 uefi-linux kernel: pci 0000:04:0c.0:   bridge window [mem 
-0xc2000000-0xc21fffff]
-Dec 11 09:28:21 uefi-linux kernel: pci 0000:00:08.0: PCI bridge to [bus 01]
-Dec 11 09:28:21 uefi-linux kernel: pci 0000:00:08.0:   bridge window [io  
-0xf000-0xffff]
-Dec 11 09:28:21 uefi-linux kernel: pci 0000:00:08.0:   bridge window [mem 
-0xc2800000-0xc29fffff]
-Dec 11 09:28:21 uefi-linux kernel: pci 0000:00:08.0:   bridge window [mem 
-0xc2b00000-0xc2cfffff 64bit pref]
-Dec 11 09:28:21 uefi-linux kernel: pci 0000:00:09.0: PCI bridge to [bus 02]
-Dec 11 09:28:21 uefi-linux kernel: pci 0000:00:09.0:   bridge window [io  
-0xe000-0xefff]
-Dec 11 09:28:21 uefi-linux kernel: pci 0000:00:09.0:   bridge window [mem 
-0xc2600000-0xc27fffff]
-Dec 11 09:28:21 uefi-linux kernel: pci 0000:00:09.0:   bridge window [mem 
-0xc2d00000-0xc2efffff 64bit pref]
-Dec 11 09:28:21 uefi-linux kernel: pci 0000:00:0a.0: PCI bridge to [bus 03]
-Dec 11 09:28:21 uefi-linux kernel: pci 0000:00:0a.0:   bridge window [io  
-0xd000-0xdfff]
-Dec 11 09:28:21 uefi-linux kernel: pci 0000:00:0a.0:   bridge window [mem 
-0xc2400000-0xc25fffff]
-Dec 11 09:28:21 uefi-linux kernel: pci 0000:00:0a.0:   bridge window [mem 
-0xc2f00000-0xc30fffff 64bit pref]
-Dec 11 09:28:21 uefi-linux kernel: pci 0000:04:0c.0: PCI bridge to [bus 05]
-Dec 11 09:28:21 uefi-linux kernel: pci 0000:04:0c.0:   bridge window [io  
-0xc000-0xcfff]
-Dec 11 09:28:22 uefi-linux kernel: pci 0000:04:0c.0:   bridge window [mem 
-0xc2000000-0xc21fffff]
-Dec 11 09:28:22 uefi-linux kernel: pci 0000:00:08.0: PCI bridge to [bus 01]
-Dec 11 09:28:22 uefi-linux kernel: pci 0000:00:08.0:   bridge window [io  
-0xf000-0xffff]
-Dec 11 09:28:22 uefi-linux kernel: pci 0000:00:08.0:   bridge window [mem 
-0xc2800000-0xc29fffff]
-Dec 11 09:28:22 uefi-linux kernel: pci 0000:00:08.0:   bridge window [mem 
-0xc2b00000-0xc2cfffff 64bit pref]
-Dec 11 09:28:22 uefi-linux kernel: pci 0000:00:09.0: PCI bridge to [bus 02]
-Dec 11 09:28:22 uefi-linux kernel: pci 0000:00:09.0:   bridge window [io  
-0xe000-0xefff]
-Dec 11 09:28:22 uefi-linux kernel: pci 0000:00:09.0:   bridge window [mem 
-0xc2600000-0xc27fffff]
-Dec 11 09:28:22 uefi-linux kernel: pci 0000:00:09.0:   bridge window [mem 
-0xc2d00000-0xc2efffff 64bit pref]
-Dec 11 09:28:22 uefi-linux kernel: pci 0000:00:0a.0: PCI bridge to [bus 03]
-Dec 11 09:28:22 uefi-linux kernel: pci 0000:00:0a.0:   bridge window [io  
-0xd000-0xdfff]
-Dec 11 09:28:22 uefi-linux kernel: pci 0000:00:0a.0:   bridge window [mem 
-0xc2400000-0xc25fffff]
-Dec 11 09:28:22 uefi-linux kernel: pci 0000:00:0a.0:   bridge window [mem 
-0xc2f00000-0xc30fffff 64bit pref]
-Dec 11 09:28:22 uefi-linux kernel: pci 0000:04:0c.0: PCI bridge to [bus 05]
-Dec 11 09:28:22 uefi-linux kernel: pci 0000:04:0c.0:   bridge window [io  
-0xc000-0xcfff]
-Dec 11 09:28:22 uefi-linux kernel: pci 0000:04:0c.0:   bridge window [mem 
-0xc2000000-0xc21fffff]
-
->
->
->
-> >
->
-> >
->
-> >
->
-> > >
->
-> > >
->
-> > > diff --git a/hw/pci/pci_bridge.c b/hw/pci/pci_bridge.c
->
-> > >
->
-> > > index b2e50c3..84b405d 100644
->
-> > >
->
-> > > --- a/hw/pci/pci_bridge.c
->
-> > >
->
-> > > +++ b/hw/pci/pci_bridge.c
->
-> > >
->
-> > > @@ -256,7 +256,8 @@ void pci_bridge_write_config(PCIDevice *d,
->
-> > >
->
-> > >      pci_default_write_config(d, address, val, len);
->
-> > >
->
-> > > -    if (ranges_overlap(address, len, PCI_COMMAND, 2) ||
->
-> > >
->
-> > > +    if ( (val != 0xffffffff) &&
->
-> > >
->
-> > > +        (ranges_overlap(address, len, PCI_COMMAND, 2) ||
->
-> > >
->
-> > >          /* io base/limit */
->
-> > >
->
-> > >          ranges_overlap(address, len, PCI_IO_BASE, 2) ||
->
-> > >
->
-> > > @@ -266,7 +267,7 @@ void pci_bridge_write_config(PCIDevice *d,
->
-> > >
->
-> > >          ranges_overlap(address, len, PCI_MEMORY_BASE, 20) ||
->
-> > >
->
-> > >          /* vga enable */
->
-> > >
->
-> > > -        ranges_overlap(address, len, PCI_BRIDGE_CONTROL, 2)) {
->
-> > >
->
-> > > +        ranges_overlap(address, len, PCI_BRIDGE_CONTROL, 2))) {
->
-> > >
->
-> > >          pci_bridge_update_mappings(s);
->
-> > >
->
-> > >      }
->
-> > >
->
-> > >
->
-> > >
->
-> > > Thinks,
->
-> > >
->
-> > > Xu
->
-> > >
-
-On Tue, Dec 11, 2018 at 01:47:37AM +0000, xuyandong wrote:
->
-On Sat, Dec 08, 2018 at 11:58:59AM +0000, xuyandong wrote:
->
-> > > > Hi all,
->
-> > > >
->
-> > > >
->
-> > > >
->
-> > > > In our test, we configured VM with several pci-bridges and a
->
-> > > > virtio-net nic been attached with bus 4,
->
-> > > >
->
-> > > > After VM is startup, We ping this nic from host to judge if it is
->
-> > > > working normally. Then, we hot add pci devices to this VM with bus 0.
->
-> > > >
->
-> > > > We  found the virtio-net NIC in bus 4 is not working (can not
->
-> > > > connect) occasionally, as it kick virtio backend failure with error
->
-> > > > below:
->
-> > > >
->
-> > > >     Unassigned mem write 00000000fc803004 = 0x1
->
-> > > >
->
-> > > >
->
-> > > >
->
-> > > > memory-region: pci_bridge_pci
->
-> > > >
->
-> > > >   0000000000000000-ffffffffffffffff (prio 0, RW): pci_bridge_pci
->
-> > > >
->
-> > > >     00000000fc800000-00000000fc803fff (prio 1, RW): virtio-pci
->
-> > > >
->
-> > > >       00000000fc800000-00000000fc800fff (prio 0, RW):
->
-> > > > virtio-pci-common
->
-> > > >
->
-> > > >       00000000fc801000-00000000fc801fff (prio 0, RW):
->
-> > > > virtio-pci-isr
->
-> > > >
->
-> > > >       00000000fc802000-00000000fc802fff (prio 0, RW):
->
-> > > > virtio-pci-device
->
-> > > >
->
-> > > >       00000000fc803000-00000000fc803fff (prio 0, RW):
->
-> > > > virtio-pci-notify  <- io mem unassigned
->
-> > > >
->
-> > > >       â¦
->
-> > > >
->
-> > > >
->
-> > > >
->
-> > > > We caught an exceptional address changing while this problem
->
-> > > > happened, show as
->
-> > > > follow:
->
-> > > >
->
-> > > > Before pci_bridge_update_mappingsï¼
->
-> > > >
->
-> > > >       00000000fc000000-00000000fc1fffff (prio 1, RW): alias
->
-> > > > pci_bridge_pref_mem @pci_bridge_pci
->
-> > > > 00000000fc000000-00000000fc1fffff
->
-> > > >
->
-> > > >       00000000fc200000-00000000fc3fffff (prio 1, RW): alias
->
-> > > > pci_bridge_pref_mem @pci_bridge_pci
->
-> > > > 00000000fc200000-00000000fc3fffff
->
-> > > >
->
-> > > >       00000000fc400000-00000000fc5fffff (prio 1, RW): alias
->
-> > > > pci_bridge_pref_mem @pci_bridge_pci
->
-> > > > 00000000fc400000-00000000fc5fffff
->
-> > > >
->
-> > > >       00000000fc600000-00000000fc7fffff (prio 1, RW): alias
->
-> > > > pci_bridge_pref_mem @pci_bridge_pci
->
-> > > > 00000000fc600000-00000000fc7fffff
->
-> > > >
->
-> > > >       00000000fc800000-00000000fc9fffff (prio 1, RW): alias
->
-> > > > pci_bridge_pref_mem @pci_bridge_pci
->
-> > > > 00000000fc800000-00000000fc9fffff
->
-> > > > <- correct Adress Spce
->
-> > > >
->
-> > > >       00000000fca00000-00000000fcbfffff (prio 1, RW): alias
->
-> > > > pci_bridge_pref_mem @pci_bridge_pci
->
-> > > > 00000000fca00000-00000000fcbfffff
->
-> > > >
->
-> > > >       00000000fcc00000-00000000fcdfffff (prio 1, RW): alias
->
-> > > > pci_bridge_pref_mem @pci_bridge_pci
->
-> > > > 00000000fcc00000-00000000fcdfffff
->
-> > > >
->
-> > > >       00000000fce00000-00000000fcffffff (prio 1, RW): alias
->
-> > > > pci_bridge_pref_mem @pci_bridge_pci
->
-> > > > 00000000fce00000-00000000fcffffff
->
-> > > >
->
-> > > >
->
-> > > >
->
-> > > > After pci_bridge_update_mappingsï¼
->
-> > > >
->
-> > > >       00000000fda00000-00000000fdbfffff (prio 1, RW): alias
->
-> > > > pci_bridge_mem @pci_bridge_pci 00000000fda00000-00000000fdbfffff
->
-> > > >
->
-> > > >       00000000fdc00000-00000000fddfffff (prio 1, RW): alias
->
-> > > > pci_bridge_mem @pci_bridge_pci 00000000fdc00000-00000000fddfffff
->
-> > > >
->
-> > > >       00000000fde00000-00000000fdffffff (prio 1, RW): alias
->
-> > > > pci_bridge_mem @pci_bridge_pci 00000000fde00000-00000000fdffffff
->
-> > > >
->
-> > > >       00000000fe000000-00000000fe1fffff (prio 1, RW): alias
->
-> > > > pci_bridge_mem @pci_bridge_pci 00000000fe000000-00000000fe1fffff
->
-> > > >
->
-> > > >       00000000fe200000-00000000fe3fffff (prio 1, RW): alias
->
-> > > > pci_bridge_mem @pci_bridge_pci 00000000fe200000-00000000fe3fffff
->
-> > > >
->
-> > > >       00000000fe400000-00000000fe5fffff (prio 1, RW): alias
->
-> > > > pci_bridge_mem @pci_bridge_pci 00000000fe400000-00000000fe5fffff
->
-> > > >
->
-> > > >       00000000fe600000-00000000fe7fffff (prio 1, RW): alias
->
-> > > > pci_bridge_mem @pci_bridge_pci 00000000fe600000-00000000fe7fffff
->
-> > > >
->
-> > > >       00000000fe800000-00000000fe9fffff (prio 1, RW): alias
->
-> > > > pci_bridge_mem @pci_bridge_pci 00000000fe800000-00000000fe9fffff
->
-> > > >
->
-> > > >       fffffffffc800000-fffffffffc800000 (prio 1, RW): alias
->
-> pci_bridge_pref_mem
->
-> > > > @pci_bridge_pci fffffffffc800000-fffffffffc800000   <- Exceptional
->
-> > > > Adress
->
-> > > Space
->
-> > >
->
-> > > This one is empty though right?
->
-> > >
->
-> > > >
->
-> > > >
->
-> > > > We have figured out why this address becomes this value,
->
-> > > > according to pci spec,  pci driver can get BAR address size by
->
-> > > > writing 0xffffffff to
->
-> > > >
->
-> > > > the pci register firstly, and then read back the value from this
->
-> > > > register.
->
-> > >
->
-> > >
->
-> > > OK however as you show below the BAR being sized is the BAR if a
->
-> > > bridge. Are you then adding a bridge device by hotplug?
->
-> >
->
-> > No, I just simply hot plugged a VFIO device to Bus 0, another
->
-> > interesting phenomenon is If I hot plug the device to other bus, this
->
-> > doesn't
->
-> happened.
->
-> >
->
-> > >
->
-> > >
->
-> > > > We didn't handle this value  specially while process pci write in
->
-> > > > qemu, the function call stack is:
->
-> > > >
->
-> > > > Pci_bridge_dev_write_config
->
-> > > >
->
-> > > > -> pci_bridge_write_config
->
-> > > >
->
-> > > > -> pci_default_write_config (we update the config[address] value
->
-> > > > -> here to
->
-> > > > fffffffffc800000, which should be 0xfc800000 )
->
-> > > >
->
-> > > > -> pci_bridge_update_mappings
->
-> > > >
->
-> > > >                 ->pci_bridge_region_del(br, br->windows);
->
-> > > >
->
-> > > > -> pci_bridge_region_init
->
-> > > >
->
-> > > >                                                                 ->
->
-> > > > pci_bridge_init_alias (here pci_bridge_get_base, we use the wrong
->
-> > > > value
->
-> > > > fffffffffc800000)
->
-> > > >
->
-> > > >                                                 ->
->
-> > > > memory_region_transaction_commit
->
-> > > >
->
-> > > >
->
-> > > >
->
-> > > > So, as we can see, we use the wrong base address in qemu to update
->
-> > > > the memory regions, though, we update the base address to
->
-> > > >
->
-> > > > The correct value after pci driver in VM write the original value
->
-> > > > back, the virtio NIC in bus 4 may still sends net packets
->
-> > > > concurrently with
->
-> > > >
->
-> > > > The wrong memory region address.
->
-> > > >
->
-> > > >
->
-> > > >
->
-> > > > We have tried to skip the memory region update action in qemu
->
-> > > > while detect pci write with 0xffffffff value, and it does work,
->
-> > > > but
->
-> > > >
->
-> > > > This seems to be not gently.
->
-> > >
->
-> > > For sure. But I'm still puzzled as to why does Linux try to size the
->
-> > > BAR of the bridge while a device behind it is used.
->
-> > >
->
-> > > Can you pls post your QEMU command line?
->
-> >
->
-> > My QEMU command line:
->
-> > /root/xyd/qemu-system-x86_64 -name guest=Linux,debug-threads=on -S
->
-> > -object
->
-> > secret,id=masterKey0,format=raw,file=/var/run/libvirt/qemu/domain-194-
->
-> > Linux/master-key.aes -machine
->
-> > pc-i440fx-2.8,accel=kvm,usb=off,dump-guest-core=off -cpu
->
-> > host,+kvm_pv_eoi -bios /usr/share/OVMF/OVMF.fd -m
->
-> > size=4194304k,slots=256,maxmem=33554432k -realtime mlock=off -smp
->
-> > 20,sockets=20,cores=1,threads=1 -numa node,nodeid=0,cpus=0-4,mem=1024
->
-> > -numa node,nodeid=1,cpus=5-9,mem=1024 -numa
->
-> > node,nodeid=2,cpus=10-14,mem=1024 -numa
->
-> > node,nodeid=3,cpus=15-19,mem=1024 -uuid
->
-> > 34a588c7-b0f2-4952-b39c-47fae3411439 -no-user-config -nodefaults
->
-> > -chardev
->
-> > socket,id=charmonitor,path=/var/run/libvirt/qemu/domain-194-Linux/moni
->
-> > tor.sock,server,nowait -mon
->
-> > chardev=charmonitor,id=monitor,mode=control -rtc base=utc -no-hpet
->
-> > -global kvm-pit.lost_tick_policy=delay -no-shutdown -boot strict=on
->
-> > -device pci-bridge,chassis_nr=1,id=pci.1,bus=pci.0,addr=0x8 -device
->
-> > pci-bridge,chassis_nr=2,id=pci.2,bus=pci.0,addr=0x9 -device
->
-> > pci-bridge,chassis_nr=3,id=pci.3,bus=pci.0,addr=0xa -device
->
-> > pci-bridge,chassis_nr=4,id=pci.4,bus=pci.0,addr=0xb -device
->
-> > pci-bridge,chassis_nr=5,id=pci.5,bus=pci.0,addr=0xc -device
->
-> > piix3-usb-uhci,id=usb,bus=pci.0,addr=0x1.0x2 -device
->
-> > usb-ehci,id=usb1,bus=pci.0,addr=0x10 -device
->
-> > nec-usb-xhci,id=usb2,bus=pci.0,addr=0x11 -device
->
-> > virtio-scsi-pci,id=scsi0,bus=pci.0,addr=0x3 -device
->
-> > virtio-scsi-pci,id=scsi1,bus=pci.0,addr=0x4 -device
->
-> > virtio-scsi-pci,id=scsi2,bus=pci.0,addr=0x5 -device
->
-> > virtio-scsi-pci,id=scsi3,bus=pci.0,addr=0x6 -device
->
-> > virtio-serial-pci,id=virtio-serial0,bus=pci.0,addr=0x7 -drive
->
-> > file=/mnt/sdb/xml/centos_74_x64_uefi.raw,format=raw,if=none,id=drive-v
->
-> > irtio-disk0,cache=none -device
->
-> > virtio-blk-pci,scsi=off,bus=pci.0,addr=0x2,drive=drive-virtio-disk0,id
->
-> > =virtio-disk0,bootindex=1 -drive
->
-> > if=none,id=drive-ide0-1-1,readonly=on,cache=none -device
->
-> > ide-cd,bus=ide.1,unit=1,drive=drive-ide0-1-1,id=ide0-1-1 -netdev
->
-> > tap,fd=35,id=hostnet0 -device
->
-> > virtio-net-pci,netdev=hostnet0,id=net0,mac=52:54:00:89:5d:8b,bus=pci.4
->
-> > ,addr=0x1 -chardev pty,id=charserial0 -device
->
-> > isa-serial,chardev=charserial0,id=serial0 -device
->
-> > usb-tablet,id=input0,bus=usb.0,port=1 -vnc 0.0.0.0:0 -device
->
-> > cirrus-vga,id=video0,vgamem_mb=8,bus=pci.0,addr=0x12 -device
->
-> > virtio-balloon-pci,id=balloon0,bus=pci.0,addr=0xd -msg timestamp=on
->
-> >
->
-> > I am also very curious about this issue, in the linux kernel code, maybe
->
-> > double
->
-> check in function pci_bridge_check_ranges triggered this problem.
->
->
->
-> If you can get the stacktrace in Linux when it tries to write this fffff
->
-> value, that
->
-> would be quite helpful.
->
->
->
->
-After I add mdelay(100) in function pci_bridge_check_ranges, this phenomenon
->
-is
->
-easier to reproduce, below is my modify in kernel:
->
-diff --git a/drivers/pci/setup-bus.c b/drivers/pci/setup-bus.c
->
-index cb389277..86e232d 100644
->
---- a/drivers/pci/setup-bus.c
->
-+++ b/drivers/pci/setup-bus.c
->
-@@ -27,7 +27,7 @@
->
-#include <linux/slab.h>
->
-#include <linux/acpi.h>
->
-#include "pci.h"
->
--
->
-+#include <linux/delay.h>
->
-unsigned int pci_flags;
->
->
-struct pci_dev_resource {
->
-@@ -787,6 +787,9 @@ static void pci_bridge_check_ranges(struct pci_bus *bus)
->
-pci_write_config_dword(bridge, PCI_PREF_BASE_UPPER32,
->
-0xffffffff);
->
-pci_read_config_dword(bridge, PCI_PREF_BASE_UPPER32, &tmp);
->
-+               mdelay(100);
->
-+               printk(KERN_ERR "sleep\n");
->
-+                dump_stack();
->
-if (!tmp)
->
-b_res[2].flags &= ~IORESOURCE_MEM_64;
->
-pci_write_config_dword(bridge, PCI_PREF_BASE_UPPER32,
->
-OK!
-I just sent a Linux patch that should help.
-I would appreciate it if you will give it a try
-and if that helps reply to it with
-a Tested-by: tag.
-
--- 
-MST
-
-On Tue, Dec 11, 2018 at 01:47:37AM +0000, xuyandong wrote:
->
-> On Sat, Dec 08, 2018 at 11:58:59AM +0000, xuyandong wrote:
->
-> > > > > Hi all,
->
-> > > > >
->
-> > > > >
->
-> > > > >
->
-> > > > > In our test, we configured VM with several pci-bridges and a
->
-> > > > > virtio-net nic been attached with bus 4,
->
-> > > > >
->
-> > > > > After VM is startup, We ping this nic from host to judge if it
->
-> > > > > is working normally. Then, we hot add pci devices to this VM with
->
-> > > > > bus
->
-0.
->
-> > > > >
->
-> > > > > We  found the virtio-net NIC in bus 4 is not working (can not
->
-> > > > > connect) occasionally, as it kick virtio backend failure with error
->
-> > > > > below:
->
-> > > > >
->
-> > > > >     Unassigned mem write 00000000fc803004 = 0x1
->
-> > > > >
->
-> > > > >
->
-> > > > >
->
-> > > > > memory-region: pci_bridge_pci
->
-> > > > >
->
-> > > > >   0000000000000000-ffffffffffffffff (prio 0, RW):
->
-> > > > > pci_bridge_pci
->
-> > > > >
->
-> > > > >     00000000fc800000-00000000fc803fff (prio 1, RW): virtio-pci
->
-> > > > >
->
-> > > > >       00000000fc800000-00000000fc800fff (prio 0, RW):
->
-> > > > > virtio-pci-common
->
-> > > > >
->
-> > > > >       00000000fc801000-00000000fc801fff (prio 0, RW):
->
-> > > > > virtio-pci-isr
->
-> > > > >
->
-> > > > >       00000000fc802000-00000000fc802fff (prio 0, RW):
->
-> > > > > virtio-pci-device
->
-> > > > >
->
-> > > > >       00000000fc803000-00000000fc803fff (prio 0, RW):
->
-> > > > > virtio-pci-notify  <- io mem unassigned
->
-> > > > >
->
-> > > > >       â¦
->
-> > > > >
->
-> > > > >
->
-> > > > >
->
-> > > > > We caught an exceptional address changing while this problem
->
-> > > > > happened, show as
->
-> > > > > follow:
->
-> > > > >
->
-> > > > > Before pci_bridge_update_mappingsï¼
->
-> > > > >
->
-> > > > >       00000000fc000000-00000000fc1fffff (prio 1, RW): alias
->
-> > > > > pci_bridge_pref_mem @pci_bridge_pci
->
-> > > > > 00000000fc000000-00000000fc1fffff
->
-> > > > >
->
-> > > > >       00000000fc200000-00000000fc3fffff (prio 1, RW): alias
->
-> > > > > pci_bridge_pref_mem @pci_bridge_pci
->
-> > > > > 00000000fc200000-00000000fc3fffff
->
-> > > > >
->
-> > > > >       00000000fc400000-00000000fc5fffff (prio 1, RW): alias
->
-> > > > > pci_bridge_pref_mem @pci_bridge_pci
->
-> > > > > 00000000fc400000-00000000fc5fffff
->
-> > > > >
->
-> > > > >       00000000fc600000-00000000fc7fffff (prio 1, RW): alias
->
-> > > > > pci_bridge_pref_mem @pci_bridge_pci
->
-> > > > > 00000000fc600000-00000000fc7fffff
->
-> > > > >
->
-> > > > >       00000000fc800000-00000000fc9fffff (prio 1, RW): alias
->
-> > > > > pci_bridge_pref_mem @pci_bridge_pci
->
-> > > > > 00000000fc800000-00000000fc9fffff
->
-> > > > > <- correct Adress Spce
->
-> > > > >
->
-> > > > >       00000000fca00000-00000000fcbfffff (prio 1, RW): alias
->
-> > > > > pci_bridge_pref_mem @pci_bridge_pci
->
-> > > > > 00000000fca00000-00000000fcbfffff
->
-> > > > >
->
-> > > > >       00000000fcc00000-00000000fcdfffff (prio 1, RW): alias
->
-> > > > > pci_bridge_pref_mem @pci_bridge_pci
->
-> > > > > 00000000fcc00000-00000000fcdfffff
->
-> > > > >
->
-> > > > >       00000000fce00000-00000000fcffffff (prio 1, RW): alias
->
-> > > > > pci_bridge_pref_mem @pci_bridge_pci
->
-> > > > > 00000000fce00000-00000000fcffffff
->
-> > > > >
->
-> > > > >
->
-> > > > >
->
-> > > > > After pci_bridge_update_mappingsï¼
->
-> > > > >
->
-> > > > >       00000000fda00000-00000000fdbfffff (prio 1, RW): alias
->
-> > > > > pci_bridge_mem @pci_bridge_pci
->
-> > > > > 00000000fda00000-00000000fdbfffff
->
-> > > > >
->
-> > > > >       00000000fdc00000-00000000fddfffff (prio 1, RW): alias
->
-> > > > > pci_bridge_mem @pci_bridge_pci
->
-> > > > > 00000000fdc00000-00000000fddfffff
->
-> > > > >
->
-> > > > >       00000000fde00000-00000000fdffffff (prio 1, RW): alias
->
-> > > > > pci_bridge_mem @pci_bridge_pci
->
-> > > > > 00000000fde00000-00000000fdffffff
->
-> > > > >
->
-> > > > >       00000000fe000000-00000000fe1fffff (prio 1, RW): alias
->
-> > > > > pci_bridge_mem @pci_bridge_pci
->
-> > > > > 00000000fe000000-00000000fe1fffff
->
-> > > > >
->
-> > > > >       00000000fe200000-00000000fe3fffff (prio 1, RW): alias
->
-> > > > > pci_bridge_mem @pci_bridge_pci
->
-> > > > > 00000000fe200000-00000000fe3fffff
->
-> > > > >
->
-> > > > >       00000000fe400000-00000000fe5fffff (prio 1, RW): alias
->
-> > > > > pci_bridge_mem @pci_bridge_pci
->
-> > > > > 00000000fe400000-00000000fe5fffff
->
-> > > > >
->
-> > > > >       00000000fe600000-00000000fe7fffff (prio 1, RW): alias
->
-> > > > > pci_bridge_mem @pci_bridge_pci
->
-> > > > > 00000000fe600000-00000000fe7fffff
->
-> > > > >
->
-> > > > >       00000000fe800000-00000000fe9fffff (prio 1, RW): alias
->
-> > > > > pci_bridge_mem @pci_bridge_pci
->
-> > > > > 00000000fe800000-00000000fe9fffff
->
-> > > > >
->
-> > > > >       fffffffffc800000-fffffffffc800000 (prio 1, RW): alias
->
-> > pci_bridge_pref_mem
->
-> > > > > @pci_bridge_pci fffffffffc800000-fffffffffc800000   <- Exceptional
->
-Adress
->
-> > > > Space
->
-> > > >
->
-> > > > This one is empty though right?
->
-> > > >
->
-> > > > >
->
-> > > > >
->
-> > > > > We have figured out why this address becomes this value,
->
-> > > > > according to pci spec,  pci driver can get BAR address size by
->
-> > > > > writing 0xffffffff to
->
-> > > > >
->
-> > > > > the pci register firstly, and then read back the value from this
->
-> > > > > register.
->
-> > > >
->
-> > > >
->
-> > > > OK however as you show below the BAR being sized is the BAR if a
->
-> > > > bridge. Are you then adding a bridge device by hotplug?
->
-> > >
->
-> > > No, I just simply hot plugged a VFIO device to Bus 0, another
->
-> > > interesting phenomenon is If I hot plug the device to other bus,
->
-> > > this doesn't
->
-> > happened.
->
-> > >
->
-> > > >
->
-> > > >
->
-> > > > > We didn't handle this value  specially while process pci write
->
-> > > > > in qemu, the function call stack is:
->
-> > > > >
->
-> > > > > Pci_bridge_dev_write_config
->
-> > > > >
->
-> > > > > -> pci_bridge_write_config
->
-> > > > >
->
-> > > > > -> pci_default_write_config (we update the config[address]
->
-> > > > > -> value here to
->
-> > > > > fffffffffc800000, which should be 0xfc800000 )
->
-> > > > >
->
-> > > > > -> pci_bridge_update_mappings
->
-> > > > >
->
-> > > > >                 ->pci_bridge_region_del(br, br->windows);
->
-> > > > >
->
-> > > > > -> pci_bridge_region_init
->
-> > > > >
->
-> > > > >
->
-> > > > > -> pci_bridge_init_alias (here pci_bridge_get_base, we use the
->
-> > > > > wrong value
->
-> > > > > fffffffffc800000)
->
-> > > > >
->
-> > > > >                                                 ->
->
-> > > > > memory_region_transaction_commit
->
-> > > > >
->
-> > > > >
->
-> > > > >
->
-> > > > > So, as we can see, we use the wrong base address in qemu to
->
-> > > > > update the memory regions, though, we update the base address
->
-> > > > > to
->
-> > > > >
->
-> > > > > The correct value after pci driver in VM write the original
->
-> > > > > value back, the virtio NIC in bus 4 may still sends net
->
-> > > > > packets concurrently with
->
-> > > > >
->
-> > > > > The wrong memory region address.
->
-> > > > >
->
-> > > > >
->
-> > > > >
->
-> > > > > We have tried to skip the memory region update action in qemu
->
-> > > > > while detect pci write with 0xffffffff value, and it does
->
-> > > > > work, but
->
-> > > > >
->
-> > > > > This seems to be not gently.
->
-> > > >
->
-> > > > For sure. But I'm still puzzled as to why does Linux try to size
->
-> > > > the BAR of the bridge while a device behind it is used.
->
-> > > >
->
-> > > > Can you pls post your QEMU command line?
->
-> > >
->
-> > > My QEMU command line:
->
-> > > /root/xyd/qemu-system-x86_64 -name guest=Linux,debug-threads=on -S
->
-> > > -object
->
-> > > secret,id=masterKey0,format=raw,file=/var/run/libvirt/qemu/domain-
->
-> > > 194-
->
-> > > Linux/master-key.aes -machine
->
-> > > pc-i440fx-2.8,accel=kvm,usb=off,dump-guest-core=off -cpu
->
-> > > host,+kvm_pv_eoi -bios /usr/share/OVMF/OVMF.fd -m
->
-> > > size=4194304k,slots=256,maxmem=33554432k -realtime mlock=off -smp
->
-> > > 20,sockets=20,cores=1,threads=1 -numa
->
-> > > node,nodeid=0,cpus=0-4,mem=1024 -numa
->
-> > > node,nodeid=1,cpus=5-9,mem=1024 -numa
->
-> > > node,nodeid=2,cpus=10-14,mem=1024 -numa
->
-> > > node,nodeid=3,cpus=15-19,mem=1024 -uuid
->
-> > > 34a588c7-b0f2-4952-b39c-47fae3411439 -no-user-config -nodefaults
->
-> > > -chardev
->
-> > > socket,id=charmonitor,path=/var/run/libvirt/qemu/domain-194-Linux/
->
-> > > moni
->
-> > > tor.sock,server,nowait -mon
->
-> > > chardev=charmonitor,id=monitor,mode=control -rtc base=utc -no-hpet
->
-> > > -global kvm-pit.lost_tick_policy=delay -no-shutdown -boot
->
-> > > strict=on -device
->
-> > > pci-bridge,chassis_nr=1,id=pci.1,bus=pci.0,addr=0x8 -device
->
-> > > pci-bridge,chassis_nr=2,id=pci.2,bus=pci.0,addr=0x9 -device
->
-> > > pci-bridge,chassis_nr=3,id=pci.3,bus=pci.0,addr=0xa -device
->
-> > > pci-bridge,chassis_nr=4,id=pci.4,bus=pci.0,addr=0xb -device
->
-> > > pci-bridge,chassis_nr=5,id=pci.5,bus=pci.0,addr=0xc -device
->
-> > > piix3-usb-uhci,id=usb,bus=pci.0,addr=0x1.0x2 -device
->
-> > > usb-ehci,id=usb1,bus=pci.0,addr=0x10 -device
->
-> > > nec-usb-xhci,id=usb2,bus=pci.0,addr=0x11 -device
->
-> > > virtio-scsi-pci,id=scsi0,bus=pci.0,addr=0x3 -device
->
-> > > virtio-scsi-pci,id=scsi1,bus=pci.0,addr=0x4 -device
->
-> > > virtio-scsi-pci,id=scsi2,bus=pci.0,addr=0x5 -device
->
-> > > virtio-scsi-pci,id=scsi3,bus=pci.0,addr=0x6 -device
->
-> > > virtio-serial-pci,id=virtio-serial0,bus=pci.0,addr=0x7 -drive
->
-> > > file=/mnt/sdb/xml/centos_74_x64_uefi.raw,format=raw,if=none,id=dri
->
-> > > ve-v
->
-> > > irtio-disk0,cache=none -device
->
-> > > virtio-blk-pci,scsi=off,bus=pci.0,addr=0x2,drive=drive-virtio-disk
->
-> > > 0,id
->
-> > > =virtio-disk0,bootindex=1 -drive
->
-> > > if=none,id=drive-ide0-1-1,readonly=on,cache=none -device
->
-> > > ide-cd,bus=ide.1,unit=1,drive=drive-ide0-1-1,id=ide0-1-1 -netdev
->
-> > > tap,fd=35,id=hostnet0 -device
->
-> > > virtio-net-pci,netdev=hostnet0,id=net0,mac=52:54:00:89:5d:8b,bus=p
->
-> > > ci.4
->
-> > > ,addr=0x1 -chardev pty,id=charserial0 -device
->
-> > > isa-serial,chardev=charserial0,id=serial0 -device
->
-> > > usb-tablet,id=input0,bus=usb.0,port=1 -vnc 0.0.0.0:0 -device
->
-> > > cirrus-vga,id=video0,vgamem_mb=8,bus=pci.0,addr=0x12 -device
->
-> > > virtio-balloon-pci,id=balloon0,bus=pci.0,addr=0xd -msg
->
-> > > timestamp=on
->
-> > >
->
-> > > I am also very curious about this issue, in the linux kernel code,
->
-> > > maybe double
->
-> > check in function pci_bridge_check_ranges triggered this problem.
->
-> >
->
-> > If you can get the stacktrace in Linux when it tries to write this
->
-> > fffff value, that would be quite helpful.
->
-> >
->
->
->
-> After I add mdelay(100) in function pci_bridge_check_ranges, this
->
-> phenomenon is easier to reproduce, below is my modify in kernel:
->
-> diff --git a/drivers/pci/setup-bus.c b/drivers/pci/setup-bus.c index
->
-> cb389277..86e232d 100644
->
-> --- a/drivers/pci/setup-bus.c
->
-> +++ b/drivers/pci/setup-bus.c
->
-> @@ -27,7 +27,7 @@
->
->  #include <linux/slab.h>
->
->  #include <linux/acpi.h>
->
->  #include "pci.h"
->
-> -
->
-> +#include <linux/delay.h>
->
->  unsigned int pci_flags;
->
->
->
->  struct pci_dev_resource {
->
-> @@ -787,6 +787,9 @@ static void pci_bridge_check_ranges(struct pci_bus
->
-*bus)
->
->                 pci_write_config_dword(bridge, PCI_PREF_BASE_UPPER32,
->
->                                                0xffffffff);
->
->                 pci_read_config_dword(bridge, PCI_PREF_BASE_UPPER32,
->
-> &tmp);
->
-> +               mdelay(100);
->
-> +               printk(KERN_ERR "sleep\n");
->
-> +                dump_stack();
->
->                 if (!tmp)
->
->                         b_res[2].flags &= ~IORESOURCE_MEM_64;
->
->                 pci_write_config_dword(bridge, PCI_PREF_BASE_UPPER32,
->
->
->
->
-OK!
->
-I just sent a Linux patch that should help.
->
-I would appreciate it if you will give it a try and if that helps reply to it
->
-with a
->
-Tested-by: tag.
->
-I tested this patch and it works fine on my machine.
-
-But I have another question, if we only fix this problem in the kernel, the 
-Linux
-version that has been released does not work well on the virtualization 
-platform. 
-Is there a way to fix this problem in the backend?
-
->
---
->
-MST
-
-On Tue, Dec 11, 2018 at 02:55:43AM +0000, xuyandong wrote:
->
-On Tue, Dec 11, 2018 at 01:47:37AM +0000, xuyandong wrote:
->
-> > On Sat, Dec 08, 2018 at 11:58:59AM +0000, xuyandong wrote:
->
-> > > > > > Hi all,
->
-> > > > > >
->
-> > > > > >
->
-> > > > > >
->
-> > > > > > In our test, we configured VM with several pci-bridges and a
->
-> > > > > > virtio-net nic been attached with bus 4,
->
-> > > > > >
->
-> > > > > > After VM is startup, We ping this nic from host to judge if it
->
-> > > > > > is working normally. Then, we hot add pci devices to this VM with
->
-> > > > > > bus
->
-> 0.
->
-> > > > > >
->
-> > > > > > We  found the virtio-net NIC in bus 4 is not working (can not
->
-> > > > > > connect) occasionally, as it kick virtio backend failure with
->
-> > > > > > error below:
->
-> > > > > >
->
-> > > > > >     Unassigned mem write 00000000fc803004 = 0x1
->
-> > > > > >
->
-> > > > > >
->
-> > > > > >
->
-> > > > > > memory-region: pci_bridge_pci
->
-> > > > > >
->
-> > > > > >   0000000000000000-ffffffffffffffff (prio 0, RW):
->
-> > > > > > pci_bridge_pci
->
-> > > > > >
->
-> > > > > >     00000000fc800000-00000000fc803fff (prio 1, RW): virtio-pci
->
-> > > > > >
->
-> > > > > >       00000000fc800000-00000000fc800fff (prio 0, RW):
->
-> > > > > > virtio-pci-common
->
-> > > > > >
->
-> > > > > >       00000000fc801000-00000000fc801fff (prio 0, RW):
->
-> > > > > > virtio-pci-isr
->
-> > > > > >
->
-> > > > > >       00000000fc802000-00000000fc802fff (prio 0, RW):
->
-> > > > > > virtio-pci-device
->
-> > > > > >
->
-> > > > > >       00000000fc803000-00000000fc803fff (prio 0, RW):
->
-> > > > > > virtio-pci-notify  <- io mem unassigned
->
-> > > > > >
->
-> > > > > >       â¦
->
-> > > > > >
->
-> > > > > >
->
-> > > > > >
->
-> > > > > > We caught an exceptional address changing while this problem
->
-> > > > > > happened, show as
->
-> > > > > > follow:
->
-> > > > > >
->
-> > > > > > Before pci_bridge_update_mappingsï¼
->
-> > > > > >
->
-> > > > > >       00000000fc000000-00000000fc1fffff (prio 1, RW): alias
->
-> > > > > > pci_bridge_pref_mem @pci_bridge_pci
->
-> > > > > > 00000000fc000000-00000000fc1fffff
->
-> > > > > >
->
-> > > > > >       00000000fc200000-00000000fc3fffff (prio 1, RW): alias
->
-> > > > > > pci_bridge_pref_mem @pci_bridge_pci
->
-> > > > > > 00000000fc200000-00000000fc3fffff
->
-> > > > > >
->
-> > > > > >       00000000fc400000-00000000fc5fffff (prio 1, RW): alias
->
-> > > > > > pci_bridge_pref_mem @pci_bridge_pci
->
-> > > > > > 00000000fc400000-00000000fc5fffff
->
-> > > > > >
->
-> > > > > >       00000000fc600000-00000000fc7fffff (prio 1, RW): alias
->
-> > > > > > pci_bridge_pref_mem @pci_bridge_pci
->
-> > > > > > 00000000fc600000-00000000fc7fffff
->
-> > > > > >
->
-> > > > > >       00000000fc800000-00000000fc9fffff (prio 1, RW): alias
->
-> > > > > > pci_bridge_pref_mem @pci_bridge_pci
->
-> > > > > > 00000000fc800000-00000000fc9fffff
->
-> > > > > > <- correct Adress Spce
->
-> > > > > >
->
-> > > > > >       00000000fca00000-00000000fcbfffff (prio 1, RW): alias
->
-> > > > > > pci_bridge_pref_mem @pci_bridge_pci
->
-> > > > > > 00000000fca00000-00000000fcbfffff
->
-> > > > > >
->
-> > > > > >       00000000fcc00000-00000000fcdfffff (prio 1, RW): alias
->
-> > > > > > pci_bridge_pref_mem @pci_bridge_pci
->
-> > > > > > 00000000fcc00000-00000000fcdfffff
->
-> > > > > >
->
-> > > > > >       00000000fce00000-00000000fcffffff (prio 1, RW): alias
->
-> > > > > > pci_bridge_pref_mem @pci_bridge_pci
->
-> > > > > > 00000000fce00000-00000000fcffffff
->
-> > > > > >
->
-> > > > > >
->
-> > > > > >
->
-> > > > > > After pci_bridge_update_mappingsï¼
->
-> > > > > >
->
-> > > > > >       00000000fda00000-00000000fdbfffff (prio 1, RW): alias
->
-> > > > > > pci_bridge_mem @pci_bridge_pci
->
-> > > > > > 00000000fda00000-00000000fdbfffff
->
-> > > > > >
->
-> > > > > >       00000000fdc00000-00000000fddfffff (prio 1, RW): alias
->
-> > > > > > pci_bridge_mem @pci_bridge_pci
->
-> > > > > > 00000000fdc00000-00000000fddfffff
->
-> > > > > >
->
-> > > > > >       00000000fde00000-00000000fdffffff (prio 1, RW): alias
->
-> > > > > > pci_bridge_mem @pci_bridge_pci
->
-> > > > > > 00000000fde00000-00000000fdffffff
->
-> > > > > >
->
-> > > > > >       00000000fe000000-00000000fe1fffff (prio 1, RW): alias
->
-> > > > > > pci_bridge_mem @pci_bridge_pci
->
-> > > > > > 00000000fe000000-00000000fe1fffff
->
-> > > > > >
->
-> > > > > >       00000000fe200000-00000000fe3fffff (prio 1, RW): alias
->
-> > > > > > pci_bridge_mem @pci_bridge_pci
->
-> > > > > > 00000000fe200000-00000000fe3fffff
->
-> > > > > >
->
-> > > > > >       00000000fe400000-00000000fe5fffff (prio 1, RW): alias
->
-> > > > > > pci_bridge_mem @pci_bridge_pci
->
-> > > > > > 00000000fe400000-00000000fe5fffff
->
-> > > > > >
->
-> > > > > >       00000000fe600000-00000000fe7fffff (prio 1, RW): alias
->
-> > > > > > pci_bridge_mem @pci_bridge_pci
->
-> > > > > > 00000000fe600000-00000000fe7fffff
->
-> > > > > >
->
-> > > > > >       00000000fe800000-00000000fe9fffff (prio 1, RW): alias
->
-> > > > > > pci_bridge_mem @pci_bridge_pci
->
-> > > > > > 00000000fe800000-00000000fe9fffff
->
-> > > > > >
->
-> > > > > >       fffffffffc800000-fffffffffc800000 (prio 1, RW): alias
->
-> > > pci_bridge_pref_mem
->
-> > > > > > @pci_bridge_pci fffffffffc800000-fffffffffc800000   <- Exceptional
->
-> Adress
->
-> > > > > Space
->
-> > > > >
->
-> > > > > This one is empty though right?
->
-> > > > >
->
-> > > > > >
->
-> > > > > >
->
-> > > > > > We have figured out why this address becomes this value,
->
-> > > > > > according to pci spec,  pci driver can get BAR address size by
->
-> > > > > > writing 0xffffffff to
->
-> > > > > >
->
-> > > > > > the pci register firstly, and then read back the value from this
->
-> > > > > > register.
->
-> > > > >
->
-> > > > >
->
-> > > > > OK however as you show below the BAR being sized is the BAR if a
->
-> > > > > bridge. Are you then adding a bridge device by hotplug?
->
-> > > >
->
-> > > > No, I just simply hot plugged a VFIO device to Bus 0, another
->
-> > > > interesting phenomenon is If I hot plug the device to other bus,
->
-> > > > this doesn't
->
-> > > happened.
->
-> > > >
->
-> > > > >
->
-> > > > >
->
-> > > > > > We didn't handle this value  specially while process pci write
->
-> > > > > > in qemu, the function call stack is:
->
-> > > > > >
->
-> > > > > > Pci_bridge_dev_write_config
->
-> > > > > >
->
-> > > > > > -> pci_bridge_write_config
->
-> > > > > >
->
-> > > > > > -> pci_default_write_config (we update the config[address]
->
-> > > > > > -> value here to
->
-> > > > > > fffffffffc800000, which should be 0xfc800000 )
->
-> > > > > >
->
-> > > > > > -> pci_bridge_update_mappings
->
-> > > > > >
->
-> > > > > >                 ->pci_bridge_region_del(br, br->windows);
->
-> > > > > >
->
-> > > > > > -> pci_bridge_region_init
->
-> > > > > >
->
-> > > > > >
->
-> > > > > > -> pci_bridge_init_alias (here pci_bridge_get_base, we use the
->
-> > > > > > wrong value
->
-> > > > > > fffffffffc800000)
->
-> > > > > >
->
-> > > > > >                                                 ->
->
-> > > > > > memory_region_transaction_commit
->
-> > > > > >
->
-> > > > > >
->
-> > > > > >
->
-> > > > > > So, as we can see, we use the wrong base address in qemu to
->
-> > > > > > update the memory regions, though, we update the base address
->
-> > > > > > to
->
-> > > > > >
->
-> > > > > > The correct value after pci driver in VM write the original
->
-> > > > > > value back, the virtio NIC in bus 4 may still sends net
->
-> > > > > > packets concurrently with
->
-> > > > > >
->
-> > > > > > The wrong memory region address.
->
-> > > > > >
->
-> > > > > >
->
-> > > > > >
->
-> > > > > > We have tried to skip the memory region update action in qemu
->
-> > > > > > while detect pci write with 0xffffffff value, and it does
->
-> > > > > > work, but
->
-> > > > > >
->
-> > > > > > This seems to be not gently.
->
-> > > > >
->
-> > > > > For sure. But I'm still puzzled as to why does Linux try to size
->
-> > > > > the BAR of the bridge while a device behind it is used.
->
-> > > > >
->
-> > > > > Can you pls post your QEMU command line?
->
-> > > >
->
-> > > > My QEMU command line:
->
-> > > > /root/xyd/qemu-system-x86_64 -name guest=Linux,debug-threads=on -S
->
-> > > > -object
->
-> > > > secret,id=masterKey0,format=raw,file=/var/run/libvirt/qemu/domain-
->
-> > > > 194-
->
-> > > > Linux/master-key.aes -machine
->
-> > > > pc-i440fx-2.8,accel=kvm,usb=off,dump-guest-core=off -cpu
->
-> > > > host,+kvm_pv_eoi -bios /usr/share/OVMF/OVMF.fd -m
->
-> > > > size=4194304k,slots=256,maxmem=33554432k -realtime mlock=off -smp
->
-> > > > 20,sockets=20,cores=1,threads=1 -numa
->
-> > > > node,nodeid=0,cpus=0-4,mem=1024 -numa
->
-> > > > node,nodeid=1,cpus=5-9,mem=1024 -numa
->
-> > > > node,nodeid=2,cpus=10-14,mem=1024 -numa
->
-> > > > node,nodeid=3,cpus=15-19,mem=1024 -uuid
->
-> > > > 34a588c7-b0f2-4952-b39c-47fae3411439 -no-user-config -nodefaults
->
-> > > > -chardev
->
-> > > > socket,id=charmonitor,path=/var/run/libvirt/qemu/domain-194-Linux/
->
-> > > > moni
->
-> > > > tor.sock,server,nowait -mon
->
-> > > > chardev=charmonitor,id=monitor,mode=control -rtc base=utc -no-hpet
->
-> > > > -global kvm-pit.lost_tick_policy=delay -no-shutdown -boot
->
-> > > > strict=on -device
->
-> > > > pci-bridge,chassis_nr=1,id=pci.1,bus=pci.0,addr=0x8 -device
->
-> > > > pci-bridge,chassis_nr=2,id=pci.2,bus=pci.0,addr=0x9 -device
->
-> > > > pci-bridge,chassis_nr=3,id=pci.3,bus=pci.0,addr=0xa -device
->
-> > > > pci-bridge,chassis_nr=4,id=pci.4,bus=pci.0,addr=0xb -device
->
-> > > > pci-bridge,chassis_nr=5,id=pci.5,bus=pci.0,addr=0xc -device
->
-> > > > piix3-usb-uhci,id=usb,bus=pci.0,addr=0x1.0x2 -device
->
-> > > > usb-ehci,id=usb1,bus=pci.0,addr=0x10 -device
->
-> > > > nec-usb-xhci,id=usb2,bus=pci.0,addr=0x11 -device
->
-> > > > virtio-scsi-pci,id=scsi0,bus=pci.0,addr=0x3 -device
->
-> > > > virtio-scsi-pci,id=scsi1,bus=pci.0,addr=0x4 -device
->
-> > > > virtio-scsi-pci,id=scsi2,bus=pci.0,addr=0x5 -device
->
-> > > > virtio-scsi-pci,id=scsi3,bus=pci.0,addr=0x6 -device
->
-> > > > virtio-serial-pci,id=virtio-serial0,bus=pci.0,addr=0x7 -drive
->
-> > > > file=/mnt/sdb/xml/centos_74_x64_uefi.raw,format=raw,if=none,id=dri
->
-> > > > ve-v
->
-> > > > irtio-disk0,cache=none -device
->
-> > > > virtio-blk-pci,scsi=off,bus=pci.0,addr=0x2,drive=drive-virtio-disk
->
-> > > > 0,id
->
-> > > > =virtio-disk0,bootindex=1 -drive
->
-> > > > if=none,id=drive-ide0-1-1,readonly=on,cache=none -device
->
-> > > > ide-cd,bus=ide.1,unit=1,drive=drive-ide0-1-1,id=ide0-1-1 -netdev
->
-> > > > tap,fd=35,id=hostnet0 -device
->
-> > > > virtio-net-pci,netdev=hostnet0,id=net0,mac=52:54:00:89:5d:8b,bus=p
->
-> > > > ci.4
->
-> > > > ,addr=0x1 -chardev pty,id=charserial0 -device
->
-> > > > isa-serial,chardev=charserial0,id=serial0 -device
->
-> > > > usb-tablet,id=input0,bus=usb.0,port=1 -vnc 0.0.0.0:0 -device
->
-> > > > cirrus-vga,id=video0,vgamem_mb=8,bus=pci.0,addr=0x12 -device
->
-> > > > virtio-balloon-pci,id=balloon0,bus=pci.0,addr=0xd -msg
->
-> > > > timestamp=on
->
-> > > >
->
-> > > > I am also very curious about this issue, in the linux kernel code,
->
-> > > > maybe double
->
-> > > check in function pci_bridge_check_ranges triggered this problem.
->
-> > >
->
-> > > If you can get the stacktrace in Linux when it tries to write this
->
-> > > fffff value, that would be quite helpful.
->
-> > >
->
-> >
->
-> > After I add mdelay(100) in function pci_bridge_check_ranges, this
->
-> > phenomenon is easier to reproduce, below is my modify in kernel:
->
-> > diff --git a/drivers/pci/setup-bus.c b/drivers/pci/setup-bus.c index
->
-> > cb389277..86e232d 100644
->
-> > --- a/drivers/pci/setup-bus.c
->
-> > +++ b/drivers/pci/setup-bus.c
->
-> > @@ -27,7 +27,7 @@
->
-> >  #include <linux/slab.h>
->
-> >  #include <linux/acpi.h>
->
-> >  #include "pci.h"
->
-> > -
->
-> > +#include <linux/delay.h>
->
-> >  unsigned int pci_flags;
->
-> >
->
-> >  struct pci_dev_resource {
->
-> > @@ -787,6 +787,9 @@ static void pci_bridge_check_ranges(struct pci_bus
->
-> *bus)
->
-> >                 pci_write_config_dword(bridge, PCI_PREF_BASE_UPPER32,
->
-> >                                                0xffffffff);
->
-> >                 pci_read_config_dword(bridge, PCI_PREF_BASE_UPPER32,
->
-> > &tmp);
->
-> > +               mdelay(100);
->
-> > +               printk(KERN_ERR "sleep\n");
->
-> > +                dump_stack();
->
-> >                 if (!tmp)
->
-> >                         b_res[2].flags &= ~IORESOURCE_MEM_64;
->
-> >                 pci_write_config_dword(bridge, PCI_PREF_BASE_UPPER32,
->
-> >
->
->
->
-> OK!
->
-> I just sent a Linux patch that should help.
->
-> I would appreciate it if you will give it a try and if that helps reply to
->
-> it with a
->
-> Tested-by: tag.
->
->
->
->
-I tested this patch and it works fine on my machine.
->
->
-But I have another question, if we only fix this problem in the kernel, the
->
-Linux
->
-version that has been released does not work well on the virtualization
->
-platform.
->
-Is there a way to fix this problem in the backend?
-There could we a way to work around this.
-Does below help?
-
-diff --git a/hw/i386/acpi-build.c b/hw/i386/acpi-build.c
-index 236a20eaa8..7834cac4b0 100644
---- a/hw/i386/acpi-build.c
-+++ b/hw/i386/acpi-build.c
-@@ -551,7 +551,7 @@ static void build_append_pci_bus_devices(Aml *parent_scope, 
-PCIBus *bus,
- 
-         aml_append(method, aml_store(aml_int(bsel_val), aml_name("BNUM")));
-         aml_append(method,
--            aml_call2("DVNT", aml_name("PCIU"), aml_int(1) /* Device Check */)
-+            aml_call2("DVNT", aml_name("PCIU"), aml_int(4) /* Device Check 
-Light */)
-         );
-         aml_append(method,
-             aml_call2("DVNT", aml_name("PCID"), aml_int(3)/* Eject Request */)
-
->
-On Tue, Dec 11, 2018 at 02:55:43AM +0000, xuyandong wrote:
->
-> On Tue, Dec 11, 2018 at 01:47:37AM +0000, xuyandong wrote:
->
-> > > On Sat, Dec 08, 2018 at 11:58:59AM +0000, xuyandong wrote:
->
-> > > > > > > Hi all,
->
-> > > > > > >
->
-> > > > > > >
->
-> > > > > > >
->
-> > > > > > > In our test, we configured VM with several pci-bridges and
->
-> > > > > > > a virtio-net nic been attached with bus 4,
->
-> > > > > > >
->
-> > > > > > > After VM is startup, We ping this nic from host to judge
->
-> > > > > > > if it is working normally. Then, we hot add pci devices to
->
-> > > > > > > this VM with bus
->
-> > 0.
->
-> > > > > > >
->
-> > > > > > > We  found the virtio-net NIC in bus 4 is not working (can
->
-> > > > > > > not
->
-> > > > > > > connect) occasionally, as it kick virtio backend failure with
->
-> > > > > > > error
->
-below:
->
-> > > > > > >
->
-> > > > > > >     Unassigned mem write 00000000fc803004 = 0x1
->
-> > > > > > >
->
-> > > > > > >
->
-> > > > > > >
->
-> > > > > > > memory-region: pci_bridge_pci
->
-> > > > > > >
->
-> > > > > > >   0000000000000000-ffffffffffffffff (prio 0, RW):
->
-> > > > > > > pci_bridge_pci
->
-> > > > > > >
->
-> > > > > > >     00000000fc800000-00000000fc803fff (prio 1, RW):
->
-> > > > > > > virtio-pci
->
-> > > > > > >
->
-> > > > > > >       00000000fc800000-00000000fc800fff (prio 0, RW):
->
-> > > > > > > virtio-pci-common
->
-> > > > > > >
->
-> > > > > > >       00000000fc801000-00000000fc801fff (prio 0, RW):
->
-> > > > > > > virtio-pci-isr
->
-> > > > > > >
->
-> > > > > > >       00000000fc802000-00000000fc802fff (prio 0, RW):
->
-> > > > > > > virtio-pci-device
->
-> > > > > > >
->
-> > > > > > >       00000000fc803000-00000000fc803fff (prio 0, RW):
->
-> > > > > > > virtio-pci-notify  <- io mem unassigned
->
-> > > > > > >
->
-> > > > > > >       â¦
->
-> > > > > > >
->
-> > > > > > >
->
-> > > > > > >
->
-> > > > > > > We caught an exceptional address changing while this
->
-> > > > > > > problem happened, show as
->
-> > > > > > > follow:
->
-> > > > > > >
->
-> > > > > > > Before pci_bridge_update_mappingsï¼
->
-> > > > > > >
->
-> > > > > > >       00000000fc000000-00000000fc1fffff (prio 1, RW):
->
-> > > > > > > alias pci_bridge_pref_mem @pci_bridge_pci
->
-> > > > > > > 00000000fc000000-00000000fc1fffff
->
-> > > > > > >
->
-> > > > > > >       00000000fc200000-00000000fc3fffff (prio 1, RW):
->
-> > > > > > > alias pci_bridge_pref_mem @pci_bridge_pci
->
-> > > > > > > 00000000fc200000-00000000fc3fffff
->
-> > > > > > >
->
-> > > > > > >       00000000fc400000-00000000fc5fffff (prio 1, RW):
->
-> > > > > > > alias pci_bridge_pref_mem @pci_bridge_pci
->
-> > > > > > > 00000000fc400000-00000000fc5fffff
->
-> > > > > > >
->
-> > > > > > >       00000000fc600000-00000000fc7fffff (prio 1, RW):
->
-> > > > > > > alias pci_bridge_pref_mem @pci_bridge_pci
->
-> > > > > > > 00000000fc600000-00000000fc7fffff
->
-> > > > > > >
->
-> > > > > > >       00000000fc800000-00000000fc9fffff (prio 1, RW):
->
-> > > > > > > alias pci_bridge_pref_mem @pci_bridge_pci
->
-> > > > > > > 00000000fc800000-00000000fc9fffff
->
-> > > > > > > <- correct Adress Spce
->
-> > > > > > >
->
-> > > > > > >       00000000fca00000-00000000fcbfffff (prio 1, RW):
->
-> > > > > > > alias pci_bridge_pref_mem @pci_bridge_pci
->
-> > > > > > > 00000000fca00000-00000000fcbfffff
->
-> > > > > > >
->
-> > > > > > >       00000000fcc00000-00000000fcdfffff (prio 1, RW):
->
-> > > > > > > alias pci_bridge_pref_mem @pci_bridge_pci
->
-> > > > > > > 00000000fcc00000-00000000fcdfffff
->
-> > > > > > >
->
-> > > > > > >       00000000fce00000-00000000fcffffff (prio 1, RW):
->
-> > > > > > > alias pci_bridge_pref_mem @pci_bridge_pci
->
-> > > > > > > 00000000fce00000-00000000fcffffff
->
-> > > > > > >
->
-> > > > > > >
->
-> > > > > > >
->
-> > > > > > > After pci_bridge_update_mappingsï¼
->
-> > > > > > >
->
-> > > > > > >       00000000fda00000-00000000fdbfffff (prio 1, RW):
->
-> > > > > > > alias pci_bridge_mem @pci_bridge_pci
->
-> > > > > > > 00000000fda00000-00000000fdbfffff
->
-> > > > > > >
->
-> > > > > > >       00000000fdc00000-00000000fddfffff (prio 1, RW):
->
-> > > > > > > alias pci_bridge_mem @pci_bridge_pci
->
-> > > > > > > 00000000fdc00000-00000000fddfffff
->
-> > > > > > >
->
-> > > > > > >       00000000fde00000-00000000fdffffff (prio 1, RW):
->
-> > > > > > > alias pci_bridge_mem @pci_bridge_pci
->
-> > > > > > > 00000000fde00000-00000000fdffffff
->
-> > > > > > >
->
-> > > > > > >       00000000fe000000-00000000fe1fffff (prio 1, RW):
->
-> > > > > > > alias pci_bridge_mem @pci_bridge_pci
->
-> > > > > > > 00000000fe000000-00000000fe1fffff
->
-> > > > > > >
->
-> > > > > > >       00000000fe200000-00000000fe3fffff (prio 1, RW):
->
-> > > > > > > alias pci_bridge_mem @pci_bridge_pci
->
-> > > > > > > 00000000fe200000-00000000fe3fffff
->
-> > > > > > >
->
-> > > > > > >       00000000fe400000-00000000fe5fffff (prio 1, RW):
->
-> > > > > > > alias pci_bridge_mem @pci_bridge_pci
->
-> > > > > > > 00000000fe400000-00000000fe5fffff
->
-> > > > > > >
->
-> > > > > > >       00000000fe600000-00000000fe7fffff (prio 1, RW):
->
-> > > > > > > alias pci_bridge_mem @pci_bridge_pci
->
-> > > > > > > 00000000fe600000-00000000fe7fffff
->
-> > > > > > >
->
-> > > > > > >       00000000fe800000-00000000fe9fffff (prio 1, RW):
->
-> > > > > > > alias pci_bridge_mem @pci_bridge_pci
->
-> > > > > > > 00000000fe800000-00000000fe9fffff
->
-> > > > > > >
->
-> > > > > > >       fffffffffc800000-fffffffffc800000 (prio 1, RW):
->
-> > > > > > > alias
->
-> > > > pci_bridge_pref_mem
->
-> > > > > > > @pci_bridge_pci fffffffffc800000-fffffffffc800000   <-
->
-> > > > > > > Exceptional
->
-> > Adress
->
-> > > > > > Space
->
-> > > > > >
->
-> > > > > > This one is empty though right?
->
-> > > > > >
->
-> > > > > > >
->
-> > > > > > >
->
-> > > > > > > We have figured out why this address becomes this value,
->
-> > > > > > > according to pci spec,  pci driver can get BAR address
->
-> > > > > > > size by writing 0xffffffff to
->
-> > > > > > >
->
-> > > > > > > the pci register firstly, and then read back the value from this
->
-register.
->
-> > > > > >
->
-> > > > > >
->
-> > > > > > OK however as you show below the BAR being sized is the BAR
->
-> > > > > > if a bridge. Are you then adding a bridge device by hotplug?
->
-> > > > >
->
-> > > > > No, I just simply hot plugged a VFIO device to Bus 0, another
->
-> > > > > interesting phenomenon is If I hot plug the device to other
->
-> > > > > bus, this doesn't
->
-> > > > happened.
->
-> > > > >
->
-> > > > > >
->
-> > > > > >
->
-> > > > > > > We didn't handle this value  specially while process pci
->
-> > > > > > > write in qemu, the function call stack is:
->
-> > > > > > >
->
-> > > > > > > Pci_bridge_dev_write_config
->
-> > > > > > >
->
-> > > > > > > -> pci_bridge_write_config
->
-> > > > > > >
->
-> > > > > > > -> pci_default_write_config (we update the config[address]
->
-> > > > > > > -> value here to
->
-> > > > > > > fffffffffc800000, which should be 0xfc800000 )
->
-> > > > > > >
->
-> > > > > > > -> pci_bridge_update_mappings
->
-> > > > > > >
->
-> > > > > > >                 ->pci_bridge_region_del(br, br->windows);
->
-> > > > > > >
->
-> > > > > > > -> pci_bridge_region_init
->
-> > > > > > >
->
-> > > > > > >
->
-> > > > > > > -> pci_bridge_init_alias (here pci_bridge_get_base, we use
->
-> > > > > > > -> the
->
-> > > > > > > wrong value
->
-> > > > > > > fffffffffc800000)
->
-> > > > > > >
->
-> > > > > > >                                                 ->
->
-> > > > > > > memory_region_transaction_commit
->
-> > > > > > >
->
-> > > > > > >
->
-> > > > > > >
->
-> > > > > > > So, as we can see, we use the wrong base address in qemu
->
-> > > > > > > to update the memory regions, though, we update the base
->
-> > > > > > > address to
->
-> > > > > > >
->
-> > > > > > > The correct value after pci driver in VM write the
->
-> > > > > > > original value back, the virtio NIC in bus 4 may still
->
-> > > > > > > sends net packets concurrently with
->
-> > > > > > >
->
-> > > > > > > The wrong memory region address.
->
-> > > > > > >
->
-> > > > > > >
->
-> > > > > > >
->
-> > > > > > > We have tried to skip the memory region update action in
->
-> > > > > > > qemu while detect pci write with 0xffffffff value, and it
->
-> > > > > > > does work, but
->
-> > > > > > >
->
-> > > > > > > This seems to be not gently.
->
-> > > > > >
->
-> > > > > > For sure. But I'm still puzzled as to why does Linux try to
->
-> > > > > > size the BAR of the bridge while a device behind it is used.
->
-> > > > > >
->
-> > > > > > Can you pls post your QEMU command line?
->
-> > > > >
->
-> > > > > My QEMU command line:
->
-> > > > > /root/xyd/qemu-system-x86_64 -name
->
-> > > > > guest=Linux,debug-threads=on -S -object
->
-> > > > > secret,id=masterKey0,format=raw,file=/var/run/libvirt/qemu/dom
->
-> > > > > ain-
->
-> > > > > 194-
->
-> > > > > Linux/master-key.aes -machine
->
-> > > > > pc-i440fx-2.8,accel=kvm,usb=off,dump-guest-core=off -cpu
->
-> > > > > host,+kvm_pv_eoi -bios /usr/share/OVMF/OVMF.fd -m
->
-> > > > > size=4194304k,slots=256,maxmem=33554432k -realtime mlock=off
->
-> > > > > -smp
->
-> > > > > 20,sockets=20,cores=1,threads=1 -numa
->
-> > > > > node,nodeid=0,cpus=0-4,mem=1024 -numa
->
-> > > > > node,nodeid=1,cpus=5-9,mem=1024 -numa
->
-> > > > > node,nodeid=2,cpus=10-14,mem=1024 -numa
->
-> > > > > node,nodeid=3,cpus=15-19,mem=1024 -uuid
->
-> > > > > 34a588c7-b0f2-4952-b39c-47fae3411439 -no-user-config
->
-> > > > > -nodefaults -chardev
->
-> > > > > socket,id=charmonitor,path=/var/run/libvirt/qemu/domain-194-Li
->
-> > > > > nux/
->
-> > > > > moni
->
-> > > > > tor.sock,server,nowait -mon
->
-> > > > > chardev=charmonitor,id=monitor,mode=control -rtc base=utc
->
-> > > > > -no-hpet -global kvm-pit.lost_tick_policy=delay -no-shutdown
->
-> > > > > -boot strict=on -device
->
-> > > > > pci-bridge,chassis_nr=1,id=pci.1,bus=pci.0,addr=0x8 -device
->
-> > > > > pci-bridge,chassis_nr=2,id=pci.2,bus=pci.0,addr=0x9 -device
->
-> > > > > pci-bridge,chassis_nr=3,id=pci.3,bus=pci.0,addr=0xa -device
->
-> > > > > pci-bridge,chassis_nr=4,id=pci.4,bus=pci.0,addr=0xb -device
->
-> > > > > pci-bridge,chassis_nr=5,id=pci.5,bus=pci.0,addr=0xc -device
->
-> > > > > piix3-usb-uhci,id=usb,bus=pci.0,addr=0x1.0x2 -device
->
-> > > > > usb-ehci,id=usb1,bus=pci.0,addr=0x10 -device
->
-> > > > > nec-usb-xhci,id=usb2,bus=pci.0,addr=0x11 -device
->
-> > > > > virtio-scsi-pci,id=scsi0,bus=pci.0,addr=0x3 -device
->
-> > > > > virtio-scsi-pci,id=scsi1,bus=pci.0,addr=0x4 -device
->
-> > > > > virtio-scsi-pci,id=scsi2,bus=pci.0,addr=0x5 -device
->
-> > > > > virtio-scsi-pci,id=scsi3,bus=pci.0,addr=0x6 -device
->
-> > > > > virtio-serial-pci,id=virtio-serial0,bus=pci.0,addr=0x7 -drive
->
-> > > > > file=/mnt/sdb/xml/centos_74_x64_uefi.raw,format=raw,if=none,id
->
-> > > > > =dri
->
-> > > > > ve-v
->
-> > > > > irtio-disk0,cache=none -device
->
-> > > > > virtio-blk-pci,scsi=off,bus=pci.0,addr=0x2,drive=drive-virtio-
->
-> > > > > disk
->
-> > > > > 0,id
->
-> > > > > =virtio-disk0,bootindex=1 -drive
->
-> > > > > if=none,id=drive-ide0-1-1,readonly=on,cache=none -device
->
-> > > > > ide-cd,bus=ide.1,unit=1,drive=drive-ide0-1-1,id=ide0-1-1
->
-> > > > > -netdev
->
-> > > > > tap,fd=35,id=hostnet0 -device
->
-> > > > > virtio-net-pci,netdev=hostnet0,id=net0,mac=52:54:00:89:5d:8b,b
->
-> > > > > us=p
->
-> > > > > ci.4
->
-> > > > > ,addr=0x1 -chardev pty,id=charserial0 -device
->
-> > > > > isa-serial,chardev=charserial0,id=serial0 -device
->
-> > > > > usb-tablet,id=input0,bus=usb.0,port=1 -vnc 0.0.0.0:0 -device
->
-> > > > > cirrus-vga,id=video0,vgamem_mb=8,bus=pci.0,addr=0x12 -device
->
-> > > > > virtio-balloon-pci,id=balloon0,bus=pci.0,addr=0xd -msg
->
-> > > > > timestamp=on
->
-> > > > >
->
-> > > > > I am also very curious about this issue, in the linux kernel
->
-> > > > > code, maybe double
->
-> > > > check in function pci_bridge_check_ranges triggered this problem.
->
-> > > >
->
-> > > > If you can get the stacktrace in Linux when it tries to write
->
-> > > > this fffff value, that would be quite helpful.
->
-> > > >
->
-> > >
->
-> > > After I add mdelay(100) in function pci_bridge_check_ranges, this
->
-> > > phenomenon is easier to reproduce, below is my modify in kernel:
->
-> > > diff --git a/drivers/pci/setup-bus.c b/drivers/pci/setup-bus.c
->
-> > > index cb389277..86e232d 100644
->
-> > > --- a/drivers/pci/setup-bus.c
->
-> > > +++ b/drivers/pci/setup-bus.c
->
-> > > @@ -27,7 +27,7 @@
->
-> > >  #include <linux/slab.h>
->
-> > >  #include <linux/acpi.h>
->
-> > >  #include "pci.h"
->
-> > > -
->
-> > > +#include <linux/delay.h>
->
-> > >  unsigned int pci_flags;
->
-> > >
->
-> > >  struct pci_dev_resource {
->
-> > > @@ -787,6 +787,9 @@ static void pci_bridge_check_ranges(struct
->
-> > > pci_bus
->
-> > *bus)
->
-> > >                 pci_write_config_dword(bridge, PCI_PREF_BASE_UPPER32,
->
-> > >                                                0xffffffff);
->
-> > >                 pci_read_config_dword(bridge,
->
-> > > PCI_PREF_BASE_UPPER32, &tmp);
->
-> > > +               mdelay(100);
->
-> > > +               printk(KERN_ERR "sleep\n");
->
-> > > +                dump_stack();
->
-> > >                 if (!tmp)
->
-> > >                         b_res[2].flags &= ~IORESOURCE_MEM_64;
->
-> > >                 pci_write_config_dword(bridge,
->
-> > > PCI_PREF_BASE_UPPER32,
->
-> > >
->
-> >
->
-> > OK!
->
-> > I just sent a Linux patch that should help.
->
-> > I would appreciate it if you will give it a try and if that helps
->
-> > reply to it with a
->
-> > Tested-by: tag.
->
-> >
->
->
->
-> I tested this patch and it works fine on my machine.
->
->
->
-> But I have another question, if we only fix this problem in the
->
-> kernel, the Linux version that has been released does not work well on the
->
-virtualization platform.
->
-> Is there a way to fix this problem in the backend?
->
->
-There could we a way to work around this.
->
-Does below help?
-I am sorry to tell you, I tested this patch and it doesn't work fine.
-
->
->
-diff --git a/hw/i386/acpi-build.c b/hw/i386/acpi-build.c index
->
-236a20eaa8..7834cac4b0 100644
->
---- a/hw/i386/acpi-build.c
->
-+++ b/hw/i386/acpi-build.c
->
-@@ -551,7 +551,7 @@ static void build_append_pci_bus_devices(Aml
->
-*parent_scope, PCIBus *bus,
->
->
-aml_append(method, aml_store(aml_int(bsel_val), aml_name("BNUM")));
->
-aml_append(method,
->
--            aml_call2("DVNT", aml_name("PCIU"), aml_int(1) /* Device Check
->
-*/)
->
-+            aml_call2("DVNT", aml_name("PCIU"), aml_int(4) /* Device
->
-+ Check Light */)
->
-);
->
-aml_append(method,
->
-aml_call2("DVNT", aml_name("PCID"), aml_int(3)/* Eject Request
->
-*/)
-
-On Tue, Dec 11, 2018 at 03:51:09AM +0000, xuyandong wrote:
->
-> There could we a way to work around this.
->
-> Does below help?
->
->
-I am sorry to tell you, I tested this patch and it doesn't work fine.
-What happens?
-
->
->
->
-> diff --git a/hw/i386/acpi-build.c b/hw/i386/acpi-build.c index
->
-> 236a20eaa8..7834cac4b0 100644
->
-> --- a/hw/i386/acpi-build.c
->
-> +++ b/hw/i386/acpi-build.c
->
-> @@ -551,7 +551,7 @@ static void build_append_pci_bus_devices(Aml
->
-> *parent_scope, PCIBus *bus,
->
->
->
->          aml_append(method, aml_store(aml_int(bsel_val), aml_name("BNUM")));
->
->          aml_append(method,
->
-> -            aml_call2("DVNT", aml_name("PCIU"), aml_int(1) /* Device Check
->
-> */)
->
-> +            aml_call2("DVNT", aml_name("PCIU"), aml_int(4) /* Device
->
-> + Check Light */)
->
->          );
->
->          aml_append(method,
->
->              aml_call2("DVNT", aml_name("PCID"), aml_int(3)/* Eject Request
->
-> */)
-
-On Tue, Dec 11, 2018 at 03:51:09AM +0000, xuyandong wrote:
->
-> On Tue, Dec 11, 2018 at 02:55:43AM +0000, xuyandong wrote:
->
-> > On Tue, Dec 11, 2018 at 01:47:37AM +0000, xuyandong wrote:
->
-> > > > On Sat, Dec 08, 2018 at 11:58:59AM +0000, xuyandong wrote:
->
-> > > > > > > > Hi all,
->
-> > > > > > > >
->
-> > > > > > > >
->
-> > > > > > > >
->
-> > > > > > > > In our test, we configured VM with several pci-bridges and
->
-> > > > > > > > a virtio-net nic been attached with bus 4,
->
-> > > > > > > >
->
-> > > > > > > > After VM is startup, We ping this nic from host to judge
->
-> > > > > > > > if it is working normally. Then, we hot add pci devices to
->
-> > > > > > > > this VM with bus
->
-> > > 0.
->
-> > > > > > > >
->
-> > > > > > > > We  found the virtio-net NIC in bus 4 is not working (can
->
-> > > > > > > > not
->
-> > > > > > > > connect) occasionally, as it kick virtio backend failure with
->
-> > > > > > > > error
->
-> below:
->
-> > > > > > > >
->
-> > > > > > > >     Unassigned mem write 00000000fc803004 = 0x1
->
-> > > > > > > >
->
-> > > > > > > >
->
-> > > > > > > >
->
-> > > > > > > > memory-region: pci_bridge_pci
->
-> > > > > > > >
->
-> > > > > > > >   0000000000000000-ffffffffffffffff (prio 0, RW):
->
-> > > > > > > > pci_bridge_pci
->
-> > > > > > > >
->
-> > > > > > > >     00000000fc800000-00000000fc803fff (prio 1, RW):
->
-> > > > > > > > virtio-pci
->
-> > > > > > > >
->
-> > > > > > > >       00000000fc800000-00000000fc800fff (prio 0, RW):
->
-> > > > > > > > virtio-pci-common
->
-> > > > > > > >
->
-> > > > > > > >       00000000fc801000-00000000fc801fff (prio 0, RW):
->
-> > > > > > > > virtio-pci-isr
->
-> > > > > > > >
->
-> > > > > > > >       00000000fc802000-00000000fc802fff (prio 0, RW):
->
-> > > > > > > > virtio-pci-device
->
-> > > > > > > >
->
-> > > > > > > >       00000000fc803000-00000000fc803fff (prio 0, RW):
->
-> > > > > > > > virtio-pci-notify  <- io mem unassigned
->
-> > > > > > > >
->
-> > > > > > > >       â¦
->
-> > > > > > > >
->
-> > > > > > > >
->
-> > > > > > > >
->
-> > > > > > > > We caught an exceptional address changing while this
->
-> > > > > > > > problem happened, show as
->
-> > > > > > > > follow:
->
-> > > > > > > >
->
-> > > > > > > > Before pci_bridge_update_mappingsï¼
->
-> > > > > > > >
->
-> > > > > > > >       00000000fc000000-00000000fc1fffff (prio 1, RW):
->
-> > > > > > > > alias pci_bridge_pref_mem @pci_bridge_pci
->
-> > > > > > > > 00000000fc000000-00000000fc1fffff
->
-> > > > > > > >
->
-> > > > > > > >       00000000fc200000-00000000fc3fffff (prio 1, RW):
->
-> > > > > > > > alias pci_bridge_pref_mem @pci_bridge_pci
->
-> > > > > > > > 00000000fc200000-00000000fc3fffff
->
-> > > > > > > >
->
-> > > > > > > >       00000000fc400000-00000000fc5fffff (prio 1, RW):
->
-> > > > > > > > alias pci_bridge_pref_mem @pci_bridge_pci
->
-> > > > > > > > 00000000fc400000-00000000fc5fffff
->
-> > > > > > > >
->
-> > > > > > > >       00000000fc600000-00000000fc7fffff (prio 1, RW):
->
-> > > > > > > > alias pci_bridge_pref_mem @pci_bridge_pci
->
-> > > > > > > > 00000000fc600000-00000000fc7fffff
->
-> > > > > > > >
->
-> > > > > > > >       00000000fc800000-00000000fc9fffff (prio 1, RW):
->
-> > > > > > > > alias pci_bridge_pref_mem @pci_bridge_pci
->
-> > > > > > > > 00000000fc800000-00000000fc9fffff
->
-> > > > > > > > <- correct Adress Spce
->
-> > > > > > > >
->
-> > > > > > > >       00000000fca00000-00000000fcbfffff (prio 1, RW):
->
-> > > > > > > > alias pci_bridge_pref_mem @pci_bridge_pci
->
-> > > > > > > > 00000000fca00000-00000000fcbfffff
->
-> > > > > > > >
->
-> > > > > > > >       00000000fcc00000-00000000fcdfffff (prio 1, RW):
->
-> > > > > > > > alias pci_bridge_pref_mem @pci_bridge_pci
->
-> > > > > > > > 00000000fcc00000-00000000fcdfffff
->
-> > > > > > > >
->
-> > > > > > > >       00000000fce00000-00000000fcffffff (prio 1, RW):
->
-> > > > > > > > alias pci_bridge_pref_mem @pci_bridge_pci
->
-> > > > > > > > 00000000fce00000-00000000fcffffff
->
-> > > > > > > >
->
-> > > > > > > >
->
-> > > > > > > >
->
-> > > > > > > > After pci_bridge_update_mappingsï¼
->
-> > > > > > > >
->
-> > > > > > > >       00000000fda00000-00000000fdbfffff (prio 1, RW):
->
-> > > > > > > > alias pci_bridge_mem @pci_bridge_pci
->
-> > > > > > > > 00000000fda00000-00000000fdbfffff
->
-> > > > > > > >
->
-> > > > > > > >       00000000fdc00000-00000000fddfffff (prio 1, RW):
->
-> > > > > > > > alias pci_bridge_mem @pci_bridge_pci
->
-> > > > > > > > 00000000fdc00000-00000000fddfffff
->
-> > > > > > > >
->
-> > > > > > > >       00000000fde00000-00000000fdffffff (prio 1, RW):
->
-> > > > > > > > alias pci_bridge_mem @pci_bridge_pci
->
-> > > > > > > > 00000000fde00000-00000000fdffffff
->
-> > > > > > > >
->
-> > > > > > > >       00000000fe000000-00000000fe1fffff (prio 1, RW):
->
-> > > > > > > > alias pci_bridge_mem @pci_bridge_pci
->
-> > > > > > > > 00000000fe000000-00000000fe1fffff
->
-> > > > > > > >
->
-> > > > > > > >       00000000fe200000-00000000fe3fffff (prio 1, RW):
->
-> > > > > > > > alias pci_bridge_mem @pci_bridge_pci
->
-> > > > > > > > 00000000fe200000-00000000fe3fffff
->
-> > > > > > > >
->
-> > > > > > > >       00000000fe400000-00000000fe5fffff (prio 1, RW):
->
-> > > > > > > > alias pci_bridge_mem @pci_bridge_pci
->
-> > > > > > > > 00000000fe400000-00000000fe5fffff
->
-> > > > > > > >
->
-> > > > > > > >       00000000fe600000-00000000fe7fffff (prio 1, RW):
->
-> > > > > > > > alias pci_bridge_mem @pci_bridge_pci
->
-> > > > > > > > 00000000fe600000-00000000fe7fffff
->
-> > > > > > > >
->
-> > > > > > > >       00000000fe800000-00000000fe9fffff (prio 1, RW):
->
-> > > > > > > > alias pci_bridge_mem @pci_bridge_pci
->
-> > > > > > > > 00000000fe800000-00000000fe9fffff
->
-> > > > > > > >
->
-> > > > > > > >       fffffffffc800000-fffffffffc800000 (prio 1, RW):
->
-> > > > > > > > alias
->
-> > > > > pci_bridge_pref_mem
->
-> > > > > > > > @pci_bridge_pci fffffffffc800000-fffffffffc800000   <-
->
-> > > > > > > > Exceptional
->
-> > > Adress
->
-> > > > > > > Space
->
-> > > > > > >
->
-> > > > > > > This one is empty though right?
->
-> > > > > > >
->
-> > > > > > > >
->
-> > > > > > > >
->
-> > > > > > > > We have figured out why this address becomes this value,
->
-> > > > > > > > according to pci spec,  pci driver can get BAR address
->
-> > > > > > > > size by writing 0xffffffff to
->
-> > > > > > > >
->
-> > > > > > > > the pci register firstly, and then read back the value from
->
-> > > > > > > > this
->
-> register.
->
-> > > > > > >
->
-> > > > > > >
->
-> > > > > > > OK however as you show below the BAR being sized is the BAR
->
-> > > > > > > if a bridge. Are you then adding a bridge device by hotplug?
->
-> > > > > >
->
-> > > > > > No, I just simply hot plugged a VFIO device to Bus 0, another
->
-> > > > > > interesting phenomenon is If I hot plug the device to other
->
-> > > > > > bus, this doesn't
->
-> > > > > happened.
->
-> > > > > >
->
-> > > > > > >
->
-> > > > > > >
->
-> > > > > > > > We didn't handle this value  specially while process pci
->
-> > > > > > > > write in qemu, the function call stack is:
->
-> > > > > > > >
->
-> > > > > > > > Pci_bridge_dev_write_config
->
-> > > > > > > >
->
-> > > > > > > > -> pci_bridge_write_config
->
-> > > > > > > >
->
-> > > > > > > > -> pci_default_write_config (we update the config[address]
->
-> > > > > > > > -> value here to
->
-> > > > > > > > fffffffffc800000, which should be 0xfc800000 )
->
-> > > > > > > >
->
-> > > > > > > > -> pci_bridge_update_mappings
->
-> > > > > > > >
->
-> > > > > > > >                 ->pci_bridge_region_del(br, br->windows);
->
-> > > > > > > >
->
-> > > > > > > > -> pci_bridge_region_init
->
-> > > > > > > >
->
-> > > > > > > >
->
-> > > > > > > > -> pci_bridge_init_alias (here pci_bridge_get_base, we use
->
-> > > > > > > > -> the
->
-> > > > > > > > wrong value
->
-> > > > > > > > fffffffffc800000)
->
-> > > > > > > >
->
-> > > > > > > >                                                 ->
->
-> > > > > > > > memory_region_transaction_commit
->
-> > > > > > > >
->
-> > > > > > > >
->
-> > > > > > > >
->
-> > > > > > > > So, as we can see, we use the wrong base address in qemu
->
-> > > > > > > > to update the memory regions, though, we update the base
->
-> > > > > > > > address to
->
-> > > > > > > >
->
-> > > > > > > > The correct value after pci driver in VM write the
->
-> > > > > > > > original value back, the virtio NIC in bus 4 may still
->
-> > > > > > > > sends net packets concurrently with
->
-> > > > > > > >
->
-> > > > > > > > The wrong memory region address.
->
-> > > > > > > >
->
-> > > > > > > >
->
-> > > > > > > >
->
-> > > > > > > > We have tried to skip the memory region update action in
->
-> > > > > > > > qemu while detect pci write with 0xffffffff value, and it
->
-> > > > > > > > does work, but
->
-> > > > > > > >
->
-> > > > > > > > This seems to be not gently.
->
-> > > > > > >
->
-> > > > > > > For sure. But I'm still puzzled as to why does Linux try to
->
-> > > > > > > size the BAR of the bridge while a device behind it is used.
->
-> > > > > > >
->
-> > > > > > > Can you pls post your QEMU command line?
->
-> > > > > >
->
-> > > > > > My QEMU command line:
->
-> > > > > > /root/xyd/qemu-system-x86_64 -name
->
-> > > > > > guest=Linux,debug-threads=on -S -object
->
-> > > > > > secret,id=masterKey0,format=raw,file=/var/run/libvirt/qemu/dom
->
-> > > > > > ain-
->
-> > > > > > 194-
->
-> > > > > > Linux/master-key.aes -machine
->
-> > > > > > pc-i440fx-2.8,accel=kvm,usb=off,dump-guest-core=off -cpu
->
-> > > > > > host,+kvm_pv_eoi -bios /usr/share/OVMF/OVMF.fd -m
->
-> > > > > > size=4194304k,slots=256,maxmem=33554432k -realtime mlock=off
->
-> > > > > > -smp
->
-> > > > > > 20,sockets=20,cores=1,threads=1 -numa
->
-> > > > > > node,nodeid=0,cpus=0-4,mem=1024 -numa
->
-> > > > > > node,nodeid=1,cpus=5-9,mem=1024 -numa
->
-> > > > > > node,nodeid=2,cpus=10-14,mem=1024 -numa
->
-> > > > > > node,nodeid=3,cpus=15-19,mem=1024 -uuid
->
-> > > > > > 34a588c7-b0f2-4952-b39c-47fae3411439 -no-user-config
->
-> > > > > > -nodefaults -chardev
->
-> > > > > > socket,id=charmonitor,path=/var/run/libvirt/qemu/domain-194-Li
->
-> > > > > > nux/
->
-> > > > > > moni
->
-> > > > > > tor.sock,server,nowait -mon
->
-> > > > > > chardev=charmonitor,id=monitor,mode=control -rtc base=utc
->
-> > > > > > -no-hpet -global kvm-pit.lost_tick_policy=delay -no-shutdown
->
-> > > > > > -boot strict=on -device
->
-> > > > > > pci-bridge,chassis_nr=1,id=pci.1,bus=pci.0,addr=0x8 -device
->
-> > > > > > pci-bridge,chassis_nr=2,id=pci.2,bus=pci.0,addr=0x9 -device
->
-> > > > > > pci-bridge,chassis_nr=3,id=pci.3,bus=pci.0,addr=0xa -device
->
-> > > > > > pci-bridge,chassis_nr=4,id=pci.4,bus=pci.0,addr=0xb -device
->
-> > > > > > pci-bridge,chassis_nr=5,id=pci.5,bus=pci.0,addr=0xc -device
->
-> > > > > > piix3-usb-uhci,id=usb,bus=pci.0,addr=0x1.0x2 -device
->
-> > > > > > usb-ehci,id=usb1,bus=pci.0,addr=0x10 -device
->
-> > > > > > nec-usb-xhci,id=usb2,bus=pci.0,addr=0x11 -device
->
-> > > > > > virtio-scsi-pci,id=scsi0,bus=pci.0,addr=0x3 -device
->
-> > > > > > virtio-scsi-pci,id=scsi1,bus=pci.0,addr=0x4 -device
->
-> > > > > > virtio-scsi-pci,id=scsi2,bus=pci.0,addr=0x5 -device
->
-> > > > > > virtio-scsi-pci,id=scsi3,bus=pci.0,addr=0x6 -device
->
-> > > > > > virtio-serial-pci,id=virtio-serial0,bus=pci.0,addr=0x7 -drive
->
-> > > > > > file=/mnt/sdb/xml/centos_74_x64_uefi.raw,format=raw,if=none,id
->
-> > > > > > =dri
->
-> > > > > > ve-v
->
-> > > > > > irtio-disk0,cache=none -device
->
-> > > > > > virtio-blk-pci,scsi=off,bus=pci.0,addr=0x2,drive=drive-virtio-
->
-> > > > > > disk
->
-> > > > > > 0,id
->
-> > > > > > =virtio-disk0,bootindex=1 -drive
->
-> > > > > > if=none,id=drive-ide0-1-1,readonly=on,cache=none -device
->
-> > > > > > ide-cd,bus=ide.1,unit=1,drive=drive-ide0-1-1,id=ide0-1-1
->
-> > > > > > -netdev
->
-> > > > > > tap,fd=35,id=hostnet0 -device
->
-> > > > > > virtio-net-pci,netdev=hostnet0,id=net0,mac=52:54:00:89:5d:8b,b
->
-> > > > > > us=p
->
-> > > > > > ci.4
->
-> > > > > > ,addr=0x1 -chardev pty,id=charserial0 -device
->
-> > > > > > isa-serial,chardev=charserial0,id=serial0 -device
->
-> > > > > > usb-tablet,id=input0,bus=usb.0,port=1 -vnc 0.0.0.0:0 -device
->
-> > > > > > cirrus-vga,id=video0,vgamem_mb=8,bus=pci.0,addr=0x12 -device
->
-> > > > > > virtio-balloon-pci,id=balloon0,bus=pci.0,addr=0xd -msg
->
-> > > > > > timestamp=on
->
-> > > > > >
->
-> > > > > > I am also very curious about this issue, in the linux kernel
->
-> > > > > > code, maybe double
->
-> > > > > check in function pci_bridge_check_ranges triggered this problem.
->
-> > > > >
->
-> > > > > If you can get the stacktrace in Linux when it tries to write
->
-> > > > > this fffff value, that would be quite helpful.
->
-> > > > >
->
-> > > >
->
-> > > > After I add mdelay(100) in function pci_bridge_check_ranges, this
->
-> > > > phenomenon is easier to reproduce, below is my modify in kernel:
->
-> > > > diff --git a/drivers/pci/setup-bus.c b/drivers/pci/setup-bus.c
->
-> > > > index cb389277..86e232d 100644
->
-> > > > --- a/drivers/pci/setup-bus.c
->
-> > > > +++ b/drivers/pci/setup-bus.c
->
-> > > > @@ -27,7 +27,7 @@
->
-> > > >  #include <linux/slab.h>
->
-> > > >  #include <linux/acpi.h>
->
-> > > >  #include "pci.h"
->
-> > > > -
->
-> > > > +#include <linux/delay.h>
->
-> > > >  unsigned int pci_flags;
->
-> > > >
->
-> > > >  struct pci_dev_resource {
->
-> > > > @@ -787,6 +787,9 @@ static void pci_bridge_check_ranges(struct
->
-> > > > pci_bus
->
-> > > *bus)
->
-> > > >                 pci_write_config_dword(bridge, PCI_PREF_BASE_UPPER32,
->
-> > > >                                                0xffffffff);
->
-> > > >                 pci_read_config_dword(bridge,
->
-> > > > PCI_PREF_BASE_UPPER32, &tmp);
->
-> > > > +               mdelay(100);
->
-> > > > +               printk(KERN_ERR "sleep\n");
->
-> > > > +                dump_stack();
->
-> > > >                 if (!tmp)
->
-> > > >                         b_res[2].flags &= ~IORESOURCE_MEM_64;
->
-> > > >                 pci_write_config_dword(bridge,
->
-> > > > PCI_PREF_BASE_UPPER32,
->
-> > > >
->
-> > >
->
-> > > OK!
->
-> > > I just sent a Linux patch that should help.
->
-> > > I would appreciate it if you will give it a try and if that helps
->
-> > > reply to it with a
->
-> > > Tested-by: tag.
->
-> > >
->
-> >
->
-> > I tested this patch and it works fine on my machine.
->
-> >
->
-> > But I have another question, if we only fix this problem in the
->
-> > kernel, the Linux version that has been released does not work well on the
->
-> virtualization platform.
->
-> > Is there a way to fix this problem in the backend?
->
->
->
-> There could we a way to work around this.
->
-> Does below help?
->
->
-I am sorry to tell you, I tested this patch and it doesn't work fine.
->
->
->
->
-> diff --git a/hw/i386/acpi-build.c b/hw/i386/acpi-build.c index
->
-> 236a20eaa8..7834cac4b0 100644
->
-> --- a/hw/i386/acpi-build.c
->
-> +++ b/hw/i386/acpi-build.c
->
-> @@ -551,7 +551,7 @@ static void build_append_pci_bus_devices(Aml
->
-> *parent_scope, PCIBus *bus,
->
->
->
->          aml_append(method, aml_store(aml_int(bsel_val), aml_name("BNUM")));
->
->          aml_append(method,
->
-> -            aml_call2("DVNT", aml_name("PCIU"), aml_int(1) /* Device Check
->
-> */)
->
-> +            aml_call2("DVNT", aml_name("PCIU"), aml_int(4) /* Device
->
-> + Check Light */)
->
->          );
->
->          aml_append(method,
->
->              aml_call2("DVNT", aml_name("PCID"), aml_int(3)/* Eject Request
->
-> */)
-Oh I see, another bug:
-
-        case ACPI_NOTIFY_DEVICE_CHECK_LIGHT:
-                acpi_handle_debug(handle, "ACPI_NOTIFY_DEVICE_CHECK_LIGHT 
-event\n");
-                /* TBD: Exactly what does 'light' mean? */
-                break;
-
-And then e.g. acpi_generic_hotplug_event(struct acpi_device *adev, u32 type)
-and friends all just ignore this event type.
-
-
-
--- 
-MST
-
->
-> > > > > > > > > Hi all,
->
-> > > > > > > > >
->
-> > > > > > > > >
->
-> > > > > > > > >
->
-> > > > > > > > > In our test, we configured VM with several pci-bridges
->
-> > > > > > > > > and a virtio-net nic been attached with bus 4,
->
-> > > > > > > > >
->
-> > > > > > > > > After VM is startup, We ping this nic from host to
->
-> > > > > > > > > judge if it is working normally. Then, we hot add pci
->
-> > > > > > > > > devices to this VM with bus
->
-> > > > 0.
->
-> > > > > > > > >
->
-> > > > > > > > > We  found the virtio-net NIC in bus 4 is not working
->
-> > > > > > > > > (can not
->
-> > > > > > > > > connect) occasionally, as it kick virtio backend
->
-> > > > > > > > > failure with error
->
-> > > But I have another question, if we only fix this problem in the
->
-> > > kernel, the Linux version that has been released does not work
->
-> > > well on the
->
-> > virtualization platform.
->
-> > > Is there a way to fix this problem in the backend?
->
-> >
->
-> > There could we a way to work around this.
->
-> > Does below help?
->
->
->
-> I am sorry to tell you, I tested this patch and it doesn't work fine.
->
->
->
-> >
->
-> > diff --git a/hw/i386/acpi-build.c b/hw/i386/acpi-build.c index
->
-> > 236a20eaa8..7834cac4b0 100644
->
-> > --- a/hw/i386/acpi-build.c
->
-> > +++ b/hw/i386/acpi-build.c
->
-> > @@ -551,7 +551,7 @@ static void build_append_pci_bus_devices(Aml
->
-> > *parent_scope, PCIBus *bus,
->
-> >
->
-> >          aml_append(method, aml_store(aml_int(bsel_val),
->
-aml_name("BNUM")));
->
-> >          aml_append(method,
->
-> > -            aml_call2("DVNT", aml_name("PCIU"), aml_int(1) /* Device
->
-> > Check
->
-*/)
->
-> > +            aml_call2("DVNT", aml_name("PCIU"), aml_int(4) /*
->
-> > + Device Check Light */)
->
-> >          );
->
-> >          aml_append(method,
->
-> >              aml_call2("DVNT", aml_name("PCID"), aml_int(3)/* Eject
->
-> > Request */)
->
->
->
-Oh I see, another bug:
->
->
-case ACPI_NOTIFY_DEVICE_CHECK_LIGHT:
->
-acpi_handle_debug(handle, "ACPI_NOTIFY_DEVICE_CHECK_LIGHT
->
-event\n");
->
-/* TBD: Exactly what does 'light' mean? */
->
-break;
->
->
-And then e.g. acpi_generic_hotplug_event(struct acpi_device *adev, u32 type)
->
-and friends all just ignore this event type.
->
->
->
->
---
->
-MST
-Hi Michael,
-
-If we want to fix this problem on the backend, it is not enough to consider 
-only PCI
-device hot plugging, because I found that if we use a command like
-"echo 1 > /sys/bus/pci/rescan" in guest, this problem is very easy to reproduce.
-
-From the perspective of device emulation, when guest writes 0xffffffff to the 
-BAR,
-guest just want to get the size of the region but not really updating the 
-address space.
-So I made the following patch to avoid  update pci mapping.
-
-Do you think this make sense?
-
-[PATCH] pci: avoid update pci mapping when writing 0xFFFF FFFF to BAR
-
-When guest writes 0xffffffff to the BAR, guest just want to get the size of the 
-region
-but not really updating the address space.
-So when guest writes 0xffffffff to BAR, we need avoid pci_update_mappings 
-or pci_bridge_update_mappings.
-
-Signed-off-by: xuyandong <address@hidden>
----
- hw/pci/pci.c        | 6 ++++--
- hw/pci/pci_bridge.c | 8 +++++---
- 2 files changed, 9 insertions(+), 5 deletions(-)
-
-diff --git a/hw/pci/pci.c b/hw/pci/pci.c
-index 56b13b3..ef368e1 100644
---- a/hw/pci/pci.c
-+++ b/hw/pci/pci.c
-@@ -1361,6 +1361,7 @@ void pci_default_write_config(PCIDevice *d, uint32_t 
-addr, uint32_t val_in, int
- {
-     int i, was_irq_disabled = pci_irq_disabled(d);
-     uint32_t val = val_in;
-+    uint64_t barmask = (1 << l*8) - 1;
- 
-     for (i = 0; i < l; val >>= 8, ++i) {
-         uint8_t wmask = d->wmask[addr + i];
-@@ -1369,9 +1370,10 @@ void pci_default_write_config(PCIDevice *d, uint32_t 
-addr, uint32_t val_in, int
-         d->config[addr + i] = (d->config[addr + i] & ~wmask) | (val & wmask);
-         d->config[addr + i] &= ~(val & w1cmask); /* W1C: Write 1 to Clear */
-     }
--    if (ranges_overlap(addr, l, PCI_BASE_ADDRESS_0, 24) ||
-+    if ((val_in != barmask &&
-+       (ranges_overlap(addr, l, PCI_BASE_ADDRESS_0, 24) ||
-         ranges_overlap(addr, l, PCI_ROM_ADDRESS, 4) ||
--        ranges_overlap(addr, l, PCI_ROM_ADDRESS1, 4) ||
-+        ranges_overlap(addr, l, PCI_ROM_ADDRESS1, 4))) ||
-         range_covers_byte(addr, l, PCI_COMMAND))
-         pci_update_mappings(d);
- 
-diff --git a/hw/pci/pci_bridge.c b/hw/pci/pci_bridge.c
-index ee9dff2..f2bad79 100644
---- a/hw/pci/pci_bridge.c
-+++ b/hw/pci/pci_bridge.c
-@@ -253,17 +253,19 @@ void pci_bridge_write_config(PCIDevice *d,
-     PCIBridge *s = PCI_BRIDGE(d);
-     uint16_t oldctl = pci_get_word(d->config + PCI_BRIDGE_CONTROL);
-     uint16_t newctl;
-+    uint64_t barmask = (1 << len * 8) - 1;
- 
-     pci_default_write_config(d, address, val, len);
- 
-     if (ranges_overlap(address, len, PCI_COMMAND, 2) ||
- 
--        /* io base/limit */
--        ranges_overlap(address, len, PCI_IO_BASE, 2) ||
-+        (val != barmask &&
-+       /* io base/limit */
-+        (ranges_overlap(address, len, PCI_IO_BASE, 2) ||
- 
-         /* memory base/limit, prefetchable base/limit and
-            io base/limit upper 16 */
--        ranges_overlap(address, len, PCI_MEMORY_BASE, 20) ||
-+        ranges_overlap(address, len, PCI_MEMORY_BASE, 20))) ||
- 
-         /* vga enable */
-         ranges_overlap(address, len, PCI_BRIDGE_CONTROL, 2)) {
--- 
-1.8.3.1
-
-On Mon, Jan 07, 2019 at 02:37:17PM +0000, xuyandong wrote:
->
-> > > > > > > > > > Hi all,
->
-> > > > > > > > > >
->
-> > > > > > > > > >
->
-> > > > > > > > > >
->
-> > > > > > > > > > In our test, we configured VM with several pci-bridges
->
-> > > > > > > > > > and a virtio-net nic been attached with bus 4,
->
-> > > > > > > > > >
->
-> > > > > > > > > > After VM is startup, We ping this nic from host to
->
-> > > > > > > > > > judge if it is working normally. Then, we hot add pci
->
-> > > > > > > > > > devices to this VM with bus
->
-> > > > > 0.
->
-> > > > > > > > > >
->
-> > > > > > > > > > We  found the virtio-net NIC in bus 4 is not working
->
-> > > > > > > > > > (can not
->
-> > > > > > > > > > connect) occasionally, as it kick virtio backend
->
-> > > > > > > > > > failure with error
->
->
-> > > > But I have another question, if we only fix this problem in the
->
-> > > > kernel, the Linux version that has been released does not work
->
-> > > > well on the
->
-> > > virtualization platform.
->
-> > > > Is there a way to fix this problem in the backend?
->
-> > >
->
-> > > There could we a way to work around this.
->
-> > > Does below help?
->
-> >
->
-> > I am sorry to tell you, I tested this patch and it doesn't work fine.
->
-> >
->
-> > >
->
-> > > diff --git a/hw/i386/acpi-build.c b/hw/i386/acpi-build.c index
->
-> > > 236a20eaa8..7834cac4b0 100644
->
-> > > --- a/hw/i386/acpi-build.c
->
-> > > +++ b/hw/i386/acpi-build.c
->
-> > > @@ -551,7 +551,7 @@ static void build_append_pci_bus_devices(Aml
->
-> > > *parent_scope, PCIBus *bus,
->
-> > >
->
-> > >          aml_append(method, aml_store(aml_int(bsel_val),
->
-> aml_name("BNUM")));
->
-> > >          aml_append(method,
->
-> > > -            aml_call2("DVNT", aml_name("PCIU"), aml_int(1) /* Device
->
-> > > Check
->
-> */)
->
-> > > +            aml_call2("DVNT", aml_name("PCIU"), aml_int(4) /*
->
-> > > + Device Check Light */)
->
-> > >          );
->
-> > >          aml_append(method,
->
-> > >              aml_call2("DVNT", aml_name("PCID"), aml_int(3)/* Eject
->
-> > > Request */)
->
->
->
->
->
-> Oh I see, another bug:
->
->
->
->         case ACPI_NOTIFY_DEVICE_CHECK_LIGHT:
->
->                 acpi_handle_debug(handle, "ACPI_NOTIFY_DEVICE_CHECK_LIGHT
->
-> event\n");
->
->                 /* TBD: Exactly what does 'light' mean? */
->
->                 break;
->
->
->
-> And then e.g. acpi_generic_hotplug_event(struct acpi_device *adev, u32 type)
->
-> and friends all just ignore this event type.
->
->
->
->
->
->
->
-> --
->
-> MST
->
->
-Hi Michael,
->
->
-If we want to fix this problem on the backend, it is not enough to consider
->
-only PCI
->
-device hot plugging, because I found that if we use a command like
->
-"echo 1 > /sys/bus/pci/rescan" in guest, this problem is very easy to
->
-reproduce.
->
->
-From the perspective of device emulation, when guest writes 0xffffffff to the
->
-BAR,
->
-guest just want to get the size of the region but not really updating the
->
-address space.
->
-So I made the following patch to avoid  update pci mapping.
->
->
-Do you think this make sense?
->
->
-[PATCH] pci: avoid update pci mapping when writing 0xFFFF FFFF to BAR
->
->
-When guest writes 0xffffffff to the BAR, guest just want to get the size of
->
-the region
->
-but not really updating the address space.
->
-So when guest writes 0xffffffff to BAR, we need avoid pci_update_mappings
->
-or pci_bridge_update_mappings.
->
->
-Signed-off-by: xuyandong <address@hidden>
-I see how that will address the common case however there are a bunch of
-issues here.  First of all it's easy to trigger the update by some other
-action like VM migration.  More importantly it's just possible that
-guest actually does want to set the low 32 bit of the address to all
-ones.  For example, that is clearly listed as a way to disable all
-devices behind the bridge in the pci to pci bridge spec.
-
-Given upstream is dragging it's feet I'm open to adding a flag
-that will help keep guests going as a temporary measure.
-We will need to think about ways to restrict this as much as
-we can.
-
-
->
----
->
-hw/pci/pci.c        | 6 ++++--
->
-hw/pci/pci_bridge.c | 8 +++++---
->
-2 files changed, 9 insertions(+), 5 deletions(-)
->
->
-diff --git a/hw/pci/pci.c b/hw/pci/pci.c
->
-index 56b13b3..ef368e1 100644
->
---- a/hw/pci/pci.c
->
-+++ b/hw/pci/pci.c
->
-@@ -1361,6 +1361,7 @@ void pci_default_write_config(PCIDevice *d, uint32_t
->
-addr, uint32_t val_in, int
->
-{
->
-int i, was_irq_disabled = pci_irq_disabled(d);
->
-uint32_t val = val_in;
->
-+    uint64_t barmask = (1 << l*8) - 1;
->
->
-for (i = 0; i < l; val >>= 8, ++i) {
->
-uint8_t wmask = d->wmask[addr + i];
->
-@@ -1369,9 +1370,10 @@ void pci_default_write_config(PCIDevice *d, uint32_t
->
-addr, uint32_t val_in, int
->
-d->config[addr + i] = (d->config[addr + i] & ~wmask) | (val & wmask);
->
-d->config[addr + i] &= ~(val & w1cmask); /* W1C: Write 1 to Clear */
->
-}
->
--    if (ranges_overlap(addr, l, PCI_BASE_ADDRESS_0, 24) ||
->
-+    if ((val_in != barmask &&
->
-+     (ranges_overlap(addr, l, PCI_BASE_ADDRESS_0, 24) ||
->
-ranges_overlap(addr, l, PCI_ROM_ADDRESS, 4) ||
->
--        ranges_overlap(addr, l, PCI_ROM_ADDRESS1, 4) ||
->
-+        ranges_overlap(addr, l, PCI_ROM_ADDRESS1, 4))) ||
->
-range_covers_byte(addr, l, PCI_COMMAND))
->
-pci_update_mappings(d);
->
->
-diff --git a/hw/pci/pci_bridge.c b/hw/pci/pci_bridge.c
->
-index ee9dff2..f2bad79 100644
->
---- a/hw/pci/pci_bridge.c
->
-+++ b/hw/pci/pci_bridge.c
->
-@@ -253,17 +253,19 @@ void pci_bridge_write_config(PCIDevice *d,
->
-PCIBridge *s = PCI_BRIDGE(d);
->
-uint16_t oldctl = pci_get_word(d->config + PCI_BRIDGE_CONTROL);
->
-uint16_t newctl;
->
-+    uint64_t barmask = (1 << len * 8) - 1;
->
->
-pci_default_write_config(d, address, val, len);
->
->
-if (ranges_overlap(address, len, PCI_COMMAND, 2) ||
->
->
--        /* io base/limit */
->
--        ranges_overlap(address, len, PCI_IO_BASE, 2) ||
->
-+        (val != barmask &&
->
-+     /* io base/limit */
->
-+        (ranges_overlap(address, len, PCI_IO_BASE, 2) ||
->
->
-/* memory base/limit, prefetchable base/limit and
->
-io base/limit upper 16 */
->
--        ranges_overlap(address, len, PCI_MEMORY_BASE, 20) ||
->
-+        ranges_overlap(address, len, PCI_MEMORY_BASE, 20))) ||
->
->
-/* vga enable */
->
-ranges_overlap(address, len, PCI_BRIDGE_CONTROL, 2)) {
->
---
->
-1.8.3.1
->
->
->
-
->
------Original Message-----
->
-From: Michael S. Tsirkin [
-mailto:address@hidden
->
-Sent: Monday, January 07, 2019 11:06 PM
->
-To: xuyandong <address@hidden>
->
-Cc: address@hidden; Paolo Bonzini <address@hidden>; qemu-
->
-address@hidden; Zhanghailiang <address@hidden>;
->
-wangxin (U) <address@hidden>; Huangweidong (C)
->
-<address@hidden>
->
-Subject: Re: [BUG]Unassigned mem write during pci device hot-plug
->
->
-On Mon, Jan 07, 2019 at 02:37:17PM +0000, xuyandong wrote:
->
-> > > > > > > > > > > Hi all,
->
-> > > > > > > > > > >
->
-> > > > > > > > > > >
->
-> > > > > > > > > > >
->
-> > > > > > > > > > > In our test, we configured VM with several
->
-> > > > > > > > > > > pci-bridges and a virtio-net nic been attached
->
-> > > > > > > > > > > with bus 4,
->
-> > > > > > > > > > >
->
-> > > > > > > > > > > After VM is startup, We ping this nic from host to
->
-> > > > > > > > > > > judge if it is working normally. Then, we hot add
->
-> > > > > > > > > > > pci devices to this VM with bus
->
-> > > > > > 0.
->
-> > > > > > > > > > >
->
-> > > > > > > > > > > We  found the virtio-net NIC in bus 4 is not
->
-> > > > > > > > > > > working (can not
->
-> > > > > > > > > > > connect) occasionally, as it kick virtio backend
->
-> > > > > > > > > > > failure with error
->
->
->
-> > > > > But I have another question, if we only fix this problem in
->
-> > > > > the kernel, the Linux version that has been released does not
->
-> > > > > work well on the
->
-> > > > virtualization platform.
->
-> > > > > Is there a way to fix this problem in the backend?
->
->
->
-> Hi Michael,
->
->
->
-> If we want to fix this problem on the backend, it is not enough to
->
-> consider only PCI device hot plugging, because I found that if we use
->
-> a command like "echo 1 > /sys/bus/pci/rescan" in guest, this problem is very
->
-easy to reproduce.
->
->
->
-> From the perspective of device emulation, when guest writes 0xffffffff
->
-> to the BAR, guest just want to get the size of the region but not really
->
-updating the address space.
->
-> So I made the following patch to avoid  update pci mapping.
->
->
->
-> Do you think this make sense?
->
->
->
-> [PATCH] pci: avoid update pci mapping when writing 0xFFFF FFFF to BAR
->
->
->
-> When guest writes 0xffffffff to the BAR, guest just want to get the
->
-> size of the region but not really updating the address space.
->
-> So when guest writes 0xffffffff to BAR, we need avoid
->
-> pci_update_mappings or pci_bridge_update_mappings.
->
->
->
-> Signed-off-by: xuyandong <address@hidden>
->
->
-I see how that will address the common case however there are a bunch of
->
-issues here.  First of all it's easy to trigger the update by some other
->
-action like
->
-VM migration.  More importantly it's just possible that guest actually does
->
-want
->
-to set the low 32 bit of the address to all ones.  For example, that is
->
-clearly
->
-listed as a way to disable all devices behind the bridge in the pci to pci
->
-bridge
->
-spec.
-Ok, I see. If I only skip upate when guest writing 0xFFFFFFFF to Prefetcable 
-Base Upper 32 Bits
-to meet the kernel double check problem.
-Do you think there is still risk?
-
->
->
-Given upstream is dragging it's feet I'm open to adding a flag that will help
->
-keep guests going as a temporary measure.
->
-We will need to think about ways to restrict this as much as we can.
->
->
->
-> ---
->
->  hw/pci/pci.c        | 6 ++++--
->
->  hw/pci/pci_bridge.c | 8 +++++---
->
->  2 files changed, 9 insertions(+), 5 deletions(-)
->
->
->
-> diff --git a/hw/pci/pci.c b/hw/pci/pci.c index 56b13b3..ef368e1 100644
->
-> --- a/hw/pci/pci.c
->
-> +++ b/hw/pci/pci.c
->
-> @@ -1361,6 +1361,7 @@ void pci_default_write_config(PCIDevice *d,
->
-> uint32_t addr, uint32_t val_in, int  {
->
->      int i, was_irq_disabled = pci_irq_disabled(d);
->
->      uint32_t val = val_in;
->
-> +    uint64_t barmask = (1 << l*8) - 1;
->
->
->
->      for (i = 0; i < l; val >>= 8, ++i) {
->
->          uint8_t wmask = d->wmask[addr + i]; @@ -1369,9 +1370,10 @@
->
-> void pci_default_write_config(PCIDevice *d, uint32_t addr, uint32_t val_in,
->
-int
->
->          d->config[addr + i] = (d->config[addr + i] & ~wmask) | (val &
->
-> wmask);
->
->          d->config[addr + i] &= ~(val & w1cmask); /* W1C: Write 1 to Clear
->
-> */
->
->      }
->
-> -    if (ranges_overlap(addr, l, PCI_BASE_ADDRESS_0, 24) ||
->
-> +    if ((val_in != barmask &&
->
-> +   (ranges_overlap(addr, l, PCI_BASE_ADDRESS_0, 24) ||
->
->          ranges_overlap(addr, l, PCI_ROM_ADDRESS, 4) ||
->
-> -        ranges_overlap(addr, l, PCI_ROM_ADDRESS1, 4) ||
->
-> +        ranges_overlap(addr, l, PCI_ROM_ADDRESS1, 4))) ||
->
->          range_covers_byte(addr, l, PCI_COMMAND))
->
->          pci_update_mappings(d);
->
->
->
-> diff --git a/hw/pci/pci_bridge.c b/hw/pci/pci_bridge.c index
->
-> ee9dff2..f2bad79 100644
->
-> --- a/hw/pci/pci_bridge.c
->
-> +++ b/hw/pci/pci_bridge.c
->
-> @@ -253,17 +253,19 @@ void pci_bridge_write_config(PCIDevice *d,
->
->      PCIBridge *s = PCI_BRIDGE(d);
->
->      uint16_t oldctl = pci_get_word(d->config + PCI_BRIDGE_CONTROL);
->
->      uint16_t newctl;
->
-> +    uint64_t barmask = (1 << len * 8) - 1;
->
->
->
->      pci_default_write_config(d, address, val, len);
->
->
->
->      if (ranges_overlap(address, len, PCI_COMMAND, 2) ||
->
->
->
-> -        /* io base/limit */
->
-> -        ranges_overlap(address, len, PCI_IO_BASE, 2) ||
->
-> +        (val != barmask &&
->
-> +   /* io base/limit */
->
-> +        (ranges_overlap(address, len, PCI_IO_BASE, 2) ||
->
->
->
->          /* memory base/limit, prefetchable base/limit and
->
->             io base/limit upper 16 */
->
-> -        ranges_overlap(address, len, PCI_MEMORY_BASE, 20) ||
->
-> +        ranges_overlap(address, len, PCI_MEMORY_BASE, 20))) ||
->
->
->
->          /* vga enable */
->
->          ranges_overlap(address, len, PCI_BRIDGE_CONTROL, 2)) {
->
-> --
->
-> 1.8.3.1
->
->
->
->
->
->
-
-On Mon, Jan 07, 2019 at 03:28:36PM +0000, xuyandong wrote:
->
->
->
-> -----Original Message-----
->
-> From: Michael S. Tsirkin [
-mailto:address@hidden
->
-> Sent: Monday, January 07, 2019 11:06 PM
->
-> To: xuyandong <address@hidden>
->
-> Cc: address@hidden; Paolo Bonzini <address@hidden>; qemu-
->
-> address@hidden; Zhanghailiang <address@hidden>;
->
-> wangxin (U) <address@hidden>; Huangweidong (C)
->
-> <address@hidden>
->
-> Subject: Re: [BUG]Unassigned mem write during pci device hot-plug
->
->
->
-> On Mon, Jan 07, 2019 at 02:37:17PM +0000, xuyandong wrote:
->
-> > > > > > > > > > > > Hi all,
->
-> > > > > > > > > > > >
->
-> > > > > > > > > > > >
->
-> > > > > > > > > > > >
->
-> > > > > > > > > > > > In our test, we configured VM with several
->
-> > > > > > > > > > > > pci-bridges and a virtio-net nic been attached
->
-> > > > > > > > > > > > with bus 4,
->
-> > > > > > > > > > > >
->
-> > > > > > > > > > > > After VM is startup, We ping this nic from host to
->
-> > > > > > > > > > > > judge if it is working normally. Then, we hot add
->
-> > > > > > > > > > > > pci devices to this VM with bus
->
-> > > > > > > 0.
->
-> > > > > > > > > > > >
->
-> > > > > > > > > > > > We  found the virtio-net NIC in bus 4 is not
->
-> > > > > > > > > > > > working (can not
->
-> > > > > > > > > > > > connect) occasionally, as it kick virtio backend
->
-> > > > > > > > > > > > failure with error
->
-> >
->
-> > > > > > But I have another question, if we only fix this problem in
->
-> > > > > > the kernel, the Linux version that has been released does not
->
-> > > > > > work well on the
->
-> > > > > virtualization platform.
->
-> > > > > > Is there a way to fix this problem in the backend?
->
-> >
->
-> > Hi Michael,
->
-> >
->
-> > If we want to fix this problem on the backend, it is not enough to
->
-> > consider only PCI device hot plugging, because I found that if we use
->
-> > a command like "echo 1 > /sys/bus/pci/rescan" in guest, this problem is
->
-> > very
->
-> easy to reproduce.
->
-> >
->
-> > From the perspective of device emulation, when guest writes 0xffffffff
->
-> > to the BAR, guest just want to get the size of the region but not really
->
-> updating the address space.
->
-> > So I made the following patch to avoid  update pci mapping.
->
-> >
->
-> > Do you think this make sense?
->
-> >
->
-> > [PATCH] pci: avoid update pci mapping when writing 0xFFFF FFFF to BAR
->
-> >
->
-> > When guest writes 0xffffffff to the BAR, guest just want to get the
->
-> > size of the region but not really updating the address space.
->
-> > So when guest writes 0xffffffff to BAR, we need avoid
->
-> > pci_update_mappings or pci_bridge_update_mappings.
->
-> >
->
-> > Signed-off-by: xuyandong <address@hidden>
->
->
->
-> I see how that will address the common case however there are a bunch of
->
-> issues here.  First of all it's easy to trigger the update by some other
->
-> action like
->
-> VM migration.  More importantly it's just possible that guest actually does
->
-> want
->
-> to set the low 32 bit of the address to all ones.  For example, that is
->
-> clearly
->
-> listed as a way to disable all devices behind the bridge in the pci to pci
->
-> bridge
->
-> spec.
->
->
-Ok, I see. If I only skip upate when guest writing 0xFFFFFFFF to Prefetcable
->
-Base Upper 32 Bits
->
-to meet the kernel double check problem.
->
-Do you think there is still risk?
-Well it's non zero since spec says such a write should disable all
-accesses. Just an idea: why not add an option to disable upper 32 bit?
-That is ugly and limits space but spec compliant.
-
->
->
->
-> Given upstream is dragging it's feet I'm open to adding a flag that will
->
-> help
->
-> keep guests going as a temporary measure.
->
-> We will need to think about ways to restrict this as much as we can.
->
->
->
->
->
-> > ---
->
-> >  hw/pci/pci.c        | 6 ++++--
->
-> >  hw/pci/pci_bridge.c | 8 +++++---
->
-> >  2 files changed, 9 insertions(+), 5 deletions(-)
->
-> >
->
-> > diff --git a/hw/pci/pci.c b/hw/pci/pci.c index 56b13b3..ef368e1 100644
->
-> > --- a/hw/pci/pci.c
->
-> > +++ b/hw/pci/pci.c
->
-> > @@ -1361,6 +1361,7 @@ void pci_default_write_config(PCIDevice *d,
->
-> > uint32_t addr, uint32_t val_in, int  {
->
-> >      int i, was_irq_disabled = pci_irq_disabled(d);
->
-> >      uint32_t val = val_in;
->
-> > +    uint64_t barmask = (1 << l*8) - 1;
->
-> >
->
-> >      for (i = 0; i < l; val >>= 8, ++i) {
->
-> >          uint8_t wmask = d->wmask[addr + i]; @@ -1369,9 +1370,10 @@
->
-> > void pci_default_write_config(PCIDevice *d, uint32_t addr, uint32_t
->
-> > val_in,
->
-> int
->
-> >          d->config[addr + i] = (d->config[addr + i] & ~wmask) | (val &
->
-> > wmask);
->
-> >          d->config[addr + i] &= ~(val & w1cmask); /* W1C: Write 1 to
->
-> > Clear */
->
-> >      }
->
-> > -    if (ranges_overlap(addr, l, PCI_BASE_ADDRESS_0, 24) ||
->
-> > +    if ((val_in != barmask &&
->
-> > + (ranges_overlap(addr, l, PCI_BASE_ADDRESS_0, 24) ||
->
-> >          ranges_overlap(addr, l, PCI_ROM_ADDRESS, 4) ||
->
-> > -        ranges_overlap(addr, l, PCI_ROM_ADDRESS1, 4) ||
->
-> > +        ranges_overlap(addr, l, PCI_ROM_ADDRESS1, 4))) ||
->
-> >          range_covers_byte(addr, l, PCI_COMMAND))
->
-> >          pci_update_mappings(d);
->
-> >
->
-> > diff --git a/hw/pci/pci_bridge.c b/hw/pci/pci_bridge.c index
->
-> > ee9dff2..f2bad79 100644
->
-> > --- a/hw/pci/pci_bridge.c
->
-> > +++ b/hw/pci/pci_bridge.c
->
-> > @@ -253,17 +253,19 @@ void pci_bridge_write_config(PCIDevice *d,
->
-> >      PCIBridge *s = PCI_BRIDGE(d);
->
-> >      uint16_t oldctl = pci_get_word(d->config + PCI_BRIDGE_CONTROL);
->
-> >      uint16_t newctl;
->
-> > +    uint64_t barmask = (1 << len * 8) - 1;
->
-> >
->
-> >      pci_default_write_config(d, address, val, len);
->
-> >
->
-> >      if (ranges_overlap(address, len, PCI_COMMAND, 2) ||
->
-> >
->
-> > -        /* io base/limit */
->
-> > -        ranges_overlap(address, len, PCI_IO_BASE, 2) ||
->
-> > +        (val != barmask &&
->
-> > + /* io base/limit */
->
-> > +        (ranges_overlap(address, len, PCI_IO_BASE, 2) ||
->
-> >
->
-> >          /* memory base/limit, prefetchable base/limit and
->
-> >             io base/limit upper 16 */
->
-> > -        ranges_overlap(address, len, PCI_MEMORY_BASE, 20) ||
->
-> > +        ranges_overlap(address, len, PCI_MEMORY_BASE, 20))) ||
->
-> >
->
-> >          /* vga enable */
->
-> >          ranges_overlap(address, len, PCI_BRIDGE_CONTROL, 2)) {
->
-> > --
->
-> > 1.8.3.1
->
-> >
->
-> >
->
-> >
-
->
------Original Message-----
->
-From: xuyandong
->
-Sent: Monday, January 07, 2019 10:37 PM
->
-To: 'Michael S. Tsirkin' <address@hidden>
->
-Cc: address@hidden; Paolo Bonzini <address@hidden>; qemu-
->
-address@hidden; Zhanghailiang <address@hidden>;
->
-wangxin (U) <address@hidden>; Huangweidong (C)
->
-<address@hidden>
->
-Subject: RE: [BUG]Unassigned mem write during pci device hot-plug
->
->
-> > > > > > > > > > Hi all,
->
-> > > > > > > > > >
->
-> > > > > > > > > >
->
-> > > > > > > > > >
->
-> > > > > > > > > > In our test, we configured VM with several
->
-> > > > > > > > > > pci-bridges and a virtio-net nic been attached with
->
-> > > > > > > > > > bus 4,
->
-> > > > > > > > > >
->
-> > > > > > > > > > After VM is startup, We ping this nic from host to
->
-> > > > > > > > > > judge if it is working normally. Then, we hot add
->
-> > > > > > > > > > pci devices to this VM with bus
->
-> > > > > 0.
->
-> > > > > > > > > >
->
-> > > > > > > > > > We  found the virtio-net NIC in bus 4 is not working
->
-> > > > > > > > > > (can not
->
-> > > > > > > > > > connect) occasionally, as it kick virtio backend
->
-> > > > > > > > > > failure with error
->
->
-> > > > But I have another question, if we only fix this problem in the
->
-> > > > kernel, the Linux version that has been released does not work
->
-> > > > well on the
->
-> > > virtualization platform.
->
-> > > > Is there a way to fix this problem in the backend?
->
-> > >
->
-> > > There could we a way to work around this.
->
-> > > Does below help?
->
-> >
->
-> > I am sorry to tell you, I tested this patch and it doesn't work fine.
->
-> >
->
-> > >
->
-> > > diff --git a/hw/i386/acpi-build.c b/hw/i386/acpi-build.c index
->
-> > > 236a20eaa8..7834cac4b0 100644
->
-> > > --- a/hw/i386/acpi-build.c
->
-> > > +++ b/hw/i386/acpi-build.c
->
-> > > @@ -551,7 +551,7 @@ static void build_append_pci_bus_devices(Aml
->
-> > > *parent_scope, PCIBus *bus,
->
-> > >
->
-> > >          aml_append(method, aml_store(aml_int(bsel_val),
->
-> aml_name("BNUM")));
->
-> > >          aml_append(method,
->
-> > > -            aml_call2("DVNT", aml_name("PCIU"), aml_int(1) /* Device
->
-Check
->
-> */)
->
-> > > +            aml_call2("DVNT", aml_name("PCIU"), aml_int(4) /*
->
-> > > + Device Check Light */)
->
-> > >          );
->
-> > >          aml_append(method,
->
-> > >              aml_call2("DVNT", aml_name("PCID"), aml_int(3)/*
->
-> > > Eject Request */)
->
->
->
->
->
-> Oh I see, another bug:
->
->
->
->         case ACPI_NOTIFY_DEVICE_CHECK_LIGHT:
->
->                 acpi_handle_debug(handle,
->
-> "ACPI_NOTIFY_DEVICE_CHECK_LIGHT event\n");
->
->                 /* TBD: Exactly what does 'light' mean? */
->
->                 break;
->
->
->
-> And then e.g. acpi_generic_hotplug_event(struct acpi_device *adev, u32
->
-> type) and friends all just ignore this event type.
->
->
->
->
->
->
->
-> --
->
-> MST
->
->
-Hi Michael,
->
->
-If we want to fix this problem on the backend, it is not enough to consider
->
-only
->
-PCI device hot plugging, because I found that if we use a command like "echo
->
-1 >
->
-/sys/bus/pci/rescan" in guest, this problem is very easy to reproduce.
->
->
-From the perspective of device emulation, when guest writes 0xffffffff to the
->
-BAR, guest just want to get the size of the region but not really updating the
->
-address space.
->
-So I made the following patch to avoid  update pci mapping.
->
->
-Do you think this make sense?
->
->
-[PATCH] pci: avoid update pci mapping when writing 0xFFFF FFFF to BAR
->
->
-When guest writes 0xffffffff to the BAR, guest just want to get the size of
->
-the
->
-region but not really updating the address space.
->
-So when guest writes 0xffffffff to BAR, we need avoid pci_update_mappings or
->
-pci_bridge_update_mappings.
->
->
-Signed-off-by: xuyandong <address@hidden>
->
----
->
-hw/pci/pci.c        | 6 ++++--
->
-hw/pci/pci_bridge.c | 8 +++++---
->
-2 files changed, 9 insertions(+), 5 deletions(-)
->
->
-diff --git a/hw/pci/pci.c b/hw/pci/pci.c index 56b13b3..ef368e1 100644
->
---- a/hw/pci/pci.c
->
-+++ b/hw/pci/pci.c
->
-@@ -1361,6 +1361,7 @@ void pci_default_write_config(PCIDevice *d, uint32_t
->
-addr, uint32_t val_in, int  {
->
-int i, was_irq_disabled = pci_irq_disabled(d);
->
-uint32_t val = val_in;
->
-+    uint64_t barmask = (1 << l*8) - 1;
->
->
-for (i = 0; i < l; val >>= 8, ++i) {
->
-uint8_t wmask = d->wmask[addr + i]; @@ -1369,9 +1370,10 @@ void
->
-pci_default_write_config(PCIDevice *d, uint32_t addr, uint32_t val_in, int
->
-d->config[addr + i] = (d->config[addr + i] & ~wmask) | (val & wmask);
->
-d->config[addr + i] &= ~(val & w1cmask); /* W1C: Write 1 to Clear */
->
-}
->
--    if (ranges_overlap(addr, l, PCI_BASE_ADDRESS_0, 24) ||
->
-+    if ((val_in != barmask &&
->
-+     (ranges_overlap(addr, l, PCI_BASE_ADDRESS_0, 24) ||
->
-ranges_overlap(addr, l, PCI_ROM_ADDRESS, 4) ||
->
--        ranges_overlap(addr, l, PCI_ROM_ADDRESS1, 4) ||
->
-+        ranges_overlap(addr, l, PCI_ROM_ADDRESS1, 4))) ||
->
-range_covers_byte(addr, l, PCI_COMMAND))
->
-pci_update_mappings(d);
->
->
-diff --git a/hw/pci/pci_bridge.c b/hw/pci/pci_bridge.c index ee9dff2..f2bad79
->
-100644
->
---- a/hw/pci/pci_bridge.c
->
-+++ b/hw/pci/pci_bridge.c
->
-@@ -253,17 +253,19 @@ void pci_bridge_write_config(PCIDevice *d,
->
-PCIBridge *s = PCI_BRIDGE(d);
->
-uint16_t oldctl = pci_get_word(d->config + PCI_BRIDGE_CONTROL);
->
-uint16_t newctl;
->
-+    uint64_t barmask = (1 << len * 8) - 1;
->
->
-pci_default_write_config(d, address, val, len);
->
->
-if (ranges_overlap(address, len, PCI_COMMAND, 2) ||
->
->
--        /* io base/limit */
->
--        ranges_overlap(address, len, PCI_IO_BASE, 2) ||
->
-+        (val != barmask &&
->
-+     /* io base/limit */
->
-+        (ranges_overlap(address, len, PCI_IO_BASE, 2) ||
->
->
-/* memory base/limit, prefetchable base/limit and
->
-io base/limit upper 16 */
->
--        ranges_overlap(address, len, PCI_MEMORY_BASE, 20) ||
->
-+        ranges_overlap(address, len, PCI_MEMORY_BASE, 20))) ||
->
->
-/* vga enable */
->
-ranges_overlap(address, len, PCI_BRIDGE_CONTROL, 2)) {
->
---
->
-1.8.3.1
->
->
-Sorry, please ignore the patch above.
-
-Here is the patch I want to post:
-
-diff --git a/hw/pci/pci.c b/hw/pci/pci.c
-index 56b13b3..38a300f 100644
---- a/hw/pci/pci.c
-+++ b/hw/pci/pci.c
-@@ -1361,6 +1361,7 @@ void pci_default_write_config(PCIDevice *d, uint32_t 
-addr, uint32_t val_in, int
- {
-     int i, was_irq_disabled = pci_irq_disabled(d);
-     uint32_t val = val_in;
-+    uint64_t barmask = ((uint64_t)1 << l*8) - 1;
- 
-     for (i = 0; i < l; val >>= 8, ++i) {
-         uint8_t wmask = d->wmask[addr + i];
-@@ -1369,9 +1370,10 @@ void pci_default_write_config(PCIDevice *d, uint32_t 
-addr, uint32_t val_in, int
-         d->config[addr + i] = (d->config[addr + i] & ~wmask) | (val & wmask);
-         d->config[addr + i] &= ~(val & w1cmask); /* W1C: Write 1 to Clear */
-     }
--    if (ranges_overlap(addr, l, PCI_BASE_ADDRESS_0, 24) ||
-+    if ((val_in != barmask &&
-+       (ranges_overlap(addr, l, PCI_BASE_ADDRESS_0, 24) ||
-         ranges_overlap(addr, l, PCI_ROM_ADDRESS, 4) ||
--        ranges_overlap(addr, l, PCI_ROM_ADDRESS1, 4) ||
-+        ranges_overlap(addr, l, PCI_ROM_ADDRESS1, 4))) ||
-         range_covers_byte(addr, l, PCI_COMMAND))
-         pci_update_mappings(d);
- 
-diff --git a/hw/pci/pci_bridge.c b/hw/pci/pci_bridge.c
-index ee9dff2..b8f7d48 100644
---- a/hw/pci/pci_bridge.c
-+++ b/hw/pci/pci_bridge.c
-@@ -253,20 +253,22 @@ void pci_bridge_write_config(PCIDevice *d,
-     PCIBridge *s = PCI_BRIDGE(d);
-     uint16_t oldctl = pci_get_word(d->config + PCI_BRIDGE_CONTROL);
-     uint16_t newctl;
-+    uint64_t barmask = ((uint64_t)1 << len * 8) - 1;
- 
-     pci_default_write_config(d, address, val, len);
- 
-     if (ranges_overlap(address, len, PCI_COMMAND, 2) ||
- 
--        /* io base/limit */
--        ranges_overlap(address, len, PCI_IO_BASE, 2) ||
-+        /* vga enable */
-+        ranges_overlap(address, len, PCI_BRIDGE_CONTROL, 2) ||
- 
--        /* memory base/limit, prefetchable base/limit and
--           io base/limit upper 16 */
--        ranges_overlap(address, len, PCI_MEMORY_BASE, 20) ||
-+        (val != barmask &&
-+        /* io base/limit */
-+         (ranges_overlap(address, len, PCI_IO_BASE, 2) ||
- 
--        /* vga enable */
--        ranges_overlap(address, len, PCI_BRIDGE_CONTROL, 2)) {
-+         /* memory base/limit, prefetchable base/limit and
-+            io base/limit upper 16 */
-+         ranges_overlap(address, len, PCI_MEMORY_BASE, 20)))) {
-         pci_bridge_update_mappings(s);
-     }
- 
--- 
-1.8.3.1
-
diff --git a/classification_output/04/other/70416488 b/classification_output/04/other/70416488
deleted file mode 100644
index 6c0da5fdf..000000000
--- a/classification_output/04/other/70416488
+++ /dev/null
@@ -1,1187 +0,0 @@
-other: 0.980
-semantic: 0.975
-graphic: 0.972
-assembly: 0.953
-instruction: 0.947
-device: 0.945
-mistranslation: 0.942
-vnc: 0.910
-KVM: 0.908
-boot: 0.897
-network: 0.881
-socket: 0.870
-
-[Bug Report] smmuv3 event 0x10 report when running virtio-blk-pci
-
-Hi All,
-
-When I tested mainline qemu(commit 7b87a25f49), it reports smmuv3 event 0x10
-during kernel booting up.
-
-qemu command which I use is as below:
-
-qemu-system-aarch64 -machine virt,kernel_irqchip=on,gic-version=3,iommu=smmuv3 \
--kernel Image -initrd minifs.cpio.gz \
--enable-kvm -net none -nographic -m 3G -smp 6 -cpu host \
--append 'rdinit=init console=ttyAMA0 ealycon=pl0ll,0x90000000 maxcpus=3' \
--device 
-pcie-root-port,port=0x8,chassis=0,id=pci.0,bus=pcie.0,multifunction=on,addr=0x2 
-\
--device pcie-root-port,port=0x9,chassis=1,id=pci.1,bus=pcie.0,addr=0x2.0x1 \
--device 
-virtio-blk-pci,drive=drive0,id=virtblk0,num-queues=8,packed=on,bus=pci.1 \
--drive file=/home/boot.img,if=none,id=drive0,format=raw
-
-smmuv3 event 0x10 log:
-[...]
-[    1.962656] virtio-pci 0000:02:00.0: Adding to iommu group 0
-[    1.963150] virtio-pci 0000:02:00.0: enabling device (0000 -> 0002)
-[    1.964707] virtio_blk virtio0: 6/0/0 default/read/poll queues
-[    1.965759] virtio_blk virtio0: [vda] 2097152 512-byte logical blocks (1.07 
-GB/1.00 GiB)
-[    1.966934] arm-smmu-v3 9050000.smmuv3: event 0x10 received:
-[    1.967442] input: gpio-keys as /devices/platform/gpio-keys/input/input0
-[    1.967478] arm-smmu-v3 9050000.smmuv3:      0x0000020000000010
-[    1.968381] clk: Disabling unused clocks
-[    1.968677] arm-smmu-v3 9050000.smmuv3:      0x0000020000000000
-[    1.968990] PM: genpd: Disabling unused power domains
-[    1.969424] arm-smmu-v3 9050000.smmuv3:      0x0000000000000000
-[    1.969814] ALSA device list:
-[    1.970240] arm-smmu-v3 9050000.smmuv3:      0x0000000000000000
-[    1.970471]   No soundcards found.
-[    1.970902] arm-smmu-v3 9050000.smmuv3: event 0x10 received:
-[    1.971600] arm-smmu-v3 9050000.smmuv3:      0x0000020000000010
-[    1.971601] arm-smmu-v3 9050000.smmuv3:      0x0000020000000000
-[    1.971601] arm-smmu-v3 9050000.smmuv3:      0x0000000000000000
-[    1.971602] arm-smmu-v3 9050000.smmuv3:      0x0000000000000000
-[    1.971606] arm-smmu-v3 9050000.smmuv3: event 0x10 received:
-[    1.971607] arm-smmu-v3 9050000.smmuv3:      0x0000020000000010
-[    1.974202] arm-smmu-v3 9050000.smmuv3:      0x0000020000000000
-[    1.974634] arm-smmu-v3 9050000.smmuv3:      0x0000000000000000
-[    1.975005] Freeing unused kernel memory: 10112K
-[    1.975062] arm-smmu-v3 9050000.smmuv3:      0x0000000000000000
-[    1.975442] Run init as init process
-
-Another information is that if "maxcpus=3" is removed from the kernel command 
-line,
-it will be OK.
-
-I am not sure if there is a bug about vsmmu. It will be very appreciated if 
-anyone
-know this issue or can take a look at it.
-
-Thanks,
-Zhou
-
-On Mon, 9 Sept 2024 at 15:22, Zhou Wang via <qemu-devel@nongnu.org> wrote:
->
->
-Hi All,
->
->
-When I tested mainline qemu(commit 7b87a25f49), it reports smmuv3 event 0x10
->
-during kernel booting up.
-Does it still do this if you either:
- (1) use the v9.1.0 release (commit fd1952d814da)
- (2) use "-machine virt-9.1" instead of "-machine virt"
-
-?
-
-My suspicion is that this will have started happening now that
-we expose an SMMU with two-stage translation support to the guest
-in the "virt" machine type (which we do not if you either
-use virt-9.1 or in the v9.1.0 release).
-
-I've cc'd Eric (smmuv3 maintainer) and Mostafa (author of
-the two-stage support).
-
->
-qemu command which I use is as below:
->
->
-qemu-system-aarch64 -machine
->
-virt,kernel_irqchip=on,gic-version=3,iommu=smmuv3 \
->
--kernel Image -initrd minifs.cpio.gz \
->
--enable-kvm -net none -nographic -m 3G -smp 6 -cpu host \
->
--append 'rdinit=init console=ttyAMA0 ealycon=pl0ll,0x90000000 maxcpus=3' \
->
--device
->
-pcie-root-port,port=0x8,chassis=0,id=pci.0,bus=pcie.0,multifunction=on,addr=0x2
->
-\
->
--device pcie-root-port,port=0x9,chassis=1,id=pci.1,bus=pcie.0,addr=0x2.0x1 \
->
--device
->
-virtio-blk-pci,drive=drive0,id=virtblk0,num-queues=8,packed=on,bus=pci.1 \
->
--drive file=/home/boot.img,if=none,id=drive0,format=raw
->
->
-smmuv3 event 0x10 log:
->
-[...]
->
-[    1.962656] virtio-pci 0000:02:00.0: Adding to iommu group 0
->
-[    1.963150] virtio-pci 0000:02:00.0: enabling device (0000 -> 0002)
->
-[    1.964707] virtio_blk virtio0: 6/0/0 default/read/poll queues
->
-[    1.965759] virtio_blk virtio0: [vda] 2097152 512-byte logical blocks
->
-(1.07 GB/1.00 GiB)
->
-[    1.966934] arm-smmu-v3 9050000.smmuv3: event 0x10 received:
->
-[    1.967442] input: gpio-keys as /devices/platform/gpio-keys/input/input0
->
-[    1.967478] arm-smmu-v3 9050000.smmuv3:      0x0000020000000010
->
-[    1.968381] clk: Disabling unused clocks
->
-[    1.968677] arm-smmu-v3 9050000.smmuv3:      0x0000020000000000
->
-[    1.968990] PM: genpd: Disabling unused power domains
->
-[    1.969424] arm-smmu-v3 9050000.smmuv3:      0x0000000000000000
->
-[    1.969814] ALSA device list:
->
-[    1.970240] arm-smmu-v3 9050000.smmuv3:      0x0000000000000000
->
-[    1.970471]   No soundcards found.
->
-[    1.970902] arm-smmu-v3 9050000.smmuv3: event 0x10 received:
->
-[    1.971600] arm-smmu-v3 9050000.smmuv3:      0x0000020000000010
->
-[    1.971601] arm-smmu-v3 9050000.smmuv3:      0x0000020000000000
->
-[    1.971601] arm-smmu-v3 9050000.smmuv3:      0x0000000000000000
->
-[    1.971602] arm-smmu-v3 9050000.smmuv3:      0x0000000000000000
->
-[    1.971606] arm-smmu-v3 9050000.smmuv3: event 0x10 received:
->
-[    1.971607] arm-smmu-v3 9050000.smmuv3:      0x0000020000000010
->
-[    1.974202] arm-smmu-v3 9050000.smmuv3:      0x0000020000000000
->
-[    1.974634] arm-smmu-v3 9050000.smmuv3:      0x0000000000000000
->
-[    1.975005] Freeing unused kernel memory: 10112K
->
-[    1.975062] arm-smmu-v3 9050000.smmuv3:      0x0000000000000000
->
-[    1.975442] Run init as init process
->
->
-Another information is that if "maxcpus=3" is removed from the kernel command
->
-line,
->
-it will be OK.
->
->
-I am not sure if there is a bug about vsmmu. It will be very appreciated if
->
-anyone
->
-know this issue or can take a look at it.
-thanks
--- PMM
-
-On 2024/9/9 22:31, Peter Maydell wrote:
->
-On Mon, 9 Sept 2024 at 15:22, Zhou Wang via <qemu-devel@nongnu.org> wrote:
->
->
->
-> Hi All,
->
->
->
-> When I tested mainline qemu(commit 7b87a25f49), it reports smmuv3 event 0x10
->
-> during kernel booting up.
->
->
-Does it still do this if you either:
->
-(1) use the v9.1.0 release (commit fd1952d814da)
->
-(2) use "-machine virt-9.1" instead of "-machine virt"
-I tested above two cases, the problem is still there.
-
->
->
-?
->
->
-My suspicion is that this will have started happening now that
->
-we expose an SMMU with two-stage translation support to the guest
->
-in the "virt" machine type (which we do not if you either
->
-use virt-9.1 or in the v9.1.0 release).
->
->
-I've cc'd Eric (smmuv3 maintainer) and Mostafa (author of
->
-the two-stage support).
->
->
-> qemu command which I use is as below:
->
->
->
-> qemu-system-aarch64 -machine
->
-> virt,kernel_irqchip=on,gic-version=3,iommu=smmuv3 \
->
-> -kernel Image -initrd minifs.cpio.gz \
->
-> -enable-kvm -net none -nographic -m 3G -smp 6 -cpu host \
->
-> -append 'rdinit=init console=ttyAMA0 ealycon=pl0ll,0x90000000 maxcpus=3' \
->
-> -device
->
-> pcie-root-port,port=0x8,chassis=0,id=pci.0,bus=pcie.0,multifunction=on,addr=0x2
->
->  \
->
-> -device pcie-root-port,port=0x9,chassis=1,id=pci.1,bus=pcie.0,addr=0x2.0x1 \
->
-> -device
->
-> virtio-blk-pci,drive=drive0,id=virtblk0,num-queues=8,packed=on,bus=pci.1 \
->
-> -drive file=/home/boot.img,if=none,id=drive0,format=raw
->
->
->
-> smmuv3 event 0x10 log:
->
-> [...]
->
-> [    1.962656] virtio-pci 0000:02:00.0: Adding to iommu group 0
->
-> [    1.963150] virtio-pci 0000:02:00.0: enabling device (0000 -> 0002)
->
-> [    1.964707] virtio_blk virtio0: 6/0/0 default/read/poll queues
->
-> [    1.965759] virtio_blk virtio0: [vda] 2097152 512-byte logical blocks
->
-> (1.07 GB/1.00 GiB)
->
-> [    1.966934] arm-smmu-v3 9050000.smmuv3: event 0x10 received:
->
-> [    1.967442] input: gpio-keys as /devices/platform/gpio-keys/input/input0
->
-> [    1.967478] arm-smmu-v3 9050000.smmuv3:      0x0000020000000010
->
-> [    1.968381] clk: Disabling unused clocks
->
-> [    1.968677] arm-smmu-v3 9050000.smmuv3:      0x0000020000000000
->
-> [    1.968990] PM: genpd: Disabling unused power domains
->
-> [    1.969424] arm-smmu-v3 9050000.smmuv3:      0x0000000000000000
->
-> [    1.969814] ALSA device list:
->
-> [    1.970240] arm-smmu-v3 9050000.smmuv3:      0x0000000000000000
->
-> [    1.970471]   No soundcards found.
->
-> [    1.970902] arm-smmu-v3 9050000.smmuv3: event 0x10 received:
->
-> [    1.971600] arm-smmu-v3 9050000.smmuv3:      0x0000020000000010
->
-> [    1.971601] arm-smmu-v3 9050000.smmuv3:      0x0000020000000000
->
-> [    1.971601] arm-smmu-v3 9050000.smmuv3:      0x0000000000000000
->
-> [    1.971602] arm-smmu-v3 9050000.smmuv3:      0x0000000000000000
->
-> [    1.971606] arm-smmu-v3 9050000.smmuv3: event 0x10 received:
->
-> [    1.971607] arm-smmu-v3 9050000.smmuv3:      0x0000020000000010
->
-> [    1.974202] arm-smmu-v3 9050000.smmuv3:      0x0000020000000000
->
-> [    1.974634] arm-smmu-v3 9050000.smmuv3:      0x0000000000000000
->
-> [    1.975005] Freeing unused kernel memory: 10112K
->
-> [    1.975062] arm-smmu-v3 9050000.smmuv3:      0x0000000000000000
->
-> [    1.975442] Run init as init process
->
->
->
-> Another information is that if "maxcpus=3" is removed from the kernel
->
-> command line,
->
-> it will be OK.
->
->
->
-> I am not sure if there is a bug about vsmmu. It will be very appreciated if
->
-> anyone
->
-> know this issue or can take a look at it.
->
->
-thanks
->
--- PMM
->
-.
-
-Hi Zhou,
-On 9/10/24 03:24, Zhou Wang via wrote:
->
-On 2024/9/9 22:31, Peter Maydell wrote:
->
-> On Mon, 9 Sept 2024 at 15:22, Zhou Wang via <qemu-devel@nongnu.org> wrote:
->
->> Hi All,
->
->>
->
->> When I tested mainline qemu(commit 7b87a25f49), it reports smmuv3 event 0x10
->
->> during kernel booting up.
->
-> Does it still do this if you either:
->
->  (1) use the v9.1.0 release (commit fd1952d814da)
->
->  (2) use "-machine virt-9.1" instead of "-machine virt"
->
-I tested above two cases, the problem is still there.
-Thank you for reporting. I am able to reproduce and effectively the
-maxcpus kernel option is triggering the issue. It works without. I will
-come back to you asap.
-
-Eric
->
->
-> ?
->
->
->
-> My suspicion is that this will have started happening now that
->
-> we expose an SMMU with two-stage translation support to the guest
->
-> in the "virt" machine type (which we do not if you either
->
-> use virt-9.1 or in the v9.1.0 release).
->
->
->
-> I've cc'd Eric (smmuv3 maintainer) and Mostafa (author of
->
-> the two-stage support).
->
->
->
->> qemu command which I use is as below:
->
->>
->
->> qemu-system-aarch64 -machine
->
->> virt,kernel_irqchip=on,gic-version=3,iommu=smmuv3 \
->
->> -kernel Image -initrd minifs.cpio.gz \
->
->> -enable-kvm -net none -nographic -m 3G -smp 6 -cpu host \
->
->> -append 'rdinit=init console=ttyAMA0 ealycon=pl0ll,0x90000000 maxcpus=3' \
->
->> -device
->
->> pcie-root-port,port=0x8,chassis=0,id=pci.0,bus=pcie.0,multifunction=on,addr=0x2
->
->>  \
->
->> -device pcie-root-port,port=0x9,chassis=1,id=pci.1,bus=pcie.0,addr=0x2.0x1 \
->
->> -device
->
->> virtio-blk-pci,drive=drive0,id=virtblk0,num-queues=8,packed=on,bus=pci.1 \
->
->> -drive file=/home/boot.img,if=none,id=drive0,format=raw
->
->>
->
->> smmuv3 event 0x10 log:
->
->> [...]
->
->> [    1.962656] virtio-pci 0000:02:00.0: Adding to iommu group 0
->
->> [    1.963150] virtio-pci 0000:02:00.0: enabling device (0000 -> 0002)
->
->> [    1.964707] virtio_blk virtio0: 6/0/0 default/read/poll queues
->
->> [    1.965759] virtio_blk virtio0: [vda] 2097152 512-byte logical blocks
->
->> (1.07 GB/1.00 GiB)
->
->> [    1.966934] arm-smmu-v3 9050000.smmuv3: event 0x10 received:
->
->> [    1.967442] input: gpio-keys as /devices/platform/gpio-keys/input/input0
->
->> [    1.967478] arm-smmu-v3 9050000.smmuv3:      0x0000020000000010
->
->> [    1.968381] clk: Disabling unused clocks
->
->> [    1.968677] arm-smmu-v3 9050000.smmuv3:      0x0000020000000000
->
->> [    1.968990] PM: genpd: Disabling unused power domains
->
->> [    1.969424] arm-smmu-v3 9050000.smmuv3:      0x0000000000000000
->
->> [    1.969814] ALSA device list:
->
->> [    1.970240] arm-smmu-v3 9050000.smmuv3:      0x0000000000000000
->
->> [    1.970471]   No soundcards found.
->
->> [    1.970902] arm-smmu-v3 9050000.smmuv3: event 0x10 received:
->
->> [    1.971600] arm-smmu-v3 9050000.smmuv3:      0x0000020000000010
->
->> [    1.971601] arm-smmu-v3 9050000.smmuv3:      0x0000020000000000
->
->> [    1.971601] arm-smmu-v3 9050000.smmuv3:      0x0000000000000000
->
->> [    1.971602] arm-smmu-v3 9050000.smmuv3:      0x0000000000000000
->
->> [    1.971606] arm-smmu-v3 9050000.smmuv3: event 0x10 received:
->
->> [    1.971607] arm-smmu-v3 9050000.smmuv3:      0x0000020000000010
->
->> [    1.974202] arm-smmu-v3 9050000.smmuv3:      0x0000020000000000
->
->> [    1.974634] arm-smmu-v3 9050000.smmuv3:      0x0000000000000000
->
->> [    1.975005] Freeing unused kernel memory: 10112K
->
->> [    1.975062] arm-smmu-v3 9050000.smmuv3:      0x0000000000000000
->
->> [    1.975442] Run init as init process
->
->>
->
->> Another information is that if "maxcpus=3" is removed from the kernel
->
->> command line,
->
->> it will be OK.
->
->>
->
->> I am not sure if there is a bug about vsmmu. It will be very appreciated if
->
->> anyone
->
->> know this issue or can take a look at it.
->
-> thanks
->
-> -- PMM
->
-> .
-
-Hi,
-
-On 9/10/24 03:24, Zhou Wang via wrote:
->
-On 2024/9/9 22:31, Peter Maydell wrote:
->
-> On Mon, 9 Sept 2024 at 15:22, Zhou Wang via <qemu-devel@nongnu.org> wrote:
->
->> Hi All,
->
->>
->
->> When I tested mainline qemu(commit 7b87a25f49), it reports smmuv3 event 0x10
->
->> during kernel booting up.
->
-> Does it still do this if you either:
->
->  (1) use the v9.1.0 release (commit fd1952d814da)
->
->  (2) use "-machine virt-9.1" instead of "-machine virt"
->
-I tested above two cases, the problem is still there.
-I have not much progressed yet but I see it comes with
-qemu traces.
-
-smmuv3-iommu-memory-region-0-0 translation failed for iova=0x0
-(SMMU_EVT_F_TRANSLATION)
-../..
-qemu-system-aarch64: virtio-blk failed to set guest notifier (-22),
-ensure -accel kvm is set.
-qemu-system-aarch64: virtio_bus_start_ioeventfd: failed. Fallback to
-userspace (slower).
-
-the PCIe Host bridge seems to cause that translation failure at iova=0
-
-Also virtio-iommu has the same issue:
-qemu-system-aarch64: virtio_iommu_translate no mapping for 0x0 for sid=1024
-qemu-system-aarch64: virtio-blk failed to set guest notifier (-22),
-ensure -accel kvm is set.
-qemu-system-aarch64: virtio_bus_start_ioeventfd: failed. Fallback to
-userspace (slower).
-
-Only happens with maxcpus=3. Note the virtio-blk-pci is not protected by
-the vIOMMU in your case.
-
-Thanks
-
-Eric
-
->
->
-> ?
->
->
->
-> My suspicion is that this will have started happening now that
->
-> we expose an SMMU with two-stage translation support to the guest
->
-> in the "virt" machine type (which we do not if you either
->
-> use virt-9.1 or in the v9.1.0 release).
->
->
->
-> I've cc'd Eric (smmuv3 maintainer) and Mostafa (author of
->
-> the two-stage support).
->
->
->
->> qemu command which I use is as below:
->
->>
->
->> qemu-system-aarch64 -machine
->
->> virt,kernel_irqchip=on,gic-version=3,iommu=smmuv3 \
->
->> -kernel Image -initrd minifs.cpio.gz \
->
->> -enable-kvm -net none -nographic -m 3G -smp 6 -cpu host \
->
->> -append 'rdinit=init console=ttyAMA0 ealycon=pl0ll,0x90000000 maxcpus=3' \
->
->> -device
->
->> pcie-root-port,port=0x8,chassis=0,id=pci.0,bus=pcie.0,multifunction=on,addr=0x2
->
->>  \
->
->> -device pcie-root-port,port=0x9,chassis=1,id=pci.1,bus=pcie.0,addr=0x2.0x1 \
->
->> -device
->
->> virtio-blk-pci,drive=drive0,id=virtblk0,num-queues=8,packed=on,bus=pci.1 \
->
->> -drive file=/home/boot.img,if=none,id=drive0,format=raw
->
->>
->
->> smmuv3 event 0x10 log:
->
->> [...]
->
->> [    1.962656] virtio-pci 0000:02:00.0: Adding to iommu group 0
->
->> [    1.963150] virtio-pci 0000:02:00.0: enabling device (0000 -> 0002)
->
->> [    1.964707] virtio_blk virtio0: 6/0/0 default/read/poll queues
->
->> [    1.965759] virtio_blk virtio0: [vda] 2097152 512-byte logical blocks
->
->> (1.07 GB/1.00 GiB)
->
->> [    1.966934] arm-smmu-v3 9050000.smmuv3: event 0x10 received:
->
->> [    1.967442] input: gpio-keys as /devices/platform/gpio-keys/input/input0
->
->> [    1.967478] arm-smmu-v3 9050000.smmuv3:      0x0000020000000010
->
->> [    1.968381] clk: Disabling unused clocks
->
->> [    1.968677] arm-smmu-v3 9050000.smmuv3:      0x0000020000000000
->
->> [    1.968990] PM: genpd: Disabling unused power domains
->
->> [    1.969424] arm-smmu-v3 9050000.smmuv3:      0x0000000000000000
->
->> [    1.969814] ALSA device list:
->
->> [    1.970240] arm-smmu-v3 9050000.smmuv3:      0x0000000000000000
->
->> [    1.970471]   No soundcards found.
->
->> [    1.970902] arm-smmu-v3 9050000.smmuv3: event 0x10 received:
->
->> [    1.971600] arm-smmu-v3 9050000.smmuv3:      0x0000020000000010
->
->> [    1.971601] arm-smmu-v3 9050000.smmuv3:      0x0000020000000000
->
->> [    1.971601] arm-smmu-v3 9050000.smmuv3:      0x0000000000000000
->
->> [    1.971602] arm-smmu-v3 9050000.smmuv3:      0x0000000000000000
->
->> [    1.971606] arm-smmu-v3 9050000.smmuv3: event 0x10 received:
->
->> [    1.971607] arm-smmu-v3 9050000.smmuv3:      0x0000020000000010
->
->> [    1.974202] arm-smmu-v3 9050000.smmuv3:      0x0000020000000000
->
->> [    1.974634] arm-smmu-v3 9050000.smmuv3:      0x0000000000000000
->
->> [    1.975005] Freeing unused kernel memory: 10112K
->
->> [    1.975062] arm-smmu-v3 9050000.smmuv3:      0x0000000000000000
->
->> [    1.975442] Run init as init process
->
->>
->
->> Another information is that if "maxcpus=3" is removed from the kernel
->
->> command line,
->
->> it will be OK.
->
->>
->
->> I am not sure if there is a bug about vsmmu. It will be very appreciated if
->
->> anyone
->
->> know this issue or can take a look at it.
->
-> thanks
->
-> -- PMM
->
-> .
-
-Hi Zhou,
-
-On Mon, Sep 9, 2024 at 3:22â¯PM Zhou Wang via <qemu-devel@nongnu.org> wrote:
->
->
-Hi All,
->
->
-When I tested mainline qemu(commit 7b87a25f49), it reports smmuv3 event 0x10
->
-during kernel booting up.
->
->
-qemu command which I use is as below:
->
->
-qemu-system-aarch64 -machine
->
-virt,kernel_irqchip=on,gic-version=3,iommu=smmuv3 \
->
--kernel Image -initrd minifs.cpio.gz \
->
--enable-kvm -net none -nographic -m 3G -smp 6 -cpu host \
->
--append 'rdinit=init console=ttyAMA0 ealycon=pl0ll,0x90000000 maxcpus=3' \
->
--device
->
-pcie-root-port,port=0x8,chassis=0,id=pci.0,bus=pcie.0,multifunction=on,addr=0x2
->
-\
->
--device pcie-root-port,port=0x9,chassis=1,id=pci.1,bus=pcie.0,addr=0x2.0x1 \
->
--device
->
-virtio-blk-pci,drive=drive0,id=virtblk0,num-queues=8,packed=on,bus=pci.1 \
->
--drive file=/home/boot.img,if=none,id=drive0,format=raw
->
->
-smmuv3 event 0x10 log:
->
-[...]
->
-[    1.962656] virtio-pci 0000:02:00.0: Adding to iommu group 0
->
-[    1.963150] virtio-pci 0000:02:00.0: enabling device (0000 -> 0002)
->
-[    1.964707] virtio_blk virtio0: 6/0/0 default/read/poll queues
->
-[    1.965759] virtio_blk virtio0: [vda] 2097152 512-byte logical blocks
->
-(1.07 GB/1.00 GiB)
->
-[    1.966934] arm-smmu-v3 9050000.smmuv3: event 0x10 received:
->
-[    1.967442] input: gpio-keys as /devices/platform/gpio-keys/input/input0
->
-[    1.967478] arm-smmu-v3 9050000.smmuv3:      0x0000020000000010
->
-[    1.968381] clk: Disabling unused clocks
->
-[    1.968677] arm-smmu-v3 9050000.smmuv3:      0x0000020000000000
->
-[    1.968990] PM: genpd: Disabling unused power domains
->
-[    1.969424] arm-smmu-v3 9050000.smmuv3:      0x0000000000000000
->
-[    1.969814] ALSA device list:
->
-[    1.970240] arm-smmu-v3 9050000.smmuv3:      0x0000000000000000
->
-[    1.970471]   No soundcards found.
->
-[    1.970902] arm-smmu-v3 9050000.smmuv3: event 0x10 received:
->
-[    1.971600] arm-smmu-v3 9050000.smmuv3:      0x0000020000000010
->
-[    1.971601] arm-smmu-v3 9050000.smmuv3:      0x0000020000000000
->
-[    1.971601] arm-smmu-v3 9050000.smmuv3:      0x0000000000000000
->
-[    1.971602] arm-smmu-v3 9050000.smmuv3:      0x0000000000000000
->
-[    1.971606] arm-smmu-v3 9050000.smmuv3: event 0x10 received:
->
-[    1.971607] arm-smmu-v3 9050000.smmuv3:      0x0000020000000010
->
-[    1.974202] arm-smmu-v3 9050000.smmuv3:      0x0000020000000000
->
-[    1.974634] arm-smmu-v3 9050000.smmuv3:      0x0000000000000000
->
-[    1.975005] Freeing unused kernel memory: 10112K
->
-[    1.975062] arm-smmu-v3 9050000.smmuv3:      0x0000000000000000
->
-[    1.975442] Run init as init process
->
->
-Another information is that if "maxcpus=3" is removed from the kernel command
->
-line,
->
-it will be OK.
->
-That's interesting, not sure how that would be related.
-
->
-I am not sure if there is a bug about vsmmu. It will be very appreciated if
->
-anyone
->
-know this issue or can take a look at it.
->
-Can you please provide logs with adding "-d trace:smmu*" to qemu invocation.
-
-Also if possible, can you please provide which Linux kernel version
-you are using, I will see if I can repro.
-
-Thanks,
-Mostafa
-
->
-Thanks,
->
-Zhou
->
->
->
-
-On 2024/9/9 22:47, Mostafa Saleh wrote:
->
-Hi Zhou,
->
->
-On Mon, Sep 9, 2024 at 3:22â¯PM Zhou Wang via <qemu-devel@nongnu.org> wrote:
->
->
->
-> Hi All,
->
->
->
-> When I tested mainline qemu(commit 7b87a25f49), it reports smmuv3 event 0x10
->
-> during kernel booting up.
->
->
->
-> qemu command which I use is as below:
->
->
->
-> qemu-system-aarch64 -machine
->
-> virt,kernel_irqchip=on,gic-version=3,iommu=smmuv3 \
->
-> -kernel Image -initrd minifs.cpio.gz \
->
-> -enable-kvm -net none -nographic -m 3G -smp 6 -cpu host \
->
-> -append 'rdinit=init console=ttyAMA0 ealycon=pl0ll,0x90000000 maxcpus=3' \
->
-> -device
->
-> pcie-root-port,port=0x8,chassis=0,id=pci.0,bus=pcie.0,multifunction=on,addr=0x2
->
->  \
->
-> -device pcie-root-port,port=0x9,chassis=1,id=pci.1,bus=pcie.0,addr=0x2.0x1 \
->
-> -device
->
-> virtio-blk-pci,drive=drive0,id=virtblk0,num-queues=8,packed=on,bus=pci.1 \
->
-> -drive file=/home/boot.img,if=none,id=drive0,format=raw
->
->
->
-> smmuv3 event 0x10 log:
->
-> [...]
->
-> [    1.962656] virtio-pci 0000:02:00.0: Adding to iommu group 0
->
-> [    1.963150] virtio-pci 0000:02:00.0: enabling device (0000 -> 0002)
->
-> [    1.964707] virtio_blk virtio0: 6/0/0 default/read/poll queues
->
-> [    1.965759] virtio_blk virtio0: [vda] 2097152 512-byte logical blocks
->
-> (1.07 GB/1.00 GiB)
->
-> [    1.966934] arm-smmu-v3 9050000.smmuv3: event 0x10 received:
->
-> [    1.967442] input: gpio-keys as /devices/platform/gpio-keys/input/input0
->
-> [    1.967478] arm-smmu-v3 9050000.smmuv3:      0x0000020000000010
->
-> [    1.968381] clk: Disabling unused clocks
->
-> [    1.968677] arm-smmu-v3 9050000.smmuv3:      0x0000020000000000
->
-> [    1.968990] PM: genpd: Disabling unused power domains
->
-> [    1.969424] arm-smmu-v3 9050000.smmuv3:      0x0000000000000000
->
-> [    1.969814] ALSA device list:
->
-> [    1.970240] arm-smmu-v3 9050000.smmuv3:      0x0000000000000000
->
-> [    1.970471]   No soundcards found.
->
-> [    1.970902] arm-smmu-v3 9050000.smmuv3: event 0x10 received:
->
-> [    1.971600] arm-smmu-v3 9050000.smmuv3:      0x0000020000000010
->
-> [    1.971601] arm-smmu-v3 9050000.smmuv3:      0x0000020000000000
->
-> [    1.971601] arm-smmu-v3 9050000.smmuv3:      0x0000000000000000
->
-> [    1.971602] arm-smmu-v3 9050000.smmuv3:      0x0000000000000000
->
-> [    1.971606] arm-smmu-v3 9050000.smmuv3: event 0x10 received:
->
-> [    1.971607] arm-smmu-v3 9050000.smmuv3:      0x0000020000000010
->
-> [    1.974202] arm-smmu-v3 9050000.smmuv3:      0x0000020000000000
->
-> [    1.974634] arm-smmu-v3 9050000.smmuv3:      0x0000000000000000
->
-> [    1.975005] Freeing unused kernel memory: 10112K
->
-> [    1.975062] arm-smmu-v3 9050000.smmuv3:      0x0000000000000000
->
-> [    1.975442] Run init as init process
->
->
->
-> Another information is that if "maxcpus=3" is removed from the kernel
->
-> command line,
->
-> it will be OK.
->
->
->
->
-That's interesting, not sure how that would be related.
->
->
-> I am not sure if there is a bug about vsmmu. It will be very appreciated if
->
-> anyone
->
-> know this issue or can take a look at it.
->
->
->
->
-Can you please provide logs with adding "-d trace:smmu*" to qemu invocation.
-Sure. Please see the attached log(using above qemu commit and command).
-
->
->
-Also if possible, can you please provide which Linux kernel version
->
-you are using, I will see if I can repro.
-I just use the latest mainline kernel(commit b831f83e40a2) with defconfig.
-
-Thanks,
-Zhou
-
->
->
-Thanks,
->
-Mostafa
->
->
-> Thanks,
->
-> Zhou
->
->
->
->
->
->
->
->
-.
-qemu_boot_log.txt
-Description:
-Text document
-
-On Tue, Sep 10, 2024 at 2:51â¯AM Zhou Wang <wangzhou1@hisilicon.com> wrote:
->
->
-On 2024/9/9 22:47, Mostafa Saleh wrote:
->
-> Hi Zhou,
->
->
->
-> On Mon, Sep 9, 2024 at 3:22â¯PM Zhou Wang via <qemu-devel@nongnu.org> wrote:
->
->>
->
->> Hi All,
->
->>
->
->> When I tested mainline qemu(commit 7b87a25f49), it reports smmuv3 event
->
->> 0x10
->
->> during kernel booting up.
->
->>
->
->> qemu command which I use is as below:
->
->>
->
->> qemu-system-aarch64 -machine
->
->> virt,kernel_irqchip=on,gic-version=3,iommu=smmuv3 \
->
->> -kernel Image -initrd minifs.cpio.gz \
->
->> -enable-kvm -net none -nographic -m 3G -smp 6 -cpu host \
->
->> -append 'rdinit=init console=ttyAMA0 ealycon=pl0ll,0x90000000 maxcpus=3' \
->
->> -device
->
->> pcie-root-port,port=0x8,chassis=0,id=pci.0,bus=pcie.0,multifunction=on,addr=0x2
->
->>  \
->
->> -device pcie-root-port,port=0x9,chassis=1,id=pci.1,bus=pcie.0,addr=0x2.0x1
->
->> \
->
->> -device
->
->> virtio-blk-pci,drive=drive0,id=virtblk0,num-queues=8,packed=on,bus=pci.1 \
->
->> -drive file=/home/boot.img,if=none,id=drive0,format=raw
->
->>
->
->> smmuv3 event 0x10 log:
->
->> [...]
->
->> [    1.962656] virtio-pci 0000:02:00.0: Adding to iommu group 0
->
->> [    1.963150] virtio-pci 0000:02:00.0: enabling device (0000 -> 0002)
->
->> [    1.964707] virtio_blk virtio0: 6/0/0 default/read/poll queues
->
->> [    1.965759] virtio_blk virtio0: [vda] 2097152 512-byte logical blocks
->
->> (1.07 GB/1.00 GiB)
->
->> [    1.966934] arm-smmu-v3 9050000.smmuv3: event 0x10 received:
->
->> [    1.967442] input: gpio-keys as /devices/platform/gpio-keys/input/input0
->
->> [    1.967478] arm-smmu-v3 9050000.smmuv3:      0x0000020000000010
->
->> [    1.968381] clk: Disabling unused clocks
->
->> [    1.968677] arm-smmu-v3 9050000.smmuv3:      0x0000020000000000
->
->> [    1.968990] PM: genpd: Disabling unused power domains
->
->> [    1.969424] arm-smmu-v3 9050000.smmuv3:      0x0000000000000000
->
->> [    1.969814] ALSA device list:
->
->> [    1.970240] arm-smmu-v3 9050000.smmuv3:      0x0000000000000000
->
->> [    1.970471]   No soundcards found.
->
->> [    1.970902] arm-smmu-v3 9050000.smmuv3: event 0x10 received:
->
->> [    1.971600] arm-smmu-v3 9050000.smmuv3:      0x0000020000000010
->
->> [    1.971601] arm-smmu-v3 9050000.smmuv3:      0x0000020000000000
->
->> [    1.971601] arm-smmu-v3 9050000.smmuv3:      0x0000000000000000
->
->> [    1.971602] arm-smmu-v3 9050000.smmuv3:      0x0000000000000000
->
->> [    1.971606] arm-smmu-v3 9050000.smmuv3: event 0x10 received:
->
->> [    1.971607] arm-smmu-v3 9050000.smmuv3:      0x0000020000000010
->
->> [    1.974202] arm-smmu-v3 9050000.smmuv3:      0x0000020000000000
->
->> [    1.974634] arm-smmu-v3 9050000.smmuv3:      0x0000000000000000
->
->> [    1.975005] Freeing unused kernel memory: 10112K
->
->> [    1.975062] arm-smmu-v3 9050000.smmuv3:      0x0000000000000000
->
->> [    1.975442] Run init as init process
->
->>
->
->> Another information is that if "maxcpus=3" is removed from the kernel
->
->> command line,
->
->> it will be OK.
->
->>
->
->
->
-> That's interesting, not sure how that would be related.
->
->
->
->> I am not sure if there is a bug about vsmmu. It will be very appreciated
->
->> if anyone
->
->> know this issue or can take a look at it.
->
->>
->
->
->
-> Can you please provide logs with adding "-d trace:smmu*" to qemu invocation.
->
->
-Sure. Please see the attached log(using above qemu commit and command).
->
-Thanks a lot, it seems the SMMUv3 indeed receives a translation
-request with addr 0x0 which causes this event.
-I don't see any kind of modification (alignment) of the address in this path.
-So my hunch it's not related to the SMMUv3 and the initiator is
-issuing bogus addresses.
-
->
->
->
-> Also if possible, can you please provide which Linux kernel version
->
-> you are using, I will see if I can repro.
->
->
-I just use the latest mainline kernel(commit b831f83e40a2) with defconfig.
->
-I see, I can't repro in my setup which has no "--enable-kvm" and with
-"-cpu max" instead of host.
-I will try other options and see if I can repro.
-
-Thanks,
-Mostafa
->
-Thanks,
->
-Zhou
->
->
->
->
-> Thanks,
->
-> Mostafa
->
->
->
->> Thanks,
->
->> Zhou
->
->>
->
->>
->
->>
->
->
->
-> .
-
diff --git a/classification_output/04/other/74715356 b/classification_output/04/other/74715356
deleted file mode 100644
index 8725e032f..000000000
--- a/classification_output/04/other/74715356
+++ /dev/null
@@ -1,134 +0,0 @@
-other: 0.927
-semantic: 0.916
-instruction: 0.910
-assembly: 0.905
-device: 0.900
-graphic: 0.894
-boot: 0.881
-mistranslation: 0.870
-KVM: 0.863
-vnc: 0.850
-socket: 0.843
-network: 0.838
-
-[Bug] x86 EFLAGS refresh is not happening correctly
-
-Hello,
-I'm posting this here instead of opening an issue as it is not clear to me if this is a bug or not.
-The issue is located in function "cpu_compute_eflags" in target/i386/cpu.h
-(
-https://gitlab.com/qemu-project/qemu/-/blob/master/target/i386/cpu.h#L2071
-)
-This function is exectued in an out of cpu loop context.
-It is used to synchronize TCG internal eflags registers (CC_OP, CC_SRC,Â  etc...) with the CPU eflags field upon loop exit.
-It does:
-Â  Â  eflags
-|=
-cpu_cc_compute_all
-(
-env
-,
-CC_OP
-)
-|
-(
-env
-->
-df
-&
-DF_MASK
-);
-Shouldn't it be:
-Â  Â  Â
-eflags
-=
-cpu_cc_compute_all
-(
-env
-,
-CC_OP
-)
-|
-(
-env
-->
-df
-&
-DF_MASK
-);
-as eflags is entirely reevaluated by "cpu_cc_compute_all" ?
-Thanks,
-Kind regards,
-Stevie
-
-On 05/08/21 11:51, Stevie Lavern wrote:
-Shouldn't it be:
-eflags = cpu_cc_compute_all(env, CC_OP) | (env->df & DF_MASK);
-as eflags is entirely reevaluated by "cpu_cc_compute_all" ?
-No, both are wrong.  env->eflags contains flags other than the
-arithmetic flags (OF/SF/ZF/AF/PF/CF) and those have to be preserved.
-The right code is in helper_read_eflags.  You can move it into
-cpu_compute_eflags, and make helper_read_eflags use it.
-Paolo
-
-On 05/08/21 13:24, Paolo Bonzini wrote:
-On 05/08/21 11:51, Stevie Lavern wrote:
-Shouldn't it be:
-eflags = cpu_cc_compute_all(env, CC_OP) | (env->df & DF_MASK);
-as eflags is entirely reevaluated by "cpu_cc_compute_all" ?
-No, both are wrong.Â  env->eflags contains flags other than the
-arithmetic flags (OF/SF/ZF/AF/PF/CF) and those have to be preserved.
-The right code is in helper_read_eflags.Â  You can move it into
-cpu_compute_eflags, and make helper_read_eflags use it.
-Ah, actually the two are really the same, the TF/VM bits do not apply to
-cpu_compute_eflags so it's correct.
-What seems wrong is migration of the EFLAGS register.  There should be
-code in cpu_pre_save and cpu_post_load to special-case it and setup
-CC_DST/CC_OP as done in cpu_load_eflags.
-Also, cpu_load_eflags should assert that update_mask does not include
-any of the arithmetic flags.
-Paolo
-
-Thank for your reply!
-It's still a bit cryptic for me.
-I think i need to precise that I'm using a x86_64 custom user-mode,base on linux user-mode, that i'm developing (unfortunately i cannot share the code) with modifications in the translation loop (I've added cpu loop exits on specific instructions which are not control flow instructions).
-If my understanding is correct, in the user-mode case 'cpu_compute_eflags' is called directly by 'x86_cpu_exec_exit' with the intention of synchronizing the CPU env->eflags field with its real value (represented by the CC_* fields).
-I'm not sure how 'cpu_pre_save' and 'cpu_post_load' are involved in this case.
-Â
-As you said in your first email, 'helper_read_eflags' seems to be the correct way to go.
-Here is some detail about my current experimentation/understanding of this "issue":
-With the current implementationÂ
-Â  Â  Â  Â  Â
-eflags |= cpu_cc_compute_all(env, CC_OP) | (env->df & DF_MASK);
-if I exit the loop with a CC_OP different from CC_OP_EFLAGS, I found that the resulting env->eflags may be invalid.
-In my test case, the loop was exiting with eflags = 0x44 and CC_OP = CC_OP_SUBL with CC_DST=1, CC_SRC=258, CC_SRC2=0.
-While 'cpu_cc_compute_all' computes the correct flags (ZF:0, PF:0), the result will still be 0x44 (ZF:1, PF:1) due to the 'or' operation, thus leading to an incorrect eflags value loaded into the CPU env.Â
-In my case, after loop reentry, it led to an invalid branch to be taken.
-Thanks for your time!
-Regards
-Stevie
-Â
-On Thu, Aug 5, 2021 at 1:33 PM Paolo Bonzini <
-pbonzini@redhat.com
-> wrote:
-On 05/08/21 13:24, Paolo Bonzini wrote:
-> On 05/08/21 11:51, Stevie Lavern wrote:
->>
->> Shouldn't it be:
->> eflags = cpu_cc_compute_all(env, CC_OP) | (env->df & DF_MASK);
->> as eflags is entirely reevaluated by "cpu_cc_compute_all" ?
->
-> No, both are wrong.Â  env->eflags contains flags other than the
-> arithmetic flags (OF/SF/ZF/AF/PF/CF) and those have to be preserved.
->
-> The right code is in helper_read_eflags.Â  You can move it into
-> cpu_compute_eflags, and make helper_read_eflags use it.
-Ah, actually the two are really the same, the TF/VM bits do not apply to
-cpu_compute_eflags so it's correct.
-What seems wrong is migration of the EFLAGS register.Â  There should be
-code in cpu_pre_save and cpu_post_load to special-case it and setup
-CC_DST/CC_OP as done in cpu_load_eflags.
-Also, cpu_load_eflags should assert that update_mask does not include
-any of the arithmetic flags.
-Paolo
-
diff --git a/classification_output/04/other/79834768 b/classification_output/04/other/79834768
deleted file mode 100644
index a773a4984..000000000
--- a/classification_output/04/other/79834768
+++ /dev/null
@@ -1,417 +0,0 @@
-other: 0.943
-graphic: 0.933
-semantic: 0.920
-assembly: 0.918
-device: 0.915
-socket: 0.912
-instruction: 0.911
-vnc: 0.885
-boot: 0.880
-mistranslation: 0.877
-KVM: 0.840
-network: 0.830
-
-[Qemu-devel] [BUG] Windows 7 got stuck easily while run PCMark10 application
-
-Hiï¼
-
-We hit a bug in our test while run PCMark 10 in a windows 7 VM,
-The VM got stuck and the wallclock was hang after several minutes running
-PCMark 10 in it.
-It is quite easily to reproduce the bug with the upstream KVM and Qemu.
-
-We found that KVM can not inject any RTC irq to VM after it was hang, it fails 
-to
-Deliver irq in ioapic_set_irq() because RTC irq is still pending in ioapic->irr.
-
-static int ioapic_set_irq(struct kvm_ioapic *ioapic, unsigned int irq,
-                  int irq_level, bool line_status)
-{
-â¦ â¦
-         if (!irq_level) {
-                  ioapic->irr &= ~mask;
-                  ret = 1;
-                  goto out;
-         }
-â¦ â¦
-         if ((edge && old_irr == ioapic->irr) ||
-             (!edge && entry.fields.remote_irr)) {
-                  ret = 0;
-                  goto out;
-         }
-
-According to RTC spec, after RTC injects a High level irq, OS will read CMOSâs
-register C to to clear the irq flag, and pull down the irq electric pin.
-
-For Qemu, we will emulate the reading operation in cmos_ioport_read(),
-but Guest OS will fire a write operation before to tell which register will be 
-read
-after this write, where we use s->cmos_index to record the following register 
-to read.
-
-But in our test, we found that there is a possible situation that Vcpu fails to 
-read
-RTC_REG_C to clear irq, This could happens while two VCpus are writing/reading
-registers at the same time, for example, vcpu 0 is trying to read RTC_REG_C,
-so it write RTC_REG_C first, where the s->cmos_index will be RTC_REG_C,
-but before it tries to read register C, another vcpu1 is going to read RTC_YEAR,
-it changes s->cmos_index to RTC_YEAR by a writing action.
-The next operation of vcpu0 will be lead to read RTC_YEAR, In this case, we 
-will miss
-calling qemu_irq_lower(s->irq) to clear the irq. After this, kvm will never 
-inject RTC irq,
-and Windows VM will hang.
-static void cmos_ioport_write(void *opaque, hwaddr addr,
-                              uint64_t data, unsigned size)
-{
-    RTCState *s = opaque;
-
-    if ((addr & 1) == 0) {
-        s->cmos_index = data & 0x7f;
-    }
-â¦â¦
-static uint64_t cmos_ioport_read(void *opaque, hwaddr addr,
-                                 unsigned size)
-{
-    RTCState *s = opaque;
-    int ret;
-    if ((addr & 1) == 0) {
-        return 0xff;
-    } else {
-        switch(s->cmos_index) {
-
-According to CMOS spec, âany write to PROT 0070h should be followed by an 
-action to PROT 0071h or the RTC
-Will be RTC will be left in an unknown stateâ, but it seems that we can not 
-ensure this sequence in qemu/kvm.
-
-Any ideas ?
-
-Thanks,
-Hailiang
-
-Pls see the trace of kvm_pio:
-
-       CPU 1/KVM-15567 [003] .... 209311.762579: kvm_pio: pio_read at 0x70 size 
-1 count 1 val 0xff
-       CPU 1/KVM-15567 [003] .... 209311.762582: kvm_pio: pio_write at 0x70 
-size 1 count 1 val 0x89
-       CPU 1/KVM-15567 [003] .... 209311.762590: kvm_pio: pio_read at 0x71 size 
-1 count 1 val 0x17
-       CPU 0/KVM-15566 [005] .... 209311.762611: kvm_pio: pio_write at 0x70 
-size 1 count 1 val 0xc
-       CPU 1/KVM-15567 [003] .... 209311.762615: kvm_pio: pio_read at 0x70 size 
-1 count 1 val 0xff
-       CPU 1/KVM-15567 [003] .... 209311.762619: kvm_pio: pio_write at 0x70 
-size 1 count 1 val 0x88
-       CPU 1/KVM-15567 [003] .... 209311.762627: kvm_pio: pio_read at 0x71 size 
-1 count 1 val 0x12
-       CPU 0/KVM-15566 [005] .... 209311.762632: kvm_pio: pio_read at 0x71 size 
-1 count 1 val 0x12
-       CPU 1/KVM-15567 [003] .... 209311.762633: kvm_pio: pio_read at 0x70 size 
-1 count 1 val 0xff
-       CPU 0/KVM-15566 [005] .... 209311.762634: kvm_pio: pio_write at 0x70 
-size 1 count 1 val 0xc           <--- Firstly write to 0x70, cmo_index = 0xc & 
-0x7f = 0xc
-       CPU 1/KVM-15567 [003] .... 209311.762636: kvm_pio: pio_write at 0x70 
-size 1 count 1 val 0x86       <-- Secondly write to 0x70, cmo_index = 0x86 & 
-0x7f = 0x6, cover the cmo_index result of first time
-       CPU 0/KVM-15566 [005] .... 209311.762641: kvm_pio: pio_read at 0x71 size 
-1 count 1 val 0x6      <--  vcpu0 read 0x6 because cmo_index is 0x6 now
-       CPU 1/KVM-15567 [003] .... 209311.762644: kvm_pio: pio_read at 0x71 size 
-1 count 1 val 0x6     <-  vcpu1 read 0x6
-       CPU 1/KVM-15567 [003] .... 209311.762649: kvm_pio: pio_read at 0x70 size 
-1 count 1 val 0xff
-       CPU 1/KVM-15567 [003] .... 209311.762669: kvm_pio: pio_write at 0x70 
-size 1 count 1 val 0x87
-       CPU 1/KVM-15567 [003] .... 209311.762678: kvm_pio: pio_read at 0x71 size 
-1 count 1 val 0x1
-       CPU 1/KVM-15567 [003] .... 209311.762683: kvm_pio: pio_read at 0x70 size 
-1 count 1 val 0xff
-       CPU 1/KVM-15567 [003] .... 209311.762686: kvm_pio: pio_write at 0x70 
-size 1 count 1 val 0x84
-       CPU 1/KVM-15567 [003] .... 209311.762693: kvm_pio: pio_read at 0x71 size 
-1 count 1 val 0x10
-       CPU 1/KVM-15567 [003] .... 209311.762699: kvm_pio: pio_read at 0x70 size 
-1 count 1 val 0xff
-       CPU 1/KVM-15567 [003] .... 209311.762702: kvm_pio: pio_write at 0x70 
-size 1 count 1 val 0x82
-       CPU 1/KVM-15567 [003] .... 209311.762709: kvm_pio: pio_read at 0x71 size 
-1 count 1 val 0x25
-       CPU 1/KVM-15567 [003] .... 209311.762714: kvm_pio: pio_read at 0x70 size 
-1 count 1 val 0xff
-       CPU 1/KVM-15567 [003] .... 209311.762717: kvm_pio: pio_write at 0x70 
-size 1 count 1 val 0x80
-
-
-Regards,
--Gonglei
-
-From: Zhanghailiang
-Sent: Friday, December 01, 2017 3:03 AM
-To: address@hidden; address@hidden; Paolo Bonzini
-Cc: Huangweidong (C); Gonglei (Arei); wangxin (U); Xiexiangyou
-Subject: [BUG] Windows 7 got stuck easily while run PCMark10 application
-
-Hiï¼
-
-We hit a bug in our test while run PCMark 10 in a windows 7 VM,
-The VM got stuck and the wallclock was hang after several minutes running
-PCMark 10 in it.
-It is quite easily to reproduce the bug with the upstream KVM and Qemu.
-
-We found that KVM can not inject any RTC irq to VM after it was hang, it fails 
-to
-Deliver irq in ioapic_set_irq() because RTC irq is still pending in ioapic->irr.
-
-static int ioapic_set_irq(struct kvm_ioapic *ioapic, unsigned int irq,
-                  int irq_level, bool line_status)
-{
-â¦ â¦
-         if (!irq_level) {
-                  ioapic->irr &= ~mask;
-                  ret = 1;
-                  goto out;
-         }
-â¦ â¦
-         if ((edge && old_irr == ioapic->irr) ||
-             (!edge && entry.fields.remote_irr)) {
-                  ret = 0;
-                  goto out;
-         }
-
-According to RTC spec, after RTC injects a High level irq, OS will read CMOSâs
-register C to to clear the irq flag, and pull down the irq electric pin.
-
-For Qemu, we will emulate the reading operation in cmos_ioport_read(),
-but Guest OS will fire a write operation before to tell which register will be 
-read
-after this write, where we use s->cmos_index to record the following register 
-to read.
-
-But in our test, we found that there is a possible situation that Vcpu fails to 
-read
-RTC_REG_C to clear irq, This could happens while two VCpus are writing/reading
-registers at the same time, for example, vcpu 0 is trying to read RTC_REG_C,
-so it write RTC_REG_C first, where the s->cmos_index will be RTC_REG_C,
-but before it tries to read register C, another vcpu1 is going to read RTC_YEAR,
-it changes s->cmos_index to RTC_YEAR by a writing action.
-The next operation of vcpu0 will be lead to read RTC_YEAR, In this case, we 
-will miss
-calling qemu_irq_lower(s->irq) to clear the irq. After this, kvm will never 
-inject RTC irq,
-and Windows VM will hang.
-static void cmos_ioport_write(void *opaque, hwaddr addr,
-                              uint64_t data, unsigned size)
-{
-    RTCState *s = opaque;
-
-    if ((addr & 1) == 0) {
-        s->cmos_index = data & 0x7f;
-    }
-â¦â¦
-static uint64_t cmos_ioport_read(void *opaque, hwaddr addr,
-                                 unsigned size)
-{
-    RTCState *s = opaque;
-    int ret;
-    if ((addr & 1) == 0) {
-        return 0xff;
-    } else {
-        switch(s->cmos_index) {
-
-According to CMOS spec, âany write to PROT 0070h should be followed by an 
-action to PROT 0071h or the RTC
-Will be RTC will be left in an unknown stateâ, but it seems that we can not 
-ensure this sequence in qemu/kvm.
-
-Any ideas ?
-
-Thanks,
-Hailiang
-
-On 01/12/2017 08:08, Gonglei (Arei) wrote:
->
-First write to 0x70, cmos_index = 0xc & 0x7f = 0xc
->
-Â Â Â Â Â Â  CPU 0/KVM-15566 kvm_pio: pio_write at 0x70 size 1 count 1 val 0xc>
->
-Second write to 0x70, cmos_index = 0x86 & 0x7f = 0x6> Â Â Â Â Â Â  CPU 1/KVM-15567
->
-kvm_pio: pio_write at 0x70 size 1 count 1 val 0x86> vcpu0 read 0x6 because
->
-cmos_index is 0x6 now:> Â Â Â Â Â Â  CPU 0/KVM-15566 kvm_pio: pio_read at 0x71 size
->
-1 count 1 val 0x6> vcpu1 read 0x6:> Â Â Â Â Â Â  CPU 1/KVM-15567 kvm_pio: pio_read
->
-at 0x71 size 1 count 1 val 0x6
-This seems to be a Windows bug.  The easiest workaround that I
-can think of is to clear the interrupts already when 0xc is written,
-without waiting for the read (because REG_C can only be read).
-
-What do you think?
-
-Thanks,
-
-Paolo
-
-I also think it's windows bug, the problem is that it doesn't occur on xen 
-platform. And there are some other works need to be done while reading REG_C. 
-So I wrote that patch.
-
-Thanks,
-Gonglei
-åä»¶äººï¼Paolo Bonzini
-æ¶ä»¶äººï¼é¾ç£,å¼ æµ·äº®,qemu-devel,Michael S. Tsirkin
-æéï¼é»ä¼æ ,çæ¬£,è°¢ç¥¥æ
-æ¶é´ï¼2017-12-02 01:10:08
-ä¸»é¢:Re: [BUG] Windows 7 got stuck easily while run PCMark10 application
-
-On 01/12/2017 08:08, Gonglei (Arei) wrote:
->
-First write to 0x70, cmos_index = 0xc & 0x7f = 0xc
->
-CPU 0/KVM-15566 kvm_pio: pio_write at 0x70 size 1 count 1 val 0xc>
->
-Second write to 0x70, cmos_index = 0x86 & 0x7f = 0x6>        CPU 1/KVM-15567
->
-kvm_pio: pio_write at 0x70 size 1 count 1 val 0x86> vcpu0 read 0x6 because
->
-cmos_index is 0x6 now:>        CPU 0/KVM-15566 kvm_pio: pio_read at 0x71 size
->
-1 count 1 val 0x6> vcpu1 read 0x6:>        CPU 1/KVM-15567 kvm_pio: pio_read
->
-at 0x71 size 1 count 1 val 0x6
-This seems to be a Windows bug.  The easiest workaround that I
-can think of is to clear the interrupts already when 0xc is written,
-without waiting for the read (because REG_C can only be read).
-
-What do you think?
-
-Thanks,
-
-Paolo
-
-On 01/12/2017 18:45, Gonglei (Arei) wrote:
->
-I also think it's windows bug, the problem is that it doesn't occur on
->
-xen platform.
-It's a race, it may just be that RTC PIO is faster in Xen because it's
-implemented in the hypervisor.
-
-I will try reporting it to Microsoft.
-
-Thanks,
-
-Paolo
-
->
-Thanks,
->
-Gonglei
->
-*åä»¶äººï¼*Paolo Bonzini
->
-*æ¶ä»¶äººï¼*é¾ç£,å¼ æµ·äº®,qemu-devel,Michael S. Tsirkin
->
-*æéï¼*é»ä¼æ ,çæ¬£,è°¢ç¥¥æ
->
-*æ¶é´ï¼*2017-12-02 01:10:08
->
-*ä¸»é¢:*Re: [BUG] Windows 7 got stuck easily while run PCMark10 application
->
->
-On 01/12/2017 08:08, Gonglei (Arei) wrote:
->
-> First write to 0x70, cmos_index = 0xc & 0x7f = 0xc
->
-> Â Â Â Â Â Â  CPU 0/KVM-15566 kvm_pio: pio_write at 0x70 size 1 count 1 val 0xc>
->
-> Second write to 0x70, cmos_index = 0x86 & 0x7f = 0x6> Â Â Â Â Â Â  CPU 1/KVM-15567
->
-> kvm_pio: pio_write at 0x70 size 1 count 1 val 0x86> vcpu0 read 0x6 because
->
-> cmos_index is 0x6 now:> Â Â Â Â Â Â  CPU 0/KVM-15566 kvm_pio: pio_read at 0x71
->
-> size 1 count 1 val 0x6> vcpu1
->
-read 0x6:> Â Â Â Â Â Â  CPU 1/KVM-15567 kvm_pio: pio_read at 0x71 size 1 count
->
-1 val 0x6
->
-This seems to be a Windows bug.Â  The easiest workaround that I
->
-can think of is to clear the interrupts already when 0xc is written,
->
-without waiting for the read (because REG_C can only be read).
->
->
-What do you think?
->
->
-Thanks,
->
->
-Paolo
-
-On 2017/12/2 2:37, Paolo Bonzini wrote:
-On 01/12/2017 18:45, Gonglei (Arei) wrote:
-I also think it's windows bug, the problem is that it doesn't occur on
-xen platform.
-It's a race, it may just be that RTC PIO is faster in Xen because it's
-implemented in the hypervisor.
-No, In Xen, it does not has such problem because it injects the RTC irq without
-checking whether its previous irq been cleared or not, which we do has such 
-checking
-in KVM.
-
-static int ioapic_set_irq(struct kvm_ioapic *ioapic, unsigned int irq,
-        int irq_level, bool line_status)
-{
-   ... ...
-    if (!irq_level) {
-        ioapic->irr &= ~mask; -->clear the RTC irq in irr, Or we will can not 
-inject RTC irq.
-        ret = 1;
-        goto out;
-    }
-
-I agree that we move the operation of clearing RTC irq from cmos_ioport_read() 
-to
-cmos_ioport_write() to ensure the action been done.
-
-Thanks,
-Hailiang
-I will try reporting it to Microsoft.
-
-Thanks,
-
-Paolo
-Thanks,
-Gonglei
-*åä»¶äººï¼*Paolo Bonzini
-*æ¶ä»¶äººï¼*é¾ç£,å¼ æµ·äº®,qemu-devel,Michael S. Tsirkin
-*æéï¼*é»ä¼æ ,çæ¬£,è°¢ç¥¥æ
-*æ¶é´ï¼*2017-12-02 01:10:08
-*ä¸»é¢:*Re: [BUG] Windows 7 got stuck easily while run PCMark10 application
-
-On 01/12/2017 08:08, Gonglei (Arei) wrote:
-First write to 0x70, cmos_index = 0xc & 0x7f = 0xc
-        CPU 0/KVM-15566 kvm_pio: pio_write at 0x70 size 1 count 1 val 0xc> Second write to 
-0x70, cmos_index = 0x86 & 0x7f = 0x6>        CPU 1/KVM-15567 kvm_pio: pio_write at 0x70 
-size 1 count 1 val 0x86> vcpu0 read 0x6 because cmos_index is 0x6 now:>        CPU 
-0/KVM-15566 kvm_pio: pio_read at 0x71 size 1 count 1 val 0x6> vcpu1
-read 0x6:>        CPU 1/KVM-15567 kvm_pio: pio_read at 0x71 size 1 count
-1 val 0x6
-This seems to be a Windows bug.  The easiest workaround that I
-can think of is to clear the interrupts already when 0xc is written,
-without waiting for the read (because REG_C can only be read).
-
-What do you think?
-
-Thanks,
-
-Paolo
-.
-
diff --git a/classification_output/04/other/81775929 b/classification_output/04/other/81775929
deleted file mode 100644
index 1688f663b..000000000
--- a/classification_output/04/other/81775929
+++ /dev/null
@@ -1,243 +0,0 @@
-other: 0.877
-assembly: 0.849
-vnc: 0.825
-semantic: 0.825
-graphic: 0.818
-instruction: 0.816
-KVM: 0.815
-mistranslation: 0.811
-socket: 0.810
-device: 0.777
-network: 0.759
-boot: 0.742
-
-[Qemu-devel] [BUG] Monitor QMP is broken ?
-
-Hello!
-
- I have updated my qemu to the recent version and it seems to have lost 
-compatibility with
-libvirt. The error message is:
---- cut ---
-internal error: unable to execute QEMU command 'qmp_capabilities': QMP input 
-object member
-'id' is unexpected
---- cut ---
- What does it mean? Is it intentional or not?
-
-Kind regards,
-Pavel Fedin
-Expert Engineer
-Samsung Electronics Research center Russia
-
-Hello! 
-
->
-I have updated my qemu to the recent version and it seems to have lost
->
-compatibility
-with
->
-libvirt. The error message is:
->
---- cut ---
->
-internal error: unable to execute QEMU command 'qmp_capabilities': QMP input
->
-object
->
-member
->
-'id' is unexpected
->
---- cut ---
->
-What does it mean? Is it intentional or not?
-I have found the problem. It is caused by commit
-65207c59d99f2260c5f1d3b9c491146616a522aa. libvirt does not seem to use the 
-removed
-asynchronous interface but it still feeds in JSONs with 'id' field set to 
-something. So i
-think the related fragment in qmp_check_input_obj() function should be brought 
-back
-
-Kind regards,
-Pavel Fedin
-Expert Engineer
-Samsung Electronics Research center Russia
-
-On Fri, Jun 05, 2015 at 04:58:46PM +0300, Pavel Fedin wrote:
->
-Hello!
->
->
->  I have updated my qemu to the recent version and it seems to have lost
->
-> compatibility
->
-with
->
-> libvirt. The error message is:
->
-> --- cut ---
->
-> internal error: unable to execute QEMU command 'qmp_capabilities': QMP
->
-> input object
->
-> member
->
-> 'id' is unexpected
->
-> --- cut ---
->
->  What does it mean? Is it intentional or not?
->
->
-I have found the problem. It is caused by commit
->
-65207c59d99f2260c5f1d3b9c491146616a522aa. libvirt does not seem to use the
->
-removed
->
-asynchronous interface but it still feeds in JSONs with 'id' field set to
->
-something. So i
->
-think the related fragment in qmp_check_input_obj() function should be
->
-brought back
-If QMP is rejecting the 'id' parameter that is a regression bug.
-
-[quote]
-The QMP spec says
-
-2.3 Issuing Commands
---------------------
-
-The format for command execution is:
-
-{ "execute": json-string, "arguments": json-object, "id": json-value }
-
- Where,
-
-- The "execute" member identifies the command to be executed by the Server
-- The "arguments" member is used to pass any arguments required for the
-  execution of the command, it is optional when no arguments are
-  required. Each command documents what contents will be considered
-  valid when handling the json-argument
-- The "id" member is a transaction identification associated with the
-  command execution, it is optional and will be part of the response if
-  provided. The "id" member can be any json-value, although most
-  clients merely use a json-number incremented for each successive
-  command
-
-
-2.4 Commands Responses
-----------------------
-
-There are two possible responses which the Server will issue as the result
-of a command execution: success or error.
-
-2.4.1 success
--------------
-
-The format of a success response is:
-
-{ "return": json-value, "id": json-value }
-
- Where,
-
-- The "return" member contains the data returned by the command, which
-  is defined on a per-command basis (usually a json-object or
-  json-array of json-objects, but sometimes a json-number, json-string,
-  or json-array of json-strings); it is an empty json-object if the
-  command does not return data
-- The "id" member contains the transaction identification associated
-  with the command execution if issued by the Client
-
-[/quote]
-
-And as such, libvirt chose to /always/ send an 'id' parameter in all
-commands it issues.
-
-We don't however validate the id in the reply, though arguably we
-should have done so.
-
-Regards,
-Daniel
--- 
-|:
-http://berrange.com
--o-
-http://www.flickr.com/photos/dberrange/
-:|
-|:
-http://libvirt.org
--o-
-http://virt-manager.org
-:|
-|:
-http://autobuild.org
--o-
-http://search.cpan.org/~danberr/
-:|
-|:
-http://entangle-photo.org
--o-
-http://live.gnome.org/gtk-vnc
-:|
-
-"Daniel P. Berrange" <address@hidden> writes:
-
->
-On Fri, Jun 05, 2015 at 04:58:46PM +0300, Pavel Fedin wrote:
->
->  Hello!
->
->
->
-> >  I have updated my qemu to the recent version and it seems to have
->
-> > lost compatibility
->
-> with
->
-> > libvirt. The error message is:
->
-> > --- cut ---
->
-> > internal error: unable to execute QEMU command 'qmp_capabilities':
->
-> > QMP input object
->
-> > member
->
-> > 'id' is unexpected
->
-> > --- cut ---
->
-> >  What does it mean? Is it intentional or not?
->
->
->
->  I have found the problem. It is caused by commit
->
-> 65207c59d99f2260c5f1d3b9c491146616a522aa. libvirt does not seem to
->
-> use the removed
->
-> asynchronous interface but it still feeds in JSONs with 'id' field
->
-> set to something. So i
->
-> think the related fragment in qmp_check_input_obj() function should
->
-> be brought back
->
->
-If QMP is rejecting the 'id' parameter that is a regression bug.
-It is definitely a regression, my fault, and I'll get it fixed a.s.a.p.
-
-[...]
-
diff --git a/classification_output/04/other/85542195 b/classification_output/04/other/85542195
deleted file mode 100644
index 314b8f440..000000000
--- a/classification_output/04/other/85542195
+++ /dev/null
@@ -1,128 +0,0 @@
-other: 0.944
-semantic: 0.941
-assembly: 0.941
-graphic: 0.938
-device: 0.936
-instruction: 0.935
-boot: 0.932
-vnc: 0.923
-mistranslation: 0.907
-socket: 0.905
-network: 0.899
-KVM: 0.898
-
-[Qemu-devel] [Bug in qemu-system-ppc running Mac OS 9 on Windows 10]
-
-Hi all,
-
-I've been experiencing issues when installing Mac OS 9.x using
-qemu-system-ppc.exe in Windows 10. After booting from CD image,
-partitioning a fresh disk image often hangs Qemu. When using a
-pre-partitioned disk image, the OS installation process halts
-somewhere during the process. The issues can be resolved by setting
-qemu-system-ppc.exe to run in Windows 7 compatibility mode.
-AFAIK all Qemu builds for Windows since Mac OS 9 became available as
-guest are affected.
-The issue is reproducible by installing Qemu for Windows from Stephan
-Weil on Windows 10 and boot/install Mac OS 9.x
-
-Best regards and thanks for looking into this,
-Howard
-
-On Nov 25, 2016, at 9:26 AM, address@hidden wrote:
-Hi all,
-
-I've been experiencing issues when installing Mac OS 9.x using
-qemu-system-ppc.exe in Windows 10. After booting from CD image,
-partitioning a fresh disk image often hangs Qemu. When using a
-pre-partitioned disk image, the OS installation process halts
-somewhere during the process. The issues can be resolved by setting
-qemu-system-ppc.exe to run in Windows 7 compatibility mode.
-AFAIK all Qemu builds for Windows since Mac OS 9 became available as
-guest are affected.
-The issue is reproducible by installing Qemu for Windows from Stephan
-Weil on Windows 10 and boot/install Mac OS 9.x
-
-Best regards and thanks for looking into this,
-Howard
-I assume there was some kind of behavior change for some of the
-Windows API between Windows 7 and Windows 10, that is my guess as to
-why the compatibility mode works. Could you run 'make check' on your
-system, once in Windows 7 and once in Windows 10. Maybe the tests
-will tell us something. I'm hoping that one of the tests succeeds in
-Windows 7 and fails in Windows 10. That would help us pinpoint what
-the problem is.
-What I mean by run in Windows 7 is set the mingw environment to run
-in Windows 7 compatibility mode (if possible). If you have Windows 7
-on another partition you could boot from, that would be better.
-Good luck.
-p.s. use 'make check -k' to allow all the tests to run (even if one
-or more of the tests fails).
-
->
-> Hi all,
->
->
->
-> I've been experiencing issues when installing Mac OS 9.x using
->
-> qemu-system-ppc.exe in Windows 10. After booting from CD image,
->
-> partitioning a fresh disk image often hangs Qemu. When using a
->
-> pre-partitioned disk image, the OS installation process halts
->
-> somewhere during the process. The issues can be resolved by setting
->
-> qemu-system-ppc.exe to run in Windows 7 compatibility mode.
->
-> AFAIK all Qemu builds for Windows since Mac OS 9 became available as
->
-> guest are affected.
->
-> The issue is reproducible by installing Qemu for Windows from Stephan
->
-> Weil on Windows 10 and boot/install Mac OS 9.x
->
->
->
-> Best regards and thanks for looking into this,
->
-> Howard
->
->
->
-I assume there was some kind of behavior change for some of the Windows API
->
-between Windows 7 and Windows 10, that is my guess as to why the
->
-compatibility mode works. Could you run 'make check' on your system, once in
->
-Windows 7 and once in Windows 10. Maybe the tests will tell us something.
->
-I'm hoping that one of the tests succeeds in Windows 7 and fails in Windows
->
-10. That would help us pinpoint what the problem is.
->
->
-What I mean by run in Windows 7 is set the mingw environment to run in
->
-Windows 7 compatibility mode (if possible). If you have Windows 7 on another
->
-partition you could boot from, that would be better.
->
->
-Good luck.
->
->
-p.s. use 'make check -k' to allow all the tests to run (even if one or more
->
-of the tests fails).
-Hi,
-
-Thank you for you suggestion, but I have no means to run the check you
-suggest. I cross-compile from Linux.
-
-Best regards,
-Howard
-
diff --git a/classification_output/04/other/88225572 b/classification_output/04/other/88225572
deleted file mode 100644
index e65837b57..000000000
--- a/classification_output/04/other/88225572
+++ /dev/null
@@ -1,2908 +0,0 @@
-other: 0.987
-assembly: 0.981
-semantic: 0.976
-graphic: 0.974
-instruction: 0.974
-device: 0.970
-boot: 0.969
-vnc: 0.958
-socket: 0.955
-network: 0.950
-mistranslation: 0.942
-KVM: 0.924
-
-[BUG qemu 4.0] segfault when unplugging virtio-blk-pci device
-
-Hi,
-
-I'm using qemu 4.0 and hit segfault when tearing down kata sandbox, I
-think it's because io completion hits use-after-free when device is
-already gone. Is this a known bug that has been fixed? (I went through
-the git log but didn't find anything obvious).
-
-gdb backtrace is:
-
-Core was generated by `/usr/local/libexec/qemu-kvm -name 
-sandbox-5b8df8c6c6901c3c0a9b02879be10fe8d69d6'.
-Program terminated with signal 11, Segmentation fault.
-#0 object_get_class (obj=obj@entry=0x0) at 
-/usr/src/debug/qemu-4.0/qom/object.c:903
-903        return obj->class;
-(gdb) bt
-#0  object_get_class (obj=obj@entry=0x0) at 
-/usr/src/debug/qemu-4.0/qom/object.c:903
-#1 Â 0x0000558a2c009e9b in virtio_notify_vector (vdev=0x558a2e7751d0,
-Â  Â  vector=<optimized out>) at /usr/src/debug/qemu-4.0/hw/virtio/virtio.c:1118
-#2 Â 0x0000558a2bfdcb1e in virtio_blk_discard_write_zeroes_complete (
-Â  Â  opaque=0x558a2f2fd420, ret=0)
-Â  Â  at /usr/src/debug/qemu-4.0/hw/block/virtio-blk.c:186
-#3 Â 0x0000558a2c261c7e in blk_aio_complete (acb=0x558a2eed7420)
-Â  Â  at /usr/src/debug/qemu-4.0/block/block-backend.c:1305
-#4 Â 0x0000558a2c3031db in coroutine_trampoline (i0=<optimized out>,
-Â  Â  i1=<optimized out>) at /usr/src/debug/qemu-4.0/util/coroutine-ucontext.c:116
-#5 Â 0x00007f45b2f8b080 in ?? () from /lib64/libc.so.6
-#6 Â 0x00007fff9ed75780 in ?? ()
-#7 Â 0x0000000000000000 in ?? ()
-
-It seems like qemu was completing a discard/write_zero request, but
-parent BusState was already freed & set to NULL.
-
-Do we need to drain all pending request before unrealizing virtio-blk
-device? Like the following patch proposed?
-https://lists.gnu.org/archive/html/qemu-devel/2017-06/msg02945.html
-If more info is needed, please let me know.
-
-Thanks,
-Eryu
-
-On Tue, 31 Dec 2019 18:34:34 +0800
-Eryu Guan <address@hidden> wrote:
-
->
-Hi,
->
->
-I'm using qemu 4.0 and hit segfault when tearing down kata sandbox, I
->
-think it's because io completion hits use-after-free when device is
->
-already gone. Is this a known bug that has been fixed? (I went through
->
-the git log but didn't find anything obvious).
->
->
-gdb backtrace is:
->
->
-Core was generated by `/usr/local/libexec/qemu-kvm -name
->
-sandbox-5b8df8c6c6901c3c0a9b02879be10fe8d69d6'.
->
-Program terminated with signal 11, Segmentation fault.
->
-#0 object_get_class (obj=obj@entry=0x0) at
->
-/usr/src/debug/qemu-4.0/qom/object.c:903
->
-903        return obj->class;
->
-(gdb) bt
->
-#0  object_get_class (obj=obj@entry=0x0) at
->
-/usr/src/debug/qemu-4.0/qom/object.c:903
->
-#1 Â 0x0000558a2c009e9b in virtio_notify_vector (vdev=0x558a2e7751d0,
->
-Â  Â  vector=<optimized out>) at /usr/src/debug/qemu-4.0/hw/virtio/virtio.c:1118
->
-#2 Â 0x0000558a2bfdcb1e in virtio_blk_discard_write_zeroes_complete (
->
-Â  Â  opaque=0x558a2f2fd420, ret=0)
->
-Â  Â  at /usr/src/debug/qemu-4.0/hw/block/virtio-blk.c:186
->
-#3 Â 0x0000558a2c261c7e in blk_aio_complete (acb=0x558a2eed7420)
->
-Â  Â  at /usr/src/debug/qemu-4.0/block/block-backend.c:1305
->
-#4 Â 0x0000558a2c3031db in coroutine_trampoline (i0=<optimized out>,
->
-Â  Â  i1=<optimized out>) at
->
-/usr/src/debug/qemu-4.0/util/coroutine-ucontext.c:116
->
-#5 Â 0x00007f45b2f8b080 in ?? () from /lib64/libc.so.6
->
-#6 Â 0x00007fff9ed75780 in ?? ()
->
-#7 Â 0x0000000000000000 in ?? ()
->
->
-It seems like qemu was completing a discard/write_zero request, but
->
-parent BusState was already freed & set to NULL.
->
->
-Do we need to drain all pending request before unrealizing virtio-blk
->
-device? Like the following patch proposed?
->
->
-https://lists.gnu.org/archive/html/qemu-devel/2017-06/msg02945.html
->
->
-If more info is needed, please let me know.
-may be this will help:
-https://patchwork.kernel.org/patch/11213047/
->
->
-Thanks,
->
-Eryu
->
-
-On Tue, Dec 31, 2019 at 11:51:35AM +0100, Igor Mammedov wrote:
->
-On Tue, 31 Dec 2019 18:34:34 +0800
->
-Eryu Guan <address@hidden> wrote:
->
->
-> Hi,
->
->
->
-> I'm using qemu 4.0 and hit segfault when tearing down kata sandbox, I
->
-> think it's because io completion hits use-after-free when device is
->
-> already gone. Is this a known bug that has been fixed? (I went through
->
-> the git log but didn't find anything obvious).
->
->
->
-> gdb backtrace is:
->
->
->
-> Core was generated by `/usr/local/libexec/qemu-kvm -name
->
-> sandbox-5b8df8c6c6901c3c0a9b02879be10fe8d69d6'.
->
-> Program terminated with signal 11, Segmentation fault.
->
-> #0 object_get_class (obj=obj@entry=0x0) at
->
-> /usr/src/debug/qemu-4.0/qom/object.c:903
->
-> 903        return obj->class;
->
-> (gdb) bt
->
-> #0  object_get_class (obj=obj@entry=0x0) at
->
-> /usr/src/debug/qemu-4.0/qom/object.c:903
->
-> #1 Â 0x0000558a2c009e9b in virtio_notify_vector (vdev=0x558a2e7751d0,
->
-> Â  Â  vector=<optimized out>) at
->
-> /usr/src/debug/qemu-4.0/hw/virtio/virtio.c:1118
->
-> #2 Â 0x0000558a2bfdcb1e in virtio_blk_discard_write_zeroes_complete (
->
-> Â  Â  opaque=0x558a2f2fd420, ret=0)
->
-> Â  Â  at /usr/src/debug/qemu-4.0/hw/block/virtio-blk.c:186
->
-> #3 Â 0x0000558a2c261c7e in blk_aio_complete (acb=0x558a2eed7420)
->
-> Â  Â  at /usr/src/debug/qemu-4.0/block/block-backend.c:1305
->
-> #4 Â 0x0000558a2c3031db in coroutine_trampoline (i0=<optimized out>,
->
-> Â  Â  i1=<optimized out>) at
->
-> /usr/src/debug/qemu-4.0/util/coroutine-ucontext.c:116
->
-> #5 Â 0x00007f45b2f8b080 in ?? () from /lib64/libc.so.6
->
-> #6 Â 0x00007fff9ed75780 in ?? ()
->
-> #7 Â 0x0000000000000000 in ?? ()
->
->
->
-> It seems like qemu was completing a discard/write_zero request, but
->
-> parent BusState was already freed & set to NULL.
->
->
->
-> Do we need to drain all pending request before unrealizing virtio-blk
->
-> device? Like the following patch proposed?
->
->
->
->
-https://lists.gnu.org/archive/html/qemu-devel/2017-06/msg02945.html
->
->
->
-> If more info is needed, please let me know.
->
->
-may be this will help:
-https://patchwork.kernel.org/patch/11213047/
-Yeah, this looks promising! I'll try it out (though it's a one-time
-crash for me). Thanks!
-
-Eryu
-
-On Thu, Jan 02, 2020 at 10:08:50AM +0800, Eryu Guan wrote:
->
-On Tue, Dec 31, 2019 at 11:51:35AM +0100, Igor Mammedov wrote:
->
-> On Tue, 31 Dec 2019 18:34:34 +0800
->
-> Eryu Guan <address@hidden> wrote:
->
->
->
-> > Hi,
->
-> >
->
-> > I'm using qemu 4.0 and hit segfault when tearing down kata sandbox, I
->
-> > think it's because io completion hits use-after-free when device is
->
-> > already gone. Is this a known bug that has been fixed? (I went through
->
-> > the git log but didn't find anything obvious).
->
-> >
->
-> > gdb backtrace is:
->
-> >
->
-> > Core was generated by `/usr/local/libexec/qemu-kvm -name
->
-> > sandbox-5b8df8c6c6901c3c0a9b02879be10fe8d69d6'.
->
-> > Program terminated with signal 11, Segmentation fault.
->
-> > #0 object_get_class (obj=obj@entry=0x0) at
->
-> > /usr/src/debug/qemu-4.0/qom/object.c:903
->
-> > 903        return obj->class;
->
-> > (gdb) bt
->
-> > #0  object_get_class (obj=obj@entry=0x0) at
->
-> > /usr/src/debug/qemu-4.0/qom/object.c:903
->
-> > #1 Â 0x0000558a2c009e9b in virtio_notify_vector (vdev=0x558a2e7751d0,
->
-> > Â  Â  vector=<optimized out>) at
->
-> > /usr/src/debug/qemu-4.0/hw/virtio/virtio.c:1118
->
-> > #2 Â 0x0000558a2bfdcb1e in virtio_blk_discard_write_zeroes_complete (
->
-> > Â  Â  opaque=0x558a2f2fd420, ret=0)
->
-> > Â  Â  at /usr/src/debug/qemu-4.0/hw/block/virtio-blk.c:186
->
-> > #3 Â 0x0000558a2c261c7e in blk_aio_complete (acb=0x558a2eed7420)
->
-> > Â  Â  at /usr/src/debug/qemu-4.0/block/block-backend.c:1305
->
-> > #4 Â 0x0000558a2c3031db in coroutine_trampoline (i0=<optimized out>,
->
-> > Â  Â  i1=<optimized out>) at
->
-> > /usr/src/debug/qemu-4.0/util/coroutine-ucontext.c:116
->
-> > #5 Â 0x00007f45b2f8b080 in ?? () from /lib64/libc.so.6
->
-> > #6 Â 0x00007fff9ed75780 in ?? ()
->
-> > #7 Â 0x0000000000000000 in ?? ()
->
-> >
->
-> > It seems like qemu was completing a discard/write_zero request, but
->
-> > parent BusState was already freed & set to NULL.
->
-> >
->
-> > Do we need to drain all pending request before unrealizing virtio-blk
->
-> > device? Like the following patch proposed?
->
-> >
->
-> >
-https://lists.gnu.org/archive/html/qemu-devel/2017-06/msg02945.html
->
-> >
->
-> > If more info is needed, please let me know.
->
->
->
-> may be this will help:
-https://patchwork.kernel.org/patch/11213047/
->
->
-Yeah, this looks promising! I'll try it out (though it's a one-time
->
-crash for me). Thanks!
-After applying this patch, I don't see the original segfaut and
-backtrace, but I see this crash
-
-[Thread debugging using libthread_db enabled]
-Using host libthread_db library "/lib64/libthread_db.so.1".
-Core was generated by `/usr/local/libexec/qemu-kvm -name 
-sandbox-a2f34a11a7e1449496503bbc4050ae040c0d3'.
-Program terminated with signal 11, Segmentation fault.
-#0  0x0000561216a57609 in virtio_pci_notify_write (opaque=0x5612184747e0, 
-addr=0, val=<optimized out>, size=<optimized out>) at 
-/usr/src/debug/qemu-4.0/hw/virtio/virtio-pci.c:1324
-1324        VirtIOPCIProxy *proxy = 
-VIRTIO_PCI(DEVICE(vdev)->parent_bus->parent);
-Missing separate debuginfos, use: debuginfo-install 
-glib2-2.42.2-5.1.alios7.x86_64 glibc-2.17-260.alios7.x86_64 
-libgcc-4.8.5-28.alios7.1.x86_64 libseccomp-2.3.1-3.alios7.x86_64 
-libstdc++-4.8.5-28.alios7.1.x86_64 numactl-libs-2.0.9-5.1.alios7.x86_64 
-pixman-0.32.6-3.1.alios7.x86_64 zlib-1.2.7-16.2.alios7.x86_64
-(gdb) bt
-#0  0x0000561216a57609 in virtio_pci_notify_write (opaque=0x5612184747e0, 
-addr=0, val=<optimized out>, size=<optimized out>) at 
-/usr/src/debug/qemu-4.0/hw/virtio/virtio-pci.c:1324
-#1  0x0000561216835b22 in memory_region_write_accessor (mr=<optimized out>, 
-addr=<optimized out>, value=<optimized out>, size=<optimized out>, 
-shift=<optimized out>, mask=<optimized out>, attrs=...) at 
-/usr/src/debug/qemu-4.0/memory.c:502
-#2  0x0000561216833c5d in access_with_adjusted_size (addr=addr@entry=0, 
-value=value@entry=0x7fcdeab1b8a8, size=size@entry=2, access_size_min=<optimized 
-out>, access_size_max=<optimized out>, access_fn=0x561216835ac0 
-<memory_region_write_accessor>, mr=0x56121846d340, attrs=...)
-    at /usr/src/debug/qemu-4.0/memory.c:568
-#3  0x0000561216837c66 in memory_region_dispatch_write 
-(mr=mr@entry=0x56121846d340, addr=0, data=<optimized out>, size=2, 
-attrs=attrs@entry=...) at /usr/src/debug/qemu-4.0/memory.c:1503
-#4  0x00005612167e036f in flatview_write_continue (fv=fv@entry=0x56121852edd0, 
-addr=addr@entry=841813602304, attrs=..., buf=buf@entry=0x7fce7dd97028 <Address 
-0x7fce7dd97028 out of bounds>, len=len@entry=2, addr1=<optimized out>, 
-l=<optimized out>, mr=0x56121846d340)
-    at /usr/src/debug/qemu-4.0/exec.c:3279
-#5  0x00005612167e0506 in flatview_write (fv=0x56121852edd0, addr=841813602304, 
-attrs=..., buf=0x7fce7dd97028 <Address 0x7fce7dd97028 out of bounds>, len=2) at 
-/usr/src/debug/qemu-4.0/exec.c:3318
-#6  0x00005612167e4a1b in address_space_write (as=<optimized out>, 
-addr=<optimized out>, attrs=..., buf=<optimized out>, len=<optimized out>) at 
-/usr/src/debug/qemu-4.0/exec.c:3408
-#7  0x00005612167e4aa5 in address_space_rw (as=<optimized out>, addr=<optimized 
-out>, attrs=..., attrs@entry=..., buf=buf@entry=0x7fce7dd97028 <Address 
-0x7fce7dd97028 out of bounds>, len=<optimized out>, is_write=<optimized out>) 
-at /usr/src/debug/qemu-4.0/exec.c:3419
-#8  0x0000561216849da1 in kvm_cpu_exec (cpu=cpu@entry=0x56121849aa00) at 
-/usr/src/debug/qemu-4.0/accel/kvm/kvm-all.c:2034
-#9  0x000056121682255e in qemu_kvm_cpu_thread_fn (arg=arg@entry=0x56121849aa00) 
-at /usr/src/debug/qemu-4.0/cpus.c:1281
-#10 0x0000561216b794d6 in qemu_thread_start (args=<optimized out>) at 
-/usr/src/debug/qemu-4.0/util/qemu-thread-posix.c:502
-#11 0x00007fce7bef6e25 in start_thread () from /lib64/libpthread.so.0
-#12 0x00007fce7bc1ef1d in clone () from /lib64/libc.so.6
-
-And I searched and found
-https://bugzilla.redhat.com/show_bug.cgi?id=1706759
-, which has the same
-backtrace as above, and it seems commit 7bfde688fb1b ("virtio-blk: Add
-blk_drain() to virtio_blk_device_unrealize()") is to fix this particular
-bug.
-
-But I can still hit the bug even after applying the commit. Do I miss
-anything?
-
-Thanks,
-Eryu
->
-Eryu
-
-On Tue, Jan 7, 2020 at 2:06 PM Eryu Guan <address@hidden> wrote:
->
->
-On Thu, Jan 02, 2020 at 10:08:50AM +0800, Eryu Guan wrote:
->
-> On Tue, Dec 31, 2019 at 11:51:35AM +0100, Igor Mammedov wrote:
->
-> > On Tue, 31 Dec 2019 18:34:34 +0800
->
-> > Eryu Guan <address@hidden> wrote:
->
-> >
->
-> > > Hi,
->
-> > >
->
-> > > I'm using qemu 4.0 and hit segfault when tearing down kata sandbox, I
->
-> > > think it's because io completion hits use-after-free when device is
->
-> > > already gone. Is this a known bug that has been fixed? (I went through
->
-> > > the git log but didn't find anything obvious).
->
-> > >
->
-> > > gdb backtrace is:
->
-> > >
->
-> > > Core was generated by `/usr/local/libexec/qemu-kvm -name
->
-> > > sandbox-5b8df8c6c6901c3c0a9b02879be10fe8d69d6'.
->
-> > > Program terminated with signal 11, Segmentation fault.
->
-> > > #0 object_get_class (obj=obj@entry=0x0) at
->
-> > > /usr/src/debug/qemu-4.0/qom/object.c:903
->
-> > > 903        return obj->class;
->
-> > > (gdb) bt
->
-> > > #0  object_get_class (obj=obj@entry=0x0) at
->
-> > > /usr/src/debug/qemu-4.0/qom/object.c:903
->
-> > > #1  0x0000558a2c009e9b in virtio_notify_vector (vdev=0x558a2e7751d0,
->
-> > >     vector=<optimized out>) at
->
-> > > /usr/src/debug/qemu-4.0/hw/virtio/virtio.c:1118
->
-> > > #2  0x0000558a2bfdcb1e in virtio_blk_discard_write_zeroes_complete (
->
-> > >     opaque=0x558a2f2fd420, ret=0)
->
-> > >     at /usr/src/debug/qemu-4.0/hw/block/virtio-blk.c:186
->
-> > > #3  0x0000558a2c261c7e in blk_aio_complete (acb=0x558a2eed7420)
->
-> > >     at /usr/src/debug/qemu-4.0/block/block-backend.c:1305
->
-> > > #4  0x0000558a2c3031db in coroutine_trampoline (i0=<optimized out>,
->
-> > >     i1=<optimized out>) at
->
-> > > /usr/src/debug/qemu-4.0/util/coroutine-ucontext.c:116
->
-> > > #5  0x00007f45b2f8b080 in ?? () from /lib64/libc.so.6
->
-> > > #6  0x00007fff9ed75780 in ?? ()
->
-> > > #7  0x0000000000000000 in ?? ()
->
-> > >
->
-> > > It seems like qemu was completing a discard/write_zero request, but
->
-> > > parent BusState was already freed & set to NULL.
->
-> > >
->
-> > > Do we need to drain all pending request before unrealizing virtio-blk
->
-> > > device? Like the following patch proposed?
->
-> > >
->
-> > >
-https://lists.gnu.org/archive/html/qemu-devel/2017-06/msg02945.html
->
-> > >
->
-> > > If more info is needed, please let me know.
->
-> >
->
-> > may be this will help:
-https://patchwork.kernel.org/patch/11213047/
->
->
->
-> Yeah, this looks promising! I'll try it out (though it's a one-time
->
-> crash for me). Thanks!
->
->
-After applying this patch, I don't see the original segfaut and
->
-backtrace, but I see this crash
->
->
-[Thread debugging using libthread_db enabled]
->
-Using host libthread_db library "/lib64/libthread_db.so.1".
->
-Core was generated by `/usr/local/libexec/qemu-kvm -name
->
-sandbox-a2f34a11a7e1449496503bbc4050ae040c0d3'.
->
-Program terminated with signal 11, Segmentation fault.
->
-#0  0x0000561216a57609 in virtio_pci_notify_write (opaque=0x5612184747e0,
->
-addr=0, val=<optimized out>, size=<optimized out>) at
->
-/usr/src/debug/qemu-4.0/hw/virtio/virtio-pci.c:1324
->
-1324        VirtIOPCIProxy *proxy =
->
-VIRTIO_PCI(DEVICE(vdev)->parent_bus->parent);
->
-Missing separate debuginfos, use: debuginfo-install
->
-glib2-2.42.2-5.1.alios7.x86_64 glibc-2.17-260.alios7.x86_64
->
-libgcc-4.8.5-28.alios7.1.x86_64 libseccomp-2.3.1-3.alios7.x86_64
->
-libstdc++-4.8.5-28.alios7.1.x86_64 numactl-libs-2.0.9-5.1.alios7.x86_64
->
-pixman-0.32.6-3.1.alios7.x86_64 zlib-1.2.7-16.2.alios7.x86_64
->
-(gdb) bt
->
-#0  0x0000561216a57609 in virtio_pci_notify_write (opaque=0x5612184747e0,
->
-addr=0, val=<optimized out>, size=<optimized out>) at
->
-/usr/src/debug/qemu-4.0/hw/virtio/virtio-pci.c:1324
->
-#1  0x0000561216835b22 in memory_region_write_accessor (mr=<optimized out>,
->
-addr=<optimized out>, value=<optimized out>, size=<optimized out>,
->
-shift=<optimized out>, mask=<optimized out>, attrs=...) at
->
-/usr/src/debug/qemu-4.0/memory.c:502
->
-#2  0x0000561216833c5d in access_with_adjusted_size (addr=addr@entry=0,
->
-value=value@entry=0x7fcdeab1b8a8, size=size@entry=2,
->
-access_size_min=<optimized out>, access_size_max=<optimized out>,
->
-access_fn=0x561216835ac0 <memory_region_write_accessor>, mr=0x56121846d340,
->
-attrs=...)
->
-at /usr/src/debug/qemu-4.0/memory.c:568
->
-#3  0x0000561216837c66 in memory_region_dispatch_write
->
-(mr=mr@entry=0x56121846d340, addr=0, data=<optimized out>, size=2,
->
-attrs=attrs@entry=...) at /usr/src/debug/qemu-4.0/memory.c:1503
->
-#4  0x00005612167e036f in flatview_write_continue
->
-(fv=fv@entry=0x56121852edd0, addr=addr@entry=841813602304, attrs=...,
->
-buf=buf@entry=0x7fce7dd97028 <Address 0x7fce7dd97028 out of bounds>,
->
-len=len@entry=2, addr1=<optimized out>, l=<optimized out>, mr=0x56121846d340)
->
-at /usr/src/debug/qemu-4.0/exec.c:3279
->
-#5  0x00005612167e0506 in flatview_write (fv=0x56121852edd0,
->
-addr=841813602304, attrs=..., buf=0x7fce7dd97028 <Address 0x7fce7dd97028 out
->
-of bounds>, len=2) at /usr/src/debug/qemu-4.0/exec.c:3318
->
-#6  0x00005612167e4a1b in address_space_write (as=<optimized out>,
->
-addr=<optimized out>, attrs=..., buf=<optimized out>, len=<optimized out>) at
->
-/usr/src/debug/qemu-4.0/exec.c:3408
->
-#7  0x00005612167e4aa5 in address_space_rw (as=<optimized out>,
->
-addr=<optimized out>, attrs=..., attrs@entry=...,
->
-buf=buf@entry=0x7fce7dd97028 <Address 0x7fce7dd97028 out of bounds>,
->
-len=<optimized out>, is_write=<optimized out>) at
->
-/usr/src/debug/qemu-4.0/exec.c:3419
->
-#8  0x0000561216849da1 in kvm_cpu_exec (cpu=cpu@entry=0x56121849aa00) at
->
-/usr/src/debug/qemu-4.0/accel/kvm/kvm-all.c:2034
->
-#9  0x000056121682255e in qemu_kvm_cpu_thread_fn
->
-(arg=arg@entry=0x56121849aa00) at /usr/src/debug/qemu-4.0/cpus.c:1281
->
-#10 0x0000561216b794d6 in qemu_thread_start (args=<optimized out>) at
->
-/usr/src/debug/qemu-4.0/util/qemu-thread-posix.c:502
->
-#11 0x00007fce7bef6e25 in start_thread () from /lib64/libpthread.so.0
->
-#12 0x00007fce7bc1ef1d in clone () from /lib64/libc.so.6
->
->
-And I searched and found
->
-https://bugzilla.redhat.com/show_bug.cgi?id=1706759
-, which has the same
->
-backtrace as above, and it seems commit 7bfde688fb1b ("virtio-blk: Add
->
-blk_drain() to virtio_blk_device_unrealize()") is to fix this particular
->
-bug.
->
->
-But I can still hit the bug even after applying the commit. Do I miss
->
-anything?
-Hi Eryu,
-This backtrace seems to be caused by this bug (there were two bugs in
-1706759):
-https://bugzilla.redhat.com/show_bug.cgi?id=1708480
-Although the solution hasn't been tested on virtio-blk yet, you may
-want to apply this patch:
-https://lists.nongnu.org/archive/html/qemu-devel/2019-12/msg05197.html
-Let me know if this works.
-
-Best regards, Julia Suvorova.
-
-On Tue, Jan 07, 2020 at 03:01:01PM +0100, Julia Suvorova wrote:
->
-On Tue, Jan 7, 2020 at 2:06 PM Eryu Guan <address@hidden> wrote:
->
->
->
-> On Thu, Jan 02, 2020 at 10:08:50AM +0800, Eryu Guan wrote:
->
-> > On Tue, Dec 31, 2019 at 11:51:35AM +0100, Igor Mammedov wrote:
->
-> > > On Tue, 31 Dec 2019 18:34:34 +0800
->
-> > > Eryu Guan <address@hidden> wrote:
->
-> > >
->
-> > > > Hi,
->
-> > > >
->
-> > > > I'm using qemu 4.0 and hit segfault when tearing down kata sandbox, I
->
-> > > > think it's because io completion hits use-after-free when device is
->
-> > > > already gone. Is this a known bug that has been fixed? (I went through
->
-> > > > the git log but didn't find anything obvious).
->
-> > > >
->
-> > > > gdb backtrace is:
->
-> > > >
->
-> > > > Core was generated by `/usr/local/libexec/qemu-kvm -name
->
-> > > > sandbox-5b8df8c6c6901c3c0a9b02879be10fe8d69d6'.
->
-> > > > Program terminated with signal 11, Segmentation fault.
->
-> > > > #0 object_get_class (obj=obj@entry=0x0) at
->
-> > > > /usr/src/debug/qemu-4.0/qom/object.c:903
->
-> > > > 903        return obj->class;
->
-> > > > (gdb) bt
->
-> > > > #0  object_get_class (obj=obj@entry=0x0) at
->
-> > > > /usr/src/debug/qemu-4.0/qom/object.c:903
->
-> > > > #1  0x0000558a2c009e9b in virtio_notify_vector (vdev=0x558a2e7751d0,
->
-> > > >     vector=<optimized out>) at
->
-> > > > /usr/src/debug/qemu-4.0/hw/virtio/virtio.c:1118
->
-> > > > #2  0x0000558a2bfdcb1e in virtio_blk_discard_write_zeroes_complete (
->
-> > > >     opaque=0x558a2f2fd420, ret=0)
->
-> > > >     at /usr/src/debug/qemu-4.0/hw/block/virtio-blk.c:186
->
-> > > > #3  0x0000558a2c261c7e in blk_aio_complete (acb=0x558a2eed7420)
->
-> > > >     at /usr/src/debug/qemu-4.0/block/block-backend.c:1305
->
-> > > > #4  0x0000558a2c3031db in coroutine_trampoline (i0=<optimized out>,
->
-> > > >     i1=<optimized out>) at
->
-> > > > /usr/src/debug/qemu-4.0/util/coroutine-ucontext.c:116
->
-> > > > #5  0x00007f45b2f8b080 in ?? () from /lib64/libc.so.6
->
-> > > > #6  0x00007fff9ed75780 in ?? ()
->
-> > > > #7  0x0000000000000000 in ?? ()
->
-> > > >
->
-> > > > It seems like qemu was completing a discard/write_zero request, but
->
-> > > > parent BusState was already freed & set to NULL.
->
-> > > >
->
-> > > > Do we need to drain all pending request before unrealizing virtio-blk
->
-> > > > device? Like the following patch proposed?
->
-> > > >
->
-> > > >
-https://lists.gnu.org/archive/html/qemu-devel/2017-06/msg02945.html
->
-> > > >
->
-> > > > If more info is needed, please let me know.
->
-> > >
->
-> > > may be this will help:
-https://patchwork.kernel.org/patch/11213047/
->
-> >
->
-> > Yeah, this looks promising! I'll try it out (though it's a one-time
->
-> > crash for me). Thanks!
->
->
->
-> After applying this patch, I don't see the original segfaut and
->
-> backtrace, but I see this crash
->
->
->
-> [Thread debugging using libthread_db enabled]
->
-> Using host libthread_db library "/lib64/libthread_db.so.1".
->
-> Core was generated by `/usr/local/libexec/qemu-kvm -name
->
-> sandbox-a2f34a11a7e1449496503bbc4050ae040c0d3'.
->
-> Program terminated with signal 11, Segmentation fault.
->
-> #0  0x0000561216a57609 in virtio_pci_notify_write (opaque=0x5612184747e0,
->
-> addr=0, val=<optimized out>, size=<optimized out>) at
->
-> /usr/src/debug/qemu-4.0/hw/virtio/virtio-pci.c:1324
->
-> 1324        VirtIOPCIProxy *proxy =
->
-> VIRTIO_PCI(DEVICE(vdev)->parent_bus->parent);
->
-> Missing separate debuginfos, use: debuginfo-install
->
-> glib2-2.42.2-5.1.alios7.x86_64 glibc-2.17-260.alios7.x86_64
->
-> libgcc-4.8.5-28.alios7.1.x86_64 libseccomp-2.3.1-3.alios7.x86_64
->
-> libstdc++-4.8.5-28.alios7.1.x86_64 numactl-libs-2.0.9-5.1.alios7.x86_64
->
-> pixman-0.32.6-3.1.alios7.x86_64 zlib-1.2.7-16.2.alios7.x86_64
->
-> (gdb) bt
->
-> #0  0x0000561216a57609 in virtio_pci_notify_write (opaque=0x5612184747e0,
->
-> addr=0, val=<optimized out>, size=<optimized out>) at
->
-> /usr/src/debug/qemu-4.0/hw/virtio/virtio-pci.c:1324
->
-> #1  0x0000561216835b22 in memory_region_write_accessor (mr=<optimized out>,
->
-> addr=<optimized out>, value=<optimized out>, size=<optimized out>,
->
-> shift=<optimized out>, mask=<optimized out>, attrs=...) at
->
-> /usr/src/debug/qemu-4.0/memory.c:502
->
-> #2  0x0000561216833c5d in access_with_adjusted_size (addr=addr@entry=0,
->
-> value=value@entry=0x7fcdeab1b8a8, size=size@entry=2,
->
-> access_size_min=<optimized out>, access_size_max=<optimized out>,
->
-> access_fn=0x561216835ac0 <memory_region_write_accessor>, mr=0x56121846d340,
->
-> attrs=...)
->
->     at /usr/src/debug/qemu-4.0/memory.c:568
->
-> #3  0x0000561216837c66 in memory_region_dispatch_write
->
-> (mr=mr@entry=0x56121846d340, addr=0, data=<optimized out>, size=2,
->
-> attrs=attrs@entry=...) at /usr/src/debug/qemu-4.0/memory.c:1503
->
-> #4  0x00005612167e036f in flatview_write_continue
->
-> (fv=fv@entry=0x56121852edd0, addr=addr@entry=841813602304, attrs=...,
->
-> buf=buf@entry=0x7fce7dd97028 <Address 0x7fce7dd97028 out of bounds>,
->
-> len=len@entry=2, addr1=<optimized out>, l=<optimized out>,
->
-> mr=0x56121846d340)
->
->     at /usr/src/debug/qemu-4.0/exec.c:3279
->
-> #5  0x00005612167e0506 in flatview_write (fv=0x56121852edd0,
->
-> addr=841813602304, attrs=..., buf=0x7fce7dd97028 <Address 0x7fce7dd97028
->
-> out of bounds>, len=2) at /usr/src/debug/qemu-4.0/exec.c:3318
->
-> #6  0x00005612167e4a1b in address_space_write (as=<optimized out>,
->
-> addr=<optimized out>, attrs=..., buf=<optimized out>, len=<optimized out>)
->
-> at /usr/src/debug/qemu-4.0/exec.c:3408
->
-> #7  0x00005612167e4aa5 in address_space_rw (as=<optimized out>,
->
-> addr=<optimized out>, attrs=..., attrs@entry=...,
->
-> buf=buf@entry=0x7fce7dd97028 <Address 0x7fce7dd97028 out of bounds>,
->
-> len=<optimized out>, is_write=<optimized out>) at
->
-> /usr/src/debug/qemu-4.0/exec.c:3419
->
-> #8  0x0000561216849da1 in kvm_cpu_exec (cpu=cpu@entry=0x56121849aa00) at
->
-> /usr/src/debug/qemu-4.0/accel/kvm/kvm-all.c:2034
->
-> #9  0x000056121682255e in qemu_kvm_cpu_thread_fn
->
-> (arg=arg@entry=0x56121849aa00) at /usr/src/debug/qemu-4.0/cpus.c:1281
->
-> #10 0x0000561216b794d6 in qemu_thread_start (args=<optimized out>) at
->
-> /usr/src/debug/qemu-4.0/util/qemu-thread-posix.c:502
->
-> #11 0x00007fce7bef6e25 in start_thread () from /lib64/libpthread.so.0
->
-> #12 0x00007fce7bc1ef1d in clone () from /lib64/libc.so.6
->
->
->
-> And I searched and found
->
->
-https://bugzilla.redhat.com/show_bug.cgi?id=1706759
-, which has the same
->
-> backtrace as above, and it seems commit 7bfde688fb1b ("virtio-blk: Add
->
-> blk_drain() to virtio_blk_device_unrealize()") is to fix this particular
->
-> bug.
->
->
->
-> But I can still hit the bug even after applying the commit. Do I miss
->
-> anything?
->
->
-Hi Eryu,
->
-This backtrace seems to be caused by this bug (there were two bugs in
->
-1706759):
-https://bugzilla.redhat.com/show_bug.cgi?id=1708480
->
-Although the solution hasn't been tested on virtio-blk yet, you may
->
-want to apply this patch:
->
-https://lists.nongnu.org/archive/html/qemu-devel/2019-12/msg05197.html
->
-Let me know if this works.
-Will try it out, thanks a lot!
-
-Eryu
-
-On Tue, Jan 07, 2020 at 03:01:01PM +0100, Julia Suvorova wrote:
->
-On Tue, Jan 7, 2020 at 2:06 PM Eryu Guan <address@hidden> wrote:
->
->
->
-> On Thu, Jan 02, 2020 at 10:08:50AM +0800, Eryu Guan wrote:
->
-> > On Tue, Dec 31, 2019 at 11:51:35AM +0100, Igor Mammedov wrote:
->
-> > > On Tue, 31 Dec 2019 18:34:34 +0800
->
-> > > Eryu Guan <address@hidden> wrote:
->
-> > >
->
-> > > > Hi,
->
-> > > >
->
-> > > > I'm using qemu 4.0 and hit segfault when tearing down kata sandbox, I
->
-> > > > think it's because io completion hits use-after-free when device is
->
-> > > > already gone. Is this a known bug that has been fixed? (I went through
->
-> > > > the git log but didn't find anything obvious).
->
-> > > >
->
-> > > > gdb backtrace is:
->
-> > > >
->
-> > > > Core was generated by `/usr/local/libexec/qemu-kvm -name
->
-> > > > sandbox-5b8df8c6c6901c3c0a9b02879be10fe8d69d6'.
->
-> > > > Program terminated with signal 11, Segmentation fault.
->
-> > > > #0 object_get_class (obj=obj@entry=0x0) at
->
-> > > > /usr/src/debug/qemu-4.0/qom/object.c:903
->
-> > > > 903        return obj->class;
->
-> > > > (gdb) bt
->
-> > > > #0  object_get_class (obj=obj@entry=0x0) at
->
-> > > > /usr/src/debug/qemu-4.0/qom/object.c:903
->
-> > > > #1  0x0000558a2c009e9b in virtio_notify_vector (vdev=0x558a2e7751d0,
->
-> > > >     vector=<optimized out>) at
->
-> > > > /usr/src/debug/qemu-4.0/hw/virtio/virtio.c:1118
->
-> > > > #2  0x0000558a2bfdcb1e in virtio_blk_discard_write_zeroes_complete (
->
-> > > >     opaque=0x558a2f2fd420, ret=0)
->
-> > > >     at /usr/src/debug/qemu-4.0/hw/block/virtio-blk.c:186
->
-> > > > #3  0x0000558a2c261c7e in blk_aio_complete (acb=0x558a2eed7420)
->
-> > > >     at /usr/src/debug/qemu-4.0/block/block-backend.c:1305
->
-> > > > #4  0x0000558a2c3031db in coroutine_trampoline (i0=<optimized out>,
->
-> > > >     i1=<optimized out>) at
->
-> > > > /usr/src/debug/qemu-4.0/util/coroutine-ucontext.c:116
->
-> > > > #5  0x00007f45b2f8b080 in ?? () from /lib64/libc.so.6
->
-> > > > #6  0x00007fff9ed75780 in ?? ()
->
-> > > > #7  0x0000000000000000 in ?? ()
->
-> > > >
->
-> > > > It seems like qemu was completing a discard/write_zero request, but
->
-> > > > parent BusState was already freed & set to NULL.
->
-> > > >
->
-> > > > Do we need to drain all pending request before unrealizing virtio-blk
->
-> > > > device? Like the following patch proposed?
->
-> > > >
->
-> > > >
-https://lists.gnu.org/archive/html/qemu-devel/2017-06/msg02945.html
->
-> > > >
->
-> > > > If more info is needed, please let me know.
->
-> > >
->
-> > > may be this will help:
-https://patchwork.kernel.org/patch/11213047/
->
-> >
->
-> > Yeah, this looks promising! I'll try it out (though it's a one-time
->
-> > crash for me). Thanks!
->
->
->
-> After applying this patch, I don't see the original segfaut and
->
-> backtrace, but I see this crash
->
->
->
-> [Thread debugging using libthread_db enabled]
->
-> Using host libthread_db library "/lib64/libthread_db.so.1".
->
-> Core was generated by `/usr/local/libexec/qemu-kvm -name
->
-> sandbox-a2f34a11a7e1449496503bbc4050ae040c0d3'.
->
-> Program terminated with signal 11, Segmentation fault.
->
-> #0  0x0000561216a57609 in virtio_pci_notify_write (opaque=0x5612184747e0,
->
-> addr=0, val=<optimized out>, size=<optimized out>) at
->
-> /usr/src/debug/qemu-4.0/hw/virtio/virtio-pci.c:1324
->
-> 1324        VirtIOPCIProxy *proxy =
->
-> VIRTIO_PCI(DEVICE(vdev)->parent_bus->parent);
->
-> Missing separate debuginfos, use: debuginfo-install
->
-> glib2-2.42.2-5.1.alios7.x86_64 glibc-2.17-260.alios7.x86_64
->
-> libgcc-4.8.5-28.alios7.1.x86_64 libseccomp-2.3.1-3.alios7.x86_64
->
-> libstdc++-4.8.5-28.alios7.1.x86_64 numactl-libs-2.0.9-5.1.alios7.x86_64
->
-> pixman-0.32.6-3.1.alios7.x86_64 zlib-1.2.7-16.2.alios7.x86_64
->
-> (gdb) bt
->
-> #0  0x0000561216a57609 in virtio_pci_notify_write (opaque=0x5612184747e0,
->
-> addr=0, val=<optimized out>, size=<optimized out>) at
->
-> /usr/src/debug/qemu-4.0/hw/virtio/virtio-pci.c:1324
->
-> #1  0x0000561216835b22 in memory_region_write_accessor (mr=<optimized out>,
->
-> addr=<optimized out>, value=<optimized out>, size=<optimized out>,
->
-> shift=<optimized out>, mask=<optimized out>, attrs=...) at
->
-> /usr/src/debug/qemu-4.0/memory.c:502
->
-> #2  0x0000561216833c5d in access_with_adjusted_size (addr=addr@entry=0,
->
-> value=value@entry=0x7fcdeab1b8a8, size=size@entry=2,
->
-> access_size_min=<optimized out>, access_size_max=<optimized out>,
->
-> access_fn=0x561216835ac0 <memory_region_write_accessor>, mr=0x56121846d340,
->
-> attrs=...)
->
->     at /usr/src/debug/qemu-4.0/memory.c:568
->
-> #3  0x0000561216837c66 in memory_region_dispatch_write
->
-> (mr=mr@entry=0x56121846d340, addr=0, data=<optimized out>, size=2,
->
-> attrs=attrs@entry=...) at /usr/src/debug/qemu-4.0/memory.c:1503
->
-> #4  0x00005612167e036f in flatview_write_continue
->
-> (fv=fv@entry=0x56121852edd0, addr=addr@entry=841813602304, attrs=...,
->
-> buf=buf@entry=0x7fce7dd97028 <Address 0x7fce7dd97028 out of bounds>,
->
-> len=len@entry=2, addr1=<optimized out>, l=<optimized out>,
->
-> mr=0x56121846d340)
->
->     at /usr/src/debug/qemu-4.0/exec.c:3279
->
-> #5  0x00005612167e0506 in flatview_write (fv=0x56121852edd0,
->
-> addr=841813602304, attrs=..., buf=0x7fce7dd97028 <Address 0x7fce7dd97028
->
-> out of bounds>, len=2) at /usr/src/debug/qemu-4.0/exec.c:3318
->
-> #6  0x00005612167e4a1b in address_space_write (as=<optimized out>,
->
-> addr=<optimized out>, attrs=..., buf=<optimized out>, len=<optimized out>)
->
-> at /usr/src/debug/qemu-4.0/exec.c:3408
->
-> #7  0x00005612167e4aa5 in address_space_rw (as=<optimized out>,
->
-> addr=<optimized out>, attrs=..., attrs@entry=...,
->
-> buf=buf@entry=0x7fce7dd97028 <Address 0x7fce7dd97028 out of bounds>,
->
-> len=<optimized out>, is_write=<optimized out>) at
->
-> /usr/src/debug/qemu-4.0/exec.c:3419
->
-> #8  0x0000561216849da1 in kvm_cpu_exec (cpu=cpu@entry=0x56121849aa00) at
->
-> /usr/src/debug/qemu-4.0/accel/kvm/kvm-all.c:2034
->
-> #9  0x000056121682255e in qemu_kvm_cpu_thread_fn
->
-> (arg=arg@entry=0x56121849aa00) at /usr/src/debug/qemu-4.0/cpus.c:1281
->
-> #10 0x0000561216b794d6 in qemu_thread_start (args=<optimized out>) at
->
-> /usr/src/debug/qemu-4.0/util/qemu-thread-posix.c:502
->
-> #11 0x00007fce7bef6e25 in start_thread () from /lib64/libpthread.so.0
->
-> #12 0x00007fce7bc1ef1d in clone () from /lib64/libc.so.6
->
->
->
-> And I searched and found
->
->
-https://bugzilla.redhat.com/show_bug.cgi?id=1706759
-, which has the same
->
-> backtrace as above, and it seems commit 7bfde688fb1b ("virtio-blk: Add
->
-> blk_drain() to virtio_blk_device_unrealize()") is to fix this particular
->
-> bug.
->
->
->
-> But I can still hit the bug even after applying the commit. Do I miss
->
-> anything?
->
->
-Hi Eryu,
->
-This backtrace seems to be caused by this bug (there were two bugs in
->
-1706759):
-https://bugzilla.redhat.com/show_bug.cgi?id=1708480
->
-Although the solution hasn't been tested on virtio-blk yet, you may
->
-want to apply this patch:
->
-https://lists.nongnu.org/archive/html/qemu-devel/2019-12/msg05197.html
->
-Let me know if this works.
-Unfortunately, I still see the same segfault & backtrace after applying
-commit 421afd2fe8dd ("virtio: reset region cache when on queue
-deletion")
-
-Anything I can help to debug?
-
-Thanks,
-Eryu
-
-On Thu, Jan 09, 2020 at 12:58:06PM +0800, Eryu Guan wrote:
->
-On Tue, Jan 07, 2020 at 03:01:01PM +0100, Julia Suvorova wrote:
->
-> On Tue, Jan 7, 2020 at 2:06 PM Eryu Guan <address@hidden> wrote:
->
-> >
->
-> > On Thu, Jan 02, 2020 at 10:08:50AM +0800, Eryu Guan wrote:
->
-> > > On Tue, Dec 31, 2019 at 11:51:35AM +0100, Igor Mammedov wrote:
->
-> > > > On Tue, 31 Dec 2019 18:34:34 +0800
->
-> > > > Eryu Guan <address@hidden> wrote:
->
-> > > >
->
-> > > > > Hi,
->
-> > > > >
->
-> > > > > I'm using qemu 4.0 and hit segfault when tearing down kata sandbox,
->
-> > > > > I
->
-> > > > > think it's because io completion hits use-after-free when device is
->
-> > > > > already gone. Is this a known bug that has been fixed? (I went
->
-> > > > > through
->
-> > > > > the git log but didn't find anything obvious).
->
-> > > > >
->
-> > > > > gdb backtrace is:
->
-> > > > >
->
-> > > > > Core was generated by `/usr/local/libexec/qemu-kvm -name
->
-> > > > > sandbox-5b8df8c6c6901c3c0a9b02879be10fe8d69d6'.
->
-> > > > > Program terminated with signal 11, Segmentation fault.
->
-> > > > > #0 object_get_class (obj=obj@entry=0x0) at
->
-> > > > > /usr/src/debug/qemu-4.0/qom/object.c:903
->
-> > > > > 903        return obj->class;
->
-> > > > > (gdb) bt
->
-> > > > > #0  object_get_class (obj=obj@entry=0x0) at
->
-> > > > > /usr/src/debug/qemu-4.0/qom/object.c:903
->
-> > > > > #1  0x0000558a2c009e9b in virtio_notify_vector (vdev=0x558a2e7751d0,
->
-> > > > >     vector=<optimized out>) at
->
-> > > > > /usr/src/debug/qemu-4.0/hw/virtio/virtio.c:1118
->
-> > > > > #2  0x0000558a2bfdcb1e in virtio_blk_discard_write_zeroes_complete (
->
-> > > > >     opaque=0x558a2f2fd420, ret=0)
->
-> > > > >     at /usr/src/debug/qemu-4.0/hw/block/virtio-blk.c:186
->
-> > > > > #3  0x0000558a2c261c7e in blk_aio_complete (acb=0x558a2eed7420)
->
-> > > > >     at /usr/src/debug/qemu-4.0/block/block-backend.c:1305
->
-> > > > > #4  0x0000558a2c3031db in coroutine_trampoline (i0=<optimized out>,
->
-> > > > >     i1=<optimized out>) at
->
-> > > > > /usr/src/debug/qemu-4.0/util/coroutine-ucontext.c:116
->
-> > > > > #5  0x00007f45b2f8b080 in ?? () from /lib64/libc.so.6
->
-> > > > > #6  0x00007fff9ed75780 in ?? ()
->
-> > > > > #7  0x0000000000000000 in ?? ()
->
-> > > > >
->
-> > > > > It seems like qemu was completing a discard/write_zero request, but
->
-> > > > > parent BusState was already freed & set to NULL.
->
-> > > > >
->
-> > > > > Do we need to drain all pending request before unrealizing
->
-> > > > > virtio-blk
->
-> > > > > device? Like the following patch proposed?
->
-> > > > >
->
-> > > > >
-https://lists.gnu.org/archive/html/qemu-devel/2017-06/msg02945.html
->
-> > > > >
->
-> > > > > If more info is needed, please let me know.
->
-> > > >
->
-> > > > may be this will help:
-https://patchwork.kernel.org/patch/11213047/
->
-> > >
->
-> > > Yeah, this looks promising! I'll try it out (though it's a one-time
->
-> > > crash for me). Thanks!
->
-> >
->
-> > After applying this patch, I don't see the original segfaut and
->
-> > backtrace, but I see this crash
->
-> >
->
-> > [Thread debugging using libthread_db enabled]
->
-> > Using host libthread_db library "/lib64/libthread_db.so.1".
->
-> > Core was generated by `/usr/local/libexec/qemu-kvm -name
->
-> > sandbox-a2f34a11a7e1449496503bbc4050ae040c0d3'.
->
-> > Program terminated with signal 11, Segmentation fault.
->
-> > #0  0x0000561216a57609 in virtio_pci_notify_write (opaque=0x5612184747e0,
->
-> > addr=0, val=<optimized out>, size=<optimized out>) at
->
-> > /usr/src/debug/qemu-4.0/hw/virtio/virtio-pci.c:1324
->
-> > 1324        VirtIOPCIProxy *proxy =
->
-> > VIRTIO_PCI(DEVICE(vdev)->parent_bus->parent);
->
-> > Missing separate debuginfos, use: debuginfo-install
->
-> > glib2-2.42.2-5.1.alios7.x86_64 glibc-2.17-260.alios7.x86_64
->
-> > libgcc-4.8.5-28.alios7.1.x86_64 libseccomp-2.3.1-3.alios7.x86_64
->
-> > libstdc++-4.8.5-28.alios7.1.x86_64 numactl-libs-2.0.9-5.1.alios7.x86_64
->
-> > pixman-0.32.6-3.1.alios7.x86_64 zlib-1.2.7-16.2.alios7.x86_64
->
-> > (gdb) bt
->
-> > #0  0x0000561216a57609 in virtio_pci_notify_write (opaque=0x5612184747e0,
->
-> > addr=0, val=<optimized out>, size=<optimized out>) at
->
-> > /usr/src/debug/qemu-4.0/hw/virtio/virtio-pci.c:1324
->
-> > #1  0x0000561216835b22 in memory_region_write_accessor (mr=<optimized
->
-> > out>, addr=<optimized out>, value=<optimized out>, size=<optimized out>,
->
-> > shift=<optimized out>, mask=<optimized out>, attrs=...) at
->
-> > /usr/src/debug/qemu-4.0/memory.c:502
->
-> > #2  0x0000561216833c5d in access_with_adjusted_size (addr=addr@entry=0,
->
-> > value=value@entry=0x7fcdeab1b8a8, size=size@entry=2,
->
-> > access_size_min=<optimized out>, access_size_max=<optimized out>,
->
-> > access_fn=0x561216835ac0 <memory_region_write_accessor>,
->
-> > mr=0x56121846d340, attrs=...)
->
-> >     at /usr/src/debug/qemu-4.0/memory.c:568
->
-> > #3  0x0000561216837c66 in memory_region_dispatch_write
->
-> > (mr=mr@entry=0x56121846d340, addr=0, data=<optimized out>, size=2,
->
-> > attrs=attrs@entry=...) at /usr/src/debug/qemu-4.0/memory.c:1503
->
-> > #4  0x00005612167e036f in flatview_write_continue
->
-> > (fv=fv@entry=0x56121852edd0, addr=addr@entry=841813602304, attrs=...,
->
-> > buf=buf@entry=0x7fce7dd97028 <Address 0x7fce7dd97028 out of bounds>,
->
-> > len=len@entry=2, addr1=<optimized out>, l=<optimized out>,
->
-> > mr=0x56121846d340)
->
-> >     at /usr/src/debug/qemu-4.0/exec.c:3279
->
-> > #5  0x00005612167e0506 in flatview_write (fv=0x56121852edd0,
->
-> > addr=841813602304, attrs=..., buf=0x7fce7dd97028 <Address 0x7fce7dd97028
->
-> > out of bounds>, len=2) at /usr/src/debug/qemu-4.0/exec.c:3318
->
-> > #6  0x00005612167e4a1b in address_space_write (as=<optimized out>,
->
-> > addr=<optimized out>, attrs=..., buf=<optimized out>, len=<optimized
->
-> > out>) at /usr/src/debug/qemu-4.0/exec.c:3408
->
-> > #7  0x00005612167e4aa5 in address_space_rw (as=<optimized out>,
->
-> > addr=<optimized out>, attrs=..., attrs@entry=...,
->
-> > buf=buf@entry=0x7fce7dd97028 <Address 0x7fce7dd97028 out of bounds>,
->
-> > len=<optimized out>, is_write=<optimized out>) at
->
-> > /usr/src/debug/qemu-4.0/exec.c:3419
->
-> > #8  0x0000561216849da1 in kvm_cpu_exec (cpu=cpu@entry=0x56121849aa00) at
->
-> > /usr/src/debug/qemu-4.0/accel/kvm/kvm-all.c:2034
->
-> > #9  0x000056121682255e in qemu_kvm_cpu_thread_fn
->
-> > (arg=arg@entry=0x56121849aa00) at /usr/src/debug/qemu-4.0/cpus.c:1281
->
-> > #10 0x0000561216b794d6 in qemu_thread_start (args=<optimized out>) at
->
-> > /usr/src/debug/qemu-4.0/util/qemu-thread-posix.c:502
->
-> > #11 0x00007fce7bef6e25 in start_thread () from /lib64/libpthread.so.0
->
-> > #12 0x00007fce7bc1ef1d in clone () from /lib64/libc.so.6
->
-> >
->
-> > And I searched and found
->
-> >
-https://bugzilla.redhat.com/show_bug.cgi?id=1706759
-, which has the same
->
-> > backtrace as above, and it seems commit 7bfde688fb1b ("virtio-blk: Add
->
-> > blk_drain() to virtio_blk_device_unrealize()") is to fix this particular
->
-> > bug.
->
-> >
->
-> > But I can still hit the bug even after applying the commit. Do I miss
->
-> > anything?
->
->
->
-> Hi Eryu,
->
-> This backtrace seems to be caused by this bug (there were two bugs in
->
-> 1706759):
-https://bugzilla.redhat.com/show_bug.cgi?id=1708480
->
-> Although the solution hasn't been tested on virtio-blk yet, you may
->
-> want to apply this patch:
->
->
-https://lists.nongnu.org/archive/html/qemu-devel/2019-12/msg05197.html
->
-> Let me know if this works.
->
->
-Unfortunately, I still see the same segfault & backtrace after applying
->
-commit 421afd2fe8dd ("virtio: reset region cache when on queue
->
-deletion")
->
->
-Anything I can help to debug?
-Please post the QEMU command-line and the QMP commands use to remove the
-device.
-
-The backtrace shows a vcpu thread submitting a request.  The device
-seems to be partially destroyed.  That's surprising because the monitor
-and the vcpu thread should use the QEMU global mutex to avoid race
-conditions.  Maybe seeing the QMP commands will make it clearer...
-
-Stefan
-signature.asc
-Description:
-PGP signature
-
-On Mon, Jan 13, 2020 at 04:38:55PM +0000, Stefan Hajnoczi wrote:
->
-On Thu, Jan 09, 2020 at 12:58:06PM +0800, Eryu Guan wrote:
->
-> On Tue, Jan 07, 2020 at 03:01:01PM +0100, Julia Suvorova wrote:
->
-> > On Tue, Jan 7, 2020 at 2:06 PM Eryu Guan <address@hidden> wrote:
->
-> > >
->
-> > > On Thu, Jan 02, 2020 at 10:08:50AM +0800, Eryu Guan wrote:
->
-> > > > On Tue, Dec 31, 2019 at 11:51:35AM +0100, Igor Mammedov wrote:
->
-> > > > > On Tue, 31 Dec 2019 18:34:34 +0800
->
-> > > > > Eryu Guan <address@hidden> wrote:
->
-> > > > >
->
-> > > > > > Hi,
->
-> > > > > >
->
-> > > > > > I'm using qemu 4.0 and hit segfault when tearing down kata
->
-> > > > > > sandbox, I
->
-> > > > > > think it's because io completion hits use-after-free when device
->
-> > > > > > is
->
-> > > > > > already gone. Is this a known bug that has been fixed? (I went
->
-> > > > > > through
->
-> > > > > > the git log but didn't find anything obvious).
->
-> > > > > >
->
-> > > > > > gdb backtrace is:
->
-> > > > > >
->
-> > > > > > Core was generated by `/usr/local/libexec/qemu-kvm -name
->
-> > > > > > sandbox-5b8df8c6c6901c3c0a9b02879be10fe8d69d6'.
->
-> > > > > > Program terminated with signal 11, Segmentation fault.
->
-> > > > > > #0 object_get_class (obj=obj@entry=0x0) at
->
-> > > > > > /usr/src/debug/qemu-4.0/qom/object.c:903
->
-> > > > > > 903        return obj->class;
->
-> > > > > > (gdb) bt
->
-> > > > > > #0  object_get_class (obj=obj@entry=0x0) at
->
-> > > > > > /usr/src/debug/qemu-4.0/qom/object.c:903
->
-> > > > > > #1  0x0000558a2c009e9b in virtio_notify_vector
->
-> > > > > > (vdev=0x558a2e7751d0,
->
-> > > > > >     vector=<optimized out>) at
->
-> > > > > > /usr/src/debug/qemu-4.0/hw/virtio/virtio.c:1118
->
-> > > > > > #2  0x0000558a2bfdcb1e in
->
-> > > > > > virtio_blk_discard_write_zeroes_complete (
->
-> > > > > >     opaque=0x558a2f2fd420, ret=0)
->
-> > > > > >     at /usr/src/debug/qemu-4.0/hw/block/virtio-blk.c:186
->
-> > > > > > #3  0x0000558a2c261c7e in blk_aio_complete (acb=0x558a2eed7420)
->
-> > > > > >     at /usr/src/debug/qemu-4.0/block/block-backend.c:1305
->
-> > > > > > #4  0x0000558a2c3031db in coroutine_trampoline (i0=<optimized
->
-> > > > > > out>,
->
-> > > > > >     i1=<optimized out>) at
->
-> > > > > > /usr/src/debug/qemu-4.0/util/coroutine-ucontext.c:116
->
-> > > > > > #5  0x00007f45b2f8b080 in ?? () from /lib64/libc.so.6
->
-> > > > > > #6  0x00007fff9ed75780 in ?? ()
->
-> > > > > > #7  0x0000000000000000 in ?? ()
->
-> > > > > >
->
-> > > > > > It seems like qemu was completing a discard/write_zero request,
->
-> > > > > > but
->
-> > > > > > parent BusState was already freed & set to NULL.
->
-> > > > > >
->
-> > > > > > Do we need to drain all pending request before unrealizing
->
-> > > > > > virtio-blk
->
-> > > > > > device? Like the following patch proposed?
->
-> > > > > >
->
-> > > > > >
-https://lists.gnu.org/archive/html/qemu-devel/2017-06/msg02945.html
->
-> > > > > >
->
-> > > > > > If more info is needed, please let me know.
->
-> > > > >
->
-> > > > > may be this will help:
-https://patchwork.kernel.org/patch/11213047/
->
-> > > >
->
-> > > > Yeah, this looks promising! I'll try it out (though it's a one-time
->
-> > > > crash for me). Thanks!
->
-> > >
->
-> > > After applying this patch, I don't see the original segfaut and
->
-> > > backtrace, but I see this crash
->
-> > >
->
-> > > [Thread debugging using libthread_db enabled]
->
-> > > Using host libthread_db library "/lib64/libthread_db.so.1".
->
-> > > Core was generated by `/usr/local/libexec/qemu-kvm -name
->
-> > > sandbox-a2f34a11a7e1449496503bbc4050ae040c0d3'.
->
-> > > Program terminated with signal 11, Segmentation fault.
->
-> > > #0  0x0000561216a57609 in virtio_pci_notify_write
->
-> > > (opaque=0x5612184747e0, addr=0, val=<optimized out>, size=<optimized
->
-> > > out>) at /usr/src/debug/qemu-4.0/hw/virtio/virtio-pci.c:1324
->
-> > > 1324        VirtIOPCIProxy *proxy =
->
-> > > VIRTIO_PCI(DEVICE(vdev)->parent_bus->parent);
->
-> > > Missing separate debuginfos, use: debuginfo-install
->
-> > > glib2-2.42.2-5.1.alios7.x86_64 glibc-2.17-260.alios7.x86_64
->
-> > > libgcc-4.8.5-28.alios7.1.x86_64 libseccomp-2.3.1-3.alios7.x86_64
->
-> > > libstdc++-4.8.5-28.alios7.1.x86_64 numactl-libs-2.0.9-5.1.alios7.x86_64
->
-> > > pixman-0.32.6-3.1.alios7.x86_64 zlib-1.2.7-16.2.alios7.x86_64
->
-> > > (gdb) bt
->
-> > > #0  0x0000561216a57609 in virtio_pci_notify_write
->
-> > > (opaque=0x5612184747e0, addr=0, val=<optimized out>, size=<optimized
->
-> > > out>) at /usr/src/debug/qemu-4.0/hw/virtio/virtio-pci.c:1324
->
-> > > #1  0x0000561216835b22 in memory_region_write_accessor (mr=<optimized
->
-> > > out>, addr=<optimized out>, value=<optimized out>, size=<optimized
->
-> > > out>, shift=<optimized out>, mask=<optimized out>, attrs=...) at
->
-> > > /usr/src/debug/qemu-4.0/memory.c:502
->
-> > > #2  0x0000561216833c5d in access_with_adjusted_size (addr=addr@entry=0,
->
-> > > value=value@entry=0x7fcdeab1b8a8, size=size@entry=2,
->
-> > > access_size_min=<optimized out>, access_size_max=<optimized out>,
->
-> > > access_fn=0x561216835ac0 <memory_region_write_accessor>,
->
-> > > mr=0x56121846d340, attrs=...)
->
-> > >     at /usr/src/debug/qemu-4.0/memory.c:568
->
-> > > #3  0x0000561216837c66 in memory_region_dispatch_write
->
-> > > (mr=mr@entry=0x56121846d340, addr=0, data=<optimized out>, size=2,
->
-> > > attrs=attrs@entry=...) at /usr/src/debug/qemu-4.0/memory.c:1503
->
-> > > #4  0x00005612167e036f in flatview_write_continue
->
-> > > (fv=fv@entry=0x56121852edd0, addr=addr@entry=841813602304, attrs=...,
->
-> > > buf=buf@entry=0x7fce7dd97028 <Address 0x7fce7dd97028 out of bounds>,
->
-> > > len=len@entry=2, addr1=<optimized out>, l=<optimized out>,
->
-> > > mr=0x56121846d340)
->
-> > >     at /usr/src/debug/qemu-4.0/exec.c:3279
->
-> > > #5  0x00005612167e0506 in flatview_write (fv=0x56121852edd0,
->
-> > > addr=841813602304, attrs=..., buf=0x7fce7dd97028 <Address
->
-> > > 0x7fce7dd97028 out of bounds>, len=2) at
->
-> > > /usr/src/debug/qemu-4.0/exec.c:3318
->
-> > > #6  0x00005612167e4a1b in address_space_write (as=<optimized out>,
->
-> > > addr=<optimized out>, attrs=..., buf=<optimized out>, len=<optimized
->
-> > > out>) at /usr/src/debug/qemu-4.0/exec.c:3408
->
-> > > #7  0x00005612167e4aa5 in address_space_rw (as=<optimized out>,
->
-> > > addr=<optimized out>, attrs=..., attrs@entry=...,
->
-> > > buf=buf@entry=0x7fce7dd97028 <Address 0x7fce7dd97028 out of bounds>,
->
-> > > len=<optimized out>, is_write=<optimized out>) at
->
-> > > /usr/src/debug/qemu-4.0/exec.c:3419
->
-> > > #8  0x0000561216849da1 in kvm_cpu_exec (cpu=cpu@entry=0x56121849aa00)
->
-> > > at /usr/src/debug/qemu-4.0/accel/kvm/kvm-all.c:2034
->
-> > > #9  0x000056121682255e in qemu_kvm_cpu_thread_fn
->
-> > > (arg=arg@entry=0x56121849aa00) at /usr/src/debug/qemu-4.0/cpus.c:1281
->
-> > > #10 0x0000561216b794d6 in qemu_thread_start (args=<optimized out>) at
->
-> > > /usr/src/debug/qemu-4.0/util/qemu-thread-posix.c:502
->
-> > > #11 0x00007fce7bef6e25 in start_thread () from /lib64/libpthread.so.0
->
-> > > #12 0x00007fce7bc1ef1d in clone () from /lib64/libc.so.6
->
-> > >
->
-> > > And I searched and found
->
-> > >
-https://bugzilla.redhat.com/show_bug.cgi?id=1706759
-, which has the same
->
-> > > backtrace as above, and it seems commit 7bfde688fb1b ("virtio-blk: Add
->
-> > > blk_drain() to virtio_blk_device_unrealize()") is to fix this particular
->
-> > > bug.
->
-> > >
->
-> > > But I can still hit the bug even after applying the commit. Do I miss
->
-> > > anything?
->
-> >
->
-> > Hi Eryu,
->
-> > This backtrace seems to be caused by this bug (there were two bugs in
->
-> > 1706759):
-https://bugzilla.redhat.com/show_bug.cgi?id=1708480
->
-> > Although the solution hasn't been tested on virtio-blk yet, you may
->
-> > want to apply this patch:
->
-> >
-https://lists.nongnu.org/archive/html/qemu-devel/2019-12/msg05197.html
->
-> > Let me know if this works.
->
->
->
-> Unfortunately, I still see the same segfault & backtrace after applying
->
-> commit 421afd2fe8dd ("virtio: reset region cache when on queue
->
-> deletion")
->
->
->
-> Anything I can help to debug?
->
->
-Please post the QEMU command-line and the QMP commands use to remove the
->
-device.
-It's a normal kata instance using virtio-fs as rootfs.
-
-/usr/local/libexec/qemu-kvm -name 
-sandbox-a670786fcb1758d2348eb120939d90ffacf9f049f10b337284ad49bbcd60936d \
- -uuid e03f6b6b-b80b-40c0-8d5b-0cbfed1305d2 -machine 
-q35,accel=kvm,kernel_irqchip,nvdimm,nosmm,nosmbus,nosata,nopit \
- -cpu host -qmp 
-unix:/run/vc/vm/a670786fcb1758d2348eb120939d90ffacf9f049f10b337284ad49bbcd60936d/qmp.sock,server,nowait
- \
- -qmp 
-unix:/run/vc/vm/debug-a670786fcb1758d2348eb120939d90ffacf9f049f10b337284ad49bbcd60936d/qmp.sock,server,nowait
- \
- -m 2048M,slots=10,maxmem=773893M -device 
-pci-bridge,bus=pcie.0,id=pci-bridge-0,chassis_nr=1,shpc=on,addr=2,romfile= \
- -device virtio-serial-pci,disable-modern=false,id=serial0,romfile= -device 
-virtconsole,chardev=charconsole0,id=console0 \
- -chardev 
-socket,id=charconsole0,path=/run/vc/vm/a670786fcb1758d2348eb120939d90ffacf9f049f10b337284ad49bbcd60936d/console.sock,server,nowait
- \
- -device 
-virtserialport,chardev=metricagent,id=channel10,name=metric.agent.channel.10 \
- -chardev 
-socket,id=metricagent,path=/run/vc/vm/a670786fcb1758d2348eb120939d90ffacf9f049f10b337284ad49bbcd60936d/metric.agent.channel.sock,server,nowait
- \
- -device nvdimm,id=nv0,memdev=mem0 -object 
-memory-backend-file,id=mem0,mem-path=/usr/local/share/containers-image-1.9.0.img,size=268435456
- \
- -object rng-random,id=rng0,filename=/dev/urandom -device 
-virtio-rng,rng=rng0,romfile= \
- -device virtserialport,chardev=charch0,id=channel0,name=agent.channel.0 \
- -chardev 
-socket,id=charch0,path=/run/vc/vm/a670786fcb1758d2348eb120939d90ffacf9f049f10b337284ad49bbcd60936d/kata.sock,server,nowait
- \
- -chardev 
-socket,id=char-6fca044b801a78a1,path=/run/vc/vm/a670786fcb1758d2348eb120939d90ffacf9f049f10b337284ad49bbcd60936d/vhost-fs.sock
- \
- -device 
-vhost-user-fs-pci,chardev=char-6fca044b801a78a1,tag=kataShared,cache-size=8192M 
--netdev tap,id=network-0,vhost=on,vhostfds=3,fds=4 \
- -device 
-driver=virtio-net-pci,netdev=network-0,mac=76:57:f1:ab:51:5c,disable-modern=false,mq=on,vectors=4,romfile=
- \
- -global kvm-pit.lost_tick_policy=discard -vga none -no-user-config -nodefaults 
--nographic -daemonize \
- -object memory-backend-file,id=dimm1,size=2048M,mem-path=/dev/shm,share=on 
--numa node,memdev=dimm1 -kernel /usr/local/share/kernel \
- -append tsc=reliable no_timer_check rcupdate.rcu_expedited=1 i8042.direct=1 
-i8042.dumbkbd=1 i8042.nopnp=1 i8042.noaux=1 noreplace-smp reboot=k console=hvc0 
-console=hvc1 iommu=off cryptomgr.notests net.ifnames=0 pci=lastbus=0 
-root=/dev/pmem0p1 rootflags=dax,data=ordered,errors=remount-ro ro 
-rootfstype=ext4 quiet systemd.show_status=false panic=1 nr_cpus=96 
-agent.use_vsock=false init=/usr/lib/systemd/systemd 
-systemd.unit=kata-containers.target systemd.mask=systemd-networkd.service 
-systemd.mask=systemd-networkd.socket \
- -pidfile 
-/run/vc/vm/a670786fcb1758d2348eb120939d90ffacf9f049f10b337284ad49bbcd60936d/pid 
-\
- -smp 1,cores=1,threads=1,sockets=96,maxcpus=96
-
-QMP command to delete device (the device id is just an example, not the
-one caused the crash):
-
-"{\"arguments\":{\"id\":\"virtio-drive-5967abfb917c8da6\"},\"execute\":\"device_del\"}"
-
-which has been hot plugged by:
-"{\"arguments\":{\"cache\":{\"direct\":true,\"no-flush\":false},\"driver\":\"raw\",\"file\":{\"driver\":\"file\",\"filename\":\"/dev/dm-18\"},\"node-name\":\"drive-5967abfb917c8da6\"},\"execute\":\"blockdev-add\"}"
-"{\"return\": {}}"
-"{\"arguments\":{\"addr\":\"01\",\"bus\":\"pci-bridge-0\",\"drive\":\"drive-5967abfb917c8da6\",\"driver\":\"virtio-blk-pci\",\"id\":\"virtio-drive-5967abfb917c8da6\",\"romfile\":\"\",\"share-rw\":\"on\"},\"execute\":\"device_add\"}"
-"{\"return\": {}}"
-
->
->
-The backtrace shows a vcpu thread submitting a request.  The device
->
-seems to be partially destroyed.  That's surprising because the monitor
->
-and the vcpu thread should use the QEMU global mutex to avoid race
->
-conditions.  Maybe seeing the QMP commands will make it clearer...
->
->
-Stefan
-Thanks!
-
-Eryu
-
-On Tue, Jan 14, 2020 at 10:50:58AM +0800, Eryu Guan wrote:
->
-On Mon, Jan 13, 2020 at 04:38:55PM +0000, Stefan Hajnoczi wrote:
->
-> On Thu, Jan 09, 2020 at 12:58:06PM +0800, Eryu Guan wrote:
->
-> > On Tue, Jan 07, 2020 at 03:01:01PM +0100, Julia Suvorova wrote:
->
-> > > On Tue, Jan 7, 2020 at 2:06 PM Eryu Guan <address@hidden> wrote:
->
-> > > >
->
-> > > > On Thu, Jan 02, 2020 at 10:08:50AM +0800, Eryu Guan wrote:
->
-> > > > > On Tue, Dec 31, 2019 at 11:51:35AM +0100, Igor Mammedov wrote:
->
-> > > > > > On Tue, 31 Dec 2019 18:34:34 +0800
->
-> > > > > > Eryu Guan <address@hidden> wrote:
->
-> > > > > >
->
-> > > > > > > Hi,
->
-> > > > > > >
->
-> > > > > > > I'm using qemu 4.0 and hit segfault when tearing down kata
->
-> > > > > > > sandbox, I
->
-> > > > > > > think it's because io completion hits use-after-free when
->
-> > > > > > > device is
->
-> > > > > > > already gone. Is this a known bug that has been fixed? (I went
->
-> > > > > > > through
->
-> > > > > > > the git log but didn't find anything obvious).
->
-> > > > > > >
->
-> > > > > > > gdb backtrace is:
->
-> > > > > > >
->
-> > > > > > > Core was generated by `/usr/local/libexec/qemu-kvm -name
->
-> > > > > > > sandbox-5b8df8c6c6901c3c0a9b02879be10fe8d69d6'.
->
-> > > > > > > Program terminated with signal 11, Segmentation fault.
->
-> > > > > > > #0 object_get_class (obj=obj@entry=0x0) at
->
-> > > > > > > /usr/src/debug/qemu-4.0/qom/object.c:903
->
-> > > > > > > 903        return obj->class;
->
-> > > > > > > (gdb) bt
->
-> > > > > > > #0  object_get_class (obj=obj@entry=0x0) at
->
-> > > > > > > /usr/src/debug/qemu-4.0/qom/object.c:903
->
-> > > > > > > #1  0x0000558a2c009e9b in virtio_notify_vector
->
-> > > > > > > (vdev=0x558a2e7751d0,
->
-> > > > > > >     vector=<optimized out>) at
->
-> > > > > > > /usr/src/debug/qemu-4.0/hw/virtio/virtio.c:1118
->
-> > > > > > > #2  0x0000558a2bfdcb1e in
->
-> > > > > > > virtio_blk_discard_write_zeroes_complete (
->
-> > > > > > >     opaque=0x558a2f2fd420, ret=0)
->
-> > > > > > >     at /usr/src/debug/qemu-4.0/hw/block/virtio-blk.c:186
->
-> > > > > > > #3  0x0000558a2c261c7e in blk_aio_complete (acb=0x558a2eed7420)
->
-> > > > > > >     at /usr/src/debug/qemu-4.0/block/block-backend.c:1305
->
-> > > > > > > #4  0x0000558a2c3031db in coroutine_trampoline (i0=<optimized
->
-> > > > > > > out>,
->
-> > > > > > >     i1=<optimized out>) at
->
-> > > > > > > /usr/src/debug/qemu-4.0/util/coroutine-ucontext.c:116
->
-> > > > > > > #5  0x00007f45b2f8b080 in ?? () from /lib64/libc.so.6
->
-> > > > > > > #6  0x00007fff9ed75780 in ?? ()
->
-> > > > > > > #7  0x0000000000000000 in ?? ()
->
-> > > > > > >
->
-> > > > > > > It seems like qemu was completing a discard/write_zero request,
->
-> > > > > > > but
->
-> > > > > > > parent BusState was already freed & set to NULL.
->
-> > > > > > >
->
-> > > > > > > Do we need to drain all pending request before unrealizing
->
-> > > > > > > virtio-blk
->
-> > > > > > > device? Like the following patch proposed?
->
-> > > > > > >
->
-> > > > > > >
-https://lists.gnu.org/archive/html/qemu-devel/2017-06/msg02945.html
->
-> > > > > > >
->
-> > > > > > > If more info is needed, please let me know.
->
-> > > > > >
->
-> > > > > > may be this will help:
->
-> > > > > >
-https://patchwork.kernel.org/patch/11213047/
->
-> > > > >
->
-> > > > > Yeah, this looks promising! I'll try it out (though it's a one-time
->
-> > > > > crash for me). Thanks!
->
-> > > >
->
-> > > > After applying this patch, I don't see the original segfaut and
->
-> > > > backtrace, but I see this crash
->
-> > > >
->
-> > > > [Thread debugging using libthread_db enabled]
->
-> > > > Using host libthread_db library "/lib64/libthread_db.so.1".
->
-> > > > Core was generated by `/usr/local/libexec/qemu-kvm -name
->
-> > > > sandbox-a2f34a11a7e1449496503bbc4050ae040c0d3'.
->
-> > > > Program terminated with signal 11, Segmentation fault.
->
-> > > > #0  0x0000561216a57609 in virtio_pci_notify_write
->
-> > > > (opaque=0x5612184747e0, addr=0, val=<optimized out>, size=<optimized
->
-> > > > out>) at /usr/src/debug/qemu-4.0/hw/virtio/virtio-pci.c:1324
->
-> > > > 1324        VirtIOPCIProxy *proxy =
->
-> > > > VIRTIO_PCI(DEVICE(vdev)->parent_bus->parent);
->
-> > > > Missing separate debuginfos, use: debuginfo-install
->
-> > > > glib2-2.42.2-5.1.alios7.x86_64 glibc-2.17-260.alios7.x86_64
->
-> > > > libgcc-4.8.5-28.alios7.1.x86_64 libseccomp-2.3.1-3.alios7.x86_64
->
-> > > > libstdc++-4.8.5-28.alios7.1.x86_64
->
-> > > > numactl-libs-2.0.9-5.1.alios7.x86_64 pixman-0.32.6-3.1.alios7.x86_64
->
-> > > > zlib-1.2.7-16.2.alios7.x86_64
->
-> > > > (gdb) bt
->
-> > > > #0  0x0000561216a57609 in virtio_pci_notify_write
->
-> > > > (opaque=0x5612184747e0, addr=0, val=<optimized out>, size=<optimized
->
-> > > > out>) at /usr/src/debug/qemu-4.0/hw/virtio/virtio-pci.c:1324
->
-> > > > #1  0x0000561216835b22 in memory_region_write_accessor (mr=<optimized
->
-> > > > out>, addr=<optimized out>, value=<optimized out>, size=<optimized
->
-> > > > out>, shift=<optimized out>, mask=<optimized out>, attrs=...) at
->
-> > > > /usr/src/debug/qemu-4.0/memory.c:502
->
-> > > > #2  0x0000561216833c5d in access_with_adjusted_size
->
-> > > > (addr=addr@entry=0, value=value@entry=0x7fcdeab1b8a8,
->
-> > > > size=size@entry=2, access_size_min=<optimized out>,
->
-> > > > access_size_max=<optimized out>, access_fn=0x561216835ac0
->
-> > > > <memory_region_write_accessor>, mr=0x56121846d340, attrs=...)
->
-> > > >     at /usr/src/debug/qemu-4.0/memory.c:568
->
-> > > > #3  0x0000561216837c66 in memory_region_dispatch_write
->
-> > > > (mr=mr@entry=0x56121846d340, addr=0, data=<optimized out>, size=2,
->
-> > > > attrs=attrs@entry=...) at /usr/src/debug/qemu-4.0/memory.c:1503
->
-> > > > #4  0x00005612167e036f in flatview_write_continue
->
-> > > > (fv=fv@entry=0x56121852edd0, addr=addr@entry=841813602304, attrs=...,
->
-> > > > buf=buf@entry=0x7fce7dd97028 <Address 0x7fce7dd97028 out of bounds>,
->
-> > > > len=len@entry=2, addr1=<optimized out>, l=<optimized out>,
->
-> > > > mr=0x56121846d340)
->
-> > > >     at /usr/src/debug/qemu-4.0/exec.c:3279
->
-> > > > #5  0x00005612167e0506 in flatview_write (fv=0x56121852edd0,
->
-> > > > addr=841813602304, attrs=..., buf=0x7fce7dd97028 <Address
->
-> > > > 0x7fce7dd97028 out of bounds>, len=2) at
->
-> > > > /usr/src/debug/qemu-4.0/exec.c:3318
->
-> > > > #6  0x00005612167e4a1b in address_space_write (as=<optimized out>,
->
-> > > > addr=<optimized out>, attrs=..., buf=<optimized out>, len=<optimized
->
-> > > > out>) at /usr/src/debug/qemu-4.0/exec.c:3408
->
-> > > > #7  0x00005612167e4aa5 in address_space_rw (as=<optimized out>,
->
-> > > > addr=<optimized out>, attrs=..., attrs@entry=...,
->
-> > > > buf=buf@entry=0x7fce7dd97028 <Address 0x7fce7dd97028 out of bounds>,
->
-> > > > len=<optimized out>, is_write=<optimized out>) at
->
-> > > > /usr/src/debug/qemu-4.0/exec.c:3419
->
-> > > > #8  0x0000561216849da1 in kvm_cpu_exec (cpu=cpu@entry=0x56121849aa00)
->
-> > > > at /usr/src/debug/qemu-4.0/accel/kvm/kvm-all.c:2034
->
-> > > > #9  0x000056121682255e in qemu_kvm_cpu_thread_fn
->
-> > > > (arg=arg@entry=0x56121849aa00) at /usr/src/debug/qemu-4.0/cpus.c:1281
->
-> > > > #10 0x0000561216b794d6 in qemu_thread_start (args=<optimized out>) at
->
-> > > > /usr/src/debug/qemu-4.0/util/qemu-thread-posix.c:502
->
-> > > > #11 0x00007fce7bef6e25 in start_thread () from /lib64/libpthread.so.0
->
-> > > > #12 0x00007fce7bc1ef1d in clone () from /lib64/libc.so.6
->
-> > > >
->
-> > > > And I searched and found
->
-> > > >
-https://bugzilla.redhat.com/show_bug.cgi?id=1706759
-, which has the
->
-> > > > same
->
-> > > > backtrace as above, and it seems commit 7bfde688fb1b ("virtio-blk: Add
->
-> > > > blk_drain() to virtio_blk_device_unrealize()") is to fix this
->
-> > > > particular
->
-> > > > bug.
->
-> > > >
->
-> > > > But I can still hit the bug even after applying the commit. Do I miss
->
-> > > > anything?
->
-> > >
->
-> > > Hi Eryu,
->
-> > > This backtrace seems to be caused by this bug (there were two bugs in
->
-> > > 1706759):
-https://bugzilla.redhat.com/show_bug.cgi?id=1708480
->
-> > > Although the solution hasn't been tested on virtio-blk yet, you may
->
-> > > want to apply this patch:
->
-> > >
->
-> > >
-https://lists.nongnu.org/archive/html/qemu-devel/2019-12/msg05197.html
->
-> > > Let me know if this works.
->
-> >
->
-> > Unfortunately, I still see the same segfault & backtrace after applying
->
-> > commit 421afd2fe8dd ("virtio: reset region cache when on queue
->
-> > deletion")
->
-> >
->
-> > Anything I can help to debug?
->
->
->
-> Please post the QEMU command-line and the QMP commands use to remove the
->
-> device.
->
->
-It's a normal kata instance using virtio-fs as rootfs.
->
->
-/usr/local/libexec/qemu-kvm -name
->
-sandbox-a670786fcb1758d2348eb120939d90ffacf9f049f10b337284ad49bbcd60936d \
->
--uuid e03f6b6b-b80b-40c0-8d5b-0cbfed1305d2 -machine
->
-q35,accel=kvm,kernel_irqchip,nvdimm,nosmm,nosmbus,nosata,nopit \
->
--cpu host -qmp
->
-unix:/run/vc/vm/a670786fcb1758d2348eb120939d90ffacf9f049f10b337284ad49bbcd60936d/qmp.sock,server,nowait
->
-\
->
--qmp
->
-unix:/run/vc/vm/debug-a670786fcb1758d2348eb120939d90ffacf9f049f10b337284ad49bbcd60936d/qmp.sock,server,nowait
->
-\
->
--m 2048M,slots=10,maxmem=773893M -device
->
-pci-bridge,bus=pcie.0,id=pci-bridge-0,chassis_nr=1,shpc=on,addr=2,romfile= \
->
--device virtio-serial-pci,disable-modern=false,id=serial0,romfile= -device
->
-virtconsole,chardev=charconsole0,id=console0 \
->
--chardev
->
-socket,id=charconsole0,path=/run/vc/vm/a670786fcb1758d2348eb120939d90ffacf9f049f10b337284ad49bbcd60936d/console.sock,server,nowait
->
-\
->
--device
->
-virtserialport,chardev=metricagent,id=channel10,name=metric.agent.channel.10 \
->
--chardev
->
-socket,id=metricagent,path=/run/vc/vm/a670786fcb1758d2348eb120939d90ffacf9f049f10b337284ad49bbcd60936d/metric.agent.channel.sock,server,nowait
->
-\
->
--device nvdimm,id=nv0,memdev=mem0 -object
->
-memory-backend-file,id=mem0,mem-path=/usr/local/share/containers-image-1.9.0.img,size=268435456
->
-\
->
--object rng-random,id=rng0,filename=/dev/urandom -device
->
-virtio-rng,rng=rng0,romfile= \
->
--device virtserialport,chardev=charch0,id=channel0,name=agent.channel.0 \
->
--chardev
->
-socket,id=charch0,path=/run/vc/vm/a670786fcb1758d2348eb120939d90ffacf9f049f10b337284ad49bbcd60936d/kata.sock,server,nowait
->
-\
->
--chardev
->
-socket,id=char-6fca044b801a78a1,path=/run/vc/vm/a670786fcb1758d2348eb120939d90ffacf9f049f10b337284ad49bbcd60936d/vhost-fs.sock
->
-\
->
--device
->
-vhost-user-fs-pci,chardev=char-6fca044b801a78a1,tag=kataShared,cache-size=8192M
->
--netdev tap,id=network-0,vhost=on,vhostfds=3,fds=4 \
->
--device
->
-driver=virtio-net-pci,netdev=network-0,mac=76:57:f1:ab:51:5c,disable-modern=false,mq=on,vectors=4,romfile=
->
-\
->
--global kvm-pit.lost_tick_policy=discard -vga none -no-user-config
->
--nodefaults -nographic -daemonize \
->
--object memory-backend-file,id=dimm1,size=2048M,mem-path=/dev/shm,share=on
->
--numa node,memdev=dimm1 -kernel /usr/local/share/kernel \
->
--append tsc=reliable no_timer_check rcupdate.rcu_expedited=1 i8042.direct=1
->
-i8042.dumbkbd=1 i8042.nopnp=1 i8042.noaux=1 noreplace-smp reboot=k
->
-console=hvc0 console=hvc1 iommu=off cryptomgr.notests net.ifnames=0
->
-pci=lastbus=0 root=/dev/pmem0p1 rootflags=dax,data=ordered,errors=remount-ro
->
-ro rootfstype=ext4 quiet systemd.show_status=false panic=1 nr_cpus=96
->
-agent.use_vsock=false init=/usr/lib/systemd/systemd
->
-systemd.unit=kata-containers.target systemd.mask=systemd-networkd.service
->
-systemd.mask=systemd-networkd.socket \
->
--pidfile
->
-/run/vc/vm/a670786fcb1758d2348eb120939d90ffacf9f049f10b337284ad49bbcd60936d/pid
->
-\
->
--smp 1,cores=1,threads=1,sockets=96,maxcpus=96
->
->
-QMP command to delete device (the device id is just an example, not the
->
-one caused the crash):
->
->
-"{\"arguments\":{\"id\":\"virtio-drive-5967abfb917c8da6\"},\"execute\":\"device_del\"}"
->
->
-which has been hot plugged by:
->
-"{\"arguments\":{\"cache\":{\"direct\":true,\"no-flush\":false},\"driver\":\"raw\",\"file\":{\"driver\":\"file\",\"filename\":\"/dev/dm-18\"},\"node-name\":\"drive-5967abfb917c8da6\"},\"execute\":\"blockdev-add\"}"
->
-"{\"return\": {}}"
->
-"{\"arguments\":{\"addr\":\"01\",\"bus\":\"pci-bridge-0\",\"drive\":\"drive-5967abfb917c8da6\",\"driver\":\"virtio-blk-pci\",\"id\":\"virtio-drive-5967abfb917c8da6\",\"romfile\":\"\",\"share-rw\":\"on\"},\"execute\":\"device_add\"}"
->
-"{\"return\": {}}"
-Thanks.  I wasn't able to reproduce this crash with qemu.git/master.
-
-One thing that is strange about the latest backtrace you posted: QEMU is
-dispatching the memory access instead of using the ioeventfd code that
-that virtio-blk-pci normally takes when a virtqueue is notified.  I
-guess this means ioeventfd has already been disabled due to the hot
-unplug.
-
-Could you try with machine type "i440fx" instead of "q35"?  I wonder if
-pci-bridge/shpc is part of the problem.
-
-Stefan
-signature.asc
-Description:
-PGP signature
-
-On Tue, Jan 14, 2020 at 04:16:24PM +0000, Stefan Hajnoczi wrote:
->
-On Tue, Jan 14, 2020 at 10:50:58AM +0800, Eryu Guan wrote:
->
-> On Mon, Jan 13, 2020 at 04:38:55PM +0000, Stefan Hajnoczi wrote:
->
-> > On Thu, Jan 09, 2020 at 12:58:06PM +0800, Eryu Guan wrote:
->
-> > > On Tue, Jan 07, 2020 at 03:01:01PM +0100, Julia Suvorova wrote:
->
-> > > > On Tue, Jan 7, 2020 at 2:06 PM Eryu Guan <address@hidden> wrote:
->
-> > > > >
->
-> > > > > On Thu, Jan 02, 2020 at 10:08:50AM +0800, Eryu Guan wrote:
->
-> > > > > > On Tue, Dec 31, 2019 at 11:51:35AM +0100, Igor Mammedov wrote:
->
-> > > > > > > On Tue, 31 Dec 2019 18:34:34 +0800
->
-> > > > > > > Eryu Guan <address@hidden> wrote:
->
-> > > > > > >
->
-> > > > > > > > Hi,
->
-> > > > > > > >
->
-> > > > > > > > I'm using qemu 4.0 and hit segfault when tearing down kata
->
-> > > > > > > > sandbox, I
->
-> > > > > > > > think it's because io completion hits use-after-free when
->
-> > > > > > > > device is
->
-> > > > > > > > already gone. Is this a known bug that has been fixed? (I
->
-> > > > > > > > went through
->
-> > > > > > > > the git log but didn't find anything obvious).
->
-> > > > > > > >
->
-> > > > > > > > gdb backtrace is:
->
-> > > > > > > >
->
-> > > > > > > > Core was generated by `/usr/local/libexec/qemu-kvm -name
->
-> > > > > > > > sandbox-5b8df8c6c6901c3c0a9b02879be10fe8d69d6'.
->
-> > > > > > > > Program terminated with signal 11, Segmentation fault.
->
-> > > > > > > > #0 object_get_class (obj=obj@entry=0x0) at
->
-> > > > > > > > /usr/src/debug/qemu-4.0/qom/object.c:903
->
-> > > > > > > > 903        return obj->class;
->
-> > > > > > > > (gdb) bt
->
-> > > > > > > > #0  object_get_class (obj=obj@entry=0x0) at
->
-> > > > > > > > /usr/src/debug/qemu-4.0/qom/object.c:903
->
-> > > > > > > > #1  0x0000558a2c009e9b in virtio_notify_vector
->
-> > > > > > > > (vdev=0x558a2e7751d0,
->
-> > > > > > > >     vector=<optimized out>) at
->
-> > > > > > > > /usr/src/debug/qemu-4.0/hw/virtio/virtio.c:1118
->
-> > > > > > > > #2  0x0000558a2bfdcb1e in
->
-> > > > > > > > virtio_blk_discard_write_zeroes_complete (
->
-> > > > > > > >     opaque=0x558a2f2fd420, ret=0)
->
-> > > > > > > >     at /usr/src/debug/qemu-4.0/hw/block/virtio-blk.c:186
->
-> > > > > > > > #3  0x0000558a2c261c7e in blk_aio_complete
->
-> > > > > > > > (acb=0x558a2eed7420)
->
-> > > > > > > >     at /usr/src/debug/qemu-4.0/block/block-backend.c:1305
->
-> > > > > > > > #4  0x0000558a2c3031db in coroutine_trampoline (i0=<optimized
->
-> > > > > > > > out>,
->
-> > > > > > > >     i1=<optimized out>) at
->
-> > > > > > > > /usr/src/debug/qemu-4.0/util/coroutine-ucontext.c:116
->
-> > > > > > > > #5  0x00007f45b2f8b080 in ?? () from /lib64/libc.so.6
->
-> > > > > > > > #6  0x00007fff9ed75780 in ?? ()
->
-> > > > > > > > #7  0x0000000000000000 in ?? ()
->
-> > > > > > > >
->
-> > > > > > > > It seems like qemu was completing a discard/write_zero
->
-> > > > > > > > request, but
->
-> > > > > > > > parent BusState was already freed & set to NULL.
->
-> > > > > > > >
->
-> > > > > > > > Do we need to drain all pending request before unrealizing
->
-> > > > > > > > virtio-blk
->
-> > > > > > > > device? Like the following patch proposed?
->
-> > > > > > > >
->
-> > > > > > > >
-https://lists.gnu.org/archive/html/qemu-devel/2017-06/msg02945.html
->
-> > > > > > > >
->
-> > > > > > > > If more info is needed, please let me know.
->
-> > > > > > >
->
-> > > > > > > may be this will help:
->
-> > > > > > >
-https://patchwork.kernel.org/patch/11213047/
->
-> > > > > >
->
-> > > > > > Yeah, this looks promising! I'll try it out (though it's a
->
-> > > > > > one-time
->
-> > > > > > crash for me). Thanks!
->
-> > > > >
->
-> > > > > After applying this patch, I don't see the original segfaut and
->
-> > > > > backtrace, but I see this crash
->
-> > > > >
->
-> > > > > [Thread debugging using libthread_db enabled]
->
-> > > > > Using host libthread_db library "/lib64/libthread_db.so.1".
->
-> > > > > Core was generated by `/usr/local/libexec/qemu-kvm -name
->
-> > > > > sandbox-a2f34a11a7e1449496503bbc4050ae040c0d3'.
->
-> > > > > Program terminated with signal 11, Segmentation fault.
->
-> > > > > #0  0x0000561216a57609 in virtio_pci_notify_write
->
-> > > > > (opaque=0x5612184747e0, addr=0, val=<optimized out>,
->
-> > > > > size=<optimized out>) at
->
-> > > > > /usr/src/debug/qemu-4.0/hw/virtio/virtio-pci.c:1324
->
-> > > > > 1324        VirtIOPCIProxy *proxy =
->
-> > > > > VIRTIO_PCI(DEVICE(vdev)->parent_bus->parent);
->
-> > > > > Missing separate debuginfos, use: debuginfo-install
->
-> > > > > glib2-2.42.2-5.1.alios7.x86_64 glibc-2.17-260.alios7.x86_64
->
-> > > > > libgcc-4.8.5-28.alios7.1.x86_64 libseccomp-2.3.1-3.alios7.x86_64
->
-> > > > > libstdc++-4.8.5-28.alios7.1.x86_64
->
-> > > > > numactl-libs-2.0.9-5.1.alios7.x86_64
->
-> > > > > pixman-0.32.6-3.1.alios7.x86_64 zlib-1.2.7-16.2.alios7.x86_64
->
-> > > > > (gdb) bt
->
-> > > > > #0  0x0000561216a57609 in virtio_pci_notify_write
->
-> > > > > (opaque=0x5612184747e0, addr=0, val=<optimized out>,
->
-> > > > > size=<optimized out>) at
->
-> > > > > /usr/src/debug/qemu-4.0/hw/virtio/virtio-pci.c:1324
->
-> > > > > #1  0x0000561216835b22 in memory_region_write_accessor
->
-> > > > > (mr=<optimized out>, addr=<optimized out>, value=<optimized out>,
->
-> > > > > size=<optimized out>, shift=<optimized out>, mask=<optimized out>,
->
-> > > > > attrs=...) at /usr/src/debug/qemu-4.0/memory.c:502
->
-> > > > > #2  0x0000561216833c5d in access_with_adjusted_size
->
-> > > > > (addr=addr@entry=0, value=value@entry=0x7fcdeab1b8a8,
->
-> > > > > size=size@entry=2, access_size_min=<optimized out>,
->
-> > > > > access_size_max=<optimized out>, access_fn=0x561216835ac0
->
-> > > > > <memory_region_write_accessor>, mr=0x56121846d340, attrs=...)
->
-> > > > >     at /usr/src/debug/qemu-4.0/memory.c:568
->
-> > > > > #3  0x0000561216837c66 in memory_region_dispatch_write
->
-> > > > > (mr=mr@entry=0x56121846d340, addr=0, data=<optimized out>, size=2,
->
-> > > > > attrs=attrs@entry=...) at /usr/src/debug/qemu-4.0/memory.c:1503
->
-> > > > > #4  0x00005612167e036f in flatview_write_continue
->
-> > > > > (fv=fv@entry=0x56121852edd0, addr=addr@entry=841813602304,
->
-> > > > > attrs=..., buf=buf@entry=0x7fce7dd97028 <Address 0x7fce7dd97028 out
->
-> > > > > of bounds>, len=len@entry=2, addr1=<optimized out>, l=<optimized
->
-> > > > > out>, mr=0x56121846d340)
->
-> > > > >     at /usr/src/debug/qemu-4.0/exec.c:3279
->
-> > > > > #5  0x00005612167e0506 in flatview_write (fv=0x56121852edd0,
->
-> > > > > addr=841813602304, attrs=..., buf=0x7fce7dd97028 <Address
->
-> > > > > 0x7fce7dd97028 out of bounds>, len=2) at
->
-> > > > > /usr/src/debug/qemu-4.0/exec.c:3318
->
-> > > > > #6  0x00005612167e4a1b in address_space_write (as=<optimized out>,
->
-> > > > > addr=<optimized out>, attrs=..., buf=<optimized out>,
->
-> > > > > len=<optimized out>) at /usr/src/debug/qemu-4.0/exec.c:3408
->
-> > > > > #7  0x00005612167e4aa5 in address_space_rw (as=<optimized out>,
->
-> > > > > addr=<optimized out>, attrs=..., attrs@entry=...,
->
-> > > > > buf=buf@entry=0x7fce7dd97028 <Address 0x7fce7dd97028 out of
->
-> > > > > bounds>, len=<optimized out>, is_write=<optimized out>) at
->
-> > > > > /usr/src/debug/qemu-4.0/exec.c:3419
->
-> > > > > #8  0x0000561216849da1 in kvm_cpu_exec
->
-> > > > > (cpu=cpu@entry=0x56121849aa00) at
->
-> > > > > /usr/src/debug/qemu-4.0/accel/kvm/kvm-all.c:2034
->
-> > > > > #9  0x000056121682255e in qemu_kvm_cpu_thread_fn
->
-> > > > > (arg=arg@entry=0x56121849aa00) at
->
-> > > > > /usr/src/debug/qemu-4.0/cpus.c:1281
->
-> > > > > #10 0x0000561216b794d6 in qemu_thread_start (args=<optimized out>)
->
-> > > > > at /usr/src/debug/qemu-4.0/util/qemu-thread-posix.c:502
->
-> > > > > #11 0x00007fce7bef6e25 in start_thread () from
->
-> > > > > /lib64/libpthread.so.0
->
-> > > > > #12 0x00007fce7bc1ef1d in clone () from /lib64/libc.so.6
->
-> > > > >
->
-> > > > > And I searched and found
->
-> > > > >
-https://bugzilla.redhat.com/show_bug.cgi?id=1706759
-, which has the
->
-> > > > > same
->
-> > > > > backtrace as above, and it seems commit 7bfde688fb1b ("virtio-blk:
->
-> > > > > Add
->
-> > > > > blk_drain() to virtio_blk_device_unrealize()") is to fix this
->
-> > > > > particular
->
-> > > > > bug.
->
-> > > > >
->
-> > > > > But I can still hit the bug even after applying the commit. Do I
->
-> > > > > miss
->
-> > > > > anything?
->
-> > > >
->
-> > > > Hi Eryu,
->
-> > > > This backtrace seems to be caused by this bug (there were two bugs in
->
-> > > > 1706759):
-https://bugzilla.redhat.com/show_bug.cgi?id=1708480
->
-> > > > Although the solution hasn't been tested on virtio-blk yet, you may
->
-> > > > want to apply this patch:
->
-> > > >
->
-> > > >
-https://lists.nongnu.org/archive/html/qemu-devel/2019-12/msg05197.html
->
-> > > > Let me know if this works.
->
-> > >
->
-> > > Unfortunately, I still see the same segfault & backtrace after applying
->
-> > > commit 421afd2fe8dd ("virtio: reset region cache when on queue
->
-> > > deletion")
->
-> > >
->
-> > > Anything I can help to debug?
->
-> >
->
-> > Please post the QEMU command-line and the QMP commands use to remove the
->
-> > device.
->
->
->
-> It's a normal kata instance using virtio-fs as rootfs.
->
->
->
-> /usr/local/libexec/qemu-kvm -name
->
-> sandbox-a670786fcb1758d2348eb120939d90ffacf9f049f10b337284ad49bbcd60936d \
->
->  -uuid e03f6b6b-b80b-40c0-8d5b-0cbfed1305d2 -machine
->
-> q35,accel=kvm,kernel_irqchip,nvdimm,nosmm,nosmbus,nosata,nopit \
->
->  -cpu host -qmp
->
-> unix:/run/vc/vm/a670786fcb1758d2348eb120939d90ffacf9f049f10b337284ad49bbcd60936d/qmp.sock,server,nowait
->
->  \
->
->  -qmp
->
-> unix:/run/vc/vm/debug-a670786fcb1758d2348eb120939d90ffacf9f049f10b337284ad49bbcd60936d/qmp.sock,server,nowait
->
->  \
->
->  -m 2048M,slots=10,maxmem=773893M -device
->
-> pci-bridge,bus=pcie.0,id=pci-bridge-0,chassis_nr=1,shpc=on,addr=2,romfile= \
->
->  -device virtio-serial-pci,disable-modern=false,id=serial0,romfile= -device
->
-> virtconsole,chardev=charconsole0,id=console0 \
->
->  -chardev
->
-> socket,id=charconsole0,path=/run/vc/vm/a670786fcb1758d2348eb120939d90ffacf9f049f10b337284ad49bbcd60936d/console.sock,server,nowait
->
->  \
->
->  -device
->
-> virtserialport,chardev=metricagent,id=channel10,name=metric.agent.channel.10
->
->  \
->
->  -chardev
->
-> socket,id=metricagent,path=/run/vc/vm/a670786fcb1758d2348eb120939d90ffacf9f049f10b337284ad49bbcd60936d/metric.agent.channel.sock,server,nowait
->
->  \
->
->  -device nvdimm,id=nv0,memdev=mem0 -object
->
-> memory-backend-file,id=mem0,mem-path=/usr/local/share/containers-image-1.9.0.img,size=268435456
->
->  \
->
->  -object rng-random,id=rng0,filename=/dev/urandom -device
->
-> virtio-rng,rng=rng0,romfile= \
->
->  -device virtserialport,chardev=charch0,id=channel0,name=agent.channel.0 \
->
->  -chardev
->
-> socket,id=charch0,path=/run/vc/vm/a670786fcb1758d2348eb120939d90ffacf9f049f10b337284ad49bbcd60936d/kata.sock,server,nowait
->
->  \
->
->  -chardev
->
-> socket,id=char-6fca044b801a78a1,path=/run/vc/vm/a670786fcb1758d2348eb120939d90ffacf9f049f10b337284ad49bbcd60936d/vhost-fs.sock
->
->  \
->
->  -device
->
-> vhost-user-fs-pci,chardev=char-6fca044b801a78a1,tag=kataShared,cache-size=8192M
->
->  -netdev tap,id=network-0,vhost=on,vhostfds=3,fds=4 \
->
->  -device
->
-> driver=virtio-net-pci,netdev=network-0,mac=76:57:f1:ab:51:5c,disable-modern=false,mq=on,vectors=4,romfile=
->
->  \
->
->  -global kvm-pit.lost_tick_policy=discard -vga none -no-user-config
->
-> -nodefaults -nographic -daemonize \
->
->  -object memory-backend-file,id=dimm1,size=2048M,mem-path=/dev/shm,share=on
->
-> -numa node,memdev=dimm1 -kernel /usr/local/share/kernel \
->
->  -append tsc=reliable no_timer_check rcupdate.rcu_expedited=1
->
-> i8042.direct=1 i8042.dumbkbd=1 i8042.nopnp=1 i8042.noaux=1 noreplace-smp
->
-> reboot=k console=hvc0 console=hvc1 iommu=off cryptomgr.notests
->
-> net.ifnames=0 pci=lastbus=0 root=/dev/pmem0p1
->
-> rootflags=dax,data=ordered,errors=remount-ro ro rootfstype=ext4 quiet
->
-> systemd.show_status=false panic=1 nr_cpus=96 agent.use_vsock=false
->
-> init=/usr/lib/systemd/systemd systemd.unit=kata-containers.target
->
-> systemd.mask=systemd-networkd.service systemd.mask=systemd-networkd.socket \
->
->  -pidfile
->
-> /run/vc/vm/a670786fcb1758d2348eb120939d90ffacf9f049f10b337284ad49bbcd60936d/pid
->
->  \
->
->  -smp 1,cores=1,threads=1,sockets=96,maxcpus=96
->
->
->
-> QMP command to delete device (the device id is just an example, not the
->
-> one caused the crash):
->
->
->
-> "{\"arguments\":{\"id\":\"virtio-drive-5967abfb917c8da6\"},\"execute\":\"device_del\"}"
->
->
->
-> which has been hot plugged by:
->
-> "{\"arguments\":{\"cache\":{\"direct\":true,\"no-flush\":false},\"driver\":\"raw\",\"file\":{\"driver\":\"file\",\"filename\":\"/dev/dm-18\"},\"node-name\":\"drive-5967abfb917c8da6\"},\"execute\":\"blockdev-add\"}"
->
-> "{\"return\": {}}"
->
-> "{\"arguments\":{\"addr\":\"01\",\"bus\":\"pci-bridge-0\",\"drive\":\"drive-5967abfb917c8da6\",\"driver\":\"virtio-blk-pci\",\"id\":\"virtio-drive-5967abfb917c8da6\",\"romfile\":\"\",\"share-rw\":\"on\"},\"execute\":\"device_add\"}"
->
-> "{\"return\": {}}"
->
->
-Thanks.  I wasn't able to reproduce this crash with qemu.git/master.
->
->
-One thing that is strange about the latest backtrace you posted: QEMU is
->
-dispatching the memory access instead of using the ioeventfd code that
->
-that virtio-blk-pci normally takes when a virtqueue is notified.  I
->
-guess this means ioeventfd has already been disabled due to the hot
->
-unplug.
->
->
-Could you try with machine type "i440fx" instead of "q35"?  I wonder if
->
-pci-bridge/shpc is part of the problem.
-Sure, will try it. But it may take some time, as the test bed is busy
-with other testing tasks. I'll report back once I got the results.
-
-Thanks,
-Eryu
-
diff --git a/classification_output/04/other/88281850 b/classification_output/04/other/88281850
deleted file mode 100644
index 10bfcf8be..000000000
--- a/classification_output/04/other/88281850
+++ /dev/null
@@ -1,289 +0,0 @@
-other: 0.983
-instruction: 0.978
-graphic: 0.974
-network: 0.973
-assembly: 0.972
-device: 0.970
-semantic: 0.968
-boot: 0.967
-socket: 0.966
-mistranslation: 0.948
-vnc: 0.945
-KVM: 0.881
-
-[Bug] Take more 150s to boot qemu on ARM64
-
-Hi all,
-I encounter a issue with kernel 5.19-rc1 on a ARM64 board:  it takes
-about 150s between beginning to run qemu command and beginng to boot
-Linux kernel ("EFI stub: Booting Linux Kernel...").
-But in kernel 5.18-rc4, it only takes about 5s. I git bisect the kernel
-code and it finds c2445d387850 ("srcu: Add contention check to
-call_srcu() srcu_data ->lock acquisition").
-The qemu (qemu version is 6.2.92) command i run is :
-
-./qemu-system-aarch64 -m 4G,slots=4,maxmem=8g \
---trace "kvm*" \
--cpu host \
--machine virt,accel=kvm,gic-version=3  \
--machine smp.cpus=2,smp.sockets=2 \
--no-reboot \
--nographic \
--monitor unix:/home/cx/qmp-test,server,nowait \
--bios /home/cx/boot/QEMU_EFI.fd \
--kernel /home/cx/boot/Image  \
--device
-pcie-root-port,port=0x8,chassis=1,id=net1,bus=pcie.0,multifunction=on,addr=0x1
-\
--device vfio-pci,host=7d:01.3,id=net0 \
--device virtio-blk-pci,drive=drive0,id=virtblk0,num-queues=4  \
--drive file=/home/cx/boot/boot_ubuntu.img,if=none,id=drive0 \
--append "rdinit=init console=ttyAMA0 root=/dev/vda rootfstype=ext4 rw " \
--net none \
--D /home/cx/qemu_log.txt
-I am not familiar with rcu code, and don't know how it causes the issue.
-Do you have any idea about this issue?
-Best Regard,
-
-Xiang Chen
-
-On Mon, Jun 13, 2022 at 08:26:34PM +0800, chenxiang (M) wrote:
->
-Hi all,
->
->
-I encounter a issue with kernel 5.19-rc1 on a ARM64 board:  it takes about
->
-150s between beginning to run qemu command and beginng to boot Linux kernel
->
-("EFI stub: Booting Linux Kernel...").
->
->
-But in kernel 5.18-rc4, it only takes about 5s. I git bisect the kernel code
->
-and it finds c2445d387850 ("srcu: Add contention check to call_srcu()
->
-srcu_data ->lock acquisition").
->
->
-The qemu (qemu version is 6.2.92) command i run is :
->
->
-./qemu-system-aarch64 -m 4G,slots=4,maxmem=8g \
->
---trace "kvm*" \
->
--cpu host \
->
--machine virt,accel=kvm,gic-version=3  \
->
--machine smp.cpus=2,smp.sockets=2 \
->
--no-reboot \
->
--nographic \
->
--monitor unix:/home/cx/qmp-test,server,nowait \
->
--bios /home/cx/boot/QEMU_EFI.fd \
->
--kernel /home/cx/boot/Image  \
->
--device
->
-pcie-root-port,port=0x8,chassis=1,id=net1,bus=pcie.0,multifunction=on,addr=0x1
->
-\
->
--device vfio-pci,host=7d:01.3,id=net0 \
->
--device virtio-blk-pci,drive=drive0,id=virtblk0,num-queues=4  \
->
--drive file=/home/cx/boot/boot_ubuntu.img,if=none,id=drive0 \
->
--append "rdinit=init console=ttyAMA0 root=/dev/vda rootfstype=ext4 rw " \
->
--net none \
->
--D /home/cx/qemu_log.txt
->
->
-I am not familiar with rcu code, and don't know how it causes the issue. Do
->
-you have any idea about this issue?
-Please see the discussion here:
-https://lore.kernel.org/all/20615615-0013-5adc-584f-2b1d5c03ebfc@linaro.org/
-Though that report requires ACPI to be forced on to get the
-delay, which results in more than 9,000 back-to-back calls to
-synchronize_srcu_expedited().  I cannot reproduce this on my setup, even
-with an artificial tight loop invoking synchronize_srcu_expedited(),
-but then again I don't have ARM hardware.
-
-My current guess is that the following patch, but with larger values for
-SRCU_MAX_NODELAY_PHASE.  Here "larger" might well be up in the hundreds,
-or perhaps even larger.
-
-If you get a chance to experiment with this, could you please reply
-to the discussion at the above URL?  (Or let me know, and I can CC
-you on the next message in that thread.)
-
-                                                Thanx, Paul
-
-------------------------------------------------------------------------
-
-diff --git a/kernel/rcu/srcutree.c b/kernel/rcu/srcutree.c
-index 50ba70f019dea..0db7873f4e95b 100644
---- a/kernel/rcu/srcutree.c
-+++ b/kernel/rcu/srcutree.c
-@@ -513,7 +513,7 @@ static bool srcu_readers_active(struct srcu_struct *ssp)
- 
- #define SRCU_INTERVAL          1       // Base delay if no expedited GPs 
-pending.
- #define SRCU_MAX_INTERVAL      10      // Maximum incremental delay from slow 
-readers.
--#define SRCU_MAX_NODELAY_PHASE 1       // Maximum per-GP-phase consecutive 
-no-delay instances.
-+#define SRCU_MAX_NODELAY_PHASE 3       // Maximum per-GP-phase consecutive 
-no-delay instances.
- #define SRCU_MAX_NODELAY       100     // Maximum consecutive no-delay 
-instances.
- 
- /*
-@@ -522,16 +522,22 @@ static bool srcu_readers_active(struct srcu_struct *ssp)
-  */
- static unsigned long srcu_get_delay(struct srcu_struct *ssp)
- {
-+       unsigned long gpstart;
-+       unsigned long j;
-        unsigned long jbase = SRCU_INTERVAL;
- 
-        if (ULONG_CMP_LT(READ_ONCE(ssp->srcu_gp_seq), 
-READ_ONCE(ssp->srcu_gp_seq_needed_exp)))
-                jbase = 0;
--       if (rcu_seq_state(READ_ONCE(ssp->srcu_gp_seq)))
--               jbase += jiffies - READ_ONCE(ssp->srcu_gp_start);
--       if (!jbase) {
--               WRITE_ONCE(ssp->srcu_n_exp_nodelay, 
-READ_ONCE(ssp->srcu_n_exp_nodelay) + 1);
--               if (READ_ONCE(ssp->srcu_n_exp_nodelay) > SRCU_MAX_NODELAY_PHASE)
--                       jbase = 1;
-+       if (rcu_seq_state(READ_ONCE(ssp->srcu_gp_seq))) {
-+               j = jiffies - 1;
-+               gpstart = READ_ONCE(ssp->srcu_gp_start);
-+               if (time_after(j, gpstart))
-+                       jbase += j - gpstart;
-+               if (!jbase) {
-+                       WRITE_ONCE(ssp->srcu_n_exp_nodelay, 
-READ_ONCE(ssp->srcu_n_exp_nodelay) + 1);
-+                       if (READ_ONCE(ssp->srcu_n_exp_nodelay) > 
-SRCU_MAX_NODELAY_PHASE)
-+                               jbase = 1;
-+               }
-        }
-        return jbase > SRCU_MAX_INTERVAL ? SRCU_MAX_INTERVAL : jbase;
- }
-
-å¨ 2022/6/13 21:22, Paul E. McKenney åé:
-On Mon, Jun 13, 2022 at 08:26:34PM +0800, chenxiang (M) wrote:
-Hi all,
-
-I encounter a issue with kernel 5.19-rc1 on a ARM64 board:  it takes about
-150s between beginning to run qemu command and beginng to boot Linux kernel
-("EFI stub: Booting Linux Kernel...").
-
-But in kernel 5.18-rc4, it only takes about 5s. I git bisect the kernel code
-and it finds c2445d387850 ("srcu: Add contention check to call_srcu()
-srcu_data ->lock acquisition").
-
-The qemu (qemu version is 6.2.92) command i run is :
-
-./qemu-system-aarch64 -m 4G,slots=4,maxmem=8g \
---trace "kvm*" \
--cpu host \
--machine virt,accel=kvm,gic-version=3  \
--machine smp.cpus=2,smp.sockets=2 \
--no-reboot \
--nographic \
--monitor unix:/home/cx/qmp-test,server,nowait \
--bios /home/cx/boot/QEMU_EFI.fd \
--kernel /home/cx/boot/Image  \
--device 
-pcie-root-port,port=0x8,chassis=1,id=net1,bus=pcie.0,multifunction=on,addr=0x1
-\
--device vfio-pci,host=7d:01.3,id=net0 \
--device virtio-blk-pci,drive=drive0,id=virtblk0,num-queues=4  \
--drive file=/home/cx/boot/boot_ubuntu.img,if=none,id=drive0 \
--append "rdinit=init console=ttyAMA0 root=/dev/vda rootfstype=ext4 rw " \
--net none \
--D /home/cx/qemu_log.txt
-
-I am not familiar with rcu code, and don't know how it causes the issue. Do
-you have any idea about this issue?
-Please see the discussion here:
-https://lore.kernel.org/all/20615615-0013-5adc-584f-2b1d5c03ebfc@linaro.org/
-Though that report requires ACPI to be forced on to get the
-delay, which results in more than 9,000 back-to-back calls to
-synchronize_srcu_expedited().  I cannot reproduce this on my setup, even
-with an artificial tight loop invoking synchronize_srcu_expedited(),
-but then again I don't have ARM hardware.
-
-My current guess is that the following patch, but with larger values for
-SRCU_MAX_NODELAY_PHASE.  Here "larger" might well be up in the hundreds,
-or perhaps even larger.
-
-If you get a chance to experiment with this, could you please reply
-to the discussion at the above URL?  (Or let me know, and I can CC
-you on the next message in that thread.)
-Ok, thanks, i will reply it on above URL.
-Thanx, Paul
-
-------------------------------------------------------------------------
-
-diff --git a/kernel/rcu/srcutree.c b/kernel/rcu/srcutree.c
-index 50ba70f019dea..0db7873f4e95b 100644
---- a/kernel/rcu/srcutree.c
-+++ b/kernel/rcu/srcutree.c
-@@ -513,7 +513,7 @@ static bool srcu_readers_active(struct srcu_struct *ssp)
-#define SRCU_INTERVAL		1	// Base delay if no expedited GPs pending.
-#define SRCU_MAX_INTERVAL     10      // Maximum incremental delay from slow 
-readers.
--#define SRCU_MAX_NODELAY_PHASE 1       // Maximum per-GP-phase consecutive 
-no-delay instances.
-+#define SRCU_MAX_NODELAY_PHASE 3       // Maximum per-GP-phase consecutive 
-no-delay instances.
-  #define SRCU_MAX_NODELAY      100     // Maximum consecutive no-delay 
-instances.
-/*
-@@ -522,16 +522,22 @@ static bool srcu_readers_active(struct srcu_struct *ssp)
-   */
-  static unsigned long srcu_get_delay(struct srcu_struct *ssp)
-  {
-+       unsigned long gpstart;
-+       unsigned long j;
-        unsigned long jbase = SRCU_INTERVAL;
-if (ULONG_CMP_LT(READ_ONCE(ssp->srcu_gp_seq), READ_ONCE(ssp->srcu_gp_seq_needed_exp)))
-jbase = 0;
--       if (rcu_seq_state(READ_ONCE(ssp->srcu_gp_seq)))
--               jbase += jiffies - READ_ONCE(ssp->srcu_gp_start);
--       if (!jbase) {
--               WRITE_ONCE(ssp->srcu_n_exp_nodelay, 
-READ_ONCE(ssp->srcu_n_exp_nodelay) + 1);
--               if (READ_ONCE(ssp->srcu_n_exp_nodelay) > SRCU_MAX_NODELAY_PHASE)
--                       jbase = 1;
-+       if (rcu_seq_state(READ_ONCE(ssp->srcu_gp_seq))) {
-+               j = jiffies - 1;
-+               gpstart = READ_ONCE(ssp->srcu_gp_start);
-+               if (time_after(j, gpstart))
-+                       jbase += j - gpstart;
-+               if (!jbase) {
-+                       WRITE_ONCE(ssp->srcu_n_exp_nodelay, 
-READ_ONCE(ssp->srcu_n_exp_nodelay) + 1);
-+                       if (READ_ONCE(ssp->srcu_n_exp_nodelay) > 
-SRCU_MAX_NODELAY_PHASE)
-+                               jbase = 1;
-+               }
-        }
-        return jbase > SRCU_MAX_INTERVAL ? SRCU_MAX_INTERVAL : jbase;
-  }
-.
-
diff --git a/classification_output/04/other/92957605 b/classification_output/04/other/92957605
deleted file mode 100644
index fe9db1c7d..000000000
--- a/classification_output/04/other/92957605
+++ /dev/null
@@ -1,426 +0,0 @@
-other: 0.997
-semantic: 0.995
-device: 0.993
-socket: 0.993
-instruction: 0.993
-assembly: 0.992
-boot: 0.992
-network: 0.989
-graphic: 0.986
-KVM: 0.982
-vnc: 0.981
-mistranslation: 0.974
-
-[Qemu-devel] Fwd:  [BUG] Failed to compile using gcc7.1
-
-Hi all,
-I encountered the same problem on gcc 7.1.1 and found Qu's mail in
-this list from google search.
-
-Temporarily fix it by specifying the string length in snprintf
-directive. Hope this is helpful to other people encountered the same
-problem.
-
-@@ -1,9 +1,7 @@
----
---- a/block/blkdebug.c
--                 "blkdebug:%s:%s", s->config_file ?: "",
---- a/block/blkverify.c
--                 "blkverify:%s:%s",
---- a/hw/usb/bus.c
--        snprintf(downstream->path, sizeof(downstream->path), "%s.%d",
--        snprintf(downstream->path, sizeof(downstream->path), "%d", portnr);
---
-+++ b/block/blkdebug.c
-+                 "blkdebug:%.2037s:%.2037s", s->config_file ?: "",
-+++ b/block/blkverify.c
-+                 "blkverify:%.2038s:%.2038s",
-+++ b/hw/usb/bus.c
-+        snprintf(downstream->path, sizeof(downstream->path), "%.12s.%d",
-+        snprintf(downstream->path, sizeof(downstream->path), "%.12d", portnr);
-
-Tsung-en Hsiao
-
->
-Qu Wenruo Wrote:
->
->
-Hi all,
->
->
-After upgrading gcc from 6.3.1 to 7.1.1, qemu can't be compiled with gcc.
->
->
-The error is:
->
->
-------
->
-CC      block/blkdebug.o
->
-block/blkdebug.c: In function 'blkdebug_refresh_filename':
->
->
-block/blkdebug.c:693:31: error: '%s' directive output may be truncated
->
-writing up to 4095 bytes into a region of size 4086
->
-[-Werror=format-truncation=]
->
->
-"blkdebug:%s:%s", s->config_file ?: "",
->
-^~
->
-In file included from /usr/include/stdio.h:939:0,
->
-from /home/adam/qemu/include/qemu/osdep.h:68,
->
-from block/blkdebug.c:25:
->
->
-/usr/include/bits/stdio2.h:64:10: note: '__builtin___snprintf_chk' output 11
->
-or more bytes (assuming 4106) into a destination of size 4096
->
->
-return __builtin___snprintf_chk (__s, __n, __USE_FORTIFY_LEVEL - 1,
->
-^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
->
-__bos (__s), __fmt, __va_arg_pack ());
->
-~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
->
-cc1: all warnings being treated as errors
->
-make: *** [/home/adam/qemu/rules.mak:69: block/blkdebug.o] Error 1
->
-------
->
->
-It seems that gcc 7 is introducing more restrict check for printf.
->
->
-If using clang, although there are some extra warning, it can at least pass
->
-the compile.
->
->
-Thanks,
->
-Qu
-
-Hi Tsung-en,
-
-On 06/11/2017 04:08 PM, Tsung-en Hsiao wrote:
-Hi all,
-I encountered the same problem on gcc 7.1.1 and found Qu's mail in
-this list from google search.
-
-Temporarily fix it by specifying the string length in snprintf
-directive. Hope this is helpful to other people encountered the same
-problem.
-Thank your for sharing this.
-@@ -1,9 +1,7 @@
----
---- a/block/blkdebug.c
--                 "blkdebug:%s:%s", s->config_file ?: "",
---- a/block/blkverify.c
--                 "blkverify:%s:%s",
---- a/hw/usb/bus.c
--        snprintf(downstream->path, sizeof(downstream->path), "%s.%d",
--        snprintf(downstream->path, sizeof(downstream->path), "%d", portnr);
---
-+++ b/block/blkdebug.c
-+                 "blkdebug:%.2037s:%.2037s", s->config_file ?: "",
-It is a rather funny way to silent this warning :) Truncating the
-filename until it fits.
-However I don't think it is the correct way since there is indeed an
-overflow of bs->exact_filename.
-Apparently exact_filename from "block/block_int.h" is defined to hold a
-pathname:
-char exact_filename[PATH_MAX];
-but is used for more than that (for example in blkdebug.c it might use
-until 10+2*PATH_MAX chars).
-I suppose it started as a buffer to hold a pathname then more block
-drivers were added and this buffer ended used differently.
-If it is a multi-purpose buffer one safer option might be to declare it
-as a GString* and use g_string_printf().
-I CC'ed the block folks to have their feedback.
-
-Regards,
-
-Phil.
-+++ b/block/blkverify.c
-+                 "blkverify:%.2038s:%.2038s",
-+++ b/hw/usb/bus.c
-+        snprintf(downstream->path, sizeof(downstream->path), "%.12s.%d",
-+        snprintf(downstream->path, sizeof(downstream->path), "%.12d", portnr);
-
-Tsung-en Hsiao
-Qu Wenruo Wrote:
-
-Hi all,
-
-After upgrading gcc from 6.3.1 to 7.1.1, qemu can't be compiled with gcc.
-
-The error is:
-
-------
- CC      block/blkdebug.o
-block/blkdebug.c: In function 'blkdebug_refresh_filename':
-
-block/blkdebug.c:693:31: error: '%s' directive output may be truncated writing 
-up to 4095 bytes into a region of size 4086 [-Werror=format-truncation=]
-
-                 "blkdebug:%s:%s", s->config_file ?: "",
-                              ^~
-In file included from /usr/include/stdio.h:939:0,
-                from /home/adam/qemu/include/qemu/osdep.h:68,
-                from block/blkdebug.c:25:
-
-/usr/include/bits/stdio2.h:64:10: note: '__builtin___snprintf_chk' output 11 or 
-more bytes (assuming 4106) into a destination of size 4096
-
-  return __builtin___snprintf_chk (__s, __n, __USE_FORTIFY_LEVEL - 1,
-         ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
-       __bos (__s), __fmt, __va_arg_pack ());
-       ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
-cc1: all warnings being treated as errors
-make: *** [/home/adam/qemu/rules.mak:69: block/blkdebug.o] Error 1
-------
-
-It seems that gcc 7 is introducing more restrict check for printf.
-
-If using clang, although there are some extra warning, it can at least pass the 
-compile.
-
-Thanks,
-Qu
-
-On 2017-06-12 05:19, Philippe Mathieu-DaudÃ© wrote:
->
-Hi Tsung-en,
->
->
-On 06/11/2017 04:08 PM, Tsung-en Hsiao wrote:
->
-> Hi all,
->
-> I encountered the same problem on gcc 7.1.1 and found Qu's mail in
->
-> this list from google search.
->
->
->
-> Temporarily fix it by specifying the string length in snprintf
->
-> directive. Hope this is helpful to other people encountered the same
->
-> problem.
->
->
-Thank your for sharing this.
->
->
->
->
-> @@ -1,9 +1,7 @@
->
-> ---
->
-> --- a/block/blkdebug.c
->
-> -                 "blkdebug:%s:%s", s->config_file ?: "",
->
-> --- a/block/blkverify.c
->
-> -                 "blkverify:%s:%s",
->
-> --- a/hw/usb/bus.c
->
-> -        snprintf(downstream->path, sizeof(downstream->path), "%s.%d",
->
-> -        snprintf(downstream->path, sizeof(downstream->path), "%d",
->
-> portnr);
->
-> --
->
-> +++ b/block/blkdebug.c
->
-> +                 "blkdebug:%.2037s:%.2037s", s->config_file ?: "",
->
->
-It is a rather funny way to silent this warning :) Truncating the
->
-filename until it fits.
->
->
-However I don't think it is the correct way since there is indeed an
->
-overflow of bs->exact_filename.
->
->
-Apparently exact_filename from "block/block_int.h" is defined to hold a
->
-pathname:
->
-char exact_filename[PATH_MAX];
->
->
-but is used for more than that (for example in blkdebug.c it might use
->
-until 10+2*PATH_MAX chars).
-In any case, truncating the filenames will do just as much as truncating
-the result: You'll get an unusable filename.
-
->
-I suppose it started as a buffer to hold a pathname then more block
->
-drivers were added and this buffer ended used differently.
->
->
-If it is a multi-purpose buffer one safer option might be to declare it
->
-as a GString* and use g_string_printf().
-What it is supposed to be now is just an information string we can print
-to the user, because strings are nicer than JSON objects. There are some
-commands that take a filename for identifying a block node, but I dream
-we can get rid of them in 3.0...
-
-The right solution is to remove it altogether and have a
-"char *bdrv_filename(BlockDriverState *bs)" function (which generates
-the filename every time it's called). I've been working on this for some
-years now, actually, but it was never pressing enough to get it finished
-(so I never had enough time).
-
-What we can do in the meantime is to not generate a plain filename if it
-won't fit into bs->exact_filename.
-
-(The easiest way to do this probably would be to truncate
-bs->exact_filename back to an empty string if snprintf() returns a value
-greater than or equal to the length of bs->exact_filename.)
-
-What to do about hw/usb/bus.c I don't know (I guess the best solution
-would be to ignore the warning, but I don't suppose that is going to work).
-
-Max
-
->
->
-I CC'ed the block folks to have their feedback.
->
->
-Regards,
->
->
-Phil.
->
->
-> +++ b/block/blkverify.c
->
-> +                 "blkverify:%.2038s:%.2038s",
->
-> +++ b/hw/usb/bus.c
->
-> +        snprintf(downstream->path, sizeof(downstream->path), "%.12s.%d",
->
-> +        snprintf(downstream->path, sizeof(downstream->path), "%.12d",
->
-> portnr);
->
->
->
-> Tsung-en Hsiao
->
->
->
->> Qu Wenruo Wrote:
->
->>
->
->> Hi all,
->
->>
->
->> After upgrading gcc from 6.3.1 to 7.1.1, qemu can't be compiled with
->
->> gcc.
->
->>
->
->> The error is:
->
->>
->
->> ------
->
->>  CC      block/blkdebug.o
->
->> block/blkdebug.c: In function 'blkdebug_refresh_filename':
->
->>
->
->> block/blkdebug.c:693:31: error: '%s' directive output may be
->
->> truncated writing up to 4095 bytes into a region of size 4086
->
->> [-Werror=format-truncation=]
->
->>
->
->>                  "blkdebug:%s:%s", s->config_file ?: "",
->
->>                               ^~
->
->> In file included from /usr/include/stdio.h:939:0,
->
->>                 from /home/adam/qemu/include/qemu/osdep.h:68,
->
->>                 from block/blkdebug.c:25:
->
->>
->
->> /usr/include/bits/stdio2.h:64:10: note: '__builtin___snprintf_chk'
->
->> output 11 or more bytes (assuming 4106) into a destination of size 4096
->
->>
->
->>   return __builtin___snprintf_chk (__s, __n, __USE_FORTIFY_LEVEL - 1,
->
->>          ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
->
->>        __bos (__s), __fmt, __va_arg_pack ());
->
->>        ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
->
->> cc1: all warnings being treated as errors
->
->> make: *** [/home/adam/qemu/rules.mak:69: block/blkdebug.o] Error 1
->
->> ------
->
->>
->
->> It seems that gcc 7 is introducing more restrict check for printf.
->
->>
->
->> If using clang, although there are some extra warning, it can at
->
->> least pass the compile.
->
->>
->
->> Thanks,
->
->> Qu
->
->
-signature.asc
-Description:
-OpenPGP digital signature
-
diff --git a/classification_output/04/other/95154278 b/classification_output/04/other/95154278
deleted file mode 100644
index d23c902d5..000000000
--- a/classification_output/04/other/95154278
+++ /dev/null
@@ -1,163 +0,0 @@
-other: 0.953
-device: 0.951
-graphic: 0.950
-vnc: 0.948
-assembly: 0.944
-instruction: 0.938
-semantic: 0.937
-KVM: 0.916
-socket: 0.913
-network: 0.913
-boot: 0.902
-mistranslation: 0.897
-
-[Qemu-devel] [BUG] checkpatch.pl hangs on target/mips/msa_helper.c
-
-If  checkpatch.pl is applied (using switch "-f") on file 
-target/mips/msa_helper.c, it will hang.
-
-There is a workaround for this particular file:
-
-These lines in msa_helper.c:
-
-        uint## BITS ##_t S = _S, T = _T;                            \
-        uint## BITS ##_t as, at, xs, xt, xd;                        \
-
-should be replaced with:
-
-        uint## BITS ## _t S = _S, T = _T;                           \
-        uint## BITS ## _t as, at, xs, xt, xd;                       \
-
-(a space is added after the second "##" in each line)
-
-The workaround is found by partial deleting and undeleting of the code in 
-msa_helper.c in binary search fashion.
-
-This workaround will soon be submitted by me as a patch within a series on misc 
-MIPS issues.
-
-I took a look at checkpatch.pl code, and it looks it is fairly complicated to 
-fix the issue, since it happens in the code segment involving intricate logic 
-conditions.
-
-Regards,
-Aleksandar
-
-On Wed, Jul 04, 2018 at 03:35:18PM +0000, Aleksandar Markovic wrote:
->
-If  checkpatch.pl is applied (using switch "-f") on file
->
-target/mips/msa_helper.c, it will hang.
->
->
-There is a workaround for this particular file:
->
->
-These lines in msa_helper.c:
->
->
-uint## BITS ##_t S = _S, T = _T;                            \
->
-uint## BITS ##_t as, at, xs, xt, xd;                        \
->
->
-should be replaced with:
->
->
-uint## BITS ## _t S = _S, T = _T;                           \
->
-uint## BITS ## _t as, at, xs, xt, xd;                       \
->
->
-(a space is added after the second "##" in each line)
->
->
-The workaround is found by partial deleting and undeleting of the code in
->
-msa_helper.c in binary search fashion.
->
->
-This workaround will soon be submitted by me as a patch within a series on
->
-misc MIPS issues.
->
->
-I took a look at checkpatch.pl code, and it looks it is fairly complicated to
->
-fix the issue, since it happens in the code segment involving intricate logic
->
-conditions.
-Thanks for figuring this out, Aleksandar.  Not sure if anyone else has
-the apetite to fix checkpatch.pl.
-
-Stefan
-signature.asc
-Description:
-PGP signature
-
-On 07/11/2018 09:36 AM, Stefan Hajnoczi wrote:
->
-On Wed, Jul 04, 2018 at 03:35:18PM +0000, Aleksandar Markovic wrote:
->
-> If  checkpatch.pl is applied (using switch "-f") on file
->
-> target/mips/msa_helper.c, it will hang.
->
->
->
-> There is a workaround for this particular file:
->
->
->
-> These lines in msa_helper.c:
->
->
->
->         uint## BITS ##_t S = _S, T = _T;                            \
->
->         uint## BITS ##_t as, at, xs, xt, xd;                        \
->
->
->
-> should be replaced with:
->
->
->
->         uint## BITS ## _t S = _S, T = _T;                           \
->
->         uint## BITS ## _t as, at, xs, xt, xd;                       \
->
->
->
-> (a space is added after the second "##" in each line)
->
->
->
-> The workaround is found by partial deleting and undeleting of the code in
->
-> msa_helper.c in binary search fashion.
->
->
->
-> This workaround will soon be submitted by me as a patch within a series on
->
-> misc MIPS issues.
->
->
->
-> I took a look at checkpatch.pl code, and it looks it is fairly complicated
->
-> to fix the issue, since it happens in the code segment involving intricate
->
-> logic conditions.
->
->
-Thanks for figuring this out, Aleksandar.  Not sure if anyone else has
->
-the apetite to fix checkpatch.pl.
-Anyone else but Paolo ;P
-http://lists.nongnu.org/archive/html/qemu-devel/2018-07/msg01250.html
-signature.asc
-Description:
-OpenPGP digital signature
-